encoding - UTF-8 decoding with ASCII code in it with Python
From the question and answer on UTF-8 coding in Python, I can use the binascii package to decode a UTF-8 string that has '_' in it.
    import binascii

    def toutf(r):
        try:
            rhexonly = r.replace('_', '')
            rbytes = binascii.unhexlify(rhexonly)
            rtext = rbytes.decode('utf-8')
        except TypeError:
            # not a valid hex string: return the input unchanged
            rtext = r
        return rtext
This code works fine with UTF-8 characters:
    r = '_ed_8e_b8'
    print toutf(r)
    >> 편
However, the code does not work when the string has normal ASCII characters in it. The ASCII characters can be anywhere in the string.
    r = '_2f119_ed_8e_b8'
    print toutf(r)
    >> doesn't work - _2f119_ed_8e_b8
    >> should be '/119편'
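The likely reason for the failure can be checked directly. This is a small sketch written for Python 3 (an assumption; the snippets in this post are Python 2, where the same failure surfaces as a TypeError). Stripping every '_' glues the literal "119" onto the hex digits, so the result has an odd length and cannot be split into whole bytes. The names hex_only and failed are only illustrative.

```python
import binascii

r = '_2f119_ed_8e_b8'
hex_only = r.replace('_', '')  # literal "119" is now mixed in with the hex digits

# An odd number of hex digits cannot be split into whole bytes.
print(len(hex_only))

try:
    binascii.unhexlify(hex_only)
    failed = False
except (TypeError, binascii.Error):
    # Python 3 raises binascii.Error here; Python 2 raised TypeError.
    failed = True

print(failed)
```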
Maybe I can use a regular expression to extract the UTF-8 part and the ASCII part and reassemble them after conversion, but I wonder if there is an easier way to do the conversion. Is there a solution?
This is quite straightforward with re.sub:
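    import re

    # one or more consecutive "_XX" groups
    bytegroup = r'(_[0-9a-z]{2})+'

    def replacer(match):
        # decode only the matched run; the text between runs is left alone
        return toutf(match.group())

    rtext = re.sub(bytegroup, replacer, r, flags=re.I)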
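For reference, here is a self-contained sketch of the same idea that runs on Python 3. It is not the code from the post: the names BYTE_GROUP, decode_run and decode_mixed are made up for illustration, and the pattern is narrowed to real hex digits, which is an assumption about what the input looks like. Each run of "_XX" groups is decoded on its own, and a run that is not valid UTF-8 is left unchanged.

```python
import binascii
import re

# One or more consecutive "_XX" groups, where XX is two hex digits.
# (Assumption: only hex digits follow an escaping underscore.)
BYTE_GROUP = re.compile(r'(?:_[0-9a-fA-F]{2})+')


def decode_run(match):
    """Decode one run of _XX groups; return it unchanged if it is not valid UTF-8."""
    run = match.group()
    try:
        return binascii.unhexlify(run.replace('_', '')).decode('utf-8')
    except (binascii.Error, UnicodeDecodeError):
        return run


def decode_mixed(s):
    """Replace every run of _XX groups in s with its decoded text."""
    return BYTE_GROUP.sub(decode_run, s)


print(decode_mixed('_2f119_ed_8e_b8'))
```

One caveat with this approach: a literal underscore that happens to be followed by two hex digits (for example "my_41") cannot be told apart from an escaped byte, so it only works if the input never contains such text.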