encoding - UTF-8 decoding with ASCII code in it with Python


From the question and answer in "UTF-8 coding in Python", I could use the binascii package to decode a UTF-8 string with '_' in it.

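    import binascii

    def toutf(r):
        try:
            # Drop the '_' separators, then turn the hex digits into bytes.
            rhexonly = r.replace('_', '')
            rbytes = binascii.unhexlify(rhexonly)
            rtext = rbytes.decode('utf-8')
        except (TypeError, binascii.Error):
            # Invalid hex: Python 2 raises TypeError, Python 3 binascii.Error.
            rtext = r
        return rtext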

This code works fine for UTF-8 characters:

    r = '_ed_8e_b8'
    print(toutf(r))
    >> 편

However, the code does not work when the string has normal ASCII characters in it. The ASCII characters can be anywhere in the string.

    r = '_2f119_ed_8e_b8'
    print(toutf(r))
    >> doesn't work - _2f119_ed_8e_b8
    >> should be '/119편'
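The failure seems to come from the length of the hex string: once the underscores are stripped, the literal ASCII digits "119" are mixed in with the hex pairs and the result no longer has an even number of digits. A minimal sketch of that check, where `hex_only` and `failed` are names made up for illustration:

```python
import binascii

# Strip the separators exactly as toutf() does.
hex_only = '_2f119_ed_8e_b8'.replace('_', '')
print(len(hex_only))  # 11, an odd number of hex digits

try:
    binascii.unhexlify(hex_only)
    failed = False
except (TypeError, binascii.Error):
    # Python 2 raises TypeError here; Python 3 raises binascii.Error.
    failed = True

print(failed)  # True
```

Because the exception is caught inside toutf, the input comes back unchanged instead of raising.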

Maybe I can use a regular expression to extract the UTF-8 parts and the ASCII parts and reassemble them after conversion, but I wonder if there is an easier way to do this conversion. Is there a good solution?

This is quite straightforward with re.sub:

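    import re

    # One or more '_xx' groups, where xx is a pair of hex digits.
    bytegroup = r'(_[0-9a-f]{2})+'

    def replacer(match):
        # Decode each run of '_xx' groups; text outside the runs is left alone.
        return toutf(match.group())

    rtext = re.sub(bytegroup, replacer, r, flags=re.I)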
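For reference, here is a self-contained variant of the same idea that does not depend on toutf. It is only a sketch, assuming Python 3; the names `HEX_RUN` and `decode_hex_runs` are made up for illustration. Unlike the snippet above, it also leaves a run untouched when its bytes are not valid UTF-8.

```python
import binascii
import re

# Hypothetical names for this sketch. A run is one or more '_xx' groups,
# where xx is a pair of hex digits (upper or lower case).
HEX_RUN = re.compile(r'(?:_[0-9a-f]{2})+', re.IGNORECASE)


def decode_hex_runs(s):
    """Decode every run of '_xx' groups as UTF-8; keep other text as-is."""
    def replacer(match):
        raw = binascii.unhexlify(match.group().replace('_', ''))
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            # The bytes are not valid UTF-8: keep the original text.
            return match.group()
    return HEX_RUN.sub(replacer, s)


print(decode_hex_runs('_2f119_ed_8e_b8'))  # /119편
```

Note that the encoding itself is ambiguous: an ASCII hex digit that directly follows an encoded byte (for example "_2fab") cannot be told apart from more data, so the sketch assumes that only the two characters after each underscore belong to the byte.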
