Processing Urdu bidirectional text in text editors and Python
I want to process bidirectional text (Urdu and English) from an MS Word document with a Python script that transforms the text into table markup. I can't directly access the bidirectional text in the Word document because it is in a binary format, and if I copy and paste the text from the Word document into a text editor, the bidirectional text renders incorrectly, losing its directionality.
Example:
The following text is rendered in the reverse direction compared with the original MS Word text it was copied from (Urdu text is involved):
images پر ہے۔
So how can I process such bidi text so that it is rendered correctly in a text editor such as Notepad++, and hence can be faithfully processed by a Python script?
First, don't rely on bidi text appearing correctly in the Word file. That doesn't guarantee the same text will appear correctly in another environment. Microsoft Word has its own way of handling bidirectional text, in both its current and legacy versions, and it is not the way Unicode-compliant text editors (like gedit) handle such text. This might or might not be resolved if Microsoft implements a newer version of the Unicode Bidirectional Algorithm in its products.
Secondly, the reason you don't see the copied text displayed correctly is that the text environment (including here) doesn't support bidi text, so it's not possible to have right-to-left text displayed. I copied your sample string into a Unicode-compliant text editor, changed the direction to right-to-left, and the result was correct.
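To see why changing the direction helps, it can be useful to inspect the string in its stored (logical) order rather than trusting how an editor draws it. The following is a minimal sketch using only Python's standard `unicodedata` module. The helper names `bidi_classes` and `base_direction` are hypothetical, and `base_direction` is a simplified first-strong-character check, not a full implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs in logical (stored) order."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def base_direction(text):
    """Guess the paragraph direction from the first strong character.

    Simplified: looks only at classes L, R and AL and ignores
    embeddings and isolates.
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return "ltr"  # no strong character found; assume left-to-right

# The sample from the question, written with escapes so the source
# file itself is not affected by editor rendering.
sample = "images \u067e\u0631 \u06c1\u06d2\u06d4"

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")

print(base_direction(sample))             # first strong character is Latin
print(base_direction("\u200f" + sample))  # RIGHT-TO-LEFT MARK prepended
```

Because the sample starts with a Latin word, an editor that picks the direction from the first strong character would likely lay the line out left-to-right, which matches the reversed look described in the question. The stored characters are the same either way, so a Python script sees the text in logical order regardless of how it is drawn.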
Now, to be able to process the text in the Word file using Python, you need to improvise a bit. You can export the text content as Unicode text and process that with Python. Or, in case you want to process the text content in place (inside Word), you might be able to get satisfactory results out of OLE component scripting with Python. See the related question here.
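The export route could look roughly like the sketch below. It assumes the document was saved from Word as plain text in a UTF-16 encoding with a byte order mark (adjust `encoding` if you chose something else, such as UTF-8), and that fields are tab-separated. The file name `export.txt`, the helper names, and the pipe-delimited row format are all hypothetical stand-ins, since the question doesn't say which table markup is needed.

```python
import tempfile
from pathlib import Path

def read_exported_text(path, encoding="utf-16"):
    """Read a text file exported from Word and return its non-empty lines."""
    text = Path(path).read_text(encoding=encoding)
    return [line for line in text.splitlines() if line.strip()]

def to_table_row(line, sep="\t"):
    """Hypothetical markup: turn tab-separated fields into a pipe-delimited row."""
    cells = [cell.strip() for cell in line.split(sep)]
    return "| " + " | ".join(cells) + " |"

# Demo with a temporary file standing in for the real export.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "export.txt"
    demo.write_text("images\t\u067e\u0631 \u06c1\u06d2\u06d4\n", encoding="utf-16")
    rows = [to_table_row(line) for line in read_exported_text(demo)]

for row in rows:
    print(row)
```

The script works on the characters in logical order, so the output is correct even if the terminal or editor used to view it draws the Urdu part in the wrong direction.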