java - Parsing PDF file using Apache PDFBox


I am trying to modify the contents of a PDF document using PDFBox. I used this example as is, and observed that the text from the PDF file is getting split at the character level (or worse). For example, the string "em? what it is:" gets split into:

COSString{e} COSString{m?} COSString{ } COSString{w} COSString{hat } COSString{it } COSString{is} COSString{:}

(This is what I get when printing each COSString in the above-mentioned code.) As far as I can see, there are only Latin characters in the file, and the encoding is ISO-8859-1. Any ideas?

Regards,

Salil

This is a PDF formatting issue. It is how that particular PDF stores its text in order to get the correct letter spacing or kerning. It varies from PDF to PDF, depending on how the file was created.

Typically, I would suggest merging the different tokens into one big content string.
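As a rough illustration of that merge, here is a minimal sketch in plain Java. It deliberately does not use PDFBox, so it runs without the library: the fragments are modeled as ordinary strings, and any numeric entries stand in for the spacing adjustments a PDF may place between fragments. The class name `TokenMerger` and the method `merge` are hypothetical names chosen for this example. In real PDFBox code the elements would be `COSString` and number objects taken from the parsed content stream, and the details depend on the PDFBox version you use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox: merges text fragments into one string.
final class TokenMerger {

    /**
     * Concatenates every text fragment in order and skips everything else
     * (for example numeric spacing/kerning adjustments between fragments).
     */
    static String merge(List<?> tokens) {
        StringBuilder merged = new StringBuilder();
        for (Object token : tokens) {
            if (token instanceof CharSequence) {
                merged.append((CharSequence) token);
            }
            // Non-text entries are dropped here; this is an assumption of the sketch.
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // The fragments from the question, in the order they were printed.
        List<Object> tokens = Arrays.asList("e", "m?", " ", "w", "hat ", "it ", "is", ":");
        System.out.println(merge(tokens)); // prints "em? what it is:"
    }
}
```

Once the fragments are merged you can search or replace on the whole string. Note that writing a modified string back as a single token discards the original per-fragment spacing, so the rendered text may look slightly different.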

