java - Parsing PDF file using Apache PDFBox
I am trying to modify the contents of a PDF document using PDFBox. I used this example as is, and observed that the text in the PDF file is getting split at the character level (or worse). For example, the string "em? what it is:"
gets split into:
COSString{e} COSString{m?} COSString{ } COSString{w} COSString{hat } COSString{it } COSString{is} COSString{:}
(I checked this by printing each COSString in the above-mentioned code.) As far as I can see, the file contains Latin characters and the encoding is ISO-8859-1. Any ideas?
Regards,
Salil
This is a PDF formatting issue. It is how this particular PDF stores its text in order to get the correct letter spacing or kerning. It varies from PDF to PDF, depending on how the file was created.
Typically, I suggest merging the different tokens into one big content string.
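The merging step could be sketched roughly as follows. This is a minimal, library-free illustration rather than PDFBox code: it assumes the fragments arrive as the elements of a single text-showing array (the operand of the PDF `TJ` operator), where strings are interleaved with numeric spacing adjustments. `TokenMerger` and `merge` are hypothetical names, plain `String` values stand in for `COSString` fragments, and the numbers in the sample data are made up for illustration.

```java
import java.util.List;

// Hypothetical helper, not part of PDFBox: joins the text fragments of one
// split text-showing array back into a single string.
class TokenMerger {

    // Each element stands in for an entry of a TJ array: a String plays the
    // role of a COSString fragment, a Number plays the role of a kerning or
    // spacing adjustment between fragments.
    static String merge(List<?> elements) {
        StringBuilder merged = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof String) {
                merged.append((String) element);
            }
            // Numbers (spacing adjustments) and any other element are skipped.
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // Fragments from the question, with illustrative spacing values added.
        List<Object> fragments =
                List.of("e", -20, "m?", " ", "w", 15.5, "hat ", "it ", "is", ":");
        System.out.println(merge(fragments));
    }
}
```

With PDFBox itself, the same loop would presumably test each array element for `COSString` and append the result of `getString()`. Note that merging discards the spacing adjustments, so a rewritten string may be laid out slightly differently from the original.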