Python: how to open a file, loop through it word for word, and compare to a list


I have a file of strings, and I want to loop through the file and check its contents against another file. Both files are too big to place in the code, so I have to open each file with the open method and then turn each into a loop that iterates over the file word by word (in each file) and compares every word to every word in the other file. Any ideas how to do this?

If the files are both sorted, or if you can produce sorted versions of the files, this is relatively easy. The simplest approach (conceptually speaking) is to take one word from file A, call it a, and then read a word from file B, calling it b. Either b is alphabetically prior to a, or it is after a, or they are the same. If they are the same, add the word to a list you're maintaining. If b is prior to a, read b's from file B until b >= a. If they are equal, collect that word. If a < b, obviously, read a's from file A until a >= b, and collect if equal. Since file size is a problem, you might need to write your collected words out to a results file to avoid running out of memory. I'll let you worry about that detail.
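As a rough illustration of the sorted-file walk, here is a minimal sketch. It assumes words are separated by whitespace, that both files are sorted in the order Python uses for string comparison (by code point, so case matters), and that UTF-8 is the right encoding. The names `iter_words`, `common_sorted`, and `write_common` are made up for this example.

```python
from typing import Iterable, Iterator


def iter_words(path: str) -> Iterator[str]:
    """Yield whitespace-separated words from a file, reading one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield from line.split()


def common_sorted(words_a: Iterable[str], words_b: Iterable[str]) -> Iterator[str]:
    """Yield each word found in both streams once; both must be sorted ascending."""
    it_a, it_b = iter(words_a), iter(words_b)
    a = next(it_a, None)
    b = next(it_b, None)
    last = None  # last word yielded, used to skip repeated matches
    while a is not None and b is not None:
        if a == b:
            if a != last:
                yield a
                last = a
            a = next(it_a, None)
            b = next(it_b, None)
        elif a < b:
            a = next(it_a, None)  # a is behind, so advance file A
        else:
            b = next(it_b, None)  # b is behind, so advance file B


def write_common(path_a: str, path_b: str, out_path: str) -> None:
    """Stream the common words to a results file instead of holding them in memory."""
    with open(out_path, "w", encoding="utf-8") as out:
        for word in common_sorted(iter_words(path_a), iter_words(path_b)):
            out.write(word + "\n")
```

Because everything is a generator, only one line of each file is in memory at a time, and something like `write_common("a.txt", "b.txt", "common.txt")` (file names are placeholders) sends the matches straight to disk.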

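If they are not sorted and you can't sort them, it's a harder problem. The naive approach would be to take a word from A, and then scan through B looking for that word. Since you say the files are large, this is not an attractive option. You could probably do better by reading in chunks from A and B and working with set intersections, but this is a little more complex.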

Putting it as simply as I can, I would read in reasonably-sized chunks of file A, and convert each to a set of words, call the first one A1. I would then read similarly-sized chunks of B as sets B1, B2, ... Bn. The union of the intersections of (A1, B1), (A1, B2), ..., (A1, Bn) is the set of words appearing in both A1 and B. Repeat for chunks A2, A3, ... An.
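The chunk-and-intersect idea might look something like the sketch below. It assumes whitespace-separated words and UTF-8 text, and it assumes the set of common words itself fits in memory. The function names and the default `chunk_size` are illustrative guesses, not tuned values.

```python
from itertools import islice
from typing import Iterator, Set


def iter_words(path: str) -> Iterator[str]:
    """Yield whitespace-separated words from a file, reading one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield from line.split()


def word_chunks(path: str, chunk_size: int) -> Iterator[Set[str]]:
    """Yield successive sets built from up to chunk_size words of the file."""
    words = iter_words(path)
    while True:
        chunk = set(islice(words, chunk_size))
        if not chunk:
            return
        yield chunk


def common_words(path_a: str, path_b: str, chunk_size: int = 100_000) -> Set[str]:
    """Union of the intersections of every chunk of A with every chunk of B."""
    common: Set[str] = set()
    for chunk_a in word_chunks(path_a, chunk_size):
        # File B is re-read from the start for each chunk of A.
        for chunk_b in word_chunks(path_b, chunk_size):
            common |= chunk_a & chunk_b
    return common
```

Note the trade-off: B gets read once per chunk of A, so larger chunks mean fewer passes over B but more memory per chunk. If the result is too large to hold, the `common |= ...` line could write new matches to a file instead.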

I hope this makes sense. If you haven't played with sets, it might not, but then I guess there's a cool thing for you to learn about.

