Python: how to open a file, loop through it word by word, and compare to a list
I have a file of strings, and I want to loop through it and check its contents against another file. Both files are too big to place in the code, so I have to open each file with the open method and turn each into a loop that iterates over the file word by word (in each file), comparing every word with every word in the other file. Any ideas how to do this?
If the files are both sorted, or if you can produce sorted versions of them, it's relatively easy. The simplest approach (conceptually speaking) is to take one word from file A, call it a, and then read a word from file B, calling it b. Either b is alphabetically before a, after a, or the same. If they are the same, add the word to a list you're maintaining and read the next word from both files. If b is before a, keep reading b from file B until b >= a; if they are then equal, collect the word. If a < b, read a from file A until a >= b, and again collect it if equal. Stop when either file runs out. Since file size is a problem, you might need to write the collected words out to a results file to avoid running out of memory. I'll let you worry about that detail.
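Here is a minimal sketch of that merge in Python. It assumes both files list their words in ascending order under Python's default string comparison (for example, output of sorted() or of `LC_ALL=C sort`), with words separated by whitespace. The helper names words and common_sorted are hypothetical, not from the original answer. Only one word from each file is held in memory at a time.

```python
from typing import Iterable, Iterator, Optional


def words(path: str) -> Iterator[str]:
    """Yield whitespace-separated words from a text file, one at a time."""
    with open(path) as f:
        for line in f:
            yield from line.split()


def common_sorted(a_words: Iterable[str], b_words: Iterable[str]) -> Iterator[str]:
    """Yield words that appear in both inputs.

    Assumes both inputs are already in ascending order; holds only the
    current word from each side in memory.
    """
    it_a, it_b = iter(a_words), iter(b_words)
    a: Optional[str] = next(it_a, None)
    b: Optional[str] = next(it_b, None)
    while a is not None and b is not None:  # stop when either side runs out
        if a == b:
            yield a  # same word: collect it, then advance both sides
            a, b = next(it_a, None), next(it_b, None)
        elif b < a:
            b = next(it_b, None)  # b sorts before a: read on in B
        else:
            a = next(it_a, None)  # a sorts before b: read on in A
```

To keep memory flat, iterate over common_sorted(words("a_sorted.txt"), words("b_sorted.txt")) and write each word to a results file as it arrives (the file names are hypothetical). A word repeated in both files is collected once per matching pair.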
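If they're not sorted and you can't sort them, it's a harder problem. The naive approach is to take a word from A and scan through B looking for that word. Since the files are large and every word in A triggers another pass over B, that's not an attractive option. Better is reading in chunks of A and B and working with set intersections, but that's a little more complex.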
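For comparison, the naive approach might look like the following sketch (hypothetical names, whitespace-separated words assumed). It needs almost no memory, but it re-opens and re-reads file B for every word in A, which is why it scales so badly for large files.

```python
from typing import Iterator


def words(path: str) -> Iterator[str]:
    """Yield whitespace-separated words from a text file, one at a time."""
    with open(path) as f:
        for line in f:
            yield from line.split()


def common_naive(path_a: str, path_b: str) -> Iterator[str]:
    """Yield each word of file A that also occurs somewhere in file B."""
    for a in words(path_a):
        # Rescan B from the start for every word in A; any() stops at the first match.
        if any(b == a for b in words(path_b)):
            yield a
```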
Putting it as simply as I can: read in a reasonably-sized chunk of file A and convert it to a set of words; call it A1. Then read similarly-sized chunks of B into sets B1, B2, ... Bn. The union of the intersections of (A1, B1), (A1, B2), ..., (A1, Bn) is the set of words appearing in both A1 and B. Then repeat with chunks A2, A3, and so on through the last chunk of A.
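A sketch of the chunked approach, assuming whitespace-separated words. The names chunks and common_chunked, and the default chunk size of one million words, are illustrative choices rather than anything from the original answer. Memory use is roughly one A chunk, one B chunk, and the matches found for the current A chunk, at the cost of re-reading B once per A chunk.

```python
from itertools import islice
from typing import Iterator, Set


def words(path: str) -> Iterator[str]:
    """Yield whitespace-separated words from a text file, one at a time."""
    with open(path) as f:
        for line in f:
            yield from line.split()


def chunks(path: str, size: int) -> Iterator[Set[str]]:
    """Yield successive sets built from up to `size` words of a file."""
    it = words(path)
    while True:
        chunk = set(islice(it, size))
        if not chunk:  # file exhausted
            return
        yield chunk


def common_chunked(path_a: str, path_b: str, size: int = 1_000_000) -> Iterator[str]:
    """Yield words present in both files, one A chunk at a time."""
    for a_chunk in chunks(path_a, size):  # A1, A2, ...
        found: Set[str] = set()
        for b_chunk in chunks(path_b, size):  # B1, B2, ..., Bn (B re-read for each A chunk)
            found |= a_chunk & b_chunk  # union of the intersections
        yield from sorted(found)  # words in both this A chunk and B
```

A word that occurs in several chunks of A is reported once per chunk, so de-duplicate the results file afterwards if that matters.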
I hope that makes sense. If you haven't played with sets, it might not, but then there's a cool thing to learn about.