Removing duplicate rows from a CSV file using a Python script
Goal
I have downloaded a CSV file of contacts from Hotmail, and it has a lot of duplicates in it. These duplicates are complete copies, and I don't know why my phone created them.
I want to get rid of the duplicates.
Approach
Write a Python script to remove the duplicates.
Technical specification
Windows XP SP 3
Python 2.7
CSV file with 400 contacts
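Before the answers below, here is a minimal sketch of the idea on modern Python 3 (the function name and filenames are illustrative, not from the question): read each row with the csv module, keep a set of rows already seen, and write only first occurrences.

```python
import csv

def dedupe_csv(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each row."""
    seen = set()
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)   # lists aren't hashable; tuples are
            if key in seen:
                continue       # exact duplicate row: skip it
            seen.add(key)
            writer.writerow(row)
```

Parsing with csv.reader rather than comparing raw lines means two rows that differ only in quoting style still compare equal.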
Update: 2016
If you are happy to use the helpful more_itertools external library:

    from more_itertools import unique_everseen

    with open('1.csv', 'r') as f, open('2.csv', 'w') as out_file:
        out_file.writelines(unique_everseen(f))
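For readers without the library: unique_everseen yields elements in their original order, skipping anything already seen. A rough pure-Python equivalent for hashable items (my naming, not the library's) is:

```python
def unique_everseen_equiv(iterable):
    """Yield items in order, skipping any item seen before (hashable items only)."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item
```

Because a file object iterates over its lines, passing an open file to it yields each distinct line once.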
A more efficient version of @icyflame's solution:

    with open('1.csv', 'r') as in_file, open('2.csv', 'w') as out_file:
        seen = set()  # set gives fast O(1) amortized lookup
        for line in in_file:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            out_file.write(line)
To edit the same file in-place, use this:

    import fileinput

    seen = set()  # set gives fast O(1) amortized lookup
    for line in fileinput.FileInput('1.csv', inplace=1):
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        print line,  # standard output is redirected to the file
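The snippet above is Python 2 (note the trailing-comma print). On Python 3 the same in-place idea can be sketched as a function (the function name is mine, the filename handling is generic):

```python
import fileinput

def dedupe_in_place(path):
    """Rewrite the file at path, keeping only the first copy of each line."""
    seen = set()
    # inplace=True redirects print() output back into the file being read
    for line in fileinput.input(path, inplace=True):
        if line not in seen:
            seen.add(line)
            print(line, end='')  # the line already carries its newline
```

As in the line-based answers above, this treats each full line as the duplicate key, so rows must be byte-identical to be removed.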