Removing duplicate rows from a CSV file using a Python script
Goal
I have downloaded a CSV file of contacts from Hotmail, and it has a lot of duplicates in it. These duplicates are complete copies, and I don't know why my phone created them.
I want to get rid of the duplicates.
Approach
Write a Python script to remove the duplicates.
Technical specification
Windows XP SP 3
Python 2.7
CSV file with 400 contacts
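Before the answers below, here is a minimal sketch of the idea on modern Python 3 (the function name and filenames are illustrative, not from the question): read each row with the csv module, keep a set of rows already seen, and write only first occurrences.

```python
import csv

def dedupe_csv(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each row."""
    seen = set()
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)   # lists aren't hashable; tuples are
            if key in seen:
                continue       # exact duplicate row: skip it
            seen.add(key)
            writer.writerow(row)
```

Parsing with csv.reader rather than comparing raw lines means two rows that differ only in quoting style still compare equal.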
Update: 2016
If you are happy to use the helpful more_itertools external library:

    from more_itertools import unique_everseen

    with open('1.csv', 'r') as f, open('2.csv', 'w') as out_file:
        out_file.writelines(unique_everseen(f))
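For readers without the library: unique_everseen yields elements in their original order, skipping anything already seen. A rough pure-Python equivalent for hashable items (my naming, not the library's) is:

```python
def unique_everseen_equiv(iterable):
    """Yield items in order, skipping any item seen before (hashable items only)."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item
```

Because a file object iterates over its lines, passing an open file to it yields each distinct line once.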
A more efficient version of @icyflame's solution:

    with open('1.csv', 'r') as in_file, open('2.csv', 'w') as out_file:
        seen = set()  # set gives fast O(1) amortized lookup
        for line in in_file:
            if line in seen:
                continue  # skip duplicate
            seen.add(line)
            out_file.write(line)
To edit the same file in-place, use this:

    import fileinput

    seen = set()  # set gives fast O(1) amortized lookup
    for line in fileinput.FileInput('1.csv', inplace=1):
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        print line,  # standard output is redirected to the file
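The snippet above is Python 2 (note the trailing-comma print). On Python 3 the same in-place idea can be sketched as a function (the function name is mine, the filename handling is generic):

```python
import fileinput

def dedupe_in_place(path):
    """Rewrite the file at path, keeping only the first copy of each line."""
    seen = set()
    # inplace=True redirects print() output back into the file being read
    for line in fileinput.input(path, inplace=True):
        if line not in seen:
            seen.add(line)
            print(line, end='')  # the line already carries its newline
```

As in the line-based answers above, this treats each full line as the duplicate key, so rows must be byte-identical to be removed.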