statistics - Bash: I have 2 country-level datasets in .csv format & would like to filter them for common elements & plot the result


I took 2 country-level data sets from Wikipedia, pasted them into LibreOffice Calc, and saved them as .csv files, e.g.:

First .csv file:

"algeria", 76
"angola", 100
...
"united arab emirates", 27

Second .csv file:

"algeria", .67
"argentina", .45
...
"zimbabwe", .57

I want to filter the lists for countries that have datapoints in both .csv files (assume no duplicates or alternate spellings), match the 2 datapoints (e.g. 76 and .67 for algeria), and output a rudimentary scatterplot to get a quick visual idea of the relationship.

I tried lots of different ways to parse the files and none of them worked. I kept getting tripped up by not knowing enough about awk, grep, bash pipes, gnuplot, and the like.

I'm sure it would be easier/better done in Python or Perl or somesuch, and I ended up using the "lookup" function in LibreOffice Calc, but having started, I'd like to know how it is done in bash. Ideally, the data gathering would be automated as well: parsing HTML, handling data sets that come in PDF tables, and so on.

Any kind of pointer is appreciated. Thanks.

I made a quick and dirty Perl one-liner that should output what you need, I guess. It took me about 3 to 5 minutes.

$ perl -e 'while(<>){my @dt = split(/,/);chomp $dt[1]; $tmp=`fgrep $dt[0] two.csv`; @rs = split(/,/,$tmp);chomp $rs[1]; print $dt[0],$dt[1],$rs[1],"\n" }' one.csv  

Output:

"algeria" 76 .67
"angola" 100
"united arab emirates" 27

I did not handle errors: if a country does not exist in the second .csv file, it appears with no information from that file, and in the same way, if a country exists only in the second .csv file, it does not appear at all.
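
If you want only the countries that appear in both files, here is a minimal sketch using the standard sort and join utilities instead of Perl. The file names one.csv, two.csv, one.sorted, two.sorted and joined.csv are assumptions, and the sample rows are only the ones quoted in the question. It also assumes the country names are written identically in both files.

```shell
# Sample data in the shape shown in the question (file names are assumptions).
printf '%s\n' '"algeria", 76' '"angola", 100' '"united arab emirates", 27' > one.csv
printf '%s\n' '"algeria", .67' '"argentina", .45' '"zimbabwe", .57' > two.csv

# join expects both inputs sorted on the join field;
# LC_ALL=C makes sort and join agree on the ordering.
export LC_ALL=C
sort -t, -k1,1 one.csv > one.sorted
sort -t, -k1,1 two.csv > two.sorted

# -t, splits fields on commas; by default join prints only keys found in both files.
join -t, one.sorted two.sorted > joined.csv

cat joined.csv
```

With the sample rows, only algeria is in both files, so joined.csv contains the single line `"algeria", 76, .67`. Countries that exist in just one of the files are dropped.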

With this output you should be able to use gnuplot or whatever you want, or open the file in Excel or OpenOffice Calc.

I hope this helps.
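
For the gnuplot step, a rough sketch could look like the following. It assumes a comma-separated file named joined.csv with three columns (country, first value, second value), which is not the exact format the Perl one-liner prints; the names joined.csv and scatter.gp are made up for illustration. The "dumb" terminal draws an ASCII plot in the console, which should be enough for a quick look.

```shell
# Create a one-row sample only if no joined.csv exists yet
# (the row is the algeria example from the question).
[ -f joined.csv ] || printf '%s\n' '"algeria", 76, .67' > joined.csv

# Write a small gnuplot script (scatter.gp is a hypothetical name).
cat > scatter.gp <<'EOF'
set datafile separator ","
set terminal dumb
set xlabel "value from first file"
set ylabel "value from second file"
plot "joined.csv" using 2:3 with points notitle
EOF

# Run it only when gnuplot is installed.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot scatter.gp || echo "gnuplot reported an error" >&2
else
    echo "gnuplot not found; install it and run: gnuplot scatter.gp" >&2
fi
```

With real data, joined.csv would hold one row per country found in both files, and each row becomes one point in the plot.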

