statistics - Bash: I have 2 country-level datasets in .csv format & would like to filter them for common elements & plot the result
I took two country-level datasets from Wikipedia, pasted them into LibreOffice Calc, and saved them as .csv files, e.g.:
First .csv file:
"algeria", 76
"angola", 100
...
"united arab emirates", 27
Second .csv file:
"algeria", .67
"argentina", .45
...
"zimbabwe", .57
I want to filter the lists for countries that have datapoints in both .csv files (assume no duplicates or alternate spellings), match the two datapoints (e.g. 76 and .67 for Algeria), and output a rudimentary scatterplot, just to get a quick visual idea of the relationship.
I tried lots of different ways to parse the files and none of them worked; I kept getting tripped up by not knowing enough awk, grep, bash pipes, gnuplot, and the like.
I'm sure it would be easier/better done in Python or Perl or some such, and I ended up using the "lookup" function in LibreOffice Calc, but having started, I'd like to know how it's done in bash. Ideally, the data gathering would be automated as well: parsing the HTML for these datasets, or datasets in PDF tables, and so on.
Any kind of pointer is appreciated. Thanks.
I made a quick and dirty Perl one-liner that should output what you need, I guess. I spent 3 to 5 minutes on it.
$ perl -e 'while(<>){my @dt = split(/,/);chomp $dt[1]; $tmp=`fgrep $dt[0] two.csv`; @rs = split(/,/,$tmp);chomp $rs[1]; print $dt[0],$dt[1],$rs[1],"\n" }' one.csv
Output:
"algeria" 76 .67
"angola" 100
"united arab emirates" 27
I don't handle errors: if a country does not exist in the second .csv file, it still appears, just without the second value, and likewise a country that exists only in the second .csv file is not handled either.
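If you only want the countries that appear in both files, a minimal awk sketch along these lines might do it. It assumes one record per line in the form `"country", value`, no commas inside country names, and uses the sample rows from the question; `merged.csv` is just a name picked for illustration. Because it compares the whole first field, it should also avoid the partial matches that fgrep can produce when one name is contained in another.

```sh
#!/bin/sh
# Sample data in the format shown in the question (assumed one record per line).
cat > one.csv <<'EOF'
"algeria", 76
"angola", 100
"united arab emirates", 27
EOF
cat > two.csv <<'EOF'
"algeria", .67
"argentina", .45
"zimbabwe", .57
EOF

# Field separator: a comma followed by optional spaces.
# First file read (NR == FNR): remember each value from two.csv, keyed by country.
# Second file read: print only the countries from one.csv also seen in two.csv.
awk -F', *' '
    NR == FNR { second[$1] = $2; next }
    $1 in second { print $1 "," $2 "," second[$1] }
' two.csv one.csv > merged.csv   # merged.csv is a hypothetical output name

cat merged.csv
```

With the sample rows only Algeria is in both files, so `merged.csv` ends up with a single line holding the country and its two values.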
With this output you should be able to use gnuplot or whatever you want, or open the file in Excel or OpenOffice Calc.
I hope this helps.
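For the gnuplot part, something like the following might be enough for a quick look. It is only a sketch: `merged.txt` and `plot.gp` are names made up here, the rows are the sample output shown above, and the `dumb` terminal draws a rough ASCII plot straight in the shell. Rows without a third value should simply be skipped by gnuplot.

```sh
#!/bin/sh
# merged.txt stands in for the saved one-liner output (hypothetical file name);
# the rows are the sample output shown above.
cat > merged.txt <<'EOF'
"algeria" 76 .67
"angola" 100
"united arab emirates" 27
EOF

# plot.gp is a hypothetical script name. Column 1 is the quoted country name;
# columns 2 and 3 are the two values to compare.
cat > plot.gp <<'EOF'
set terminal dumb
set xlabel "value from one.csv"
set ylabel "value from two.csv"
plot "merged.txt" using 2:3 with points notitle
EOF

# Run gnuplot only if it is installed.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot plot.gp
fi
```

To get an image file instead, the `set terminal` line can be swapped for another terminal type that your gnuplot build supports, together with a `set output` line.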