python - Grouping by many columns in Pandas
I have a dataset that looks as follows:
col1  col2  col3  count
a     b     1     50
a     b     1     50
a     c     20    1
a     d     17    2
a     e     5     70
a     e     15    20
Suppose it is called data. Then

    data.groupby(by=['col1', 'col2', 'col3'], as_index=False, sort=False).sum()

should give me this:
col1  col2  col3  count
a     b     1     100
a     c     20    1
a     d     17    2
a     e     5     70
a     e     15    20
However, it returns an empty dataset, which has the columns I want but no rows. The only caveat is that the by parameter is calculated dynamically instead of being fixed (that's because the columns might change, although count will always be there).

Any ideas on why this is failing, and how to fix it?
Edit: further searching revealed that pandas' groupby removes rows that have a null in any of the grouping columns. This is a problem for me because every single column might be null. Hence, my actual question is: is there a reasonable way to deal with nulls and still use groupby?
I would love to be corrected here, but I'm not sure there is a clean way to handle the missing data. As you noted, pandas will exclude from the groupby any rows whose grouping columns contain NaN values.
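To make that behaviour concrete, here is a minimal sketch on a made-up frame (the values are hypothetical, not the asker's data). One row has a NaN in a grouping column, and it drops out of the result with no warning. If every row had a NaN in at least one key, the result would be empty, which matches what the question describes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row has a missing value in col2.
data = pd.DataFrame({
    "col1": ["a", "a", "a"],
    "col2": ["b", np.nan, "b"],
    "col3": [1, 1, 1],
    "count": [50, 7, 50],
})

keys = ["col1", "col2", "col3"]
result = data.groupby(by=keys, as_index=False, sort=False).sum()

# The row with the NaN key does not land in any group,
# so its count of 7 is left out of the totals.
print(result)
```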
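You could fill the NaN values with a value beyond the range of your data:

    import pandas as pd

    data = pd.read_csv("c:/users/simon/desktop/data.csv")
    # Replace every NaN with a sentinel so groupby keeps those rows
    data.fillna(-999, inplace=True)
    new = data.groupby(by=['col1', 'col2', 'col3'], as_index=False, sort=False).sum()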
This is messy, because the filled rows won't be added to the correct group in the summation. But there's no real way to group by something that's missing.
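One caveat to that last point: as far as I know, pandas 1.1 and later accept a dropna argument on groupby, and passing dropna=False keeps NaN as a group key of its own instead of discarding those rows. A minimal sketch, assuming a recent pandas and using made-up data in place of the CSV:

```python
import numpy as np
import pandas as pd

# Made-up data; the last two rows share a missing col2.
data = pd.DataFrame({
    "col1": ["a", "a", "a", "a"],
    "col2": ["b", "b", np.nan, np.nan],
    "col3": [1, 1, 20, 20],
    "count": [50, 50, 1, 2],
})

keys = ["col1", "col2", "col3"]
# dropna=False keeps rows whose keys contain NaN and groups them together.
kept = data.groupby(by=keys, as_index=False, sort=False, dropna=False).sum()
print(kept)
```

This avoids inventing a sentinel, but rows only end up together if their keys are missing in the same columns.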
Another method might be to fill each column separately with a missing value that is appropriate for that variable.
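A rough sketch of that per-column idea: fillna accepts a dict mapping column names to fill values, so each column can get its own placeholder. The placeholders and data below are hypothetical. Choose values that cannot collide with real data in each column.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in a text column and in a numeric column.
data = pd.DataFrame({
    "col1": ["a", "a", "a", "a"],
    "col2": ["b", np.nan, np.nan, "b"],
    "col3": [1.0, 5.0, 5.0, np.nan],
    "count": [50, 70, 20, 2],
})

# Hypothetical placeholders, one per grouping column.
fill_values = {"col1": "missing", "col2": "missing", "col3": -999}

# Returns a filled copy; count is not in the dict, so it is left untouched.
filled = data.fillna(value=fill_values)
new = filled.groupby(by=["col1", "col2", "col3"], as_index=False, sort=False).sum()
print(new)
```

Since the by list is built dynamically in the question, the dict could be built the same way, for example from each column's dtype.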