python - Grouping by many columns in Pandas

- January 15, 2015

I have a dataset that looks as follows:

    col1  col2  col3  count
    b     1      50
    b     1      50
    c     20     1
    d     17     2
    e     5      70
    e     15     20

Suppose it is called data. Then data.groupby(by=['col1', 'col2', 'col3'], as_index=False, sort=False).sum() should give me this:

    col1  col2  col3  count
    b     1      100
    c     20     1
    d     17     2
    e     5      70
    e     15     20

However, it returns an empty dataset: it has the columns I want but no rows. The only caveat is that the by parameter is calculated dynamically instead of being fixed (that's because the columns might change, although count is always there).
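For illustration, the by list could be built dynamically along these lines. This is only a sketch: the rule used here (group by every column except count) and the sample frame are assumptions, not the actual code.

```python
import pandas as pd

# Hypothetical frame standing in for `data`; the real columns may differ.
data = pd.DataFrame({
    "col1": ["b", "b", "c"],
    "col2": [1, 1, 20],
    "col3": [4, 4, 5],
    "count": [50, 50, 1],
})

# Assumed rule: every column except `count` is a grouping key.
keys = [col for col in data.columns if col != "count"]

result = data.groupby(by=keys, as_index=False, sort=False).sum()
print(result)
```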

Any ideas on why this is failing, and how to fix it?

Edit: Further searching revealed that pandas' groupby removes rows that have a null in any of the grouping columns. This is a problem for me because every single column might be null. Hence, the actual question is: is there a reasonable way to deal with nulls and still use groupby?
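The behaviour can be reproduced with a small frame. The values below are made up for the demonstration; only the groupby call mirrors the one above.

```python
import numpy as np
import pandas as pd

# Made-up data: the last row has a missing value in one grouping column.
data = pd.DataFrame({
    "col1": ["b", "b", "c"],
    "col2": [1, 1, 20],
    "col3": [7.0, 7.0, np.nan],
    "count": [50, 50, 1],
})

result = data.groupby(by=["col1", "col2", "col3"], as_index=False, sort=False).sum()

# With default settings the row whose key contains NaN is left out,
# so only the "b" group survives.
print(result)
```

If every row has a null in at least one grouping column, every row is left out in the same way, which would explain the empty result.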

I would love to be corrected here, but I'm not sure there is a clean way to handle missing data. As you noted, pandas will exclude rows from a groupby when they contain NaN values in the grouping columns.

You could fill the NaN values with a value beyond the range of your data:

    import pandas as pd

    data = pd.read_csv("c:/users/simon/desktop/data.csv")
    # Replace every NaN with a sentinel that does not occur in the real data
    data.fillna(-999, inplace=True)
    new = data.groupby(by=['col1', 'col2', 'col3'], as_index=False, sort=False).sum()

This is messy, because it won't add those values to the correct group for the summation. As far as I know, there is no real way to group by something that is missing.
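Newer pandas releases do offer a more direct option: from version 1.1 onward, groupby accepts a dropna parameter, and dropna=False treats NaN as a group key of its own. A minimal sketch, assuming pandas 1.1 or later and made-up sample values:

```python
import numpy as np
import pandas as pd

# Made-up data: col3 is missing for the two "b" rows.
data = pd.DataFrame({
    "col1": ["b", "b", "c"],
    "col2": [1, 1, 20],
    "col3": [np.nan, np.nan, 3.0],
    "count": [50, 50, 1],
})

# dropna=False (pandas >= 1.1) keeps rows whose keys contain NaN.
result = data.groupby(
    by=["col1", "col2", "col3"], as_index=False, sort=False, dropna=False
).sum()
print(result)
```

Here the two "b" rows end up in one group even though col3 is NaN for both, and no sentinel value has to be invented.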

Another method might be to fill each column separately with a missing value that is appropriate for that variable.
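A rough sketch of that idea, using a dict with fillna so each grouping column gets its own placeholder. The placeholder values and the sample frame are hypothetical; pick values that cannot occur in your real data.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in different grouping columns.
data = pd.DataFrame({
    "col1": ["b", "b", None],
    "col2": [1.0, 1.0, 20.0],
    "col3": [np.nan, np.nan, 3.0],
    "count": [50, 50, 1],
})

# Hypothetical placeholders, one per grouping column; `count` is not filled.
fill_values = {"col1": "missing", "col2": -1, "col3": -1}
filled = data.fillna(value=fill_values)

result = filled.groupby(by=["col1", "col2", "col3"], as_index=False, sort=False).sum()
print(result)
```

Because count is not in the dict, any gaps in it stay as NaN rather than being summed as a placeholder value.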
