Data modeling in Cassandra for CDR data


I am trying to design a data model in Cassandra for CDR (call detail record) data. How should I store it: keep adding call details to the same row for the same mobile number, or have a set of columns added dynamically for each call for the same mobile number? Can it support queries like: which mobile number was called the maximum number of times between two given dates, or within a given time window (e.g., between 9am and 7pm)?

Your suggestions are highly appreciated. Thanks in advance.

When it comes to designing Cassandra data models, the first thing you need is a list of the queries that need to be satisfied. It is also important to consider the amount of incoming CDR data (so you can shard the data appropriately) and how often each query will be run (so that high-frequency queries are matched against fast read performance).

Due to the non-relational nature of Cassandra, and the limited querying capabilities of CQL (compared to a traditional relational database), the database design is largely determined by the queries you need to run. Based on your examples, you will need multiple column families to satisfy these kinds of queries.

As a starting point, in terms of storing the raw CDRs, you could have a single 'wide row' column family where the row key is the mobile number and the column name is the timestamp of when the call was made. Then, as each CDR comes in, you add a new column to the row matching the mobile number.

cdr_column_family
    mobile_number <- row key
        timestamp:null <- column name:column value

What you need to watch out for here is how wide the rows might become. If you are dealing with the odd call every day, this might suffice, but if there are hundreds of calls or more every day, you might want to shard the data so as not to degrade performance. So, the row key could become a mobile number/month composite (e.g. '07870 831137:201304'), and you would have one row per mobile number per month.
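A minimal Python sketch of how such a composite row key could be built, and which row keys a date range would touch. The function names `cdr_row_key` and `row_keys_between` are made up for illustration and are not part of any Cassandra client API; the separator and the YYYYMM format simply follow the example key above.

```python
from datetime import datetime


def cdr_row_key(mobile_number: str, call_time: datetime) -> str:
    """Build a 'mobile number:YYYYMM' composite row key for one CDR."""
    return f"{mobile_number}:{call_time:%Y%m}"


def row_keys_between(mobile_number: str, start: datetime, end: datetime) -> list:
    """List every monthly row key that a query from start to end must read."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{mobile_number}:{year:04d}{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return keys


print(cdr_row_key("07870 831137", datetime(2013, 4, 17, 9, 30)))
# prints 07870 831137:201304
```

The trade-off is that a query spanning several months now has to read one row per month, which is why the second helper exists.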

This CF would satisfy queries like "how many calls were made to 07870 831137 between 9am and 7pm", but it won't tell you "which number was called most between 9am and 7pm" without querying every single row in the CF (which, in a distributed database, isn't going to be particularly efficient).
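To illustrate why the per-number query is cheap, here is a rough in-memory sketch. It assumes the columns of one wide row are kept sorted by timestamp, so counting the calls in a time window is just a range lookup. The sorted Python list stands in for a single row; `count_calls` and the sample timestamps are hypothetical, not real Cassandra calls or real data.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def count_calls(column_names, start, end):
    """Count timestamps in the inclusive range [start, end].

    column_names must be sorted ascending, mirroring a wide row whose
    columns are ordered by column name.
    """
    return bisect_right(column_names, end) - bisect_left(column_names, start)


# Hypothetical row for key '07870 831137:201304' (made-up timestamps).
row = sorted([
    datetime(2013, 4, 17, 8, 59),
    datetime(2013, 4, 17, 9, 0),
    datetime(2013, 4, 17, 12, 30),
    datetime(2013, 4, 17, 19, 0),
    datetime(2013, 4, 17, 19, 1),
])

print(count_calls(row, datetime(2013, 4, 17, 9, 0), datetime(2013, 4, 17, 19, 0)))
# prints 3
```

Note that this covers 9am to 7pm on one day; a "9am to 7pm on every day of the month" query would need one such range lookup per day.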

For the query "which number was called most between 9am and 7pm", consider a second CF that lists the calls made in chronological order.

callindex_column_family
    month <- row key
        timestamp:mobile_number <- column name:column value

So every time you write a CDR to the first CF, you also add a new column to the callindex CF, listing the time of the call and the number dialed. You can then query the callindex CF for the columns between two dates/times and parse the results to find the number called most.
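The "parse the results" step could look something like the sketch below. It assumes the columns fetched from the callindex CF arrive as (timestamp, mobile_number) pairs; `most_called` and the sample data are hypothetical and only stand in for whatever your client library returns.

```python
from collections import Counter
from datetime import datetime


def most_called(index_columns, start, end):
    """Return (mobile_number, call_count) for the number dialed most often
    in the inclusive range [start, end], or None if there were no calls."""
    counts = Counter(
        number for timestamp, number in index_columns
        if start <= timestamp <= end
    )
    top = counts.most_common(1)
    return top[0] if top else None


# Hypothetical contents of the callindex row for month '201304'.
callindex_row = [
    (datetime(2013, 4, 17, 8, 15), "07870 000001"),
    (datetime(2013, 4, 17, 9, 5), "07870 831137"),
    (datetime(2013, 4, 17, 11, 40), "07870 000002"),
    (datetime(2013, 4, 17, 15, 20), "07870 831137"),
    (datetime(2013, 4, 17, 20, 0), "07870 000002"),
]

print(most_called(callindex_row, datetime(2013, 4, 17, 9, 0), datetime(2013, 4, 17, 19, 0)))
# prints ('07870 831137', 2)
```

The counting happens on the client side here, so a very busy month means pulling a lot of columns over the wire; how acceptable that is depends on how often you run this query.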

