Python: Get N max values from a dictionary
Assume I have a dictionary:
items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}
I want a dictionary containing the 4 elements with the maximum values. E.g., I expect to get:
subitems = {'e': 24, 'g': 24, 'b': 12, 'f': 10}
What is the most Pythonic and efficient way to do this (in memory consumption and execution speed, e.g. when the dict has 1,000,000 elements)? Generators, lambdas, something else?
heapq.nlargest

This is the correct answer when the question is "how do I get a small number of maximum values from a huge set of inputs?" It minimizes memory usage and CPU usage better than anything else in Python, by using heaps. Example:

import heapq
from operator import itemgetter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

topitems = heapq.nlargest(4, items.items(), key=itemgetter(1))  # use .iteritems() on Py2
topitemsasdict = dict(topitems)
sorted

sorted plus slicing the result can win when the number of max items requested is a large percentage of the input, but for huge inputs and small numbers of max items, the memory savings of heapq.nlargest win out.
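The sorted-based alternative can be sketched as follows; this is a minimal sketch, with variable names chosen to mirror the nlargest example above:

```python
from operator import itemgetter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

# Sort all items by value, descending, then keep only the first 4.
topitems = sorted(items.items(), key=itemgetter(1), reverse=True)[:4]
topitemsasdict = dict(topitems)
```

Note that this sorts the entire input before discarding everything past the slice, which is where the O(n) storage and O(n log n) computation discussed below come from.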
For the CS theory geeks: heapq.nlargest, given an input of size n and selecting the k max values, requires O(n log k) computation and O(k) storage. sorted followed by slicing requires O(n log n) computation and O(n) storage. For 1024 inputs and 4 selected items, the work for nlargest is roughly 1024 * 2 computations with storage required for 4 items; sorted + slicing is roughly 1024 * 10 computations with storage for 1024 items. In practice, Python's TimSort, used in sorted, has lower overhead than big-O notation can convey, and usually performs better than big-O notation would indicate, which is why, for, say, selecting the top 200 items out of 1024, sorted + slicing can still win. But nlargest lacks pathological degradation for huge inputs and outputs; it may be slower on occasion, but it's never much slower, while sorted can be faster, but can also be much slower.
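To check this tradeoff on a concrete workload, a quick timeit comparison can help. This is a hypothetical benchmark I am adding for illustration (the input size, k, and function names top_heap/top_sort are my own, not from the original answer):

```python
import heapq
import random
import timeit
from operator import itemgetter

# Hypothetical workload: 100,000 random entries, select the top 4.
items = {i: random.random() for i in range(100_000)}

def top_heap(d, k):
    # O(n log k) computation, O(k) storage.
    return dict(heapq.nlargest(k, d.items(), key=itemgetter(1)))

def top_sort(d, k):
    # O(n log n) computation, O(n) storage.
    return dict(sorted(d.items(), key=itemgetter(1), reverse=True)[:k])

# Both approaches should agree on the result.
assert top_heap(items, 4) == top_sort(items, 4)

print('nlargest:', timeit.timeit(lambda: top_heap(items, 4), number=10))
print('sorted:  ', timeit.timeit(lambda: top_sort(items, 4), number=10))
```

Varying the dict size and k in this sketch shows the crossover described above: small k on huge inputs favors nlargest, while k close to n favors sorted + slicing.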