Python: Get N max values from a dictionary
This question already has answers here:
- 5 maximum values in a Python dictionary (4 answers)
- Top values from a dictionary (4 answers)
Assume I have a dictionary:

    items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

I want a dictionary containing the 4 elements with the maximum values. E.g. I expect to get:

    subitems = {'e': 24, 'g': 24, 'b': 12, 'f': 10}

What is the most Pythonic and efficient way to do this (in memory consumption and execution speed, e.g. when the dict has 1000000 elements)? Generators, lambdas, something else?
heapq.nlargest is the correct answer whenever the question is "how do I get a small number of maximum values from a huge set of inputs?" It minimizes memory usage and CPU usage better than anything else you could do in Python, by using heaps. Example:

    import heapq
    from operator import itemgetter

    items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

    topitems = heapq.nlargest(4, items.items(), key=itemgetter(1))  # use .iteritems() on Py2
    topitemsasdict = dict(topitems)

sorted plus slicing the result can win when the number of max items requested is a large percentage of the input, but for huge inputs and small numbers of max items, the memory savings of heapq.nlargest win.
For the CS theory geeks: heapq.nlargest, for an input of size n, selecting the k max values, requires O(n log k) computation and O(k) storage. sorted followed by slicing requires O(n log n) computation and O(n) storage. For 1024 inputs and 4 selected items, the work for nlargest is ~1024 * 2 computation steps with storage required for 4 items; for sorted + slicing it would be ~1024 * 10 computation steps with storage for 1024 items. In practice, Python's TimSort, as used in sorted, has lower overhead than big-O notation can convey and performs better than big-O notation would indicate, which is why, for, say, selecting the top 200 items out of 1024, sorted + slicing can still win. But nlargest lacks pathological degradation for huge inputs and outputs; it may be slower on occasion, but it's never much slower, whereas sorted can be faster, but it can also be much slower.
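For comparison, the sorted-plus-slicing alternative discussed above can be sketched like this (a minimal sketch using the same sample dict as the question; variable names are just for illustration):

```python
from operator import itemgetter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

# Sort ALL items by value, descending, then keep the first 4.
# This costs O(n log n) time and O(n) extra storage for the sorted list,
# unlike heapq.nlargest, which only ever tracks the current top k.
topitems = sorted(items.items(), key=itemgetter(1), reverse=True)[:4]
subitems = dict(topitems)
print(subitems)  # {'e': 24, 'g': 24, 'b': 12, 'f': 10}
```

Note that sorted is stable, so ties (here 'e' and 'g', both 24) keep their original relative order, just as with nlargest.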