groupby and countby for Python

Python has the standard functions for applying functions over iterables, viz. map, filter, and reduce.
For example, we can use filter to filter some numbers by some criterion:
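# Compare numbers with ==; `is` tests object identity, not equality.
even = lambda x: x % 2 == 0
odd = lambda x: not even(x)
data = [1, 2, 3, 4]
# In Python 3, filter returns a lazy iterator, so wrap it in list() to compare.
assert list(filter(even, data)) == [2, 4]
assert list(filter(odd, data)) == [1, 3]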
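map and reduce follow the same pattern as filter. The following is a minimal sketch, assuming Python 3, where reduce has to be imported from functools and map, like filter, returns a lazy iterator. The helper names square and add are illustrative only.

```python
from functools import reduce  # reduce is not a built-in in Python 3

data = [1, 2, 3, 4]

# Illustrative helpers, not part of the original example.
square = lambda x: x * x
add = lambda a, b: a + b

squares = list(map(square, data))  # map is lazy; list() materializes it
total = reduce(add, data, 0)       # fold the values into one, starting from 0

assert squares == [1, 4, 9, 16]
assert total == 10
```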
These built-in functions are supplemented by the collection functions in itertools and itertoolz.
What follows is just a quick demonstration of how you might implement and use two iteration methods commonly used for data summarization: groupby and countby.
Group a collection by a key function.
def groupby(f, seq):
    result = {}
    for value in seq:
        key = f(value)
        if key in result:
            result[key].append(value)
        else:
            result[key] = [value]
    return result
Alternatively, leveraging defaultdict…
from collections import defaultdict
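def groupby(f, seq):
    # Missing keys start out as an empty list, so no membership check is needed.
    d = defaultdict(list)
    for i in seq:
        d[f(i)].append(i)
    return dict(d)  # convert back to a plain dict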
data = [1, 2, 3, 4]
assert groupby(even, data) == { False: [1, 3], True: [2, 4] }
assert groupby(odd, data) == { True: [1, 3], False: [2, 4] }
names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
expected = {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}
assert groupby(len, names) == expected
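Note that the standard library's itertools.groupby behaves differently from the dictionary-returning groupby above: it only merges consecutive items that share a key, so the input generally has to be sorted by that key first. The following is a minimal sketch of the contrast, assuming Python 3. The helper name groupby_sorted is hypothetical.

```python
from itertools import groupby as it_groupby

def groupby_sorted(f, seq):
    """Group seq by f using itertools.groupby (hypothetical helper).

    Sorting first is needed because itertools.groupby only merges
    adjacent items that share a key.
    """
    ordered = sorted(seq, key=f)
    return {key: list(group) for key, group in it_groupby(ordered, key=f)}

names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
by_length = groupby_sorted(len, names)
assert by_length == {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}

# Without sorting, equal keys that are not adjacent form separate runs.
runs = [(key, list(group)) for key, group in it_groupby(names, key=len)]
assert len(runs) == 5
```

The sorted variant requires keys that can be ordered and costs a sort, whereas the dictionary-based version above needs only hashable keys and a single pass.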
Count elements of a collection by a key function.
def countby(f, seq):
    result = {}
    for value in seq:
        key = f(value)
        if key in result:
            result[key] += 1
        else:
            result[key] = 1
    return result
Alternatively, leveraging defaultdict…
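def countby(f, seq):
    # Missing keys start out at 0, so each key can be incremented directly.
    d = defaultdict(int)
    for i in seq:
        d[f(i)] += 1
    return dict(d)  # convert back to a plain dict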
assert countby(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby(even, [1, 2, 3]) == {True: 1, False: 2}
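Two other ways to arrive at the same counts may be worth sketching: collections.Counter from the standard library, and taking the size of each group produced by a groupby-style pass. This is a minimal sketch, assuming Python 3. The helper names countby_counter and countby_from_groups are hypothetical.

```python
from collections import Counter

def countby_counter(f, seq):
    """Count items of seq by f(item) using Counter (hypothetical helper)."""
    return dict(Counter(map(f, seq)))

def countby_from_groups(f, seq):
    """Same counts, derived by grouping first and taking group sizes."""
    groups = {}
    for value in seq:
        groups.setdefault(f(value), []).append(value)
    return {key: len(values) for key, values in groups.items()}

words = ['cat', 'mouse', 'dog']
assert countby_counter(len, words) == {3: 2, 5: 1}
assert countby_from_groups(len, words) == {3: 2, 5: 1}
```

The Counter version avoids storing the grouped values, so it is likely the better fit when only the counts are needed.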