block by joyrexus 9814456

groupby and countby for python

Python provides the standard functions for applying functions over iterables, viz. map, filter, and reduce (in Python 3, reduce lives in functools).

For example, we can use filter to select numbers by some criterion:

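even = lambda x: x % 2 == 0   # compare by value; `is` tests object identity
odd  = lambda x: not even(x)
data = [1, 2, 3, 4]

# filter returns a lazy iterator in Python 3, so materialize it with list()
assert list(filter(even, data)) == [2, 4]
assert list(filter(odd, data)) == [1, 3]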
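
The other two functions mentioned above, map and reduce, follow the same pattern. The snippet below is a minimal sketch assuming Python 3, where reduce has to be imported from functools; the names squares and total are made up for illustration.

```python
from functools import reduce  # reduce is not a builtin in Python 3

data = [1, 2, 3, 4]

# map applies a function to every element; list() materializes the iterator
squares = list(map(lambda x: x * x, data))

# reduce folds the sequence into a single value, starting from 0
total = reduce(lambda acc, x: acc + x, data, 0)

assert squares == [1, 4, 9, 16]
assert total == 10
```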

These built-in functions are supplemented by the collection functions in itertools and itertoolz (now part of the toolz library).
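
One caveat worth illustrating: itertools.groupby does not behave like the dictionary-building groupby implemented in this note. It only groups runs of consecutive items that share a key, so the input usually has to be sorted by that key first. The sketch below imports it under the alias igroupby (an arbitrary name chosen here to avoid a clash) and shows both behaviours.

```python
from itertools import groupby as igroupby  # alias chosen for this example only

def even(x):
    return x % 2 == 0

data = [1, 2, 3, 4]

# On unsorted input, each run of consecutive equal keys becomes its own group.
runs = [(key, list(group)) for key, group in igroupby(data, key=even)]
assert runs == [(False, [1]), (True, [2]), (False, [3]), (True, [4])]

# Sorting by the same key first gives one group per distinct key.
ordered = sorted(data, key=even)  # stable sort: [1, 3, 2, 4]
grouped = {key: list(group) for key, group in igroupby(ordered, key=even)}
assert grouped == {False: [1, 3], True: [2, 4]}
```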

What follows is a quick demonstration of how you might implement and use two functions commonly used for data summarization: groupby and countby.

groupby

Group a collection by a key function.

def groupby(f, seq):
    result = {}
    for value in seq: 
        key = f(value)
        if key in result:
            result[key].append(value) 
        else: 
            result[key] = [value]
    return result

Alternatively, leveraging defaultdict:

from collections import defaultdict

def groupby(f, seq):
    d = defaultdict(list)
    for i in seq:
        d[f(i)].append(i)
    return dict(d)

Either version can be used as follows (even and odd are the predicates defined earlier):

data = [1, 2, 3, 4]
assert groupby(even, data) == {False: [1, 3], True: [2, 4]}
assert groupby(odd, data)  == {True: [1, 3], False: [2, 4]}

names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith']
expected = {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith'], 7: ['Charlie']}
assert groupby(len, names) == expected
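
Since the stated motivation is data summarization, here is a sketch of how groupby might be applied to records rather than plain values. The pets data and the derived names (by_kind, names_by_kind, mean_age) are invented for illustration; groupby is redefined so the snippet runs on its own.

```python
from collections import defaultdict

def groupby(f, seq):
    """Group the items of seq into lists keyed by f(item)."""
    d = defaultdict(list)
    for item in seq:
        d[f(item)].append(item)
    return dict(d)

# Hypothetical records, made up for this example.
pets = [
    {'name': 'Rex', 'kind': 'dog', 'age': 3},
    {'name': 'Tom', 'kind': 'cat', 'age': 5},
    {'name': 'Fido', 'kind': 'dog', 'age': 7},
]

by_kind = groupby(lambda p: p['kind'], pets)

# Summarize each group: the names it contains and its mean age.
names_by_kind = {k: [p['name'] for p in v] for k, v in by_kind.items()}
mean_age = {k: sum(p['age'] for p in v) / len(v) for k, v in by_kind.items()}

assert names_by_kind == {'dog': ['Rex', 'Fido'], 'cat': ['Tom']}
assert mean_age == {'dog': 5.0, 'cat': 5.0}
```

Grouping first and summarizing afterwards keeps the key function and the per-group computation independent of each other.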

countby

Count elements of a collection by a key function.

def countby(f, seq):
    result = {}
    for value in seq: 
        key = f(value)
        if key in result:
            result[key] += 1
        else: 
            result[key] = 1
    return result

Alternatively, leveraging defaultdict:

def countby(f, seq):
    d = defaultdict(int)
    for i in seq:
        d[f(i)] += 1
    return dict(d)

Either version can be used as follows:

assert countby(len, ['cat', 'mouse', 'dog']) == {3: 2, 5: 1}
assert countby(even, [1, 2, 3]) == {True: 1, False: 2}
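
A third option, not used in the original note, is to lean on collections.Counter from the standard library, which tallies hashable items directly. The sketch below is one possible formulation rather than the canonical one; it maps the key function over the sequence and counts the resulting keys.

```python
from collections import Counter

def countby(f, seq):
    """Count the items of seq by the key f(item)."""
    # Counter tallies the keys; dict() returns a plain dictionary
    return dict(Counter(map(f, seq)))

by_length = countby(len, ['cat', 'mouse', 'dog'])
by_parity = countby(lambda x: x % 2 == 0, [1, 2, 3])

assert by_length == {3: 2, 5: 1}
assert by_parity == {True: 1, False: 2}
```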

See also