Aggregate

Suppose you have a list of elements like this:

[ [1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2] ]

It would be nice to aggregate this on an arbitrary choice of columns, such that the result is:

[ [1, 1, [1, 2, 3, 4]], [1, 3, [1, 2]], [1, 2, [1, 2, 3]] ]

That is, all elements whose first two columns are equal are lumped into one line item. The remaining data (the third column) gets aggregated into a list. The order of the resulting line items is not significant.

Code

I've written such a thing in Python. Here is the code:

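    # Aggregation of lists
    #
    # Author: Michal Guerquin
    # January, 2005

    def ag(lol, doconsider, donotconsider):
        # lol is a list of lists (rows); the result has one row per
        # distinct combination of values in the doconsider columns.
        result = {}
        for element in lol:
            contribute(result, element, doconsider, donotconsider)
        # list() so that a plain list is returned on Python 3 as well
        return list(result.values())

    def contribute(result, element, consider, unconsider):
        # The values in the considered columns identify the aggregated row.
        fingerprint = tuple([ element[i] for i in consider ])
        x = result.get( fingerprint, [None]*len(element) )
        # Collect each unconsidered column into a list, starting one if needed.
        for c in unconsider:
            if isinstance(x[c], list):
                x[c].append(element[c])
            else:
                x[c] = [element[c]]
        # Considered columns are copied through unchanged.
        for c in consider:
            x[c] = element[c]
        result[fingerprint] = x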

Usage

With the code above saved as ag.py:

    In [1]: import ag

    In [2]: x = [ [1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2] ]

    In [3]: ag.ag(x, [0,1], [2])
    Out[3]: [[1, 1, [1, 2, 3, 4]], [1, 3, [1, 2]], [1, 2, [1, 2, 3]]]

The doconsider parameter lists the column numbers used to identify distinct rows in the result, while the donotconsider parameter lists the column numbers whose values should be lumped together.
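To see how the choice of columns changes the grouping, here is a minimal self-contained sketch of the same idea. It is not the ag.py code: the name aggregate_rows and its parameters key_cols and value_cols are made up for this illustration, and it assumes Python 3.7 or later, where dictionaries keep insertion order, so rows come out in first-seen order rather than the order shown in the session above.

```python
# Hypothetical re-implementation for illustration only; not the author's ag.py.
def aggregate_rows(rows, key_cols, value_cols):
    """Group rows by the values in key_cols; collect value_cols into lists."""
    groups = {}  # key tuple -> aggregated row (insertion-ordered on Python 3.7+)
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        if key not in groups:
            # New group: copy the key columns, start empty lists for the rest.
            agg = [None] * len(row)
            for i in key_cols:
                agg[i] = row[i]
            for i in value_cols:
                agg[i] = []
            groups[key] = agg
        for i in value_cols:
            groups[key][i].append(row[i])
    return list(groups.values())


x = [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4],
     [1, 2, 1], [1, 2, 2], [1, 2, 3],
     [1, 3, 1], [1, 3, 2]]

# Distinct rows are identified by columns 0 and 1; column 2 is lumped.
by_two = aggregate_rows(x, [0, 1], [2])
# by_two == [[1, 1, [1, 2, 3, 4]], [1, 2, [1, 2, 3]], [1, 3, [1, 2]]]

# Distinct rows are identified by column 0 only; columns 1 and 2 are lumped.
by_one = aggregate_rows(x, [0], [1, 2])
# by_one == [[1, [1, 1, 1, 1, 2, 2, 2, 3, 3], [1, 2, 3, 4, 1, 2, 3, 1, 2]]]
```

With fewer identifying columns there are fewer, longer result rows: every input row here has 1 in column 0, so the second call collapses everything into a single line item.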

Here is a more elaborate example:

    from ag import ag

    x = [ ["John Q. Public", "book",    "Cooking",  "456 pages"],
          ["John Q. Public", "book",    "Painting", "123 pages"],
          ["John Q. Public", "article", "Cleaning", "2 pages"],
          ["Jane B. Brown",  "book",    "Sleeping", "243 pages"],
          ["Jane B. Brown",  "article", "Running",  "5 pages"],
          ["Jane B. Brown",  "article", "Sitting",  "1 page"],
          ["Jane B. Brown",  "article", "Coding",   "2 pages"] ]

    # Group by column 1 (book or article) and lump the other columns.
    for foo in ag(x, [1], [0, 2, 3]):
        print(foo)

The result, re-formatted for readability, is:

    [
      [ ['John Q. Public', 'Jane B. Brown', 'Jane B. Brown', 'Jane B. Brown'],
        'article',
        ['Cleaning', 'Running', 'Sitting', 'Coding'],
        ['2 pages', '5 pages', '1 page', '2 pages'] ],

      [ ['John Q. Public', 'John Q. Public', 'Jane B. Brown'],
        'book',
        ['Cooking', 'Painting', 'Sleeping'],
        ['456 pages', '123 pages', '243 pages'] ]
    ]
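Because every lumped column is appended to once per input row, the lists inside one aggregated row stay aligned by position: the n-th author, the n-th title and the n-th page count all come from the same original row. A small sketch of reading the "article" result back out as records follows; the variable names are made up for the illustration.

```python
# The aggregated row for 'article' from the elaborate example.
row = [['John Q. Public', 'Jane B. Brown', 'Jane B. Brown', 'Jane B. Brown'],
       'article',
       ['Cleaning', 'Running', 'Sitting', 'Coding'],
       ['2 pages', '5 pages', '1 page', '2 pages']]

authors, kind, titles, lengths = row

# zip() walks the three lumped lists in step, pairing entries by position.
records = [(author, kind, title, length)
           for author, title, length in zip(authors, titles, lengths)]

for record in records:
    print(record)
# First line printed: ('John Q. Public', 'article', 'Cleaning', '2 pages')
```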

I've found a really good use for this. Maybe you will too!

This is https://michal.guerquin.com/aggregate.html, updated 2005-01-22 21:58 EST

Contact: michalg at domain where domain is gmail.com