Aggregate

Suppose you have a list of elements like this:

[ [1,1,1],
  [1,1,2],
  [1,1,3],
  [1,1,4],
  [1,2,1],
  [1,2,2],
  [1,2,3],
  [1,3,1],
  [1,3,2] ]

It would be nice to aggregate this on an arbitrary set of columns. Grouping on the first two columns, for instance, should give:

[
  [1, 1, [1, 2, 3, 4]],
  [1, 3, [1, 2]],
  [1, 2, [1, 2, 3]]
]

That is, all elements whose first two columns are equal are lumped into one line item, and the remaining data (the third column) is collected into a list.
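For this fixed case (group on columns 0 and 1, collect column 2), the idea can be sketched in a few lines of Python 3 with `collections.defaultdict`. This only illustrates the grouping and is not the general function on this page. The name `group_first_two` is hypothetical. It yields the same three groups, but in first-seen order rather than the order shown above.

```python
from collections import defaultdict

def group_first_two(rows):
    """Group rows on their first two columns and collect the third.

    Hypothetical helper for illustration only. Groups come out in
    first-seen order because dicts preserve insertion order (Python 3.7+).
    """
    groups = defaultdict(list)
    for a, b, c in rows:
        # The (a, b) tuple is the key that decides which group a row joins.
        groups[(a, b)].append(c)
    return [[a, b, cs] for (a, b), cs in groups.items()]

rows = [[1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2]]
print(group_first_two(rows))
# [[1, 1, [1, 2, 3, 4]], [1, 2, [1, 2, 3]], [1, 3, [1, 2]]]
```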

Code

I've written such a thing in Python. Here is the code:

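# Aggregation of lists
#
# Author: Michal Guerquin
# January, 2005

def ag(lol, doconsider, donotconsider):
    """Group the rows of lol (a list of lists) on the columns in doconsider,
    collecting the values of the columns in donotconsider into lists."""
    result = {}
    for element in lol:
        contribute(result, element, doconsider, donotconsider)
    # list() so that Python 3 returns a list rather than a dict view.
    return list(result.values())

def contribute(result, element, consider, unconsider):
    # The values in the "consider" columns identify this row's group.
    fingerprint = tuple([element[i] for i in consider])
    # Start a fresh row of Nones if this fingerprint has not been seen yet.
    x = result.get(fingerprint, [None] * len(element))

    # Append this row's value to the list kept for each aggregated column.
    for c in unconsider:
        if isinstance(x[c], list):
            x[c].append(element[c])
        else:
            x[c] = [element[c]]

    # Key columns hold the same value for every row in the group.
    for c in consider:
        x[c] = element[c]

    result[fingerprint] = x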
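If a predictable row order matters, one alternative (not the approach used above) is to sort on the key columns first and then use `itertools.groupby`, which only merges adjacent items. Here is a minimal Python 3 sketch. `ag_sorted` is a hypothetical name that mirrors the `ag` signature, and it assumes the key column values can be compared with each other so they can be sorted.

```python
from itertools import groupby

def ag_sorted(lol, consider, unconsider):
    """Sort-then-group variant of ag (hypothetical; not the original code).

    Rows come out sorted by their key columns, which must be mutually
    comparable. Columns listed in neither argument are left as None.
    """
    def key(row):
        return tuple(row[i] for i in consider)

    result = []
    # groupby only merges *adjacent* rows with equal keys, hence the sort.
    # sorted() is stable, so rows keep their original order within a group.
    for _, group in groupby(sorted(lol, key=key), key=key):
        rows = list(group)
        out = [None] * len(rows[0])
        for c in consider:
            out[c] = rows[0][c]            # same value for every row in the group
        for c in unconsider:
            out[c] = [r[c] for r in rows]  # collect values in original order
        result.append(out)
    return result

x = [[1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2]]
print(ag_sorted(x, [0, 1], [2]))
# [[1, 1, [1, 2, 3, 4]], [1, 2, [1, 2, 3]], [1, 3, [1, 2]]]
```

The trade-off is an O(n log n) sort in place of the dictionary's single pass. In return, you get a deterministic order that does not depend on the Python version.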

Usage

You can try it out like this, assuming you saved the above in a file ag.py:
In [1]: import ag

In [2]: x = [ [1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2] ]

In [3]: ag.ag(x, [0,1], [2])
Out[3]: [[1, 1, [1, 2, 3, 4]], [1, 3, [1, 2]], [1, 2, [1, 2, 3]]]

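The doconsider parameter lists the column indices used to identify distinct rows in the result, while the donotconsider parameter lists the column indices whose values should be lumped together into lists. Columns listed in neither parameter are left as None. Because the groups are stored in a dictionary, the order of the rows in the result follows dictionary ordering. That order is arbitrary in older Python versions, such as the one used above, and is first-seen order in Python 3.7 and later.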
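To see how the two parameters interact, here is a condensed Python 3 copy of `ag`, with `contribute` folded in and `values()` wrapped in `list()`. Its behavior is meant to match the original. It is tried on the same `x` with two other column choices, and the second call shows a column listed in neither parameter being left as `None`.

```python
def ag(lol, doconsider, donotconsider):
    # Condensed copy of ag/contribute from this page, adapted for Python 3.
    result = {}
    for element in lol:
        fingerprint = tuple(element[i] for i in doconsider)
        # Fetch the group's row, creating a row of Nones on first sight.
        row = result.setdefault(fingerprint, [None] * len(element))
        for c in donotconsider:
            if isinstance(row[c], list):
                row[c].append(element[c])
            else:
                row[c] = [element[c]]
        for c in doconsider:
            row[c] = element[c]
    return list(result.values())

x = [[1,1,1], [1,1,2], [1,1,3], [1,1,4], [1,2,1], [1,2,2], [1,2,3], [1,3,1], [1,3,2]]

# Group on column 0 only; columns 1 and 2 are both collected.
print(ag(x, [0], [1, 2]))
# [[1, [1, 1, 1, 1, 2, 2, 2, 3, 3], [1, 2, 3, 4, 1, 2, 3, 1, 2]]]

# Column 1 is in neither list, so it stays None.
print(ag(x, [0], [2]))
# [[1, None, [1, 2, 3, 4, 1, 2, 3, 1, 2]]]
```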

Here is a more elaborate example:

x = [ ["John Q. Public", "book",    "Cooking",  "456 pages"],
      ["John Q. Public", "book",    "Painting", "123 pages"],
      ["John Q. Public", "article", "Cleaning", "2 pages"],
      ["Jane B. Brown",  "book",    "Sleeping", "243 pages"],
      ["Jane B. Brown",  "article", "Running",  "5 pages"],
      ["Jane B. Brown",  "article", "Sitting",  "1 page"],
      ["Jane B. Brown",  "article", "Coding",   "2 pages"]
    ]

from ag import ag

for foo in ag(x, [1], [0, 2, 3]):
    print(foo)

The result, re-formatted for readability, is:

[
  [['John Q. Public',               ['Cleaning',  ['2 pages',
    'Jane B. Brown',                 'Running',    '5 pages',
    'Jane B. Brown',                 'Sitting',    '1 page',
    'Jane B. Brown']  , 'article' ,  'Coding']  ,  '2 pages']],
  
  [['John Q. Public',            ['Cooking',     ['456 pages',
    'John Q. Public',             'Painting',     '123 pages',
    'Jane B. Brown']  , 'book' ,  'Sleeping'] ,   '243 pages']]
]
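The same data can also be grouped on more than one column. Below, a condensed Python 3 copy of `ag` (behavior intended to match the original) groups by author and publication type together. Every aggregated column comes back as a list, even for a group with a single row.

```python
def ag(lol, doconsider, donotconsider):
    # Condensed copy of ag/contribute from this page, adapted for Python 3.
    result = {}
    for element in lol:
        fingerprint = tuple(element[i] for i in doconsider)
        row = result.setdefault(fingerprint, [None] * len(element))
        for c in donotconsider:
            if isinstance(row[c], list):
                row[c].append(element[c])
            else:
                row[c] = [element[c]]
        for c in doconsider:
            row[c] = element[c]
    return list(result.values())

x = [["John Q. Public", "book",    "Cooking",  "456 pages"],
     ["John Q. Public", "book",    "Painting", "123 pages"],
     ["John Q. Public", "article", "Cleaning", "2 pages"],
     ["Jane B. Brown",  "book",    "Sleeping", "243 pages"],
     ["Jane B. Brown",  "article", "Running",  "5 pages"],
     ["Jane B. Brown",  "article", "Sitting",  "1 page"],
     ["Jane B. Brown",  "article", "Coding",   "2 pages"]]

# Key on author (column 0) and type (column 1); collect title and length.
for row in ag(x, [0, 1], [2, 3]):
    print(row)
# ['John Q. Public', 'book', ['Cooking', 'Painting'], ['456 pages', '123 pages']]
# ['John Q. Public', 'article', ['Cleaning'], ['2 pages']]
# ['Jane B. Brown', 'book', ['Sleeping'], ['243 pages']]
# ['Jane B. Brown', 'article', ['Running', 'Sitting', 'Coding'], ['5 pages', '1 page', '2 pages']]
```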

I've found a really good use for this. Maybe you will too!


This is http://michal.guerquin.com/aggregate.html, updated 2005-01-22 21:58 EST

Contact: michalg at domain where domain is gmail.com