If you blog it they will come?

Sunday, August 16, 2009

When a band I like comes to town, I'll know

The List is a comprehensive listing of known upcoming shows in the Bay Area, organized by artist and by venue.

There are hundreds of concerts and nearly 1,500 bands in the list, so I threw together a script to scrape it and intersect those bands with the artists in my iTunes library.
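The intersect step boils down to something like the sketch below. It is not the actual script: the scraping and the iTunes parsing are left out, and the function names and sample band lists are made up for illustration.

```python
def normalize(name):
    """Lowercase and trim so 'Cat Power ' and 'cat power' compare equal."""
    return name.strip().lower()


def bands_in_common(listed_bands, library_artists):
    """Return the sorted names that appear in both collections."""
    listed = {normalize(b) for b in listed_bands}
    owned = {normalize(a) for a in library_artists}
    return sorted(listed & owned)


# Hypothetical sample data standing in for the scraped list and the library.
listed_bands = ["Cat Power", "Green Day", "Some Opening Act"]
library_artists = ["green day", "Cat Power ", "Pixies"]

print(bands_in_common(listed_bands, library_artists))
# ['cat power', 'green day']
```

Using sets keeps the comparison fast even with about 1,500 bands, but it only catches names that are identical after normalizing.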

As a result, I now know that these acts are playing in the near future:

bat for lashes
blink 182
butthole surfers
cat power
collective soul
dan deacon
dropkick murphys
elvis costello
fever ray
ghostface killah
girl talk
green day
grizzly bear
in flames
kenny rogers
lil wayne
meat puppets
modest mouse
no age
pearl jam
porcupine tree
sunny day real estate
tenacious d
thievery corporation
tv on the radio
yo la tengo

The next step is to scrape the concert details as well, use fuzzy matching, run it automatically, and set up alerts.

But this only took 15 minutes to write in Python, and it would have taken me way longer to go through the list by hand.


Instead of doing a set intersection, I now use difflib to find "close matches." It's slower but still runs start to finish in about 10 seconds, which is fine, especially considering that the data changes infrequently.
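A minimal sketch of the difflib approach, under some assumptions: the 0.8 cutoff is a guess (difflib's default is 0.6), and the function name and sample data are invented for the example.

```python
import difflib


def close_matches(library_artists, listed_bands, cutoff=0.8):
    """Map each library artist to its single closest listed band, if any."""
    listed = [b.lower() for b in listed_bands]
    matches = {}
    for artist in library_artists:
        # n=1 keeps only the best-scoring candidate at or above the cutoff.
        hits = difflib.get_close_matches(artist.lower(), listed, n=1, cutoff=cutoff)
        if hits:
            matches[artist] = hits[0]
    return matches


# Hypothetical sample data; note the misspelled 'Killa' in the library.
listed_bands = ["yo la tengo", "tv on the radio", "ghostface killah"]
library_artists = ["Yo La Tengo", "Ghostface Killa", "Pixies"]

for artist, band in close_matches(library_artists, listed_bands).items():
    print(artist, "->", band)
```

Each artist is compared against every listed band, which is why this is slower than a set intersection.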

I also unescape the ampersand in the iTunes XML, and strip a leading "The " because:

>>> import difflib
>>> difflib.get_close_matches('foo', ['the foo', 'foods'], n=1)
['foods']

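...an exact match preceded by 'the ' scores lower than a candidate with a shorter suffix, because the similarity ratio drops with every unmatched character and 'the ' adds four of them. So 'pixies' would match something like 'pixiesque' instead of 'the pixies' in the case where I only select the top match (since ideally there's a one-to-one mapping).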

Instead of writing my own fuzzy matching algorithm, for now I'll just chop off 'The ' and live with the results. Although some of the matches aren't useful, it does better at finding bands such as ...and you will know us by the trail of dead and The Ting Tings.
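The cleanup can be sketched roughly as follows, assuming the artist names are pulled straight out of the raw XML text (so "&" still shows up as "&amp;"). The function name is made up for the example.

```python
from xml.sax.saxutils import unescape


def clean_artist(raw):
    """Unescape XML entities, lowercase, and drop a leading 'the '."""
    name = unescape(raw).strip().lower()
    if name.startswith("the "):
        # Only a leading article is removed; 'the' elsewhere is kept.
        name = name[len("the "):]
    return name


print(clean_artist("The Ting Tings"))
print(clean_artist("Belle &amp; Sebastian"))
print(clean_artist("...And You Will Know Us by the Trail of Dead"))
```

Running the same cleanup on both the scraped names and the library names before matching keeps the two sides comparable.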
