If you blog it they will come?

Sunday, August 16, 2009

When a band I like comes to town, I'll know

The List is a comprehensive listing of all the known shows coming to the Bay Area, organized by artist and by venue.

There are hundreds of concerts and nearly 1,500 bands in the list, so I threw together a script to scrape it and intersect those bands with the artists in my iTunes library.
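The original script isn't shown, but the intersection step might look roughly like this. It's a minimal sketch that assumes the iTunes library XML keeps its tracks in a `Tracks` dictionary with an `Artist` key on each track, and that the band names have already been scraped from The List into a list of strings (the scraping depends on The List's page format, which isn't covered here). The function names are hypothetical.

```python
import plistlib

def itunes_artists(plist_bytes):
    """Collect lowercased artist names from an iTunes library plist (XML bytes)."""
    library = plistlib.loads(plist_bytes)  # the XML parser also unescapes entities like &amp;
    tracks = library.get("Tracks", {})  # assumed layout: {track_id: {"Artist": ..., ...}}
    return {
        track["Artist"].strip().lower()
        for track in tracks.values()
        if "Artist" in track
    }

def bands_in_library(listed_bands, library_artists):
    """Exact, case-insensitive set intersection of scraped band names and library artists."""
    return sorted({band.strip().lower() for band in listed_bands} & library_artists)

# Tiny in-memory library standing in for the real XML export (made-up data).
sample = plistlib.dumps({"Tracks": {"1": {"Artist": "Weezer"}, "2": {"Artist": "Beirut"}}})
print(bands_in_library(["weezer", "Green Day"], itunes_artists(sample)))  # ['weezer']
```

An exact intersection like this is fast, but it only finds bands whose names are spelled identically in both places.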

As a result, I now know that these acts are playing in the near future:

atmosphere
bat for lashes
beirut
blink 182
butthole surfers
calexico
cat power
collective soul
dan deacon
deerhoof
deerhunter
dropkick murphys
elvis costello
fever ray
flipper
ghostface killah
girl talk
green day
grizzly bear
in flames
kenny rogers
lil wayne
m.i.a.
mastodon
meat puppets
mirah
modest mouse
mstrkrft
no age
nofx
pearl jam
placebo
porcupine tree
sunny day real estate
tenacious d
thievery corporation
tv on the radio
weezer
yo la tengo


The next steps are to scrape the concert details as well, use fuzzy matching, run the script automatically, and set up alerts.

But this only took 15 minutes to write in Python, and it would have taken me way longer to go through the list by hand.

EDIT:

Instead of doing a set intersection, I now use difflib to find "close matches." It's slower, but it still runs start to finish in about 10 seconds, which is fine, especially since the data changes infrequently.
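Here's a sketch of swapping the set intersection for `difflib.get_close_matches`, keeping only the top candidate for each library artist. The `cutoff=0.6` is simply difflib's default, not necessarily what the original script used; the helper name and sample names are made up for illustration.

```python
import difflib

def close_matches(library_artists, listed_bands, cutoff=0.6):
    """Pair each library artist with its single closest listed band, if one clears the cutoff."""
    pairs = {}
    for artist in library_artists:
        best = difflib.get_close_matches(artist, listed_bands, n=1, cutoff=cutoff)
        if best:
            pairs[artist] = best[0]
    return pairs

library = ["m.i.a", "weezer"]
listed = ["m.i.a.", "weezer", "green day"]
print(set(library) & set(listed))      # {'weezer'}
print(close_matches(library, listed))  # {'m.i.a': 'm.i.a.', 'weezer': 'weezer'}
```

The exact intersection misses the variant spelling with the trailing period, while the fuzzy version pairs it up. The cost is one comparison per (artist, band) pair, which is why it runs slower.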

I also unescape ampersands in the iTunes XML and filter out "The ", because:

>>> import difflib
>>> difflib.get_close_matches('foo', ['the foo', 'foods'], n=1)
['foods']

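...difflib scores a candidate by how much of the two strings' combined length matches, so the four extra characters in 'the foo' cost more than the two in 'foods'. In the case where I only select the top match (since ideally there's a one-to-one mapping), that means 'pixies' can lose 'the pixies' to any name that merely adds a shorter suffix to 'pixies'.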
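The scores behind the 'foo' example can be checked with `difflib.SequenceMatcher.ratio()`, the measure `get_close_matches` ranks candidates by. It is 2*M/T, where M is the number of matched characters and T is the combined length of both strings. Both candidates contain all three letters of 'foo', so the longer string ends up with the lower score. The `similarity` helper is just a hypothetical wrapper.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """difflib's ratio: 2 * matched characters / total characters in both strings."""
    return SequenceMatcher(None, a, b).ratio()

print(similarity("foo", "the foo"))  # 2*3 / (3+7) = 0.6
print(similarity("foo", "foods"))    # 2*3 / (3+5) = 0.75
```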

For now, instead of writing my own fuzzy matching algorithm, I'll just chop off 'The ' and live with the results. Some of the matches aren't useful, but it does a better job of finding bands such as "...And You Will Know Us by the Trail of Dead" and The Ting Tings.
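Here's a minimal sketch of that normalization. It assumes artist names arrive as raw XML text with `&amp;` still escaped, as they would from a string- or regex-based parse rather than a real XML parser. `html.unescape` handles the ampersand here, the helper names are hypothetical, and the cutoff is again difflib's default.

```python
import difflib
import html

def normalize(name):
    """Unescape entities such as &amp;, lowercase, and chop off a leading 'the '."""
    name = html.unescape(name).strip().lower()
    if name.startswith("the "):
        name = name[len("the "):]
    return name

def best_match(artist, listed_bands, cutoff=0.6):
    """Return the listed band whose normalized name is closest to the artist's, or None."""
    by_key = {normalize(band): band for band in listed_bands}  # normalized -> original name
    best = difflib.get_close_matches(normalize(artist), list(by_key), n=1, cutoff=cutoff)
    return by_key[best[0]] if best else None

print(best_match("foo", ["the foo", "foods"]))                         # the foo
print(best_match("Simon &amp; Garfunkel", ["simon & garfunkel"]))      # simon & garfunkel
```

Normalizing both sides lets 'the foo' beat 'foods'. The trade-off is that any two listed names differing only by a leading "The" collapse into one key.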
