Saturday, March 16, 2013

CrossFit Open, Hacking, Oh My

It has seriously been a long time since I've blogged. Ever since leaving academia to start a company with my business partner Pablo Fuentes, I've either not had time to dedicate to my blog or not had topics that I could really share publicly.

Luckily, I had a nice side project idea recently mashing together two of my loves, CrossFit and programming.

For those who are unaware, the CrossFit Open is just starting. This is the start of the competitive CrossFit season, consisting of a five-week worldwide competition. Each week, a new workout is posted online and thousands of people complete that workout and log their scores online. The top athletes after all five weeks move on to the regional rounds.

Unfortunately, due to a knee injury, I am not taking part in the Open this year :-(. Despite not participating, I've been keeping a close eye on the games website and checking out the scores of athletes I care about.

One annoying thing about the site is that it's hard to know when new scores come online and whether your favorite athletes have posted theirs yet.

I started talking about this problem with some friends of mine and had an idea for an app/website that would automatically send you alerts whenever someone you cared about updated their score online. I couldn't get this idea out of my head, so I thought, why not just build it?

Thus, Crossfit Alerts was born.

The site is quite simple and follows a Pinterest- or NOTCOT-type UI. You can sign up and choose to follow certain athletes. When they update their score on the games site, you receive an email alerting you to their new score.

I started the construction by scratching down a design for the website UI and database schema.

Since I don't have direct access to the games site database, I had to build a crawler to re-index all the athletes into my own database. If you look at the HTML for the leaderboard on the games site, the leaderboard is actually an iframe, and each of the dropdown menus changes a parameter in the iframe's source URL.

One of the parameters is numberperpage, which is 50 by default. For the crawler, I cranked this up to 10,000. My script downloads the HTML for the iframe URL, then for each athlete's name, I download the HTML for their profile and create a record storing their name, image, and a few personal details.
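As a minimal sketch of the page-size trick, here is how the iframe URL could be built in Python. The real crawler was not written in Python, and the endpoint below is a hypothetical placeholder since the actual iframe URL isn't given here. Only the numberperpage parameter and its values (50 and 10,000) come from the description above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real leaderboard iframe URL is not given in the post.
LEADERBOARD_IFRAME_URL = "https://games.example.com/leaderboard-iframe"


def leaderboard_url(number_per_page=50, **other_params):
    """Build the iframe source URL, overriding the default page size of 50."""
    params = {"numberperpage": number_per_page, **other_params}
    return f"{LEADERBOARD_IFRAME_URL}?{urlencode(params)}"


# Ask for 10,000 athletes in a single response instead of paging 50 at a time.
crawl_url = leaderboard_url(number_per_page=10000)
print(crawl_url)
```

Any other dropdown values would be passed as extra keyword arguments and end up as additional query parameters.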

I then use the mini-leaderboard on their profile to index their existing scores. This mini-leaderboard is also an iframe, so I have to make another HTTP request to get the source. Kind of slow, but it works.
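Finding the iframe source is the step that forces the extra request. A rough Python sketch of that step, using only the standard library, might look like this. The sample markup and its URL are made up for illustration and are not the games site's actual HTML.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_sources(html):
    """Return the iframe URLs found in an HTML string, in document order."""
    parser = IframeSrcParser()
    parser.feed(html)
    return parser.sources


# Made-up profile markup, for illustration only.
sample_profile = '<div><h1>Athlete</h1><iframe src="/scores?athlete=123"></iframe></div>'
print(iframe_sources(sample_profile))
```

Each URL returned would then be fetched separately to get the mini-leaderboard HTML, which is where the slowness comes from.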

Once I built my index, I started putting together a UI. On the front-end I went with familiar tools: Bootstrap and jQuery. For the back-end I used the CakePHP framework. I would have loved to experiment with some new technologies or, at the very least, re-familiarize myself with Backbone and Underscore, but since this was a time-sensitive project (already into week two of the Open), I went with what I could work fastest in.

I had the website designed, constructed and running locally within a few hours and then had to build the alert piece.

To perform the alerts, I created a second, more specific crawler that crawls everyone on the leaderboard with a current workout score. If an athlete's score has not already been saved in my database, I save it. If the athlete is followed by any user of my website, I also write a record to an email queue table storing the athlete information. This crawler runs continuously.

To process the email queue, I wrote another script, which is triggered by a cron job every 5 minutes. It checks the email queue for any unsent items and emails the athlete information to any users following that athlete.
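A queue processor along these lines could look like the following Python sketch. Again, the table layout and sample rows are hypothetical, and the sender is passed in as a plain function so that the example runs without sending real email.

```python
import sqlite3


def process_queue(db, send):
    """Email each unsent queue item to every follower, then mark it as sent.

    `send` is any callable taking (recipient, message). Returns the number
    of messages handed to it.
    """
    pending = db.execute(
        "SELECT id, athlete_id, workout, score FROM email_queue WHERE sent = 0"
    ).fetchall()
    sent_count = 0
    for item_id, athlete_id, workout, score in pending:
        followers = db.execute(
            "SELECT user_email FROM follows WHERE athlete_id = ?", (athlete_id,)
        ).fetchall()
        for (email,) in followers:
            send(email, f"Athlete {athlete_id} posted {score} on {workout}")
            sent_count += 1
        # Mark the item as sent so the next cron run skips it.
        db.execute("UPDATE email_queue SET sent = 1 WHERE id = ?", (item_id,))
    db.commit()
    return sent_count


# Hypothetical tables and sample rows, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE follows (user_email TEXT, athlete_id INTEGER);
    CREATE TABLE email_queue (id INTEGER PRIMARY KEY, athlete_id INTEGER,
                              workout TEXT, score TEXT, sent INTEGER DEFAULT 0);
    INSERT INTO follows VALUES ('a@example.com', 1), ('b@example.com', 1);
    INSERT INTO email_queue (athlete_id, workout, score) VALUES (1, '13.1', '150');
""")

outbox = []  # stands in for a real mail sender
first_run = process_queue(db, lambda to, body: outbox.append((to, body)))
second_run = process_queue(db, lambda to, body: outbox.append((to, body)))
print(first_run, second_run)
```

With cron, a schedule of `*/5 * * * *` in front of the command would trigger a script like this every 5 minutes. The sent flag means a second run finds nothing left to do.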

The final stage was to register the domain and find a server to host it on. I threw everything onto an AWS node. In the spirit of a minimum viable product, my current deployment method is an rsync one-liner to copy the code from my machine to the AWS node.

All in all, the first version of the site probably took 4-6 hours to put together. There's a lot I'd like to do with it down the road. Maybe next games season.

Please let me know what you think and I hope it's useful to other people besides me.


  1. Hey Sean,
    I'm a 45 YO crossfitter, management consultant, and all-around data geek. For purely recreational purposes, I've been trolling the web looking for the data set you seem to have assembled, with the purpose of doing some analysis on the drivers of performance, e.g. how does work capacity change with age, athletic starting position, nutrition, etc.

    Is there a way you can share the dataset you've accumulated with me? I'd be interested in collaborating on what we can extract from it that would be useful to the "average" crossfitter vs. the plethora of analysis that seems aimed at the elite.

    You can e-mail me at eric_j_wick(at-symbol)

    Hope to hear from you.



  2. It was a cool idea. Have you given up on that? The site is not opening.

    How fast did you crawl their website?