Saturday, March 16, 2013

CrossFit Open, Hacking, Oh My

It has seriously been a long time since I've blogged. Ever since leaving academics to start a company with my business partner Pablo Fuentes, I've either not had time to dedicate to my blog or not had topics that I could really share publicly.

Luckily, I had a nice side project idea recently mashing together two of my loves, CrossFit and programming.

For those who are unaware, the CrossFit Open is just starting. This is the start of the competitive CrossFit season: a five-week worldwide competition. Each week, a new workout is posted online, and thousands of people complete that workout and log their scores online. The top athletes after all five weeks move on to the regional rounds.

Unfortunately, due to a knee injury, I am not taking part in the open this year :-(. Despite not participating, I've been actively keeping my eye on the games website and checking out the scores of athletes I care about.

One annoying thing about the site is that it's hard to know when new scores come online and whether your favorite athletes have posted their scores yet.

I started talking about this problem with some friends of mine and had an idea for an app/website that would automatically send you alerts whenever someone you cared about updated their score online. I couldn't get this idea out of my head, so I thought, why not just build it?

Thus, Crossfit Alerts was born.

The site is quite simple. Following a Pinterest- or NOTCOT-style UI, you can sign up and choose to follow certain athletes. When they update their score on the games site, you receive an email alerting you to their new score.

I started the construction by scratching down a design for the website UI and database schema.

Since I don't have direct access to the games site database, I had to build a crawler to re-index all the athletes into my own database. If you look at the HTML for the leaderboard on the games site, the leaderboard is actually an iframe and each of the dropdown menus change parameters for the source of the iframe.

One of the parameters is numberperpage, which is 50 by default. For the crawler, I cranked this up to 10,000. My script downloads the HTML for the iframe URL, then for each athlete's name, I download the HTML for their profile and create a record storing their name, image, and a few personal details.
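The parsing step can be sketched like this. I've used Python here for brevity rather than the PHP the site actually runs on, and the URL, query parameters, and the `/athlete/` link pattern are assumptions about the games site's markup, not its real structure:

```python
from html.parser import HTMLParser

class AthleteLinkParser(HTMLParser):
    """Collect (name, profile_url) pairs from anchor tags in leaderboard HTML."""
    def __init__(self):
        super().__init__()
        self.athletes = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "/athlete/" in href:  # assumed profile-link pattern
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            if name:
                self.athletes.append((name, self._href))
            self._href = None

# Hypothetical iframe source URL with numberperpage cranked up, as described above.
LEADERBOARD_URL = ("http://games.crossfit.com/scores/leaderboard.php"
                   "?numberperpage=10000&page=0")

# Feed in a small sample row instead of a live download:
parser = AthleteLinkParser()
parser.feed('<tr><td><a href="/athlete/123">Jane Doe</a></td></tr>')
print(parser.athletes)  # [('Jane Doe', '/athlete/123')]
```

In the real crawler, each extracted profile URL then gets its own HTTP request to pull the athlete's name, image, and personal details.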

I then use the mini-leaderboard on their profile to index their existing scores. This mini-leaderboard is also an iframe, so I have to make another http request to get the source. Kind of slow, but it works.

Once I built my index, I started putting together a UI. On the front-end I went with familiar tools: Bootstrap and jQuery. For the back-end I used the CakePHP framework. I would have loved to experiment with some new technologies, or at the very least re-familiarize myself with Backbone and Underscore, but since this was a time-sensitive project (already into week two of the Open), I went with what I could work fastest in.

I had the website designed, constructed and running locally within a few hours and then had to build the alert piece.

To perform the alerts, I created a second, more specific crawler that crawls everyone in the leaderboard with a current workout score. If their score has not already been saved in my database, I save it, and if the athlete is followed by any user of my website, I write a record to an email queue table storing the athlete information. This crawler runs continuously.
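The score-plus-queue logic can be sketched like this, using Python with an in-memory SQLite stand-in for the site's MySQL tables; all table and column names here are invented for illustration, not the real schema:

```python
import sqlite3

# In-memory stand-in for the site's MySQL tables (names are illustrative).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE scores (athlete_id INTEGER, workout TEXT, score TEXT,
                         PRIMARY KEY (athlete_id, workout));
    CREATE TABLE follows (user_email TEXT, athlete_id INTEGER);
    CREATE TABLE email_queue (user_email TEXT, athlete_id INTEGER,
                              workout TEXT, score TEXT, sent INTEGER DEFAULT 0);
""")

def record_score(athlete_id, workout, score):
    """Save a newly crawled score and queue an alert for each follower."""
    cur = db.execute("SELECT 1 FROM scores WHERE athlete_id=? AND workout=?",
                     (athlete_id, workout))
    if cur.fetchone():
        return  # already indexed, nothing to do
    db.execute("INSERT INTO scores VALUES (?, ?, ?)",
               (athlete_id, workout, score))
    for (email,) in db.execute("SELECT user_email FROM follows WHERE athlete_id=?",
                               (athlete_id,)):
        db.execute("INSERT INTO email_queue (user_email, athlete_id, workout, score) "
                   "VALUES (?, ?, ?, ?)", (email, athlete_id, workout, score))
    db.commit()

db.execute("INSERT INTO follows VALUES ('fan@example.com', 42)")
record_score(42, "13.2", "312 reps")
record_score(42, "13.2", "312 reps")  # duplicate crawl: no second alert
print(db.execute("SELECT COUNT(*) FROM email_queue").fetchone()[0])  # 1
```

The duplicate check is what keeps the continuously running crawler from spamming followers every pass.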

To process the email queue, I wrote another script, which is triggered by a cron job every 5 minutes. It checks the email queue for any un-sent items and emails the athlete information to any users following that athlete.
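The trigger itself is just a standard crontab entry; the path and console command below are hypothetical stand-ins for the real CakePHP shell:

```
# Run the email-queue processor every 5 minutes (paths are illustrative)
*/5 * * * * php /var/www/cfalerts/app/Console/cake process_email_queue
```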

The final stage was to register the domain and find a server to host it on. I threw everything onto an AWS node. In the spirit of minimum viable product, my current deployment method is an rsync one-liner to copy the code from my machine to the AWS node.

All in all, the first version of the site probably took 4-6 hours to put together. There's a lot I'd like to do with it down the road. Maybe next games season.

Please let me know what you think and I hope it's useful to other people besides me.

Saturday, January 8, 2011

Crash course on design - Stanford d.School

One of my colleagues from WorkerExpress, Joe Mellin, invited me to join him for a design workshop put on by the Hasso Plattner Institute of Design at Stanford. Joe is a graduate of the d.School's master's program. I'm interested in design, and I even taught principles of user-centered design in a Human Computer Interaction course at the University of Victoria, so I was keen to join him.

I really had no idea what the day would entail; I just knew to show up at 9am and I'd be done sometime around 1pm. Arriving at the workshop, I discovered there were to be over 100 people involved. We were told immediately that we'd be breaking up into three different groups. Once in those groups, we would be led through a full design cycle with a target demographic of potential users (see picture of the design process below). Our goal was simply to design a new holiday experience for kids. The d.School had arranged for approximately 80 students, ranging from around 11 to 17 years of age, to be our volunteer users.

After this brief instruction, we were put into groups of four, where we quickly introduced ourselves, and then had to come up with interview questions to ask the kids that would soon be arriving to talk to us. Each group was also assigned a "d.leader", i.e. a student at the d.School, to help guide us through the process.

One of the really interesting things about the d.School is that it brings together people from very diverse backgrounds to teach them how to become design experts. The workshop mimicked that: each person in my group had a very different background from mine. One was an MD now doing his MBA at Stanford, one was working in the medical school at Stanford, one works at Google, and finally there's me, a reformed academic now working at a start-up.

We only had about 5 minutes to prepare for the interviews; then we interviewed two different pairs of students for about 15 minutes each. We asked them about their holidays: what their likes and dislikes were, whether they missed school during the holidays, what their favorite holiday ever was, whether their family was around during the holidays, and so forth.

Actually talking to users is always interesting and usually surprising. We had two very different groups of users. Our first group was two 12-year-old boys. Their parents were generally quite busy with work over the holidays, so family time was limited; they were basically left to their own devices. In contrast, our second group was two 12-year-old girls who spent their holidays traveling all over the world with their families.

Here's a picture of the notes we took during the interviews. Although difficult to tell, the sticky notes are organized into four different groups: feeling, quote, thought, and action.

Following the interviews, all the groups came back together to be quickly introduced to the next part of the design process: defining our user, their needs, and the insight about the problem. Although the users from our interviews had very different holiday experiences, there were some common themes. For example, all of the students mentioned that they sometimes get bored during the holidays. They sometimes miss school because they get to see their friends at school, but not as regularly over the holiday. They also sometimes didn't seem to know what to do with their time following Christmas celebrations.

Taking these themes into account, we determined that our users were extremely bright junior-high students who needed interaction and socialization in a world where they did not know how to choose what activities to do during the holidays (or something to that effect; I can't remember the exact phrasing).

After our point of view was established, we brainstormed ideas for how to possibly address it (see picture below). We had crazy ideas like group travel for kids and city-wide laser tag, along with more tangible solutions like a community center that organizes activities. We eventually settled on proposing an online application that combined a lot of our ideas. The idea was to create a marketplace for kids to express interest in participating in certain activities. Once enough kids expressed interest in a particular activity, all the legwork for actually making that activity happen would be handled by us, which could include transportation, equipment, some supervision, and so forth. Another way to think about it: summer camp, but coordinated completely online and based on your personal interests.

Using this idea as a basis, we designed a prototype and then interacted with different students to test it. This time we had three high school girls as our users. These girls were pretty incredible; one even stated, "this seems to add a level of complexity that's unnecessary". Only in Palo Alto, California would that statement be in a 17-year-old's vocabulary!

All in all, it was an extremely fun and interactive workshop that I highly recommend if you get the chance. I'd love to have all WorkerExpress engineers participate in this workshop as they join the company. It's a great way to learn the basics of user-centered design.

Check out the video below. As an exercise to get us ready for brainstorming we played rock, paper, and scissors. Everyone starts as an individual competitor. When you lose, you become the winner's biggest fan. The video below is at the end of the day when there were only two left standing.

Monday, January 3, 2011

ACM ICPC Pacific Northwest Regional Competition

Although I was not heavily involved in the ACM ICPC competitions this past Fall, I followed the Pacific Northwest regional quite closely. Several of my former students were competing for the University of Victoria team and of course my new university (Stanford) was also competing.

UVic had a great start to the competition but ran into issues in the second half and ended up finishing 9th. However, given that the university only recently started competing in the ACM ICPC, a 4th place finish in 2009 followed by a 9th place finish in 2010 is a pretty fantastic result.

The contest this year was quite interesting in a variety of ways. First, for the first time in a very long time, the top team was not Stanford or the University of British Columbia. Second, although Stanford Red solved problem D at the 18-minute mark, there were 230 submissions for this problem, and only 3 were deemed correct. To anyone with much experience in these types of contests, this result looks a little fishy.

Rather than recap all the events and outcome, I'll point you to Brad Bart's (SFU coach) recounting and interpretation of the events:

I am personally not very happy with the way this screw-up has been handled by the ACM ICPC. If you have an opinion, post it in the comments, or better yet, let the ACM ICPC know what you think.

Friday, December 3, 2010

Semantic Search - For Reals?

Recently, a project I am involved in at Stanford participated in and won the Semantic Web Challenge at the International Semantic Web Conference. The Semantic Web Challenge is a competition for Semantic Web applications. For the uninformed, the Semantic Web "is a group of methods and technologies to allow machines to understand the meaning - or 'semantics' - of information on the World Wide Web". Essentially, the Semantic Web hopes to move web applications beyond simple syntax to actually supporting the "meaning" behind content.

In the challenge, the focus is not so much on the technical or research aspects of the tools, but more on their usefulness for target users. The challenge helps demonstrate what semantic technologies can bring to society.

Our application, NCBO Resource Index: Ontology - Based Search and Mining of Biomedical Resources, is a semantic search application for biomedical researchers (check out the video below). I designed and developed the user experience and interface.

The application allows researchers in the biomedical sciences to perform "concept-based search" over 22 different biomedical resource databases. Thus, rather than typing in keywords, the researcher types in concepts from their domain of interest; those concepts come from ontologies, the backbone of the Semantic Web. For the sake of simplicity, an ontology is a structured terminology that describes the terms within a specific domain, the properties of those terms, and the relationships between them.

In terms of user interaction and searching, the behavior is quite similar to conventional search engines like Google. The difference is that when you select a term from the auto-completion drop-down, that term comes from an ontology that was used to index elements within the various biomedical resources. I also included helpful tag clouds (see figure below) for visualizing concepts related to your current search query, and I use color intensity to represent more relevant resources based on your current search terms.

The power of the search tool comes from how the index gets constructed. The National Center for Biomedical Ontology maintains a project called BioPortal. This project is an open library of more than 200 ontologies in biomedicine. Using the terms from these ontologies as our term-base, we automatically annotate or "tag" textual descriptions of the data residing within the elements of the 22 biomedical resources. These could be things like patient records, gene expression data, research articles, clinical trials, etc.

These annotations act as a link connecting an ontology term to a data element. The really useful part of this process and also where the semantics begin to play a role, is that we can use the structure of the ontology to expand these annotations. So, rather than getting a simple keyword like "breast cancer" mapping to a particular clinical trial, we also know any synonyms of "breast cancer" described in the ontology. We can also use the hierarchy of the ontology to link more general or specific terms to the resource like "cancer" or "melanoma". Finally, we can use mappings between multiple ontologies to discover other related terms.
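Here's a toy sketch of that expansion step in Python. The ontology fragment is invented for illustration and is far simpler than a real NCBO ontology (which would also have multiple parents, cross-ontology mappings, and so on):

```python
# Toy ontology fragment: synonyms and "is-a" links are illustrative,
# not taken from any real NCBO ontology.
SYNONYMS = {
    "breast cancer": ["breast carcinoma", "mammary cancer"],
}
PARENTS = {              # child -> parent in the is-a hierarchy
    "breast cancer": "cancer",
    "melanoma": "cancer",
}

def expand(term):
    """Expand an annotation term with its synonyms, parent, and children."""
    terms = {term}
    terms.update(SYNONYMS.get(term, []))
    if term in PARENTS:                      # more general term
        terms.add(PARENTS[term])
    for child, parent in PARENTS.items():    # more specific terms
        if parent == term:
            terms.add(child)
    return terms

print(sorted(expand("breast cancer")))
# ['breast cancer', 'breast carcinoma', 'cancer', 'mammary cancer']
```

So an annotation made with "breast cancer" also becomes reachable from a search for its synonyms or for the more general term "cancer".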

This is truly where the power of ontologies can be seen and where a semantic approach to search can be very useful. For example, searching for the keywords "retroperitoneal neoplasm" within the Gene Expression Omnibus website will return zero results. However, the same search in our tool will retrieve relevant results annotated by the child term of "retroperitoneal neoplasm" from the NCI Thesaurus, "pheochromocytoma". Results are scored based on the distance of the matching annotation from a given search concept.
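One simple way to turn hierarchy distance into a score is exponential decay; the decay factor below is purely illustrative, not the Resource Index's actual scoring formula:

```python
def annotation_score(distance, decay=0.8):
    """Score a match by its distance in the ontology hierarchy from the
    query concept. A direct match (distance 0) scores 1.0; each hop away
    (parent, child, mapped term) decays the score. The decay factor is an
    illustrative assumption, not the real system's formula."""
    return decay ** distance

# "retroperitoneal neoplasm" matched via its child "pheochromocytoma" is 1 hop:
print(annotation_score(0), annotation_score(1))  # 1.0 0.8
```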

Another big advantage to our application is that the ontologies form a "semantic bridge" between very different biomedical resources. One search execution automatically gives you access to 22 different databases, allowing a researcher to explore relationships between things like gene expressions and clinical trials relevant to a specific concept. Without this, researchers are forced to open up multiple web pages and search each database independently.

Of course, all these annotations take a long time to index, and the result is a boatload of data. The current index is stored in a 1.5-terabyte MySQL database that contains 16.4 billion annotations, 2.4 million ontology terms, and 3.5 million data elements (stylized graphic below). Other members of the team have worked out clever ways to do a lot of this indexing efficiently. You can read about it in Paea LePendu's paper Optimize First, Buy Later: Analyzing Metrics to Ramp-up Very Large Knowledge Bases.

We are working to include more resources in the index and also speed up the search support. If you're interested in playing with the application, check it out in the BioPortal integration.

Thursday, November 18, 2010

Dinner with Thumbtack and Leaving Stanford

Before I get into this post, I should first explain that I have decided to leave Stanford after January 2011 for the wild and wacky world of industry. I will be joining WorkerExpress as the Chief Technical Officer.

Briefly, WorkerExpress is a start-up in San Francisco co-founded by Joe Mellin and Pablo Fuentes. We offer a service for blue-collar workers to both market themselves and find work. Our customers are contractors who essentially "lease" employees from us. We help the contractors by automatically matching available workers to the specifications of a particular job. We also provide a certain level of quality control on the workers in terms of their skill assessment, reference checks, internal reviews, background checks, and so forth. We use a lot of SMS technology to provide flexible on-demand labor.

Another San Francisco start-up called Thumbtack plays in somewhat the same space as us. While we provide on-demand labor for contractors, essentially a business-to-business service, Thumbtack provides a marketplace for local services (i.e., a business-to-consumer service). Think Craigslist, but less 1995.

Joe and Pablo know some of the brains behind Thumbtack and as a result we were invited to Thumbtack's "world headquarters" for dinner last night. The headquarters is in a three-story house in San Francisco. One floor is dedicated to development, another is a board room, and there's also a dining and kitchen floor. A few of the developers actually live in the house too (can't beat that commute).

I actually really like the idea of using a house for an office building. The advantage is that you generally have a built-in full kitchen along with a shower and even rooms to crash in after a long hacking session. I also think the rent is probably less expensive than an office building.

A potential disadvantage is that you basically may never leave your work. However, in a start-up environment, this is often the case and if I am going to be working crazy hours, I'd prefer to be able to prepare decent meals and catch a nap if necessary.

So, back to the dinner. I had not met any of the Thumbtack crew prior to the dinner. Joe and Pablo convinced me to bring along my wife, Theresa, claiming that this would be a social event and not about business. Of course, if you stick me in a room full of other geeks/programmers, it's pretty hard to keep the conversation in a realm where someone without a computer science degree can actually participate.

Theresa had not suffered that kind of geek assault since she accompanied me to the 2003 ACM ICPC world programming finals. Things got even worse when I realized one of the Thumbtack developers had competed in the ACM contests during his undergrad. Luckily our intern and sales guy from WorkerExpress were there to chat about non-programming related topics.

Despite my concern for Theresa's boredom, I had a great time at the dinner. The team at Thumbtack seems to have the same kind of hunger to build a successful business as we do at WorkerExpress. They understand the kind of work environment that is necessary to work long hours.

I am extremely excited to be starting full-time with WorkerExpress. I plan to hire a couple of software developers early in 2011. So if you're interested or know someone that is, shoot me an e-mail or leave a comment.

Friday, November 5, 2010

Bad design or evil plot to prevent caffeine addiction?

Above is a picture of the coffee maker we have at the Stanford Center for Biomedical Research. Can you figure out how to make a large coffee with light strength?

For the first month or two that I was at Stanford, I did not know how to re-configure the coffee settings on this machine. I took whatever was the last configuration, so sometimes I got an espresso, sometimes a small coffee, or perhaps a large one, all at varying strengths. It wasn't until I finally observed someone who actually knew how to use the machine that I figured out how to change the coffee configuration.

So, the answer to my original question, "Can you figure out how to make a large cup of light coffee?", is that you need to press your finger against the lights at the top of the machine to adjust the coffee size and strength. Not only do you need to press down on the light, you need to do it for about a second. Thus, for a large coffee at light strength, you would first hold your finger against the large cup on the left-hand side; once that light switches, hold your finger against the single coffee bean on the right, indicating light strength. (See picture below; I've circled the "buttons" you need to press.)

Is this poor design or am I just an idiot? Or perhaps I should have just asked someone about how to use the coffee machine? Well, first, I couldn't really ask someone because A) I'm a dude and B) I have a PhD in Computer Science preventing me from ever acknowledging that I don't understand something technical :-). So that option is out.

As for whether I'm just an idiot: my argument is, of course, that I am not :-). As further empirical evidence, I know that I am not the only person who has struggled to use this thing. I was getting coffee recently and a guy who has been in the group for several years asked me how I was able to get two cups at once. Therefore, my argument is: it's a coffee machine, for crying out loud; anyone should be able to use it. So, assuming that, what's wrong with the design?

The big, big, big issue with using what look to be simple indicator lights as buttons to re-configure the coffee is that there is no obvious feedback that the lights are something you can press. Not only that, but you have to press them for a period of time, so just running your finger over them does not change their state. To toss out a Human Computer Interaction term, these lights have no pressing "affordance".

An affordance is "an actionable property between the world and an actor". That simply means that certain objects have physical properties that indicate specific actions. For example, a lever affords pulling or pushing, handles are for holding, wheels are for turning, etc. However, a light isn't really for pressing. On the coffee machine, the light is flat, there is no physical feedback to indicate that a pressing action or a change in the coffee configuration is about to occur when you place your finger on it.

The sad thing is, this is very easy to address. If the manufacturer simply made the lights beveled or slightly concave, like the iPhone's home button, and had them actually depress when you pushed your finger against them, I think people would figure it out.

Alas, that's enough harping on this issue. One of the problems with learning about design is that it's a tough skill to turn off. You'll constantly see poor design and be irritated by it. Welcome to my life of torture :-).

Monday, October 25, 2010

Visualizing your social circle - mashing up GMail and Facebook

What's this? Another post? Yes indeed, two within a week of each other. My blog tends to work this way, my writing comes in waves. This post relates to data mining, visualization, and mashing up GMail with Facebook.

I attended a talk last week in the CS department at Stanford, and the speaker mentioned that they were using the "TO" field in e-mails to perform hierarchical clustering to discover social groups. The intuition is pretty simple: you typically only include multiple people in an e-mail if they make up some kind of social group that you interact with.

I thought I'd try out this idea on my own GMail account. After digging through the GMail API documentation, I downloaded a Java example of how to connect to an account using IMAP and OAuth. I wrote a Java program to go through all my Sent messages since I first created my GMail account back in 2005 and construct clusters based on the TO and CC fields.
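The header-extraction step looks roughly like this, sketched in Python for brevity rather than the Java I actually used:

```python
from email import message_from_string
from email.utils import getaddresses

def recipients(raw_message):
    """Extract the set of addresses from the To and Cc headers of one message."""
    msg = message_from_string(raw_message)
    addrs = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {addr.lower() for _, addr in addrs if addr}

# A sample raw message such as IMAP would return for a Sent item:
raw = ("From: me@gmail.com\r\n"
       "To: Alice <a@example.com>, b@example.com\r\n"
       "Cc: c@example.com\r\n"
       "Subject: hi\r\n\r\nbody")
print(sorted(recipients(raw)))  # ['a@example.com', 'b@example.com', 'c@example.com']
```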

The algorithm is very basic. If an email was sent to more than one person, then I assume those people are somehow associated. So, suppose I sent an email to people A and B: I would append person B to A's list of associations and person A to B's list. I did this for 5 years' worth of sent messages.

The data I was getting looked pretty reasonable. I had clusters consisting of different people I've worked with over the past 5 years, people I socialize with, and students I've taught.

There are a number of potential applications for this. One could be making suggestions for label groups in GMail. Another idea is to use this kind of mining to suggest groupings in something like Facebook. To experiment with that idea, I took the groups I mined from GMail and mashed them up with data publicly available on Facebook.

That is, for each e-mail address in the list I mined, I used Facebook's search API to see if that person exists within the Facebook universe. If they did, I then processed all their associations, looking those people up on Facebook via the search API as well. That information forms an adjacency list, linking people's Facebook profiles based on how I have interacted with them through e-mail.

For example, if I e-mailed people A, B, and C together, and at another time e-mailed C, D, and E, and assuming all those people exist on Facebook, then I end up with a graph description like this:

A: [B, C]
B: [A, C]
C: [A, B, D, E]
D: [C, E]
E: [C, D]
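The association-building step can be sketched in a few lines (again in Python here rather than the original Java), and it reproduces the graph above exactly:

```python
from collections import defaultdict

def build_associations(sent_recipient_sets):
    """Link every pair of co-recipients across all sent messages."""
    assoc = defaultdict(set)
    for recips in sent_recipient_sets:
        if len(recips) < 2:   # single-recipient mail carries no group signal
            continue
        for person in recips:
            assoc[person].update(r for r in recips if r != person)
    return {person: sorted(links) for person, links in assoc.items()}

# The two e-mails from the example: one to A, B, C and one to C, D, E.
graph = build_associations([{"A", "B", "C"}, {"C", "D", "E"}])
print(graph["C"])  # ['A', 'B', 'D', 'E']
```

C ends up as the bridge node because it appears in both recipient sets, which is exactly the structure the visualizations below exploit.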

So, that's kind of fun, but these graph structures would be a lot more interesting if there were a way to visualize them. Well, I just so happen to be working on a jQuery graph visualization plugin that wraps a graphing library called FlexViz.

After a little more hacking, I made a nice little web page for browsing these mashed up social groups. One such group is displayed at the top of this post. I created the group by extracting all people directly and indirectly associated with my wife (Theresa Demont).

If you look at the picture below (same as at the top, but annotated), there are basically two groups here: one on the left and one on the right. The two groups are connected by a bridge between Nick and Colton. This is because the people on the right are all students I coached in programming competitions. Nick helped out with coaching for a while, but he is also friends with Theresa, so he bridges my programming competition world and the social world I share with my wife.

Another interesting example is the graph structure defined by my CrossFit associations. Below, the graph on the left are the various CrossFit people I interacted with through e-mail before I moved to California, while the graph on the right is the CrossFit people I've interacted with since moving to California. It appears that I do not bulk e-mail nearly as many CrossFitters now. The truth is, most of my social interactions have moved to Facebook.

Most of this effort was just for fun, to see what the resulting application would look like. However, it's also an interesting way to browse some of the history you record through e-mail. In my graph, it's easy to see how my work associations have changed as I've moved between companies and universities: I have a UNB group, a UVic group, and a Stanford group. It would be great to add a timeline filter so I could hide people in the graph based on when I actually communicated with them. This could even be animated, so you could visualize the changing landscape of your social circle!