What's this? Another post? Yes indeed, two within a week of each other. My blog tends to work this way: my writing comes in waves. This post relates to data mining, visualization, and mashing up GMail with Facebook.
I attended a talk last week at the CS department at Stanford, and the speaker mentioned that they were using the "TO" field in e-mails to perform hierarchical clustering to discover social groups. The intuition is pretty simple: you typically only include multiple people in an e-mail if they make up some kind of social group that you interact with.
I thought I'd try out this idea on my own GMail account. After digging through GMail's API documentation, I downloaded a Java example showing how to connect to an account using IMAP and OAuth. I then wrote a Java program to go through all my Sent messages since I first created my GMail account back in 2005 and construct clusters based on the TO and CC fields.
The algorithm is very basic. If an e-mail was sent to more than one person, then I assume those people are somehow associated. So, say I sent an e-mail to people A and B. I would append person B to A's list of associations and append A to B's list of associations. I did this for 5 years' worth of sent messages.
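Here's a rough sketch of that bookkeeping in Python (not the original Java program, and with the IMAP fetching left out). It assumes each sent message has already been reduced to a plain list of recipient addresses; the function name `build_associations` and the sample addresses are made up for illustration. I use sets instead of appending to lists so that repeated e-mails to the same people don't create duplicate entries.

```python
from collections import defaultdict
from itertools import permutations


def build_associations(sent_messages):
    """Map each recipient to the set of people they were co-addressed with.

    `sent_messages` is an iterable of recipient lists (TO and CC combined).
    """
    associations = defaultdict(set)
    for recipients in sent_messages:
        unique = set(recipients)
        if len(unique) < 2:
            continue  # a single-recipient mail says nothing about groups
        # Every ordered pair (person, other) records "other" under "person",
        # so the association ends up stored in both directions.
        for person, other in permutations(unique, 2):
            associations[person].add(other)
    return dict(associations)


# Hypothetical sample data: two mails to the same pair, one to a single person.
sent = [
    ["a@example.com", "b@example.com"],
    ["c@example.com"],
    ["a@example.com", "b@example.com"],
]
associations = build_associations(sent)
```

With this sample, `a@example.com` and `b@example.com` end up associated with each other, while `c@example.com` never appears because that message had only one recipient.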
The data I was getting looked pretty reasonable. I had clusters consisting of different people I've worked with over the past 5 years, people I socialize with, and students I've taught.
There are a number of potential applications for this. One could be making suggestions for label groups in GMail. Another idea is to use this kind of mining to suggest groupings in something like Facebook. To experiment with that idea, I took the groups I mined in GMail and mashed them up with data publicly available on Facebook.
That is, for each e-mail address in the list I mined, I used Facebook's search API to check whether that person exists within the Facebook universe. If they did, I then processed all their associations, looking those people up on Facebook the same way. That information forms an adjacency list, linking people's Facebook profiles based on how I have interacted with them through e-mail.
For example, if I e-mailed people A, B, and C together and, at another time, e-mailed C, D, and E, and all of those people exist on Facebook, then I end up with a graph description like this:
A: [B, C]
B: [A, C]
C: [A, B, D, E]
D: [C, E]
E: [C, D]
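The mashup step can be sketched roughly like this in Python. The actual Facebook search call is not shown; `find_profile` is a hypothetical stand-in that takes an e-mail address and returns a profile name, or `None` when nobody is found. The function name `to_profile_graph` and the example addresses are likewise assumptions for illustration.

```python
def to_profile_graph(associations, find_profile):
    """Re-key an e-mail association map by profile, dropping unknown people.

    `find_profile` is a hypothetical callable: address -> profile name or None.
    """
    graph = {}
    for address, others in associations.items():
        profile = find_profile(address)
        if profile is None:
            continue  # this person was not found, so leave them out
        neighbours = {find_profile(other) for other in others}
        neighbours.discard(None)  # drop associates who were not found
        graph[profile] = sorted(neighbours)
    return graph


# Associations from the example: one mail to A, B, C and another to C, D, E.
associations = {
    "a@example.com": {"b@example.com", "c@example.com"},
    "b@example.com": {"a@example.com", "c@example.com"},
    "c@example.com": {"a@example.com", "b@example.com",
                      "d@example.com", "e@example.com"},
    "d@example.com": {"c@example.com", "e@example.com"},
    "e@example.com": {"c@example.com", "d@example.com"},
}

# A dictionary lookup stands in for the real profile search.
profiles = {
    "a@example.com": "A",
    "b@example.com": "B",
    "c@example.com": "C",
    "d@example.com": "D",
    "e@example.com": "E",
}
graph = to_profile_graph(associations, profiles.get)
```

When everyone is found, `graph` matches the adjacency list above. Anyone the lookup can't find simply disappears from both the keys and the neighbour lists.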
So, that's kind of fun, but these graph structures would be a lot more interesting to look at if there were a way to visualize them. Well, I just so happen to be working on a jQuery graph visualization plugin that wraps a graphing library called FlexViz.
After a little more hacking, I made a nice little web page for browsing these mashed up social groups. One such group is displayed at the top of this post. I created the group by extracting all people directly and indirectly associated with my wife (Theresa Demont).
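Pulling out everyone directly or indirectly associated with one person amounts to finding that person's connected component in the graph. A minimal sketch in Python, assuming the graph is an adjacency list like the A to E example and using a breadth-first search (the function name `connected_group` and the extra F and G nodes are made up for illustration):

```python
from collections import deque


def connected_group(graph, start):
    """Return everyone reachable from `start` by following associations."""
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbour in graph.get(person, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# The A..E example plus a separate, unrelated pair F and G.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
    "F": ["G"],
    "G": ["F"],
}
group = connected_group(graph, "A")
```

Starting from A, the search reaches D and E through C, but never touches F or G, since nothing links them to the rest.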
If you look at the picture below (same as at the top, but annotated), there are basically two groups here: one on the left and one on the right. The two groups are connected by a bridge between Nick and Colton. This is because the people on the right are all students I coached in programming competitions. Nick also helped out with coaching for a while, but Nick is also friends with Theresa, so he bridges my programming competition world and the social world I have with my wife.
Another interesting example is the graph structure defined by my CrossFit associations. Below, the graph on the left shows the various CrossFit people I interacted with through e-mail before I moved to California, while the graph on the right shows the CrossFit people I've interacted with since moving to California. It appears that I do not bulk e-mail nearly as many CrossFitters now. The truth is, most of my social interactions have moved to Facebook.
Most of this effort was just for fun, to see what the resulting application would look like. However, it's also a kind of interesting way to browse some of your own history that you record through e-mail. In my graph, it's easy to see the various changes in my work associations as I've changed companies and universities. I have a UNB group, UVic group, and Stanford group. It would be great to add in a timeline filter so I can hide people in the graph based on when I actually communicated with them. This could actually be animated so you could visualize the changing landscape of your social circle!
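I haven't built the timeline filter, but one way it might work is to keep the sent date with each message and rebuild the graph from only the messages inside a chosen window. A hypothetical sketch in Python, where the function name `associations_between`, the `(date, recipients)` tuple layout, and the sample data are all assumptions:

```python
from datetime import date
from itertools import permutations


def associations_between(messages, start, end):
    """Build the association graph from messages sent within [start, end]."""
    graph = {}
    for sent_on, recipients in messages:
        if not (start <= sent_on <= end):
            continue  # outside the selected time window
        for person, other in permutations(set(recipients), 2):
            graph.setdefault(person, set()).add(other)
    return graph


# Made-up sample: one group e-mailed in 2006, another in 2009.
messages = [
    (date(2006, 3, 1), ["A", "B"]),
    (date(2009, 7, 15), ["C", "D"]),
]
early = associations_between(messages, date(2005, 1, 1), date(2007, 12, 31))
```

Sliding the window forward and redrawing the graph each time would give the animated version.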