Data Visualization For Greater Good

I’ve always been interested in understanding images. From how images are formed to how we can get machines to understand images in similar manner to the average human being. It is helpful that our visual world is rich and that images can capture so much information due to their high dimensionality.

The key word here is “visual”. Human beings are visual creatures. We rely on our eyes more than we would like to acknowledge for multiple tasks (as seen via various sight related idioms). As I delve into more areas of Computer Science, the use of data to accomplish superhuman feats is ever growing. Deep Learning for one is a new field that has tremendously tapped into these enormous collections of data to produce computational models with astounding capabilities. However, to better understand what models can be trained, many researchers recommend visualizing the data, again iterating the first line of this paragraph. Thus to marry the two above ideas, I decided to make my life easier by exploring some data visualization and explain why fundamental data viz is a highly useful and rewarding skill to have.

For my tooling, I use the well-designed and utilitarian Data Driven Documents or D3 library. Of course, I could have used other libraries such as Bokeh (Python), but D3 has a lot of great features that overshadow the fact that it is written in Javascript (ughh). For one, D3 directly renders to HTML rather than generating intermediate JS or images, making the visualizations super interactive. Moreover, D3’s idiomatic approach is what won me over. Loading, filtering and manipulating the data is done asynchronously in a systematic and declarative fashion, saving me a lot of headache. Finally, D3 has the amazing bl.ocks.org, maintained by D3 creator Mike Bostock (who has a PhD in Data Visualization from Stanford, by the way) and Mr. Bostock also write fantastic tutorials which use D3 and explain its capabilities.

Now for the actual goodness. Since we have more readily accessible datasets, visualizing them helps us to leverage our “visual creatures” persona to better understand them and leverage their latent information. For example, I created this amazing word cloud using D3, reading in the text of Andrej Karpathy’s excellent article on what it means to get a PhD:

wordcloud

All of a sudden, you understand the key themes of his article even though you may not have read it, not to mention this looks super cool! This is the power of data visualization and a key proponent of my belief that data viz is a useful and rewarding skill. You can take a look at the code on my block, or fork it and add your own text corpus.

I did mention interactivity didn’t I? That word cloud may not be interactive, but this plot sure is. That is nothing but a plot of the 1024 dimensional vectors generated from a Convolutional Neural Network on a Geolocation dataset, where the idea is to train a model to predict the location where the image was taken by having the model look only at the image. If you’re confused about how I managed to plot a 1024-D vector in 2-D space, then I would recommend taking a look at the fabulous t-SNE algorithm and the open source implementation of the faster Barnes Hut t-SNE algorithm available from the inventor himself (if you look closely, you may see my name in the list of contributors 😉 ).

Another cool example is that of choropleth’s or heat maps as they are commonly known. Mike Bostock has some amazing visualizations using choropleth’s on a simple statistic such as census data which I highly recommend taking a look at, amidst his other gorgeous visualizations. I personally plan to use choropleths to visualize some of the geolocation datasets I am playing around with.

Some visualizations you might start off with if data viz is new to you is stock market prediction. You can pick up the data from Yahoo Finance and then with the power of D3, you could quickly just see how a particular stock has been performing over weeks, months, or even years. Kind of nice, compared to staring at all those floating point numbers without too much trouble either.

Overall, I hope via these simple examples, I have demonstrated the inherent power and usefulness of data visualization and how it can help tackle some of more challenging problems which our society faces. Now go out there and visualize some greater good!

The Sound Of Commits

Well, all I can say is that I’ve been busy. How busy you ask? Let this post answer it for you.

I had come across the Github API a while back and ever since I have been contemplating ways to use the data provided by the API in a meaningful way. After weeks of brain-wracking, I finally hit upon the idea that why not render the commits for a year long period as musical notes? If you have a Github account, you might have noticed in your “Repositories” tab there is a wonderful little visualizer that has blocks of varying green colorations, showing your Github activity over the past 1 year. Why not audiolize it?

So first things first, I head over to Github’s API docs. Now the beauty of Github’s API is that it to read user and public repository a.k.a. repo information, I don’t need to sign in. That enabled me to write the whole thing in client side Javascript and not worry about Cross Origin Resource Sharing (CORS) since the audiolization would be hosted on my website. This made my task much, much easier (and more hassle-free for the user). Some quick jQuery setup and custom wrappers to the AJAX calls had me pulling the data in a jiffy.

The next step was preparing the data, i.e. getting the commits from the last one year for a user across repos and grouping them by date. Github allowed me to limit the data returned to the past 1 year, but grouping it was still an issue. That is where moment.js comes in. This intuitive and feature-rich library gave me all that I needed to manipulate and compare the date-time signatures on each commit as well as taught me the standard for representing dates and times. Couple this with a few articles on Javascript promises and closures (Lisp throwback!!) and I have my data ready to be audiolized.

A small digression: For all my difficulty with JS and its weak-typing model, the fact that Lisp has inspired JS and that JS allows me to come from a functional programming school of thought, made my code really clean and easy to work with. I was strictly against creating a global variable to store and work on the final data in the JS file since side effects from functions could mess up the data and be really hard to track down and fix. Functional programming, which by design avoids side affects on your data, really helped me out and JS’ closures really sold me to the language.

Now for the music. Musical Instrument Digital Interface or MIDI is the de facto standard for rendering music and this is no different on the web. Here is where I found this amazing project called MIDI.js on, you guessed it, Github! I forked the repo so I had my own version to play around with, and after some digging around since the documentation was substandard, I had attached MIDI musical notes to my commit data.

After some quick HTML and CSS where I set up the web page, I finally published it to the web. Sharing with people via Facebook and Twitter helped get some good feedback on it. You can find it here. Be sure to try it out, especially if you have a Github account. Seeing all my Github activity rendered as music was a joy like no other and I was glad to be able to help others see that joy as well.

Let me play you a song of your commits. If you have any suggestions, do feel free to comment.