Data Visualization For Greater Good

I’ve always been interested in understanding images: from how images are formed to how we can get machines to understand them in a manner similar to the average human being. It helps that our visual world is rich and that images, being high-dimensional, can capture so much information.

The key word here is “visual”. Human beings are visual creatures. We rely on our eyes more than we would like to acknowledge for many tasks (as the various sight-related idioms attest). As I delve into more areas of Computer Science, I keep seeing data used to accomplish superhuman feats. Deep Learning, for one, is a relatively new field that has tapped into enormous collections of data to produce computational models with astounding capabilities. However, to better understand what models can be trained, many researchers recommend visualizing the data first, which brings us back to the first line of this paragraph. To marry these two ideas, I decided to make my life easier by exploring some data visualization and explaining why fundamental data viz is a highly useful and rewarding skill to have.

For my tooling, I use the well-designed and utilitarian Data-Driven Documents library, or D3. Of course, I could have used other libraries such as Bokeh (Python), but D3 has a lot of great features that outweigh the fact that it is written in JavaScript (ughh). For one, D3 renders directly to the page’s HTML and SVG elements rather than generating intermediate JS or images, which makes the visualizations super interactive. Moreover, D3’s idiomatic approach is what won me over. Loading, filtering and manipulating the data is done asynchronously in a systematic and declarative fashion, saving me a lot of headaches. Finally, D3 has the amazing bl.ocks.org, maintained by D3 creator Mike Bostock (who did his PhD work in data visualization at Stanford, by the way). Mr. Bostock also writes fantastic tutorials that use D3 and explain its capabilities.

Now for the actual goodness. Since datasets are more readily accessible than ever, visualizing them lets us use our “visual creatures” persona to better understand them and tap their latent information. For example, I created this amazing word cloud using D3, reading in the text of Andrej Karpathy’s excellent article on what it means to get a PhD:

[Image: word cloud]

All of a sudden, you understand the key themes of his article even though you may not have read it, not to mention that it looks super cool! This is the power of data visualization and a key reason I believe data viz is a useful and rewarding skill. You can take a look at the code on my block, or fork it and add your own text corpus.
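
If you are curious what feeds a word cloud, the core of it is just word counting. Here is a minimal Python sketch of that preprocessing step. It is not the code from my block (that one is D3); the stop-word list, the function names and the pixel range are all assumptions I made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "and", "that", "this", "with", "for", "you", "are", "was"}


def word_frequencies(text, top_n=50):
    """Return the top_n (word, count) pairs, ignoring case, stop words and very short words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_px=12, max_px=72):
    """Linearly map each word's count to a font size between min_px and max_px."""
    if not frequencies:
        return {}
    highest = max(count for _, count in frequencies)
    lowest = min(count for _, count in frequencies)
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts are equal
    return {
        word: min_px + (count - lowest) * (max_px - min_px) / span
        for word, count in frequencies
    }
```

Feed `word_frequencies` the full text of an article and pass the result to `font_sizes`; the word-to-size mapping is the kind of input a word cloud layout needs before it starts placing words.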

I did mention interactivity, didn’t I? That word cloud may not be interactive, but this plot sure is. It is a plot of the 1024-dimensional vectors generated by a Convolutional Neural Network on a geolocation dataset, where the idea is to train a model to predict where an image was taken by looking only at the image. If you’re confused about how I managed to plot a 1024-D vector in 2-D space, I recommend taking a look at the fabulous t-SNE algorithm and the open source implementation of the faster Barnes-Hut t-SNE algorithm available from the inventor himself (if you look closely, you may see my name in the list of contributors 😉).

Another cool example is that of choropleths, or heat maps as they are often loosely called. Mike Bostock has some amazing choropleth visualizations of simple statistics such as census data, which I highly recommend taking a look at, along with his other gorgeous visualizations. I personally plan to use choropleths to visualize some of the geolocation datasets I am playing around with.

If data viz is new to you, a good visualization to start off with is stock market data. You can pick up the data from Yahoo Finance and then, with the power of D3, quickly see how a particular stock has been performing over weeks, months, or even years, without too much trouble. Kind of nice, compared to staring at all those floating point numbers.
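
Before plotting, it usually helps to roll daily prices up to the time scale you care about. Below is a rough Python sketch of that step, assuming the data arrives as CSV with `Date` (YYYY-MM-DD) and `Close` columns. Those column names and the sample rows are my assumptions for illustration, so check them against whatever file you actually download.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def monthly_average_close(csv_text):
    """Average the daily 'Close' values per calendar month, keyed by 'YYYY-MM'."""
    by_month = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["Date"][:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        by_month[month].append(float(row["Close"]))
    return {
        month: round(mean(values), 2)
        for month, values in sorted(by_month.items())
    }


# Made-up sample rows, only to show the expected input shape.
SAMPLE = """Date,Close
2016-01-04,100.0
2016-01-05,102.0
2016-02-01,98.0
2016-02-02,99.0
"""
```

Calling `monthly_average_close(SAMPLE)` gives one number per month, which is a lot easier to hand to a line chart than the raw daily rows.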

Overall, I hope that with these simple examples I have demonstrated the inherent power and usefulness of data visualization and how it can help tackle some of the more challenging problems our society faces. Now go out there and visualize some greater good!

Cloud Services With Azure

Cloud Computing has been among the biggest buzzwords of the last 5 years. While over this time I have managed to get a decent fundamental and conceptual understanding, practical implementation has always been an issue (usually due to the cost factor). That is, until now!

I’ve been really messing around with the Windows Azure platform, and my employers have helped out by providing me with a free subscription as well as a group of peers who are as passionate about the technology as I am and more than willing to share their knowledge. This has been the biggest driver in motivating me to learn the core concepts of Cloud Computing in a practical way, so that no matter what the Cloud platform of the future is (Azure, Google Compute Engine, Amazon EC2, etc.), I can be agile enough to adapt to it.

From a high level, Azure provides us with a lot of pre-built templates. Things such as Cloud-powered Websites, Cloud Mobile Services, Cloud Media Services, Virtual Machines, SQL Storage and many more just make the lives of developers really easy. On top of that, the Azure SDK integrated so well with Visual Studio that I didn’t have to waste any time configuring and could get to producing code in no time.

As for the techniques, the fundamentals of Service Bus (Queues, Topics, and Relays for message passing), running remote Virtual Machines, and storing large files in Blob storage alongside SQL Databases opened up opportunities for me to really implement some of the ideas in my head, which had seemed infeasible before.
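
To make the difference between Queues and Topics concrete, here is a toy in-memory Python model of the two delivery styles. To be clear, this is not the Azure SDK and none of these class or method names come from it; they are hypothetical stand-ins that only mimic the semantics as I understand them: a queue hands each message to exactly one receiver, while a topic gives every subscription its own copy.

```python
from collections import deque


class ToyQueue:
    """Toy model: each message is handed to exactly one receiver."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Return the oldest message, or None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class ToyTopic:
    """Toy model: every subscription gets its own copy of each published message."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        # Each subscription is backed by its own private queue.
        self._subscriptions[name] = ToyQueue()
        return self._subscriptions[name]

    def publish(self, message):
        for subscription in self._subscriptions.values():
            subscription.send(message)
```

With a `ToyQueue`, two workers calling `receive()` compete for messages, which suits splitting up work. With a `ToyTopic`, a billing subscriber and a logging subscriber would each see every message, which suits broadcasting events.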

Add to that a cloud-powered IDE, a sleek web interface to monitor all my resources, and multiple programming language support (yes, I did all my cloud coding in Python!!), complemented by amazing and easy-to-understand documentation on MSDN, and the whole learning experience became that much more enthralling.

The point of this post is not to show off the capabilities of Azure. Rather, I want you to go out, pick a cloud computing platform of your choice, really learn about the amazing capabilities it provides, and realize the brilliant ways in which you and others can leverage them to make the whole world a much better place!

Hope to hear some success stories in the comments. Eviva!