Introduction to the precision-recall plot

Great read on Precision-Recall curves. Helped clarify some doubts in my head on this very important model metric.

Source: Introduction to the precision-recall plot

An alternate read which is slightly more technical can be found here.

xkcd Font And The Fine Art of Kerning

I am huge fan of the webcomic xkcd not only because of its intelligent take on many scientific, mathematical and physical phenomenon in a subtly humorous manner, but also due to it’s simple style where all the characters are stick figures. This helps one focus on the the message of the comic as well as helping its creator, the totally awesome Randall Munroe, publish new strips very frequently, satiating our need to not only procrastinate, but also not feel too bad about it. Also, if you don’t know who Randall Munroe is, google him right now. I mean now! Yes, stop reading and go google him and come back.

Back from our digression, I’ve noticed that the font used in xkcd is very curious. After some simple searching around, I found out that the font is Randall’s very own handwriting. What is cool about this font is the kerning involved. Kerning is nothing but the spacing between characters in text and this matters more than you would think. For example, take a look at the below strip, courtesy of Mr. Munroe:

kerning

Being a perfectionist is the worst thing on this planet, since after I discovered kerning, I now see it everywhere. Moreover, the fact that Randall’s handwriting follows very good kerning on the webcomic is a minor miracle, in my humble opinion.

Of course, the xkcd font was enticing enough that I started to have very strong desires to use this font on my system, and lo and behold, the font was actually available online on the ipython repository. So, now I shall describe to you how to use this amazing font on your own system to use with everything you wish to. The process below is specifically for Linux, though I imagine it should be far easier to install the font on Windows and macOS since they have much better font managers.

First things first, get the font! You can simply go to the repository and either download the single font xkcd-Regular.otf or download the whole repo in a zip file and extract that file. The other file xkcd.otf isn’t important so you can just leave it be.

The next step is to create a directory .fonts in your home folder. Thus the location of the directory is ~/.fonts. You want to now copy over the font file to this directory since Linux is aware of this directory as a possible source of fonts even though it may not have existed on your system a couple of minutes ago. Oh well.

Now the final step is to let Linux know that you want this font to be available system-wide. So we run fc-cache -fv to rebuild the entire font cache and make the xkcd font available to every application on your system. To confirm the font is installed, you can open up Font Viewer and check to see if the font is installed. The nice thing about the name xkcd is that it is unique and it will mostly be the last font on your system.

Time for an application. Let’s make VS Code super cool by making xkcd the default font. If you don’t use VS Code, you are seriously missing out on one of the best cross-platform editors available right now. Open up the settings page on VS Code and add this line to your user settings section

"editor.fontFamily": "xkcd, 'Droid Sans Mono', 'Courier New', monospace, 'Droid Sans Fallback'",

As you can see, xkcd is the first font in the list, so if it is correctly installed, VS Code will use it. Save the file and voila! Your text should now look like this

xkcd

Pretty cool, huh? Hopefully you enjoy your fancy new font and read more of xkcd.

Eviva!

Creating Command Line Tools with Python

After spending considerable amount of time on a computer as a developer, the GUI starts to seem amazingly slow and you realize just how awesome command line tools can be. Especially if you follow the Unix philosophy of “one tool for one task”, you can quickly chain together multiple in-built tools and quickly accomplish most tasks without a single line of code.

However, what about when the tool you need doesn’t exist? Wouldn’t it be great to just create it? In this post, I’d like to show you how using Python. My reason for using python instead of the more native C/C++ is simple: easy to read code for even anybody new to programming to understand the basic structure, fast prototyping, a rich set of tools, and the general ease of use. You’re of course welcome to use other languages such as Ruby (I’ve seen a lot of great tools written in Ruby), but then you wouldn’t be reading this post, would you? Without further ado, let’s begin!

Project Setup

I faced a problem a couple of weeks ago where I needed a simple command to nuke a directory in order to rebuild some binaries and unfortunately no tool existed for this at the time. Hence I created nuke, a convenient command line tool that does exactly what it says, nuke directories.

To start, we create a project directory called nuke-tool. Inside nuke-tool, we create a directory called nuke and inside that directory, two files, nuke.py which will house all our main code logic and  __init__.py so that python is able to understand this is a package called nuke.

In the root of the project directory, we should also create a file test_nuke.py to create tests for our code. Don’t want to accidentally nuke something else now, do we? We also create a file called setup.py to aid pip in installing nuke. We’ll flesh these out one at a time.

I did mention a rich set of libraries to help us out in this process. I personally use argparse to help with argument parsing and clint from the awesome Kenneth Reitz to help with output text formatting and confirmation prompts (simple Yes/No prompts). Later on, you’ll see code snippets showing these libraries in action.

Nuke ’em

In any command line tool, the first thing we need is a way to parse command line arguments. It’s the arguments you pass to a command which are separated by spaces. For example, grep takes two arguments, a pattern and a file name grep main nuke.py.

In python, we can easily use the argparse module to help us create a simple argument parser. I wanted to provide two options to the user, an argument specifying the directory to nuke (with the default being the current directory), and a flag -y to override the confirmation prompt. The code to do this is:

import argparse
import os
.
.
.
def _argparse():
    parser = argparse.ArgumentParser("nuke")
    parser.add_argument("directory", nargs='?', default=os.getcwd(),
                        help="Directory to nuke! Default is current directory")
    parser.add_argument("-y", help="Confirm nuking", action="store_true")
                        args = parser.parse_args()
    return args

Continue reading

Data Visualization For Greater Good

I’ve always been interested in understanding images. From how images are formed to how we can get machines to understand images in similar manner to the average human being. It is helpful that our visual world is rich and that images can capture so much information due to their high dimensionality.

The key word here is “visual”. Human beings are visual creatures. We rely on our eyes more than we would like to acknowledge for multiple tasks (as seen via various sight related idioms). As I delve into more areas of Computer Science, the use of data to accomplish superhuman feats is ever growing. Deep Learning for one is a new field that has tremendously tapped into these enormous collections of data to produce computational models with astounding capabilities. However, to better understand what models can be trained, many researchers recommend visualizing the data, again iterating the first line of this paragraph. Thus to marry the two above ideas, I decided to make my life easier by exploring some data visualization and explain why fundamental data viz is a highly useful and rewarding skill to have.

For my tooling, I use the well-designed and utilitarian Data Driven Documents or D3 library. Of course, I could have used other libraries such as Bokeh (Python), but D3 has a lot of great features that overshadow the fact that it is written in Javascript (ughh). For one, D3 directly renders to HTML rather than generating intermediate JS or images, making the visualizations super interactive. Moreover, D3’s idiomatic approach is what won me over. Loading, filtering and manipulating the data is done asynchronously in a systematic and declarative fashion, saving me a lot of headache. Finally, D3 has the amazing bl.ocks.org, maintained by D3 creator Mike Bostock (who has a PhD in Data Visualization from Stanford, by the way) and Mr. Bostock also write fantastic tutorials which use D3 and explain its capabilities.

Now for the actual goodness. Since we have more readily accessible datasets, visualizing them helps us to leverage our “visual creatures” persona to better understand them and leverage their latent information. For example, I created this amazing word cloud using D3, reading in the text of Andrej Karpathy’s excellent article on what it means to get a PhD:

wordcloud

All of a sudden, you understand the key themes of his article even though you may not have read it, not to mention this looks super cool! This is the power of data visualization and a key proponent of my belief that data viz is a useful and rewarding skill. You can take a look at the code on my block, or fork it and add your own text corpus.

I did mention interactivity didn’t I? That word cloud may not be interactive, but this plot sure is. That is nothing but a plot of the 1024 dimensional vectors generated from a Convolutional Neural Network on a Geolocation dataset, where the idea is to train a model to predict the location where the image was taken by having the model look only at the image. If you’re confused about how I managed to plot a 1024-D vector in 2-D space, then I would recommend taking a look at the fabulous t-SNE algorithm and the open source implementation of the faster Barnes Hut t-SNE algorithm available from the inventor himself (if you look closely, you may see my name in the list of contributors 😉 ).

Another cool example is that of choropleth’s or heat maps as they are commonly known. Mike Bostock has some amazing visualizations using choropleth’s on a simple statistic such as census data which I highly recommend taking a look at, amidst his other gorgeous visualizations. I personally plan to use choropleths to visualize some of the geolocation datasets I am playing around with.

Some visualizations you might start off with if data viz is new to you is stock market prediction. You can pick up the data from Yahoo Finance and then with the power of D3, you could quickly just see how a particular stock has been performing over weeks, months, or even years. Kind of nice, compared to staring at all those floating point numbers without too much trouble either.

Overall, I hope via these simple examples, I have demonstrated the inherent power and usefulness of data visualization and how it can help tackle some of more challenging problems which our society faces. Now go out there and visualize some greater good!

The Right Way To Install Postgres

Ughh, my Ubuntu distro is acting up again, and this time it is for something as stupid as it being unable to connect to the PostgreSQL server on my machine. I mean, how hard is it for the OS to be able to report a conflict between versions rather than just give up with a “directory not found” error when it fails to get a response from a Unix socket? Apparently, with the mess that is apt-get and its lethargic rate of package updates, not to mention Canonical’s refusal to support forward compatibility of packages, very hard!

As Prof. Jennifer Widom has said, databases are ubiquitous, and Postgres is an industrial strength relational DBMS claiming to be the most advanced of its type, hence my preference for it (not going to open the NoSQL can of worms today, sorry).  However, as we are well aware, advanced and user-friendly are seldom synonymous, choosing to be strange bedfellows more often. Installing Postgres via binaries is a great idea if you work at a huge corporation and update your systems once every 5 years. However, us little folk wish to receive the latest and greatest as best as we can, and hence the reliance on package managers such as aptitude a.k.a. apt-get. So what’s the catch? There has to be one since I am making the effort to write this. Well, Postgres 9.6.2 released last week and apt-get still only shows me 9.5. 2 WHOLE patch updates later and apt-get is still coughing up furballs when I look for 9.6. So before I pull out all my hair, let’s get Postgres installed the right way, with Megadeth’s Dystopia aptly playing in the background.

First things first, how to get setup so that we can easily upgrade Postgres no matter what distro of Linux we use. Here’s where I introduce you to this handy little tool called linuxbrew. Linuxbrew is a package manager inspired from OSX’s extremely popular Homebrew package manager. What’s great about Linuxbrew is that it uses Github repos as its source rather than inaccessible private channels. This ensures we always have access to the latest fixes and updates. Installing linuxbrew is fairly straightforward:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"
PATH="$HOME/.linuxbrew/bin:$PATH"

Don’t forget to add that PATH update to your ~/.(zsh|bash)rc file so that you have access to the brew command. Once you have that done, the next step is super easy:

brew install postgres

This should take a while, since it will get the latest tarball, untar it, make it and get postgres up and running for you. On my dated machine, it took around 3.5 minutes, enough time for me to get a nice hot cup of mocha.

To verify that the Postgres server is up and running, type psql into your command line. This should open up the SQL prompt and you’re good to go and can stop reading here and go build something awesome. If not, then guess we have a few more steps to go. We have installed Postgres with the magic of linuxbrew, but linuxbrew isn’t a service manager, so it doesn’t quite know that it needs to start the server. Let’s do that. Type the command into your terminal:

pg_ctl start -D $HOME/.linuxbrew/var/postgres

Some online forums such as StackOverflow may have answers which add the -l flag to define a logfile. I am omitting that since I want Postgres to manage its logs and not have random logfiles throughout my filesystem. You should see the message server starting and a bunch of other status messages. Just hit enter and you should see your terminal again if you don’t already. Now if you type in psql, you should see that your Postgres SQL prompt functions perfectly.

As you have seen, with literally 4 lines of code (3 if you’re one of the lucky few), you’ve got a working installation of a super advanced RDBMS and that too one that is easily upgradeable. If you wish to update Postgres ever, just type in:

brew upgrade postgres

That’s it! Hope you enjoy using Postgres along with all your sweet NoSQL datastores. Looking forward to seeing what you build.

Eviva!

Terminal Kaio-Ken

Update 1: There are some issues with certain FontAwesome icons due to a mixup with the font codepoints. After speaking with folks who maintain the powerlvl9k project, a new release in the `next` branch has all the fixes.

Update 2: Added details about customizing your prompt for a richer developer environment.

In this post, I wish to elaborate on my command-line terminal setup and share what I believe is a productive, highly-utilitarian and not to mention gorgeous terminal experience. As a computer scientist/engineer, I spent more time using the terminal than a regular GUI. And if you’re Unix oriented like me, you will know that using the terminal is so much more efficient and nimble than using the mouse.

To begin with, I have eschewed the standard Bash shell in favor of the much more powerful ZSH shell. ZSH offers much more comprehensive file globbing and auto-complete features than Bash and is really nifty in how it uses its history to recall old commands you may wish to repeat. I highly recommend going through this slide-deck to gain an understanding of what ZSH can accomplish for you.

However, ZSH on its own is not enough. To really make our experience amazing, let me introduce you to the delightful Oh-My-ZSH, a community-driven project  which provides plugin and themes for the ZSH shell, and boy are they useful. To install ZSH, use the instructions here and for OMZ, all you need to do is run:

# An easy shell script to install Oh-My-ZSH. I use the wget version since it is more ubiquitous.
sh -c "$(wget https://raw.githubusercontent.com/robbyrussell/oh-my-zsh/master/tools/install.sh -O -)"

Now time for some styling! If you are a Dragon Ball Z fan like myself and some of my friends, the title of this post will bring back a lot of memories, specifically, this meme –

vegeta

The theme I most prefer is powerlevel9k. Based on the Powerline theme, it combines functionality with a good dose of fanciness, while making a nice little reference to a beloved anime character. Installing it though is not as straightforward as the above steps have been. To install the theme,  we first have to download it and add it to the right folder:

$ git clone https://github.com/bhilburn/powerlevel9k.git ~/.oh-my-zsh/custom/themes/powerlevel9k

Then, we open up the ~/.zshrc file and specify the theme

ZSH_THEME="powerlevel9k/powerlevel9k"

This uses the basic Oh-My-ZSH framework to install the theme, though you are more than welcome to use Antigen or ZPM if you are aware of them.

At this point, if you open your terminal, you may find some glaring deficiencies. This is because we haven’t yet set up the powerline fonts needed to render all the smooth jazz of this theme. Let’s do that. First we download the latest versions of the symbol font and the fontconfig files:

wget https://github.com/powerline/powerline/raw/develop/font/PowerlineSymbols.otf
wget https://github.com/powerline/powerline/raw/develop/font/10-powerline-symbols.conf

After this we need to create the font folder to store these files and create the font cache

  mkdir ~/.fonts

We can now move the font file to the above folder

mv PowerlineSymbols.otf ~/.fonts/

and generate the font-cache

fc-cache -vf ~/.fonts/

Note you may have to provide root privileges for the above step.

Finally, install the font config file:

mv 10-powerline-symbols.conf ~/.config/fontconfig/conf.d/

Now that we have the powerline fonts set up, restart your terminal. Doesn’t it look fabulous?

If you want a more fully featured font set for a developer, I recommend checking out Adobe’s Source Code Pro and the Hack font. To install, just copy the fonts to the ~/.fonts directory and run fc-cache -fv

But wait, there’s more! Since we’ve already made this much progress, why not go one more step ahead and add some amazing font icons from FontAwesome? This can be easily accomplished using the instructions here. I am not specifying them verbatim here since the repository is still undergoing active development and the install instructions are subject to change in the near future.

Your terminal should now look something like this:

screenshot-from-2017-01-17-225604

I generally dislike redundancy and useless information. Seeing my name on the prompt is exactly that, so why I not eschew it in favor of more meaningful information such as what Python virtualenv is currently active, or what Ruby version you are using? Let’s do that.

In your ~/.zshrc, you have to add some minor commands custom to powerlvl9k. These are:

POWERLEVEL9K_LEFT_PROMPT_ELEMENTS=(os_icon dir vcs)
POWERLEVEL9K_RIGHT_PROMPT_ELEMENTS=(virtualenv pyenv rvm rbenv go_version nvm status history time)

It should be pretty straightforward to understand what each item in the parentheses corresponds to. This is my setup, so feel free to omit things you don’t need. You can checkout what prompt elements are available here. While you’re there, be sure to look up how to stylize your prompt as well to get your creative juices flowing.

Finally, this is what you get:terminal

As a closing thought, one very useful OMZ plugin I use is zsh-syntax-highlighting which updates the color of the terminal text when it is correct, thus ensuring your commands are valid before you hit enter. This plugin can easily be installed using the plugins list in the ~/.zshrc and specifying the plugin exactly as above in the parentheses.

I hope you enjoy your new super-saiyan terminal experience. If you don’t particularly like it, the Oh-My-ZSH page has dozens of themes to choose from, and if you have the time, you can go ahead and create your own theme.

Eviva!

Why Pokemon Go! Is A Disappointment

Close to 6 months ago, the much awaited mobile game, Pokemon Go!, was released on most major mobile platforms. Now if you’ve been playing the classic Pokemon games series from Gamefreak, all the way from the GameBoy Color to the Nintendo DS, you will know that as kids, we wanted nothing more than to run into the wilderness and encounter rare pokemon to befriend, capture and battle. Go! gives us all of that. From running around catching pokemon, to challenging Gyms and Gym leaders. Unfortunately, that is where the similarities stop.

As someone who has grown up playing all the Pokemon games on the GameBoy systems, seen the first 12 seasons of the Pokemon anime and wasted a good chunk of money on trading cards, Go! was an exciting prospect that was eagerly awaited. After a good year of patience since Niantic made its initial announcement, the app was finally on the App and Play stores, and subsequently on my phone. The app was soon the most downloaded app in the history of the Play store and Nintendo stock rose an astounding 21% (which is ridiculous since a 2% increase is considered solid business). This app was next big thing.

The first few hours of Go! were bliss! Creating an avatar, selecting my starter, just walking around the city, using the proximity indicator to locate those ultra rare Pokemon, it was the largest nostalgia rush I’d had since I graduated college. So far, so good. Then it all changed. The first peeve was the candy system. For those of you in the know, (rare) candy is something we use to level up any Pokemon quickly, but the right way to do it is to battle Pokemon and give them experience points, a.k.a. real life hard-work. The candy system in Go! is nothing of the sort. It requires us to catch more Pokemon of the same species in order to collect more candy that is specific to that Pokemon species and use that to level them up. The concept of peer battles is non-existent and that killed some of my interest in the app. It was still a great app, tonnes of fun, and gave me the required motivation to get my backside into gear and move out of the house. However, constantly catching the same Pokemon again and again, just to level up the first one of that kind you caught, soon became redundant and boring. Strike one.

Then came the next big update: the removal of the proximity indicator. The footstep indicator that displayed how far you had to walk to find a Pokemon was completely removed and thus made the game almost unplayable. When you have to rely on dumb luck to find a Pokemon you’ve really been meaning to catch, it is no longer a game. This was compounded by the fact that Go! comes with no sort of instructions or gameplay objective whatsoever. Suddenly it seemed that the good people over at Niantic had no idea what they were doing. Strike two.

The final kicker for me was the proliferation of illicit means to locate Pokemon. Pokemon trackers and websites displaying where different kinds of Pokemon were located started showing up all over the internet. These were illicit since Niantic had never released an official API and they were pretty much reverse-engineered and hacked together. Niantic made poor choices trying to curb this kind of unfairness, either not addressing the issue, or making the game completely unplayable. By this time, the number of active users of Go! had declined drastically and I was pretty sure that Go! was in doldrums.

As I sit at my desk with the DesMuMe DS emulator running in a different window with Pokemon Platinum, I am reminded of all the reasons why Pokemon is a global phenomenon. Playing the original game series by GameFreak showcases how to take a good idea and make it great through sophisticated execution, something a lot of companies have taken upon themselves in recent times. In comparison, Go! was doomed by its own popularity and the poor decisions by Niantic. As I head to face the first Gym Leader in Pokemon Platinum, I’m quite content having given up Go! and gone back to the classic series. At the end of the day, you gotta catch ’em all!