Birthday Wish NLP Hack

Well, it was my 22nd birthday 11 days back, and while the real-world was quite uneventful, I managed to create a small stir in the virtual-world.

For this birthday, I decided to do something cool and what is cooler (and a greater sign of laziness) than an AI program that replies to all the birthday wishes on my Facebook wall? This was definitely cool and quite possible given a basic understanding of HTTP and some Artificial Intelligence. After experimenting for 2 days with the Facebook Graph API and FQL, I had all the know-how to create my little bot.

Note: This is from a guy who has never taken a single course on Natural Language Processing and who has next to zero exposure programming NLP programs. Basically, I am a complete NLP noob and this hack is something I am really proud of.

But one major problem still remained: How to create a NLP classifier that would classify wall-posts as birthday wishes? I tried looking for a suitable dataset so I could build either a Support-Vector Machine or Naive Bayes Classifier, but all my search attempts were futile. Even looking for related papers and publications were in vain. That’s when I decided to come up with a little hack of my own. I had read Peter Norvig’s amazing essay on How to Build a Toy Spell Checker and seen how he had used his intuition to create a classifier when he lacked the necessary training dataset. I decided to follow my intuition as well and since my code was in Python (a language well suited for NLP tasks), I started off promptly. Here is the code I came up with:

The first thing I do is create a list of keywords one would normally find in a birthday wish, things like “happy”, “birthday” and “returns”. My main intuition was that when wishing someone, people will use atleast 2 words in the simplest wish, e.g. “Happy Birthday”, so any messages just containing the word “Happy” will be safely ignored, and thus I simply have to check the message to see if atleast 2 such keywords exist in the message.

What I do first is remove all the punctuations from the message and get all the characters to lower-case to avoid string mismatching due to case sensitivity. Then I split the message into a list of words, the delimiter being the default whitespace. This is done by :

</p>
<p>s = ''.join(c for c in message if c not in string.punctuation and c in string.printable)<br />
t = s.lower().split()</p>
<p>

However, I later realized that there exist even lazier people than me who simply use wishes like “HBD”. This completely throws off my Atleast-2-Words theory, so I add a simple hack to check for these abbreviations and put in the expanded form into the message. Thus, I created a dictionary to hold these expansions and I simply check if the abbreviations are present. If they are, I add the expanded form of the abbreviation to a new list that contains all the other non-abbreviated message words added in verbatim [lines 15-20]. Since I never check for locations of keywords, where I add the expanded forms are irrelevant.

Then the next part is simple, bordering on trivial. I iterate through the list of words in my message and check if it is one of the keywords and simply maintain a counter telling me how many of the keywords are present. Python made this much, much easier than C++ or Java.
But alas, another problem: Some people have another bad habit of using extra characters, e.g. “birthdayyyy” instead of “birthday” and this again was throwing my classifier off. Yet another quick fix: I go through all the keywords and check if the current word I am examining has the keyword as a substring. This is done easily in Python strings using the count method [lines 31-34].

Finally, I simply apply my Atleast-2-Words theory. I check if my counter has a value of 2 or more and return True if yes, else False, thus completing a 2 class classifier in a mere 40 lines of code. In a true sense, this is a hack and I didn’t expect it to perform very well, but when put to work, it really managed to do a splendid job and managed to flummox a lot of my friends who tried posting messages that they thought could fool the classifier. Safe to say, I had the last laugh.

Hope you enjoyed reading this and now have enough intuition to create simple classifiers on your own. If you find any bugs or can provide me with improvements, please mention them in the comments.

Eviva!

Advertisements

PintOS on Ubuntu

Note: This process is currently broken and seems to throw up unexpected errors. I am trying to look for a solution but the internet is just not helping me at the time of writing this. Until I figure out a way to fix this, I recommend trying out NachOS or xv6 for your OS cravings.

PintOS is one brilliant skeletal Operating System and, given the right time and effort, is a great way to consolidate your knowledge on the design of modern operating systems. However, its installation can be quite a pain especially since the instructions on Stanford’s official site can be a tad bit confusing at times. Here I will walk you through the installation instructions. If you would rather just install than spend time reading my post, feel free to download an install script I wrote to automate the installation process from here: pintosInstall.

If you are using the script, please remember to change the file extension as WordPress does not accept .sh files. After that just run “bash pintosInstall.sh”, without the quotes, from a terminal for a completely automatic process. Also, while I can guarantee you the script runs well on Ubuntu in a folder you have root access to, the script is simple and generic enough for you to hack and customize to your distribution if required.

  1. Install some pre-requisites: GCC, Perl, QEMU, Make, GDB. Just run:
    sudo apt-get install gcc binutils perl make qemu gdb
  2. Create and installation folder. The script makes a folder ‘co302’ (the course number for OS in my college).
  3. Download the PintOS tar from here and extract it in the installation folder. I used the totally awesome wget tool in Linux (just like Mark Zuckerberg in the Social Network 😛 ).
  4. We need a folder where the PATH variable can point to, as it will have some executables that we need to run when coding the OS. I made a folder ‘bin’ in the base installation directory i.e. co302/pintos/bin.
  5. Move all the perl scripts from the src/utils folder of PintOS to the bin/ folder. The important ones are ‘backtrace’, ‘pintos’, ‘pintos-gdb’ and ‘pintos-mkdisk’.
  6. Edit your .bashrc file to add the path for the above bin/ folder. At the end of the .bashrc file, simply add the line
    export PATH=$PATH:$HOME/co302/pintos/bin/

    Again for this, I have used the awk tool as it provides a convenient, independent way to edit files programmantically.

  7. Now we have to make a change to one of the PintOS files. Open up the ‘pintos-gdb’ file (in Emacs I hope)  and edit the GDBMACROS variable to point to the ‘gdb-macros’ file in misc directory of the src directory. At this point, you have officially installed PintOS, so give yourself a pat on the back.
  8. Time to compile the utilities. Head over to the pintos/src/utils directory and run
    $ make

    . If you get a “Undefined reference to ‘floor’ ” error, simply open the Makefile and substitute LDFLAGS for LDLIBS and run make again.

  9. Copy the ‘squish-pty’ file to the PATH pointed bin directory.
  10. Head over to the pintos/src/threads/ directory and edit the ‘Make.vars’ file. Change the SIMULATOR variable from bochs to qemu, which should mostly be the last line of the file.
  11. Run make on the threads folder.
  12. Now we need to edit the ‘pintos’ util file in the bin directory with 3 edits. Many other sites will give you  the line numbers, but I will not use that as it is too variable and you get a chance to experiment with the text processing features of your favorite editor:
    1. Change $sim = bochs to $sim = qemu to enforce qemu as the simulator.
    2. Comment out the line push (@cmd, ‘-no-kqemu’); by prepending it with #.
    3. Put in the absolute path wherever required as Perl doesn’t seem to be able to interpret the ~ shorthand. Do this especially for the kernel.bin location path.
  13. Finally, edit the ‘Pintos.pm’ file in the bin directory and put in the absolute path for the line having the location of the loader.bin file.
  14. Congrats, you now have PintOS set up on your machine. Try running pintos run alarm-multiple as a test.

There you have it. A pretty easy and straightforward way to install a great experimental skeletal OS from Stanford University. I bet it took more work for me to write this up that it will take you to install PintOS. The good part about this little adventure of mine is that I got a chance to dabble in sed and awk, 2 Unix tools that no hacker can afford to not know the basics of. Add to that some wget magic and Emacs power, and you can potentially become a hacking superstar. Infact, this hack has proven really useful as the Computer Engineering department of my college has used it to install PintOS on all the machines so that the students can do meaningful OS practicals. My small way of giving back, you could say. 🙂

As a final note, at the time of writing, this post is as comprehensive a set of instructions you can get. This may not hold true forever as tomorrow someone might make some script-breaking changes. So if you find some change that I need to include, please feel free to comment and let me know about it.

Eviva!