Recovering URLs from Chrome, and other data

Last evening, while working on some C# code on Windows, I saw a notification for the Windows Creators Update. Well, this was cool, since a lot of people have been raving about it, and I happily installed the update hoping for the latest and greatest.

However, later that evening when I booted into my Linux partition, I was dropped into Emergency Mode. I assumed the Windows update had reset some of my old settings, which Windows is notorious for, so I booted back into Windows and changed the settings again, then booted from a Live-USB to run fsck and clean any dirty bits on my partitions, but to no avail.
Sadly, after messing around with various solutions from the internet, I had to concede defeat and accept the fact that I could no longer boot into my Linux OS. After some strong cussing at Microsoft and rage yelling, I decided to pick up the pieces and recover as much data as I could.

Booting Into Your System

One of the reasons I love working with Linux is the fact that I can use Live-USBs. A Live-USB is a whole operating system that boots from your USB and runs off your RAM without actually being installed on your machine, i.e. it does not touch your secondary storage (HDD/SSD/etc.). This is extremely useful, since you can boot into your system and go rummaging through your old Linux filesystem without affecting it. Since it runs off RAM, it’s a bit slower than a native installation, but hey, this is only for recovery purposes.

The easiest way is to just create an Ubuntu Live-USB on Windows. The distro and version don’t matter since this is only for recovery, but I generally create a Live-USB of the distro I would end up installing again, to save time.

Make sure you’re booting off the USB, and when you come to the GRUB screen, select “Try Ubuntu without installing”. This will boot you into Ubuntu, where you can use the file manager GUI to select the old partition (which I will henceforth refer to as the partition) and mount it. Now I could go exploring the file system as normal. I had to run chown -R $USER on some directories in order to be able to access them, but I managed to recover all the data and code from my system onto an external HDD.

Now the only thing remaining was recovering my Chrome tabs from my last web browsing session. This was non-trivial since I now had to find out how Chrome stored my previous session tabs and I also had to go about understanding OneTab’s structure to recover the URLs saved there. If you don’t use OneTab, you should – it’s great!

Thankfully, Stack Overflow was once again a saving grace and I was able to get straightforward solutions without too much trouble.

Recovering Session Tabs

Head over to ~/.config/google-chrome/Default (inside your old home directory) on the partition. The partition will be mounted under the /media directory, usually in a folder named after its UUID, which you can identify by looking at the folder’s properties from the GUI. Once you’re in the Default directory, all you need to do is copy over 2 files: Current Session and Current Tabs. These are binary files, so you can’t just read them in a text editor, despite the Unix philosophy that everything is a file.

Now, to recover your session tabs, we need to use Chrome on another machine. First save all the tabs open in that Chrome browser (OneTab is great for this) and export them to a text file. Then shut down Chrome and make sure you kill all the Chrome processes. Next, go to the same Default directory on your alternate machine. The location for different OSes can be found here. Replace the existing Current Session and Current Tabs files with the ones you retrieved from the partition. Start up Chrome, and you should see all your “supposedly” lost tabs in front of you in all their glory.
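The copy-and-replace step can also be scripted. This is a minimal Python sketch, assuming both profile directories are reachable from one machine (for example, the recovered files sit on your external HDD) and that Chrome is fully closed; `restore_session` and the directory arguments are hypothetical. It also backs up whatever files it overwrites.

```python
import shutil
from pathlib import Path

# The two session files named in the post.
SESSION_FILES = ["Current Session", "Current Tabs"]


def restore_session(old_profile, new_profile, backup_dir):
    """Copy Chrome's session files from old_profile into new_profile.

    Files that get replaced are first copied into backup_dir. Chrome must be
    completely closed, or it may overwrite the restored files on exit.
    """
    old_profile, new_profile, backup_dir = Path(old_profile), Path(new_profile), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for name in SESSION_FILES:
        src = old_profile / name
        if not src.is_file():
            continue  # nothing to recover for this file
        dst = new_profile / name
        if dst.exists():
            shutil.copy2(dst, backup_dir / name)  # keep the current session, just in case
        shutil.copy2(src, dst)
        restored.append(name)
    return restored
```

Point `old_profile` at the recovered Default directory and `new_profile` at the Default directory of the Chrome you are restoring into; the return value lists which files were actually copied.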

In case it didn’t work, make sure you close Chrome completely, including any background processes, and try the above process again. Since you have all the old as well as the new URLs backed up, there shouldn’t be any cause for concern.

Recovering OneTab URLs

Now comes the other bit: recovering your saved URLs from the OneTab extension. Luckily, this is easier than it sounds. First, back up the URLs currently saved in OneTab by exporting them to a text file. Time to recover data!

In the same config directory, ~/.config/google-chrome/Default, there is a directory called Local Storage. Inside it you’ll see a bunch of files with the extensions .localstorage and .localstorage-journal. Each pair of files corresponds to one extension (websites get their own pairs too) and holds that extension’s stored data. The extension behind each file can be identified by the extension’s unique ID in the file name, so for OneTab, you’re looking for the following files:

  • chrome-extension_chphlpgkkbolifaimnlloiipkdnihall_0.localstorage-journal
  • chrome-extension_chphlpgkkbolifaimnlloiipkdnihall_0.localstorage
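Because the extension ID is embedded in each file name, you can also find the files programmatically. A minimal sketch, assuming the naming pattern shown above (`chrome-extension_<id>_0.localstorage` plus its `-journal` twin) and Chrome’s 32-character extension IDs, which use only the letters a to p; `extension_storage_files` is a hypothetical helper.

```python
import re
from pathlib import Path

# chrome-extension_<32-letter id>_0.localstorage, optionally with -journal
STORAGE_NAME = re.compile(r"^chrome-extension_([a-p]{32})_0\.localstorage(-journal)?$")


def extension_storage_files(local_storage_dir, extension_id=None):
    """Map each extension ID to its storage files in a Local Storage directory.

    Files belonging to websites (e.g. https_...) are skipped. Pass
    extension_id to keep only that extension's files.
    """
    found = {}
    for path in sorted(Path(local_storage_dir).iterdir()):
        match = STORAGE_NAME.match(path.name)
        if match and (extension_id is None or match.group(1) == extension_id):
            found.setdefault(match.group(1), []).append(path)
    return found
```

For OneTab you would call it with the Local Storage path from the recovered partition and the ID `chphlpgkkbolifaimnlloiipkdnihall`, then copy the returned paths.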

Copy these files over to your alternate system in the same Local Storage directory. Restart Chrome completely and run the OneTab extension and voila! You should now see all the tabs saved in OneTab from your old system. You can import your previously exported tabs into OneTab again and you should have a combination of both sets of tabs.
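If OneTab does not show your old tabs, it can help to check that the recovered file actually holds data. The .localstorage files of this era are SQLite databases; the sketch below assumes the schema older Chrome versions used, a table `ItemTable` with `key` and `value` columns and values stored as UTF-16-LE bytes. Verify that against your own file (for example with the sqlite3 shell) before relying on it. `read_local_storage` is a hypothetical helper, and it opens the database read-only so the recovered copy is never modified.

```python
import sqlite3
from pathlib import Path


def read_local_storage(db_path):
    """Return {key: value} from a legacy Chrome .localstorage SQLite file.

    Assumes the ItemTable(key, value) schema with UTF-16-LE encoded values.
    """
    # as_uri() percent-encodes spaces (e.g. in "Local Storage"); mode=ro forbids writes.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT key, value FROM ItemTable").fetchall()
    finally:
        conn.close()
    return {key: bytes(value).decode("utf-16-le") for key, value in rows}
```

Printing the keys and the first few hundred characters of each value is usually enough to tell whether your saved OneTab data is in there.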

Conclusion

Hopefully, at this point you should be able to resume working on your machine with no interruptions. With a little bit of understanding of how Chrome saves its data, we were able to easily recover previous sessions and extension data. Not too bad, huh?

References

  • Session Tab recovery courtesy of this SO thread.
  • OneTab URL recovery from here.

Seeing Spaces by Bret Victor

A workspace to help us better understand the things we build so we have a more scientific approach to whatever we do.

I so need to do this in my work space!!

Firefox: restore your lost tabs

Pretty sure a lot of us will need this at some point.

The Ubuntu Incident

Problem
Over the last 1.5-2 years, I have collected 700+ tabs in my Firefox 🙂 Maybe this summer I will have some time to sort them out. However, today when I switched my computer on, all my tabs were gone and I got a clean Firefox instance with only one tab. Hmm… I had a similar problem once, after which I installed an add-on called “Session Manager”. I had set this add-on to offer the list of previous sessions upon restart, but it didn’t do anything! Damn, how do I get my tab collection back?

Solution
In the .mozilla directory there is a file called sessionstore.js that stores, among other things, the open tabs. However, this file was very small; my previous tabs were clearly not in it. Thank God there was a backup copy of this file next to it called sessionstore.bak. It was a big file…
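Once you have the backup copy, a quick way to see what it holds is to pull the tab URLs out of it. A minimal sketch, assuming the plain-JSON session layout of that era: a `windows` list whose tabs carry an `entries` history list and a 1-based `index` pointing at the page being shown. Newer Firefox releases compress the session file (`.jsonlz4`), which this does not handle. `tab_urls` is a hypothetical helper.

```python
import json


def tab_urls(session_text):
    """List the URL each tab was showing, from a Firefox sessionstore JSON string."""
    text = session_text.strip()
    # Very old Firefox versions wrapped the JSON in parentheses.
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    data = json.loads(text)
    urls = []
    for window in data.get("windows", []):
        for tab in window.get("tabs", []):
            entries = tab.get("entries", [])
            if not entries:
                continue  # a tab with no history has nothing to recover
            # "index" is 1-based; clamp it in case the file is inconsistent.
            idx = min(max(tab.get("index", len(entries)) - 1, 0), len(entries) - 1)
            urls.append(entries[idx].get("url"))
    return urls
```

For example, `tab_urls(open("sessionstore.bak", encoding="utf-8").read())` gives a plain list you can save to a text file before trying any restore.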


Birthday Wish NLP Hack

Well, it was my 22nd birthday 11 days back, and while the real world was quite uneventful, I managed to create a small stir in the virtual world.

For this birthday, I decided to do something cool and what is cooler (and a greater sign of laziness) than an AI program that replies to all the birthday wishes on my Facebook wall? This was definitely cool and quite possible given a basic understanding of HTTP and some Artificial Intelligence. After experimenting for 2 days with the Facebook Graph API and FQL, I had all the know-how to create my little bot.

Note: This is from a guy who has never taken a single course on Natural Language Processing and has next to zero experience writing NLP programs. Basically, I am a complete NLP noob, and this hack is something I am really proud of.

But one major problem still remained: how to create an NLP classifier that would classify wall posts as birthday wishes? I tried looking for a suitable dataset so I could build either a Support Vector Machine or a Naive Bayes classifier, but all my search attempts were futile. Even looking for related papers and publications was in vain. That’s when I decided to come up with a little hack of my own. I had read Peter Norvig’s amazing essay How to Write a Spelling Corrector and seen how he used his intuition to create a classifier when he lacked the necessary training dataset. I decided to follow my intuition as well, and since my code was in Python (a language well suited for NLP tasks), I started off promptly.

The first thing I do is create a list of keywords one would normally find in a birthday wish, things like “happy”, “birthday” and “returns”. My main intuition was that when wishing someone, people will use at least 2 keywords even in the simplest wish, e.g. “Happy Birthday”, so any message containing just the word “Happy” will be safely ignored. Thus I simply have to check the message to see if at least 2 such keywords exist in it.

What I do first is remove all punctuation (and any non-printable characters) from the message and convert all characters to lower case, to avoid mismatches due to case sensitivity. Then I split the message into a list of words, using the default whitespace delimiter. This is done by:

import string

s = ''.join(c for c in message if c not in string.punctuation and c in string.printable)
t = s.lower().split()

However, I later realized that there are even lazier people than me who simply write wishes like “HBD”. This completely throws off my Atleast-2-Words theory, so I added a simple hack to check for these abbreviations and put the expanded form into the message. I created a dictionary to hold these expansions and check whether any abbreviation is present. If it is, I add its expanded form to a new list, which also contains all the other, non-abbreviated words of the message verbatim. Since I never check the positions of keywords, where I add the expanded forms is irrelevant.

Then the next part is simple, bordering on trivial. I iterate through the list of words in my message and check if it is one of the keywords and simply maintain a counter telling me how many of the keywords are present. Python made this much, much easier than C++ or Java.
But alas, another problem: some people have another bad habit of adding extra characters, e.g. “birthdayyyy” instead of “birthday”, and this again was throwing my classifier off. Yet another quick fix: I go through all the keywords and check whether the word I am examining contains the keyword as a substring. This is easy with Python strings using the count method.

Finally, I simply apply my Atleast-2-Words theory. I check if my counter has a value of 2 or more and return True if yes, else False, thus completing a two-class classifier in a mere 40 lines of code. In a true sense, this is a hack and I didn’t expect it to perform very well, but when put to work, it did a splendid job and flummoxed a lot of my friends who tried posting messages they thought could fool the classifier. Safe to say, I had the last laugh.
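Putting these steps together, here is a minimal sketch reconstructed purely from the description above; it is not the original code. The names `normalize`, `expand`, `keyword_count` and `is_birthday_wish` are hypothetical, the keyword list holds only the three words the post mentions, and the “bday” expansion is an illustrative addition. Counting each word at most once is also a choice made for this sketch.

```python
import string

# Keywords named in the post; the original list may have been longer.
KEYWORDS = ["happy", "birthday", "returns"]

# Abbreviation expansions. "hbd" comes from the post; "bday" is hypothetical.
ABBREVIATIONS = {
    "hbd": ["happy", "birthday"],
    "bday": ["birthday"],
}


def normalize(message):
    """Drop punctuation and non-printable characters, lower-case, split on whitespace."""
    s = "".join(c for c in message if c not in string.punctuation and c in string.printable)
    return s.lower().split()


def expand(words):
    """Replace known abbreviations with their expansions; keep other words verbatim."""
    expanded = []
    for word in words:
        expanded.extend(ABBREVIATIONS.get(word, [word]))
    return expanded


def keyword_count(words):
    """Count words containing a keyword as a substring, so 'birthdayyyy' still matches."""
    count = 0
    for word in words:
        if any(word.count(keyword) > 0 for keyword in KEYWORDS):
            count += 1
    return count


def is_birthday_wish(message):
    """Two-class rule: a birthday wish has at least 2 keyword hits."""
    return keyword_count(expand(normalize(message))) >= 2
```

For example, `is_birthday_wish("HBD bro")` is True because “hbd” expands to two keywords, while `is_birthday_wish("Happy to help!")` is False because only one keyword is found.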

Hope you enjoyed reading this and now have enough intuition to create simple classifiers on your own. If you find any bugs or can provide me with improvements, please mention them in the comments.

Eviva!

Watermarking – Truly Transparent Text

Well, I just finished writing up a new software project. It wasn’t anything really difficult, just a tool to help people watermark multiple images at once, made at the behest of some photographer friends of mine because there was no such tool on the net. While the tool was pretty straightforward (and a great exercise in software engineering), what was really interesting was creating the watermark, which required making the text transparent to a certain degree. Of course, I had to search for the right way to do it, but again nothing straightforward cropped up (this is becoming really common now), and while I did find some useful code snippets, they did not do exactly what I wanted. Thankfully, on reading the code, I was able to gather enough information about how a basic watermarking algorithm works, as well as how to manipulate the alpha value of images to achieve the transparency.

First, some image basics. Every digital image is made up of pixels (short for picture elements), and in an RGBA image each pixel has 4 values: 3 that specify how much Red, Green and Blue is present in that pixel, and a 4th, the alpha value, which determines the pixel’s opacity/transparency. The alpha value is key here, and once I understood how it is manipulated, creating the image processing module was a cinch. For this example, I used Python’s PIL library.

What I first did was declare the colour and transparency of the text to be used as the watermark. This was as simple as specifying the tuple (0, 0, 0, trans), where trans is the text’s alpha value (0 is fully transparent, 255 fully opaque). Next, I create a completely transparent white image the same size as my input image. By specifying the RGB values as 255 each, the image is plain white, but by specifying alpha as 0, the image is truly transparent. Now comes the fun part: PIL has an ImageDraw module which lets you draw text or other shapes onto an image through a Draw object created for that image. So I just use this Draw object on my transparent image, calling its .text method to draw the specified text at a particular position. This gives me a transparent image (or a canvas, if you will) with just some text on it and nothing else visible. Remember, the image is transparent, so you should not see any white or any other colour, and the text is only as opaque as the trans value allows. There is a slight problem, though: if you paste this image onto the photo without a mask, PIL copies its pixels over as they are instead of blending them, white background and all. This is easily solved by using something called masking, as described in the next paragraph, so we can still treat our image as truly transparent.

Finally, I use the .paste method of my original image to paste my transparent image onto it. In the paste method, the most important thing is the 3rd argument, which is the mask. The mask specifies which parts of the pasted image should actually be pasted, and how strongly. When an RGBA image is passed as the mask, .paste uses its alpha channel, and since everything but the text has an alpha value of 0, only the text is pasted onto my input image. The result is simply my input image with some text on it, without a whitish haze that ruins your hard-earned photo. Since both images are the same size, the watermark stays exactly where you drew it.
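To make the mask behaviour concrete, here is a minimal pure-Python sketch of the per-pixel mix that Pillow’s documentation describes for paste with a mask: 255 copies the pasted pixel, 0 keeps the original, and values in between blend the two. Pillow’s exact integer rounding may differ slightly, and `blend_pixel` is a hypothetical helper for illustration, not part of PIL.

```python
def blend_pixel(src, dst, mask):
    """Mix one pasted pixel (src) into one background pixel (dst).

    mask is an 8-bit value: 255 copies src, 0 keeps dst, values in between blend.
    """
    if not 0 <= mask <= 255:
        raise ValueError("mask must be in 0..255")
    return tuple((s * mask + d * (255 - mask)) // 255 for s, d in zip(src, dst))


# Black watermark text with alpha = trans, over a white photo pixel.
trans = 128
print(blend_pixel((0, 0, 0), (255, 255, 255), trans))  # (127, 127, 127): mid grey
# The canvas background has alpha 0, so the photo pixel is left untouched.
print(blend_pixel((255, 255, 255), (30, 60, 90), 0))   # (30, 60, 90)
```

This is why the white, fully transparent background never shows up on the photo: its alpha of 0 makes it a no-op in the blend.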

Here’s the code:


from PIL import Image, ImageFont, ImageDraw

def watermark(img_file, text, wfont, text_pos, trans):

    """
    Watermarks the specified image with the text in the specified font and on the specified point.
    """

    # Open the image file
    img = Image.open(img_file)

    # The Text to be written will be black with trans as the alpha value
    t_color = (0, 0, 0, trans)

    # Specify alpha as 0 to get transparent image on which to write
    watermark = Image.new("RGBA", img.size, (255, 255, 255, 0))

    # Get a Draw object on my transparent image from the ImageDraw module
    waterdraw = ImageDraw.Draw(watermark, "RGBA")

    # Draw the text onto the transparent image
    waterdraw.text(text_pos, text, fill=t_color, font=wfont)

    # Paste the watermark image onto the input image img, using watermark image as the mask
    img.paste(watermark, None, watermark)

    return img

So you can see that the code is fairly straightforward. Now remember, this is just a demo to give you a basic idea of how to achieve watermarking. There will be similar libraries that allow you to do the same thing in almost every programming language, so all you have to do is apply the concepts. And then you too can get an amazing watermark like this:

Watermarked Image

An image that I watermarked using my own program.

Using the above ideas and techniques, I was able to code up my watermarking tool in about 14 days, with a fun GUI and efficient processing. I have open-sourced it on GitHub, and I hope you’ll consider contributing to it.

Eviva!

PintOS on Ubuntu

Note: This process is currently broken and seems to throw up unexpected errors. I am trying to look for a solution but the internet is just not helping me at the time of writing this. Until I figure out a way to fix this, I recommend trying out NachOS or xv6 for your OS cravings.

PintOS is one brilliant skeletal Operating System and, given the right time and effort, is a great way to consolidate your knowledge on the design of modern operating systems. However, its installation can be quite a pain especially since the instructions on Stanford’s official site can be a tad bit confusing at times. Here I will walk you through the installation instructions. If you would rather just install than spend time reading my post, feel free to download an install script I wrote to automate the installation process from here: pintosInstall.

If you are using the script, please remember to change the file extension as WordPress does not accept .sh files. After that just run “bash pintosInstall.sh”, without the quotes, from a terminal for a completely automatic process. Also, while I can guarantee you the script runs well on Ubuntu in a folder you have root access to, the script is simple and generic enough for you to hack and customize to your distribution if required.

  1. Install some pre-requisites: GCC, Perl, QEMU, Make, GDB. Just run:
    sudo apt-get install gcc binutils perl make qemu gdb
  2. Create an installation folder. The script makes a folder ‘co302’ (the course number for OS in my college).
  3. Download the PintOS tar from here and extract it in the installation folder. I used the totally awesome wget tool in Linux (just like Mark Zuckerberg in the Social Network 😛 ).
  4. We need a folder that the PATH variable can point to, as it will hold some executables we need to run when coding the OS. I made a folder ‘bin’ in the base installation directory, i.e. co302/pintos/bin.
  5. Move all the perl scripts from the src/utils folder of PintOS to the bin/ folder. The important ones are ‘backtrace’, ‘pintos’, ‘pintos-gdb’ and ‘pintos-mkdisk’.
  6. Edit your .bashrc file to add the path for the above bin/ folder. At the end of the .bashrc file, simply add the line
    export PATH=$PATH:$HOME/co302/pintos/bin/

    In the script, I used the awk tool for this, as it provides a convenient, independent way to edit files programmatically.

  7. Now we have to make a change to one of the PintOS files. Open up the ‘pintos-gdb’ file (in Emacs I hope)  and edit the GDBMACROS variable to point to the ‘gdb-macros’ file in misc directory of the src directory. At this point, you have officially installed PintOS, so give yourself a pat on the back.
  8. Time to compile the utilities. Head over to the pintos/src/utils directory and run
    $ make

    If you get an “Undefined reference to ‘floor’” error, simply open the Makefile, replace LDFLAGS with LDLIBS and run make again.

  9. Copy the ‘squish-pty’ file to the bin directory that your PATH points to.
  10. Head over to the pintos/src/threads/ directory and edit the ‘Make.vars’ file. Change the SIMULATOR variable from bochs to qemu; it is usually on the last line of the file.
  11. Run make on the threads folder.
  12. Now we need to make 3 edits to the ‘pintos’ util file in the bin directory. Many other sites will give you the line numbers, but I won’t, as they vary too much between versions, and this way you get a chance to experiment with the text processing features of your favorite editor:
    1. Change $sim = bochs to $sim = qemu to enforce qemu as the simulator.
    2. Comment out the line push (@cmd, ‘-no-kqemu’); by prepending it with #.
    3. Put in the absolute path wherever required as Perl doesn’t seem to be able to interpret the ~ shorthand. Do this especially for the kernel.bin location path.
  13. Finally, edit the ‘Pintos.pm’ file in the bin directory and put in the absolute path for the line having the location of the loader.bin file.
  14. Congrats, you now have PintOS set up on your machine. Try running pintos run alarm-multiple as a test.
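The install script does these file edits with awk and sed; as a hedged alternative, here is a minimal Python sketch of steps 6, 10, 12.1 and 12.2. The exact text of the lines it matches (`SIMULATOR = --bochs` in Make.vars, a `$sim = "bochs"` assignment and `push (@cmd, '-no-kqemu');` in the pintos script) is an assumption about the PintOS sources, so check your copy first. `patch_file`, `ensure_line` and the edit lists are hypothetical names; the absolute-path edits in steps 12.3 and 13 are left to you.

```python
import re
from pathlib import Path


def patch_file(path, substitutions):
    """Apply (regex, replacement) pairs to each line of a file.

    Returns how many lines changed; running it twice changes nothing more.
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        new = line
        for pattern, repl in substitutions:
            new = re.sub(pattern, repl, new)
        if new != line:
            lines[i] = new
            changed += 1
    p.write_text("".join(lines))
    return changed


def ensure_line(path, line):
    """Append `line` to a file (e.g. ~/.bashrc) unless it is already there."""
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    p.write_text(text + line + "\n")
    return True


# Step 6: the PATH line from the post.
PATH_LINE = "export PATH=$PATH:$HOME/co302/pintos/bin/"

# Step 10: threads/Make.vars (assumes the line reads "SIMULATOR = --bochs").
MAKE_VARS_EDITS = [(r"^(SIMULATOR\s*=\s*--)bochs\b", r"\1qemu")]

# Steps 12.1 and 12.2: the 'pintos' Perl script (exact line text assumed).
PINTOS_EDITS = [
    (r'(\$sim\s*=\s*["\']?)bochs', r"\1qemu"),                 # default simulator -> qemu
    (r"^(\s*)(push\s*\(@cmd,\s*'-no-kqemu'\);)", r"\1# \2"),  # comment out -no-kqemu
]
```

Check the return value of `patch_file`: 0 means none of the assumed lines matched, so the file needs a manual look.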

There you have it: a pretty easy and straightforward way to install a great experimental skeletal OS from Stanford University. I bet it took me more work to write this up than it will take you to install PintOS. The good part about this little adventure of mine is that I got a chance to dabble in sed and awk, two Unix tools whose basics no hacker can afford to skip. Add to that some wget magic and Emacs power, and you can potentially become a hacking superstar. In fact, this hack has proven really useful, as the Computer Engineering department of my college has used it to install PintOS on all its machines so that students can do meaningful OS practicals. My small way of giving back, you could say. 🙂

As a final note, at the time of writing, this post is as comprehensive a set of instructions as you can get. This may not hold true forever, as someone might make script-breaking changes tomorrow. So if you find a change that I need to include, please comment and let me know.

Eviva!