Thursday, December 3, 2015

Archiving your Pocket list with Ruby

I've been looking for a more powerful and extensible alternative to Bash, so I recently began experimenting with Ruby. For my first "real" test of the language, I decided to solve a problem that had been nagging me for some time: since the web is constantly changing, how could I go through my entire reading list and make sure I had backup copies of the articles I've saved? As it turns out, the solution was fairly simple, at only 35 lines of Ruby!

The script uses the Curb and Nokogiri libraries to follow URL shorteners and parse HTML, so that the third main component, wkhtmltopdf (a personal favorite of mine), gets accurate data for each link. To get your Pocket data into the script, use Pocket's nifty HTML export tool, which produces a webpage full of links to all of your saved articles.
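To make the flow a bit more concrete, here is a rough sketch of the "read the export, then hand each link to wkhtmltopdf" idea. It is not the code from pocket_export.rb: the helper names (extract_links, pdf_name, wkhtmltopdf_command) are made up for illustration, a regular expression stands in for Nokogiri so the example has no gem dependencies, and the redirect-following that Curb handles is left out entirely. It also assumes the export is a plain list of `<a href="...">` links.

```ruby
# Hypothetical helpers; names are illustrative, not taken from pocket_export.rb.

# Matches <a ... href="URL" ...>TITLE</a>. A regular expression stands in for
# Nokogiri here purely to keep the sketch dependency-free.
LINK_PATTERN = /<a\s[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/mi

# Pull [url, title] pairs out of the exported HTML.
def extract_links(html)
  html.scan(LINK_PATTERN).map { |url, title| [url, title.strip] }
end

# Turn an article title into a filesystem-safe PDF name.
def pdf_name(title)
  slug = title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
  slug = 'untitled' if slug.empty?
  "#{slug}.pdf"
end

# Build the argument list for wkhtmltopdf: the input URL, then the output file.
# Passing this array to Kernel#system avoids shell quoting problems.
def wkhtmltopdf_command(url, out_path)
  ['wkhtmltopdf', '--quiet', url, out_path]
end
```

Something like `system(*wkhtmltopdf_command(url, File.join('pocket_export_data', pdf_name(title))))` would then render a single article, assuming wkhtmltopdf is on your PATH.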

Using the script is simple: once the dependencies are installed (see the top of the script for more information on that), run ruby pocket_export.rb ~/Downloads/ril_export.html and you're off! The script creates the directory pocket_export_data to store the PDFs it generates and the file pocket_export_errors.log to keep track of any links it has trouble with.
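The output side (one directory for PDFs, one log for failures) might look something like the sketch below. Again, this is an illustration under assumptions rather than the script's actual code: archive_all and its convert parameter are hypothetical, the numbered file names are an arbitrary choice, and the tab-separated log format is just one reasonable option.

```ruby
require 'fileutils'

# Hypothetical driver. `convert` is any callable that writes a PDF for a URL
# and raises on failure, for example:
#   ->(url, target) { system('wkhtmltopdf', url, target) or raise 'wkhtmltopdf failed' }
def archive_all(links, out_dir: 'pocket_export_data',
                log_path: 'pocket_export_errors.log', convert:)
  FileUtils.mkdir_p(out_dir) # create the PDF directory if it is missing
  failures = 0

  File.open(log_path, 'a') do |log|
    links.each_with_index do |url, index|
      target = File.join(out_dir, format('%04d.pdf', index + 1))
      begin
        convert.call(url, target)
      rescue StandardError => e
        # Record the troublesome link and carry on with the rest of the list.
        failures += 1
        log.puts("#{url}\t#{e.message}")
      end
    end
  end

  failures # number of links that could not be archived
end
```

Rescuing per link means one dead URL doesn't abort the whole run, which matters when the list has hundreds of entries.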