Category Archives: Tutorials

Access Jupyter Notebooks over SSH tunnel

I’m currently running some analyses on a virtual machine in the cloud, and it turns out that there is a really neat way to access Jupyter Notebooks remotely without installing JupyterHub. So if you (like me) just run the notebooks for yourself and don’t need multi-user support and the like, you can simply SSH into your remote machine, forwarding a local port (replacing username and host, obviously):

ssh -L 9999:localhost:9999 username@host

Enter your password for the remote machine when prompted and, once logged in, start a notebook server on port 9999, which the tunnel forwards to port 9999 on your local machine:

jupyter notebook --port 9999 --no-browser

Starting the notebook server this way will make it show a URL that you simply paste into your local browser, et voilà – a Jupyter Notebook with code executed in the cloud.
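It’s easy to mistype one of the port numbers, so a quick sanity check can save some head-scratching. This is a hypothetical helper (not part of the original setup) that simply tests whether anything is listening on the forwarded local port:

```python
import socket


def tunnel_is_open(port, host="localhost", timeout=1.0):
    """Return True if something is listening on host:port."""
    try:
        # If the SSH tunnel (or any server) is up, the connection succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage: with the tunnel from above running, tunnel_is_open(9999) is True.
```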

Fixing Pillow segmentation fault

I am currently working a lot with large GeoTIFFs in Python and use Pillow to read them in, then convert them to NumPy arrays for processing. Every now and then, Pillow throws the following error, which I’ve seen on several computers running OS X now:

TIFFReadDirectory: Warning, Unknown field with tag 42113 (0xa481) encountered.
Segmentation fault: 11

Since it always takes me a while to figure out how to fix this, here’s a short note to self, maybe also useful to someone else out there:

  1. Uninstall Pillow: $ pip uninstall pillow
  2. Install the dependencies for building Pillow from source: $ brew install libtiff libjpeg webp little-cms2
  3. Download the Pillow source from PyPI.
  4. Unpack it, change into the folder with the source code, then build and install via: $ python setup.py install

This has always fixed the problem for me so far. I don’t know whether building from source rather than simply running

pip install pillow

will also fix this problem on other operating systems, but it’s worth a shot if you hit that error.
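For context, the read-in step mentioned above boils down to something like this sketch (assuming Pillow and NumPy are installed; the file name is just an example):

```python
import numpy as np
from PIL import Image


def tiff_to_array(path):
    """Open a (Geo)TIFF with Pillow and convert it to a NumPy array."""
    with Image.open(path) as img:
        return np.asarray(img)


# Example: arr = tiff_to_array("scene.tif")
# arr.shape is (height, width) for a single-band image.
```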


Finding the highest value per group with PostgreSQL’s DISTINCT ON

Getting the highest (or lowest) value from a database column is a bit tricky if you cannot use GROUP BY, because it requires you to aggregate across all columns that you want in the result. Say you have a table of employees with the columns name, salary, and department, and you want to know the highest-paid employee per department. Then GROUP BY is not an option, because you would also need to aggregate by name to have the name in the output, which doesn’t make sense.

I used to tackle this kind of query with unwieldy subqueries, but I just came across PostgreSQL’s DISTINCT ON clause, which makes them a lot more compact and readable. Using that, you can simply do

SELECT DISTINCT ON (department)
       name, salary, department
FROM   salaries_table
ORDER BY department, salary DESC;

So we’ll only get one entry per department, and ORDER BY department, salary DESC makes sure it is the one with the highest salary. The only bummer is that DISTINCT ON is a PostgreSQL-specific feature, so it won’t work on other DBMSs.
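If you are stuck on another DBMS, a window function gives the same result. Here is a sketch using SQLite (3.25 or newer) from Python, with made-up sample data matching the example schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salaries_table (name TEXT, salary INTEGER, department TEXT);
INSERT INTO salaries_table VALUES
  ('Alice', 90000, 'Engineering'),
  ('Bob',   70000, 'Engineering'),
  ('Carol', 80000, 'Sales');
""")

# Rank employees within each department by salary and keep only the top
# one -- the portable equivalent of PostgreSQL's DISTINCT ON.
rows = conn.execute("""
SELECT name, salary, department FROM (
  SELECT name, salary, department,
         ROW_NUMBER() OVER (PARTITION BY department
                            ORDER BY salary DESC) AS rn
  FROM salaries_table
)
WHERE rn = 1;
""").fetchall()

print(rows)  # one row per department: the highest-paid employee
```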

Manipulating and mapping US Census data in R →


The US Census provides an incredible wealth of data but it’s not always easy to work with it. In the past, working with the tabular and spatial census data generally meant downloading a table from FactFinder and a shapefile from the boundary files site and joining the two, perhaps in a GIS system. These files could also be handled in R but getting the data, reading it into R and, in particular, merging tabular and spatial data can be a chore. Working with slots and all the different classes of data and functions can be challenging.

A recent interesting post on Stack Overflow by Claire Salloum prompted me to revisit this issue in R and I’ve definitely found some valuable new packages for capturing and manipulating Census data in R.

Great post explaining how to wrangle and map census data in R.

Strabon on Mac OS X

Strabon is a triple store that has been developed with a focus on spatio-temporal query functions. I’d been meaning to play around with Strabon for a while, and while at ISWC earlier this week, I decided to finally give it a shot. There are no instructions for setting up Strabon on Mac OS in the user guide, so here’s what I did to get it running:

If you don’t have Homebrew installed yet, open a terminal window and paste the following command to install it:

ruby -e "$(curl -fsSL"

Then use Homebrew to install Maven, Mercurial, Postgres, and PostGIS:

brew install maven mercurial postgres postgis

Don’t worry if you already have any of those installed; Homebrew is smart enough to figure that out and will just skip them.

Next, we’ll initialize a new PostgreSQL database cluster and start the PostgreSQL server:

initdb strabon
postgres -D strabon

The terminal window will now be “occupied” by the PostgreSQL server (press Control+C to shut it down when you are done), so open a new terminal window (⌘N) to continue there. Next, we’ll create the actual database and enable the PostGIS extension on it:

createdb -E UTF8 -T template0 template_postgis
psql -d template_postgis -c "CREATE EXTENSION postgis;"

Next, we’ll adjust the permissions on that database so that Strabon can make changes later, do some housekeeping, and create another database for the endpoint:

psql -d template_postgis -c "GRANT ALL ON geometry_columns TO PUBLIC;"
psql -d template_postgis -c "GRANT ALL ON geography_columns TO PUBLIC;"
psql -d template_postgis -c "GRANT ALL ON spatial_ref_sys TO PUBLIC;"
psql -d template_postgis -c "VACUUM FULL;"
psql -d template_postgis -c "VACUUM FREEZE;"
psql -d postgres -c "UPDATE pg_database SET datistemplate='true' WHERE datname='template_postgis';"
psql -d postgres -c "UPDATE pg_database SET datallowconn='false' WHERE datname='template_postgis';"
createdb endpoint -T template_postgis

That’s all for a Postgres database with the spatial extension. Next, we’ll install the temporal extension (you can skip this part if you are only interested in the spatial functions). We’ll download, build, and install it like so:

git clone
cd PostgreSQL-Temporal
make install
make installcheck
psql -d endpoint -c "CREATE EXTENSION temporal;"

Next, we’ll install Tomcat, a servlet container that we’ll use to host Strabon, and modify the Tomcat users. Note that if you already have Tomcat and other servlets are running in it, you should just add a user to the Tomcat configuration and skip this step:

cd ~
curl >
cd apache-tomcat-8.0.28/conf
mv tomcat-users.xml tomcat-users-backup.xml
curl > tomcat-users.xml

And start Tomcat:

cd ../bin
sh startup.sh

Now that we’re done setting up the PostGIS backend and Tomcat, let’s move on to downloading and compiling Strabon:

cd ~
hg clone
cd Strabon
hg update temporals
mvn clean package

Don’t worry if you get an error message here for Strabon: Executable endpoint – this part tries to automatically deploy the compiled code in Tomcat, which we’ll do manually instead:

cp endpoint/target/strabon-endpoint-*.war ~/apache-tomcat-8.0.28/webapps/strabonendpoint.war

That’s all. At this point, you should be able to visit http://localhost:8080/strabonendpoint/ in your browser. Before you can load data into Strabon and write stSPARQL queries, there is one last bit of configuration to do. On the web interface, go to Explore/Modify Operations > Configuration and enter the following information (assuming you used the information exactly as provided above):

  • Database Name: endpoint
  • Username: test
  • Password: test
  • Port: 5432
  • Hostname: localhost
  • Database Backend: postgis

Since we didn’t set up the database with a password, you can put in any username and password you want, but the fields can’t be empty (this seems to be a JDBC issue). If Strabon can connect to PostGIS with the configuration parameters provided, it will take you to the query page after you click connect. If there is something wrong with your parameters, nothing happens – you will just stay on the configuration page, but there is no error message (it took me a while to figure this out…). If that happens, take a look at the terminal window running PostGIS and at the Tomcat logs in ~/apache-tomcat-8.0.28/logs/, which should give you an idea of what’s going wrong.

SublimeText and LaTeXTools on El Capitan

I find myself using Atom for most of my text editing these days, but for LaTeX editing I still rely on SublimeText. This is mostly because of the excellent LaTeXTools plugin that makes LaTeX editing really convenient in Sublime. Unfortunately, updating OS X to El Capitan broke this setup. It took me a while to figure out how to fix this, so here’s the short story:

  1. Download and install MacTex 2015.
  2. In Sublime, go to Tools > Command Palette, look for LateXTools: Reconfigure and migrate settings, and hit enter to execute the command.

That’s it. This will update LaTeXTools with the new path of the LaTeX executables, which had to be moved around due to new restrictions on the /usr directory introduced in El Capitan. Just be aware that this will overwrite any customizations that you may have added to your LaTeXTools setup.

Developing and testing SPARQL queries with cURL

There are tons of online SPARQL editors out there, but they often lack some specific functionality. The most common gap is that they cannot query an endpoint you may have running on your own machine during development. The SPARQL forms that come with most triple stores, however, are very bare-bones, to say the least. Plus, I don’t really like going back and forth in a web browser when I’m working on a piece of code.

What I do instead is write the query in a text editor with syntax highlighting and then shoot it over to the endpoint via cURL on the command line:

curl -i -H "Accept: text/csv" --data-urlencode query@query.sparql <endpoint-url>

This will take the file query.sparql, send its contents to the endpoint (with the query parameter name being query), and show the results as comma-separated values. Obviously, this is no magic; I just keep forgetting the exact parameters, so I thought I might as well document this here.
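The same request can also be assembled programmatically. Here is a sketch using Python’s standard library; the endpoint URL is a placeholder, since it depends on which triple store you are running:

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8080/sparql"  # placeholder -- use your endpoint


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build the same POST request curl sends with --data-urlencode query@file."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint, data=data, headers={"Accept": "text/csv"}
    )


req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# urllib.request.urlopen(req) would send it; skipped here since it
# requires a running endpoint.
```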

If you are using Sublime Text as your text editor, there is also the Sublime SPARQL Runner package, which does exactly the same thing and opens the results in a new file right in Sublime. I’ve only tested the package briefly, but it seems to do what it says on the tin.

Building your own watch-it-later video podcast with Dropbox

Update January 2017: This approach will stop working soon, since Dropbox will be turning off public folders in March (and acts like that’s a good thing, because they have “built even better ways for you to share securely and work together with your team” … yeah, right).

Playing a video from the podcast we’ll create.

My feed reader and social media streams swamp me with videos every day, and some of them actually seem to be interesting. The thing is: During the day, I don’t have time to watch them. And the one time I can actually watch them – on my way home on the subway – I have no (or very slow) Internet access. Which basically means no YouTube, no Vimeo – simply no video streaming in general. So I thought it would be great if I could just save these videos to watch them later, and have them transferred automagically to my phone for offline viewing. Turns out this is actually not so hard if we use Dropbox and a fairly simple PHP script. Yes, I know this could also be done in [insert your favorite programming language here]. PHP is preinstalled on Mac OS, though, and I know PHP well enough to quickly implement this, so that’s what I did.

I’ll describe the process on Mac OS, but it should be easily transferable to any other operating system. As the title suggests, you’ll need a Dropbox account to make this work. What we’ll do is use your Dropbox’s Public folder to publish your own podcast, generated from your saved videos, and then subscribe to this podcast on the phone.

1. Put the PHP script in place.

Download this GitHub repository, unzip it, and copy the folder Watchlater-master to your Dropbox’s Public folder. If you’d rather have it in a different subfolder, you can change the path in the configuration at the top of the feed.php file (just make sure it’s somewhere inside the Public folder):

$path = "Watchlater-master"; // change this if you place the script in a different subfolder.

2. Configure the script

The one thing you do need to change in the script is your Dropbox ID:

$dropboxID = "123";

The easiest way to find out your Dropbox ID is to go to your Public folder, right-click on any file (such as the feed.php file we have just pasted there), and select Copy Public Link. Paste that into a text editor; the number after /u/ (123 in this example) is your Dropbox ID.

The other two things you can (but don’t have to) tweak are your local time zone and the number of videos you want to keep in the folder:

$keepVids = 10;

The script does some housekeeping for you: if you have more than, say, 10 (which is the default) videos in your folder, the script will automatically remove the oldest videos until only the 10 newest are left. This keeps your Dropbox from overflowing. Note that the script has no way of knowing which videos you have already watched, so it may also remove videos you have not watched yet.
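The housekeeping step is simple to reason about. Here is a rough Python sketch of the idea (the actual feed.php may differ in details, such as which file extensions it considers):

```python
import os


def prune_videos(folder, keep=10, exts=(".mp4", ".mkv", ".m4v")):
    """Delete the oldest videos until only the `keep` newest remain."""
    videos = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(exts)
    ]
    # Sort newest first by modification time; everything past `keep`
    # gets removed.
    videos.sort(key=os.path.getmtime, reverse=True)
    for old in videos[keep:]:
        os.remove(old)
    return videos[:keep]
```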

3. Add some videos

Before we run the script, we’ll need some videos to feed it. Download some videos into the Watchlater-master folder (or wherever you put the feed.php script). Since most services like YouTube don’t really offer an easy way to download their stuff, I’m using youtube-dl for that (which also works great for a wide range of other streaming sites). On a Mac, you can install it via Homebrew (if you don’t have Homebrew, go and install it right now – it’s awesome!). While we’re at it, we will also install ffmpeg and use that later to automatically convert videos to web-friendly formats. In a terminal window, type

$ brew install youtube-dl ffmpeg

Once Homebrew is done installing youtube-dl, change into our feed directory and download some videos:

$ cd ~/Dropbox/Public/Watchlater-master
$ youtube-dl <video-url>

4. Run the script

To run the script, type

$ php feed.php

into the Terminal. If everything goes right, you should see some messages, and a new file called later.xml in the folder.

5. Subscribe to your podcast

This later.xml is the actual podcast that we want to subscribe to, and because it is in your Dropbox’s Public folder, it actually has a stable URL and is accessible online even when your computer is not running (if your Dropbox is completely synced, that is; if not, wait until it’s done and the spinning arrow on the Dropbox icon in your menu bar has disappeared). Right-click on the later.xml, and select Copy Public Link again. Since we need this URL on the phone to subscribe to the podcast, I simply emailed it to myself (there should really be an iCloud-based clipboard, I know).

On your phone, open your podcast client and add that URL as a new podcast. I use Pocket Casts 4, but it should work the same way with any other app.
Voilà – we now get videos saved to watch later delivered to our phone! Make sure you pick a podcast client that automatically downloads new episodes.

6. Automatically update your podcast

The one piece that is missing is some magic that automatically updates the later.xml on a regular basis to check whether new videos have been added to the folder. We’ll use a Cron job for that. In a terminal, type:

env EDITOR=nano crontab -e

This will open the nano editor inside your terminal window, showing an empty file (unless you have already set up any other Cron jobs on your system). In nano, paste this line:

59 * * * * cd ~/Dropbox/Public/Watchlater-master/; for f in *.mkv;do ffmpeg -i "$f" -c:v copy -c:a aac -b:a 256k -strict -2 "${f%mkv}mp4"; rm "$f"; done; php feed.php

on the first line. This sets up a Cron job to automatically convert any .mkv files in the folder to .mp4 format, remove the .mkv files, and then run the feed.php script once every hour to update the podcast feed. Press CTRL-X to close nano, confirm to save the file by pressing ‘Y’, then enter.

That’s it. Now every time you place a new video in that folder, your podcast will be updated hourly, and the videos will end up in your podcast app. I’ve been using this script for a few months now and find it really useful – it’s a bit tedious to set up, but once you have done that, gone are the times of awkwardly staring at ads on the subway because there’s nothing else to do.

7. (Optional) Make an alias for youtube-dl

If you don’t want to change into the Watchlater directory every time before downloading a video, you can also make an alias for youtube-dl that will always download videos to that folder. While we’re at it, we’ll also make the video titles a bit nicer and tell youtube-dl to fetch MP4 videos straight away (so the conversion from MKV added in the previous step is not really required any more). To do that, we’ll just add an alias to our bash profile. First, change into your home directory:

cd ~

Then open your bash profile in nano:

nano .bash_profile

And add the following line to the end of the file:

alias ytdl='youtube-dl -f mp4 -o "~/Dropbox/Public/Watchlater-master/%(title)s.%(ext)s"'

Then hit Control+O to save your change (hit enter to confirm) and Control+X to exit nano. That’s all. Now you can use ytdl from any terminal session, and the video will always go into your Watchlater folder in MP4 format.

Update October 22, 2015: Added instructions for automatic conversion of .mkv videos to .mp4 format.

Update November 30, 2016: Added instructions for alias.