Supertunnels with SSH – multi-hop proxies

I never know what to call this process, so I’m inventing the term <em>supertunnels</em> via SSH for now. A lot of my work these days involves using clusters built on the <a href="http://aws.amazon.com/ec2/">Amazon EC2</a> cloud environment. There, I have some servers that are externally accessible, e.g. web servers. Then there are support servers that are only accessible “internally” to those web servers and not from the outward-facing public side of the network, e.g. <a href="http://hadoop.apache.org/">Hadoop clusters</a>, databases, etc.

To log into the “internal” machines, I have pretty much one choice – using SSH <em>through the public machine first</em>. No problem here; any server admin knows how to use SSH – I’ve been using it forever. However, I hadn’t really used some of the more advanced features that are very helpful. Here are two…

<h3>Remote command chaining</h3>

Most of my SSH usage is for running long sessions on a remote machine. But you can also pass a command as an argument and the results come directly back to your current terminal:

<code>$ ssh user@host "ls /usr/lib"
</code>

Take this example one step further and you can actually <strong>inject another SSH command</strong> that gets into the “internal” side of the network.

This is really starting to sound like tunneling, though it’s somewhat manual and doesn’t redirect traffic from your client side – we’ll get to that later.

As an aside, in EC2-land you often use certificate files during SSH login, so you don’t need to have an interactive password exchange. You specify the certificate with another argument. If that’s how you run your servers (or with authorized_keys files) then you can push in multiple levels of additional SSH commands easily.

For example, here I log into <strong>ext-host1</strong>, then from there log into <strong>int-host2</strong> and run a command:

<code>$ ssh -i ~/mycert.pem user@ext-host1 "ssh -i ~/mycert.pem user@int-host2 'ls /usr/lib'"
</code>

That is a bit of a long line just to get a file listing, but it’s easy to understand and gets the job done quickly. It also works great in shell scripts – in fact, you could wrap it up with a simple script to make it shorter.
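For example, a minimal wrapper script (just a sketch – the script name, certificate path and user are assumptions carried over from the examples above):

<code>#!/bin/sh
# Run a command on an internal host by hopping through ext-host1.
# Usage: inner-ssh int-host2 "ls /usr/lib"
CERT=~/mycert.pem
GATEWAY=user@ext-host1
HOST=$1; shift
ssh -i "$CERT" "$GATEWAY" "ssh -i $CERT user@$HOST '$*'"
</code>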

<h3>Proxy config</h3>

Another way to make your command shorter and simpler is to add some proxy rules to the ~/.ssh/config file. I didn’t even know this file existed, so I was thrilled to find out how it can be used.

To talk about this, let’s keep using the external and internal hosts as examples, and let’s assume the internal host is 10.0.1.1. These don’t strictly have to be public and private SSH endpoints, but the distinction serves its purpose for this discussion.

If we typically access int-host2 via ext-host1, then we can set up a Proxy rule in the config file:

<code>Host 10.0.*.*
ProxyCommand ssh -i ~/mycert.pem user@ext-host1 -W %h:%p
</code>

This rule watches for <b>any</b> requests on the 10.0.*.* network and automatically pushes them through ext-host1 as specified above. Furthermore, the -W option tells SSH to stream all output back to the same terminal you are using. (A minor point, but if you miss it you may go crazy trying to figure out where your responses went.)

Now I can do a simple login request on the <b>internal</b> host and not even have to think about how to get there.

<code>$ ssh -i ~/mycert.pem user@int-host2
</code>
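You can push the remaining options into the config too (a sketch – it assumes ext-host1 can resolve the int-host2 hostname, and that the same user and certificate apply throughout):

<code>Host 10.0.*.* int-host2
User user
IdentityFile ~/mycert.pem
ProxyCommand ssh -i ~/mycert.pem user@ext-host1 -W %h:%p
</code>

After that, a plain <code>ssh int-host2</code> is all you need.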

I think that’s a really beautiful thing – hope it helps!

Another time I’ll have to write more about port forwarding…

Converting Decimal Degree Coordinates to/from DMS Degrees Minutes Seconds

cs2cs command from GDAL/OGR toolset (gdal.org) – allows robust coordinate transformations.

If you have files or apps that need to filter or convert coordinates, then the cs2cs command is for you.  It comes with most distributions of the GDAL/OGR (gdal.org) toolset.  Here is one popular example: converting between degrees, minutes, seconds (DMS) and decimal degrees (DD).


The following is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by me, Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…


Input coordinates can come from the command line or an external file. Assume a file containing DMS (degrees, minutes, seconds) style coordinates that looks like:

124d10'20"W 52d14'22"N
122d20'05"W 54d12'00"N

Use the cs2cs command, specifying the print format to return with the -f option. In this case, -f "%.6f" explicitly requests decimal degree numbers with 6 decimal places:

cs2cs -f "%.6f" +proj=latlong +datum=WGS84 input.txt

Example Converting DMS to/from DD

This returns the results; notice no 3D/Z value was provided, so none is returned:

-124.172222 52.239444 0.000000
-122.334722 54.200000 0.000000
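Input can also be piped straight in on the command line instead of coming from a file – a quick sketch of the same conversion via stdin:

echo "124d10'20\"W 52d14'22\"N" | cs2cs -f "%.6f" +proj=latlong +datum=WGS84

This should print the same first line of decimal degree output shown above.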

To do the inverse, remove the formatting option and provide a list of values in decimal degree (DD):

cs2cs +proj=latlong +datum=WGS84 inputdms.txt
124d10'19.999"W 52d14'21.998"N 0.000
122d20'4.999"W 54d12'N 0.000


Geospatial Power Tools is 350+ pages long – 100 of those pages cover these kinds of workflow topic examples. Each copy includes a complete (edited!) set of the GDAL/OGR command line documentation as well as the following topics/examples:

Workflow Table of Contents

  1. Report Raster Information – gdalinfo
  2. Web Services – Retrieving Rasters (WMS)
  3. Report Vector Information – ogrinfo
  4. Web Services – Retrieving Vectors (WFS)
  5. Translate Rasters – gdal_translate
  6. Translate Vectors – ogr2ogr
  7. Transform Rasters – gdalwarp
  8. Create Raster Overviews – gdaladdo
  9. Create Tile Map Structure – gdal2tiles
  10. MapServer Raster Tileindex – gdaltindex
  11. MapServer Vector Tileindex – ogrtindex
  12. Virtual Raster Format – gdalbuildvrt
  13. Virtual Vector Format – ogr2vrt
  14. Raster Mosaics – gdal_merge

My new book on Amazon – raster/vector data manipulation using GDAL/OGR

Geospatial Power Tools by Tyler Mitchell now on Amazon

Ten years ago I wrote a book for O’Reilly called Web Mapping Illustrated – using open source GIS tools. It was mostly about how to use MapServer and PostGIS to publish maps on the web and was the first of its kind in the marketplace.

This year I completed my second book, for Locate Press, which focuses on even lower-level geospatial data manipulation using the GDAL/OGR command line tools. It was a work in progress for a couple of years but has just now been released on Amazon as Geospatial Power Tools.

If you’re looking for a resource to understand how to convert imagery or vector data, build elevation shaded maps or contours, and more, then this book is for you. It includes complete GDAL and OGR documentation. A third of the book presents new material geared to help you learn how to do specific kinds of processing tasks – from downloading data from web services to quickly converting imagery into an online map. A PDF version is also available, and a Kindle version will likely follow over the next 6 months.

I’m always interested in feedback on the book and to learn more about how to improve the next edition.

Create Tile Map Structure – gdal2tiles command

Tiles in a Tile Map Server (TMS) context are basically raster map data broken into tiny pre-rendered tiles for maximum web client loading efficiency. GDAL, with Python, can chop up your input raster into the folder/file naming and numbering structure that TMS-compliant clients expect.

Default OpenLayers application produced by the gdal2tiles command and a Natural Earth background dataset as input.

This is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by me, Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…

The bonus with this utility is that it also creates a basic web mapping application that you can start using right away.

The script is designed for georeferenced rasters; however, any raster should work with the right options. The (georeferenced) Natural Earth raster dataset is used in the first examples, with a non-georeferenced raster at the end.

There are many options to tweak the output and setup of the map services; see the complete gdal2tiles chapter for more information.
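For instance, here is a sketch using a few common flags – the zoom range and title are arbitrary choices and the output folder name is made up:

gdal2tiles.py -p mercator -z 0-5 -w openlayers -t "Natural Earth" NE1_50M_SR_W.tif ne_tiles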

Minimal TMS Generation

At the bare minimum an input file is needed:

gdal2tiles.py NE1_50M_SR_W.tif
Generating Base Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.
Generating Overview Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.

The output is created under a folder with the same name as the input file and includes an array of sub-folders and sample web pages:

NE1_50M_SR_W
NE1_50M_SR_W/0
NE1_50M_SR_W/0/0
NE1_50M_SR_W/0/0/0.png
NE1_50M_SR_W/1
...
NE1_50M_SR_W/4/9/7.png
NE1_50M_SR_W/4/9/8.png
NE1_50M_SR_W/4/9/9.png
NE1_50M_SR_W/googlemaps.html
NE1_50M_SR_W/openlayers.html
NE1_50M_SR_W/tilemapresource.xml

Open the openlayers.html file in a web browser to see the results.
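If your browser refuses to load the tiles directly from the local filesystem, serving the folder over HTTP is a quick workaround (a sketch, assuming Python is installed):

cd NE1_50M_SR_W
python -m SimpleHTTPServer 8000

Then browse to http://localhost:8000/openlayers.html instead.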

The default map loads a Google Maps layer and will complain that you do not have an appropriate API key set up in the file; ignore it and switch to the OpenStreetMap layer in the right-hand layer listing.

 

The resulting map should show your nicely coloured world map image from the Natural Earth dataset. The TMS Overlay option will show in the layer listing, so you can toggle it on/off to see that it truly is loading. Figure 5.2 (above) shows the result of our gdal2tiles command.



Create a Union VRT from a folder of Vector files

The following is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by me, Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…

The real power of VRT files comes into play when you want to create virtual representations of features as well.  In this case, you can virtually tile together many individual layers as one.  At present you cannot do this with a single command, but adding two simple lines to the VRT XML file makes it work.

Here we want to create a virtual vector layer from all the files containing lines in the ne/10m_cultural folder.

First, to keep it simple, create a folder and copy in only the files we are interested in:

mkdir ne/all_lines 
cp ne/10m_cultural/*lines* ne/all_lines

Then we can create our VRT file using ogr2vrt as shown in the previous example:

python ogr2vrt.py -relative ne/all_lines all_lines.vrt

If added to QGIS at this point, it merely presents a list of four layers to choose from when loading. This is not what we want.

So next we edit the resulting all_lines.vrt file and add a line that tells GDAL/OGR that the contents are to be presented as a unioned layer with a given name (i.e. “UnionedLines”).

The added line is the second one below, along with its closing tag second from the end:

<OGRVRTDataSource>
 <OGRVRTUnionLayer name="UnionedLines">
  <OGRVRTLayer name="ne_10m_admin_0_boundary_lines_disputed_areas">
   <SrcDataSource relativeToVRT="1" shared="1">
   ...
   <Field name="note" type="String" src="note" width="200"/>
  </OGRVRTLayer>
 </OGRVRTUnionLayer>
</OGRVRTDataSource>

Now loading it into QGIS automatically loads it as a single layer but, behind the scenes, it is a virtual representation of all four source layers.
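You can also confirm the union from the command line before opening QGIS – the -so flag asks ogrinfo for a summary only:

ogrinfo -so all_lines.vrt UnionedLines

It should report a single layer whose feature count and extent span all four source files.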

On the map in Figure 5.8 the UnionedLines layer is drawn on top using red lines, whereas all the source files (that I manually loaded) are shown with light shading. This shows that the new virtual layer covers all the source layer features.

Unioned OGR VRT layers – source layers beneath final resulting merged layer

 



Spatialguru change on Twitter/Google Plus accounts

As a result of moving slightly away from “spatial” as a core focal area in my day-to-day work at Actian.com (I do way more with Hadoop than spatial these days), I started a new Twitter account with a less domain-specific name.

My original Twitter account was spatialguru – I still use it, but less often than before. Now I’m using 1tylermitchell instead.

When I started calling myself spatialguru it was a bit of an inside joke around our home; I didn’t think it would still be around this long.  :) Anyway, follow my new account if you want to see more about what I’m reading, etc.

Similarly, I have tried to migrate my previous Google Plus account – tmitchell.osgeo – to a new one here.  Add me to your circles and I’ll probably add you to mine if you aren’t already.

Now, what to do about this blog name.. hmm.. more to come.

– Tyler


Query Vector Data Using a WHERE Clause – ogrinfo

The following is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…

Use SQL Query Syntax with ogrinfo

Use a SQL-style -where clause option to return only the features that meet the expression. In this case, only return the populated places features that meet the criteria of having NAME = 'Shanghai':

$ ogrinfo 10m_cultural ne_10m_populated_places -where "NAME = 'Shanghai'"

... 
Feature Count: 1
Extent: (-179.589979, -89.982894) - (179.383304, 82.483323)
... 
OGRFeature(ne_10m_populated_places):6282
 SCALERANK (Integer) = 1 
 NATSCALE (Integer) = 300 
 LABELRANK (Integer) = 1 
 FEATURECLA (String) = Admin-1 capital 
 NAME (String) = Shanghai
... 
 CITYALT (String) = (null) 
 popDiff (Integer) = 1 
 popPerc (Real) = 1.00000000000 
 ls_gross (Integer) = 0 
 POINT (121.434558819820154 31.218398311228327)

Building on the above, you can also query across all available layers by using the -al option and removing the specific layer name. Keep the same -where syntax and it will try it on each layer. In cases where a layer does not have the specified attribute, it will tell you, but will continue to process the other layers:

   ERROR 1: 'NAME' not recognised as an available field.

NOTE: More recent versions of ogrinfo appear to not support this and will likely give FAILURE messages instead.
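If you prefer full SQL, roughly the same query can be written with the -sql option instead of -where (a sketch against the same dataset; -q suppresses the usual summary chatter):

$ ogrinfo -q 10m_cultural -sql "SELECT NAME FROM ne_10m_populated_places WHERE NAME = 'Shanghai'"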



From zero to HDFS in 60 min.

(Okay, so you can be up and running quicker if you have a better internet connection than me.)

Want to get your hands dirty with Hadoop-related technologies but don’t have time to waste?  I’ve spent way too much time trying to get HBase, for example, running on my MacBook with Brew, and wish I had just tried this VirtualBox approach before.

In this short post I show how easy it was for me to get an NFS share mounted on OSX – so I could transparently and simply copy files onto HDFS without needing any special tools.   Here are the details…
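The core of it looks something like this (a sketch – the gateway hostname and mount point are placeholders, and it assumes the Hadoop NFS gateway is already running on the virtual machine):

# resvport is the OSX-specific mount option; the rest follow the Hadoop NFS gateway docs
sudo mkdir -p /mnt/hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,resvport hdfs-host:/ /mnt/hdfs

# now ordinary file tools work against HDFS
cp mydata.csv /mnt/hdfs/user/tyler/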

JDBC syntax for Matrix/Paraccel Driver

Need to “Perform Big Data Analytics at Massive Scale”?  The Actian Analytics Platform includes the Matrix high-performance analytics database (formerly known as ParAccel).

I’ve seen some people asking online what the JDBC URL syntax is.  If you are using the JDBC driver, be sure to read the README, which gives the details:

The driver recognises JDBC URLs of the form:

jdbc:paraccel:database
jdbc:paraccel://host/database
jdbc:paraccel://host:port/database

Also, you can supply both a username and password as arguments by appending them to the URL. For example:
jdbc:paraccel:database?user=me
jdbc:paraccel:database?user=me&password=mypass

Notes:
1) If you are connecting to localhost or 127.0.0.1, you can leave the host out of the URL; that is, jdbc:paraccel://localhost/mydb can be replaced with jdbc:paraccel:mydb
2) The port defaults to 5439 if it is not specified.

Note that this driver works well with the Node.js JDBC module!

 

Leveraging Analytics for Personal Health

I spent the last 6 months undergoing some dramatic health changes (ping me for details), primarily diet, and now I’m getting around to refactoring my fitness.  Naturally, I want to try some apps that both collect lots of sensor data and present it back to me in a meaningful (and hopefully motivating) way.

While I’m not sure I’d classify all the sensor data from my iPhone as a “big data” stream, it has some common attributes – in particular, I want to keep it all, and it will grow endlessly and never stop (to quote Actian’s CTO).  So as I’m moving around – walking, running and more – I want to capture that.  Then I want to use whatever tools I like to process, analyse and visualise it.

If you’re already into fitness then you’ve already heard about the myriad of devices waiting on store shelves for you to pick up – from pedometers to bathroom scales – and strap to your person.  Internet of Things, anyone?  It’s not your fridge measuring energy and your toaster monitoring temperature – it’s your wrist band monitoring your every footstep and your phone correlating it with your location, time of day and weather.

What a great time to be alive for a data junkie working in analytics and visualisation!

A Few Apps

So let’s talk devices, apps and services…

MapMyFitness app on iPhone 5

One of the most popular ones I hear about is MapMyFitness.  You can read lots about it elsewhere.  I’ve used it on and off for a few years to track walks, runs and gym visits – mainly to share with a friend who does the same (keeping an eye on the competition!).  I only used it on my iPhone and found the GPS mapping really good.  I particularly like the elevation gain stats and the ability to select from past routes.

 

Moves app on iPhone 5.

I briefly tried another app called Moves, which had a really interesting interface and visualiser.  Everything is computed as automatically as possible, guesstimating where you are (Home vs. Gym) and what you are doing.  It’s neat; I might try it again someday now that I’m actually doing more than just driving around.

I see there are lots of other online services and apps that can read from or populate both these apps.  Do you see how the line between data, applications and services continues to blur?  Just as the “open data” revolution has rocked governments, we’ll see “open monitoring” apps rock our personal lives.  Or at least that’s what I’m experiencing.

When (re)starting my fitness programme I had to decide what to do and what to stick with.  My main goal was to track general movement, but also to get good stats and motivation while walking or running.  Since I’d already used some of these other services, I figured it wouldn’t hurt to go with something else.  Plus, getting a gift for a birthday always helps :)

Nike+ Fuelband

So I went with the Nike+ Fuelband and have found it very useful.

Nike+ Fuelband activity tracker

In a nutshell, I’ve found that it:

  • Is unobtrusive
  • Fits comfortably (adjustable too)
  • Uses barely any power
  • Syncs constantly even during a workout
  • Doubles as a watch
  • Comes with some interesting apps – on phone, in browser and feedback on the band itself

I wasn’t familiar with NIKEFUEL – a single unit of measurement that sums up all your activity.  It’s handy because you can compare your FUEL numbers with someone else on an equal footing.  It roughly converts to calories but is more of a sum-total picture.

Nike+ Fuelband App on iPhone 5

The Phone App

On the phone I fire up the Fuelband app and it syncs with the band and updates all my stats.  The basic stats screen shows how close I am to my FUEL points goal for the day, while changing the colour of items to bring the point home that I’m aiming for “green”.  Other screens show how I’m doing by week, month and year.

It’s all about goals.  I found it recommended a good starting goal (which is still pushing me beyond what I was doing before).  And it also recommended adding additional goals when it saw that I was starting to run (i.e. try running 3x per week).

Nike+ Fuelband app on iPhone 5 – activity stats and motivations

They also have various motivations throughout the app – “awards” and comparisons with others in my age bracket during the same time of day.  I’m really impressed with how they’ve leveraged crowdsourced stats while keeping them anonymised yet useful for me.

I’ll show you how you can access that raw data in another post – but for now, enjoy the screenshots of their apps, knowing you can also build your own!

For most of the first week I only used the phone app and the band directly.  Clicking the single button on the band displays the time and daily stats: fuel, steps, hours won (hours in which you were moderately active for 5 minutes) and calories.  It has a great little coloured indicator that changes as you work toward “green”.  This kind of feedback completes the full circle of monitor, analyse and act – what so much of big data analytics aims to do… but I digress…

Nike+ Fuelband – goal status indicators

Web Apps

But the device and the mobile app are only part of the package.  By creating a Nike+ account you get access to all your Fuelband app data (and the Nike+ Running app data if you are using it – more on that another time, I think).

The site includes a daily dashboard view, which is excellent – providing a good mix of high-level goal achievement with meaningful visuals and links to more summarised weekly or monthly info logs.  This is a very useful free service and could be worth the price of a device in and of itself.

 

Up Next

Aside from the Fuelband app, you can also use the Nike+ Running app, along with the Nike+ site, which uses your iPhone to collect everything.  More on that in another post.  I’ll also give an intro to using the Nike+ developer API to get access to my data!

 
