From Locate Press:
“We don’t have a cover to tease you with, but we have started a website where you can learn more: http://pythongisbook.com · The PyQGIS Programmers Guide. Extending Quantum GIS with Python.”
As in my previous post, you can see how a generic approach to handling irregularly spaced text-format data can be applied to a wide range of contexts. In this post I show a particular example, taking some freely available LIDAR data and processing it with the same approach.
Get the Data
To follow along, you can grab the same data I did from OpenTopography.org. You can find your way through their search tools for an area of interest.
My resulting data is in CSV/ASCII format, with no gridding applied. Grab the dataset and metadata from here.
Unzipping the file produces a 138MB text file called points.txt. This is already in a format that OGR can read, but to help it along, just rename the file to points.csv so we don’t have to mess around changing the delimiter (as in the previous post). Note that the first line contains column headings too, so this is a perfect example to be using:
Create VRT file
In case it wasn’t clear previously, we create a VRT file so that gdal_grid (or any OGR application) knows how to find the X/Y coordinates in the CSV file. We create and tweak it as in the previous example, but using the field names that are already in points.csv. Here is the geometry definition from our new points.vrt file:
<GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
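That single line is the important part. For reference, a complete points.vrt might be written like this; only the GeometryField line comes from above, while the surrounding OGRVRTDataSource/OGRVRTLayer wrapper is standard OGR VRT boilerplate and is my reconstruction (the layer name must match the CSV file’s basename, points):

```shell
# Write a complete points.vrt next to points.csv.
# The wrapper elements are standard OGR VRT boilerplate (my reconstruction);
# the GeometryField line is the one shown in the post.
cat > points.vrt <<'EOF'
<OGRVRTDataSource>
    <OGRVRTLayer name="points">
        <SrcDataSource>points.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
EOF
```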
Test results by running against ogrinfo:
ogrinfo points.vrt -al -so
INFO: Open of `points.vrt' using driver `VRT' successful.
Layer name: points
Feature Count: 2267774
Extent: (330000.000000, 7001270.110000) - (330589.740000, 7001999.990000)
Layer SRS WKT:
PROJCS["WGS 84 / UTM zone 7N",
x: String (0.0)
y: String (0.0)
z: String (0.0)
classification: String (0.0)
gpstime: String (0.0)
scan_angle: String (0.0)
intensity: String (0.0)
number_of_returns: String (0.0)
return_number: String (0.0)
point_source_ID: String (0.0)
edge_of_flight_line_flag: String (0.0)
direction_of_scan_flag: String (0.0)
user_data: String (0.0)
red_channel: String (0.0)
green_channel: String (0.0)
blue_channel: String (0.0)
Grid the Data
Here’s the part that takes a while. With more than 2 million rows it takes a bit of time to process, depending on what algorithm settings you use. To use the default (inverse distance to a power), run it without an algorithm option:
gdal_grid -zfield z -l points points.vrt points.tif
Grid data type is "Float64"
Grid size = (256 256).
Corner coordinates = (329998.848164 7002001.415547)-(330590.891836 7001268.684453).
Grid cell size = (2.303672 2.851094).
Source point count = 2267774.
Algorithm name: "invdist".
Options are "power=2.000000:smoothing=0.000000:radius1=0.000000:
I also got good results from using the nearest neighbour options as shown in the gdal_grid docs. Note that it creates a 256×256 raster by default; the second example in the docs shows how to set the output size (-outsize 512 512). That second example took about an hour to process on my triple-core AMD64 desktop machine with 8GB RAM:
gdal_grid -zfield z -outsize 512 512 \
-a nearest:radius1=0.0:radius2=0.0:angle=0.0:nodata=0.0 \
-l points points.vrt points_nearest_lrg.tif
Grid data type is "Float64"
Grid size = (512 512).
Corner coordinates = (329999.424082 7002000.702773)-(330590.315918 7001269.397227).
Grid cell size = (1.151836 1.425547).
Source point count = 2267774.
Algorithm name: "nearest".
Options are "radius1=0.000000:radius2=0.000000:angle=0.000000:nodata=0.000000"
A common question that comes up on the GDAL discussion lists is how to convert point XYZ data into a raster grid. Depending on what your source data looks like, you can deal with this through a three-step process that is fairly straightforward if you use the right tools. Here we stick to using just the GDAL/OGR command line tools.
1. Convert XYZ data into CSV text
If the source data is in a gridded text format with rows and columns, first convert it to the XYZ output format:
gdal_translate -of XYZ input.txt grid.xyz
If your source data is already in an XYZ format, or after the above conversion, change the space delimiters to commas (tabs and semicolons are also supported). This is easily done on the command line using a tool like sed, or manually in a text editor:
sed 's/ /,/g' grid.xyz > grid.csv
The results would look something like:
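For instance (the coordinate values below are invented for illustration, not taken from any real dataset):

```shell
# Create a tiny space-delimited XYZ sample (values are made up)
printf '%s\n' \
  '330000.00 7001270.11 542.25' \
  '330000.50 7001270.11 542.31' \
  '330001.00 7001270.11 542.40' > grid.xyz

# Replace the space delimiters with commas, as described above
sed 's/ /,/g' grid.xyz > grid.csv

cat grid.csv
# 330000.00,7001270.11,542.25
# 330000.50,7001270.11,542.31
# 330001.00,7001270.11,542.40
```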
2. Create a virtual format header file
Step 3 will require access to the CSV grid data in an OGR-supported format, so we create a simple XML text file that can then be referenced as a vector datasource by any OGR tool.
Create the file grid.vrt with the following content. Note the pointers to the source CSV file and the fields, which are discerned automatically from the CSV file on the fly:
<GeometryField separator=" " encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
If the first row of the grid.csv file includes custom field names, e.g. X, Y, Z, then substitute them for the field_1, field_2 and field_3 settings above.
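Spelled out from the shell, writing grid.vrt might look like this; only the GeometryField line comes from above, and the wrapper elements are standard OGR VRT boilerplate that I’ve filled in (the layer name here must match the name passed to gdal_grid with -l):

```shell
# Write grid.vrt next to grid.csv. The OGRVRTDataSource/OGRVRTLayer wrapper
# is standard OGR VRT boilerplate (my reconstruction); the GeometryField
# line is the one shown in the post.
cat > grid.vrt <<'EOF'
<OGRVRTDataSource>
    <OGRVRTLayer name="grid">
        <SrcDataSource>grid.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <GeometryField separator=" " encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
EOF
```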
3. Convert using gdal_grid
Now convert this OGR vector VRT/CSV source into a GDAL raster format, while also interpolating any “missing” points in the grid. Various interpolation options are available; see the gdal_grid command syntax for more information. A minimal example using all the defaults is shown here:
gdal_grid -zfield field_3 -l grid grid.vrt newgrid.tif
The output product will fill in any holes in the source grid. Several algorithms are available, the default uses the inverse distance to a power option.
Results shown in the following image.
Those of us involved with open source geospatial software already know that you must innovate or die. Okay, maybe that’s a bit dramatic, but I know many of us feel like we will die if we do not continually learn more and be challenged. I just wrote an article that tries to articulate some of the areas where we need to keep our muscle stretched. This is especially for those who are looking at their career in horror – wondering when their skills will be leap-frogged into oblivion by a recent grad who happened to spend more time learning on his/her own time.
Check out the article on GoGeomatics.ca. I would love your comments and feedback on the article.
Following on from my last post, if you’re looking for a site for “framework” Canadian geodata, check out GeoBase.ca, including layers such as hydrographic data, road networks, various admin boundaries, imagery and more, all for free download.
But today I’m pulling it in using their WMS option to dynamically view the data without downloading:
I hope we’ll see more provinces get in on the action and start putting more high-res imagery up as WMS feeds, but for now I’m not complaining. Keep up the good work! If you enjoy it too, let them know by encouraging them with good feedback.
By the way, anyone know of an Ontario higher res imagery WMS or download site?
GeoBase WMS connection:
I went hunting for a good set of Canada-wide WMS layers and (re)discovered some great work from NRCAN on the Toporama site.
Good response time but also nice cartographic quality, well, at least if you want something that’s similar to the printed topographic map you may have on your wall. Good work as usual from NRCAN and co. Anyone know what they are serving it up with?
Here is their WMS connection.
This week we’re putting the final polish on Gary Sherman’s new book. For more info see Locate Press site:
Gary Sherman has done it again… Originally published as Desktop GIS, by Pragmatic Press, it quickly went out of print. Locate Press is proud to bring this work back into print by releasing this fully updated second edition. More information coming soon. See Gary’s website dedicated to his book – DesktopGISBook.com.
Leave a note here if you want to be informed when it is available.
Years ago I started writing some scripts with the end goal of doing a wholesale migration of my data into open source spatial databases. I since left that job and didn’t have much need for the scripts, but recently picked them back up again. I never made it to the migration part; instead I decided to focus on cataloging my GIS data, so I could then create other apps that use the catalog for looking things up, creating overview maps, and ultimately fuelling migration scripts. The historical name for this project is somewhat irrelevant; for now, this is the Catalog portion of a broader plan I had called the Network Mapping Engine (NME).
My current efforts are available here: https://github.com/spatialguru/NME/tree/master/nme/cat
Want to try it? Ensure you have the prerequisite xmlgen and elementtree_pretty libs (included on the site) as well as a recent GDAL/OGR install with Python bindings. It’s reported to work well with the OSGeo4W distribution, though recently I have only tested it on Mac OS X 10.6.
The main feature is the gdalogr_catalogue.py script, but the above dependencies will also have to be available in your PYTHONPATH.
The simple command line usage requires only one parameter: the folder that you want to catalog. It will recursively scan sub-folders as well.
python gdalogr_catalogue.py -d /tmp/data
The result is basic XML (more about the details of output in a moment):
<?xml version="1.0" ?>
['.svn', '.shx', '.dbf']
Mon Mar 19 14:42:35 2012
.... huge snip ....
Pipe the output into a new file or use the save-to-file output option:
python gdalogr_catalogue.py -d /tmp/data -f output.xml
About the XML
Most metadata standards include pretty high-level information about datasets, but the focus of this project is to grab as low-level data as possible and make it easily consumable by other applications. For example, consider how valuable hierarchical information about all the datasets on your system or across your enterprise would be. A map viewing application could access the catalog instead of having to repetitively scan folders and open files. It could even fuel part of a geospatial search engine, for that matter!
The original work created CSV text files, then it produced SQL statements, but now it has finally settled on producing XML. There was no existing standard for this type of metadata, so I had to create one. Because it is built on top of GDAL/OGR, much of it parallels the GDAL/OGR API, but additional information is also captured about the cataloguing process and file statistics. It’s still changing and growing, but no huge changes are expected.
There are three primary sections in the root DataCatalogue XML element, and a couple sub-elements as well:
- CatalogueProcess – captures details of how the script was run, the operating system, timestamp, etc.
- VectorData – vector dataset details, format, location
  - FileStats – information about the file or path, including the user that owns it, modification date and a unique checksum
  - VectorLayer – for each layer, spatial data details, format, extent
- RasterData – raster dataset details, format, location, dimensions
  - RasterBand – for each band, data details, format, min/max values
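As a rough sketch of how those elements nest (element names are from the list above; the exact attributes, and whether FileStats repeats under RasterData, are my guesses; see sample_output.xml for the authoritative layout):

```xml
<DataCatalogue>
  <CatalogueProcess><!-- OS, timestamp, how the script was run --></CatalogueProcess>
  <VectorData>
    <FileStats><!-- owner, modification date, checksum --></FileStats>
    <VectorLayer><!-- per-layer spatial details, format, extent --></VectorLayer>
  </VectorData>
  <RasterData>
    <RasterBand><!-- per-band data details, format, min/max values --></RasterBand>
  </RasterData>
</DataCatalogue>
```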
I won’t review all the levels of elements; instead I suggest you have a look at sample_output.xml in your browser. Along with catalogue.xsl and css/catalogue.css, it renders to show some basic example data:
One of my goals is to have this XML output plug into other applications as the basis for an enterprise data catalog environment. My first goal is to add capabilities for generating a MapServer map file from the metadata. This would also allow the script to produce overview maps of the dataset extents. Another idea is to create a QGIS plugin to visualise and browse through the datasets, while also allowing optional loading of a layer into QGIS. Have a better idea? Let me know and we can discuss it. Maybe I need to come up with a contest, hmm…
Updating the XML is another challenge on the horizon. I built in the creation of an md5-hash-based checksum so the script can check for file changes without even having to open a file with GDAL/OGR, but there are no routines yet to update the XML in-place.
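The change-detection idea behind that checksum can be sketched from the shell. The filenames below are invented for illustration, and the script itself computes the hash in Python rather than shelling out to md5sum; this just shows the principle of recording a file’s md5 once and later testing whether it changed without opening it:

```shell
# Record a checksum at catalogue time (stand-in file, made-up name)
echo "pretend dataset contents" > dataset.dat
md5sum dataset.dat > catalogue.md5

# Later run: only re-catalogue the file if the checksum no longer matches
if md5sum -c --quiet catalogue.md5 >/dev/null 2>&1; then
    echo "unchanged: skip re-scanning with GDAL/OGR"
else
    echo "changed: re-catalogue this file"
fi
```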
Obviously, there is more to come, as more people try it and let me know how they get on. Drop me a comment or email and I’d love to help you get it running if it fails in your environment!
Hidden feature alert! I guess I have been a good boy and use sudo more than I log in as root, or I would have discovered this much earlier. While updating some website software to a new version, I wanted to recursively copy files from a new tarball into the old website vhost location, while also forcing it to overwrite existing files.
Naturally, I used the cp command, but hit an interesting glitch…
cp -rf new/* old/
Now that should suppress prompting for overwriting existing files, right? Well, not always. Try logging in as root on Fedora (and others?) and you will find that you are still prompted to confirm each overwrite. (This wasn’t the case on RHEL, by the way.) Is it a bug in bash, or are you really just not typing it right? Neither! It’s not a bug, it’s a feature! Apparently we Linux admins need a little safeguarding from time to time. I’m sure I opted in somewhere. The default behaviour of these commands is overridden by alias settings in root’s .bashrc (Mr. Power Admin, meet Nanny State… 😉 ). There the rm, cp and mv commands are set to force the --interactive feature instead of blindly accepting a -rf without complaint. I won’t complain, but it was a waste of a few minutes. From .bashrc:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
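For a one-off, you don’t even have to edit .bashrc; any of these bypasses alias expansion for a single invocation (the filenames below are just for illustration):

```shell
echo "new" > new.txt
echo "old" > old.txt

\cp -f new.txt old.txt         # a leading backslash skips alias expansion
command cp -f new.txt old.txt  # the 'command' builtin does the same
/bin/cp -f new.txt old.txt     # or call the binary by its full path
```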
Comment out the line you detest the most, open a new terminal, and you’re back to nuking your system with ease. Whew… I’m pretty sure this behaviour is part of some certification quiz question somewhere, to weed us novices from the cream of the crop 🙂 Keep your crop – I just want the power.