Need to remove an embedded image from a PDF file? You can easily chop out parts of it as needed with the PDFtk command line tool and a little bit of text editing. Here’s how…
When GIS applications started allowing us to save maps into PDF files, I was happy for about 2 minutes. The first 48MB PDF sent to a client (by “sent” I mean burned to a CD and mailed) wasn’t even viewable on their low-powered computer.
Now, more than 12 years later, I’m still dealing with a similar PDF challenge: how to remove unwanted elements from the document. This time I had a PDF that included some vector graphics on top of an orthophoto image background. As I was going to convert the PDF to a TIFF and georeference it in QGIS for someone else to use (see below), I wanted to drop the embedded image.
Remove Image from PDF Map
I’m a big fan of the PDFtk command line toolkit (aka PDFtk Server) as it is cross-platform and has never let me down when splicing, chopping, concatenating and now modifying PDF files. So I was thrilled to read this tip from the quickpdf.org forums:
1. Uncompress the original PDF using pdftk:
pdftk png_example.pdf output step1.pdf uncompress
2. Edit step1.pdf (using commandline edit.exe):
Find the object of interest, in this case 5 0 obj.
Remove all references to this object by deleting the
lines associated with it, including:
/XOBJ7BC610 5 0 R
Everything between the line starting with 5 0 obj and the matching endobj line (inclusive)
Save the edited file to step2.pdf.
3. The xref table in step2.pdf is now corrupt, so we need to fix it. We'll also recompress the PDF:
pdftk step2.pdf output step3.pdf compress
The final file, step3.pdf, is now a valid, compressed PDF, without the unwanted image (we left the text description below the spot where the image was). We can now use QuickPDF to add a new image in its place.
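If hunting through the uncompressed file by eye is tedious, a small shell helper can point at the lines the tip says to delete. This is only a sketch under some assumptions: `show_object_lines` is a name I made up, it assumes the object's generation number is 0 (as in the example), and it assumes your awk copes with the binary stream data in the file. It only reports line numbers; the deleting is still done by hand in an editor.

```shell
# Hypothetical helper: show_object_lines FILE OBJNUM
# Prints every line that references "OBJNUM 0 R" and the line range
# from "OBJNUM 0 obj" to the first "endobj" that follows it.
# Assumes generation number 0 and a pdftk-uncompressed PDF.
show_object_lines() {
  LC_ALL=C awk -v n="$2" '
    # Start of the object definition, e.g. "5 0 obj"
    $1 == n && $2 == "0" && $3 == "obj" { start = NR }
    # First endobj after the start closes the object
    start && !end && /^endobj/ { end = NR }
    # References such as "/XOBJ7BC610 5 0 R" (but not "15 0 R")
    $0 ~ ("(^|[^0-9])" n " 0 R([^A-Za-z]|$)") { print "reference at line " NR }
    END { if (start && end) print "object body: lines " start "-" end }
  ' "$1"
}
```

Running `show_object_lines step1.pdf 5` lists the reference lines first and the object's line range last. A match inside binary stream data is possible, so check each reported line before deleting it.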
That is precisely what I did and it worked well. The only gotcha was that I had never edited an uncompressed PDF file and, obviously, my PDF wasn’t the same as the tip example.
I was able to find the section where the only RGB image started: it was “4 0 obj”, and the “endobj” was about 400 lines further down. I deleted those lines, ran the compress step and was off to the races.
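Since your object number will differ too, here is a rough way to list the candidate image objects instead of scrolling for them. Again a sketch with assumptions: `list_image_objects` is a hypothetical name, and it relies on the uncompressed layout having the `N 0 obj` header on its own line, with `/Subtype /Image` and `/ColorSpace` appearing in the dictionary before the `stream` keyword.

```shell
# Hypothetical helper: list_image_objects FILE
# Prints "LINE: N G obj [colour space line]" for every object whose
# dictionary declares /Subtype /Image. Only dictionary lines (before
# "stream") are inspected, so the binary image data is skipped.
list_image_objects() {
  LC_ALL=C awk '
    # Object header: remember where it starts and reset the state
    /^[0-9]+ [0-9]+ obj/ { id = $1 " " $2 " obj"; start = NR; indict = 1; img = 0; cs = "unknown" }
    indict && /\/Subtype[ ]*\/Image/ { img = 1 }
    indict && /\/ColorSpace/ { cs = $0 }
    # The dictionary ends where the stream (or the object) ends
    indict && (/^stream/ || /^endobj/) {
      if (img) print start ": " id " [" cs "]"
      indict = 0
    }
  ' "$1"
}
```

With `list_image_objects step1.pdf`, a line mentioning `/DeviceRGB` would be the kind of RGB image I was after. If the colour space is an indirect reference or an array, the bracketed text shows that raw line and you will need to follow it up yourself.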
PDF to TIFF Command Line Tool on OSX
As a bonus for reading this far, I then converted the PDF to TIFF so I could use it in QGIS. I used the built-in OSX command line tool:
sips -s format tiff step3.pdf --out out_nogcp.tif
Incidentally, using the QGIS Georeferencer plugin, I was only four clicks away from thin plate spline transforming the TIFF file into a georeferenced image with transparency. Now I could put whatever background image I wanted behind it.