Converting from MrSid to TIFF is not difficult, but it can be compute- and storage-intensive depending on how many MrSid files you have and how large they are. This guide walks you through a conversion process that runs entirely in the cloud, followed by some post-conversion cleanup.
Assumptions
- You have created a new GCP project or have an existing one.
- You have installed the Google Cloud SDK.
- You have initialized the SDK using gcloud init.
Step - Deploy a Fast GCE Instance
In this step we prepare a virtual machine that will process the MrSid files.
We use GCE so that we can leverage much faster compute (30 vCPUs) than we have locally, along with a fast SSD. The command below passes the install-docker.sh script attached to this post as a startup script, so that Docker is already running when we SSH into the instance. Use the c2 machine type for the fastest processor (at the time of this writing).
gcloud compute instances create tile-generator \
--image-family ubuntu-2004-lts \
--image-project ubuntu-os-cloud \
--machine-type c2-standard-30 \
--boot-disk-type pd-ssd \
--boot-disk-size 200GB \
--zone us-central1-a \
--scopes storage-rw \
--metadata-from-file startup-script=install-docker.sh
Step - Convert MrSid to TIFF
Connect to your new instance
gcloud compute ssh tile-generator
This step assumes that you have a ~/myfiles/mrsid/ directory on the instance containing your MrSid files. You can upload your files to GCS and then download them to this instance using gsutil. If you use a bucket in the same project, the storage-rw scope set when the instance was created already lets it read from and write to that bucket.
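As a minimal sketch of the download, the function below copies every .sid file from a bucket into the directory this step expects. The bucket name and the mrsid/ prefix inside the bucket are hypothetical placeholders, not values from this guide, so adjust them to match where you uploaded your files.

```shell
# Sketch: copy MrSid files from a GCS bucket to the instance.
# Assumption: the files were uploaded under gs://<bucket>/mrsid/ (hypothetical layout).
fetch_sid_files() {
  bucket="$1"                          # bucket name, without the gs:// prefix
  dest="${2:-$HOME/myfiles/mrsid}"     # local directory used by the conversion step
  mkdir -p "$dest"
  # -m copies in parallel; the wildcard is quoted so gsutil expands it, not the shell
  gsutil -m cp "gs://$bucket/mrsid/*.sid" "$dest/"
}

# Example (hypothetical bucket name):
#   fetch_sid_files my-imagery-bucket
```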
This step uses a Docker image from Klokan Technologies that comes with MrSid drivers. If you wish to build that image yourself, you can look at their Dockerfile for reference. The command runs gdal_retile.py, so please review its documentation for the arguments that best suit your goals.
The command below writes the TIFF tiles to ~/myfiles/tiffs.
sudo docker run --rm -v ~/:/data klokantech/gdal python3 /usr/local/bin/gdal_retile.py \
-ps 5000 5000 \
-co "COMPRESS=DEFLATE" \
-co "ZLEVEL=1" \
-co "NUM_THREADS=ALL_CPUS" \
-targetDir /data/myfiles/tiffs /data/myfiles/mrsid/my_mrsid_file.sid
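If you have many MrSid files, one option is to wrap the same command in a loop. The sketch below assumes the directory layout used in this guide and runs gdal_retile.py once per file with the options shown above; the function name convert_all_sid is made up for illustration.

```shell
# Sketch: run the conversion above once for every .sid file in ~/myfiles/mrsid.
# Assumes the same image, mount point, and gdal_retile.py options as the single-file command.
convert_all_sid() {
  mkdir -p "$HOME/myfiles/tiffs"              # make sure the output directory exists
  for sid in "$HOME"/myfiles/mrsid/*.sid; do
    [ -e "$sid" ] || continue                 # skip if the glob matched nothing
    name="$(basename "$sid")"
    sudo docker run --rm -v "$HOME":/data klokantech/gdal \
      python3 /usr/local/bin/gdal_retile.py \
      -ps 5000 5000 \
      -co "COMPRESS=DEFLATE" \
      -co "ZLEVEL=1" \
      -co "NUM_THREADS=ALL_CPUS" \
      -targetDir /data/myfiles/tiffs "/data/myfiles/mrsid/$name"
  done
}

# Example:
#   convert_all_sid
```

Each file is processed on its own, so the tiles from different sources are not merged into one mosaic.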
If you need to exec into the container to see which files get mounted at /data you can use the following command (optional).
sudo docker run -it --rm -v ~/:/data klokantech/gdal /bin/bash
ls /data
Step - Validate The Conversion
Use gdalinfo to interrogate a generated tile. Check the metadata (size, coordinate system, band count, compression) to confirm that it is what you expect.
sudo docker run --rm -v ~:/data klokantech/gdal gdalinfo /data/myfiles/tiffs/my_mrsid_file_1_1.tif
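Beyond inspecting a single tile, a quick sanity check on the whole output directory can catch obvious problems. The sketch below is an optional extra, not part of the original workflow: it counts the .tif files and reports how many are empty (0 bytes). The function name tile_summary is made up for illustration.

```shell
# Sketch: count the generated tiles and report how many are empty (0 bytes).
tile_summary() {
  dir="${1:-$HOME/myfiles/tiffs}"      # directory written by the conversion step
  total="$(find "$dir" -type f -name '*.tif' | wc -l | tr -d ' ')"
  empty="$(find "$dir" -type f -name '*.tif' -size 0 | wc -l | tr -d ' ')"
  echo "tiles=$total empty=$empty"
}

# Example:
#   tile_summary ~/myfiles/tiffs
```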
Step - Delete Black Tiles (optional)
During the conversion from MrSid you are very likely to get tiles that are entirely black, or tiles that appear black but contain many near-black pixels. For our own STREAM:RASTER service, we prune black tiles before starting ingestion. This Python script can help you do that in bulk.
Install this script to get a list of all tiffs along with their percentage of black pixels.
git clone git@github.com:Woolpert/util-nodata.git
cd util-nodata
sudo apt install python3-pip --yes
pip3 install -r requirements.txt
First, delete the tiles that are 100% black, as they are of no use (this is a destructive command).
python3 main.py ~/myfiles/tiffs | awk '{ if ($1 == 1.0) { print $2} }' | xargs rm
Second, get a list of the remaining tiles sorted by their fraction of black pixels. Manually open files in the high-90s percent range and inspect them visually to find the breaking point where real imagery starts to appear in a tile. This breaking point will be different for every MrSid data source. The next step, for example, treats anything more than 96% black as safe to delete.
python3 main.py ~/myfiles/tiffs | sort -s -n -k 1,1
Third, optionally, delete all images that are more than 96% black (this is a destructive command).
python3 main.py ~/myfiles/tiffs | awk '{ if ($1 > .96) { print $2} }' | xargs rm
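Because these deletions cannot be undone, it can help to preview the list before removing anything. The sketch below assumes main.py prints one line per tile in the form "<black fraction> <path>", which is what the awk commands above imply; the function name tiles_above is made up for illustration.

```shell
# Sketch: read "<black fraction> <path>" lines on stdin and print the paths
# whose fraction is strictly greater than the given threshold.
tiles_above() {
  awk -v t="$1" '$1 + 0 > t + 0 { print $2 }'
}

# Example: write the candidates to a file, review it, then delete.
#   python3 main.py ~/myfiles/tiffs | tiles_above 0.96 > to_delete.txt
#   less to_delete.txt
#   xargs rm < to_delete.txt
```

Like the commands above, this splits on whitespace, so it is only suitable for tile paths without spaces.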