2. The Geospatial Data Abstraction Library

2.1. What is GDAL?

The Geospatial Data Abstraction Library (GDAL) is a set of tools for processing geospatial data on the command line. For bulk-processing of large satellite datasets, it’s an indispensable tool.

2.2. Setting up test data

2.2.1. Getting Sentinel-2 data

We’ll use data from Sentinel-2 as an example dataset here. The tile we’ll use is 36KWD imaged on the 2nd August 2018, which covers an area of Central Mozambique:

_images/36KWD_20180802.jpg

You can download this image here. Make sure that you’ve first signed up to the Copernicus Open Access Hub.

Save the Sentinel-2 file to an appropriate location. Once downloaded, cd to the file’s location and unzip it:

cd download_location/
unzip S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.zip

Note

Feel free to download a different Sentinel-2 image that covers your area of interest. This will make the tutorial tougher as you’ll need to modify filenames and coordinates where appropriate. If you do decide to do this, we recommend finding a cloud-free Sentinel-2 image acquired after 6th December 2016, as prior to this date Sentinel-2 data have a slightly different file format. A good resource for downloading Sentinel-2 data is the Sentinel Hub EO Browser.

2.2.2. Sentinel-2 data format

Data from Sentinel-2 are delivered in the tricky-to-understand .SAFE format. Each .SAFE file is actually a directory, containing imagery and metadata. The .SAFE file can be navigated in the same manner as any other directory.

Here we’ll just aim to undetsand the most important elements:

  • S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/MTD_MSIL1C.xml: An XML file containing metadata including acquisiton time and quality assurance information. You can examine this data in a text editor such as gedit:
_images/MTD_TL.png
  • S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/GRANULE/: Sentinel-2 data are distributed as a fixed set of tiles or ‘granules’ (see image below). The GRANULE directory will contain one or more (before December 2016) directories for each of the tiles contained in the file.
  • S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/GRANULE/L1C_T36KWD_A016248_20180802T080402/MTD_TL.xml: An XML file containing metadata for the Sentinel-2 granule, including projection, resolution, extent, and quality assurance information. You can examine this data in a text editor such as gedit:
_images/MTD_MSIL1C.png
  • S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/GRANULE/L1C_T36KWD_A016248_20180802T080402/IMG_DATA/: The actual image data, in JPEG2000(.jp2) format. There is one file for each of the Sentinel-2 bands

Sentinel-2 data are distributed in two forms:

  • Level 1C: Top Of Atmosphere reflectance without a cloud mask.
  • Level 2A: Bottom Of Atmosphere reflectance and pixel classification including cloud-cover.

For the purposes of this tutorial we’ll use Level 1C data, but for many remote sensing tasks level 2A data are more appropriate (e.g. time series analysis). Level 1C data can be converted to level 2A with the sen2cor program.

For those intersted, there’s a lot more information on the format of Sentinel-2 data on the ESA website.

2.3. Showing image metadata

We can use GDAL to query image details using the command gdalinfo:

cd S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/GRANULE/L1C_T36KWD_A016248_20180802T080402/IMG_DATA/
gdalinfo T36KWD_20180802T073611_B8A.jp2

Driver: JP2OpenJPEG/JPEG-2000 driver based on OpenJPEG library
Files: T36KWD_20180802T073611_B8A.jp2
    T36KWD_20180802T073611_B8A.jp2.aux.xml
Size is 5490, 5490
Coordinate System is:
PROJCS["WGS 84 / UTM zone 36S",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AXIS["Latitude",NORTH],
        AXIS["Longitude",EAST],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",33],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",10000000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","32736"]]
Origin = (499980.000000000000000,7900000.000000000000000)
Pixel Size = (20.000000000000000,-20.000000000000000)
Corner Coordinates:
Upper Left  (  499980.000, 7900000.000) ( 32d59'59.32"E, 18d59'33.08"S)
Lower Left  (  499980.000, 7790200.000) ( 32d59'59.31"E, 19d59' 5.30"S)
Upper Right (  609780.000, 7900000.000) ( 34d 2'34.55"E, 18d59'22.50"S)
Lower Right (  609780.000, 7790200.000) ( 34d 2'57.49"E, 19d58'54.13"S)
Center      (  554880.000, 7845100.000) ( 33d31'22.67"E, 19d29'16.52"S)
Band 1 Block=640x640 Type=UInt16, ColorInterp=Gray
Overviews: 2745x2745, 1372x1372, 686x686, 343x343
Overviews: arbitrary
Image Structure Metadata:
    COMPRESSION=JPEG2000
    NBITS=15

With gdalinfo -stats we can also get summary statistics of the contents of the file:

gdalinfo T36KWD_20180802T073611_B8A.jp2

...
Band 1 Block=640x640 Type=UInt16, ColorInterp=Gray
Minimum=0.000, Maximum=10335.000, Mean=2140.866, StdDev=406.668
Overviews: 2745x2745, 1372x1372, 686x686, 343x343
Overviews: arbitrary
Metadata:
    STATISTICS_MAXIMUM=10335
    STATISTICS_MEAN=2140.8663662695
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=406.66771229522
Image Structure Metadata:
    COMPRESSION=JPEG2000
    NBITS=15

2.3.1. Exercises

Using gdalinfo to answer the following questions:

  1. What file format are Sentinel-2 images provided in?
  2. What is the projection and extent of this Sentinel-2 tile?
  3. Is the resolution of all Sentinel-2 bands the same? Why?

2.4. Translating rasters

gdal_translate is a command line tool to convert between raster data formats and extents.

2.4.1. Converting between formats

The JPEG2000 format of Sentinel-2 data can be difficult to work with, so we might want to convert it to a GeoTiff:

gdal_translate -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A.tif

gdal_translate can be used to convert between many file formats: https://gdal.org/formats_list.html.

2.4.2. Clipping rasters

gdal_translate can also be used to clip a raster to a set of extents. Here, we’ll produce a raster of reduced size. We’ll output this as a GeoTiff.

gdal_translate -projwin 500000 7850000 550000 7800000 -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_clipped.tif

In this example, the --projwin flag refers to a window specified in the same projection of the data given in the format <xmin ymax xmax ymin>.

Take a look at the output of gdalinfo (gdalinfo T36KWD_20180802T073611_B8A_clipped.tif) to confirm it’s worked. You can also open the newly produced image in QGIS: here I’ve displayed the reduced image in colour overlaid on the original image in greyscale:

_images/B8A_reduced.jpeg

2.4.3. Resampling

Spectral bands in Sentinel-2 data are not all collected as the same spatial resolution, with images variously provided at 10m, 20m and 60m. If we wanted to combine images with different resolutionsm (for example to build a classified image), we might want to resample one spectral band to match the resolution of another. For example, resampling our image to 60 m resolution:

gdal_translate -tr 60 60 -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_resampled.tif

When resampling images, it’s a good idea to carefully consider the type of resampling. By default gdal_translate uses nearest-neighbor resampling, but in the case such as this we might prefer to use the mean average of input pixels. We can specify this with gdal_translate as follows:

gdal_translate -tr 60 60 -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_resampledavearge.tif

Use gdalinfo and QGIS to verify that the output image has been resampled appropriately. We’ll return to resampling later.

2.4.4. Exercises

Using gdal_translate:

  1. Translate the green Sentinel-2 band to an image with the following properties:
    Extent: X: 600000-700000, Y: 7850000-7900000 Format: GeoTiff Resolution: 20 m
  2. Can you perform this operation in a single command?

2.5. Building a colour composite image

Having individual bands is useful, but often we want to view a colour image. For example, to build a red/green/blue natural colour image, we can use the following command:

gdalbuildvrt -separate -o T36KWD_20180802T073611_RGB.vrt T36KWD_20180802T073611_B04.jp2 T36KWD_20180802T073611_B03.jp2 T36KWD_20180802T073611_B02.jp2

This generates a .vrt file, which is a ‘virtual raster’. YOu can open this file as a raster in QGIS to view it.

2.5.1. Exercises

  1. Generate a false colour composit .vrt file with: band 8 as red, band 4 as green, band 3 as blue. What property does this image emphasise?
_images/B08_B04_B03_FCC.jpeg
  1. Generate a false colour composit .vrt file with: band 12 as red, band 8 as green, band 4 as blue. What property does this image emphasise? (Hint: What do you see to the lower-right of the image?)
_images/B12_B08_B04_FCC.jpeg

2.6. Changing raster projections

gdalwarp is similar to gdal_translate, but can also be used to convert between projections.

gdalwarp can replicate much of the functionality of gdal_translate, including converting between formats, clipping and resampling. In most cases you’ll want to use gdalwarp, as gdal_translate is limited to images that are already aligned.

AS a reprojection tool, GDAL can understand coordinate reference systems in a range of formats. Perhaps the simplest format is in the form of ‘EPSG codes’, a collection of numbers that refer to commonly used coordinate reference systems. For example, WGS84 has the code 4326, the British National Grid is 27700, and UTM36S/WGS84 is 32736. Find your EPSG code at spatialreference.org.

For example, Sentinel-2 data for the tile 36KWD are provided in UTM 36S. If we wanted to convert this to lat/lon coordinates (WGS84), we could run the following:

gdalwarp -t_srs EPSG:4326 -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_wgs84.tif

The -t_srs flag is used to specify target coordinate reference set. Use gdalinfo to confirm that the image has be reprojected.

We can also clip the image with gdalwarp using the -te flag. This operates very similarly to the --projwin flag in gdal_translate, but note that the order of bounds in gdalwarp should be <xmin ymin xmax ymax>:

gdalwarp -t_srs EPSG:4326 -te 33.25 -19.75 33.75 -19.25 -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_wgs84clipped.tif

We can combine reprojection with clipping and resampling (average), as follows:

gdalwarp -t_srs EPSG:4326 -te 33.25 -19.75 33.75 -19.25 -tr 0.01 0.01 -r average -of GTiff T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A_wgs84clippedresampled.tif

See:

_images/B8A_warped.jpeg

2.7. Mosaicking data

Remote sensing images such as those from Sentinel-2 are commonly provided in tiles. If we want to generate a wall-to-wall land cover map, we need to stitch these images together. Here we’ll do that with two adjacent Sentinel-2 tiles.

First, download some data. As an example, we’ll use data from the tile 36KWC, which is to the south of the 36KWD tile which we’ve been looking at up to this point. You can download the tile from ESA at this link.

For this task we’ll use gdal_merge.py. Unfortunately, gdal_merge.py doesn’t understand JPEG2000 files, so first we’ll have to convert inputs to GeoTiffs:

gdal_translate -of GTiff S2A_MSIL1C_20180802T073611_N0206_R092_T36KWC_20180802T094821.SAFE/GRANULE/L1C_T36KWC_A016248_20180802T080402/IMG_DATA/T36KWC_20180802T073611_B8A.jp2 T36KWC_20180802T073611_B8A.tif
gdal_translate -of GTiff S2A_MSIL1C_20180802T073611_N0206_R092_T36KWD_20180802T094821.SAFE/GRANULE/L1C_T36KWD_A016248_20180802T080402/IMG_DATA/T36KWD_20180802T073611_B8A.jp2 T36KWD_20180802T073611_B8A.tif

gdal_merge.py -o B8A_combined.tif -of GTiff T36KWC_20180802T073611_B8A.tif T36KWD_20180802T073611_B8A.tif

Use gdalinfo or QGIS to confirm that you’ve generated a seamless mosaic.

2.7.1. Exercise (advanced!)

Using gdalwarp, gdal_translate, gdal_merge.py, and gdalbuildvrt, build a false colour composite image in .vrt format from tiles 36KWC and 36KWD. Give it the following properties:

  • CRS: WGS84
  • Extent: X: 33.25 to 33.75, Y: -20.25 to -19.75
  • Resolution: 0.001 degrees
  • Red = Band 8, Green = Band 4, Band 3 = Band 3