Image Segmentation

for Landscape and Landcover Classification

using Deep Learning

By Daniel Buscombe,
Marda Science::Analytics

Supported by USGS Community for Data Integration and USGS Coastal and Marine Hazards and Resources Program.

Prepared for the CSDMS2020 – Linking Ecosphere and Geosphere virtual meeting, May 20-21, 2020

Click on the arrow in the bottom corner to begin

What is this and who is this for?

The background image is an image of a harbor overlain with a semi-transparent binary mask, where land is white and sea is black. This mask has been generated automatically by a model that has learned how to carry out the task. This course is about how to make a model like that. The process of delineating an image into groups of pixels is called segmentation.

Hopefully you are here because you want to learn how to do segmentation of geo- and eco-scientific images using deep learning. This course is for anybody with interest in how to segment images using deep neural networks, especially those working with images of natural and human-altered landscapes and landforms.

Image segmentation using U-Nets

We'll carry out two types of image segmentation: binary and multiclass. Binary segmentation automatically delineates imagery into two classes (the thing you are interested in, which is the target class, and everything else, which is the background class). This is done at the pixel level, so every pixel is classified into one of two categories. Multiclass segmentation extends this to more than two classes. We will use the same image segmentation model - called a U-Net - for both binary and multiclass problems.
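As a minimal sketch of the difference, here is how the two kinds of per-pixel output are typically turned into masks. The numbers are made-up probabilities for a tiny 2 x 2 image: a binary model's single probability map is thresholded, while a multiclass model's per-class scores are reduced with an argmax.

```python
import numpy as np

# A binary model outputs one probability per pixel;
# thresholding at 0.5 gives the binary mask (1 = target, 0 = background).
binary_prob = np.array([[0.9, 0.2],
                        [0.6, 0.1]])
binary_mask = (binary_prob > 0.5).astype(np.uint8)

# A multiclass model outputs one score per class per pixel (3 classes here);
# each pixel is assigned its most probable class with argmax.
multiclass_prob = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                            [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]])
multiclass_mask = np.argmax(multiclass_prob, axis=-1)
```

The same thresholding and argmax steps apply whatever the image size; only the array shapes change.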

Deep learning is a set of methods in machine learning that uses very large neural networks to automatically extract features from imagery then classify them. This course will assume you already know a little about python, that you've heard of deep learning and machine learning, and you've identified these tools as ones you'd like to gain practical experience using together.

This course runs through interactive Python case studies showing how to apply a specific deep learning model to imagery for the purpose of semantic segmentation. The U-Net model is a powerful model for landscape-scale image segmentation. A few different publicly available datasets are used. This course will show you how to prepare different types of data and associated labels, then train a U-Net on each dataset to segment a specific target class.


This course uses the python programming language and you are expected to have basic knowledge of python syntax and concepts. You are also expected to already understand the basics of what machine learning and deep learning are.

We will also be using TensorFlow 2 with Keras, and other common scientific Python packages such as numpy, matplotlib and scikit-image. Any prior experience with those or similar packages will be relevant here.

We will also assume you have some basic familiarity with digital imagery and have an interest in making scientific measurements from imagery, which is why you are here. To get started with the basics of TensorFlow and Keras, we put together a Google Colab tutorial for you here

What to expect from this course

1. Carrying out a supervised image segmentation task using pairs of images and label masks that have been made manually.

2. Building a deep learning model from scratch using Python code running on a cloud computer, via a Jupyter notebook on Google Colab.

3. Evaluating model performance by comparing the estimated versus observed label masks. In other words, we'll set aside a test set of imagery with corresponding label masks, and compare the model estimates to the real thing.

4. All data will be provided, but by using different datasets with different label formats and other considerations, the hope is that by the end of the course you will be able to apply these techniques to your own data and image segmentation tasks.
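For the evaluation step, a common per-class metric is intersection-over-union (IoU, also called the Jaccard index) between the estimated and observed masks. A minimal sketch, using made-up 2 x 2 binary masks:

```python
import numpy as np

def iou_score(y_true, y_pred):
    """Intersection-over-union between two binary masks.
    1.0 means perfect overlap; 0.0 means no overlap at all."""
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return intersection / union if union else 1.0

y_true = np.array([[1, 1], [0, 0]])  # "observed" mask
y_pred = np.array([[1, 0], [0, 0]])  # "estimated" mask
score = iou_score(y_true, y_pred)    # 1 overlapping pixel / 2 in the union
```

Averaging this score over the reserved test set gives a single number to compare models with.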


We'll use a couple of different aerial UAV ("drone") datasets to illustrate different concepts and workflows. They each have a unique set of classes, and the data are stored in different ways.

First, we'll use the "aeroscapes" dataset, consisting of thousands of 720 x 720 x 3 pixel aerial UAV color images and corresponding ground-truth masks for 12 classes [bckgrnd, person, bike, car, drone, boat, animal, obstacle, construction, vegetation, road, and sky]. We'll build models for [vegetation] only, and for a multiclass segmentation of [bckgrnd, obstacle, construction, vegetation, road, and sky].

We'll also use another dataset consisting of aerial UAV imagery and labels made publicly available as the Semantic Drone Dataset. The imagery depicts more than 20 houses from nadir (i.e. "bird's eye") view acquired at an altitude of 5 to 30 meters above ground. We'll use this dataset to demonstrate one type of transfer learning, where we use the vegetation model trained on "aeroscapes" to initialize training for the same class using "semantic drone" data.

Dataset 1:

We will use 720 x 720 x 3 pixel imagery and corresponding labels in 2D image format. The dataset consists of 3269 images, randomly split into 1635 training images (50% of the total), 817 validation images (25%), and 817 test images (25%). Each image pixel has an order 1 - 10 cm spatial resolution.
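A random 50/25/25 split like this can be sketched as follows. The filenames here are placeholders, not the dataset's actual naming scheme:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible
filenames = [f"img_{i:04d}.jpg" for i in range(3269)]  # placeholder names
idx = rng.permutation(len(filenames))  # shuffle indices once

n_train, n_val = 1635, 817  # 50% and 25% of 3269; the remainder is the test set
train = [filenames[i] for i in idx[:n_train]]
val = [filenames[i] for i in idx[n_train:n_train + n_val]]
test = [filenames[i] for i in idx[n_train + n_val:]]
```

Fixing the random seed matters: it means the test set stays untouched across training runs, so evaluation scores remain comparable.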

We will use imagery that is downsized to 512 x 512 x 3 pixels. The imagery depicts various natural and unnatural landscapes. We will segment just a subset of those classes [bckgrnd, obstacle, construction, vegetation, road, and sky].

Dataset 2:
Semantic Drone Dataset

We will use a small subset of the entire dataset, consisting of 100 images and associated color image labels. All data have order 1 - 10 cm spatial resolution, varying with acquisition altitude (5 to 30 m). All images are 6000 x 4000 pixels.

We will use imagery that is downsized to 512 x 512 x 3 pixels. The imagery is of a suburban housing development. There are 20 classes, including the following natural and unnatural landcovers: 'tree', 'grass', 'other vegetation', 'dirt', 'gravel', 'rocks', 'water', and 'paved area'.

We will segment three classes, [grass, vegetation, trees] against a background of everything else.

Concepts we'll utilize

1. Working with and visualizing aerial imagery and label formats (2D and 3D color label imagery, both common formats to share labels)

2. Making custom image batch generators, which feed images into deep neural networks as they train (batch by batch)

3. Constructing and training U-Net models for binary segmentation, and making plots of validation results as the model trains

4. Model evaluation using metrics and data visualization

5. Transfer learning by using pretrained model weights, to 'hot start' a new model

6. Multiclass segmentation by merging outputs from multiple binary segmentations
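To illustrate concept 2, here is a minimal sketch of a custom batch generator. `load_fn` is a hypothetical stand-in for whatever function reads an image or label file into an array; the real course notebooks use their own loaders.

```python
import numpy as np

def batch_generator(image_paths, label_paths, batch_size, load_fn):
    """Yield (image_batch, label_batch) pairs indefinitely,
    reshuffling the file order at the start of each epoch."""
    idx = np.arange(len(image_paths))
    while True:
        np.random.shuffle(idx)
        # step through complete batches only, dropping any remainder
        for start in range(0, len(idx) - batch_size + 1, batch_size):
            batch = idx[start:start + batch_size]
            x = np.stack([load_fn(image_paths[i]) for i in batch])
            y = np.stack([load_fn(label_paths[i]) for i in batch])
            yield x, y

# quick check with a dummy loader that ignores the path
gen = batch_generator(["a"] * 6, ["b"] * 6, batch_size=2,
                      load_fn=lambda path: np.zeros((4, 4, 3)))
x, y = next(gen)
```

A generator like this is what gets passed to Keras during training, so only one batch of images needs to be in memory at a time.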

1. Constructing a generic image segmentation model

Click on the link below. The link will launch a Jupyter notebook in Google Colab.

Constructing a generic image segmentation model

By the end of this first part we will have a generically applicable deep convolutional neural network model that can be used for a variety of image segmentation tasks at landscape and smaller scales. The model is based on the popular U-Net model. We'll see how well it works on segmenting vegetation in a couple of different aerial (UAV) datasets.
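For orientation, the encoder-skip-decoder pattern that defines a U-Net can be sketched in Keras with just one downsampling and one upsampling step. The course model is deeper, and the filter counts here are arbitrary, but the structure is the same:

```python
import tensorflow as tf
from tensorflow.keras import layers

def mini_unet(input_shape=(512, 512, 3), num_filters=16):
    """A deliberately tiny U-Net: one encoder step, a bottleneck,
    and one decoder step with a skip connection."""
    inputs = layers.Input(shape=input_shape)
    # Encoder: convolve, then halve the spatial resolution
    c1 = layers.Conv2D(num_filters, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    # Bottleneck at the lowest resolution
    b = layers.Conv2D(num_filters * 2, 3, activation="relu", padding="same")(p1)
    # Decoder: upsample and concatenate the skip connection from c1
    u1 = layers.UpSampling2D()(b)
    u1 = layers.concatenate([u1, c1])
    c2 = layers.Conv2D(num_filters, 3, activation="relu", padding="same")(u1)
    # One sigmoid channel: per-pixel probability of the target class
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c2)
    return tf.keras.Model(inputs, outputs)

model = mini_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The skip connection is the key design choice: it lets the decoder reuse fine spatial detail from the encoder that would otherwise be lost to pooling.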

We'll also use transfer learning to initialize a model trained on a similar dataset/class, examining the benefits of 'warm starting' a model, which is transferring the weights of a model trained on one dataset to initiate the training of another.
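Mechanically, warm starting is just copying weights between two architecturally identical models before training the second one. A minimal Keras sketch; the tiny stand-in network here is hypothetical, but with two identically shaped U-Nets the calls are the same:

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_model():
    """Stand-in for the U-Net: any two models with identical
    architecture can exchange weights the same way."""
    inp = tf.keras.Input(shape=(8, 8, 3))
    x = layers.Conv2D(4, 3, padding="same", activation="relu")(inp)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inp, out)

source = make_model()  # imagine this was trained on "aeroscapes" vegetation
target = make_model()  # fresh model for the Semantic Drone Dataset
target.set_weights(source.get_weights())  # warm start: copy all weights over
```

After the copy, `target` is trained as usual; it simply starts from informed weights rather than random ones.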

Part 1 recap

We went through a complete generic workflow in `keras` for binary image segmentation (binary in the sense that there are two classes: the class of interest and the background), involving custom (and customizable) functions that I've developed. I've found this workflow can be tweaked to suit a number of image segmentation tasks.

2: Using a binary segmentation model for a multiclass segmentation problem

Click on the link below. The link will launch a Jupyter notebook in Google Colab.

Using a binary segmentation model for a multiclass segmentation problem

By the end of this second part we will have a generically applicable workflow for combining deep convolutional neural network models trained on binary classes. We will have seen how it is possible to combine the output predictions of all of those models into a custom multiclass segmentation of an image, and how to refine that segmentation.

Part 2 recap

We have hopefully convinced you of the benefits of a specific approach for segmentation of natural scenery; namely, treating classification as a series of binary decisions. We treat each class separately by considering it against a background of "everything else". That way, we can evaluate each class independently, decide which classes to use, and have more options for evaluating regions predicted to be more than one thing.

The outputs of several U-Net models, each trained for a separate class, were combined into a custom multiclass segmentation. Post-processing was then applied to those multiclass predictions, using median filters and conditional random fields, to refine the labels for each image.
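The merging step can be sketched with made-up 2 x 2 probability maps from three hypothetical binary models: stack each model's per-pixel score alongside a derived background score, then take the per-pixel argmax. Median filtering and conditional random fields would then be applied to `label` as post-processing.

```python
import numpy as np

# Made-up per-pixel probabilities from three independent binary U-Nets
grass = np.array([[0.9, 0.1], [0.2, 0.1]])
veg   = np.array([[0.2, 0.8], [0.1, 0.1]])
trees = np.array([[0.1, 0.1], [0.7, 0.1]])

# Background score: high wherever no binary model is confident
background = 1.0 - np.maximum.reduce([grass, veg, trees])

# Stack along the class axis and take the most probable class per pixel
stacked = np.stack([background, grass, veg, trees], axis=-1)
label = np.argmax(stacked, axis=-1)  # 0=background, 1=grass, 2=veg, 3=trees
```

Because the binary scores are kept until the final argmax, pixels claimed by more than one model can be inspected before a single label is committed.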

Going further

If you found this course useful, you should try the other Marda Science Deep Learning courses!

Click on the links below:

Advanced binary segmentation with U-Nets: detecting intertidal reefs