
Semantic Segmentation of Geospatial Imagery with Deep Learning Part 5 – Training the Model

Introduction

In the previous post, we defined the U-Net architecture and implemented a class to access our training samples, so we are now ready to train the model. This first requires splitting our image crops into several datasets, each playing a different role in the training and evaluation process. We can then train the model on these datasets and obtain a first indication of its performance. This blog post is strongly influenced by the excellent tutorial posted by PyImageSearch and this tutorial from the official documentation.

Creating Training, Validation and Testing Datasets

There are three datasets that need to be created. First, the training dataset is used to adjust the weights of the model. Second, the validation dataset provides a performance measure during training, for example to detect whether the model overfits the training dataset. Third, the testing dataset does not take part in the training process at all but rather provides an independent measure of how well the model performs; it is therefore also well suited for comparing different models. We are going to use 60% of all samples for the training dataset, 20% for the validation dataset and the remaining 20% for the testing dataset. The following script can be used for this purpose:

from sklearn.model_selection import train_test_split

def write_image_paths(images, location):
    with open(location, "w") as images_file:
        for image in images:
            images_file.write(image)
            images_file.write("\n")
       
    
# read all our potential training data
image_dataset_path =  "./samples/swissimage_paths.txt"
mask_dataset_path = "./samples/mask_paths.txt"

with open(image_dataset_path) as images_file:
    image_paths = images_file.readlines()
    images = [image_path.rstrip() for image_path in image_paths]

with open(mask_dataset_path) as mask_file:
    mask_paths = mask_file.readlines()
    masks = [mask_path.rstrip() for mask_path in mask_paths]
  
    
# split the datasets into training and validation subsets
train_validation_splitted = train_test_split(images, masks, test_size=0.2, random_state=123)

images_train_temp = train_validation_splitted[0]
masks_train_temp = train_validation_splitted[2]

images_validation = train_validation_splitted[1]
masks_validation = train_validation_splitted[3]

write_image_paths(images_validation, "./samples/validation_paths_images.txt")
write_image_paths(masks_validation, "./samples/validation_paths_masks.txt")


# split the remaining training dataset into the actual training dataset and test subsets
train_test_splitted = train_test_split(images_train_temp, masks_train_temp, test_size=0.25, random_state=456)

images_train = train_test_splitted[0]
masks_train = train_test_splitted[2]

images_test = train_test_splitted[1]
masks_test = train_test_splitted[3]

write_image_paths(images_test, "./samples/test_paths_images.txt")
write_image_paths(masks_test, "./samples/test_paths_masks.txt")

write_image_paths(images_train, "./samples/train_paths_images.txt")
write_image_paths(masks_train, "./samples/train_paths_masks.txt")

First, we load the paths of our SWISSIMAGE samples and the paths of our lake mask samples as lists. Then we use the train_test_split function from scikit-learn, which conveniently splits the data into a preliminary training dataset holding 80% of the samples and a validation dataset holding the remaining 20%. The selection is random, but it is performed in such a way that the SWISSIMAGE and lake mask samples stay paired. The resulting validation lists are written to disk as text files. We then apply train_test_split a second time to split the preliminary training dataset into the final training dataset and the testing dataset: a test_size of 0.25 applied to the remaining 80% corresponds to 20% of the original samples, leaving 60% for training and 20% for testing. The respective path lists are likewise written to disk.
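
To verify that the proportions come out as intended, we can print the sizes of the three subsets. The following is a minimal sketch that assumes it is appended to the script above, so that the lists images, images_train, images_validation and images_test are still in memory:

# quick sanity check of the 60/20/20 split (append to the script above)
total = len(images)

for name, subset in [("train", images_train), ("validation", images_validation), ("test", images_test)]:
    print(f"{name}: {len(subset)} samples ({len(subset) / total:.0%} of all samples)")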

The Training Loop

The training loop is used to gradually adjust the weights of the model. It does so by iterating over the whole training dataset multiple times. The code that can be used for this purpose is presented below:

from model import UNet

from dataset import SegmentationDataset

import torch
from torch.nn import BCEWithLogitsLoss
from torch.optim import Adam
from torch.utils.data import DataLoader

batch_size = 4
num_workers = 2
num_epochs = 100


def load_paths(path):
    with open(path) as images_file:
        image_paths = images_file.readlines()
        images = [image_path.rstrip() for image_path in image_paths]

    return images

    
def run():
    # load the training and validation datasets
    images_train = load_paths("./samples/train_paths_images.txt")
    masks_train = load_paths("./samples/train_paths_masks.txt")

    images_validation = load_paths("./samples/validation_paths_images.txt")
    masks_validation = load_paths("./samples/validation_paths_masks.txt")


    # set up the training and validation data loaders
    train_ds = SegmentationDataset(swissimage_paths=images_train, mask_paths=masks_train)
    validation_ds = SegmentationDataset(swissimage_paths=images_validation, mask_paths=masks_validation)

    train_loader = DataLoader(train_ds, shuffle=True, batch_size=batch_size, pin_memory=True, num_workers=num_workers)
    validation_loader = DataLoader(validation_ds, shuffle=False, batch_size=batch_size, pin_memory=True, num_workers=num_workers)


    # set up model and metaparameters
    unet = UNet().to("cuda")
    loss_function = BCEWithLogitsLoss()
    optimizer = Adam(unet.parameters(), lr=0.001)


    # calculate training and validation steps
    train_steps = len(train_ds) // batch_size
    validation_steps = len(validation_ds) // batch_size

    train_loss = []
    validation_loss = []


    # training loop
    for epoch in range(num_epochs):
        unet.train()

        total_train_loss = 0
        total_validation_loss = 0
        
        # loop over the training set and determine the training loss
        for (i, (swissimage, lake_mask)) in enumerate(train_loader):
            
            # zero out gradients
            optimizer.zero_grad()
            
            # make predictions
            (swissimage, lake_mask) = (swissimage.to("cuda"), lake_mask.to("cuda"))
            lake_mask_predicted = unet(swissimage.float())
            
            # calculate loss
            loss = loss_function(lake_mask_predicted.float(), lake_mask.float())

            # perform backpropagation and adjust the weights
            loss.backward()
            optimizer.step()

            total_train_loss += loss.item()
        
        
        # loop over the validation set and determine the validation loss
        with torch.no_grad():
            unet.eval()
            
            # loop over the validation set
            for (swissimage, lake_mask) in validation_loader:
                (swissimage, lake_mask) = (swissimage.to("cuda"), lake_mask.to("cuda"))

                # calculate loss
                lake_mask_predicted = unet(swissimage.float())
                loss = loss_function(lake_mask_predicted.float(), lake_mask.float())
                total_validation_loss += loss.item()
        
        
        # calculate the average training and validation loss
        average_train_loss = total_train_loss / train_steps
        average_validation_loss = total_validation_loss / validation_steps
        
        # keep track of losses; loss.item() already returned plain Python floats,
        # so the averages can be appended directly
        train_loss.append(average_train_loss)
        validation_loss.append(average_validation_loss)
        
        # print the model training and validation information
        print("epoch", epoch, "of", num_epochs)
        print("train loss", average_train_loss, "validation loss", average_validation_loss)
        

    # save the trained unet model
    torch.save(unet, "model.pth")

    
if __name__ == "__main__":
    run()

First, we load the paths of the training and validation samples and instantiate the SegmentationDataset objects. These datasets are then wrapped in PyTorch's DataLoader class, which takes care of batching and shuffling the image samples. Next, we instantiate the U-Net model we implemented in the previous post and transfer it into video memory via the .to("cuda") method. As the optimizer we use Adam with a learning rate of 0.001, and as the loss function we use binary cross entropy with logits. We also calculate the number of steps it takes to iterate over the whole training dataset and the validation dataset. For this purpose, we use the batch_size, i.e., the number of samples that are fed into the model at once, in this case 4.

Then the actual training loop begins. Each iteration of this loop is called an "epoch"; the number of epochs is set to 100. At the start of each epoch, the model is put into training mode to make sure that the weights can be adjusted. We then iterate over all the batches provided by the training DataLoader. For each batch, we first zero out any gradients that might have been generated previously. Next, we transfer the SWISSIMAGE samples and the corresponding lake mask groundtruth images to video memory and feed the SWISSIMAGE samples into the model, which stores its predictions in the variable lake_mask_predicted. The loss function calculates the loss by comparing the predicted masks with the groundtruth masks. Based on this loss, backpropagation is performed and the optimizer adjusts the weights.

Once all training batches of an epoch have been processed, gradient calculation is disabled and the model is set to evaluation mode. We then iterate over all the batches of the validation dataset and compare the predictions of the U-Net with the groundtruth in the same way, accumulating the validation loss. At the end of each epoch, the average training and validation losses are calculated, stored and printed to the console. Finally, once all epochs have completed, the trained model is saved to disk.
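
Since run() only prints the losses to the console, it can be useful to visualise how they develop over the epochs. The following is a minimal sketch, assuming run() is modified to return the train_loss and validation_loss lists (the script above does not do this); it uses matplotlib to plot both curves:

import matplotlib.pyplot as plt

def plot_losses(train_loss, validation_loss, output_path="loss_curves.png"):
    # plot the average loss per epoch for both datasets
    epochs = range(1, len(train_loss) + 1)
    plt.figure()
    plt.plot(epochs, train_loss, label="training loss")
    plt.plot(epochs, validation_loss, label="validation loss")
    plt.xlabel("epoch")
    plt.ylabel("average BCE-with-logits loss")
    plt.legend()
    plt.savefig(output_path)

# example usage (hypothetical, requires run() to return the two lists):
# train_loss, validation_loss = run()
# plot_losses(train_loss, validation_loss)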

Conclusion

In this post, we have discussed how the samples can be split up into training, validation and testing datasets and how the training loop in PyTorch allows one to train the model based on the training dataset. The validation dataset serves as a performance indicator during the training procedure. In the next post, we are going to have a look at how the testing dataset can be used to calculate more meaningful performance metrics and how the model can be used in practice.
