Supervised Techniques¶

Pixel Classification¶

The intent of the classification process is to categorize all pixels in a digital image into one of several land cover classes, or "themes". This categorized data may then be used to produce thematic maps of the land cover present in an image. Normally, multispectral data are used to perform the classification and, indeed, the spectral pattern present within the data for each pixel is used as the numerical basis for categorization (Lillesand and Kiefer, 1994). The objective of image classification is to identify and portray, as a unique gray level (or color), the features occurring in an image in terms of the object or type of land cover these features actually represent on the ground.

With supervised classification, we identify examples of the information classes (i.e., land cover types) of interest in the image. These are called "training sites". The image processing software is then used to develop a statistical characterization of the reflectance for each information class. This stage is often called "signature analysis" and may involve a characterization as simple as the mean or the range of reflectance in each band, or as complex as detailed analyses of the means, variances, and covariances over all bands. Once a statistical characterization has been achieved for each information class, the image is classified by examining the reflectance of each pixel and deciding which of the signatures it resembles most.

image.png

Principle of supervised learning for remote sensing data.

  • Each pixel is associated with features (e.g., the spectral bands) AND known labels Y.

  • Knowledge about class membership is used in the feature space to train a learner.

  • The learner defines a decision boundary in the feature space.

  • Finally, the unknown pixels are classified with respect to the decision boundary found, and a classification map is produced.

image.png

Most Popular Supervised Learning Algorithms¶

Logistic Regression

Logistic Regression is used to estimate discrete values (usually binary values like 0/1) from a set of independent variables. It helps predict the probability of an event by fitting data to a logit function. It is also called logit regression.
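
As a minimal sketch of this idea (scikit-learn is used later in this notebook; the toy 1-D data below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D feature: small values belong to class 0, large values to class 1.
X = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The fitted logit function yields a class-1 probability,
# which is thresholded at 0.5 to produce the 0/1 label.
proba = model.predict_proba([[4.2]])[0, 1]
label = model.predict([[4.2]])[0]
```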

Decision Tree

The Decision Tree algorithm is one of the most popular machine learning algorithms in use today; it is a supervised learning algorithm used for classification problems. It works well for both categorical and continuous dependent variables. The algorithm splits the population into two or more homogeneous sets based on the most significant attributes/independent variables.
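
A short sketch of a decision tree splitting a toy population into two homogeneous sets (the data are invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Two clusters of points; a single split on either feature separates them.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
pred = tree.predict([[0.5, 0.5], [5.5, 5.5]])  # one point from each cluster
```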

SVM (Support Vector Machine) Algorithm

The SVM algorithm is a classification method in which the raw data are plotted as points in an n-dimensional space (where n is the number of features). The value of each feature is tied to a particular coordinate. Separating hyperplanes (lines, in two dimensions) are then used to split the data into classes.
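
A minimal sketch with invented 2-D points, where a linear kernel finds the separating hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in a 2-dimensional feature space.
X = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel='linear').fit(X, y)  # fits a separating hyperplane
pred = svm.predict([[1.5, 1.5], [8.5, 8.5]])
```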

Naive Bayes Algorithm

A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if these features are related to each other, a Naive Bayes classifier considers all of them independently when calculating the probability of a particular outcome. A Naive Bayesian model is easy to build and useful for massive datasets; despite its simplicity, it can rival far more sophisticated classification methods.
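
As a sketch, scikit-learn's GaussianNB models each feature independently as a per-class Gaussian (the toy data are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each feature is modelled independently, one Gaussian per class.
X = np.array([[1.0, 5.0], [1.2, 5.1], [0.9, 4.8],
              [4.0, 1.0], [4.1, 1.2], [3.9, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)
pred = nb.predict([[1.1, 5.0], [4.0, 1.1]])
```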

KNN (K- Nearest Neighbors) Algorithm

This algorithm can be applied to both classification and regression problems, although in practice it is more widely used for classification. It is a simple algorithm that stores all available cases and classifies a new case by a majority vote of its k nearest neighbours, as measured by a distance function; the case is assigned to the class with which it has the most in common. KNN can be easily understood through a real-life analogy: if you want information about a person, it makes sense to talk to his or her friends and colleagues!
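
The distance-plus-majority-vote mechanism can be sketched directly in NumPy (a simplified illustration, not a production implementation):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by a majority vote among its k nearest training cases."""
    dists = np.linalg.norm(X_train - x, axis=1)   # the distance function
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest cases
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]              # majority vote

X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
label = knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3)
```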

Random Forest

A collection of decision trees is called a Random Forest. To classify a new object based on its attributes, each tree produces a classification, i.e., "votes" for a class, and the forest chooses the classification having the most votes (over all the trees in the forest).
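
The voting can be made visible with scikit-learn's RandomForestClassifier (the same estimator used later in this notebook), by querying the individual trees of a forest fitted on invented toy data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Each tree votes for a class; the forest returns the majority class.
votes = [tree.predict([[5.5, 5.5]])[0] for tree in rf.estimators_]
forest_pred = rf.predict([[5.5, 5.5]])[0]
```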

Gradient Boosting Algorithm and AdaBoosting Algorithm

These are boosting algorithms, used when massive amounts of data have to be handled to make predictions with high accuracy. Boosting is an ensemble learning technique that combines the predictive power of several base estimators to improve robustness: in short, it combines multiple weak or average predictors to build a strong predictor. Boosting algorithms tend to perform very well in data science competitions such as Kaggle, AV Hackathon, and CrowdAnalytix, and they are among the most widely used machine learning algorithms today.
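
A short sketch with scikit-learn's AdaBoostClassifier, which by default combines shallow decision trees into a stronger predictor (the toy data and the number of estimators are arbitrary choices for illustration):

```python
from sklearn.ensemble import AdaBoostClassifier

# Boosting reweights the samples that earlier weak learners got wrong,
# then combines all weak learners into one strong predictor.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

ada = AdaBoostClassifier(n_estimators=20, random_state=0).fit(X, y)
pred = ada.predict([[0.5, 0.5], [5.5, 5.5]])
```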

Artificial Neural Networks

An artificial neural network (ANN) comprises ‘units’ arranged in a series of layers, each of which connects to layers on either side. ANNs are inspired by biological systems, such as the brain, and how they process information. ANNs are essentially a large number of interconnected processing elements, working in unison to solve specific problems. ANNs also learn by example and through experience, and they are extremely useful for modelling non-linear relationships in high-dimensional data or where the relationship amongst the input variables is difficult to understand.
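
A minimal sketch with scikit-learn's MLPClassifier, a small fully connected network (the layer size and iteration count are arbitrary choices for this toy example):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# One hidden layer of 8 units; every unit connects to the layers on either side.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)
pred = mlp.predict([[0.5, 0.5], [5.5, 5.5]])
```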

As an example, let's use GEE to download a Sentinel 2 image from the year 2020 and its corresponding LULC map. From these data we train a classifier and predict the LULC map of the year 2021 from an image of that year.

First, let's install and import the necessary packages. After that we will authenticate Drive and GEE.

In [ ]:
!pip install rasterio earthengine-api geopandas
Successfully installed affine-2.3.0 click-plugins-1.1.1 cligj-0.7.2 rasterio-1.2.6 snuggs-1.4.7
Successfully installed fiona-1.8.20 geopandas-0.9.0 munch-2.5.0 pyproj-3.1.0
In [ ]:
import ee
ee.Authenticate()
ee.Initialize(project='my-project-1527255156007')
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [ ]:
import folium
from folium import plugins
from IPython.display import Image
import rasterio
import numpy as np
from matplotlib import pyplot as plt
import cv2
from matplotlib import cm
import pandas as pd
import seaborn as sns
import json
from rasterio.mask import mask
from skimage.feature import graycomatrix, graycoprops

We define our study area and the date ranges of the images:

In [ ]:
AOI = ee.Geometry.Polygon(
        [[[5.516689929215324, 52.79285519485472],
          [5.516689929215324, 52.51126453687936],
          [6.181362780777824, 52.51126453687936],
          [6.181362780777824, 52.79285519485472]]])
In [ ]:
startDateviz_before = ee.Date.fromYMD(2020, 3, 1)
endDateviz_before = ee.Date.fromYMD(2020, 10, 31)
startDateviz_after = ee.Date.fromYMD(2021, 1, 1)
endDateviz_after = ee.Date.fromYMD(2021, 7, 31)

collectionviz_before = (ee.ImageCollection("COPERNICUS/S2_HARMONIZED")
                        .filterDate(startDateviz_before, endDateviz_before)
                        .filterBounds(AOI)
                        .filterMetadata('CLOUDY_PIXEL_PERCENTAGE', 'less_than', 1))
collectionviz_after = (ee.ImageCollection("COPERNICUS/S2_HARMONIZED")
                       .filterDate(startDateviz_after, endDateviz_after)
                       .filterBounds(AOI)
                       .filterMetadata('CLOUDY_PIXEL_PERCENTAGE', 'less_than', 1))

We select the 2020 LULC dataset:

In [ ]:
esri_lulc10 = ee.ImageCollection("projects/sat-io/open-datasets/landcover/ESRI_Global-LULC_10m")

Now we can use Folium to plot the data:

In [ ]:
colors =  ["#1A5BAB","#358221","#A7D282","#87D19E","#FFDB5C","#EECFA8","#ED022A","#EDE9E4","#F2FAFF","#C8C8C8"]
In [ ]:
esri_lulc10 = esri_lulc10.mosaic().clip(AOI).rename('LULC')
vis_params_lulc = {'min': 1, 'max': 10, 'palette':colors}
In [ ]:
S2_before = collectionviz_before.median().clip(AOI).divide(10000)
S2_after = collectionviz_after.median().clip(AOI).divide(10000)

vis_params = {'min': 0, 'max': 0.4, 'bands': ['B4', 'B3','B2']}
In [ ]:
basemaps = {
    'Google Satellite': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Satellite',
        overlay = True,
        control = True
    ),
    'Google Terrain': folium.TileLayer(
        tiles = 'https://mt1.google.com/vt/lyrs=p&x={x}&y={y}&z={z}',
        attr = 'Google',
        name = 'Google Terrain',
        overlay = True,
        control = True
    )}
In [ ]:
def add_ee_layer(self, ee_object, vis_params, name):
    """Add an Earth Engine object as a layer to a folium map."""
    try:
        # display ee.Image()
        if isinstance(ee_object, ee.image.Image):
            map_id_dict = ee.Image(ee_object).getMapId(vis_params)
            folium.raster_layers.TileLayer(
                tiles=map_id_dict['tile_fetcher'].url_format,
                attr='Google Earth Engine',
                name=name,
                overlay=True,
                control=True
            ).add_to(self)
        # display ee.ImageCollection()
        elif isinstance(ee_object, ee.imagecollection.ImageCollection):
            ee_object_new = ee_object.mosaic()
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
                tiles=map_id_dict['tile_fetcher'].url_format,
                attr='Google Earth Engine',
                name=name,
                overlay=True,
                control=True
            ).add_to(self)
        # display ee.Geometry()
        elif isinstance(ee_object, ee.geometry.Geometry):
            folium.GeoJson(
                data=ee_object.getInfo(),
                name=name,
                overlay=True,
                control=True
            ).add_to(self)
        # display ee.FeatureCollection()
        elif isinstance(ee_object, ee.featurecollection.FeatureCollection):
            ee_object_new = ee.Image().paint(ee_object, 0, 2)
            map_id_dict = ee.Image(ee_object_new).getMapId(vis_params)
            folium.raster_layers.TileLayer(
                tiles=map_id_dict['tile_fetcher'].url_format,
                attr='Google Earth Engine',
                name=name,
                overlay=True,
                control=True
            ).add_to(self)

    except Exception:
        print("Could not display {}".format(name))

# Add the EE drawing method to folium.
folium.Map.add_ee_layer = add_ee_layer
In [ ]:
my_map = folium.Map(location=AOI.centroid().coordinates().reverse().getInfo(), zoom_start=11)

# Add custom basemaps
basemaps['Google Terrain'].add_to(my_map)

# Add the LULC map and the Sentinel 2 composites to the map object.
my_map.add_ee_layer(esri_lulc10, vis_params_lulc, 'Netherlands LULC 2020')
my_map.add_ee_layer(S2_before, vis_params, 'Netherlands 2020')
my_map.add_ee_layer(S2_after, vis_params, 'Netherlands 2021')
my_map.add_child(folium.LayerControl())

# Display the map.
display(my_map)

Now let's download this data to Drive:

In [ ]:
S2_before = S2_before.addBands(esri_lulc10)
S2_before = S2_before.toFloat()
S2_after = S2_after.toFloat()
In [ ]:
image_before = S2_before.select(['B2','B3','B4','B5','B6','B7','B8','B8A','B11','B12','LULC']).reproject('EPSG:4326', None, 20)
image_after = S2_after.select(['B2','B3','B4','B5','B6','B7','B8','B8A','B11','B12']).reproject('EPSG:4326', None, 20)
In [ ]:
task = ee.batch.Export.image.toDrive(image=image_before,
                                         crs='EPSG:4326',
                                         scale=20,
                                         fileFormat='GeoTIFF',
                                         description='Netherlands_2020' ,
                                         maxPixels=1e13,
                                         folder='Datasets',
                                         region= AOI)
In [ ]:
task.start()
In [ ]:
task2 = ee.batch.Export.image.toDrive(image=image_after,
                                         crs='EPSG:4326',
                                         scale=20,
                                         fileFormat='GeoTIFF',
                                         description='Netherlands_2021' ,
                                         maxPixels=1e13,
                                         folder='Datasets',
                                         region= AOI)
In [ ]:
task2.start()

We are going to use rasterio to import the data:

In [ ]:
path_before = '/content/drive/MyDrive/Datasets_CV/Netherlands_2020.tif'
path_after= '/content/drive/MyDrive/Datasets_CV/Netherlands_2021.tif'
In [ ]:
with rasterio.open(path_before) as src:
    im_before = src.read()
In [ ]:
with rasterio.open(path_after) as src:
    im_after = src.read()
    out_meta = src.meta.copy()
In [ ]:
im_before = im_before.transpose([1,2,0])
im_after = im_after.transpose([1,2,0])

The 2020 image contains the 10 selected Sentinel 2 bands plus LULC, and the 2021 image contains only the 10 Sentinel 2 bands:

In [ ]:
im_before.shape
Out[ ]:
(1571, 3701, 11)
In [ ]:
im_after.shape
Out[ ]:
(1571, 3701, 10)

As the 2020 data will be used for model training, let's call the spectral values X and the LULC labels Y.

In [ ]:
X = im_before[:,:,0:-1]
Y = im_before[:,:,-1]

Let's use the nan_to_num function to transform possible NaN values into zeros:

In [ ]:
X = np.nan_to_num(X)
Y = np.nan_to_num(Y)

We will also flatten the spatial dimensions: the HxWxC cube of spectral values becomes an (HxW)xC matrix, and the HxW label image becomes a one-dimensional vector.

In [ ]:
flatten_X = X.reshape(X.shape[0]*X.shape[1],X.shape[2])
In [ ]:
flatten_Y = Y.flatten()

Here, we will delete the 0 values in both the spectral data matrix and the labels vector:

In [ ]:
flatten_X = np.delete(flatten_X, np.where(flatten_Y == 0), axis = 0)
In [ ]:
flatten_Y = np.delete(flatten_Y, np.where(flatten_Y == 0))

The variable flatten_X is a matrix with one row per pixel and one column per spectral band:

In [ ]:
flatten_X.shape
Out[ ]:
(5804108, 10)

The variable flatten_Y holds the LULC label of each pixel of the target map:

In [ ]:
flatten_Y.shape
Out[ ]:
(5804108,)

Let's import some Sklearn functions:

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import confusion_matrix

Now we'll split the data into training data and testing data:

In [ ]:
X_train, X_test, Y_train, Y_test = train_test_split(flatten_X, flatten_Y, test_size = 0.3, random_state = 42)

To reduce memory usage, we can delete some variables that we will no longer use:

In [ ]:
del X,Y,flatten_X,flatten_Y

It's time to create the classifier instance. Let's use Random Forest, with 50 estimators:

In [ ]:
clf=RandomForestClassifier(n_estimators=50, n_jobs=8, verbose=3)

And then we can fit the classifier with training data:

In [ ]:
clf.fit(X_train,Y_train)
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
building tree 1 of 50
building tree 2 of 50
building tree 3 of 50
building tree 4 of 50
building tree 5 of 50
building tree 6 of 50
building tree 7 of 50building tree 8 of 50

building tree 9 of 50
building tree 10 of 50
building tree 11 of 50
building tree 12 of 50
building tree 13 of 50
building tree 14 of 50
building tree 15 of 50
building tree 16 of 50
building tree 17 of 50
building tree 18 of 50
building tree 19 of 50
building tree 20 of 50
building tree 21 of 50
building tree 22 of 50
building tree 23 of 50
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:  7.0min
building tree 24 of 50
building tree 25 of 50
building tree 26 of 50
building tree 27 of 50
building tree 28 of 50
building tree 29 of 50
building tree 30 of 50
building tree 31 of 50
building tree 32 of 50
building tree 33 of 50
building tree 34 of 50
building tree 35 of 50
building tree 36 of 50
building tree 37 of 50
building tree 38 of 50
building tree 39 of 50
building tree 40 of 50
building tree 41 of 50
building tree 42 of 50
building tree 43 of 50
building tree 44 of 50
building tree 45 of 50
building tree 46 of 50
building tree 47 of 50
building tree 48 of 50
building tree 49 of 50
building tree 50 of 50
[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed: 18.2min finished
Out[ ]:
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=50, n_jobs=8,
                       oob_score=False, random_state=None, verbose=3,
                       warm_start=False)

Due to the amount of data, training may take a while. When it finishes, we can predict on the test data:

In [ ]:
y_pred=clf.predict(X_test)
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:   19.4s
[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:   51.7s finished

Let's get the accuracy values from the training data and from the testing data:

In [ ]:
print("Accuracy on training set: {:.2f}".format(clf.score(X_train, Y_train)))
print("Accuracy on test set: {:.2f}".format(clf.score(X_test, Y_test)))
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:   37.2s
[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:  1.8min finished
Accuracy on training set: 1.00
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:   15.7s
[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:   46.4s finished
Accuracy on test set: 0.88

We can also use classification_report to get other metrics:

In [ ]:
print(classification_report(Y_test, y_pred))
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
              precision    recall  f1-score   support

         1.0       0.96      0.96      0.96    307473
         2.0       0.69      0.74      0.72     72100
         3.0       0.59      0.17      0.27     63589
         4.0       0.48      0.42      0.45     32832
         5.0       0.89      0.96      0.92   1073949
         6.0       0.48      0.18      0.26     13990
         7.0       0.80      0.65      0.72    177055
         8.0       0.94      0.18      0.31       245
        10.0       0.00      0.00      0.00         0

    accuracy                           0.88   1741233
   macro avg       0.65      0.48      0.51   1741233
weighted avg       0.86      0.88      0.86   1741233

One of the main ways to get an overview of the result of the test data prediction is through the confusion matrix.

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
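
As a worked toy example of how the counts are laid out (rows are true classes, columns are predicted classes; the labels are invented):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0]

# cm[i, j] counts the samples of true class i that were predicted as class j.
cm = confusion_matrix(y_true, y_pred)
# Here: 2 true 0s correctly predicted, 1 confused as class 1;
#       3 true 1s correctly predicted, 1 confused as class 0.
```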

In [ ]:
c_matrix = confusion_matrix(Y_test, y_pred)
In [ ]:
names = [ "Water","Trees","Grass","Flooded Vegetation","Crops","Scrub/Shrub","Built Area","Bare Ground","Clouds"]

Let's use matplotlib and seaborn to plot the confusion matrix:

In [ ]:
r1 = pd.DataFrame(data=c_matrix, index= names, columns=names)
fig, ax = plt.subplots(figsize=(16,16))
ax = sns.heatmap(r1, annot=True, annot_kws={"size": 18},fmt='d',cmap="Blues", cbar = False)
#for t in ax.texts: t.set_text(t.get_text() + " %")
ax.tick_params(labelsize=16)
ax.set_yticklabels(names, rotation=45)
ax.set_ylabel('True')
ax.set_xlabel('Predict')
Out[ ]:
Text(0.5, 59.139999999999944, 'Predict')

After analyzing the metrics, we have to decide if this model is good enough or if we need to improve it. With the model in hand, it is time to predict the LULC map for the 2021 data:

First we convert the NaN values to zeros.

In [ ]:
Predict_img = np.nan_to_num(im_after)

We flatten the image, resulting in a vector of pixels for each spectral band:

In [ ]:
Predict_img = Predict_img.reshape(im_after.shape[0]*im_after.shape[1],im_after.shape[2])

And we use our trained model to make the inference:

In [ ]:
array_pred=clf.predict(Predict_img)
[Parallel(n_jobs=8)]: Using backend ThreadingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:   32.4s
[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:  1.5min finished

Now we set the prediction to 0 for pixels that were no-data in the original image:

In [ ]:
array_pred = np.where(Predict_img[:,0] == 0,0,array_pred)

Let's convert the data to unsigned 8-bit integers:

In [ ]:
array_pred = array_pred.astype('uint8')

Finally, we restore the original shape of the image:

In [ ]:
array_pred = array_pred.reshape(im_after.shape[0],im_after.shape[1])

We can plot the resulting image with some chosen colormap:

In [ ]:
plt.figure(figsize=[14,10])
plt.imshow(array_pred,cmap='tab20c')
plt.axis('off')
Out[ ]:
(-0.5, 3700.5, 1570.5, -0.5)

If necessary, you can export the image as a GeoTIFF (.tif) file:

In [ ]:
export_image = array_pred[np.newaxis,:,:]
In [ ]:
out_meta.update({"driver": "GTiff",
                  "height": export_image.shape[1],
                  "width": export_image.shape[2],
                  "compress":'lzw',
                  "count":1
                  })
In [ ]:
with rasterio.open('/content/Netherlands_2021_LULC.tif', "w", **out_meta) as dest:
    dest.write(export_image)

OBIA – Object-Based Image Analysis (GEOBIA) Classification¶

Object-based image analysis (OBIA), a technique used to analyze digital imagery, was developed relatively recently compared to traditional pixel-based image analysis (Burnett and Blaschke 2003). While pixel-based image analysis is based on the information in each pixel, object-based image analysis is based on information from a set of similar pixels called objects or image objects. More specifically, image objects are groups of pixels that are similar to one another based on a measure of spectral properties (i.e., color), size, shape, and texture, as well as context from a neighborhood surrounding the pixels.

OBIA does not analyze a single pixel, but rather a homogeneous group of pixels — image objects. An object, contrary to a pixel, provides richer information, including spectrum, texture, shape, spatial relationships, and ancillary spatial data. In turn, spatial context can be exploited to emulate a human analyst, who intuitively identifies various objects in an image, rather than individual pixels, by considering properties such as size, texture, shape, and the spatial arrangement of these objects to understand the semantics. The fundamental objective of OBIA, therefore, is to use segmentation to reduce the complexity of the image data, and to use the extracted image objects and their corresponding attributes (features) to develop strategies for thematic classification. With OBIA, images are segmented into image objects from various groups of pixels to represent meaningful objects in the scene. Ontologically, this provides more accurate and reliable identification and extraction of real-world features, and at more appropriate scales. Moreover, it provides the opportunity to divide map data into homogeneous objects of various spatial scales.

image.png

In this example, we are going to use the 2020 Sentinel 2 image of our study area to perform an OBIA classification. First we segment the image, collect sample points from predefined classes, extract features from the objects, train a classifier, and finally perform inference for all image segments.

Import the Image¶

We start by importing the image and building an RGB composite so that we can plot it with matplotlib:

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import rasterio

path_before = '/content/drive/MyDrive/Datasets_CV/Netherlands_2020.tif'
In [ ]:
src = rasterio.open(path_before)
im_before = src.read()
In [ ]:
im_before = im_before.transpose([1,2,0])
In [ ]:
red = im_before[:,:,2]    # B4
green = im_before[:,:,1]  # B3
blue = im_before[:,:,0]   # B2
In [ ]:
rgb = np.dstack((red, green, blue))
plt.figure(figsize=[12,12])
plt.imshow(rgb)
plt.axis('off')
Out[ ]:
(-0.5, 3700.5, 1570.5, -0.5)
No description has been provided for this image
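Raw Sentinel-2 reflectance values are skewed toward low values, so a composite passed straight to imshow can look dark. A percentile contrast stretch usually improves the display; this is only a sketch on synthetic data, and the function name and the 2/98 bounds are illustrative choices, not from the notebook:

```python
import numpy as np

def percentile_stretch(band, lower=2, upper=98):
    """Clip a band to its [lower, upper] percentiles and rescale to [0, 1]."""
    lo, hi = np.percentile(band, (lower, upper))
    return np.clip((band - lo) / (hi - lo + 1e-12), 0, 1)

# Synthetic skewed band standing in for raw reflectance
rng = np.random.default_rng(0)
band = rng.gamma(2.0, 0.05, size=(100, 100))
stretched = percentile_stretch(band)
```

Applying this to each of red, green, and blue before np.dstack would give a brighter composite.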

Segment the Image¶

As we saw in previous lessons, we now segment the image. First we convert the RGB image to float:

In [ ]:
from skimage.segmentation import felzenszwalb, slic, quickshift, watershed
from skimage.segmentation import mark_boundaries
from skimage.util import img_as_float
In [ ]:
img = img_as_float(rgb)

Now let's remove possible NaN values:

In [ ]:
img = np.nan_to_num(img)

Then we segment the image with the quickshift algorithm. Roughly, kernel_size controls the smoothing of the local density estimate, max_dist the scale of the resulting segments, and ratio the trade-off between color similarity and spatial proximity.

In [ ]:
segments_quick = quickshift(img, kernel_size=3, max_dist=50, ratio=0.9)
In [ ]:
plt.figure(figsize=[20,14])
plt.imshow(mark_boundaries(img, segments_quick))
plt.axis('off')
Out[ ]:
(-0.5, 3700.5, 1570.5, -0.5)
No description has been provided for this image
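The cell above already imports felzenszwalb, slic, and watershed alongside quickshift. A quick way to compare segmentation algorithms is to count the segments each produces on the same image; the toy two-region image below is only a stand-in for the Sentinel-2 composite, and the parameter values are illustrative:

```python
import numpy as np
from skimage.segmentation import felzenszwalb, quickshift, slic

# Toy 3-band image: two homogeneous halves plus mild noise
rng = np.random.default_rng(42)
img = np.zeros((60, 60, 3))
img[:, :30] = 0.2
img[:, 30:] = 0.8
img = np.clip(img + rng.normal(0, 0.01, img.shape), 0, 1)

seg_fz = felzenszwalb(img, scale=100, sigma=0.5, min_size=50)
seg_slic = slic(img, n_segments=4, compactness=10)
seg_qs = quickshift(img, kernel_size=3, max_dist=10, ratio=0.9)

for name, seg in [('felzenszwalb', seg_fz), ('slic', seg_slic),
                  ('quickshift', seg_qs)]:
    print(name, 'segments:', len(np.unique(seg)))
```

Tuning these parameters on a small crop before segmenting the full scene saves a lot of time.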

After generating the boundaries of each segment, we will transform these objects into georeferenced polygons and create a GeoDataFrame with the help of the Shapely and Geopandas libraries:

In [ ]:
segments_quick = segments_quick.astype('uint16')
In [ ]:
from rasterio import features
from shapely.geometry import shape
import geopandas as gpd
import shapely
In [ ]:
find_shapes = features.shapes(segments_quick, transform=src.transform)
In [ ]:
polygons = [shapely.geometry.Polygon(geom["coordinates"][0]) for geom, value in find_shapes]
In [ ]:
Poly_gdf = gpd.GeoDataFrame(crs='EPSG:4326', geometry=polygons)

Here we can see our GeoDataFrame:

In [ ]:
Poly_gdf
Out[ ]:
geometry
0 POLYGON ((5.89474 52.79327, 5.89474 52.79309, ...
1 POLYGON ((5.91666 52.79327, 5.91666 52.79309, ...
2 POLYGON ((5.96966 52.79327, 5.96966 52.79309, ...
3 POLYGON ((6.10567 52.79309, 6.10567 52.79291, ...
4 POLYGON ((5.90337 52.79273, 5.90337 52.79255, ...
... ...
15761 POLYGON ((6.12004 52.51533, 6.12004 52.51461, ...
15762 POLYGON ((6.13244 52.51767, 6.13244 52.51731, ...
15763 POLYGON ((6.15166 52.51857, 6.15166 52.51839, ...
15764 POLYGON ((6.16640 52.51893, 6.16640 52.51875, ...
15765 POLYGON ((6.17915 52.51695, 6.17915 52.51677, ...

15766 rows × 1 columns

We can also plot the boundaries of each polygon:

In [ ]:
Poly_gdf.boundary.plot(figsize=(20,14))
Out[ ]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83d97767d0>
No description has been provided for this image

Import the Samples¶

Now let's open the shapefile with the collected points representing the classes Water, Trees, Grass, Crops and Built:

In [ ]:
samples = gpd.read_file('/content/drive/MyDrive/Datasets_CV/samples/Samples.shp')

The classes received values from 1 to 5:

In [ ]:
samples
Out[ ]:
class geometry
0 1 POINT (5.59104 52.63066)
1 1 POINT (5.57628 52.67148)
2 1 POINT (5.56547 52.62878)
3 1 POINT (5.53530 52.59357)
4 1 POINT (5.53427 52.65651)
... ... ...
194 5 POINT (6.11477 52.51564)
195 5 POINT (6.09777 52.51888)
196 5 POINT (6.09417 52.51303)
197 5 POINT (6.13605 52.52358)
198 5 POINT (6.16524 52.53371)

199 rows × 2 columns

Extract the Samples Segments¶

As our image was divided into segments, let's find the segment that contains each collected sample point:

In [ ]:
poly_geometry = []
poly_id = []
poly_class = []
for p, point in samples.iterrows():
  for i, poly in Poly_gdf.iterrows():
    if point['geometry'].within(poly['geometry']):
      poly_geometry.append(poly['geometry'])
      poly_id.append(i)
      poly_class.append(point['class'])
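The nested iterrows loops above are O(points × polygons) and get slow for large images. GeoPandas can do the same point-in-polygon matching with a single spatial join; the toy frames below stand in for samples and Poly_gdf (note: the predicate keyword requires a recent GeoPandas; older releases use op instead):

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy stand-ins for Poly_gdf (segments) and samples (labelled points)
polys = gpd.GeoDataFrame(geometry=[
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])])
pts = gpd.GeoDataFrame({'class': [1, 2]},
                       geometry=[Point(0.5, 0.5), Point(1.5, 0.5)])

# One vectorized point-in-polygon join replaces the nested loops;
# 'index_right' holds the index of the polygon containing each point
joined = gpd.sjoin(pts, polys, how='inner', predicate='within')
print(joined[['class', 'index_right']])
```

With the real data, joined['index_right'] plays the role of poly_id and joined['class'] of poly_class.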

So we can create a dataframe containing the segment id and the class and then join it with the polygon geometry in a new GeoDataFrame:

In [ ]:
id_class = np.stack((poly_id,poly_class), axis=1)
In [ ]:
id_class.shape
Out[ ]:
(199, 2)
In [ ]:
import pandas as pd

df = pd.DataFrame(id_class, columns=['id','class'])
In [ ]:
dataset = gpd.GeoDataFrame(df, geometry=poly_geometry)

After that, let's remove the duplicate segments:

In [ ]:
dataset = dataset[~dataset['id'].duplicated()]
In [ ]:
dataset
Out[ ]:
id class geometry
0 9545 1 POLYGON ((5.58932 52.63966, 5.58932 52.63948, ...
1 7213 1 POLYGON ((5.58087 52.67936, 5.58087 52.67918, ...
2 9548 1 POLYGON ((5.56955 52.63607, 5.56955 52.63589, ...
3 11315 1 POLYGON ((5.53129 52.59564, 5.53129 52.59546, ...
4 8468 1 POLYGON ((5.53003 52.65960, 5.53003 52.65942, ...
... ... ... ...
194 15520 5 POLYGON ((6.11699 52.51857, 6.11699 52.51803, ...
195 15358 5 POLYGON ((6.09758 52.52090, 6.09758 52.52054, ...
196 15757 5 POLYGON ((6.08788 52.51605, 6.08788 52.51587, ...
197 15281 5 POLYGON ((6.14034 52.52431, 6.14034 52.52396, ...
198 14590 5 POLYGON ((6.16765 52.53869, 6.16765 52.53851, ...

187 rows × 3 columns

Extract Features from the Training Image¶

Let's now extract image features from each selected polygon. In this example we take the mean of each spectral band together with a set of GLCM texture features.

In [ ]:
import json
import cv2
import rasterio.mask
from skimage.feature import graycomatrix, graycoprops

json_dataset = json.loads(dataset.to_json())['features']
In [ ]:
X = []
Y = []
for i in range(len(json_dataset)):
  coords = json_dataset[i]['geometry']
  out_image, out_transform = rasterio.mask.mask(src, [coords], crop=True)
  out_image = out_image.transpose([1,2,0])
  out_image = np.nan_to_num(out_image)

  B2_mean = out_image[:,:,0][np.nonzero(out_image[:,:,0])].mean()
  B3_mean = out_image[:,:,1][np.nonzero(out_image[:,:,1])].mean()
  B4_mean = out_image[:,:,2][np.nonzero(out_image[:,:,2])].mean()
  B5_mean = out_image[:,:,3][np.nonzero(out_image[:,:,3])].mean()
  B6_mean = out_image[:,:,4][np.nonzero(out_image[:,:,4])].mean()
  B7_mean = out_image[:,:,5][np.nonzero(out_image[:,:,5])].mean()
  B8_mean = out_image[:,:,6][np.nonzero(out_image[:,:,6])].mean()
  B8A_mean = out_image[:,:,7][np.nonzero(out_image[:,:,7])].mean()
  B11_mean = out_image[:,:,8][np.nonzero(out_image[:,:,8])].mean()
  B12_mean = out_image[:,:,9][np.nonzero(out_image[:,:,9])].mean()

  norm_image = cv2.normalize(out_image[:,:,8], None, alpha = 0, beta = 255, norm_type = cv2.NORM_MINMAX, dtype = cv2.CV_8U)
  distances = [1]
  angles = [ np.pi/2]
  glcm = graycomatrix(norm_image,
                                distances=distances,
                                angles=angles,
                                symmetric=True,
                                normed=True,
                                levels=256)
  contrast = np.array(graycoprops(glcm, 'contrast')).ravel()[0]
  energy = np.array(graycoprops(glcm, 'energy')).ravel()[0]
  homogeneity = np.array(graycoprops(glcm, 'homogeneity')).ravel()[0]
  correlation = np.array(graycoprops(glcm, 'correlation')).ravel()[0]
  dissimilarity = np.array(graycoprops(glcm, 'dissimilarity')).ravel()[0]
  ASM = np.array(graycoprops(glcm,'ASM')).ravel()[0]

  X.append([B2_mean,B3_mean,B4_mean,B5_mean,B6_mean,B7_mean,B8_mean,B8A_mean,B11_mean,B12_mean,contrast,energy,homogeneity,correlation,dissimilarity,ASM])
  Y.append(json_dataset[i]['properties']['class'])

So we have our dataset with the features for each selected polygon and the target:

In [ ]:
X = np.array(X)
Y = np.array(Y)

Training the Model¶

For model training, we proceed the same way as for pixel classification:

In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
In [ ]:
clf = RandomForestClassifier(n_estimators=50, n_jobs=8)
In [ ]:
clf.fit(X_train, Y_train)
Out[ ]:
RandomForestClassifier(n_estimators=50, n_jobs=8)
In [ ]:
y_pred=clf.predict(X_test)
In [ ]:
print("Accuracy on training set: {:.2f}".format(clf.score(X_train, Y_train)))
print("Accuracy on test set: {:.2f}".format(clf.score(X_test, Y_test)))
Accuracy on training set: 1.00
Accuracy on test set: 0.75
In [ ]:
print(classification_report(Y_test, y_pred))
              precision    recall  f1-score   support

           1       0.78      0.88      0.82        16
           2       0.78      0.88      0.82         8
           3       0.50      0.33      0.40         6
           4       0.73      0.92      0.81        12
           5       0.82      0.60      0.69        15

    accuracy                           0.75        57
   macro avg       0.72      0.72      0.71        57
weighted avg       0.75      0.75      0.74        57
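The gap between the training accuracy (1.00) and the test accuracy (0.75) suggests overfitting, and with fewer than 200 samples a single 70/30 split is a noisy estimate. K-fold cross-validation gives a steadier figure; the synthetic dataset below is only a stand-in for our (X, Y):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the segment dataset: 187 samples,
# 16 features (10 band means + 6 GLCM stats), 5 classes
X_demo, y_demo = make_classification(n_samples=187, n_features=16,
                                     n_informative=8, n_classes=5,
                                     random_state=42)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X_demo, y_demo, cv=5)
print('fold accuracies:', scores.round(2), 'mean:', scores.mean().round(2))
```

Running the same on the real (X, Y) would show how much the 0.75 figure varies across folds.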

Predict Full Image¶

The last step is to run the trained model on all image segments. We extract the same features, but now for every segment:

In [ ]:
json_dataset_full = json.loads(Poly_gdf.to_json())['features']
In [ ]:
X_pred = []
id_pred = []
for i in range(len(json_dataset_full)):
  coords = json_dataset_full[i]['geometry']
  out_image, out_transform = rasterio.mask.mask(src, [coords], crop=True)
  out_image = out_image.transpose([1,2,0])
  out_image = np.nan_to_num(out_image)

  B2_mean = out_image[:,:,0][np.nonzero(out_image[:,:,0])].mean()
  B3_mean = out_image[:,:,1][np.nonzero(out_image[:,:,1])].mean()
  B4_mean = out_image[:,:,2][np.nonzero(out_image[:,:,2])].mean()
  B5_mean = out_image[:,:,3][np.nonzero(out_image[:,:,3])].mean()
  B6_mean = out_image[:,:,4][np.nonzero(out_image[:,:,4])].mean()
  B7_mean = out_image[:,:,5][np.nonzero(out_image[:,:,5])].mean()
  B8_mean = out_image[:,:,6][np.nonzero(out_image[:,:,6])].mean()
  B8A_mean = out_image[:,:,7][np.nonzero(out_image[:,:,7])].mean()
  B11_mean = out_image[:,:,8][np.nonzero(out_image[:,:,8])].mean()
  B12_mean = out_image[:,:,9][np.nonzero(out_image[:,:,9])].mean()

  norm_image = cv2.normalize(out_image[:,:,8], None, alpha = 0, beta = 255, norm_type = cv2.NORM_MINMAX, dtype = cv2.CV_8U)
  distances = [1]
  angles = [ np.pi/2]
  glcm = graycomatrix(norm_image,
                                distances=distances,
                                angles=angles,
                                symmetric=True,
                                normed=True,
                                levels=256)
  contrast = np.array(graycoprops(glcm, 'contrast')).ravel()[0]
  energy = np.array(graycoprops(glcm, 'energy')).ravel()[0]
  homogeneity = np.array(graycoprops(glcm, 'homogeneity')).ravel()[0]
  correlation = np.array(graycoprops(glcm, 'correlation')).ravel()[0]
  dissimilarity = np.array(graycoprops(glcm, 'dissimilarity')).ravel()[0]
  ASM = np.array(graycoprops(glcm,'ASM')).ravel()[0]

  X_pred.append([B2_mean,B3_mean,B4_mean,B5_mean,B6_mean,B7_mean,B8_mean,B8A_mean,B11_mean,B12_mean,contrast,energy,homogeneity,correlation,dissimilarity,ASM])
  id_pred.append(json_dataset_full[i]['id'])
In [ ]:
X_pred = np.array(X_pred)
ID = np.array(id_pred)
In [ ]:
X_pred = np.nan_to_num(X_pred)
In [ ]:
img_pred=clf.predict(X_pred)
In [ ]:
ID = ID.astype('uint16')

After performing the inference, let's set the predicted class for the image segments:

In [ ]:
Poly_gdf['class'] = img_pred

We also map the class names according to the numbers used:

In [ ]:
names = { 1:'Water',
          2:'Trees',
          3:'Grass',
          4:'Crops',
          5:'Built'}
In [ ]:
Poly_gdf['class_name'] = Poly_gdf['class'].replace(names)

Let's plot the classified image with a legend:

In [ ]:
fig, ax = plt.subplots(1, figsize=(14,8))
Poly_gdf.plot(column='class_name', categorical=True, cmap='Spectral', linewidth=.6, edgecolor='0.2',
         legend=True, legend_kwds={'bbox_to_anchor':(1, 0.5),'loc':'upper left','fontsize':16,'frameon':False}, ax=ax)
ax.axis('off')
ax.set_title('Land Cover Classification Netherlands',fontsize=20)
plt.tight_layout()
No description has been provided for this image

Image Classification with HOG and SVM¶

Another task that can be performed with satellite images is image classification, where each image chip receives a single label rather than a per-pixel one. As an example, take this Kaggle dataset containing images of ships and non-ships:

image.png

Let's start by importing some libraries that we'll use:

In [ ]:
import json
from skimage import color
from skimage.feature import hog
from sklearn import svm

Let's open the file containing the image data and their respective labels:

In [ ]:
path = open(r'/content/drive/MyDrive/Datasets_CV/Ship_Dataset/shipsnet.json')
ships_paths = json.load(path)
path.close()
In [ ]:
ships_paths.keys()
Out[ ]:
dict_keys(['data', 'labels', 'locations', 'scene_ids'])

Let's get the images, reshape them and plot an example:

In [ ]:
Image_dataset = np.array(ships_paths['data']).astype('uint8')
In [ ]:
img_length = 80
Image_dataset = Image_dataset.reshape(-1,3,img_length,img_length).transpose([0,2,3,1])
In [ ]:
plt.imshow(Image_dataset[500])
Out[ ]:
<matplotlib.image.AxesImage at 0x7f83bc37d090>
No description has been provided for this image

We will also get the labels:

In [ ]:
Y = np.array(ships_paths['labels']).reshape(len(ships_paths['labels']),1)
In [ ]:
np.unique(Y,return_counts=True)
Out[ ]:
(array([0, 1]), array([3000, 1000]))

The raw pixel values of each chip vary widely, and ships do not follow a fixed spectral pattern, so classifying on raw intensities alone works poorly. As we saw in previous lessons, we can instead extract shape information from each image using the Histogram of Oriented Gradients (HOG).

First we convert the RGB image to grayscale:

In [ ]:
X_gray = [color.rgb2gray(i) for i in Image_dataset]

Then we create the dataset with the HOG of each image:

In [ ]:
hog_images = []
hog_features = []
for image in X_gray:
    fd,hog_image = hog(image, orientations=8, pixels_per_cell=(16,16),cells_per_block=(4, 4), visualize=True)
    hog_images.append(hog_image)
    hog_features.append(fd)

Let's plot an example:

In [ ]:
plt.imshow(hog_images[500])
Out[ ]:
<matplotlib.image.AxesImage at 0x7f83bfb2b610>
No description has been provided for this image

Now we flatten each HOG image into a feature vector and train an SVM to perform the image classification:

In [ ]:
X = np.array(hog_images)
In [ ]:
X = X.reshape(X.shape[0],X.shape[1]*X.shape[2])
In [ ]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)
In [ ]:
clf = svm.SVC()
In [ ]:
clf.fit(X_train, Y_train.ravel())
Out[ ]:
SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
In [ ]:
y_pred=clf.predict(X_test)

So we can check our model's evaluation metrics:

In [ ]:
print(classification_report(Y_test, y_pred))
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       885
           1       0.98      0.97      0.98       315

    accuracy                           0.99      1200
   macro avg       0.99      0.98      0.98      1200
weighted avg       0.99      0.99      0.99      1200
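Note that the loop above also collected hog_features: the descriptor fd itself, which is far more compact than the flattened 80x80 visualization image we trained on, and is the representation HOG was designed to provide. A quick size check on a random stand-in chip with the notebook's parameters:

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)
demo = rng.random((80, 80))  # stand-in for one grayscale ship chip
fd = hog(demo, orientations=8, pixels_per_cell=(16, 16),
         cells_per_block=(4, 4))  # without visualize, only fd is returned
print(fd.size)  # 512 values, versus the 6400 pixels of the flattened image
```

Training the SVM on np.array(hog_features) instead of the flattened images would therefore shrink each sample from 6400 to 512 features, usually with little loss in accuracy.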

References:

https://www.researchgate.net/publication/261551597_Machine_learning_models_for_geospatial_data/figures?lo=1

http://www.sc.chula.ac.th/courseware/2309507/Lecture/remote18.htm

https://www.simplilearn.com/10-algorithms-machine-learning-engineers-need-to-know-article

https://www.pcigeomatics.com/pdf/geomatica/techspecs/2017/Object-Analyst.pdf

https://wiki.landscapetoolbox.org/doku.php/remote_sensing_methods:object-based_classification

https://towardsdatascience.com/object-based-land-cover-classification-with-python-cbe54e9c9e24

https://medium.com/@mgalkissa/classifying-ships-in-satellite-imagery-using-hogs-ee2102269c14