Mapping vegetation and weeds with artificial neural networks
First, let's install the rasterio library and mount Google Drive:
!pip install rasterio
Successfully installed affine-2.4.0 rasterio-1.3.10 snuggs-1.4.7
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Now, let's import the libraries:
import os
import cv2
import matplotlib.pyplot as plt
import numpy as np
import rasterio
from rasterio.windows import Window
import pandas as pd
import geopandas as gpd
from pylab import rcParams
import matplotlib
rcParams['figure.figsize'] = 18, 16
from sklearn.model_selection import train_test_split
from rasterio.plot import show
from shapely.geometry import box
import seaborn as sns
The image and the shapefiles we will use are stored in Drive. A point shapefile was collected for each of our classes:
path_img = '/content/drive/MyDrive/Datasets/Pinas/AOI_img.tif'
path_classe1 = '/content/drive/MyDrive/Datasets/Pinas/Solo.shp'
path_classe2 = '/content/drive/MyDrive/Datasets/Pinas/Veg.shp'
path_classe3 = '/content/drive/MyDrive/Datasets/Pinas/Invasoras.shp'
gdf1 = gpd.read_file(path_classe1)
gdf2 = gpd.read_file(path_classe2)
gdf3 = gpd.read_file(path_classe3)
fig, ax = plt.subplots(figsize=(20, 20))
with rasterio.open(path_img) as src:
    # Reproject the points to the raster CRS; passing src.crs directly
    # avoids the deprecated '+init=<authority>:<code>' syntax.
    gdf1 = gdf1.to_crs(src.crs)
    gdf2 = gdf2.to_crs(src.crs)
    gdf3 = gdf3.to_crs(src.crs)
    show(src, ax=ax)
gdf1.plot(ax=ax, color='red')
gdf2.plot(ax=ax, color='green')
gdf3.plot(ax=ax, color='yellow')
Now we add an id to each of the classes:
gdf1['id'] = 0
gdf2['id'] = 1
gdf3['id'] = 2
Let's concatenate the point geodataframes and extract the spectral values of the image at each point:
gdf = pd.concat([gdf1,gdf2,gdf3], axis=0)
gdf
|   | id | geometry |
|---|---|---|
| 0 | 0 | POINT (800522.754 1148937.597) |
| 1 | 0 | POINT (800541.974 1148936.659) |
| 2 | 0 | POINT (800559.084 1148935.253) |
| 3 | 0 | POINT (800579.593 1148934.315) |
| 4 | 0 | POINT (800509.628 1148938.182) |
| ... | ... | ... |
| 147 | 2 | POINT (800469.285 1148898.636) |
| 148 | 2 | POINT (800469.285 1148898.375) |
| 149 | 2 | POINT (800469.629 1148897.139) |
| 150 | 2 | POINT (800470.038 1148896.720) |
| 151 | 2 | POINT (800470.503 1148896.274) |
431 rows × 2 columns
coord_list = [(x, y) for x, y in zip(gdf.geometry.x, gdf.geometry.y)]
with rasterio.open(path_img) as src:
    values = [v for v in src.sample(coord_list)]  # one (R, G, B, alpha) array per point
X = np.array(values)
X = X[:, 0:3].copy()  # keep only the R, G, B bands (drop the alpha band)
X.shape
X.shape
(431, 3)
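Before training, it is worth checking class balance. A quick NumPy sketch, with the per-class counts transcribed from the Y array printed further below (128 soil, 151 vegetation, 152 weed points):

```python
import numpy as np

# Per-class sample counts, transcribed from this notebook's Y output.
ids = np.concatenate([np.zeros(128, int), np.ones(151, int), np.full(152, 2)])
classes, counts = np.unique(ids, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 128, 1: 151, 2: 152}
```

The classes are reasonably balanced, so plain accuracy is a meaningful metric here.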
We have the variable X with the spectral data and the variable Y with the reference data:
Y = gdf['id'].values
Y = Y[:,np.newaxis]
Y
array([[0],
       [0],
       [0],
       ...,
       [2],
       [2],
       [2]])
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import confusion_matrix
Since the labels are categorical, we one-hot encode them into binary vectors so that they match the network's softmax output:
enc = OneHotEncoder()
enc.fit(Y)
Y = enc.transform(Y).toarray()
Y.shape
(431, 3)
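What the OneHotEncoder step does can be sketched with NumPy alone: each integer label selects the matching row of an identity matrix, so every sample becomes a binary vector with a single 1 in its class column.

```python
import numpy as np

# Toy labels with the same three classes as the notebook.
labels = np.array([0, 1, 2, 1])
onehot = np.eye(3)[labels]
print(onehot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

This is why the encoded Y has shape (431, 3): one column per class.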
Now we split the data into training and test sets:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3, random_state = 42)
input_shape = (X_train.shape[1:])
num_classes = len(np.unique(gdf['id'].values))
input_shape
(3,)
num_classes
3
Let's build our neural network by adding the dense layers:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(256, input_shape=input_shape, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
And finally, we train the model for 400 epochs (with batch_size=250 and roughly 226 training samples, each epoch is a single batch):
history = model.fit(X_train, Y_train, epochs=400, batch_size=250, verbose=1, validation_split=0.25)
fig, ax = plt.subplots(1, 2, figsize=(16, 8))
ax[0].plot(history.history['loss'], color='b', label="Training loss")
ax[0].plot(history.history['val_loss'], color='r', label="Validation loss")
ax[0].legend(loc='best', shadow=True)
ax[1].plot(history.history['accuracy'], color='b', label="Training accuracy")
ax[1].plot(history.history['val_accuracy'], color='r', label="Validation accuracy")
ax[1].legend(loc='best', shadow=True)
Let's look at some model evaluation metrics:
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Test loss: 0.20202559232711792 Test accuracy: 0.9230769276618958
y_pred = model.predict(X_test)
5/5 [==============================] - 0s 2ms/step
y_pred_res = np.argmax(y_pred, axis=1)
Y_test_res = np.argmax(Y_test, axis=1)
print(classification_report(Y_test_res, y_pred_res))
                  precision    recall  f1-score   support

               0       1.00      1.00      1.00        49
               1       0.85      0.89      0.87        38
               2       0.90      0.86      0.88        43

        accuracy                           0.92       130
       macro avg       0.92      0.92      0.92       130
    weighted avg       0.92      0.92      0.92       130
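cohen_kappa_score was imported earlier but never called; on this test set it would be `cohen_kappa_score(Y_test_res, y_pred_res)`. A small self-contained example (toy labels, not the notebook's data) showing that kappa discounts agreement expected by chance:

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels: 5 of 6 predictions are correct, so plain accuracy is ~0.83,
# but kappa corrects for chance agreement and comes out lower.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
kappa = cohen_kappa_score(y_true, y_pred)
print(round(kappa, 2))  # 0.75
```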
c_matrix = confusion_matrix(Y_test_res, y_pred_res)
names = ['Soil', 'Vegetation', 'Weeds']
r1 = pd.DataFrame(data=c_matrix, index=names, columns=names)
fig, ax = plt.subplots(figsize=(8, 8))
ax = sns.heatmap(r1, annot=True, annot_kws={"size": 18}, fmt='d', cmap="Blues", cbar=False)
ax.tick_params(labelsize=16)
ax.set_yticklabels(names, rotation=45)
ax.set_ylabel('True')
ax.set_xlabel('Predicted')
After training and validating the model, we now apply it to the full image to generate a prediction map:
src = rasterio.open(path_img)
img = src.read()
img = img.transpose([1, 2, 0])  # (bands, rows, cols) -> (rows, cols, bands)
img_size = (img.shape[0], img.shape[1])
img = img.reshape(img.shape[0] * img.shape[1], img.shape[2])  # one row per pixel
img.shape
(51650802, 4)
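The transpose/reshape above turns the raster into one row per pixel; the same operations in reverse restore the image layout, which is exactly what the prediction map relies on later. A minimal round-trip check on a toy array:

```python
import numpy as np

# A 2-band, 3x4 toy raster in rasterio's (bands, rows, cols) layout.
img = np.arange(24).reshape(2, 3, 4)

# Same steps as above: move bands last, then flatten to one row per pixel.
flat = img.transpose([1, 2, 0]).reshape(3 * 4, 2)

# Reversing the steps restores the original array exactly.
restored = flat.reshape(3, 4, 2).transpose([2, 0, 1])
print(np.array_equal(restored, img))  # True
```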
After opening the image, let's build a DataFrame with the spectral bands and the alpha band:
df = pd.DataFrame(img, columns=['R','G','B','Mask'])
df
|   | R | G | B | Mask |
|---|---|---|---|---|
| 0 | 164 | 130 | 109 | 255 |
| 1 | 162 | 128 | 106 | 255 |
| 2 | 161 | 127 | 105 | 255 |
| 3 | 160 | 127 | 106 | 255 |
| 4 | 157 | 124 | 102 | 255 |
| ... | ... | ... | ... | ... |
| 51650797 | 0 | 0 | 0 | 0 |
| 51650798 | 0 | 0 | 0 | 0 |
| 51650799 | 0 | 0 | 0 | 0 |
| 51650800 | 0 | 0 | 0 | 0 |
| 51650801 | 0 | 0 | 0 | 0 |
51650802 rows × 4 columns
del img, src
We use the alpha band to exclude invalid (masked) pixels:
df_to_pred = df[df['Mask'] == 255].copy()
values_to_pred = df_to_pred.values[:, 0:3]
df_to_pred.drop(columns=['R', 'G', 'B'], inplace=True)
df.drop(columns=['R', 'G', 'B'], inplace=True)
Now we apply the model to the values to obtain the predictions:
pred = model.predict(values_to_pred)
1251127/1251127 [==============================] - 1999s 2ms/step
With the predicted values, we merge them back into the full-image DataFrame on the index:
pred = np.argmax(pred, axis=1).copy()
df_to_pred['pred'] = pred
del pred, values_to_pred, model
df = pd.merge(df,df_to_pred, how='left', left_index=True, right_index=True)
del df_to_pred
We now have the predictions and can reshape them back to the original image dimensions:
values_to_export = df['pred'].values
del df
classify = values_to_export.reshape(img_size)
export_image = classify[np.newaxis,:,:]
Finally, we save the prediction map with the georeferencing of the original RGB image:
export_image.dtype
dtype('float64')
src = rasterio.open(path_img)
out_meta = src.meta.copy()
out_meta.update({"driver": "GTiff",
                 "height": export_image.shape[1],
                 "width": export_image.shape[2],
                 "compress": 'lzw',
                 "nodata": np.nan,
                 "dtype": 'float64',
                 "count": 1})
with rasterio.open('/content/mapa.tif', "w", **out_meta) as dest:
    dest.write(export_image)
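The float64 export above is convenient because NaN can serve as nodata, but the class map only ever holds the values {0, 1, 2}. An alternative sketch (not what this notebook does) that fills NaN with a uint8 sentinel before writing, shrinking the file roughly 8x; the rasterio meta would then use `"dtype": 'uint8'` and `"nodata": 255`:

```python
import numpy as np

# Toy class map with NaN marking masked pixels.
classify = np.array([[0.0, 1.0], [2.0, np.nan]])

# Replace NaN with a sentinel no class uses, then cast to uint8.
NODATA = 255
as_uint8 = np.where(np.isnan(classify), NODATA, classify).astype(np.uint8)
print(as_uint8)  # masked pixel becomes 255, classes keep their values
```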