Object Detection¶
Introduction to Object Detection¶
Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When humans look at images or videos, we can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer.
Image Classification vs. Object Detection¶
Object detection is often confused with image recognition, so before we move on, it’s important to clarify the distinctions between them.
Image recognition assigns a label to an image. A photo of a dog is labeled “dog.” A photo of two dogs is still labeled “dog.” Object detection, on the other hand, draws a box around each dog and labels the box “dog.” The model predicts where each object is and what label should be applied. In this way, object detection provides more information about an image than recognition.
Here’s an example of what this distinction looks like in practice:
Basic structure¶
Object detection locates the presence of an object in an image and draws a bounding box around that object. This usually involves two processes: classifying an object's type and then drawing a box around that object. We have covered image classification before, so let's now review some of the common model architectures used for object detection:
- R-CNN
- Fast R-CNN
- Faster R-CNN
- SSD (Single Shot MultiBox Detector)
- YOLO (You Only Look Once)
Types of architectures¶
Whether you create a custom object detector or use a pre-trained one, you will need to decide what type of object detection network you want to use: a two-stage network or a single-stage network.
Two-Stage Networks
The initial stage of two-stage networks, such as R-CNN and its variants, identifies proposed regions or subsets of the image that may contain an object. The second stage classifies the objects within the proposed regions. Two-stage networks can achieve very accurate object detection results; however, they are typically slower than single-stage networks.
Single-Stage Networks
In single-stage networks like YOLO v2, the CNN produces network predictions for regions across the image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for the objects. Single-stage networks can be much faster than two-stage networks, but they may not achieve the same level of accuracy, especially for scenes that contain small objects.
Two-Stage Architectures¶
R-CNN¶
The 2014 paper proposed the basic version of the CNN-based two-stage detection algorithm, which was improved and accelerated in later papers. As depicted in the figure above, the overall pipeline consists of three steps. First, generate region proposals: candidate object regions in the image, regardless of category. The second stage is a convolutional neural network that computes features for each candidate region. The final stage classifies those features, implemented as per-class SVMs in the paper.
The problem that R-CNN tries to solve is finding objects in an image (object detection). What do you do to solve this? You can start with a sliding-window approach. When using this method, you simply go through the entire image with rectangles of different sizes and examine these smaller crops in a brute-force manner. The problem is that this produces a huge number of smaller images to look at. Luckily for us, other smart people have developed algorithms to intelligently select so-called region proposals. To simplify this concept:
- Region proposals are just smaller parts of the original image that we think might contain the objects we are looking for.
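To put numbers on the brute-force approach, here is a quick count of sliding-window crops for a single image; the image size, window sizes, and stride are illustrative assumptions, not values from the paper:

```python
# Count sliding-window crops over a 500x500 image for a few
# illustrative window sizes and a stride of 8 pixels.
image_size = 500
stride = 8
window_sizes = [64, 128, 256]

total_windows = 0
for s in window_sizes:
    positions_per_axis = (image_size - s) // stride + 1
    total_windows += positions_per_axis ** 2

print(total_windows)  # 6195 crops for just three scales
```

Even with a coarse stride and only three scales, this is already several times the ~2,000 regions that selective search produces.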
Region proposals¶
There are different region proposal algorithms that we can choose from. These are “normal” algorithms that work out of the box. We don’t need to train them or anything. In the case of this work, they use the selective search method to generate region proposals.
The selective search algorithm builds on exhaustive search, but instead of enumerating every window, it also uses the colors present in the image: it first over-segments the image and then hierarchically merges similar segments into candidate object regions.
This will create almost 2,000 different regions that we will have to examine. This seems like a large number, but it is still very small compared to the brute force sliding window approach.
CNN¶
In the next step, we take each region proposal and create a feature vector representing that image in a much lower dimension using a Convolutional Neural Network (CNN).
They use AlexNet as a feature extractor. Keep in mind that this is 2014 and AlexNet is still state of the art.
One question we need to answer:
If you use AlexNet only as a feature extractor, how do we train it?
Well, this is a fundamental question about the R-CNN system. You cannot train the entire system at once (this will be solved by Fast R-CNN). Instead, you need to train each part independently. This means that AlexNet was first trained on a classification task. After training, they removed the last softmax layer, so the last layer is now the 4096-dimensional fully connected layer. This means that our features are 4096-dimensional.
Another important thing to keep in mind is that the input to AlexNet is always the same (227, 227, 3). However, the image proposals have different shapes. Many of them are smaller or larger than the required size. Therefore, we will need to resize each region proposal.
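A minimal sketch of that warping step, using nearest-neighbor resampling in NumPy (the paper uses a slightly more involved warp with context padding; this only illustrates forcing an arbitrary crop to 227×227):

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop a region proposal and warp it to a fixed (size, size) input."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbor resampling: pick a source row/col for each output pixel
    rows = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[rows][:, cols]

image = np.random.rand(791, 901, 3)            # arbitrary input image
patch = warp_region(image, (100, 50, 400, 300))
print(patch.shape)  # (227, 227, 3)
```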
To summarize the CNN task:
SVM¶
We have created feature vectors from the image proposals. Now we need to classify them: we want to detect which class of object each feature vector represents. To do this, we use SVM classifiers, one SVM per object class, and run all of them. This means that for a given feature vector we get n outputs, where n is the number of different object classes we want to detect. Each output is a confidence score: how confident are we that this particular feature vector represents that class?
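Conceptually, scoring one 4096-dimensional feature vector with one linear SVM per class is just a matrix-vector product. The weights below are random stand-ins, not trained models:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, feat_dim = 20, 4096

W = rng.standard_normal((n_classes, feat_dim))  # one weight row per class SVM
b = rng.standard_normal(n_classes)              # one bias per class SVM

features = rng.standard_normal(feat_dim)        # CNN feature vector of one proposal
scores = W @ features + b                       # n confidence scores, one per class
print(scores.shape)  # (20,)
```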
What confused me when I first read this article was how we train these different SVMs. Well, we train them on feature vectors created by AlexNet. This means that we have to wait until we have fully trained the CNN before we can train the SVM. The training is not parallelizable. Since we know when training which feature vector represented which class, we can easily train the different SVMs in a supervised manner.
To summarize:
- We create different image proposals from an image.
- Then we create a feature vector from these proposals using CNN.
- Finally, we classify each feature vector with SVMs for each object class.
The output:
We now have image proposals classified into object classes. How do we map them all back onto the image? We use something called greedy non-maximum suppression (NMS). This is a fancy name for the following concept:
We reject a region (image proposal) if it has an intersection-over-union (IoU) overlap with a selected region with a higher score.
Where regions overlap, we keep the proposal with the highest score (calculated by the SVM) and suppress the others. We do this step for each object class independently. After that, we keep only the regions with a score higher than 0.5.
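A minimal NumPy sketch of greedy NMS as described above; the `[x1, y1, x2, y2]` box format and the 0.5 IoU threshold are assumptions for illustration:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes are [x1, y1, x2, y2] rows."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))                   # keep the best remaining box
        rest = order[1:]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]       # drop overlapping lower-scored boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] -- box 1 overlaps box 0 too much and is suppressed
```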
Bounding Box Regressor (optional)¶
I mention the Bounding Box Regressor last because it is not a fundamental building block of the R-CNN system. It is a great idea, though, and the authors found that it improves mean average precision (mAP) by roughly 3 points. So how does it work?
When you are training the Bounding Box Regressor, your input is the center, width, and height in pixels of the region proposal and the label is the ground truth bounding box. The goal, as stated in the paper, is:
Our goal is to learn a transformation that maps a proposed box P to a ground truth box G.
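In the paper, this transformation is parameterized in center/size form: given a proposal P = (Px, Py, Pw, Ph) and ground truth G, the regression targets are t_x = (Gx − Px)/Pw, t_y = (Gy − Py)/Ph, t_w = log(Gw/Pw), t_h = log(Gh/Ph). A sketch of computing the targets and applying them back (the numeric boxes are made-up examples):

```python
import numpy as np

def box_targets(P, G):
    """Regression targets mapping proposal P to ground truth G (center/size form)."""
    Px, Py, Pw, Ph = P
    Gx, Gy, Gw, Gh = G
    return np.array([(Gx - Px) / Pw, (Gy - Py) / Ph,
                     np.log(Gw / Pw), np.log(Gh / Ph)])

def apply_targets(P, t):
    """Inverse: apply predicted targets t to proposal P to get a refined box."""
    Px, Py, Pw, Ph = P
    tx, ty, tw, th = t
    return np.array([Px + tx * Pw, Py + ty * Ph,
                     Pw * np.exp(tw), Ph * np.exp(th)])

P = (100.0, 100.0, 50.0, 40.0)   # proposal: center x, center y, width, height
G = (110.0, 95.0, 60.0, 45.0)    # ground-truth box in the same form
t = box_targets(P, G)
print(apply_targets(P, t))       # ~ [110., 95., 60., 45.] -- recovers G
```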
Problems with R-CNN¶
- It still takes a long time to train the network, as you would have to classify 2,000 region proposals per image.
- It cannot be implemented in real-time, as it takes about 47 seconds for each test image.
- The selective search algorithm is a fixed algorithm. Therefore, no learning is happening at this stage. This can lead to generating poor candidate region proposals.
Fast R-CNN¶
The same author of the previous paper (R-CNN) addressed some of the drawbacks of R-CNN to build a faster object detection algorithm, called Fast R-CNN. The approach is similar to the R-CNN algorithm, but instead of feeding the region proposals to the CNN, we feed the entire input image to the CNN to generate a convolutional feature map. From the convolutional feature map we identify the region proposals, warp them into squares, and, using an RoI pooling layer, reshape them to a fixed size so that they can be fed to a fully connected layer. From the RoI feature vector, a softmax layer predicts the class of the proposed region as well as the offset values for its bounding box.
Here is a summary of the main contributions:
- Proposed a new layer called RoI Pooling that extracts equal-length feature vectors from all proposals (i.e., RoIs) in the same image.
- Compared to R-CNN, which has multiple stages (region proposal generation, feature extraction, and classification using SVMs), Fast R-CNN builds a network with only a single stage.
- Fast R-CNN shares computations (i.e., the convolutional layer calculations) across all proposals (i.e., RoIs) instead of doing the calculations for each proposal independently. This is achieved with the new RoI Pooling layer, which makes Fast R-CNN faster than R-CNN.
- Fast R-CNN does not cache the extracted features and therefore does not need the large amount of disk storage that R-CNN requires (hundreds of gigabytes).
- Fast R-CNN is more accurate than R-CNN.
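A toy version of RoI pooling in NumPy, assuming a single-channel feature map and integer RoI coordinates (the real layer handles batches, channels, and fractional coordinates):

```python
import numpy as np

def roi_pool(fmap, roi, out=2):
    """Max-pool an integer-coordinate RoI into a fixed out x out grid."""
    x1, y1, x2, y2 = roi
    region = fmap[y1:y2, x1:x2]
    H, W = region.shape
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            # Bin boundaries; max(..., +1) guarantees each bin is non-empty
            ys = slice(i * H // out, max((i + 1) * H // out, i * H // out + 1))
            xs = slice(j * W // out, max((j + 1) * W // out, j * W // out + 1))
            pooled[i, j] = region[ys, xs].max()
    return pooled

fmap = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 feature map
pooled = roi_pool(fmap, (0, 0, 4, 4))
print(pooled)  # [[ 5.  7.] [13. 15.]]
```

Whatever the RoI's size, the output is always `out x out`, which is what lets proposals of different shapes feed the same fully connected layers.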
Faster R-CNN¶
Both the above algorithms (R-CNN and Fast R-CNN) use selective search to discover region proposals. Selective search is a slow and time-consuming process that affects the network performance. Therefore, Shaoqing Ren et al. came up with an object detection algorithm that eliminates the selective search algorithm and allows the network to learn region proposals.
Similar to Fast R-CNN, the image is provided as an input to a convolutional network that outputs a convolutional feature map. Instead of using the selective search algorithm on the feature map to identify region proposals, a separate network is used to predict region proposals. The predicted region proposals are then reshaped using a RoI pooling layer that is used to classify the image within the proposed region and predict the offset values for the bounding boxes.
The main contributions of Faster R-CNN:
- Proposed the Region Proposal Network (RPN), a fully convolutional network that generates proposals with multiple scales and aspect ratios. RPN implements the idea of neural-network attention, telling the detector (Fast R-CNN) where to look.
- Instead of using image pyramids (i.e., multiple instances of the image, but at different scales) or filter pyramids (i.e., multiple filters with different sizes), this paper introduced the concept of anchor boxes. An anchor box is a reference box of a specific scale and aspect ratio. With multiple reference anchor boxes, multiple scales and aspect ratios exist for a single region. This can be thought of as a pyramid of reference anchor boxes. Each region is then mapped to each reference anchor box, thus detecting objects at different scales and aspect ratios.
- Convolutional computations are shared between RPN and Fast R-CNN. This reduces computational time.
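Anchor widths and heights for a set of scales and aspect ratios can be generated as follows; here ratio = height/width and scale² = the anchor area, a common convention, so this is a sketch rather than the paper's exact code:

```python
import numpy as np

def anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (w, h) for every scale/ratio combination, one anchor each."""
    shapes = []
    for s in scales:
        for r in ratios:          # r = height / width, area = s**2
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            shapes.append((w, h))
    return shapes

anchors = anchor_shapes()
print(len(anchors))  # 9 anchors per spatial location (3 scales x 3 ratios)
```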
The architecture of Faster R-CNN is shown in the next figure. It consists of 2 modules:
- RPN: For generating region proposals.
- Fast R-CNN: For detecting objects in the proposed regions.
Region Proposal Network (RPN)¶
R-CNN and Fast R-CNN models rely on the Selective Search algorithm to generate region proposals. Each proposal is fed to a pre-trained CNN for classification. This paper proposes a network called region proposal network (RPN) that can produce region proposals. This has a few advantages:
- Region proposals are now generated using a network that can be trained and customized according to the detection task.
- Since the proposals are generated using a network, it can be trained end-to-end to be customized for the detection task. Thus, it produces better region proposals compared to generic methods like Selective Search and EdgeBoxes.
- RPN processes the image using the same convolutional layers used in the Fast R-CNN detection network. Thus, RPN does not take extra time to produce the proposals compared to algorithms like Selective Search.
- Due to sharing the same convolutional layers, RPN and Fast R-CNN can be merged/unified into a single network. Thus, training is done only once.
RPN works on the output feature map returned from the last convolutional layer shared with Fast R-CNN. This is shown in the next figure. Based on a rectangular window of size n×n, a sliding window passes over the feature map. For each window, multiple candidate region proposals are generated. These are not the final proposals, as they will later be filtered based on their "objectness" score.
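With k anchors at every sliding-window position, the number of candidate boxes is simply feature-map area × k. For example, on a hypothetical ~38×50 feature map with the usual k = 9:

```python
# Hypothetical conv feature-map size and anchors per location
feat_h, feat_w, k = 38, 50, 9

num_candidates = feat_h * feat_w * k
print(num_candidates)  # 17100 candidate boxes before objectness filtering
```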
Solar Panel Detection in High-Resolution Images Using Faster R-CNN¶
In this example we will use the PyTorch (torchvision) implementation of Faster R-CNN to detect solar panels in high-resolution satellite images.
First, let's connect to Drive:
from google.colab import drive
GDRIVE_ROOT = "/gdrive"
drive.mount(GDRIVE_ROOT)
Mounted at /gdrive
Let's import the necessary packages:
from PIL import Image
import os
import glob
import random
import csv
random.seed(4)
import pandas as pd
import numpy as np
import tqdm
import xml.etree.ElementTree as ET
from skimage import io
from skimage.io import imsave
import matplotlib.pyplot as plt
import cv2
import torch
from torch.utils.data import DataLoader, Dataset
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as transforms
Let's define the paths of the images and annotations that will be used in this study:
path_to_images = os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/images/')
path_to_annotations = os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/annotations/')
We create the path of the .csv file where we will save the annotations later:
annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')
Let's then generate a list of the .xml files where the annotations for each image are stored:
xmls_paths = os.path.join(path_to_annotations, os.listdir(path_to_annotations)[0])
xml_list = os.listdir(xmls_paths)
Now we can parse the annotations:
xml_list = []
for xml_file in os.listdir(xmls_paths):
    tree = ET.parse(os.path.join(xmls_paths, xml_file))
    root = tree.getroot()
    for member in root.findall('object'):
        value = (root.find('filename').text,
                 int(root.find('size')[0].text),
                 int(root.find('size')[1].text),
                 member[0].text,
                 int(member[4][0].text),
                 int(member[4][1].text),
                 int(member[4][2].text),
                 int(member[4][3].text)
                 )
        xml_list.append(value)
column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
xml_df = pd.DataFrame(xml_list, columns=column_name)
xml_df
| filename | width | height | class | xmin | ymin | xmax | ymax | |
|---|---|---|---|---|---|---|---|---|
| 0 | solar_2.JPG | 901 | 791 | solar | 617 | 390 | 708 | 491 |
| 1 | solar_140.JPG | 901 | 791 | solar | 218 | 294 | 322 | 384 |
| 2 | solar_144.JPG | 901 | 791 | solar | 58 | 521 | 117 | 588 |
| 3 | solar_144.JPG | 901 | 791 | solar | 160 | 537 | 242 | 626 |
| 4 | solar_144.JPG | 901 | 791 | solar | 705 | 150 | 757 | 200 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 397 | solar_84.JPG | 901 | 791 | solar | 679 | 124 | 753 | 146 |
| 398 | solar_84.JPG | 901 | 791 | solar | 762 | 123 | 812 | 146 |
| 399 | solar_84.JPG | 901 | 791 | solar | 763 | 158 | 901 | 624 |
| 400 | solar_84.JPG | 901 | 791 | solar | 660 | 125 | 901 | 623 |
| 401 | solar_56.JPG | 901 | 791 | solar | 625 | 342 | 865 | 471 |
402 rows × 8 columns
Let's then plot an image and its annotations as an example:
list_of_images = os.listdir(path_to_images)
i = 6
img = io.imread(os.path.join(path_to_images,list_of_images[i]))
detec = xml_df[xml_df['filename'] == list_of_images[i]]
for i, row in detec.iterrows():
    color = (255, 0, 0)
    cv2.rectangle(img, (max(0, row['xmin']), max(0, row['ymin'])),
                  (max(0, row['xmax']), max(0, row['ymax'])), color, 2)
plt.figure(figsize=(16,16))
plt.imshow(img)
plt.axis('off')
plt.show()
Now, let's resize our images and their respective bounding-box values:
path_to_images_resize = os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/resize_images')
if not os.path.isdir(path_to_images_resize):
os.mkdir(path_to_images_resize)
new_xml_df = []
for img_name in list_of_images:
    img = io.imread(os.path.join(path_to_images, img_name))
    detec = xml_df[xml_df['filename'] == img_name]
    y_ = img.shape[0]
    x_ = img.shape[1]
    print(os.path.join(path_to_images_resize, img_name))
    targetSize = 512
    x_scale = targetSize / x_
    y_scale = targetSize / y_
    new_img = cv2.resize(img, (targetSize, targetSize))
    new_img = np.array(new_img)
    imsave(os.path.join(path_to_images_resize, img_name), new_img)
    for i, row in detec.iterrows():
        new_xmin = int(np.round(row['xmin'] * x_scale))
        new_xmax = int(np.round(row['xmax'] * x_scale))
        new_ymin = int(np.round(row['ymin'] * y_scale))
        new_ymax = int(np.round(row['ymax'] * y_scale))
        filename = row['filename']
        width = targetSize
        height = targetSize
        classe = row['class']
        new_xml_df.append([filename, width, height, classe,
                           new_xmin, new_ymin, new_xmax, new_ymax])

column_name = ['file_name', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
new_xml_df = pd.DataFrame(new_xml_df, columns=column_name)
/gdrive/My Drive/Datasets/dataset_solar/resize_images/solar_86.JPG
... (one path is printed per resized image)
new_xml_df
| | file_name | width | height | class | xmin | ymin | xmax | ymax |
|---|---|---|---|---|---|---|---|---|
| 0 | solar_86.JPG | 512 | 512 | solar | 314 | 130 | 381 | 280 |
| 1 | solar_229.JPG | 512 | 512 | solar | 36 | 129 | 106 | 242 |
| 2 | solar_228.JPG | 512 | 512 | solar | 57 | 31 | 138 | 463 |
| 3 | solar_228.JPG | 512 | 512 | solar | 1 | 267 | 56 | 431 |
| 4 | solar_228.JPG | 512 | 512 | solar | 1 | 30 | 55 | 96 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 397 | solar_89.JPG | 512 | 512 | solar | 221 | 245 | 314 | 292 |
| 398 | solar_89.JPG | 512 | 512 | solar | 314 | 238 | 372 | 309 |
| 399 | solar_88.JPG | 512 | 512 | solar | 102 | 146 | 194 | 236 |
| 400 | solar_88.JPG | 512 | 512 | solar | 195 | 188 | 236 | 241 |
| 401 | solar_230.JPG | 512 | 512 | solar | 468 | 97 | 510 | 177 |
402 rows × 8 columns
# Normalize the coordinates to [0, 1]: x by the image width, y by the image height
new_xml_df['xmin'] = new_xml_df['xmin']/new_xml_df['width']
new_xml_df['xmax'] = new_xml_df['xmax']/new_xml_df['width']
new_xml_df['ymin'] = new_xml_df['ymin']/new_xml_df['height']
new_xml_df['ymax'] = new_xml_df['ymax']/new_xml_df['height']
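Since the images here are square (512 × 512), dividing by width or height gives the same result; conceptually, x-coordinates are scaled by the width and y-coordinates by the height. A quick sketch of the normalization on a single hypothetical annotation:

```python
import pandas as pd

# One hypothetical annotation on a 512 x 512 image
df = pd.DataFrame({'width': [512], 'height': [512],
                   'xmin': [314], 'ymin': [130], 'xmax': [381], 'ymax': [280]})

# Normalize to [0, 1]: x by width, y by height
for col in ['xmin', 'xmax']:
    df[col] = df[col] / df['width']
for col in ['ymin', 'ymax']:
    df[col] = df[col] / df['height']

# All normalized coordinates now lie in [0, 1]
coords = df[['xmin', 'ymin', 'xmax', 'ymax']]
```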
We save the annotated dataframe to a .csv file:
new_xml_df.to_csv((path_to_annotations + 'annotations.csv'), index=None)
Let's define the class name of our object of interest:
cat_to_index = {'solar': 1}
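Note that torchvision's detection models reserve class index 0 for the background, which is why `'solar'` maps to 1. The inverse mapping, used later to turn predicted indices back into class names, is a one-liner:

```python
cat_to_index = {'solar': 1}  # index 0 is implicitly the background class
# Inverse mapping for decoding predictions back into class names
index_to_cat = {v: k for k, v in cat_to_index.items()}
```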
Next, we shuffle the images, split them 70/30 into training and test sets, and save the resulting tags to a file:
im_list = [os.path.abspath(i) for i in glob.glob(path_to_images_resize + '/**/*.JPG', recursive=True)]
im_list = random.sample(im_list, len(im_list))
# Defining the train/test split
train_idx = round(len(im_list) * 0.7)
test_idx = train_idx + round(len(im_list) * 0.3)
# Creating a dictionary with tags
tags_dict = {'train' : im_list[0:train_idx],
             'test' : im_list[train_idx:test_idx]}
train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv')
# Saving the tags to a CSV file with columns file_name, tag
pd.DataFrame([(f, tag) for tag, files in tags_dict.items() for f in files],
             columns=['file_name', 'tag']).to_csv(train_test_split_file_path, index=False)
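The 70/30 shuffle-and-slice above can be sketched on a toy list of file names (purely illustrative; the names are hypothetical):

```python
import random

random.seed(0)  # seeded only to make this sketch reproducible
im_list = [f'img_{i}.JPG' for i in range(10)]   # hypothetical file names
im_list = random.sample(im_list, len(im_list))  # shuffle

train_idx = round(len(im_list) * 0.7)
tags_dict = {'train': im_list[:train_idx], 'test': im_list[train_idx:]}
```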
Let's now create a class to import and prepare the images and annotations, leaving them ready to feed the model.
class ObjectDetectionDataset(Dataset):
    """
    Custom PyTorch Dataset class to facilitate loading data for the object detection task.
    """
    def __init__(self,
                 annotations,
                 train_test_valid_split,
                 mapping = None,
                 mode = 'train',
                 transform = None):
        """
        Args:
            annotations: path to the annotations CSV file.
                Format: file_name, width, height, class, xmin, ymin, xmax, ymax
            train_test_valid_split: path to the tags CSV file for the train/test/valid split.
                Format: file_name, tag
            mapping: dictionary mapping class names to class indices.
                Format: {'class_name': class_index}. Default: None
            mode: mode in which to instantiate the class. Default: 'train'
            transform: the transforms to be applied to the image data
        Returns:
            image: torch.Tensor, target: dict of torch.Tensor, file_name: str
        """
        self.mapping = mapping
        self.transform = transform
        self.mode = mode
        self.path_to_images = path_to_images_resize
        # Loading the annotation file (same format as Remo's)
        my_data = pd.read_csv(annotations)
        # Here we prepend the image folder path to the file name.
        # If dataset.export_annotations_to_file was used to create the annotation file,
        # it would already contain full image file paths by default.
        my_data['file_name'] = my_data['file_name'].apply(lambda x: os.path.join(path_to_images_resize, x))
        my_data = my_data.set_index('file_name')
        # Loading the train/test split file (same format as Remo's)
        my_tags = pd.read_csv(train_test_valid_split, index_col='file_name')
        tags_list = []
        for i, row in my_data.iterrows():
            tags_list.append(my_tags.loc[i]['tag'])
        my_data['tag'] = tags_list
        my_data = my_data.reset_index()
        # Keep only the train/test/valid portion depending on the mode
        my_data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)
        self.data = my_data
        self.file_names = self.data['file_name'].unique()

    def __len__(self) -> int:
        return self.file_names.shape[0]

    def __getitem__(self, index: int):
        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name].reset_index()
        image = np.array(Image.open(file_name), dtype=np.float32)
        image /= 255.0
        if self.transform:
            image = self.transform(image)
        # Here we assume we don't have labels for the test set
        if self.mode != 'test':
            boxes = records[['xmin', 'ymin', 'xmax', 'ymax']].values
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
            area = torch.as_tensor(area, dtype=torch.float32)
            if self.mapping is not None:
                labels = np.zeros((records.shape[0],))
                for i in range(records.shape[0]):
                    labels[i] = self.mapping[records.loc[i, 'class']]
                labels = torch.as_tensor(labels, dtype=torch.int64)
            else:
                labels = torch.ones((records.shape[0],), dtype=torch.int64)
            iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)
            target = {}
            target['boxes'] = torch.stack(list(map(torch.tensor, boxes))).type(torch.float32)
            target['labels'] = labels
            target['image_id'] = torch.tensor([index])
            target['area'] = area
            target['iscrowd'] = iscrowd
            return image, target, file_name
        else:
            return image, file_name

def collate_fn(batch):
    return tuple(zip(*batch))
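Detection targets vary in size from image to image (different numbers of boxes), so the default batching, which stacks tensors, would fail; `collate_fn` simply regroups a batch of `(image, target, file_name)` tuples into parallel tuples. A toy sketch with plain Python values in place of tensors:

```python
def collate_fn(batch):
    # Transpose a list of (image, target, file_name) tuples into
    # a tuple of images, a tuple of targets, and a tuple of file names
    return tuple(zip(*batch))

# Two hypothetical samples with different numbers of boxes
batch = [('image_a', {'boxes': [[0, 0, 10, 10]]}, 'a.JPG'),
         ('image_b', {'boxes': [[0, 0, 10, 10], [20, 20, 40, 40]]}, 'b.JPG')]
images, targets, file_names = collate_fn(batch)
```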
After defining the dataset class, we instantiate it for the training and test data, and create DataLoaders to load the images.
tensor_transform = transforms.Compose([transforms.ToTensor()])
# Here the operations provided with Remo are integrated into a workflow in PyTorch
# by using the custom ObjectDetectionDataset method.
train_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
train_test_valid_split = train_test_split_file_path,
transform = tensor_transform,
mapping = cat_to_index,
mode = 'train')
test_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
train_test_valid_split = train_test_split_file_path,
transform = tensor_transform,
mapping = cat_to_index,
mode = 'test')
train_data_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
test_data_loader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
We set some parameters:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_classes = 2   # 1 object class ('solar') + the background class
loss_value = 0.0
num_epochs = 50
From the torchvision library, we load the Faster R-CNN architecture and replace its box predictor head to match our number of classes:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=FasterRCNN_ResNet50_FPN_Weights.COCO_V1`. You can also use `weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT` to get the most up-to-date weights.
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:00<00:00, 407MB/s]
Now we can start training:
# The training loop trains the model for the total number of epochs.
# (1 epoch = one complete pass over the entire dataset)
for epoch in range(num_epochs):
    print(epoch)
    for images, targets, image_ids in tqdm.tqdm(train_data_loader):
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        # In training mode, the model returns a dict of losses
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
    print('\nTraining Loss : {:.5f}'.format(loss_value))
0
100%|██████████| 148/148 [00:35<00:00, 4.23it/s]
Training Loss : 0.30937
1
100%|██████████| 148/148 [00:27<00:00, 5.43it/s]
Training Loss : 0.27186
2
100%|██████████| 148/148 [00:28<00:00, 5.28it/s]
Training Loss : 0.21692
...
48
100%|██████████| 148/148 [00:28<00:00, 5.21it/s]
Training Loss : 0.17762
49
100%|██████████| 148/148 [00:28<00:00, 5.21it/s]
Training Loss : 0.23083
Once training is complete, we can apply the model to the test images and save the detections for all of them to a .csv file.
# Mapping between predicted index and class name
mapping = {value: key for (key, value) in cat_to_index.items()}
detection_threshold = 0.4
img_size = 512
results = []
model.eval()
with torch.no_grad():
    for images, image_ids in tqdm.tqdm(test_data_loader):
        images = list(image.to(device) for image in images)
        outputs = model(images)
        for i, image in enumerate(images):
            boxes = outputs[i]['boxes'].data.cpu().numpy()
            scores = outputs[i]['scores'].data.cpu().numpy()
            labels = outputs[i]['labels'].data.cpu().numpy()
            # Keep only detections with a confidence at or above the threshold
            keep = scores >= detection_threshold
            boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
            image_id = image_ids[i]
            for box, label in zip(boxes, labels):
                results.append({'file_name' : os.path.basename(image_id),
                                'classes' : mapping[int(label)],
                                'xmin' : int(box[0] * img_size),
                                'ymin' : int(box[1] * img_size),
                                'xmax' : int(box[2] * img_size),
                                'ymax' : int(box[3] * img_size)})
model_predictions_path = path_to_annotations + 'model_predictions.csv'
with open(model_predictions_path, 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
    writer.writeheader()
    writer.writerows(results)
100%|██████████| 64/64 [00:05<00:00, 11.19it/s]
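The score filtering in the loop above keeps only detections at or above `detection_threshold`; the boolean-mask pattern can be sketched in NumPy with hypothetical scores and boxes:

```python
import numpy as np

detection_threshold = 0.4
scores = np.array([0.95, 0.55, 0.30, 0.10])          # hypothetical confidences
boxes = np.array([[10, 10, 50, 50], [60, 5, 90, 40],
                  [0, 0, 5, 5], [100, 100, 110, 110]])

# Boolean mask selects the rows of both arrays in one step
keep = scores >= detection_threshold
boxes, scores = boxes[keep], scores[keep]
```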
preds = pd.read_csv(model_predictions_path)
preds
| | file_name | classes | xmin | ymin | xmax | ymax |
|---|---|---|---|---|---|---|
| 0 | solar_86.JPG | solar | 99 | 190 | 245 | 419 |
| 1 | solar_86.JPG | solar | 211 | 193 | 360 | 433 |
| 2 | solar_86.JPG | solar | 16 | 151 | 130 | 315 |
| 3 | solar_86.JPG | solar | 0 | 107 | 15 | 305 |
| 4 | solar_81.JPG | solar | 240 | 228 | 374 | 426 |
| ... | ... | ... | ... | ... | ... | ... |
| 130 | solar_103.JPG | solar | 113 | 304 | 218 | 470 |
| 131 | solar_103.JPG | solar | 308 | 168 | 427 | 337 |
| 132 | solar_240.JPG | solar | 134 | 227 | 274 | 369 |
| 133 | solar_240.JPG | solar | 167 | 147 | 328 | 313 |
| 134 | solar_88.JPG | solar | 174 | 207 | 257 | 324 |
135 rows × 6 columns
list_of_preds = preds['file_name'].unique()
# Convert the normalized annotations back to pixel coordinates
# (x by the image width, y by the image height)
new_xml_df['xmin'] = new_xml_df['xmin']*new_xml_df['width']
new_xml_df['xmax'] = new_xml_df['xmax']*new_xml_df['width']
new_xml_df['ymin'] = new_xml_df['ymin']*new_xml_df['height']
new_xml_df['ymax'] = new_xml_df['ymax']*new_xml_df['height']
new_xml_df['xmin'] = new_xml_df['xmin'].astype(int)
new_xml_df['xmax'] = new_xml_df['xmax'].astype(int)
new_xml_df['ymin'] = new_xml_df['ymin'].astype(int)
new_xml_df['ymax'] = new_xml_df['ymax'].astype(int)
Finally, we can visualize the results and compare them with the original annotations of the test images:
f = 20
img_pred = io.imread(os.path.join(path_to_images_resize, list_of_preds[f]))
# Predicted boxes in red
detec = preds[preds['file_name'] == list_of_preds[f]]
for i, row in detec.iterrows():
    color = (255, 0, 0)
    cv2.rectangle(img_pred, (max(0, row['xmin']), max(0, row['ymin'])),
                  (row['xmax'], row['ymax']), color, 2)
# Ground-truth annotations in green
true_sample = new_xml_df[new_xml_df['file_name'] == list_of_preds[f]]
for j, row_2 in true_sample.iterrows():
    color = (0, 255, 0)
    cv2.rectangle(img_pred, (max(0, row_2['xmin']), max(0, row_2['ymin'])),
                  (row_2['xmax'], row_2['ymax']), color, 2)
plt.figure(figsize=(16, 16))
plt.imshow(img_pred)
plt.axis('off')
plt.show()