Object Detection¶

Introduction to Object Detection¶

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When humans look at images or videos, we can recognize and locate objects of interest in a matter of moments. The goal of object detection is to replicate this intelligence using a computer.

image.png

Image Classification vs. Object Detection¶

Object detection is often confused with image recognition, so before we move on, it’s important to clarify the distinctions between them.

Image recognition assigns a label to an image. A photo of a dog is labeled “dog.” A photo of two dogs is still labeled “dog.” Object detection, on the other hand, draws a box around each dog and labels the box “dog.” The model predicts where each object is and what label should be applied. In this way, object detection provides more information about an image than recognition.

Here’s an example of what this distinction looks like in practice:

image.png

Basic structure¶

Object detection locates the presence of an object in an image and draws a bounding box around that object. This usually involves two processes: classifying an object's type, and then drawing a box around that object. We have covered image classification before, so let's now review some of the common model architectures used for object detection:

  • R-CNN
  • Fast R-CNN
  • Faster R-CNN
  • SSD (Single Shot MultiBox Detector)
  • YOLO (You Only Look Once)

Types of architectures¶

Whether you create a custom object detector or use a pre-trained one, you will need to decide what type of object detection network you want to use: a two-stage network or a single-stage network.

image.png

Two-Stage Networks

The initial stage of two-stage networks, such as R-CNN and its variants, identifies proposed regions or subsets of the image that may contain an object. The second stage classifies the objects within the proposed regions. Two-stage networks can achieve very accurate object detection results; however, they are typically slower than single-stage networks.

Single-Stage Networks

In single-stage networks like YOLO v2, the CNN produces network predictions for regions across the image using anchor boxes, and the predictions are decoded to generate the final bounding boxes for the objects. Single-stage networks can be much faster than two-stage networks, but they may not achieve the same level of accuracy, especially for scenes that contain small objects.

Two-Stage Architectures¶

R-CNN¶

The 2014 R-CNN paper proposes the basic version of the CNN-based two-stage detection algorithm, which the following papers improve and accelerate. As depicted in the figure below, the overall pipeline consists of three steps. First, generate region proposals: category-independent candidate regions that might contain an object. Second, a convolutional neural network computes a fixed-length feature vector for each candidate region. The final stage classifies each region, using class-specific SVMs in the paper.

image.png

The problem that R-CNN tries to solve is finding objects in an image (object detection). What do you do to solve this? You can start with a sliding window approach. When using this method, you simply go through the entire image with rectangles of different sizes and look at these smaller images in a brute force method. The problem is that you have a huge number of smaller images to look at. Luckily for us, other smart people have developed algorithms to intelligently select so-called region proposals. To simplify this concept:

  • Region proposals are just smaller parts of the original image that we think might contain the objects we are looking for.

Region proposals¶

There are different region proposal algorithms that we can choose from. These are "classical" algorithms that work out of the box; we don't need to train them at all. In this work, the authors use the selective search method to generate region proposals.

Selective search combines an exhaustive-search strategy with segmentation: it first over-segments the image and then greedily merges similar neighboring regions (by color, texture, and size) into object proposals at multiple scales. Informally, we can say that selective search separates the objects in an image by grouping pixels that look alike.

This will create almost 2,000 different regions that we will have to examine. This seems like a large number, but it is still very small compared to the brute force sliding window approach.

image.png
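
As an aside, the over-segmentation that selective search starts from is the Felzenszwalb graph-based method, which is available in scikit-image (a library used later in this notebook). A small sketch on a random stand-in image:

```python
import numpy as np
from skimage.segmentation import felzenszwalb

# Random stand-in image; selective search starts from an
# over-segmentation like this and then merges similar regions.
img = np.random.rand(64, 64, 3)
segments = felzenszwalb(img, scale=100, sigma=0.5, min_size=20)

print(segments.shape)            # one region label per pixel
print(len(np.unique(segments)))  # number of initial regions
```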

CNN¶

In the next step, we take each region proposal and create a feature vector representing that image in a much lower dimension using a Convolutional Neural Network (CNN).

image.png

They use AlexNet as a feature extractor. Keep in mind that this is 2014 and AlexNet is still state of the art.

One question we need to answer:

If we use AlexNet only as a feature extractor, how do we train it?

Well, this is a fundamental question with the R-CNN system. You cannot train the entire system at once (this is solved later by Fast R-CNN). Instead, you need to train each part independently. This means that AlexNet was previously trained on a classification task. After training, they removed the last softmax layer, so the last layer is the 4096-dimensional fully connected layer. This means that our features are 4096-dimensional.

Another important thing to keep in mind is that the input to AlexNet is always the same (227, 227, 3). However, the image proposals have different shapes. Many of them are smaller or larger than the required size. Therefore, we will need to resize each region proposal.

To summarize the CNN task:

image.png

SVM¶

We have created feature vectors from the image proposals. Now we need to classify these feature vectors, i.e., detect which class of object each feature vector represents. To do this, we use SVM classifiers: one SVM for each class of object, and we apply all of them. This means that for a given feature vector we have n outputs, where n is the number of different object classes we want to detect. Each output is a confidence score: how confident are we that this particular feature vector represents that class?

What confused me when I first read this article was how we train these different SVMs. Well, we train them on feature vectors created by AlexNet. This means that we have to wait until we have fully trained the CNN before we can train the SVM. The training is not parallelizable. Since we know when training which feature vector represented which class, we can easily train the different SVMs in a supervised manner.

To summarize:

  • We create different image proposals from an image.
  • Then we create a feature vector from these proposals using CNN.
  • Finally, we classify each feature vector with SVMs for each object class.
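
Here is a toy sketch of the per-class SVM stage using scikit-learn's `LinearSVC` (my substitution for the paper's own SVM training; hard-negative mining is omitted, and the features are random stand-ins for AlexNet's 4096-d vectors):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_classes, dim = 3, 4096

# Fake "AlexNet" feature vectors with integer class labels.
features = rng.normal(size=(60, dim))
labels = rng.integers(0, n_classes, size=60)

# One binary SVM per class (one-vs-rest), as in the paper.
svms = []
for c in range(n_classes):
  clf = LinearSVC(C=1.0).fit(features, (labels == c).astype(int))
  svms.append(clf)

# For a new feature vector we get one confidence score per class.
scores = np.array([clf.decision_function(features[:1])[0] for clf in svms])
print(scores.shape)  # one score per class
```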

The output:

We now have image proposals that are classified into each object class. How do we get them all back into the image? We use something called greedy non-maximum suppression (NMS). This is a fancy term for the following concept:

We reject a region (image proposal) if its intersection-over-union (IoU) overlap with a higher-scoring selected region exceeds a threshold.

Where regions overlap, we keep the proposal with the highest score (calculated by the SVM) and discard the rest. We do this step for each object class independently. After that, we keep only the regions with a score higher than 0.5.
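
A minimal NumPy implementation of this greedy NMS step (the 0.5 IoU threshold here is illustrative):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
  """Greedy non-maximum suppression. boxes: (N, 4) as [x1, y1, x2, y2]."""
  order = np.argsort(scores)[::-1]  # highest score first
  keep = []
  while order.size > 0:
    i = order[0]
    keep.append(int(i))
    # IoU of the top-scoring box with the remaining boxes.
    x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
    y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
    x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
    y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
    areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
            (boxes[order[1:], 3] - boxes[order[1:], 1])
    iou = inter / (area_i + areas - inter)
    order = order[1:][iou <= iou_thresh]
  return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```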

Bounding Box Regressor (optional)¶

I want to mention the Bounding Box Regressor at the end because it is not a fundamental building block of the R-CNN system. It is a great idea, and the authors found that it improves mean average precision by 3 to 4 points. So how does it work?

When you are training the Bounding Box Regressor, your input is the center, width, and height in pixels of the region proposal and the label is the ground truth bounding box. The goal, as stated in the paper, is:

Our goal is to learn a transformation that maps a proposed box P to a ground truth box G.
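
In the paper's parameterization, the regression targets are scale-invariant shifts of the proposal's center and log-space scalings of its width and height. Computing them for one proposal/ground-truth pair:

```python
import numpy as np

def regression_targets(P, G):
  """Targets mapping proposal P to ground truth G.
  Each box is (center_x, center_y, width, height)."""
  px, py, pw, ph = P
  gx, gy, gw, gh = G
  tx = (gx - px) / pw   # center shift, normalized by proposal size
  ty = (gy - py) / ph
  tw = np.log(gw / pw)  # log-space width/height scaling
  th = np.log(gh / ph)
  return tx, ty, tw, th

# Proposal centered at (50, 50), 20x20; ground truth at (55, 50), 40x20.
print(regression_targets((50, 50, 20, 20), (55, 50, 40, 20)))
```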

Problems with R-CNN¶

  • It still takes a long time to train the network, as you would have to classify 2,000 region proposals per image.
  • It cannot be implemented in real-time, as it takes about 47 seconds for each test image.
  • The selective search algorithm is a fixed algorithm. Therefore, no learning is happening at this stage. This can lead to generating poor candidate region proposals.

Fast R-CNN¶

The author of the previous paper (R-CNN) addressed some of its drawbacks to build a faster object detection algorithm, called Fast R-CNN. The approach is similar to the R-CNN algorithm, but instead of feeding the region proposals to the CNN, we feed the whole input image to the CNN to generate a convolutional feature map. From the convolutional feature map, we identify the region proposals, and a RoI pooling layer reshapes each of them into a fixed size so that they can be fed to a fully connected layer. From the RoI feature vector, we use a softmax layer to predict the class of the proposed region, and also the offset values for the bounding box.

image.png

Here is a summary of the main contributions:

  • Proposed a new layer called RoI Pooling that extracts equal-length feature vectors from all proposals (i.e., RoIs) in the same image.
  • Compared to R-CNN, which has multiple stages (region proposal generation, feature extraction, and classification using SVM), Fast R-CNN builds a network that has only a single stage.
  • Fast R-CNN shares computations (i.e., the convolutional layer computations) across all proposals (i.e., RoIs) instead of doing the computations for each proposal independently. This is done using the new RoI Pooling layer, which makes Fast R-CNN faster than R-CNN.
  • Fast R-CNN does not cache the extracted features and therefore does not need as much disk storage as R-CNN, which requires hundreds of gigabytes.
  • Fast R-CNN is more accurate than R-CNN.

Faster R-CNN¶

Both the above algorithms (R-CNN and Fast R-CNN) use selective search to discover region proposals. Selective search is a slow and time-consuming process that affects the network performance. Therefore, Shaoqing Ren et al. came up with an object detection algorithm that eliminates the selective search algorithm and allows the network to learn region proposals.

Similar to Fast R-CNN, the image is provided as an input to a convolutional network that outputs a convolutional feature map. Instead of using the selective search algorithm on the feature map to identify region proposals, a separate network is used to predict region proposals. The predicted region proposals are then reshaped using a RoI pooling layer, which is used to classify the image within the proposed region and predict the offset values for the bounding boxes.

The main contributions of Faster R-CNN:

  • Proposed the Region Proposal Network (RPN), a fully convolutional network that generates proposals with multiple scales and aspect ratios. Following the idea of neural attention, the RPN tells the object detector (Fast R-CNN) where to look.
  • Instead of using image pyramids (i.e., multiple instances of the image, but at different scales) or filter pyramids (i.e., multiple filters with different sizes), this paper introduced the concept of anchor boxes. An anchor box is a reference box of a specific scale and aspect ratio. With multiple reference anchor boxes, multiple scales and aspect ratios exist for a single region. This can be thought of as a pyramid of reference anchor boxes. Each region is then mapped to each reference anchor box, thus detecting objects at different scales and aspect ratios.
  • Convolutional computations are shared between RPN and Fast R-CNN. This reduces computational time.
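
The anchors at a single feature-map position can be sketched as follows (3 scales × 3 aspect ratios gives the paper's default of 9 anchors per position; the exact sizes here are illustrative):

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
  """Anchor boxes (x1, y1, x2, y2) centered at (cx, cy)."""
  anchors = []
  for s in scales:
    for r in ratios:
      # Keep the anchor's area ~ s*s while varying the aspect ratio w/h = r.
      w = s * np.sqrt(r)
      h = s / np.sqrt(r)
      anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
  return np.array(anchors)

anchors = make_anchors(0, 0)
print(anchors.shape)  # (9, 4): 3 scales x 3 aspect ratios
```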

The architecture of Faster R-CNN is shown in the next figure. It consists of 2 modules:

  • RPN: For generating region proposals.
  • Fast R-CNN: For detecting objects in the proposed regions.

image.png

Region Proposal Network (RPN)¶

R-CNN and Fast R-CNN models rely on the Selective Search algorithm to generate region proposals. Each proposal is fed to a pre-trained CNN for classification. This paper proposes a network called region proposal network (RPN) that can produce region proposals. This has a few advantages:

  • Region proposals are now generated using a network that can be trained and customized according to the detection task.
  • Since the proposals are generated using a network, it can be trained end-to-end to be customized for the detection task. Thus, it produces better region proposals compared to generic methods like Selective Search and EdgeBoxes.
  • RPN processes the image using the same convolutional layers used in the Fast R-CNN detection network. Thus, RPN does not take extra time to produce the proposals compared to algorithms like Selective Search.
  • Due to sharing the same convolutional layers, RPN and Fast R-CNN can be merged/unified into a single network. Thus, training is done only once.

RPN works on the output feature map returned from the last convolutional layer shared with Fast R-CNN. This is shown in the next figure. Based on a rectangular window of size nxn, a sliding window passes through the feature map. For each window, multiple candidate region proposals are generated. These are not the final proposals, as they will be filtered based on their “objectness score”.
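
During training, the paper labels an anchor as positive when its IoU with a ground-truth box is above 0.7 (or it has the highest IoU for that box), negative when its IoU is below 0.3, and ignores anchors in between. A small sketch of this labeling rule:

```python
import numpy as np

def iou(a, b):
  """IoU of two boxes given as (x1, y1, x2, y2)."""
  x1, y1 = max(a[0], b[0]), max(a[1], b[1])
  x2, y2 = min(a[2], b[2]), min(a[3], b[3])
  inter = max(0, x2 - x1) * max(0, y2 - y1)
  area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
  return inter / (area(a) + area(b) - inter)

def label_anchor(anchor, gt, pos=0.7, neg=0.3):
  """1 = object, 0 = background, -1 = ignored during training."""
  overlap = iou(anchor, gt)
  if overlap >= pos:
    return 1
  if overlap < neg:
    return 0
  return -1

gt = (0, 0, 10, 10)
print(label_anchor((0, 0, 10, 10), gt))    # 1: perfect overlap
print(label_anchor((50, 50, 60, 60), gt))  # 0: no overlap
```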

Solar Panel Detection in High-Resolution Images Using Faster R-CNN¶

In this example we will use the PyTorch (torchvision) implementation of Faster R-CNN to detect solar panels in high-resolution satellite images.

First, let's connect to Drive:

In [ ]:
from google.colab import drive
GDRIVE_ROOT = "/gdrive"
drive.mount(GDRIVE_ROOT)
Mounted at /gdrive

Let's import the necessary packages:

In [ ]:
from PIL import Image
import os
import glob
import random
import csv
random.seed(4)

import pandas as pd
import numpy as np
import tqdm
import xml.etree.ElementTree as ET
from skimage import io
from skimage.io import imsave
import matplotlib.pyplot as plt
import cv2

import torch
from torch.utils.data import DataLoader, Dataset

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
import torchvision.transforms as transforms

Let's define the paths of the images and annotations that will be used in this study:

In [ ]:
path_to_images =  os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/images/')
path_to_annotations = os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/annotations/')

We create the path of the .csv file where we will save the annotations later.

In [ ]:
annotations_file_path = os.path.join(path_to_annotations, 'annotations.csv')

Let's then generate a list of the .xml files containing the annotations for each image:

In [ ]:
xmls_paths = os.path.join(path_to_annotations, os.listdir(path_to_annotations)[0])
In [ ]:
xml_list = os.listdir(xmls_paths)

Now we can parse the annotations:

In [ ]:
xml_list = []
for xml_file in os.listdir(xmls_paths):
  tree = ET.parse(os.path.join(xmls_paths, xml_file))
  root = tree.getroot()
  size = root.find('size')
  for member in root.findall('object'):
    # Look up fields by tag name instead of relying on child order
    bndbox = member.find('bndbox')
    value = (root.find('filename').text,
              int(size.find('width').text),
              int(size.find('height').text),
              member.find('name').text,
              int(bndbox.find('xmin').text),
              int(bndbox.find('ymin').text),
              int(bndbox.find('xmax').text),
              int(bndbox.find('ymax').text)
              )
    xml_list.append(value)
column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
xml_df = pd.DataFrame(xml_list, columns=column_name)
In [ ]:
xml_df
Out[ ]:
filename width height class xmin ymin xmax ymax
0 solar_2.JPG 901 791 solar 617 390 708 491
1 solar_140.JPG 901 791 solar 218 294 322 384
2 solar_144.JPG 901 791 solar 58 521 117 588
3 solar_144.JPG 901 791 solar 160 537 242 626
4 solar_144.JPG 901 791 solar 705 150 757 200
... ... ... ... ... ... ... ... ...
397 solar_84.JPG 901 791 solar 679 124 753 146
398 solar_84.JPG 901 791 solar 762 123 812 146
399 solar_84.JPG 901 791 solar 763 158 901 624
400 solar_84.JPG 901 791 solar 660 125 901 623
401 solar_56.JPG 901 791 solar 625 342 865 471

402 rows × 8 columns

Let's then plot an image and its annotations as an example:

In [ ]:
list_of_images = os.listdir(path_to_images)
In [ ]:
i = 6
img = io.imread(os.path.join(path_to_images, list_of_images[i]))
detec = xml_df[xml_df['filename'] == list_of_images[i]]
for _, row in detec.iterrows():
  color = (255,0,0)
  # cv2.rectangle expects the two opposite corners of the box
  cv2.rectangle(img, (int(row['xmin']), int(row['ymin'])), (int(row['xmax']), int(row['ymax'])), color, 2)
In [ ]:
plt.figure(figsize=(16,16))
plt.imshow(img)
plt.axis('off')
plt.show()

Now, let's resize our images and their respective bounding-box values:

In [ ]:
path_to_images_resize = os.path.join(GDRIVE_ROOT + '/My Drive/', 'Datasets/dataset_solar/resize_images')
if not os.path.isdir(path_to_images_resize):
  os.mkdir(path_to_images_resize)
In [ ]:
new_xml_df = []
for img_name in list_of_images:
  img = io.imread(os.path.join(path_to_images,img_name))
  detec = xml_df[xml_df['filename'] == img_name]
  y_ = img.shape[0]
  x_ = img.shape[1]
  print(os.path.join(path_to_images_resize,img_name))

  targetSize = 512
  x_scale = targetSize / x_
  y_scale = targetSize / y_
  new_img = cv2.resize(img, (targetSize, targetSize))

  imsave(os.path.join(path_to_images_resize,img_name), new_img)
  color = (255,0,0)
  for i, row in detec.iterrows():
    new_xmin = int(np.round(row['xmin'] * x_scale))
    new_xmax = int(np.round(row['xmax'] * x_scale))
    new_ymin = int(np.round(row['ymin'] * y_scale))
    new_ymax = int(np.round(row['ymax'] * y_scale))
    filename = row['filename']
    width = targetSize
    height = targetSize
    classe = row['class']
    new_xml_df.append([filename,width,height,classe,new_xmin,new_ymin,new_xmax,new_ymax])

column_name = ['file_name', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
new_xml_df = pd.DataFrame(new_xml_df, columns=column_name)
/gdrive/My Drive/Datasets/dataset_solar/resize_images/solar_86.JPG
/gdrive/My Drive/Datasets/dataset_solar/resize_images/solar_229.JPG
...
In [ ]:
new_xml_df
Out[ ]:
file_name width height class xmin ymin xmax ymax
0 solar_86.JPG 512 512 solar 314 130 381 280
1 solar_229.JPG 512 512 solar 36 129 106 242
2 solar_228.JPG 512 512 solar 57 31 138 463
3 solar_228.JPG 512 512 solar 1 267 56 431
4 solar_228.JPG 512 512 solar 1 30 55 96
... ... ... ... ... ... ... ... ...
397 solar_89.JPG 512 512 solar 221 245 314 292
398 solar_89.JPG 512 512 solar 314 238 372 309
399 solar_88.JPG 512 512 solar 102 146 194 236
400 solar_88.JPG 512 512 solar 195 188 236 241
401 solar_230.JPG 512 512 solar 468 97 510 177

402 rows × 8 columns

In [ ]:
# Normalize the box coordinates to [0, 1]: x by image width, y by image height
# (the images are square, 512 x 512, so width and height are interchangeable here)
new_xml_df['xmin'] = new_xml_df['xmin']/new_xml_df['width']
new_xml_df['xmax'] = new_xml_df['xmax']/new_xml_df['width']
new_xml_df['ymin'] = new_xml_df['ymin']/new_xml_df['height']
new_xml_df['ymax'] = new_xml_df['ymax']/new_xml_df['height']
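As a sanity check, here is the same normalization applied to a hypothetical single-row annotation (the values are taken from the first row of the table above): each pixel coordinate becomes a fraction of the 512-pixel image side.

```python
import pandas as pd

# One hypothetical annotation row from a 512 x 512 image
df = pd.DataFrame({'width': [512], 'height': [512],
                   'xmin': [314], 'ymin': [130], 'xmax': [381], 'ymax': [280]})

# Divide each coordinate by the corresponding image dimension
df['xmin'] = df['xmin'] / df['width']
df['xmax'] = df['xmax'] / df['width']
df['ymin'] = df['ymin'] / df['height']
df['ymax'] = df['ymax'] / df['height']

print(df[['xmin', 'ymin', 'xmax', 'ymax']].round(3).to_dict('records'))
# → [{'xmin': 0.613, 'ymin': 0.254, 'xmax': 0.744, 'ymax': 0.547}]
```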

We save the annotated DataFrame to a .csv file:

In [ ]:
new_xml_df.to_csv((path_to_annotations + 'annotations.csv'), index=None)

Let's define the class name of our object of interest:

In [ ]:
cat_to_index = {'solar': 1}

Next, we shuffle the images, split them 70/30 into training and test sets, and save a file recording each image's tag:

In [ ]:
im_list = [os.path.abspath(i) for i in glob.glob(path_to_images_resize + '/**/*.JPG', recursive=True)]
im_list = random.sample(im_list, len(im_list))

# Defining the train/test split (70% train, 30% test)
train_idx = round(len(im_list) * 0.7)
test_idx  = train_idx + round(len(im_list) * 0.3)

# Creating a dictionary with tags
tags_dict =  {'train' : im_list[0:train_idx],
              'test' : im_list[train_idx:test_idx]}

# Saving the file_name/tag pairs so the Dataset class can read them back later
train_test_split_file_path = os.path.join(path_to_annotations, 'images_tags.csv')
tags_df = pd.DataFrame([(file_name, tag) for tag, files in tags_dict.items() for file_name in files],
                       columns=['file_name', 'tag'])
tags_df.to_csv(train_test_split_file_path, index=None)
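The 70/30 index arithmetic above can be sanity-checked on a small hypothetical file list:

```python
import random

# Toy stand-in for im_list (hypothetical file names)
files = ['solar_{}.JPG'.format(i) for i in range(10)]
files = random.sample(files, len(files))  # shuffle

# Same split arithmetic as in the notebook cell
train_idx = round(len(files) * 0.7)
test_idx = train_idx + round(len(files) * 0.3)

train_files = files[:train_idx]
test_files = files[train_idx:test_idx]
print(len(train_files), len(test_files))  # 7 3
```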

Let's now create a class to import and prepare the images and annotations, leaving them ready to feed the model.

In [ ]:
class ObjectDetectionDataset(Dataset):
    """
    Custom PyTorch Dataset Class to facilitate loading data for the Object Detection Task
    """
    def __init__(self,
                 annotations,
                 train_test_valid_split,
                 mapping = None,
                 mode = 'train',
                 transform = None):
        """
        Args:
            annotations: The path to the annotations CSV file. Format: file_name, classes, xmin, ymin, xmax, ymax
            train_test_valid_split: The path to the tags CSV file for train, test, valid split.
                                    Format: file_name, tag
            mapping: a dictionary containing mapping of class name and class index.
                     Format : {'class_name' : 'class_index'}, Default: None
            mode: Mode in which to instantiate class. Default: 'train'
            transform: The transforms to be applied to the image data

        Returns:
            image : Torch Tensor, target: Torch Tensor, file_name : str
        """
        self.mapping = mapping
        self.transform = transform
        self.mode = mode

        self.path_to_images = path_to_images_resize
        # Loading the annotation file (same format as Remo's)
        my_data = pd.read_csv(annotations)
        # Here we append the file path to the filename.
        # If dataset.export_annotations_to_file was used to create the annotation file, it would feature by default image file paths
        my_data['file_name'] = my_data['file_name'].apply(lambda x : os.path.join(path_to_images_resize, x))
        my_data = my_data.set_index('file_name')

        # Loading the train/test split file (same format as Remo's)
        my_tags =  pd.read_csv(train_test_valid_split, index_col='file_name')

        tags_list = []
        for i, row in my_data.iterrows():
            tags_list.append(my_tags.loc[i]['tag'])

        my_data['tag'] = tags_list
        my_data = my_data.reset_index()

        # Load only Train/Test/Split depending on the mode
        my_data = my_data.loc[my_data['tag'] == mode].reset_index(drop=True)

        self.data = my_data

        self.file_names = self.data['file_name'].unique()

    def __len__(self) -> int:
        return self.file_names.shape[0]

    def __getitem__(self, index: int):

        file_name = self.file_names[index]
        records = self.data[self.data['file_name'] == file_name].reset_index()
        image = np.array(Image.open(file_name), dtype=np.float32)
        image /= 255.0

        if self.transform:
            image = self.transform(image)

        # here we are assuming we don't have labels for the test set
        if self.mode != 'test':
            boxes = records[['xmin', 'ymin', 'xmax', 'ymax']].values
            area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
            area = torch.as_tensor(area, dtype=torch.float32)

            if self.mapping is not None:
                labels = np.zeros((records.shape[0],))

                for i in range(records.shape[0]):
                    labels[i] = self.mapping[records.loc[i, 'class']]

                labels = torch.as_tensor(labels, dtype=torch.int64)

            else:
                labels = torch.ones((records.shape[0],), dtype=torch.int64)

            iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

            target = {}
            target['boxes'] = boxes
            target['labels'] = labels
            target['image_id'] = torch.tensor([index])
            target['area'] = area
            target['iscrowd'] = iscrowd
            target['boxes'] = torch.stack(list((map(torch.tensor, target['boxes'])))).type(torch.float32)

            return image, target, file_name
        else:
            return image, file_name

def collate_fn(batch):
    return tuple(zip(*batch))
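A note on `collate_fn`: the default DataLoader collation tries to stack every field of a batch into a single tensor, which fails when images contain different numbers of boxes. `tuple(zip(*batch))` simply regroups the samples into parallel tuples, as this toy batch (hypothetical values) shows:

```python
def collate_fn(batch):
    # Turn a list of (image, target, file_name) samples into
    # three parallel tuples: (images, targets, file_names)
    return tuple(zip(*batch))

# Two dummy samples with different numbers of boxes
batch = [('image_a', {'boxes': [[0, 0, 1, 1]]}, 'solar_1.JPG'),
         ('image_b', {'boxes': [[0, 0, 1, 1], [2, 2, 3, 3]]}, 'solar_2.JPG')]

images, targets, file_names = collate_fn(batch)
print(file_names)  # ('solar_1.JPG', 'solar_2.JPG')
```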

After defining the dataset class, we instantiate it for the training and test data, and create a DataLoader for each.

In [ ]:
tensor_transform = transforms.Compose([transforms.ToTensor()])

# Here the operations provided with Remo are integrated into a workflow in PyTorch
# by using the custom ObjectDetectionDataset method.

train_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
                                       train_test_valid_split = train_test_split_file_path,
                                       transform = tensor_transform,
                                       mapping = cat_to_index,
                                       mode = 'train')

test_dataset = ObjectDetectionDataset(annotations = annotations_file_path,
                                       train_test_valid_split = train_test_split_file_path,
                                       transform = tensor_transform,
                                       mapping = cat_to_index,
                                       mode = 'test')


train_data_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)
test_data_loader  = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)

We set some parameters:

In [ ]:
device      = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
num_classes = 2
loss_value  = 0.0
num_epochs  = 50

From torchvision, we load a Faster R-CNN architecture pre-trained on COCO:

In [ ]:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT)

# Replace the COCO-trained box predictor head with a new one
# for our 2 classes (background + 'solar')
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

model.to(device)

params = [p for p in model.parameters() if p.requires_grad]

optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100%|██████████| 160M/160M [00:00<00:00, 407MB/s]

Now we can start training:

In [ ]:
# The training loop trains the model for the total number of epochs.
# (1 epoch = one complete pass over the entire dataset)

model.train()
for epoch in range(num_epochs):
    print(epoch)
    for images, targets, image_ids in tqdm.tqdm(train_data_loader):

        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # In training mode, the model returns a dict of losses
        loss_dict = model(images, targets)

        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()
    print('\nTraining Loss : {:.5f}'.format(loss_value))
0
100%|██████████| 148/148 [00:35<00:00,  4.23it/s]
Training Loss : 0.30937
1
100%|██████████| 148/148 [00:27<00:00,  5.43it/s]
Training Loss : 0.27186
...
49
100%|██████████| 148/148 [00:28<00:00,  5.21it/s]
Training Loss : 0.23083

Once training is complete, we can apply the model to the test images and save a .csv with the detections for all of them.

In [ ]:
# Mapping between predicted index and class name
mapping = { value : key for (key, value) in cat_to_index.items()}

detection_threshold = 0.4
img_size = 512
results = []

model.eval()
test_data_loader = tqdm.tqdm(test_data_loader)

with torch.no_grad():
    for images, image_ids in test_data_loader:

        images = list(image.to(device) for image in images)
        outputs = model(images)

        for i, image in enumerate(images):

            boxes  = outputs[i]['boxes'].data.cpu().numpy()
            scores = outputs[i]['scores'].data.cpu().numpy()
            labels = outputs[i]['labels'].data.cpu().numpy()

            # Keep only detections above the confidence threshold,
            # filtering boxes and labels together so they stay aligned
            keep   = scores >= detection_threshold
            boxes  = boxes[keep]
            labels = labels[keep]
            image_id = image_ids[i]

            for box, label in zip(boxes, labels):
                results.append({'file_name' : os.path.basename(image_id),
                                'classes'   : mapping[label.item()],
                                'xmin'      : int(box[0] * img_size),
                                'ymin'      : int(box[1] * img_size),
                                'xmax'      : int(box[2] * img_size),
                                'ymax'      : int(box[3] * img_size)})


model_predictions_path = path_to_annotations + 'model_predictions.csv'

with open(model_predictions_path, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=['file_name', 'classes', 'xmin', 'ymin', 'xmax', 'ymax'])
    writer.writeheader()
    writer.writerows(results)
100%|██████████| 64/64 [00:05<00:00, 11.19it/s]
In [ ]:
preds = pd.read_csv(model_predictions_path)
In [ ]:
preds
Out[ ]:
file_name classes xmin ymin xmax ymax
0 solar_86.JPG solar 99 190 245 419
1 solar_86.JPG solar 211 193 360 433
2 solar_86.JPG solar 16 151 130 315
3 solar_86.JPG solar 0 107 15 305
4 solar_81.JPG solar 240 228 374 426
... ... ... ... ... ... ...
130 solar_103.JPG solar 113 304 218 470
131 solar_103.JPG solar 308 168 427 337
132 solar_240.JPG solar 134 227 274 369
133 solar_240.JPG solar 167 147 328 313
134 solar_88.JPG solar 174 207 257 324

135 rows × 6 columns

In [ ]:
list_of_preds = preds['file_name'].unique()
In [ ]:
# Convert the normalized annotations back to pixel coordinates
new_xml_df['xmin'] = new_xml_df['xmin']*new_xml_df['width']
new_xml_df['xmax'] = new_xml_df['xmax']*new_xml_df['width']
new_xml_df['ymin'] = new_xml_df['ymin']*new_xml_df['height']
new_xml_df['ymax'] = new_xml_df['ymax']*new_xml_df['height']
In [ ]:
new_xml_df['xmin'] = new_xml_df['xmin'].astype(int)
new_xml_df['xmax'] = new_xml_df['xmax'].astype(int)
new_xml_df['ymin'] = new_xml_df['ymin'].astype(int)
new_xml_df['ymax'] = new_xml_df['ymax'].astype(int)

Finally, we can visualize the results and compare them with the original annotations of the test images:

In [ ]:
f = 20
img_pred = io.imread(os.path.join(path_to_images_resize, list_of_preds[f]))

# Predicted boxes in red
detec = preds[preds['file_name'] == list_of_preds[f]]
for i, row in detec.iterrows():
  cv2.rectangle(img_pred, (max(0, row['xmin']), max(0, row['ymin'])),
                (row['xmax'], row['ymax']), (255, 0, 0), 2)

# Ground-truth annotations in green
true_sample = new_xml_df[new_xml_df['file_name'] == list_of_preds[f]]
for j, row_2 in true_sample.iterrows():
  cv2.rectangle(img_pred, (max(0, row_2['xmin']), max(0, row_2['ymin'])),
                (row_2['xmax'], row_2['ymax']), (0, 255, 0), 2)

plt.figure(figsize=(16,16))
plt.imshow(img_pred)
plt.axis('off')
plt.show()
image.png
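Beyond visual inspection, the overlap between a predicted box and its ground-truth counterpart can be quantified with intersection-over-union (IoU). This helper is not part of the notebook above, just a minimal sketch using the same `(xmin, ymin, xmax, ymax)` convention:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical predicted vs. ground-truth box for one panel
print(round(iou((100, 150, 200, 250), (110, 160, 210, 260)), 3))  # 0.681
```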