Region-Based Convolutional Neural Networks: How to build your own object detector with transfer learning

In this article we will cover an implementation of the R-CNN object detector and how transfer learning helps us classify objects.

We are going to build a raccoon detector using R-CNN. We will train a classifier on the raccoon dataset to tell "raccoon" from "no raccoon", using transfer learning with the fine-tuning approach (the other common approach is feature extraction). I chose the raccoon dataset because it has clean annotation files for every raccoon face in the images, and it has already been used in a lot of object detection work.

Before Starting:

  • This article is going to be a bit long, so I highly recommend you set aside some free time to read it and actually implement it.
  • In this tutorial we will use the selective search algorithm to find ROIs (regions of interest) and their bounding boxes, and we will also use transfer learning. To read up on these topics I recommend the object detection blog series and tutorials from PyImageSearch by Adrian Rosebrock, as I'm a big fan of his work.

Workflow of this project:

Let's first set out the goal and the overall approach.

What we actually want from this project is to run a Python script on an input image of a raccoon and get back an output image with a box drawn around the raccoon's face.

The first step is data preprocessing, i.e. getting the dataset into a straightforward shape. Because we will be fine-tuning a MobileNet V2 CNN pre-trained on the 1000-class ImageNet dataset, we want our dataset to fit this model during transfer learning. We want two classes, positive (images that contain a raccoon) and negative (images that don't), so that the fine-tuned model can classify raccoon vs. no raccoon for object detection.

The second step is to accept an input image and run the selective search algorithm on it to get the region proposals.

The third step is to run the classification model on each proposal from selective search and return the detected objects.

1st task:

From our project's workflow, the first thing we need is a dataset organized the way the classification model expects.

Since we will be using the MobileNet V2 CNN, our classifier needs a 'raccoon' directory containing all the images of the object of interest, i.e. raccoon faces, and a 'no raccoon' directory containing image regions that are not raccoons.

Let's lay out the plan for our 1st task:

  • First, we loop over all the images in the raccoon dataset one by one and parse each annotation file to find the bounding box coordinates of any raccoon present in the image.
  • Second, we run the selective search algorithm on the image to list candidate bounding boxes. Using the annotation files, we check which region proposals sufficiently overlap a ground-truth bounding box and which don't. For this we use the intersection over union concept.
  • Finally, we save the region proposals that sufficiently overlap the ground-truth bounding boxes to the raccoon folder, and those that don't overlap a ground-truth bounding box to the no raccoon folder.

Let's start with the configuration file, which holds constants such as the input spatial dimensions, the minimum probability, base paths, and dataset directory paths.

Before you start coding: please read the comments in the code for a better understanding.

# import the necessary packages
import os

# define the base path to the *original* input dataset and then use
# the base path to derive the image and annotations directories
ORIG_BASE_PATH = "raccoons"
ORIG_IMAGES = os.path.sep.join([ORIG_BASE_PATH, "images"])
ORIG_ANNOTS = os.path.sep.join([ORIG_BASE_PATH, "annotations"])

# define the base path to the *new* dataset after running our dataset
# builder scripts and then use the base path to derive the paths to
# our output class label directories
BASE_PATH = "dataset"
POSITVE_PATH = os.path.sep.join([BASE_PATH, "raccoon"])
NEGATIVE_PATH = os.path.sep.join([BASE_PATH, "no_raccoon"])

# define the number of max proposals used when running selective
# search for (1) gathering training data and (2) performing inference
MAX_PROPOSALS = 2000
MAX_PROPOSALS_INFER = 200

# define the maximum number of positive and negative images to be
# generated from each image
MAX_POSITIVE = 30
MAX_NEGATIVE = 10

# initialize the input dimensions to the network
INPUT_DIMS = (224, 224)

# define the path to the output model and label binarizer
MODEL_PATH = "raccoon_detector.h5"
ENCODER_PATH = "label_encoder.pickle"

# define the minimum probability required for a positive prediction
# (used to filter out false-positive predictions)
MIN_PROBA = 0.99

Now let's set up the helper that compares the regions proposed by selective search against a ground-truth bounding box, so that we can split the dataset into raccoon and no raccoon. This is the Intersection over Union (IoU) algorithm. As the name suggests, it is the ratio of the area of overlap to the area of union between the predicted bounding box and the ground-truth bounding box. The area of union is the total area covered by either box, with the overlapping part counted only once.

def compute_iou(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou
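
As a quick sanity check of the helper above, here is a small worked example. The two boxes are made-up values chosen so the arithmetic is easy to follow, and the function body is repeated only so the snippet runs on its own. Note that the `+ 1` terms treat coordinates as inclusive pixel indices, so a box running from 0 to 9 is 10 pixels wide.

```python
def compute_iou(boxA, boxB):
    # same logic as the helper above, repeated so this runs standalone
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    return interArea / float(boxAArea + boxBArea - interArea)

# hypothetical boxes in (startX, startY, endX, endY) format
gt_box = (0, 0, 9, 9)        # 10 x 10 pixels -> area 100
proposal = (5, 5, 14, 14)    # 10 x 10 pixels -> area 100

# the overlap spans x 5..9 and y 5..9 -> 5 x 5 = 25 pixels
# union = 100 + 100 - 25 = 175, so IoU = 25 / 175
iou = compute_iou(gt_box, proposal)
print(round(iou, 3))  # 0.143

# identical boxes overlap completely, disjoint boxes not at all
print(compute_iou(gt_box, gt_box))            # 1.0
print(compute_iou(gt_box, (20, 20, 29, 29)))  # 0.0
```

An IoU of about 0.14 sits between the two thresholds used later in this article, so a proposal like this one would be neither a positive nor a negative example.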

Now that the helper functions are in place, let's implement the dataset-building script itself, following the workflow steps above.

                         
# build_dataset.py

# import the necessary packages
from configuration.iou import compute_iou
from configuration import config
from bs4 import BeautifulSoup
from imutils import paths
import cv2
import os

# loop over the output positive and negative directories
for dirPath in (config.POSITVE_PATH, config.NEGATIVE_PATH):
	# if the output directory does not exist yet, create it
	if not os.path.exists(dirPath):
		os.makedirs(dirPath)

# grab all image paths in the input images directory
imagePaths = list(paths.list_images(config.ORIG_IMAGES))

# initialize the total number of positive and negative images we have
# saved to disk so far
totalPositive = 0
totalNegative = 0

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# show a progress report
	print("[INFO] processing image {}/{}...".format(i + 1,
		len(imagePaths)))

	# extract the filename from the file path and use it to derive
	# the path to the XML annotation file
	filename = imagePath.split(os.path.sep)[-1]
	filename = filename[:filename.rfind(".")]
	annotPath = os.path.sep.join([config.ORIG_ANNOTS,
		"{}.xml".format(filename)])

	# load the annotation file, build the soup, and initialize our
	# list of ground-truth bounding boxes
	contents = open(annotPath).read()
	soup = BeautifulSoup(contents, "html.parser")
	gtBoxes = []

	# extract the image dimensions
	w = int(soup.find("width").string)
	h = int(soup.find("height").string)

	# loop over all 'object' elements
	for o in soup.find_all("object"):
		# extract the label and bounding box coordinates
		label = o.find("name").string
		xMin = int(o.find("xmin").string)
		yMin = int(o.find("ymin").string)
		xMax = int(o.find("xmax").string)
		yMax = int(o.find("ymax").string)

		# truncate any bounding box coordinates that may fall
		# outside the boundaries of the image
		xMin = max(0, xMin)
		yMin = max(0, yMin)
		xMax = min(w, xMax)
		yMax = min(h, yMax)

		# update our list of ground-truth bounding boxes
		gtBoxes.append((xMin, yMin, xMax, yMax))

	# load the input image from disk
	image = cv2.imread(imagePath)

	# run selective search on the image and initialize our list of
	# proposed boxes
	ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
	ss.setBaseImage(image)
	ss.switchToSelectiveSearchFast()
	rects = ss.process()
	proposedRects = []

	# loop over the rectangles generated by selective search
	for (x, y, w, h) in rects:
		# convert our bounding boxes from (x, y, w, h) to (startX,
		# startY, endX, endY)
		proposedRects.append((x, y, x + w, y + h))

Here the totalPositive and totalNegative variables hold the running counts of raccoon and no raccoon images saved so far. We loop over the dataset and parse the XML annotation file that corresponds to each image. The BeautifulSoup library parses the XML file, and we store the image width and height in the w and h variables. The gtBoxes list holds the ground-truth bounding boxes.

In short, here's what we have done:

  • We loop over the <object> elements in the XML file to extract the label and bounding box coordinates.
  • We truncate any bounding box coordinates that fall outside the image's spatial dimensions, then add the box to our list of ground-truth bounding boxes.
  • Next we load each image, run selective search, and append the resulting region proposals to the proposedRects list.
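
To make the parsing and truncation steps concrete, here is a minimal sketch run on a made-up annotation string. It assumes a Pascal VOC style layout, consistent with the tags the script reads (width, height, and one object with xmin/ymin/xmax/ymax per raccoon). It uses Python's built-in `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra packages. The helper name `parse_annotation` and the sample values are illustrative and not part of the original script.

```python
import xml.etree.ElementTree as ET

# made-up annotation in a Pascal VOC style layout (an assumption)
SAMPLE_XML = """
<annotation>
  <size><width>650</width><height>417</height></size>
  <object>
    <name>raccoon</name>
    <bndbox>
      <xmin>-4</xmin><ymin>81</ymin><xmax>660</xmax><ymax>400</ymax>
    </bndbox>
  </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Return (width, height, boxes) with boxes clamped to the image."""
    root = ET.fromstring(xml_text.strip())
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)

    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # truncate coordinates that fall outside the image boundaries
        x_min = max(0, int(bb.find("xmin").text))
        y_min = max(0, int(bb.find("ymin").text))
        x_max = min(w, int(bb.find("xmax").text))
        y_max = min(h, int(bb.find("ymax").text))
        boxes.append((x_min, y_min, x_max, y_max))
    return w, h, boxes

w, h, gt_boxes = parse_annotation(SAMPLE_XML)
print(w, h, gt_boxes)  # 650 417 [(0, 81, 650, 400)]
```

The sample box deliberately starts at x = -4 and ends at x = 660, so you can see both edges being pulled back inside the 650-pixel-wide image.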

At this point we have the ground-truth bounding boxes and the region proposals generated by selective search. Now it's time to use the intersection over union helper function to find the proposals that sufficiently overlap the ground-truth boxes. Proposals with an IoU above the 70% threshold go to "raccoon". Proposals with an IoU below 5% that also do not fall entirely inside a ground-truth box go to "no raccoon". Everything in between is skipped. We loop over at most MAX_PROPOSALS region proposals per image, where MAX_PROPOSALS is the limit defined in the configuration file.

	# initialize the per-image counters of positive and negative
	# ROIs collected so far
	positiveROIs = 0
	negativeROIs = 0

	# loop over the maximum number of region proposals
	for proposedRect in proposedRects[:config.MAX_PROPOSALS]:
		# unpack the proposed rectangle bounding box
		(propStartX, propStartY, propEndX, propEndY) = proposedRect

		# loop over the ground-truth bounding boxes
		for gtBox in gtBoxes:
			# compute the intersection over union between the two
			# boxes and unpack the ground-truth bounding box
			iou = compute_iou(gtBox, proposedRect)
			(gtStartX, gtStartY, gtEndX, gtEndY) = gtBox

			# initialize the ROI and output path
			roi = None
			outputPath = None

			# check to see if the IOU is greater than 70% *and* that
			# we have not hit our positive count limit
			if iou > 0.7 and positiveROIs < config.MAX_POSITIVE:
				# extract the ROI and then derive the output path to
				# the positive instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalPositive)
				outputPath = os.path.sep.join([config.POSITVE_PATH,
					filename])

				# increment the positive counters
				positiveROIs += 1
				totalPositive += 1

			# determine if the proposed bounding box falls *within*
			# the ground-truth bounding box
			fullOverlap = propStartX >= gtStartX
			fullOverlap = fullOverlap and propStartY >= gtStartY
			fullOverlap = fullOverlap and propEndX <= gtEndX
			fullOverlap = fullOverlap and propEndY <= gtEndY

			# check to see if there is not full overlap *and* the IoU
			# is less than 5% *and* we have not hit our negative
			# count limit
			if not fullOverlap and iou < 0.05 and \
				negativeROIs < config.MAX_NEGATIVE:
				# extract the ROI and then derive the output path to
				# the negative instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalNegative)
				outputPath = os.path.sep.join([config.NEGATIVE_PATH,
					filename])

				# increment the negative counters
				negativeROIs += 1
				totalNegative += 1

Did you notice the fullOverlap condition? It is true when the region proposal's bounding box falls entirely within the ground-truth bounding box. A small crop inside a raccoon's face has a low IoU but still shows part of a raccoon, so we don't want it as a negative example. A proposal is therefore treated as "no raccoon" only if it is not a full overlap and its IoU is below the 5% threshold.
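
The labeling rule can also be read as a small pure function. The sketch below restates the conditions in the loop above and is not the original script. The helper name `label_proposal`, the example boxes, and the rounded IoU values passed in are hypothetical. The 0.7 and 0.05 thresholds are the ones used in the code.

```python
def label_proposal(iou, proposal, gt_box, pos_thresh=0.7, neg_thresh=0.05):
    """Return "raccoon", "no_raccoon", or None if the proposal is skipped."""
    (pX1, pY1, pX2, pY2) = proposal
    (gX1, gY1, gX2, gY2) = gt_box

    # strong overlap with the ground truth -> positive example
    if iou > pos_thresh:
        return "raccoon"

    # True when the proposal lies entirely inside the ground-truth box
    full_overlap = (pX1 >= gX1 and pY1 >= gY1 and
                    pX2 <= gX2 and pY2 <= gY2)

    # almost no overlap and not a crop of the raccoon -> negative example
    if not full_overlap and iou < neg_thresh:
        return "no_raccoon"

    # ambiguous middle band, or a small crop inside the raccoon
    return None

gt = (100, 100, 300, 300)  # hypothetical ground-truth box

print(label_proposal(0.86, (110, 105, 310, 295), gt))  # raccoon
print(label_proposal(0.00, (400, 400, 450, 450), gt))  # no_raccoon
print(label_proposal(0.01, (150, 150, 170, 170), gt))  # None
print(label_proposal(0.39, (50, 50, 250, 250), gt))    # None
```

The third call is the case the fullOverlap check exists for: the proposal has an IoU near zero, but it sits inside the ground-truth box, so it is dropped instead of being labeled as a negative.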
