Region-Based Convolutional Neural Networks: How to build your own object detector with transfer learning.
In this article we will cover an R-CNN object detector implementation and how transfer learning can help classify objects.
We are going to build a raccoon detector using R-CNN by classifying raccoon vs. no-raccoon regions from the raccoon dataset by Dat Tran, using transfer learning with the fine-tuning approach (the other approach is feature extraction). The reason I've chosen the raccoon dataset is that it ships with a nice annotation file for every raccoon face in the images, and plenty of object detection research has already been done on this dataset.
Before Starting:
- This article is going to be a bit long, so I highly recommend you set yourself free to read it and actually implement it.
- In this tutorial we will be using the selective search algorithm to find ROIs (Regions of Interest) and their boundaries, and we will also be using transfer learning. To read up on these topics I recommend the object detection blog series and tutorials from PyImageSearch by Adrian Rosebrock, as I'm a big fan of his.
Workflow of this project:
Let's set a vision first and lay out the working paradigm.
What we actually want from this project is this: run a Python script with an input image of a raccoon and eventually get the output image with a box drawn on the raccoon's face.
The first step will be data preprocessing, i.e. making the dataset straightforward to consume. Because we will be fine-tuning a MobileNet V2 CNN pre-trained on the 1000-class ImageNet dataset, we want our dataset to fit this model for transfer learning. We want our dataset to have two classes, positive (images that contain a raccoon) and negative (images that don't contain a raccoon), so that the transfer-learned model can classify raccoon vs. no raccoon for object detection.
The second step is to accept an input image and run the selective search algorithm on it to get the region proposals.
The third step is to make predictions on each proposal from selective search using the classification model; the return value will be the detected objects (a rough sketch of this step follows below).
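The final detection script is beyond the scope of this section, but to make the third step concrete, here is a rough, hypothetical sketch of what it could look like once the model has been fine-tuned. The config values (MODEL_PATH, ENCODER_PATH, MIN_PROBA, MAX_PROPOSALS_INFER) come from the configuration file we define below; "raccoon_input.jpg" and the saved model/encoder are placeholders for artifacts a training step (not shown here) would produce.

# detect_sketch.py -- a rough, hypothetical outline of step 3; this is
# NOT the final detection script, just a sketch of the flow
import pickle

import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.models import load_model

from configuration import config

# load the fine-tuned classifier and the label encoder (both produced
# by a training step not covered in this section)
model = load_model(config.MODEL_PATH)
lb = pickle.loads(open(config.ENCODER_PATH, "rb").read())

# "raccoon_input.jpg" is a placeholder input image
image = cv2.imread("raccoon_input.jpg")

# run selective search and keep only the top proposals for inference
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
rects = ss.process()[:config.MAX_PROPOSALS_INFER]

# classify every proposal and draw the confident raccoon detections
for (x, y, w, h) in rects:
	roi = cv2.resize(image[y:y + h, x:x + w], config.INPUT_DIMS)
	roi = preprocess_input(np.expand_dims(roi.astype("float32"), axis=0))
	proba = model.predict(roi)[0]
	label = lb.classes_[np.argmax(proba)]
	if label == "raccoon" and proba.max() >= config.MIN_PROBA:
		cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("output.jpg", image)

A production detector would also apply non-maxima suppression to collapse overlapping boxes, but this sketch conveys the overall flow.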
1st task:
From our project's workflow, we first have to build a clean dataset so that it fits the classification model exactly.
Since we will be using the MobileNet V2 CNN, our classifier needs a 'raccoon' directory, where all the images of the objects of interest (the raccoon faces) are located, and a 'no_raccoon' directory, where all the regions that are not raccoons are located.
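Given the paths in the configuration file below, the directory layout ends up roughly like this (raccoons/ is the original dataset; dataset/ is what our build script will create):

raccoons/
├── images/           # original raccoon photos
└── annotations/      # one XML annotation file per image
dataset/
├── raccoon/          # positive ROIs (raccoon faces)
└── no_raccoon/       # negative ROIs (background regions)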
Let's build a mental model for our 1st task:
- First, we loop over all the images in the raccoon dataset, accepting them one by one so that we can parse each one's annotation file and find the bounding box coordinates for any raccoon present in the image.
- Second, we run the selective search algorithm on the image to list out candidate bounding boxes, then use the annotation files to examine which region proposals from selective search sufficiently overlap the ground-truth bounding boxes and which don't. For this we will use the Intersection over Union (IoU) concept.
- And finally, we save the region proposals that sufficiently overlap with the ground-truth bounding boxes to the 'raccoon' folder, while those that don't overlap with any ground-truth bounding box go to the 'no_raccoon' folder.
Let's start with the configuration file, which holds the values of constants such as the input spatial dimensions, the minimum probability, base paths, dataset directory paths, etc.
Before we start coding: please refer to the comments in the code for better understanding.
# configuration/config.py
# import the necessary packages
import os

# define the base path to the *original* input dataset and then use
# the base path to derive the image and annotations directories
ORIG_BASE_PATH = "raccoons"
ORIG_IMAGES = os.path.sep.join([ORIG_BASE_PATH, "images"])
ORIG_ANNOTS = os.path.sep.join([ORIG_BASE_PATH, "annotations"])

# define the base path to the *new* dataset after running our dataset
# builder scripts and then use the base path to derive the paths to
# our output class label directories
BASE_PATH = "dataset"
POSITIVE_PATH = os.path.sep.join([BASE_PATH, "raccoon"])
NEGATIVE_PATH = os.path.sep.join([BASE_PATH, "no_raccoon"])

# define the number of max proposals used when running selective
# search for (1) gathering training data and (2) performing inference
MAX_PROPOSALS = 2000
MAX_PROPOSALS_INFER = 200

# define the maximum number of positive and negative images to be
# generated from each image
MAX_POSITIVE = 30
MAX_NEGATIVE = 10

# initialize the input dimensions to the network
INPUT_DIMS = (224, 224)

# define the path to the output model and label binarizer
MODEL_PATH = "raccoon_detector.h5"
ENCODER_PATH = "label_encoder.pickle"

# define the minimum probability required for a positive prediction
# (used to filter out false-positive predictions)
MIN_PROBA = 0.99
Now let's set up the algorithm that compares the regions proposed by selective search against a ground-truth bounding box, letting us separate the dataset into raccoon and no-raccoon. Let's get introduced to Intersection over Union (IoU). As the name suggests, it is simply the ratio of the area of overlap to the area of union between the predicted bounding box and the ground-truth bounding box: IoU = area of overlap / area of union. The area of union means the total area encompassed by both the predicted bounding box and the ground-truth bounding box.
# configuration/iou.py
def compute_iou(boxA, boxB):
	# determine the (x, y)-coordinates of the intersection rectangle
	xA = max(boxA[0], boxB[0])
	yA = max(boxA[1], boxB[1])
	xB = min(boxA[2], boxB[2])
	yB = min(boxA[3], boxB[3])

	# compute the area of the intersection rectangle
	interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)

	# compute the area of both the prediction and ground-truth
	# rectangles
	boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
	boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)

	# compute the intersection over union by taking the intersection
	# area and dividing it by the sum of prediction + ground-truth
	# areas - the intersection area
	iou = interArea / float(boxAArea + boxBArea - interArea)

	# return the intersection over union value
	return iou
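A quick sanity check with made-up boxes shows the behaviour we expect from compute_iou:

# hypothetical sanity check for compute_iou
boxA = (0, 0, 10, 10)    # ground-truth box
boxB = (5, 5, 15, 15)    # proposal shifted down and to the right
print(compute_iou(boxA, boxB))  # ~0.17 -- partial overlap
print(compute_iou(boxA, boxA))  # 1.0  -- identical boxes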
Now let's implement the actual dataset-build script, since we have all the helper functions set up. We will follow the workflow steps.
# build_dataset.py
# import the necessary packages
from configuration.iou import compute_iou
from configuration import config
from bs4 import BeautifulSoup
from imutils import paths
import cv2
import os

# loop over the output positive and negative directories
for dirPath in (config.POSITIVE_PATH, config.NEGATIVE_PATH):
	# if the output directory does not exist yet, create it
	if not os.path.exists(dirPath):
		os.makedirs(dirPath)

# grab all image paths in the input images directory
imagePaths = list(paths.list_images(config.ORIG_IMAGES))

# initialize the total number of positive and negative images we have
# saved to disk so far
totalPositive = 0
totalNegative = 0

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
	# show a progress report
	print("[INFO] processing image {}/{}...".format(i + 1,
		len(imagePaths)))

	# extract the filename from the file path and use it to derive
	# the path to the XML annotation file
	filename = imagePath.split(os.path.sep)[-1]
	filename = filename[:filename.rfind(".")]
	annotPath = os.path.sep.join([config.ORIG_ANNOTS,
		"{}.xml".format(filename)])

	# load the annotation file, build the soup, and initialize our
	# list of ground-truth bounding boxes
	contents = open(annotPath).read()
	soup = BeautifulSoup(contents, "html.parser")
	gtBoxes = []

	# extract the image dimensions
	w = int(soup.find("width").string)
	h = int(soup.find("height").string)

	# loop over all 'object' elements
	for o in soup.find_all("object"):
		# extract the label and bounding box coordinates
		label = o.find("name").string
		xMin = int(o.find("xmin").string)
		yMin = int(o.find("ymin").string)
		xMax = int(o.find("xmax").string)
		yMax = int(o.find("ymax").string)

		# truncate any bounding box coordinates that may fall
		# outside the boundaries of the image
		xMin = max(0, xMin)
		yMin = max(0, yMin)
		xMax = min(w, xMax)
		yMax = min(h, yMax)

		# update our list of ground-truth bounding boxes
		gtBoxes.append((xMin, yMin, xMax, yMax))

	# load the input image from disk
	image = cv2.imread(imagePath)

	# run selective search on the image and initialize our list of
	# proposed boxes
	ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
	ss.setBaseImage(image)
	ss.switchToSelectiveSearchFast()
	rects = ss.process()
	proposedRects = []

	# loop over the rectangles generated by selective search
	for (x, y, w, h) in rects:
		# convert our bounding boxes from (x, y, w, h) to (startX,
		# startY, endX, endY)
		proposedRects.append((x, y, x + w, y + h))
Here the totalPositive and totalNegative variables hold the running count of raccoon and no-raccoon images written to the dataset. We loop over the dataset and parse the XML annotation file corresponding to each image. The BeautifulSoup library is used to parse the XML file; we extract the width and height of the image into the w and h variables, and the gtBoxes list holds the ground-truth bounding boxes.
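To make the parsing concrete, here is a minimal, hypothetical Pascal VOC-style annotation and how BeautifulSoup pulls the values out of it (the coordinates are made up for illustration):

from bs4 import BeautifulSoup

# a minimal, hypothetical VOC-style annotation string
xml = """<annotation>
	<size><width>640</width><height>480</height></size>
	<object>
		<name>raccoon</name>
		<bndbox>
			<xmin>81</xmin><ymin>88</ymin>
			<xmax>522</xmax><ymax>408</ymax>
		</bndbox>
	</object>
</annotation>"""

soup = BeautifulSoup(xml, "html.parser")
print(int(soup.find("width").string))   # 640
for o in soup.find_all("object"):
	print(o.find("name").string)        # raccoon
	print(int(o.find("xmin").string))   # 81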
In gist, here's what we have done:
- We have <object> elements in the XML file, so we loop over those elements to extract the label and bounding box coordinates.
- We ensure the bounding box coordinates do not fall outside the bounds of the image's spatial dimensions by truncating them, then update our list of ground-truth bounding boxes.
- Next we load each image, perform selective search, and append the resulting proposed regions to the proposedRects list.
By this step we have the ground-truth bounding boxes and the region proposals generated by selective search. Now it's time to use the Intersection over Union helper function to determine which regions sufficiently overlap the ground-truth boxes. Region proposals whose IoU with a ground-truth annotation exceeds the 70% threshold are saved to the 'raccoon' folder, while proposals whose IoU falls below a 5% threshold (and which don't sit fully inside a ground-truth box, as we will see) are saved to the 'no_raccoon' folder. We will loop over the region proposals up to MAX_PROPOSALS, the maximum defined proposal count.
	# initialize counters for the number of positive and negative
	# ROIs saved from the *current* image
	positiveROIs = 0
	negativeROIs = 0

	# loop over the maximum number of region proposals
	for proposedRect in proposedRects[:config.MAX_PROPOSALS]:
		# unpack the proposed rectangle bounding box
		(propStartX, propStartY, propEndX, propEndY) = proposedRect

		# loop over the ground-truth bounding boxes
		for gtBox in gtBoxes:
			# compute the intersection over union between the two
			# boxes and unpack the ground-truth bounding box
			iou = compute_iou(gtBox, proposedRect)
			(gtStartX, gtStartY, gtEndX, gtEndY) = gtBox

			# initialize the ROI and output path
			roi = None
			outputPath = None

			# check to see if the IOU is greater than 70% *and* that
			# we have not hit our positive count limit
			if iou > 0.7 and positiveROIs <= config.MAX_POSITIVE:
				# extract the ROI and then derive the output path to
				# the positive instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalPositive)
				outputPath = os.path.sep.join([config.POSITIVE_PATH,
					filename])

				# increment the positive counters
				positiveROIs += 1
				totalPositive += 1

			# determine if the proposed bounding box falls *within*
			# the ground-truth bounding box
			fullOverlap = propStartX >= gtStartX
			fullOverlap = fullOverlap and propStartY >= gtStartY
			fullOverlap = fullOverlap and propEndX <= gtEndX
			fullOverlap = fullOverlap and propEndY <= gtEndY

			# check to see if there is not full overlap *and* the IoU
			# is less than 5% *and* we have not hit our negative
			# count limit
			if not fullOverlap and iou < 0.05 and \
				negativeROIs <= config.MAX_NEGATIVE:
				# extract the ROI and then derive the output path to
				# the negative instance
				roi = image[propStartY:propEndY, propStartX:propEndX]
				filename = "{}.png".format(totalNegative)
				outputPath = os.path.sep.join([config.NEGATIVE_PATH,
					filename])

				# increment the negative counters
				negativeROIs += 1
				totalNegative += 1

			# if either branch produced a valid ROI, resize it to the
			# CNN's input dimensions and write it to disk
			if roi is not None and outputPath is not None:
				roi = cv2.resize(roi, config.INPUT_DIMS,
					interpolation=cv2.INTER_CUBIC)
				cv2.imwrite(outputPath, roi)
Noticed that fullOverlap condition? Well, that's there because a region proposal that falls entirely within a ground-truth bounding box contains nothing but raccoon pixels, even though its IoU with that box can be very small. So a proposal is only saved as a negative when it is not a full overlap and its IoU is below the 5% threshold. Finally, any valid ROI from either branch is resized to the network's input dimensions and written to disk. The small example below shows why this check matters.
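Using compute_iou from earlier with made-up boxes: a proposal sitting entirely inside a ground-truth box scores a tiny IoU even though it is pure raccoon, so without the fullOverlap check it would be mislabelled as a negative.

# hypothetical boxes: the proposal sits entirely inside the ground truth
gtBox = (0, 0, 100, 100)      # ground-truth raccoon face
propBox = (40, 40, 60, 60)    # small proposal fully inside it
print(compute_iou(gtBox, propBox))  # ~0.04 -- below the 5% threshold,
                                    # yet the region is all raccoon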