Object Detection
- Shin Yoonah, Yoonah
- August 17, 2022
- 2 min read
Last updated: August 25, 2022

Outline
- Sliding Windows
- Bounding Box
- Bounding Box Pipeline
- Score
Image classification predicts the class of an object in an image
Classification and object localization
--> Locate the presence of an object and indicate its location with a bounding box, along with its class
Sliding Window
= algorithm

- If we want to detect a dog, we consider a fixed window size
- If chosen properly, the dog will occupy most of the window
Essentially a sub image that we would like to classify as a dog
The other sub images - classified as background
(Image that does not contain the dog)
*Process
Start in one region in the image, classify that sub-image
Then shift the window and classify the next sub-image
Repeat the process -- when the object occupies most of the window, it will be classified as a dog
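The process above can be sketched in a few lines of Python. This is only a minimal illustration: the image is a small grid of 0s and 1s, and `toy_classifier` is a hypothetical stand-in for a trained classifier (it calls a window "dog" when most of its pixels are 1). The names `sliding_windows`, `window`, and `stride` are assumptions made for this example.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # a grayscale image stored as rows of pixel values


def sliding_windows(
    image: Image, window: Tuple[int, int], stride: int
) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, sub_image) for every position of a fixed-size window."""
    height, width = len(image), len(image[0])
    win_h, win_w = window
    for top in range(0, height - win_h + 1, stride):
        for left in range(0, width - win_w + 1, stride):
            sub_image = [row[left:left + win_w] for row in image[top:top + win_h]]
            yield top, left, sub_image


def toy_classifier(sub_image: Image) -> str:
    """Hypothetical classifier: 'dog' if more than half of the pixels are 1."""
    pixels = [p for row in sub_image for p in row]
    return "dog" if sum(pixels) > len(pixels) / 2 else "background"


# 4x6 toy image: the 1s mark where the "dog" is
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Classify every sub-image and keep the window positions labelled "dog"
detections = [
    (top, left)
    for top, left, sub in sliding_windows(image, window=(2, 2), stride=1)
    if toy_classifier(sub) == "dog"
]
print(detections)  # [(1, 3)]
```

Only the window whose top-left corner is at row 1, column 3 is mostly filled by the object; every other window is classified as background.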
Problems of Sliding Windows
Overlapping Boxes: object detectors often output many overlapping detections
Object Sizes: the same object can come in different sizes / Solution: resizing the image
Overlapping Objects: objects that overlap each other may pose issues for sliding windows
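The solution noted above for object sizes, changing the size of the image, can be illustrated with a small sketch. It assumes a nearest-neighbour resize written by hand (`resize_nearest` is a hypothetical helper, not a library function): a "dog" that is too big for a 2x2 window in the original image fits the window exactly once the image is shrunk.

```python
from typing import List

Image = List[List[int]]  # a grayscale image stored as rows of pixel values


def resize_nearest(image: Image, new_h: int, new_w: int) -> Image:
    """Resize by picking, for each output pixel, the nearest source pixel."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# 8x8 toy image where the object covers a 4x4 block (rows 2-5, columns 2-5)
big_dog = [
    [1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
    for r in range(8)
]

# Halve the image: the 4x4 object becomes 2x2 and now fills a 2x2 window
small_dog = resize_nearest(big_dog, 4, 4)
for row in small_dog:
    print(row)
```

Running the same fixed-size window over several resized copies of the image is one way to handle objects of different sizes.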
Bounding Box
Bounding box = a rectangular box that can be determined by two corners of the rectangle, the upper-left and the lower-right, each given by its y and x coordinates (y=0 and x=0 at the upper-left of the image)
These y and x are not the same as the classification label y and the image x

Upper-left corner = (Ymin, Xmin)
Lower-right corner = (Ymax, Xmax)
They are just to illustrate the coordinates of the Bounding Box
--> The goal of object detection is to predict these points, so we add a "hat" to indicate that a value is a prediction
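As a rough sketch, a bounding box can be stored as its two corners, and a predicted ("hat") box can be compared with the true box. The comparison used here, intersection over union (IoU), is a common measure that is not part of the notes above; it is included only as an illustration, and the class name `BoundingBox` and the example coordinates are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A box given by its upper-left (ymin, xmin) and lower-right (ymax, xmax) corners."""
    ymin: float
    xmin: float
    ymax: float
    xmax: float

    @property
    def area(self) -> float:
        # An "inverted" box (max < min) is treated as empty
        return max(0.0, self.ymax - self.ymin) * max(0.0, self.xmax - self.xmin)


def iou(a: BoundingBox, b: BoundingBox) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    intersection = BoundingBox(
        max(a.ymin, b.ymin), max(a.xmin, b.xmin),
        min(a.ymax, b.ymax), min(a.xmax, b.xmax),
    ).area
    union = a.area + b.area - intersection
    return intersection / union if union > 0 else 0.0


true_box = BoundingBox(ymin=10, xmin=10, ymax=50, xmax=50)       # made-up label
predicted_box = BoundingBox(ymin=20, xmin=20, ymax=60, xmax=60)  # made-up prediction
print(round(iou(true_box, predicted_box), 3))  # 0.391
```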
Bounding Box Pipeline
Like classification, we have the class y and the image x
- we have a dataset of classes and their bounding boxes
Similar to classification, we use the dataset to train the model; we include the box coordinates
--> result: an object detector with updated learnable parameters
To use the detector, input an image with the objects we would like to detect
The output is the predicted class and the box coordinates
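The shape of this pipeline (dataset in, trained detector out, then class and box for a new image) can be sketched as follows. This is not a real detector: `MeanBoxBaseline` is a hypothetical baseline that ignores the pixels and simply remembers the most common class and the average box from the training data. The `Sample` structure and all values are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax)
Image = List[List[int]]


@dataclass
class Sample:
    image: Image  # x: the input image
    label: str    # y: the class
    box: Box      # the bounding box coordinates


class MeanBoxBaseline:
    """Hypothetical baseline, NOT a real detector: it never looks at the pixels."""

    def fit(self, samples: List[Sample]) -> "MeanBoxBaseline":
        # "Training": store the most common class and the mean box
        self.label_ = Counter(s.label for s in samples).most_common(1)[0][0]
        self.box_ = tuple(mean(s.box[i] for s in samples) for i in range(4))
        return self

    def predict(self, image: Image) -> Tuple[str, Box]:
        # Output: predicted class and predicted box coordinates
        return self.label_, self.box_


blank = [[0] * 4 for _ in range(4)]  # placeholder image
dataset = [
    Sample(blank, "dog", (0.0, 0.0, 2.0, 2.0)),
    Sample(blank, "dog", (2.0, 2.0, 4.0, 4.0)),
]

detector = MeanBoxBaseline().fit(dataset)
print(detector.predict(blank))  # ('dog', (1.0, 1.0, 3.0, 3.0))
```

A real object detector replaces `fit` with training of learnable parameters on the images themselves; the input and output shapes stay the same.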
Score
- Many object detection algorithms provide a score letting you know how confident the model prediction is
Each column in the table has an image and its prediction

The first row: the score ranging from 0 to 1
The second row: the class
The third row: the image and its bounding box
For the first column, we see the prediction is dog
--> but the image does not look like a dog
Even so, the score is 0.99
= the model is confident about its prediction
A score is provided for each detection; we can set a threshold so that we only accept detections above a specific score
*Usually models will only output objects over a specific threshold*
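Filtering by score can be sketched as below. The detections are made-up values for illustration (only the 0.99 score echoes the example above), and `filter_by_score` and the dictionary layout are assumptions, not the output format of any particular library.

```python
from typing import Dict, List

# Made-up detections: each has a class, a score in [0, 1], and a box
# given as (ymin, xmin, ymax, xmax)
detections: List[Dict] = [
    {"label": "dog", "score": 0.99, "box": (12, 30, 140, 200)},
    {"label": "dog", "score": 0.40, "box": (15, 35, 150, 210)},
    {"label": "cat", "score": 0.75, "box": (60, 220, 120, 300)},
]


def filter_by_score(detections: List[Dict], threshold: float) -> List[Dict]:
    """Keep only the detections whose score is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]


accepted = filter_by_score(detections, threshold=0.7)
print([(d["label"], d["score"]) for d in accepted])
# [('dog', 0.99), ('cat', 0.75)]
```

Raising the threshold gives fewer but more confident detections; lowering it accepts more detections, including less certain ones.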
Copyright Coursera All rights reserved