Image Classification with KNN
- Shin Yoonah
- July 26, 2022
- 2 min read
Last modified: August 8, 2022

What is KNN?
KNN is a classifier; the name is short for K-nearest neighbors
One of the simplest classification algorithms
It classifies an unknown sample by finding the nearest labeled samples and taking the most common class among them
Let's see how it works!

First, flatten the three color channels of an image and concatenate them into a single vector
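This flattening step can be sketched with numpy; the 32×32 image size here is just an illustrative assumption:

```python
import numpy as np

# A hypothetical 32x32 RGB image: three channels of pixel intensities.
image = np.random.rand(32, 32, 3)

# Flatten all three channels into one feature vector of length 32*32*3.
vector = image.reshape(-1)
print(vector.shape)  # (3072,)
```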

This time, we will use two dimensions: each image is represented as a vector, i.e. a point in a two-dimensional space

Blue and Red = classes

Calculate the Euclidean distance between two images, i.e. the length of the vector connecting the two points
Euclidean distance: d(x1, x2) = ||x1 - x2||
Insert an unknown sample to classify whether it's a cat or a dog
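In code, the Euclidean distance is just the norm of the difference vector; the two example points below are made up:

```python
import numpy as np

# Two hypothetical 2-D image vectors.
x1 = np.array([1.0, 2.0])
x2 = np.array([4.0, 6.0])

# Euclidean distance: the length of the vector connecting the two points.
d = np.linalg.norm(x1 - x2)
print(d)  # 5.0
```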

Then, calculate the distance between the unknown sample and each image
The images are arranged in a table

Each row: a sample
Second column: its class
Final column: its distance from the unknown point
Use these distances to predict the label of the unknown sample

This symbol is called y hat
The hat indicates that the value is an estimate of the class label
Calculate the distance from our unknown sample to every training sample, then find the nearest point, or nearest neighbor
*Model = assign the nearest neighbor's label to the unknown sample*
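The 1-nearest-neighbor step above can be sketched as follows; the training points and labels are invented for illustration:

```python
import numpy as np

# Hypothetical 2-D training samples with known classes.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array(["cat", "cat", "dog", "dog"])

# Unknown sample to classify.
x_unknown = np.array([1.2, 1.5])

# Distance from the unknown sample to every training sample.
distances = np.linalg.norm(X_train - x_unknown, axis=1)

# The nearest neighbor's label becomes the prediction (y hat).
y_hat = y_train[np.argmin(distances)]
print(y_hat)  # cat
```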
How do we know if the model works?
Separating data into training and testing sets is an important part of model evaluation
Use the test data to get an idea of how our model will perform in the real world
Training/Testing set
Split the data set into a training set (70%) and a testing set (30%)
- Build the model with the training set
- Use the testing set to assess the performance
- When we have completed testing our model, we can retrain it on all the data
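A minimal 70/30 split can be done by shuffling indices; the tiny dataset here is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 10 samples, 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Shuffle indices, then take the first 70% for training, the rest for testing.
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_test, y_test = X[idx[split:]], y[idx[split:]]
print(len(X_train), len(X_test))  # 7 3
```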
Accuracy
What is the accuracy of a classifier?
The number of samples that have been predicted correctly divided by the total number of samples

First row: the sample numbers
Second row: the actual class labels
Third row: the predicted values
Final row: 1 if the sample is predicted correctly, otherwise 0
Count the number of correct predictions, then take the average to get the accuracy
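This accuracy computation is a one-liner in numpy; the labels and predictions below are made up:

```python
import numpy as np

# Actual labels and hypothetical predictions for six samples.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# 1 where the prediction is correct, 0 otherwise; the mean is the accuracy.
correct = (y_true == y_pred).astype(int)
accuracy = correct.mean()
print(accuracy)  # 4 correct out of 6, i.e. about 0.667
```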
K-Closest Samples
- performs a majority vote within the subset of the K nearest samples
- find the class with the largest number of samples and assign that label to the unknown sample

Let's select K=3
- select the K nearest samples
- since most of the three nearest samples are red, the unknown sample is assigned to the red class
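The K=3 majority vote can be sketched like this, again with invented sample points:

```python
import numpy as np
from collections import Counter

# Hypothetical training samples from the red and blue classes.
X_train = np.array([[1.0, 1.0], [1.2, 1.1], [2.0, 2.0], [5.0, 5.0]])
y_train = np.array(["red", "red", "blue", "blue"])
x_unknown = np.array([1.5, 1.5])
k = 3

# Distances to every training sample, then indices of the K nearest.
distances = np.linalg.norm(X_train - x_unknown, axis=1)
nearest = np.argsort(distances)[:k]

# Majority vote among the K nearest labels.
label = Counter(y_train[nearest]).most_common(1)[0][0]
print(label)  # red
```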
Then, how do we select K?
Use a subset called the validation data to determine the best K
=> K is called a hyper-parameter
To select the hyper-parameter, we split our data set into three parts: the training set, validation set, and test set

Training set: train the model with different values of the hyper-parameter K; measure the accuracy of each K on the validation data
Select the hyper-parameter K that maximizes accuracy on the validation set
Test data: see how the model will perform in the real world
Sometimes the validation set and test set are combined to make things simpler
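The selection loop can be sketched as follows; the splits and candidate K values here are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training samples."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Hypothetical training and validation splits.
X_train = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array([0, 0, 1, 1])
X_val = np.array([[1.2, 1.1], [5.2, 5.1]])
y_val = np.array([0, 1])

# Pick the K that maximizes accuracy on the validation set.
best_k, best_acc = None, -1.0
for k in [1, 3]:
    preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_val])
    acc = (preds == y_val).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(best_k)
```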

Calculate the accuracy for K=1 and K=3
In this example, K=3 has the better accuracy
Once you choose the value of K, you can use KNN to classify an image
-> find the nearest neighbors and output the class as a string
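In practice this is usually done with a library; a minimal sketch with scikit-learn's `KNeighborsClassifier` (the data points are made up) might look like:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical flattened image vectors and their string class labels.
X_train = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.5, 4.8]])
y_train = np.array(["cat", "cat", "dog", "dog"])

# KNN with the chosen K=3; fit simply stores the training data.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify an unknown sample; the output is the class as a string.
print(knn.predict([[1.2, 1.1]])[0])  # cat
```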
*KNN can't deal with many of the challenges of image classification*
Copyright Coursera All rights reserved