
Image Classification with KNN

  • Shin Yoonah
  • July 26, 2022
  • 2 min read

Last updated: August 8, 2022


What is KNN?

  1. A classifier; KNN is short for K-nearest neighbors

  2. One of the simplest classification algorithms

  3. Uses the nearest samples and their most common class to find the closest match

Let's see how it works!


First, concatenate an image's three color channels into a single vector
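The flattening step can be sketched with numpy; the tiny 2x2 image and its values here are hypothetical stand-ins for a real photo:

```python
import numpy as np

# hypothetical 2x2 RGB image: height x width x 3 channels
image = np.arange(12).reshape(2, 2, 3)

# concatenate the three channel images into a single feature vector
vector = image.reshape(-1)
```

A real image just produces a longer vector (e.g. 32x32x3 becomes 3,072 entries).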



For illustration, we will use just two dimensions

Use a two-dimensional space to represent the images as vectors

Blue and red = the two classes


Calculate the Euclidean distance between two images, i.e. the length of the vector connecting the two points

Euclidean distance: d(X1, X2) = ||X1 - X2||
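A minimal sketch of this distance in numpy (the function name `euclidean` is mine):

```python
import numpy as np

def euclidean(x1, x2):
    # d(X1, X2) = ||X1 - X2||: the length of the vector connecting the two points
    return np.linalg.norm(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
```

For example, the points (0, 0) and (3, 4) are distance 5 apart.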


Insert an unknown sample to classify: is it a cat or a dog?

Then, calculate the distance from the unknown sample to each image


The results shown in a table:

Each row: a sample

Second column: the class

Final column: the distance from that sample to the unknown point


Use the distances to predict the label of the unknown sample

The predicted label is written as y hat

The hat indicates that it is an estimate


Calculate the distance from our unknown sample to each point, then find the nearest point, or nearest neighbor

*The model assigns that neighbor's label to the unknown sample*



How do we know if the model works?

  1. Separating data into training and testing sets is an important part of model evaluation

  2. Use test data to get an idea of how our model will perform in the real world


Training/Testing set


Split the data set into a training set (70%) and a testing set (30%)

- Build the model with the training set

- Use the testing set to assess the performance

- Once testing is complete, we should use all the data to train the final model
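A minimal sketch of the 70/30 split with numpy; the ten samples, the seed, and the variable names are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
X = np.arange(20).reshape(10, 2)        # ten hypothetical 2-D samples
y = np.arange(10)                       # their labels

# shuffle the indices, then cut at 70%: training set (70%), testing set (30%)
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```

Libraries like scikit-learn offer a ready-made `train_test_split` that does the same thing.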


Accuracy

What is the accuracy of a classifier?

The number of samples that have been predicted correctly divided by the total number of samples


First row: denotes different sample numbers

Second row: the actual class label

Third row: predicted value

Final row: 1 if the sample is predicted correctly, 0 otherwise


Count the number of times the prediction is correct, then take the average to get the accuracy
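The counting-and-averaging above reduces to one line of numpy (the function name `accuracy` is mine):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # 1 where the prediction is correct, 0 otherwise; the mean is the accuracy
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

For instance, 2 correct predictions out of 4 samples gives an accuracy of 0.5.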


K-Closest Samples

- KNN performs a majority vote within the subset of the K closest samples

- Find the class with the largest number of samples and assign that label to the unknown sample

Let's select K=3

- Select the K nearest samples

- The majority of them are red, so the new sample is assigned the red class
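The K=3 majority vote can be sketched like this; `knn_predict` and the red/blue toy points are my own illustrative choices:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # distances from the unknown sample to all training samples
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # indices of the K nearest samples
    nearest = np.argsort(dists)[:k]
    # majority vote: the most common class among the K neighbors wins
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

With k=1 this reduces to the plain nearest-neighbor rule from earlier.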


Then, how to select K?

Use a subset called the validation data to determine the best K

=> K is called a hyper-parameter


To select the hyper-parameter, we split our data set into three parts: the training set, validation set, and test set


Training set: train the model with different hyper-parameter values; then measure the accuracy of each K on the validation data

Select the hyper-parameter K that maximizes accuracy on the validation set

Test data: see how the model will perform in the real world
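The selection loop above can be sketched end to end; everything here (the helper `knn_predict`, the toy training and validation points, and the candidate K values) is hypothetical:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # majority vote among the K nearest training samples
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# hypothetical training and validation sets
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array(["red", "red", "red", "blue", "blue", "blue"])
X_val = np.array([[0.5, 0.5], [5.5, 5.5]])
y_val = np.array(["red", "blue"])

# try each candidate K and keep the one that maximizes validation accuracy
best_k, best_acc = None, -1.0
for k in (1, 3, 5):
    preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_val])
    acc = float(np.mean(preds == y_val))
    if acc > best_acc:
        best_k, best_acc = k, acc
```

The chosen `best_k` is then frozen, and the test set is touched only once, at the very end.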


To make things simpler here, we combine the validation set and test set

Calculate accuracy for K=1 and K=3

"K=3" has better accuracy


Once you choose the value of K, you can use KNN to classify an image

-> find the nearest neighbors and output the class as a string


*KNN can't deal with many of the challenges of Image classification*


Copyright Coursera All rights reserved
