
Image Classification with KNN

  • Shin Yoonah
  • July 26, 2022
  • 2 min read

Last updated: August 8, 2022


What is KNN?

  1. A classifier; KNN is short for K-nearest neighbors

  2. One of the simplest classification algorithms

  3. Uses the nearest samples and their most common class to find the closest match

Let's see how it works!


First, concatenate an image's three color channels into a single vector
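The flattening step can be sketched with numpy; the tiny 2x2 image and its values here are hypothetical stand-ins for a real photo:

```python
import numpy as np

# hypothetical 2x2 RGB image: height x width x 3 channels
image = np.arange(12).reshape(2, 2, 3)

# concatenate the three channel images into a single feature vector
vector = image.reshape(-1)
```

A real image just produces a longer vector (e.g. 32x32x3 becomes 3,072 entries).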



For illustration, we will use just two dimensions

Use a two-dimensional space to represent the images as vectors

Blue and red = the two classes


Calculate the Euclidean distance between two images, i.e. the length of the vector connecting the two points

Euclidean distance: d(X1, X2) = ||X1 - X2||
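A minimal sketch of this distance in numpy (the function name `euclidean` is mine):

```python
import numpy as np

def euclidean(x1, x2):
    # d(X1, X2) = ||X1 - X2||: the length of the vector connecting the two points
    return np.linalg.norm(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
```

For example, the points (0, 0) and (3, 4) are distance 5 apart.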


Insert an unknown sample to classify: is it a cat or a dog?

Then, calculate the distance from the unknown sample to each image


The results shown in a table:

Each row: a sample

Second column: the class

Final column: the distance from that sample to the unknown point


Use the distances to predict the label of the unknown sample

The predicted label is written as y hat

The hat indicates that it is an estimate


Calculate the distance from our unknown sample to each point, then find the nearest point, or nearest neighbor

*The model assigns that neighbor's label to the unknown sample*



How do we know if the model works?

  1. Separating data into training and testing sets is an important part of model evaluation

  2. Use test data to get an idea of how our model will perform in the real world


Training/Testing set


Split the data set into a training set (70%) and a testing set (30%)

- Build the model with the training set

- Use the testing set to assess the performance

- Once testing is complete, we should use all the data to train the final model
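A minimal sketch of the 70/30 split with numpy; the ten samples, the seed, and the variable names are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
X = np.arange(20).reshape(10, 2)        # ten hypothetical 2-D samples
y = np.arange(10)                       # their labels

# shuffle the indices, then cut at 70%: training set (70%), testing set (30%)
idx = rng.permutation(len(X))
split = int(0.7 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```

Libraries like scikit-learn offer a ready-made `train_test_split` that does the same thing.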


Accuracy

What is the accuracy of a classifier?

The number of samples that have been predicted correctly divided by the total number of samples


First row: denotes different sample numbers

Second row: the actual class label

Third row: predicted value

Final row: 1 if the sample is predicted correctly, 0 otherwise


Count the number of times the prediction is correct, then take the average to get the accuracy
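The counting-and-averaging above reduces to one line of numpy (the function name `accuracy` is mine):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # 1 where the prediction is correct, 0 otherwise; the mean is the accuracy
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

For instance, 2 correct predictions out of 4 samples gives an accuracy of 0.5.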


K-Closest Samples

- KNN performs a majority vote within the subset of the K closest samples

- Find the class with the largest number of samples and assign that label to the unknown sample

Let's select K=3

- Select the K nearest samples

- The majority of them are red, so the new sample is assigned the red class
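The K=3 majority vote can be sketched like this; `knn_predict` and the red/blue toy points are my own illustrative choices:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # distances from the unknown sample to all training samples
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    # indices of the K nearest samples
    nearest = np.argsort(dists)[:k]
    # majority vote: the most common class among the K neighbors wins
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

With k=1 this reduces to the plain nearest-neighbor rule from earlier.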


Then, how to select K?

Use a subset called the validation data to determine the best K

=> K is called a hyper-parameter


To select the hyper-parameter, we split our data set into three parts: the training set, validation set, and test set


Training set: train the model with different hyper-parameter values; then measure the accuracy of each K on the validation data

Select the hyper-parameter K that maximizes accuracy on the validation set

Test data: see how the model will perform in the real world
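The selection loop above can be sketched end to end; everything here (the helper `knn_predict`, the toy training and validation points, and the candidate K values) is hypothetical:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # majority vote among the K nearest training samples
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# hypothetical training and validation sets
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array(["red", "red", "red", "blue", "blue", "blue"])
X_val = np.array([[0.5, 0.5], [5.5, 5.5]])
y_val = np.array(["red", "blue"])

# try each candidate K and keep the one that maximizes validation accuracy
best_k, best_acc = None, -1.0
for k in (1, 3, 5):
    preds = np.array([knn_predict(X_train, y_train, x, k) for x in X_val])
    acc = float(np.mean(preds == y_val))
    if acc > best_acc:
        best_k, best_acc = k, acc
```

The chosen `best_k` is then frozen, and the test set is touched only once, at the very end.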


To make things simpler here, we combine the validation set and test set

Calculate accuracy for K=1 and K=3

"K=3" has better accuracy


Once you choose the value of K, you can use KNN to classify an image

-> find the nearest neighbors and output the class as a string


*KNN can't deal with many of the challenges of Image classification*


Copyright Coursera All rights reserved
