Assignment: K Nearest Neighbor classification

Download this crate for the seventh homework assignment.

Introduction

In this assignment, you will implement a K Nearest Neighbor (KNN) classifier. The main principle behind the KNN classifier is that an unseen data point should get the class of its nearest neighbor(s) as measured using some distance metric.

One Nearest Neighbor classification

Training/classification in a 1 Nearest Neighbor classifier is conducted as follows:

In this assignment, use Euclidean distance as the distance metric:

Nearest Neighbor classification

K Nearest Neighbor (KNN) classification is a generalization of 1 Nearest Neighbor Classification. In KNN classification, we consider the nearest neighbors of the data point to be classified. is then assigned the most frequent class among the nearest neighbors.1

The assignment

Implement a KNN classifier. The classifier should be a command-line program that:

For example:

$ target/release/knn -k 3 moons-train.txt moons-test.txt
Accuracy: 96.6

Of course, it is expected that the problem is split in logical data structures, methods, and/or classes.

What is provided?

Data

Data for two different classification problems are provided. The first clasification problem is the moons problem, which has two entwined half-moons, where each half-moon constitutes a class.

Moons data set

The second classification problem is the circles problem, which has a circle of one class enclosed by a circle of another class, where each circle constitutes a class.

Circles data set

Code

The crate that you can download contains some code besides the aforementioned data, namely:

Submission

The tar.gz or zip file should be uploaded through this page. The deadline is July 5 at 14:00.

  1. If there is a tie between multiple classes, it is fine to just select one of the classes for this assignment.