# Assignment: K Nearest Neighbor classification

Download this crate for the seventh homework assignment.

## Introduction

In this assignment, you will implement a K Nearest Neighbor (KNN) classifier. The main principle behind the KNN classifier is that an unseen data point should get the class of its nearest neighbor(s) as measured using some distance metric.

### One Nearest Neighbor classification

Training/classification in a 1 Nearest Neighbor classifier is conducted as follows:

• Training: memorize all training instances $\mathbf{X}$ and the corresponding labels $\mathbf{Y}$.
• Classification: to classify a new data point $\mathbf{x}'$:
1. Find $\mathop{\mathrm{argmin}}_{\mathbf{x}^i \in \mathbf{X}} d(\mathbf{x}^i,\mathbf{x}')$ — that is the training instance $\mathbf{x}^i$ with the smallest distance to $\mathbf{x}'$.
2. Return $y_i$ as the class of $\mathbf{x}'$

In this assignment, use Euclidean distance as the distance metric:

### $k>1$ Nearest Neighbor classification

K Nearest Neighbor (KNN) classification is a generalization of 1 Nearest Neighbor Classification. In KNN classification, we consider the $k$ nearest neighbors of the data point $\mathbf{x}'$ to be classified. $\mathbf{x}'$ is then assigned the most frequent class among the nearest neighbors.1

## The assignment

Implement a KNN classifier. The classifier should be a command-line program that:

• Takes a training set and a test set as arguments.
• Takes an option k for specifying the value of $k$.
• Trains on the training set by storing the training instances $\mathbf{X}$ in an ndarray Array2. It’s fine if you store the labels in a Vec.
• Classifies the data points of the test set using KNN classification.
• Report accuracy of the classifier on the test set.

For example:

\$ target/release/knn -k 3 moons-train.txt moons-test.txt
Accuracy: 96.6


Of course, it is expected that the problem is split in logical data structures, methods, and/or classes.

## What is provided?

### Data

Data for two different classification problems are provided. The first clasification problem is the moons problem, which has two entwined half-moons, where each half-moon constitutes a class. • Training data: moons-train.txt.
• Test data: moons-test.txt.

The second classification problem is the circles problem, which has a circle of one class enclosed by a circle of another class, where each circle constitutes a class. ### Code

The crate that you can download contains some code besides the aforementioned data, namely:

• A main function that does the argument handling and opens the relevant files.
• The InstanceIter type for iterating over training/test instances. This is demonstrated in the current main function - it simply iterates over the training and test data and prints every instance.
• A method to compute the Euclidean distances between the rows of a matrix and a vector, taken from the previous assignment.

## Submission

• Submit your updated project, archived as a tar.gz or zip file.
• Include the first names of the team members in the archive name.
• Add the header from the policy page to every source file that you modify or create.
• Run cargo clean before archiving the project.
• The assignment is a bit of a step up compared to previous assignments. Feel free to e-mail me questions, but do so timely and include your code where necessary.

The tar.gz or zip file should be uploaded through this page. The deadline is July 5 at 14:00.

1. If there is a tie between multiple classes, it is fine to just select one of the classes for this assignment.