Assignment: K Nearest Neighbor classification
Download this crate for the seventh homework assignment.
In this assignment, you will implement a K Nearest Neighbor (KNN) classifier. The main principle behind the KNN classifier is that an unseen data point should get the class of its nearest neighbor(s) as measured using some distance metric.
One Nearest Neighbor classification
Training/classification in a 1 Nearest Neighbor classifier is conducted as follows:
- Training: memorize all training instances and the corresponding labels .
- Classification: to classify a new data point :
- Find — that is the training instance with the smallest distance to .
- Return as the class of
In this assignment, use Euclidean distance as the distance metric:
Nearest Neighbor classification
K Nearest Neighbor (KNN) classification is a generalization of 1 Nearest Neighbor Classification. In KNN classification, we consider the nearest neighbors of the data point to be classified. is then assigned the most frequent class among the nearest neighbors.1
Implement a KNN classifier. The classifier should be a command-line program that:
- Takes a training set and a test set as arguments.
- Takes an option
kfor specifying the value of .
- Trains on the training set by storing the training instances in an ndarray
Array2. It’s fine if you store the labels in a
- Classifies the data points of the test set using KNN classification.
- Report accuracy of the classifier on the test set.
$ target/release/knn -k 3 moons-train.txt moons-test.txt Accuracy: 96.6
Of course, it is expected that the problem is split in logical data structures, methods, and/or classes.
What is provided?
Data for two different classification problems are provided. The first clasification problem is the moons problem, which has two entwined half-moons, where each half-moon constitutes a class.
- Training data:
- Test data:
The second classification problem is the circles problem, which has a circle of one class enclosed by a circle of another class, where each circle constitutes a class.
The crate that you can download contains some code besides the aforementioned data, namely:
- A main function that does the argument handling and opens the relevant files.
InstanceItertype for iterating over training/test instances. This is demonstrated in the current main function - it simply iterates over the training and test data and prints every instance.
- A method to compute the Euclidean distances between the rows of a matrix and a vector, taken from the previous assignment.
- Submit your updated project, archived as a
- Include the first names of the team members in the archive name.
- Add the header from the policy page to every source file that you modify or create.
cargo cleanbefore archiving the project.
- The assignment is a bit of a step up compared to previous assignments. Feel free to e-mail me questions, but do so timely and include your code where necessary.
zip file should be uploaded through this
page. The deadline is
July 5 at 14:00.
If there is a tie between multiple classes, it is fine to just select one of the classes for this assignment. ↩