🤖 AI Summary
fastknn’s knnExtract function implements KNN-based feature extraction: for each class, it computes the distances from every observation to its k nearest neighbors belonging to that class, then concatenates those distances into k * c new features (k = number of neighbors, c = number of classes). To avoid training-set leakage, the training-side features are generated via n‑fold cross‑validation, and extraction can be parallelized via the nthread parameter. The package also includes a visualization helper, knnDecision, for inspecting decision boundaries before and after the transformation.
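To make the mechanics concrete, here is a minimal sketch of the same idea in Python with scikit-learn. fastknn itself is an R package; the function name knn_extract and all details below (cumulative distances, fold count, seed) are illustrative assumptions, not fastknn's actual API:

```python
# Illustrative reimplementation of the knnExtract idea (not fastknn's API).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors

def knn_extract(X_train, y_train, X_test, k=3, n_folds=5, seed=42):
    """Return (k * n_classes) distance features for train and test sets.

    Test features use the full training set; training features are built
    out-of-fold so no observation is its own neighbor (no leakage).
    """
    classes = np.unique(y_train)

    def extract(X_ref, y_ref, X_query):
        feats = []
        for c in classes:
            # Distances from each query point to its k nearest
            # neighbors *belonging to class c*.
            nn = NearestNeighbors(n_neighbors=k).fit(X_ref[y_ref == c])
            dist, _ = nn.kneighbors(X_query)
            # Accumulating distances is one common variant; plain
            # distances also work -- an assumption, not fastknn's exact choice.
            feats.append(np.cumsum(dist, axis=1))
        return np.hstack(feats)  # shape: (n_query, k * n_classes)

    # Out-of-fold features for the training set.
    train_feats = np.zeros((len(X_train), k * len(classes)))
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for fit_idx, oof_idx in skf.split(X_train, y_train):
        train_feats[oof_idx] = extract(
            X_train[fit_idx], y_train[fit_idx], X_train[oof_idx]
        )

    # Test features use all training data as the reference set.
    test_feats = extract(X_train, y_train, X_test)
    return train_feats, test_feats
```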
This approach produces a nonlinear mapping of the original input space into a new feature space in which the classes become more linearly separable, letting simple linear learners (e.g., a GLM) achieve much higher accuracy. The authors demonstrate this on the Ionosphere data: a GLM on the original features scored 83.8% accuracy, while the same GLM on the KNN features reached 95.2%. Additional visual examples on the chess and spiral datasets show how the KNN features straighten complex decision boundaries. In practice, knnExtract is a lightweight, model-agnostic representation technique that is useful for multiclass problems or when a linear classifier is preferred, but users should weigh the increased dimensionality (k * c features) and the cost of the KNN computation on large datasets or with many classes.
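The before/after effect can be reproduced on any nonlinearly separable dataset. The sketch below reuses the hypothetical knn_extract helper above on scikit-learn's make_moons, which stands in for the post's Ionosphere/chess/spiral data; the exact accuracies will differ from the figures quoted above:

```python
# Illustrative comparison: a linear model on raw vs. KNN distance features.
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A linear classifier (a GLM) on the raw, nonlinearly separable inputs.
raw_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# The same linear classifier on the k * c KNN distance features.
F_tr, F_te = knn_extract(X_tr, y_tr, X_te, k=3)
knn_acc = LogisticRegression(max_iter=1000).fit(F_tr, y_tr).score(F_te, y_te)

print(f"GLM on raw features: {raw_acc:.3f}")
print(f"GLM on KNN features: {knn_acc:.3f}")
```

Because the two moons are not linearly separable in the raw space, the linear model on the KNN features should score noticeably higher, mirroring the gap the post reports on Ionosphere.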