Time Complexity with different distance metrics in kNN

While running k-NN on a large dataset of around 168,000 records using the built-in implementation available in Python for predicting continuous values, Euclidean distance takes about 2 minutes, whereas using wminkowski as the distance metric for computing distances between instances, the same run takes around 52 minutes. Of the 168,000 records, 70% were used for training and 30% for testing.

Can anyone please explain the reason behind such a vast time difference? Is there any way to reduce the running time when using wminkowski?
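
For context, the setup looks roughly like this (a minimal sketch, assuming scikit-learn's KNeighborsRegressor; the data below is a random placeholder for the real ~168,000-record dataset, and note that the 'wminkowski' metric name was removed in scikit-learn 1.2 in favour of metric='minkowski' with a w parameter):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Random placeholder standing in for the real ~168,000-record dataset
X = np.random.rand(10_000, 10)
y = np.random.rand(10_000)

# 70/30 train/test split, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fast case: plain Euclidean distance
fast = KNeighborsRegressor(n_neighbors=5, metric='euclidean')
fast.fit(X_train, y_train)
fast.predict(X_test)

# Slow case: weighted Minkowski (scikit-learn < 1.2 syntax)
w = np.ones(X.shape[1])  # uniform weights, just for illustration
slow = KNeighborsRegressor(n_neighbors=5, metric='wminkowski',
                           p=2, metric_params={'w': w})
slow.fit(X_train, y_train)
slow.predict(X_test)
```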

Hi @ankita23,

First of all, I would not suggest using kNN on such a large dataset. kNN is a lazy learner: training essentially just stores (or indexes) the data, and all the real processing happens at testing time. That is why kNN takes a lot of time for prediction.
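
You can see this lazy behaviour directly by timing the two phases (an illustrative sketch, again assuming scikit-learn):

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.random.rand(100_000, 10)
y = np.random.rand(100_000)

model = KNeighborsRegressor(n_neighbors=5)

t0 = time.perf_counter()
model.fit(X, y)           # only stores the data / builds an index
print(f"fit:     {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
model.predict(X[:5_000])  # all the distance computation happens here
print(f"predict: {time.perf_counter() - t0:.2f}s")
```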

Still, if you are doing this for experimental purposes, you need to understand the computation required while calculating the distances. wminkowski is the weighted Minkowski distance: each coordinate difference is multiplied by a weight before being raised to the power p, so if your p value is large, the computation will obviously take more time.
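
For reference, the weighted Minkowski distance is d(x, y) = (sum_i |w_i * (x_i - y_i)|^p)^(1/p); here is a minimal NumPy sketch of it:

```python
import numpy as np

def wminkowski(x, y, p, w):
    """Weighted Minkowski distance: (sum_i |w_i * (x_i - y_i)|^p)^(1/p)."""
    return np.sum(np.abs(w * (x - y)) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])
w = np.array([1.0, 0.5, 2.0])

print(wminkowski(x, y, p=2, w=w))  # weighted Euclidean (p=2) -> ~3.606
```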

One more factor: wminkowski can only be used with the ball_tree or brute-force approach; these are two of the approaches for implementing the kNN algorithm.

There is one more implementation called kd_tree, which I know is quite fast compared to brute force, and kd_tree supports the Euclidean distance but not wminkowski. That could be the reason the Euclidean run finished so fast while the other one took so long: with the default algorithm='auto', scikit-learn can pick kd_tree for Euclidean but has to fall back to ball_tree or brute force for wminkowski.
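
You can check this yourself (a sketch assuming scikit-learn < 1.2, where the 'wminkowski' metric name is still available):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.random.rand(1_000, 5)
y = np.random.rand(1_000)
w = np.ones(5)  # uniform weights, just for illustration

# kd_tree + euclidean: supported, and typically the fastest combination
KNeighborsRegressor(algorithm='kd_tree', metric='euclidean').fit(X, y)

# kd_tree + wminkowski: not supported -- fit raises a ValueError
try:
    KNeighborsRegressor(algorithm='kd_tree', metric='wminkowski',
                        p=2, metric_params={'w': w}).fit(X, y)
except ValueError as e:
    print("kd_tree rejects wminkowski:", e)

# ball_tree + wminkowski: accepted, but much slower in practice
KNeighborsRegressor(algorithm='ball_tree', metric='wminkowski',
                    p=2, metric_params={'w': w}).fit(X, y)
```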

Generally, kNN prediction is very slow, which is why it is rarely preferred for large datasets.

Hope this gives you some idea about both distance metrics.

Thanks
- Mohit