I didn't understand how to calculate accuracy over the test data?

vineetchanana · September 16, 2020, 11:47am

I didn't understand how to calculate accuracy over the test data?

prashant_ml · September 16, 2020, 4:12pm

Hey @vineetchanana,
once you have the centroid of each cluster, then for each query point query_X, calculate its distance to every centroid and assign it to the cluster whose centroid is closest.
Do this for every query point, then submit the list of assignments and check your score on the website.

I hope this helps.
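The nearest-centroid assignment described above (which, as the next reply points out, belongs to K-Means rather than KNN) can be sketched in plain Python as follows. This assumes each point and centroid is a list of numbers and that Euclidean distance is used; `nearest_centroid`, `assign_clusters` and the toy data are hypothetical.

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def assign_clusters(query_X, centroids):
    """Cluster index for every query point."""
    return [nearest_centroid(p, centroids) for p in query_X]

# Toy example (hypothetical data)
centroids = [[0.0, 0.0], [10.0, 10.0]]
queries = [[1.0, 2.0], [9.0, 8.0], [6.0, 6.0]]
print(assign_clusters(queries, centroids))  # [0, 1, 1]
```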

vineetchanana · September 17, 2020, 11:42am

Sorry, but in the KNN lecture we didn't talk about finding clusters. I think you have confused this question with K-Means clustering.

prashant_ml · September 17, 2020, 3:20pm

Oh sorry, my bad.
For the validation part:
in KNN, once you have the predicted class for each query point in the validation set, compare those predictions with the actual labels and compute
acc = (number of records where the prediction matches the actual label) / (total number of records)
acc = acc * 100

This acc is your accuracy (as a percentage) on the validation data.
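A minimal sketch of that validation step in plain Python, assuming small lists of numeric feature vectors, Euclidean distance and a simple majority vote among the k nearest neighbours. `knn_predict`, `accuracy` and the toy data are hypothetical illustrations, not the course's code.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict one query's label by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest
    nearest = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """acc = matching records / total records, expressed as a percentage."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true) * 100

# Toy data (hypothetical): two well-separated classes
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [[0.5, 0.5], [5.5, 5.5], [1, 1], [4, 4]]
y_val = [0, 1, 0, 0]  # the last label disagrees with its neighbours on purpose

y_pred = [knn_predict(X_train, y_train, q) for q in X_val]
print(y_pred)                   # [0, 1, 0, 1]
print(accuracy(y_val, y_pred))  # 75.0
```

Three of the four validation predictions match, so the accuracy is 3 / 4 * 100 = 75.0.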
Similarly, predict on the test set, create a CSV file in the same format as the sample submission, and submit it on the website to see what score you get.
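A sketch of writing test-set predictions to a submission file with Python's standard `csv` module. The column names `Id` and `label` and the 0-based ids are placeholders (assumptions); copy the real header and id scheme from the sample submission file.

```python
import csv
import io

def write_submission(predictions, f, header=("Id", "label")):
    """Write a header row, then one (id, predicted label) row per test point."""
    # header and 0-based ids are placeholders: match the sample submission instead
    writer = csv.writer(f)
    writer.writerow(header)
    for i, label in enumerate(predictions):
        writer.writerow([i, label])

# Demo with an in-memory buffer; for a real submission, pass
# open("submission.csv", "w", newline="") instead.
buf = io.StringIO()
write_submission([0, 1, 1], buf)
print(buf.getvalue())
```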

I hope this is helpful.
Thank you

vineetchanana · September 17, 2020, 8:29pm

What is a validation set?

prashant_ml · September 18, 2020, 3:29am

When working on a machine learning task, how do you know beforehand how the model will perform on unseen data, i.e. that it is not overfitting?

To answer that, we set aside a small fraction of our labeled data as a validation set and use the rest for training.
For example, we can split the dataset 80:20 into training and validation data.
We train the model only on the training data and check its results on the validation set.
Once we get our best validation results, we use that model to predict on the test data.
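A minimal sketch of an 80:20 split in plain Python, assuming `X` and `y` are parallel lists; `train_val_split` and the toy data are hypothetical. If scikit-learn is available, `sklearn.model_selection.train_test_split(X, y, test_size=0.2, random_state=42)` does the same job.

```python
import random

def train_val_split(X, y, val_fraction=0.2, seed=42):
    """Shuffle labeled data and hold out val_fraction of it for validation."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
    n_val = round(len(X) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]
    y_val = [y[i] for i in val_idx]
    return X_train, X_val, y_train, y_val

# Toy data (hypothetical)
X = [[i, i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_val, y_train, y_val = train_val_split(X, y)
print(len(X_train), len(X_val))  # 8 2
```

The shuffle keeps each feature vector paired with its label; you then fit on `X_train`/`y_train` (for KNN, simply store them), measure accuracy on `X_val`/`y_val`, and only predict on the test set once you are happy with the validation score.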

I hope this helped you understand it.