K-NN Algorithm (Training and Testing Data)

What is the use of converting a matrix into training and testing data?

Hey @Roopa1i_ma1hotra

I’ll assume that you’re asking why splitting the data into training and test sets is important.

The answer is to avoid bias when evaluating our model. For example, if you train a very complex model on limited data, there is a good chance that the model simply memorizes the outputs of the training set without learning any underlying pattern. If you then evaluate it on the training data, it will give excellent results, because it is just recalling those examples along with their noise. When presented with new, unseen data, such a model won’t give reasonable output, because it never learned a pattern that generalizes.

To catch this kind of problem (overfitting and the like) in our models, we always keep a separate set of data just for evaluation (the test and validation sets).
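As a rough illustration of what "splitting the matrix" means in practice, here is a minimal sketch in plain Python. The helper name `train_test_split`, the 25% test fraction, and the toy data are all my own assumptions for the example, not something from your assignment; libraries such as scikit-learn ship their own version of this helper.

```python
import random

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the row indices, then hold out a fraction of them as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data matrix (made up): 8 rows, 2 features each, with one label per row.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), "training rows,", len(X_test), "test rows")
# prints: 6 training rows, 2 test rows
```

The model is fitted only on `X_train` and `y_train`; the test rows are set aside and never used during training.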

Hope this helps!

Can you please enlighten me about the test set specifically? How does it help?

It is there to detect overfitting. Your model has never seen the test data before. If the model performs well on the test data, we can assume that it will work well on new, unseen data, and that it is actually learning from the training data rather than just memorizing it.
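To make that concrete for K-NN, here is a small sketch, assuming a 1-nearest-neighbour classifier written by hand and made-up data in which 30% of the labels are deliberately flipped as noise. The function names and the data are hypothetical. With k = 1, every training point is its own nearest neighbour, so the model "remembers" the training set perfectly, noise included.

```python
import random

def predict_1nn(X_train, y_train, point):
    """Return the label of the closest training point (squared Euclidean distance)."""
    best = min(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], point)),
    )
    return y_train[best]

def accuracy(X_train, y_train, X_eval, y_eval):
    """Fraction of evaluation points whose predicted label matches the given label."""
    hits = sum(predict_1nn(X_train, y_train, p) == t for p, t in zip(X_eval, y_eval))
    return hits / len(X_eval)

rng = random.Random(0)
# Made-up data: the clean label is 1 when the first feature exceeds 0.5,
# but each label is flipped with probability 0.3 to simulate noise.
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(p[0] > 0.5) ^ int(rng.random() < 0.3) for p in X]

# The points are already in random order, so a simple slice works as a split.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

train_acc = accuracy(X_train, y_train, X_train, y_train)
test_acc = accuracy(X_train, y_train, X_test, y_test)
print("train accuracy:", train_acc)  # 1.0, because each point matches itself
print("test accuracy:", test_acc)    # usually well below the training score
```

The training score alone would suggest a perfect model. Only the score on the held-out rows shows how much of that is memorization, which is exactly what the test set is for.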
