K-NN Algorithm (Training and Testing Data)

Roopa1i_ma1hotra · July 13, 2020, 1:12pm

What is the use of converting the matrix into training and testing data?

CrazyRabbit · July 13, 2020, 1:22pm

I'll assume you're asking why splitting the data into a training set and a test set is important.

The answer is to avoid a biased evaluation of the model. For example, if you train a very complex model on limited data, there is a risk that it simply memorizes the outputs of the training set without learning any underlying pattern. Evaluated on that same training data, such a model scores excellently, because it is just recalling examples it has already seen, noise included. When presented with new, unseen data, however, it won't give reasonable outputs, because it never learned anything that generalizes.
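In K-NN terms, k=1 is the most flexible setting and shows this memorization effect directly. Below is a minimal plain-Python sketch, assuming synthetic data whose labels are pure coin flips (so there is nothing real to learn); the helper names `knn_predict` and `accuracy` are hypothetical, not from any library.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, X, y, k):
    """Fraction of points in X whose predicted label matches y."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
# Hypothetical data: labels are random, so there is no pattern at all to learn
X_train = [(rng.random(), rng.random()) for _ in range(200)]
y_train = [rng.randint(0, 1) for _ in X_train]
X_new = [(rng.random(), rng.random()) for _ in range(200)]
y_new = [rng.randint(0, 1) for _ in X_new]

# k=1: every training point is its own nearest neighbour, so it is "recalled" perfectly
print(accuracy(X_train, y_train, X_train, y_train, k=1))  # 1.0
# On new points the same model can only guess, so expect roughly chance level
print(accuracy(X_train, y_train, X_new, y_new, k=1))
```

The perfect training score says nothing about the model here; only the score on points it has not seen reveals that it learned no pattern.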

To make sure this kind of problem (overfitting and the like) doesn't go unnoticed, we always keep a separate portion of the data just for evaluation (the test set, and often a validation set as well).
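As for the original question of turning a matrix into training and testing data, one common approach is to shuffle the rows and hold out a fraction of them. Here is a minimal plain-Python sketch; the function `split_rows` and its `test_fraction`/`seed` parameters are hypothetical. In practice scikit-learn's `sklearn.model_selection.train_test_split` does the same job and returns the pieces in the same order (`X_train, X_test, y_train, y_test`).

```python
import random

def split_rows(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices, then hold out test_fraction of the rows as a test set.

    X is a matrix given as a list of rows; y holds one label per row.
    """
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)        # shuffle so the split isn't biased by row order
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]         # labels stay aligned with their rows
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Usage: a hypothetical 10-row matrix with 2 features per row
X = [[i, 2 * i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_rows(X, y, test_fraction=0.3, seed=1)
print(len(X_train), len(X_test))  # 7 3
```

Every row ends up in exactly one of the two sets, so the model can be fitted on `X_train` and then scored on rows it never saw.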

Hope this helps!

Roopa1i_ma1hotra · July 13, 2020, 3:58pm

Can you please explain the test set specifically? How does it help?

CrazyRabbit · July 13, 2020, 4:02pm

It is there to detect overfitting. The model never sees the test data during training. If it still performs well on the test data, we can reasonably expect it to work well on new, unseen data too, which suggests it actually learned from the training data rather than just memorizing it.
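A minimal sketch of that check for K-NN: fit on the training split only, then compare training accuracy with test accuracy. The data is synthetic and hypothetical (true rule `x0 + x1 > 1`, with about 15% of labels flipped as noise), and `knn_predict`/`accuracy` are illustrative helpers, not a library API. With k=1 the training accuracy is always 1.0, so only the test number shows whether anything general was learned; a large gap between the two is the overfitting signal.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, X, y, k):
    """Fraction of points in X whose predicted label matches y."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(X, y))
    return hits / len(y)

rng = random.Random(42)
# Hypothetical data: label is 1 when x0 + x1 > 1, then ~15% of labels are flipped as noise
X = [(rng.random(), rng.random()) for _ in range(300)]
y = [int(a + b > 1) ^ (rng.random() < 0.15) for a, b in X]

# Rows are already in random order, so hold out the last 20% as the unseen test set
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

for k in (1, 15):
    train_acc = accuracy(X_train, y_train, X_train, y_train, k)
    test_acc = accuracy(X_train, y_train, X_test, y_test, k)
    print(f"k={k:2d}  train={train_acc:.2f}  test={test_acc:.2f}")
```

The exact numbers depend on the random data, which is why no expected output is shown; the point is to read the test column, not the training column, when judging the model.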