GridSearchCV giving wrong results

In my image classification using SVM project, I used GridSearchCV to find the best parameters, but it is giving wrong results: there was a parameter combination that gave a better score, yet GridSearchCV did not choose it.

Hey @D17LP0096, that should not be the case. Please share your .ipynb by uploading it to Google Drive and posting the link here.

https://drive.google.com/drive/folders/1R46NxAbz9Avaiw0ghC4E_QP4KR37JILw?usp=sharing

Hey @D17LP0096, in grid search 5-fold cross-validation is applied by default, which means the dataset is divided into 5 parts, the model is trained on 4 of them, and it is validated on the remaining one. But when you fit the linear kernel with C = 1.0 yourself, you actually computed the training score, not the cross-validation score. That is why it comes out higher.

So you are comparing two different things, and your grid search is working perfectly fine.
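Roughly like this (a minimal sketch; X and y stand for whatever your image features and labels are called in your notebook):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {'kernel': ['linear', 'rbf'], 'C': [0.1, 1.0, 10.0]}
gs = GridSearchCV(SVC(), param_grid, cv=5)   # 5-fold cross-validation
gs.fit(X, y)

# Mean accuracy on the held-out folds -- this is what grid search optimises.
print("best params:", gs.best_params_)
print("mean cross-validation accuracy:", gs.best_score_)

# Fitting on all of X and scoring on the same X gives TRAINING accuracy,
# which is usually higher -- it is not comparable with best_score_.
clf = SVC(kernel='linear', C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```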

Hope this resolves your doubt.
Please mark the doubt as resolved in my doubts section.

I am not getting your point. I used the score function in both of them. And about comparing two different things: I took the best parameters given by grid search, computed the score the same way I did for the linear kernel, and that score matches the one given by gs.score().

Hey @D17LP0096, our aim is not to score higher on the training data; our task is to score higher on the validation data.
A model with a low training error but a high validation error is not a good model.

You are comparing the training errors of different models. That is not what we should do; we need to compare the testing/validation error. Grid search does that through k-fold validation with k = 5.

First of all, go through the concept of k-fold validation if you are not yet comfortable with it. To make a fair comparison with grid search, score your linear kernel model with the same 5-fold validation, as in the sketch below.
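A minimal sketch of that fair comparison (again assuming X and y are your data):

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation score of the linear kernel with C = 1.0.
scores = cross_val_score(SVC(kernel='linear', C=1.0), X, y, cv=5)
print("5-fold CV accuracy of linear kernel:", scores.mean())
# Compare this number with gs.best_score_, not with clf.score(X, y).
```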

Also, while performing grid search there is some randomness involved, so occasionally (though very rarely) the result may differ slightly from the expected one, but it is sure to be close enough.

I hope I’ve cleared your doubt. I ask you to please rate your experience here
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.

On the off chance, you still have some questions or not find the answers satisfactory, you may reopen
the doubt.

Yeah, we should compare validation error, but if the model isn't able to get a low training error we call it an under-trained model. If the training error is bad, it means the classifier isn't good enough, I guess. And about your point that there is a little randomness: I sorted all the data by class, so the data is now class by class, i.e. not jumbled up, and when I ran grid search on it, it chose completely different parameters and got a much, much higher accuracy.

Hey @D17LP0096, the best way is to use train_test_split to divide your data into 80% training and 20% testing, and then check the accuracies of the different models on the same test set.
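Something along these lines (a minimal sketch; the parameter values are just placeholders, and X, y are assumed to be your features and labels):

```python
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# 80% training / 20% testing split, stratified so every class appears in both.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Compare candidate models on the SAME held-out 20% test set.
for params in [{'kernel': 'linear', 'C': 1.0}, {'kernel': 'rbf', 'C': 10.0}]:
    clf = SVC(**params).fit(X_train, y_train)
    print(params, "test accuracy:", clf.score(X_test, y_test))
```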
