Accuracy of prediction

saksham_thukral · April 12, 2020, 4:43am

I implemented a decision tree on the Titanic data set, following the steps shown in the video, but the accuracy in the video comes out to 0.81, whereas mine comes out to 1.00. Is that OK, or is my model overfitting? Could you please check my code?

S18CRX0120 · April 12, 2020, 6:46am

Hey @saksham_thukral, it is quite likely that your model has overfitted. The best way to check is to use train_test_split: train your decision tree on 80% of the data and check its score on both the training and the validation data. Then compare the two scores. A training score far above the validation score means the model has overfitted.
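
The check described above could be sketched roughly like this. It is only an illustration: the data is a synthetic stand-in generated with make_classification (an assumption, not the Titanic set), and the fit_and_score helper and the max_depth=3 value are made up for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic features and labels (an assumption;
# swap in your own X and y).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           flip_y=0.1, random_state=0)

# Hold out 20% of the rows for validation, train on the remaining 80%.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)


def fit_and_score(max_depth):
    """Fit a tree and return (training accuracy, validation accuracy)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    return tree.score(X_train, y_train), tree.score(X_val, y_val)


deep_train, deep_val = fit_and_score(None)     # unrestricted depth
shallow_train, shallow_val = fit_and_score(3)  # depth-limited tree

print(f"unrestricted: train={deep_train:.2f}, validation={deep_val:.2f}")
print(f"max_depth=3:  train={shallow_train:.2f}, validation={shallow_val:.2f}")
```

If the training score sits well above the validation score, the tree is probably memorising the training rows, and a smaller max_depth is one thing to try.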

Hope this resolves your doubt.
Please mark it as resolved in the "My Doubts" section.

saksham_thukral · April 12, 2020, 12:18pm

I got it, it is actually overfitting, but I don't know how to resolve it. I followed the same steps as in the video, and they said to set max_depth to prevent overfitting, which I did. But because of the Age column, my tree splits and gives a leaf node at a depth of only 1. Could you please check where I am going wrong in my code? And can you tell me how I can share my code with you so that you can check it?

S18CRX0120 · April 12, 2020, 2:00pm

Hey @saksham_thukral, it is not feasible for us to check your code line by line. Please check your code line by line from here,

Hope this resolves your doubt.
Please mark it as resolved.

saksham_thukral · April 12, 2020, 3:33pm

I found my error, but I am still confused about what is wrong with it. While filling in the unknown values of the Age column, I wrote:
cleaned_data["Age"] = cleaned_data.fillna(cleaned_data["Age"].mean())
because Age was the only column that contained unknown values. But according to your code, the correct line is:
cleaned_data = cleaned_data.fillna(cleaned_data["Age"].mean())
What is actually the difference? We knew that the unknown values existed only in the Age column, so I wrote it that way, since the change was supposed to be made only in the Age column.

S18CRX0120 · April 12, 2020, 4:13pm

Hey @saksham_thukral, if you want to do it that way, then you should write:
cleaned_data["Age"] = cleaned_data.fillna(cleaned_data["Age"].mean())["Age"]
In your version you are assigning the complete cleaned DataFrame to the ["Age"] column, which is ambiguous and incorrect.
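
As a rough illustration of the same idea, here is a minimal sketch that calls fillna on the Age column itself, so that both sides of the assignment are a single column. The small DataFrame and its values are made up for the example and are not the real Titanic data.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the Titanic data (hypothetical values).
cleaned_data = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Age": [22.0, np.nan, 26.0, np.nan],
})

# Mean of the known ages; missing entries are skipped by default.
mean_age = cleaned_data["Age"].mean()

# Fill only the Age column: a Series on the left, a Series on the right.
cleaned_data["Age"] = cleaned_data["Age"].fillna(mean_age)

print(cleaned_data)
```

Written this way, the other columns are never touched, which matches the intent of changing only the Age column.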