Hello. Why are these lines of code used in the decision tree implementation?
if data_left.shape[0] == 0 or data_right.shape[0] == 0:
    if X_train.Survived.mean() >= 0.5:
        self.target = "Survive"
    else:
        self.target = "Dead"
    return
# Stop early when depth >= max depth
In particular, why is if X_train.Survived.mean() >= 0.5: used?
Doubt in some lines of code
Hey @Sid10,
Here is an explanation of each part of the code you provided:
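After a node is split, if one of the two branches receives no data points (for example, when every row in the node has the same value for the chosen feature), the split separates nothing and there is nothing left to recurse on. The node therefore becomes a leaf, and its self.target is set from the mean of the Survived values it holds.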
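To show where the quoted lines sit, here is a minimal sketch of a recursive node in plain Python. It is not the course's actual code: it assumes the training data is a list of dicts (standing in for the DataFrame X_train) with a 0/1 Survived key, that each node splits on the feature mean and picks the feature with the highest information gain, and the names Node, train, majority_label and the toy rows are all hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def split(rows, feature, threshold):
    """Rows with feature <= threshold go left, the rest go right."""
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return left, right


def info_gain(rows, feature, threshold):
    left, right = split(rows, feature, threshold)
    n = len(rows)
    weighted = sum(len(side) / n * entropy([r["Survived"] for r in side])
                   for side in (left, right))
    return entropy([r["Survived"] for r in rows]) - weighted


def majority_label(rows):
    # Mean of the 0/1 Survived column = fraction of survivors in this node.
    rate = sum(r["Survived"] for r in rows) / len(rows)
    return "Survive" if rate >= 0.5 else "Dead"


class Node:
    def __init__(self, depth=0, max_depth=5):
        self.depth = depth
        self.max_depth = max_depth
        self.feature = None
        self.threshold = None
        self.left = None
        self.right = None
        self.target = None  # set only when this node is a leaf

    def train(self, rows, features):
        # Split each feature at its mean; keep the one with the largest gain.
        means = {f: sum(r[f] for r in rows) / len(rows) for f in features}
        self.feature = max(features, key=lambda f: info_gain(rows, f, means[f]))
        self.threshold = means[self.feature]
        data_left, data_right = split(rows, self.feature, self.threshold)

        # Stop 1: one branch is empty, so this node becomes a leaf.
        if not data_left or not data_right:
            self.target = majority_label(rows)
            return
        # Stop 2: stop early when depth >= max_depth.
        if self.depth >= self.max_depth:
            self.target = majority_label(rows)
            return

        self.left = Node(self.depth + 1, self.max_depth)
        self.left.train(data_left, features)
        self.right = Node(self.depth + 1, self.max_depth)
        self.right.train(data_right, features)

    def predict(self, row):
        if self.target is not None:
            return self.target
        child = self.right if row[self.feature] > self.threshold else self.left
        return child.predict(row)


# Hypothetical toy rows (Sex: 1 = female, 0 = male).
rows = [
    {"Sex": 1, "Pclass": 1, "Survived": 1},
    {"Sex": 1, "Pclass": 3, "Survived": 1},
    {"Sex": 0, "Pclass": 1, "Survived": 0},
    {"Sex": 0, "Pclass": 3, "Survived": 0},
]
root = Node()
root.train(rows, ["Sex", "Pclass"])
stump = Node(max_depth=0)
stump.train(rows, ["Sex", "Pclass"])
```

Here the root splits on Sex; each child then holds rows with a single Sex value, so splitting on Sex again leaves one branch empty and the child becomes a leaf through the first stop.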
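This works like the threshold in logistic regression. Survived is 1 for passengers who survived and 0 otherwise, so X_train.Survived.mean() is the fraction of survivors in the node. With a threshold of 0.5 for this binary classification task, the node predicts "Survive" when at least half of its passengers survived and "Dead" when the mean is below 0.5, which is simply a majority vote.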
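Why 0.5 in particular: suppose a leaf holds n training passengers, k of whom survived. The mean of the Survived column is then k/n, and comparing the number of training mistakes each label would make gives

$$
\bar{y} = \frac{k}{n}, \qquad
\text{mistakes}(\text{Survive}) = n - k, \qquad
\text{mistakes}(\text{Dead}) = k
$$

$$
n - k \le k \iff \frac{k}{n} \ge \frac{1}{2} \iff \bar{y} \ge 0.5
$$

So predicting "Survive" exactly when the mean is at least 0.5 makes the fewest mistakes on the rows in that leaf. At exactly 0.5 both labels make the same number of mistakes, and the >= breaks the tie toward "Survive".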
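The "Stop early when depth >= max depth" check is there to control overfitting: without a depth limit the tree can keep splitting until it memorizes the training data. max depth is usually chosen as the depth beyond which validation accuracy stops improving and starts to decrease.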
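A small experiment makes the depth limit concrete. This sketch is an illustration rather than the course's method: it uses a hypothetical one-feature tree (split at the feature mean, leaves labeled by majority vote) on synthetic data where the true label is 1 when x > 0.5 and 20% of labels are flipped as noise, then picks max_depth by validation accuracy. The helpers fit, predict, accuracy and make_data are all hypothetical names.

```python
import random


def fit(xs, ys, depth, max_depth):
    """Grow a tree on one numeric feature. A leaf is a 0/1 label;
    an internal node is (threshold, left_subtree, right_subtree)."""
    # Majority vote: mean of the 0/1 labels compared against 0.5.
    majority = 1 if sum(ys) / len(ys) >= 0.5 else 0
    # Stop early when depth >= max_depth.
    if depth >= max_depth:
        return majority
    threshold = sum(xs) / len(xs)  # split at the feature mean
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    # Stop when one branch would be empty.
    if not left or not right:
        return majority
    lx, ly = zip(*left)
    rx, ry = zip(*right)
    return (threshold,
            fit(lx, ly, depth + 1, max_depth),
            fit(rx, ry, depth + 1, max_depth))


def predict(tree, x):
    # Walk down the nested tuples until a leaf label is reached.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def accuracy(tree, xs, ys):
    return sum(predict(tree, x) == y for x, y in zip(xs, ys)) / len(ys)


def make_data(n, rng):
    # Synthetic data: label 1 when x > 0.5, then 20% of labels flipped.
    xs = [rng.random() for _ in range(n)]
    ys = [int((x > 0.5) != (rng.random() < 0.2)) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
val_x, val_y = make_data(200, rng)

results = {}
for max_depth in range(9):
    tree = fit(train_x, train_y, 0, max_depth)
    results[max_depth] = (accuracy(tree, train_x, train_y),
                          accuracy(tree, val_x, val_y))

# Keep the depth with the best validation accuracy (smallest depth on ties).
best_depth = max(results, key=lambda d: results[d][1])
for d, (train_acc, val_acc) in results.items():
    print(f"max_depth={d}: train={train_acc:.3f} val={val_acc:.3f}")
print("chosen max_depth:", best_depth)
```

Training accuracy never goes down as max_depth grows, because each extra level only refines the leaves, while validation accuracy typically levels off or drops once the tree starts fitting the flipped labels. Choosing max_depth from the validation column is what keeps the tree from overfitting.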
I hope this resolves your doubt.
Thank you, and happy learning!