I am not able to understand the base case, i.e. the condition for hitting a leaf node, in the recursion of the train() function.
According to the implementation, when we are at a leaf node, the feature with the highest info gain divides the data into two parts (left and right), one of which is empty. How can we be sure that one part is always empty? Or, conversely, when we reach a state where one of the two partitions (produced by the highest-info-gain feature) is empty, how can we be sure that this is a leaf node?
In my view, we should reach a leaf node when all the remaining data has the same label, i.e. there is only one distinct value of y left. Is this correct? Can we implement train() using this as the base case instead?
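For what it's worth, here is a minimal sketch of what I have in mind (this is my own hypothetical code, not the implementation from the course; `best_split`, `train`, and `predict` are names I made up). The base case is purity of the labels, with a majority-vote fallback in case no split yields positive info gain (e.g. duplicate rows with different labels), so the recursion is guaranteed to terminate:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold) with the highest info gain, or (None, None)."""
    base = entropy(y)
    best_f, best_t, best_gain = None, None, 0.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:        # skip splits that leave one side empty
                continue
            n = len(y)
            gain = base - (len(left) / n) * entropy(left) \
                        - (len(right) / n) * entropy(right)
            if gain > best_gain:
                best_f, best_t, best_gain = f, t, gain
    return best_f, best_t

def train(X, y):
    # Base case: pure node -- every remaining label is the same.
    if len(set(y)) == 1:
        return {"label": y[0]}
    f, t = best_split(X, y)
    if f is None:  # no split gives positive gain -> majority-vote leaf
        return {"label": Counter(y).most_common(1)[0][0]}
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return {"feature": f, "threshold": t,
            "left": train([X[i] for i in li], [y[i] for i in li]),
            "right": train([X[i] for i in ri], [y[i] for i in ri])}

def predict(node, row):
    while "feature" in node:  # descend until we hit a leaf (no "feature" key)
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]
```

For example, `train([[0], [1], [2], [3]], [0, 0, 1, 1])` splits on feature 0 at threshold 1 and both children become pure leaves. Is this purity-based base case equivalent to the "one partition is empty" condition in the original implementation?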