Why do we use softmax and then take the maximum instead of directly taking the maximum?
Softmax is an activation function used in classification problems. It normalizes the input vector into a probability distribution in which each value is proportional to the exponential of the corresponding input.
If we use the maximum function instead, it only gives the maximum value in the input and does not normalize anything, so we get no probabilities for the classes.
Softmax is an activation function, so it plays a different role in a neural network from a plain maximum: it is part of what lets the network perform non-linear computations.
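To make the normalization concrete, here is a minimal sketch of softmax in plain Python. The function name `softmax` and the example scores are only for illustration and are not taken from any particular library.

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to a probability distribution."""
    # Subtracting the max does not change the result but avoids overflow in exp().
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, -1.0]  # hypothetical raw network outputs
probs = softmax(scores)

print(probs)       # three positive values, largest for the largest score
print(sum(probs))  # approximately 1.0
```

Every output is positive and the outputs sum to 1, which is what lets us read them as class probabilities.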
Thanks for replying.
I meant taking argmax directly instead of applying softmax and then taking the maximum. That would also classify the input.
If we want probabilities, we can also use a probability distribution directly proportional to input numbers instead of exponential of input numbers. Why are we not doing that? That seems more like probabilities than softmax.
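argmax only returns the index of the largest output and throws away the other values, so it gives a class label but no probabilities. If you only need the predicted class, argmax on the raw outputs picks the same class as softmax followed by argmax, because the exponential preserves the ordering of the inputs. Softmax matters mainly for training: it is differentiable, so a loss such as cross-entropy can be computed on its output, whereas argmax is not differentiable. A distribution directly proportional to the inputs does not work in general, because the raw outputs can be negative or sum to zero; the exponential makes every value positive first.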
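A small sketch comparing the options discussed in this thread, assuming made-up outputs that include a negative value. The helper names `softmax`, `linear_normalize` and `argmax` are hypothetical and written only for this example.

```python
import math

def softmax(logits):
    # Exponentiate (shifted by the max for numerical stability), then normalize.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_normalize(logits):
    # The alternative from the question: divide each score by the sum of scores.
    total = sum(logits)
    return [x / total for x in logits]

def argmax(values):
    # Index of the largest value.
    return max(range(len(values)), key=lambda i: values[i])

logits = [2.0, 1.0, -1.0]  # hypothetical raw network outputs

# Both routes predict the same class, since exp() preserves ordering.
print(argmax(logits), argmax(softmax(logits)))  # 0 0

# Direct normalization produces a negative "probability" here.
print(linear_normalize(logits))  # [1.0, 0.5, -0.5]
```

So for prediction alone argmax is enough, but direct normalization is not a valid distribution once any output is negative, while softmax always is.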
I hope this clears up your doubt.