Hi! I have the following doubts after going through the neural network implementation:
-
delta3 = y_ - y. This suggests that the loss function used is MSE, rather than the cross entropy shown in the loss method definition. Is my assumption correct, or am I missing something here?
-
Related to the previous question, the derivative of the softmax activation function appears to be taken as 1. It would be great if a document with the derivation could be provided.
-
I could not derive the derivative of tanh. It would be great if a document with the derivation could be provided.
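To make the questions concrete, here is a minimal numerical sketch, not taken from the implementation itself. It assumes a softmax output layer with a cross-entropy loss summed over samples and one-hot labels; under that assumption the gradient with respect to the pre-softmax scores should come out as y_ - y, which might explain why no separate softmax derivative appears. It also checks the commonly quoted tanh derivative, 1 - tanh(x)^2. All names here (softmax, cross_entropy, numerical_grad, z, a) are hypothetical helpers for illustration only.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(z, y):
    # Cross-entropy summed over samples; y holds one-hot labels.
    return -np.sum(y * np.log(softmax(z)))

def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, one entry of x at a time.
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))            # assumed pre-softmax scores
y = np.eye(3)[rng.integers(0, 3, 4)]   # assumed one-hot labels
y_ = softmax(z)

# Question 1 and 2: compare y_ - y with the numerical gradient of cross entropy.
delta3 = y_ - y
ce_match = np.allclose(delta3, numerical_grad(lambda s: cross_entropy(s, y), z), atol=1e-5)

# Question 3: compare 1 - tanh(a)^2 with the numerical derivative of tanh.
a = rng.normal(size=5)
tanh_match = np.allclose(
    numerical_grad(lambda v: np.sum(np.tanh(v)), a), 1 - np.tanh(a) ** 2, atol=1e-5
)

print(ce_match, tanh_match)
```

If both checks pass, delta3 = y_ - y would be consistent with cross entropy combined with softmax, rather than necessarily implying MSE. This is only a check under the stated assumptions, not a derivation.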
This would help me a lot to understand the NN in detail. Thanks.
