Problem with using a linear function instead of the sigmoid function for calculating activations

Well, I was trying to figure out what the problem would be if I used a linear function instead of the sigmoid function.

The derivative of a linear function is always a constant value, so why can't I learn from that? I mean, if the gradient is some constant value and we move in the correct direction by some constant amount every time, the function will converge to the global minimum. What would be the problem with this?

Hi Aman,

Your question is valid, and we do indeed use the linear activation function for regression problems. But how do you plan to use it for classification tasks? The target is 0 or 1, whereas a linear activation function can return an output outside of this range, so we need to squash the output into that range. How do you plan to do that with a linear function? (If you're thinking of the signum function, that too is non-linear.)
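To make the squashing concrete, here is a minimal sketch (the weights, bias, and input are made-up values, purely for illustration): a raw linear output can land anywhere on the real line, while the sigmoid maps it into (0, 1), where it can be read as a probability.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single neuron with made-up weights, bias, and input
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([3.0, 1.0])

z = np.dot(w, x) + b   # raw linear output: 5.5, well outside [0, 1]
p = sigmoid(z)         # squashed output: ~0.996, a valid probability

print(f"linear output z = {z}, sigmoid output p = {p:.4f}")
```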

Well, if my linear function gives a positive value then I will predict 1, and if it gives a negative value then I will predict 0.

That is in fact the signum function, which is non-linear. Also, its derivative is 0 everywhere except at 0, where it is not differentiable. It isn't possible to learn with a 0 derivative/gradient, so back-propagation is completely out of the question. Hope this helps!
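Here is a small sketch of why a zero gradient kills learning (the pre-activation value and upstream gradient are made up, just to illustrate the chain rule): with the signum function, the gradient flowing back through the activation is 0 almost everywhere, so the weight update is always zero, whereas the sigmoid's derivative is non-zero for every input and keeps the update alive.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z)) > 0 for every z
    s = sigmoid(z)
    return s * (1.0 - s)

def signum_grad(z):
    # The signum function is flat on both sides of 0, so its
    # derivative is 0 everywhere it exists (undefined exactly at 0)
    return 0.0

z = 1.5          # some pre-activation value (made up)
upstream = 0.8   # made-up gradient arriving from the loss

# Chain rule: gradient w.r.t. z = upstream * d(activation)/dz
print("sigmoid:", upstream * sigmoid_grad(z))  # non-zero -> weights move
print("signum :", upstream * signum_grad(z))   # always 0 -> no learning
```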
