I was trying to figure out what the problem would be if I used a linear activation function instead of the sigmoid function.
The derivative of a linear function is always a constant value, so why can't the network learn from that? I mean, if the gradient is some constant value and every update moves in the correct direction by that constant amount, the function should converge to a global minimum. What is the problem with this?
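
To make the premise concrete, here is a minimal sketch of what I mean. The function names and the slope `a` are just placeholders I picked for illustration. It only compares the two derivatives at a few inputs: the linear one is the same constant everywhere, while the sigmoid one changes with the input.

```python
import math

def linear(x, a=1.0):
    # Linear activation f(x) = a * x (slope `a` is an assumed example value)
    return a * x

def linear_grad(x, a=1.0):
    # Derivative of a * x with respect to x: the constant a, independent of x
    return a

def sigmoid(x):
    # Sigmoid activation 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which depends on x
    s = sigmoid(x)
    return s * (1.0 - s)

inputs = [-2.0, 0.0, 2.0]
linear_grads = [linear_grad(x) for x in inputs]
sigmoid_grads = [sigmoid_grad(x) for x in inputs]

print("linear derivative: ", linear_grads)
print("sigmoid derivative:", sigmoid_grads)
```

So the linear derivative carries no information about where the input is, and that constant is the gradient I am asking about.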