Getting a very steep loss graph

Sir, in the NN the loss graph comes out very steep: it drops from 0.9 to 0.4 in just 10-12 epochs and then stays at 0.4 for 500 epochs, and the accuracy is only 56%. When the decision boundary is visualised, it does not fit the make_circles data. Please help.

hey @abubakar_nsit ,
can you please tell me what the status of the validation loss and validation accuracy is over those 500 epochs?

Sir, here is the code with some outputs.

import numpy as np

def softmax(a):
    # subtract the row-wise max before exponentiating for numerical stability (does not change the result)
    e_pa = np.exp(a - np.max(a, axis=1, keepdims=True))
    ans = e_pa / np.sum(e_pa, axis=1, keepdims=True)
    return ans

class NeuralNetwork:
    def __init__(self, input_size, layers, output_size):
        np.random.seed(0)
        model = {}
        model['W1'] = np.random.randn(input_size, layers[0])
        model['b1'] = np.zeros((1, layers[0]))
        model['W2'] = np.random.randn(layers[0], layers[1])
        model['b2'] = np.zeros((1, layers[1]))
        model['W3'] = np.random.randn(layers[1], output_size)
        model['b3'] = np.zeros((1, output_size))
        self.model = model

    def forward(self, x):
        W1, W2, W3 = self.model['W1'], self.model['W2'], self.model['W3']
        b1, b2, b3 = self.model['b1'], self.model['b2'], self.model['b3']
        z1 = np.dot(x, W1) + b1
        a1 = np.tanh(z1)
        z2 = np.dot(a1, W2) + b2
        a2 = np.tanh(z2)
        z3 = np.dot(a2, W3) + b3
        y_ = softmax(z3)
        self.activation_outputs = (a1, a2, y_)
        return y_

    def backward(self, x, y, learning_rate=0.001):
        W1, W2, W3 = self.model['W1'], self.model['W2'], self.model['W3']
        b1, b2, b3 = self.model['b1'], self.model['b2'], self.model['b3']
        a1, a2, y_ = self.activation_outputs
        m = x.shape[0]
        delta3 = y_ - y
        dw3 = np.dot(a2.T, delta3)
        db3 = np.sum(delta3, axis=0) / float(m)
        delta2 = (1 - np.square(a2)) * np.dot(delta3, W3.T)
        dw2 = np.dot(a1.T, delta2)
        db2 = np.sum(delta2, axis=0) / float(m)
        delta1 = (1 - np.square(a1)) * np.dot(delta2, W2.T)
        dw1 = np.dot(x.T, delta1)  # use the local x, not the global X, so this also works on mini-batches
        db1 = np.sum(delta1, axis=0) / float(m)
        # note: the dw terms are not averaged over the batch here, unlike the db terms

        self.model['W1'] -= learning_rate * dw1
        self.model['b1'] -= learning_rate * db1
        self.model['W2'] -= learning_rate * dw2
        self.model['b2'] -= learning_rate * db2
        self.model['W3'] -= learning_rate * dw3
        self.model['b3'] -= learning_rate * db3

    def predict(self, x):
        y_out = self.forward(x)
        return np.argmax(y_out, axis=1)

def loss(y_oht, p):
    # categorical cross-entropy averaged over the batch
    l = -np.mean(y_oht * np.log(p))
    return l

def one_hot(y, depth):
    m = y.shape[0]
    y_oht = np.zeros((m, depth))
    y_oht[np.arange(m), y] = 1
    return y_oht

from sklearn.datasets import make_circles
import matplotlib.pyplot as plt

X, Y = make_circles(n_samples=500, shuffle=True, noise=.05, random_state=0, factor=0.8)
plt.style.use("seaborn")
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Accent)
plt.show()

model = NeuralNetwork(input_size=2, layers=[5, 4], output_size=2)

def train(X, Y, model, epochs, learning_rate, logs=True):
    training_loss = []
    classes = 2

    Y_OHT = one_hot(Y, classes)

    for ix in range(epochs):
        Y_ = model.forward(X)
        l = loss(Y_OHT, Y_)
        training_loss.append(l)
        model.backward(X, Y_OHT, learning_rate)

        if logs:
            print("epoch is %d and loss is %.4f" % (ix, l))

    return training_loss

losses = train(X, Y, model, 500, 0.001)

plt.plot(losses)
plt.show()

def plot_decision_boundary(model, X, y, cmap=plt.cm.jet):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = 0.01
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    Z = model(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.style.use("seaborn")
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.jet)

plot_decision_boundary(lambda x: model.predict(x), X, Y)

outputs = model.predict(X)
np.sum(outputs == Y)
252

hey @abubakar_nsit ,
your code is really good, and it runs with no errors.
See, this is an extremely simple multi-layer perceptron, so its behaviour mainly depends on the learning rate, the hidden layer sizes and the number of iterations.
Currently you have chosen the learning rate to be 0.001, hence you reach an almost minimal loss within the first 10-12 epochs. You can try reducing the learning rate further, to around 0.00001, and see that it takes more time and might give better results. Also, because you are updating the weights and biases in a full-batch manner (using the full training data at once), you get slightly lower accuracy.

Things that can be done to improve the model accuracy:

  1. Implementing mini-batch or stochastic gradient descent (a minimal sketch follows after this list).
  2. Changing the hidden layer sizes.
  3. Changing the activations between layers.
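For point 1, here is a rough sketch of how a mini-batch version of your train() function could look, reusing the NeuralNetwork, one_hot and loss helpers from your code above (the batch size of 32 is just an example, not a recommendation):

import numpy as np

def train_minibatch(X, Y, model, epochs, learning_rate, batch_size=32):
    # hypothetical mini-batch variant of train(): one weight update per batch
    Y_OHT = one_hot(Y, 2)
    m = X.shape[0]
    training_loss = []
    for ix in range(epochs):
        # reshuffle every epoch so the batches differ between epochs
        perm = np.random.permutation(m)
        X_shuf, Y_shuf = X[perm], Y_OHT[perm]
        for start in range(0, m, batch_size):
            xb = X_shuf[start:start + batch_size]
            yb = Y_shuf[start:start + batch_size]
            model.forward(xb)                      # caches the activations for this batch
            model.backward(xb, yb, learning_rate)  # update on the mini-batch only
        # record the loss on the full dataset once per epoch
        training_loss.append(loss(Y_OHT, model.forward(X)))
    return training_loss

With batch_size=1 this becomes stochastic gradient descent; larger batch sizes trade noisier updates for speed.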

These might help you learn and implement new topics and also improve how your model works.
Thank You and Happy Learning :slightly_smiling_face:.

Sir, why is the training accuracy also decreasing along with the validation accuracy over the epochs?

Please, sir, explain why the accuracy is decreasing over the epochs.

hey @abubakar_nsit ,

Initially, for the first 3-4 epochs, your model was performing well, but then it starts overfitting: the training loss keeps decreasing while the validation loss increases. The model is paying more attention to the outliers and relying on them heavily for its predictions.
Our main goal is either a high validation accuracy or a low validation loss. The fluctuation in training accuracy comes from gradient descent itself: it may have taken a slightly long step that did not affect the loss by much.
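For example, here is a rough sketch of how you could track the validation loss next to the training loss with your from-scratch code, so the overfitting becomes visible (the 80/20 split, 50 epochs and learning rate are just example values):

# hold out the last 20% of the (already shuffled) data as a validation set
split = int(0.8 * X.shape[0])
X_train, Y_train = X[:split], Y[:split]
X_val, Y_val = X[split:], Y[split:]
Y_train_oht = one_hot(Y_train, 2)
Y_val_oht = one_hot(Y_val, 2)

train_losses, val_losses = [], []
for ix in range(50):
    # one full-batch update on the training split only
    Y_ = model.forward(X_train)
    train_losses.append(loss(Y_train_oht, Y_))
    model.backward(X_train, Y_train_oht, learning_rate=0.001)
    # evaluate on the held-out split without updating the weights
    val_losses.append(loss(Y_val_oht, model.forward(X_val)))

plt.plot(train_losses, label="train loss")
plt.plot(val_losses, label="validation loss")
plt.legend()
plt.show()

If the validation curve starts rising while the training curve keeps falling, that is the overfitting point.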

I would suggest that you:

  1. Run them for around 40-50 epochs to understand things more properly.
  2. Vary the hidden layer sizes, as this will affect how patterns are extracted from the data.
  3. Add regularization between layers (a small L2 sketch follows after this list).
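For point 3, here is a small sketch of L2 (weight decay) regularization that could be added to the from-scratch model above; reg_lambda is just an assumed hyperparameter, not a tuned value:

import numpy as np

def l2_penalty(model, reg_lambda=0.01):
    # extra term added to the reported loss: 0.5 * lambda * sum of squared weights
    return 0.5 * reg_lambda * sum(np.sum(model[k] ** 2) for k in ('W1', 'W2', 'W3'))

def l2_grad(W, reg_lambda=0.01):
    # derivative of the penalty w.r.t. one weight matrix;
    # add this to the matching dw inside backward() before the update step
    return reg_lambda * W

Inside backward() this would mean, for example, dw3 += l2_grad(W3) just before the weight updates (bias gradients are usually left unregularized), and adding l2_penalty(self.model) to the loss you log. This pushes the weights towards smaller values and usually reduces overfitting.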

Try them; I hope this resolves your doubt.

Sir, increasing the epochs or varying the hidden layer sizes is not solving the problem.

Sir, please guide me on how to get rid of this problem, and also on how to add regularization to neural network models to get rid of overfitting.

hey @abubakar_nsit ,

Accuracy is just a metric to check model performance, but the real question about the model is "is it learning something or not?". This can be answered with the help of the training loss and validation loss curves. Yours clearly show that your model is overfitting.
So to get rid of it, we need to try some different experiments. For that, can you please upload your code and resource files (if any) to GitHub and share the link with me?
As you are currently working from scratch, some extra work is needed; I will have to rewrite parts of your code to try those experiments.

Sir, here is the Google Colab link:
https://colab.research.google.com/drive/1dM_bNKuJhqUuyR4PcgoCAEnTutKaZjyp?usp=sharing

and this is the GitHub link for the previous code which I sent you initially in this thread:

I would be greatly thankful to you if you could check both, sir.

hey @abubakar_nsit ,
sorry for the late reply,
there are some mistakes in your code:

  1. For classification tasks, especially binary classification, the output layer activation needs to be either softmax or sigmoid. You can better understand their use at the link.
  2. You are providing a very sparse input, so the model currently focuses more on irrelevant data and hence does not perform the way it should. Also add a dropout regularization layer in between to drop some non-useful activations while training (a rough sketch follows after this list).
  3. While going feed-forward, we usually do not keep two consecutive hidden layers at the same size, as that can just mean another layer carrying much the same values as the previous one.
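For point 2, here is a rough sketch of inverted dropout that could be applied to a hidden activation of a from-scratch network like the one earlier in this thread; keep_prob=0.8 is just an example value:

import numpy as np

def dropout(a, keep_prob=0.8, training=True):
    # inverted dropout: randomly zero a fraction of activations during training
    # and rescale the survivors so the expected activation stays the same
    if not training:
        return a, np.ones_like(a)
    mask = (np.random.rand(*a.shape) < keep_prob) / keep_prob
    return a * mask, mask

In forward() you would apply it right after a hidden activation (e.g. on a1) while training, and in backward() the gradient flowing through that layer has to be multiplied by the same mask so the dropped units receive no gradient. At prediction time dropout is switched off (training=False).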

I have updated the model part of your code; have a look at my version. It shows better performance and fits the data properly.

Just have a look at this code, and if you don't understand something in it, kindly let me know.

Thank You and Happy learning. :slightly_smiling_face: