Regression_notebook_code

import numpy as np

def batch_gradient(X, Y, theta, batch_size=30):
    # Estimate the gradient on a random mini-batch of `batch_size` examples.
    m = Y.shape[0]
    indices = np.arange(m)
    np.random.shuffle(indices)
    indices = indices[:batch_size]
    grad = np.zeros((2,))
    for i in indices:
        h = hypothesis(X[i], theta)   # hypothesis() is defined earlier in the notebook
        grad[0] += (Y[i] - h)         # intercept component
        grad[1] += (Y[i] - h) * X[i]  # slope component

    return grad * 0.5

What is the use of the batch_gradient function?

It calculates and returns the gradient by which the parameters are to be updated.
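For reference, here is a minimal sketch of how the returned gradient could drive that update (the train function, learning_rate, and num_iters are assumed names for illustration, not from the notebook). Note that grad accumulates (Y[i] - h), the negative of the usual squared-error gradient, so adding it to theta reduces the error:

def train(X, Y, learning_rate=0.01, num_iters=100, batch_size=30):
    # Hypothetical update loop built around batch_gradient (a sketch;
    # assumes numpy as np and batch_gradient/hypothesis from above).
    theta = np.zeros((2,))
    for _ in range(num_iters):
        grad = batch_gradient(X, Y, theta, batch_size)
        # grad is (a scaled) negative loss gradient, so adding it
        # to theta performs descent on the squared error.
        theta = theta + learning_rate * grad / batch_size
    return theta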

There are two separate functions, gradient and batch_gradient. What’s the difference between the two?

def gradient(X, Y, theta):
    # Full (batch) gradient: iterates over all m training examples.
    m = Y.shape[0]
    grad = np.zeros((2,))

    for i in range(m):
        h = hypothesis(X[i], theta)
        grad[0] += (Y[i] - h)         # intercept component
        grad[1] += (Y[i] - h) * X[i]  # slope component

    return grad * 0.5

def batch_gradient(X, Y, theta, batch_size=30):
    # Mini-batch gradient: the same computation, but over a random
    # subset of batch_size examples instead of all m.
    m = Y.shape[0]
    indices = np.arange(m)
    np.random.shuffle(indices)
    indices = indices[:batch_size]
    grad = np.zeros((2,))
    for i in indices:
        h = hypothesis(X[i], theta)
        grad[0] += (Y[i] - h)
        grad[1] += (Y[i] - h) * X[i]

    return grad * 0.5

In the first one, the normal gradient descent function, we iterate over all the rows, i.e. over every training example. In batch gradient descent we iterate over only a part of the training examples instead of the whole data (the size of that part is given by batch_size). They are two variants of gradient descent, and since the squared-error loss of linear regression is convex, both will converge to the same minimum; batch gradient descent is faster per iteration because each update processes fewer examples.
This will be covered in more detail in the class on linear regression.
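One note: hypothesis is not shown in either snippet. From the way grad[0] (an intercept term) and grad[1] (a slope term) are accumulated, it is presumably a simple linear model along these lines (a sketch, assuming scalar inputs):

def hypothesis(x, theta):
    # Linear model: intercept theta[0] plus slope theta[1] times x.
    return theta[0] + theta[1] * x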

OK, thanks, I got that.
I have a few more questions on linear regression.

  1. Why do we normalize data in linear regression? What will happen if we do not normalize our data?
  2. Also, are we standardizing data or normalizing data? I read somewhere:
     - Normalization transforms your data into a range between 0 and 1.
     - Standardization transforms your data such that the resulting distribution has a mean of 0 and a standard deviation of 1.
  3. Do we have to denormalize our results to make the algorithm work for our actual data? Won’t we get wrong results if we test our actual predictions of y using non-normalized values of x?

Answer regarding normalization and standardization:

  1. We don't need to denormalize our data. Instead, we normalize the test input with the same scale:
     the input whose output is to be predicted is normalized first, using the same statistics computed from the training data.
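In code, "normalizing the test input with the same scale" might look like this sketch (fit_scaler and transform are hypothetical names, and mean/std scaling is an assumption for illustration):

import numpy as np

def fit_scaler(X_train):
    # Compute scaling statistics from the training data only.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return mu, sigma

def transform(X, mu, sigma):
    # Apply the training-set statistics to any input, train or test.
    return (X - mu) / sigma

# Usage: fit theta on transform(X_train, mu, sigma), then scale any new
# input with the same mu and sigma before predicting its output.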

So we are standardizing data and not normalizing, right? In the video, sir said we are normalizing it.

Standardizing and normalizing are generally used interchangeably, as both are used for feature scaling.
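To make the distinction from the two definitions you quoted concrete, a small sketch (both helper names are hypothetical; x is a NumPy array):

def normalize(x):
    # Min-max normalization: rescales values into the range [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    # Standardization (z-score): zero mean, unit standard deviation.
    return (x - x.mean()) / x.std()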

Okay. Last question. What will happen if we do not standardize our data before performing a linear regression on it?

Wait for this topic to be discussed in class; you will get your answer with a complete explanation.
