Can you please help or share docs?

I need help understanding what made you take the transpose here: X.T(y_-y). Why X transpose? Is it just to make sure the dimensions are correct? Is it possible to do this without taking the transpose?

Hey @gauthampkrishnan,
It is done because of the shapes of our data before calculating the gradient: X is a matrix with m rows and n columns, and the error term (y_-y) is a column of m values. Multiplying X by it directly is not possible, because the number of columns of X (n) does not match the number of rows of the error term (m). Hence, according to the rule of matrix multiplication (the inner dimensions must match), we had to convert X to its transpose.

Considering n features in X with m records:

shape of X = (m, n), shape of Y = (m,)
# after transpose
shape of X.T = (n, m), shape of Y = (m,)

Hence we are now able to multiply them properly: (n, m) times (m,) gives a result of shape (n,), one value per feature, which are the terms we need for our solution based on the given formula.
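
To make the shapes concrete, here is a minimal NumPy sketch. The toy numbers (m = 4 records, n = 2 features) and the variable names `error`, `grad`, and `grad_alt` are assumptions made up for illustration; they are not from the original code. It also touches on the last part of the question: with a 1-D error vector, putting the error on the left of X should give the same numbers without writing the transpose explicitly.

```python
import numpy as np

# Hypothetical toy data: m = 4 records, n = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])            # shape (m, n) = (4, 2)
y = np.array([1.0, 0.0, 1.0, 0.0])    # true values, shape (m,)
y_ = np.array([0.5, 0.5, 0.5, 0.5])   # predictions, shape (m,)

error = y_ - y                        # shape (m,)

# X @ error would raise a ValueError here: (4, 2) and (4,) do not line up.
grad = X.T @ error                    # (n, m) @ (m,) -> shape (n,)

# Same result without an explicit transpose: put the 1-D error on the left.
grad_alt = error @ X                  # (m,) @ (m, n) -> shape (n,)

print(grad.shape)                     # (2,)
print(np.allclose(grad, grad_alt))    # True
```

So the transpose is not only there to fix the dimensions: each entry of the result sums one feature's values times the errors over all m records. Writing `error @ X` computes the same sums, it just hides the transpose.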

I hope this resolves your doubt.
Thank you.