I have doubts about the following questions:
Q3) Shouldn’t there be n weights for n features? Or are we treating the bias as the weight for a dummy feature x0 in this question?
Q4) We maximize the log-likelihood to find the best value of theta, so shouldn’t large values of the log-likelihood statistic indicate that the statistical model fits the data well?
Q6) The diagrams in options (b) and (c) are the same.
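
To make my reasoning for Q3 and Q4 concrete, here is a minimal sketch. It assumes the model is logistic regression (the quiz may have a different model in mind), and the toy data, the two theta vectors, and the helper names are all made up for illustration. It shows the bias stored as the weight of a dummy feature x0 = 1, which gives n + 1 parameters for n features, and it compares the log-likelihood of a poor theta against a better one.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta, X, y):
    """Bernoulli log-likelihood of a logistic regression model.

    Each row of X is assumed to already start with the dummy feature x0 = 1,
    so theta[0] plays the role of the bias.
    """
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(sum(t * x for t, x in zip(theta, row)))
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total


# Made-up toy data with n = 2 real features per example.
features = [[0.5, 1.0], [1.5, 2.0], [3.0, 0.5], [4.0, 3.0]]
labels = [0, 0, 1, 1]

# Prepend x0 = 1 so the bias becomes theta[0]: n + 1 = 3 parameters in total.
X = [[1.0] + row for row in features]

theta_poor = [0.0, 0.0, 0.0]     # predicts p = 0.5 for every example
theta_better = [-4.0, 2.0, 0.0]  # hand-picked (not fitted) to separate the toy labels

ll_poor = log_likelihood(theta_poor, X, labels)
ll_better = log_likelihood(theta_better, X, labels)

print("parameters per example:", len(X[0]))
print("log-likelihood (poor theta):  ", ll_poor)
print("log-likelihood (better theta):", ll_better)
```

In this sketch both log-likelihoods are negative, and the theta that matches the labels better gives the larger value (the one closer to zero), which is why I would expect a larger log-likelihood to indicate a better fit.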