Regarding maximum likelihood estimation

I am not able to understand how maximum likelihood estimation works here. In the example, O1 and O2 are two lines, and O1 gives the maximum likelihood, but that part is unclear to me. Can you please explain it more precisely?

https://imgur.com/a/R2lCIyR (please take a look)

For question 1, start with O2. There are 3 misclassified points, so if we calculate p = g(z) = sigmoid(theta^T * x + b) for them, theta^T * x + b comes out slightly positive, which means p is slightly greater than 0.5. Let's say it comes out to 0.6 for each of them. To calculate the likelihood of each point we use the formula p^(y_actual) * (1 - p)^(1 - y_actual). Since y_actual = 0 for all three points, the likelihood of each point is (1 - 0.6) = 0.4.
The total likelihood is the product of the per-point likelihoods (the points are treated as independent), so for O2 we get 0.4 * 0.4 * 0.4 * (likelihood of all the remaining points).
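To make the O2 arithmetic concrete, here is a minimal Python sketch. The score values are hypothetical: log(1.5) is picked only because sigmoid(log(1.5)) ≈ 0.6, matching the assumed p, and the helper names `sigmoid` and `point_likelihood` are illustrative, not from any library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def point_likelihood(p: float, y: int) -> float:
    """Bernoulli likelihood of one point: p^y * (1 - p)^(1 - y)."""
    return p ** y * (1 - p) ** (1 - y)


# Hypothetical score theta^T * x + b for each of O2's three misclassified points:
# slightly positive, chosen as log(1.5) so that sigmoid gives p ≈ 0.6.
z_values = [math.log(1.5)] * 3
y_actual = [0, 0, 0]

probs = [sigmoid(z) for z in z_values]
likelihoods = [point_likelihood(p, y) for p, y in zip(probs, y_actual)]

# Points are treated as independent, so their likelihoods multiply.
total = math.prod(likelihoods)

print([round(p, 3) for p in probs])          # [0.6, 0.6, 0.6]
print([round(lk, 3) for lk in likelihoods])  # [0.4, 0.4, 0.4]
print(round(total, 3))                       # 0.064
```

With y_actual = 0 the p^y factor is 1, so each point's likelihood reduces to 1 - p, exactly as in the text.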

Now consider O1. If we calculate the likelihood of those three previously misclassified points, all of them are now correctly classified, so theta^T * x + b comes out slightly negative, which means p = g(z) is slightly less than 0.5. Let's say it comes out to 0.4 for each of them. Again using p^(y_actual) * (1 - p)^(1 - y_actual) with y_actual = 0 for all three points, the likelihood of each point is (1 - 0.4) = 0.6.
The total likelihood for O1 is then 0.6 * 0.6 * 0.6 * (likelihood of all the remaining points).
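The same calculation for O1, as a minimal sketch that also puts the two lines side by side for those three points. The score -log(1.5) is a hypothetical value chosen so that p ≈ 0.4; O2's per-point value of 0.6 is taken from the text.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical score theta^T * x + b for the same three points under O1:
# slightly negative, chosen as -log(1.5) so that p = sigmoid(z) ≈ 0.4.
p_o1 = sigmoid(-math.log(1.5))

# y_actual = 0 for all three points, so each point's likelihood is 1 - p.
o1_three_points = math.prod([1 - p_o1] * 3)  # 0.6 * 0.6 * 0.6
o2_three_points = math.prod([1 - 0.6] * 3)   # 0.4 * 0.4 * 0.4, the O2 case

print(round(p_o1, 3))                     # 0.4
print(round(o1_three_points, 3))          # 0.216
print(round(o2_three_points, 3))          # 0.064
print(o1_three_points > o2_three_points)  # True
```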

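Since 0.6 * 0.6 * 0.6 = 0.216 is greater than 0.4 * 0.4 * 0.4 = 0.064, and the remaining points contribute roughly the same likelihood under both lines, O1 has a higher likelihood than O2. That is why maximum likelihood estimation prefers O1.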
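Maximum likelihood estimation generalizes this comparison: rather than checking two candidate lines, it searches over theta and b for the line with the highest likelihood. The sketch below does that on made-up 1-D data with plain gradient ascent. The data, learning rate, and step count are arbitrary assumptions. It maximizes the log-likelihood instead of the product: log is increasing, so the maximizer is the same, and a sum of logs avoids multiplying many small numbers together.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(theta: float, b: float, xs: list[float], ys: list[int]) -> float:
    """Sum of y*log(p) + (1-y)*log(1-p), with p = sigmoid(theta*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Hypothetical 1-D data (theta^T * x + b becomes theta * x + b). The classes
# overlap, so the data is not linearly separable and the maximum is finite.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

theta, b = 0.0, 0.0
start = log_likelihood(theta, b, xs, ys)
lr = 0.1  # arbitrary step size, small enough for this data

for _ in range(2000):
    # Gradient of the log-likelihood: sum of (y - p) * x for theta, (y - p) for b.
    residuals = [y - sigmoid(theta * x + b) for x, y in zip(xs, ys)]
    grad_theta = sum(r * x for r, x in zip(residuals, xs))
    grad_b = sum(residuals)
    theta += lr * grad_theta
    b += lr * grad_b

end = log_likelihood(theta, b, xs, ys)
print(f"theta={theta:.3f}, b={b:.3f}")
print(f"log-likelihood: {start:.3f} -> {end:.3f}")
```

Each step nudges the line in the direction that raises the likelihood, which is the same preference that picks O1 over O2, just applied continuously.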

For question 2, we are trying to do both at the same time.

Hope this clears your doubt :blush: