For question 1, consider O2 first. It has 3 misclassified points, so when we calculate p = g(z) = sigmoid(theta^T * x + b) for them, theta^T * x + b comes out slightly positive, which means p = g(z) comes out slightly greater than 0.5; let's say it is 0.6 for each of them. To calculate the likelihood of each point we use the formula p^(y_actual) * (1-p)^(1-y_actual). Since y_actual = 0 for all three points, the likelihood of each point is (1 - 0.6) = 0.4.
So when we calculate the overall likelihood (the quantity that maximum likelihood tries to maximize), the total is 0.4 * 0.4 * 0.4 * (likelihood of all the remaining points).
Now consider O1. The same three points are now correctly classified, so theta^T * x + b comes out slightly negative for them, which means p = g(z) comes out slightly less than 0.5; let's say it is 0.4 for each of them. Again we use the formula p^(y_actual) * (1-p)^(1-y_actual) for the likelihood of each point. Since y_actual = 0 for all three points, the likelihood of each point is (1 - 0.4) = 0.6.
So the overall likelihood for O1 is 0.6 * 0.6 * 0.6 * (likelihood of all the remaining points).
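Clearly O1 has a higher likelihood than O2, since 0.6^3 = 0.216 is greater than 0.4^3 = 0.064 (assuming the likelihood of the remaining points is about the same for both).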
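If it helps to see the arithmetic, here is a small Python sketch of the comparison. The values 0.6 and 0.4 are the illustrative probabilities used above, not values computed from real data, and the helper names (sigmoid, point_likelihood, joint_likelihood) as well as the z values of +0.4 and -0.4 are made up for this example.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); above 0.5 for z > 0, below 0.5 for z < 0."""
    return 1.0 / (1.0 + math.exp(-z))

def point_likelihood(p, y_actual):
    """Likelihood of one point: p^(y_actual) * (1 - p)^(1 - y_actual)."""
    return p ** y_actual * (1 - p) ** (1 - y_actual)

def joint_likelihood(probs, labels):
    """Product of the per-point likelihoods."""
    return math.prod(point_likelihood(p, y) for p, y in zip(probs, labels))

# Hypothetical z values, only to show which side of 0.5 the sigmoid lands on.
p_small_positive_z = sigmoid(0.4)   # slightly above 0.5
p_small_negative_z = sigmoid(-0.4)  # slightly below 0.5

# The three points in question all have y_actual = 0.
labels = [0, 0, 0]

# O2: the three points are misclassified, p is assumed to be 0.6 for each.
likelihood_o2 = joint_likelihood([0.6, 0.6, 0.6], labels)  # about 0.064

# O1: the three points are correctly classified, p is assumed to be 0.4 for each.
likelihood_o1 = joint_likelihood([0.4, 0.4, 0.4], labels)  # about 0.216

print("O2 factor:", likelihood_o2)
print("O1 factor:", likelihood_o1)
print("O1 higher than O2:", likelihood_o1 > likelihood_o2)
```

These two numbers are only the factors contributed by the three points; each still gets multiplied by the likelihood of all the remaining points, so the comparison holds as long as that remaining factor is similar for O1 and O2.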
For question 2, we are trying to do both simultaneously.
Hope this clears up your doubt.