Multivariate Bernoulli Naive Bayes Formula

Why did Sir multiply the terms while calculating the conditional probabilities of the features?
If a word/feature doesn’t belong to that class/label, then a very big factor is multiplied into the equation!

The formula looks correct for a single feature’s probability calculation: we decide which case applies and multiply by the suitable factor.
For instance:

Sir assumed that this word doesn’t belong to the class and took its probability to be very low (= 0.1).
Then he multiplied by an even bigger factor (= 0.9), thus increasing the probability for a feature that doesn’t belong to the class.
I am not able to understand this formula!

Hey @mananaroramail, the reason is this. Suppose we have a query point with two features, x1 and x2,
and suppose there are two classes, 0 and 1.
Probability of x1 given class 0 = 0.9
Probability of x2 given class 0 = 0.05

Probability of x1 given class 1 = 0.7
Probability of x2 given class 1 = 0.24

Now suppose that, instead of taking the product, you take the sum to calculate the final score of (x1, x2) for class 0 and class 1. You will get 0.95 and 0.94 respectively, so your model will predict class 0. But don’t you think the answer should be class 1? Class 0 explains x1 well but almost never produces x2, whereas class 1 is reasonably likely for both. Our target is to find the class that gives a high probability to x1 as well as to x2, not to just one feature. The product does exactly that: 0.9 × 0.05 = 0.045 for class 0 and 0.7 × 0.24 = 0.168 for class 1, so class 1 wins.

That’s the reason we take the product of the values and not the sum.
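
To check the numbers yourself, here is a minimal Python sketch of the comparison above. The variable names are only for illustration, and it assumes both classes have equal priors, so the priors are left out of the scores.

```python
import math

# Per-feature likelihoods P(x_i | y = c) taken from the example above.
likelihoods = {
    0: [0.9, 0.05],   # class 0: x1, x2
    1: [0.7, 0.24],   # class 1: x1, x2
}

# Assumption: equal class priors, so they are omitted from both scores.
sum_scores = {c: sum(ps) for c, ps in likelihoods.items()}
product_scores = {c: math.prod(ps) for c, ps in likelihoods.items()}

# Pick the class with the highest score under each rule.
predicted_by_sum = max(sum_scores, key=sum_scores.get)
predicted_by_product = max(product_scores, key=product_scores.get)

print("sum scores:    ", sum_scores, "-> class", predicted_by_sum)
print("product scores:", product_scores, "-> class", predicted_by_product)
```

The sum rule picks class 0 and the product rule picks class 1. With many features the product of small numbers can underflow, so in practice people usually add log-probabilities instead, which gives the same ranking as the product.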

Hope this resolved your doubt.
Please mark the doubt as resolved in my doubts section. :blush:

Ok, I get it

And in the formula P(xi | y=c)^b * ( 1 - P(xi | y=c) )^(1-b)
for calculating the conditional probability of each feature,
does "b" refer to the class?

And are only 2 classes possible in this model?

Hey @mananaroramail, b refers to the actual bit value of the feature xi in the query point. We pass a vector for the query document containing 0s and 1s: 0 means the word is not present and 1 means the word is present. So let’s say there is a word ‘sport’ and it is not contained in the query document, so b = 0. We calculate P(‘sport’ | y=c)^b * ( 1 - P(‘sport’ | y=c) )^(1-b); substituting b = 0, we are finally left with ( 1 - P(‘sport’ | y=c) ). This makes sense as well: it is the probability that the word ‘sport’ does not appear in a document of class y = c. So if P(‘sport’ | y=c) = 0.1 and the word is absent from the query, the factor 0.9 is the right one, because the absence of a rare word is exactly what class c expects.
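
A small Python sketch of the same formula may help. The function names, the second word ‘match’ and its probability 0.6 are hypothetical and only for illustration; the value 0.1 for ‘sport’ is the one from your question.

```python
def bernoulli_term(p, b):
    """P(x_i | y=c)^b * (1 - P(x_i | y=c))^(1-b) for one feature.

    p is P(word present | y=c); b is 1 if the word is in the query, else 0.
    """
    return (p ** b) * ((1 - p) ** (1 - b))


def class_likelihood(word_probs, query_bits):
    """Multiply the per-word terms for one class (class prior not included)."""
    likelihood = 1.0
    for word, p in word_probs.items():
        likelihood *= bernoulli_term(p, query_bits[word])
    return likelihood


# Hypothetical P(word present | y=c) values for a single class c.
word_probs = {"sport": 0.1, "match": 0.6}

# Query document: 'sport' is absent (b = 0), 'match' is present (b = 1).
query_bits = {"sport": 0, "match": 1}

print(bernoulli_term(0.1, 0))  # word absent: the term reduces to 1 - p
print(bernoulli_term(0.1, 1))  # word present: the term reduces to p
print(class_likelihood(word_probs, query_bits))
```

Notice that b only selects between p and 1 - p for each word; it has nothing to do with the class. The same calculation is simply repeated for every class c, so nothing here restricts the model to two classes. Only the features are binary.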

Hope this resolved your doubt.
Please mark the doubt as resolved in my doubts section. :blush:

Ok, now I get that !!
Thanks a lot !!
