I am not able to understand what he is doing with all the zeros and ones in vector form.
Bag of words - Vectorization
Hi @arush,
In any vector v, a 1 means that the word assigned to that index is present in the sentence. If the word is not present in the sentence, the vector has a 0 in its corresponding place.
For example, suppose we have a corpus like this:
- Virat kohli is a good player.
- Cat is running
Let’s make a dictionary that maps each unique word (lowercased) to an index:
{
"virat" : 0, "kohli" : 1,
"is" : 2, "a" : 3,
"good" : 4, "player" : 5,
"cat" : 6, "running" : 7
}
The length of the dictionary is 8, because the corpus has 8 unique words ("is" appears in both sentences but is counted once).
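If it helps, here is a minimal Python sketch of how such a dictionary could be built. The function name `build_vocabulary` and the simple lowercase, letters-only tokenization are my own assumptions for illustration, not necessarily what the lecture uses.

```python
import re

def build_vocabulary(corpus):
    """Map each unique lowercased word to an index, in order of first appearance."""
    vocabulary = {}
    for sentence in corpus:
        # Assumed tokenization: lowercase the text and keep runs of letters only.
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary

corpus = ["Virat kohli is a good player.", "Cat is running"]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(len(vocabulary))  # 8 unique words
```

Because indices are handed out in order of first appearance, this reproduces the same word-to-index mapping as the dictionary above.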
Suppose we have to vectorize a new sentence: Dog is running
We will get this vector = [0,0,1,0,0,0,0,1]
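Look at the positions of the words: "is" is at index 2 and "running" is at index 7, so those two positions are 1. Every other position is 0 because that word is not present in the new sentence. "Dog" is not in the dictionary at all, so it has no position in the vector and is ignored.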
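The same steps can be sketched in Python. This is only an illustration under my own assumptions: the helper name `vectorize` is hypothetical, the tokenization is a simple lowercase, letters-only split, and the dictionary is hard-coded from the example (with "cat" written without a trailing space).

```python
import re

# Dictionary from the example: each known word maps to a position in the vector.
vocabulary = {
    "virat": 0, "kohli": 1, "is": 2, "a": 3,
    "good": 4, "player": 5, "cat": 6, "running": 7,
}

def vectorize(sentence, vocabulary):
    """Return a 0/1 vector marking which dictionary words occur in the sentence."""
    vector = [0] * len(vocabulary)
    for word in re.findall(r"[a-z]+", sentence.lower()):
        index = vocabulary.get(word)
        if index is not None:  # words outside the dictionary (e.g. "dog") are skipped
            vector[index] = 1
    return vector

print(vectorize("Dog is running", vocabulary))  # [0, 0, 1, 0, 0, 0, 0, 1]
```

Note that this marks presence only (1 or 0). Some bag-of-words variants store how many times each word occurs instead, which would mean incrementing the entry rather than setting it to 1.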
Thanks 