I am not able to understand what he is doing with all the zeros and ones in the vector form.
Bag of words - Vectorization
Hi @arush,
Each position (index) in a vector v corresponds to one word in the dictionary. A 1 at an index means that word is present in the sentence; if the word is not present in the sentence, its corresponding position holds a 0.
For example, suppose we have a corpus like this:
- Virat Kohli is a good player.
- Cat is running.
Let's make a dictionary that maps each unique word to an index:
{ "virat": 0, "kohli": 1, "is": 2, "a": 3, "good": 4, "player": 5, "cat": 6, "running": 7 }
The length of the dictionary is 8 because there are 8 unique words.
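Here is a minimal Python sketch of how that dictionary could be built from the corpus. It assumes a simple tokenizer (lowercase everything, keep only runs of letters, so "player." becomes "player"). The names `tokenize` and `build_vocabulary` are hypothetical helpers for illustration, not from any particular library.

```python
import re

def tokenize(sentence: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep only alphabetic runs (drops punctuation).
    return re.findall(r"[a-z]+", sentence.lower())

def build_vocabulary(corpus: list[str]) -> dict[str, int]:
    # Give each new word the next free index, in order of first appearance.
    vocab: dict[str, int] = {}
    for sentence in corpus:
        for word in tokenize(sentence):
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

corpus = ["Virat Kohli is a good player.", "Cat is running"]
vocab = build_vocabulary(corpus)
print(vocab)       # {'virat': 0, 'kohli': 1, 'is': 2, 'a': 3, 'good': 4, 'player': 5, 'cat': 6, 'running': 7}
print(len(vocab))  # 8
```

Note that "is" appears in both sentences but gets only one index, which is why there are 8 entries and not 9.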
Now suppose we have to vectorize a new sentence: "Dog is running".
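We get this vector: [0, 0, 1, 0, 0, 0, 0, 1]
Look at the positions of the words: "is" (index 2) and "running" (index 7) are present in the new sentence, so they get 1; every other dictionary word is absent, so it gets 0. "Dog" is not in the dictionary at all, so it has no position in the vector and is simply ignored.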
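A minimal Python sketch of that vectorization step, reusing the dictionary from the example so it runs on its own. The `vectorize` function and its lowercase/letters-only tokenization are assumptions for illustration, not a specific library's API.

```python
import re

# The dictionary from the example, written out so this sketch is self-contained.
vocab = {"virat": 0, "kohli": 1, "is": 2, "a": 3, "good": 4,
         "player": 5, "cat": 6, "running": 7}

def vectorize(sentence: str, vocab: dict[str, int]) -> list[int]:
    # Start with all zeros: one slot per dictionary word.
    vector = [0] * len(vocab)
    # Assumed tokenizer: lowercase, keep only alphabetic runs.
    for word in re.findall(r"[a-z]+", sentence.lower()):
        index = vocab.get(word)
        # Words outside the dictionary (like "dog") have no slot and are skipped.
        if index is not None:
            vector[index] = 1
    return vector

print(vectorize("Dog is running", vocab))  # [0, 0, 1, 0, 0, 0, 0, 1]
```

This sketch only records presence (0 or 1). A counting variant would use `vector[index] += 1` instead, so a word that appears twice would get a 2.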
Thanks