Bag of words - Vectorization

I am not able to understand what he is doing with all the zeros and ones in vector form.

Hi @arush,
In any vector v, a 1 means that the word assigned to that index is present in the sentence; if the word is not present in the sentence, the vector has a 0 in its corresponding place.
For example:

We have a corpus like this:

  1. Virat kohli is a good player.
  2. Cat is running

Let’s make a dictionary

{
"virat" : 0, "kohli": 1, 
"is" : 2, "a" : 3, 
"good" : 4, "player" : 5,
"cat" : 6 , "running" : 7
}

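The length of the dictionary is 8 because there are 8 unique words ("is" appears in both sentences but is counted only once). Every vector built from this dictionary will therefore also have length 8.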
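If it helps, here is a minimal sketch of how such a dictionary could be built in Python. The function name `build_vocabulary` and the choice to lowercase the words and strip punctuation are my own assumptions for illustration, not necessarily what the course code does.

```python
import string

def build_vocabulary(corpus):
    """Map each unique lowercased word to the index at which it first appears."""
    vocabulary = {}
    for sentence in corpus:
        for token in sentence.lower().split():
            # Assumption: drop surrounding punctuation so "player." becomes "player".
            word = token.strip(string.punctuation)
            if word and word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary

corpus = ["Virat kohli is a good player.", "Cat is running"]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(len(vocabulary))  # 8
```

Each new word simply gets the next free index, which is why "is" keeps index 2 when it shows up again in the second sentence.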

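Suppose we have to vectorize a new sentence: Dog is running
We will get this vector = [0,0,1,0,0,0,0,1]
Look at the positions of the words: "is" has index 2 and "running" has index 7, so those two positions are 1. "dog" is not in the dictionary, so it is simply ignored. 1 means the word is present, 0 means it is not present in the new sentence.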
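The same lookup can be sketched in a few lines of Python. The helper name `vectorize` is hypothetical, and I am assuming the simple present/absent (0 or 1) version described here rather than word counts.

```python
import string

# Dictionary from the example: word -> position in the vector.
vocabulary = {
    "virat": 0, "kohli": 1,
    "is": 2, "a": 3,
    "good": 4, "player": 5,
    "cat": 6, "running": 7,
}

def vectorize(sentence, vocabulary):
    """Return a 0/1 list marking which dictionary words occur in the sentence."""
    vector = [0] * len(vocabulary)
    for token in sentence.lower().split():
        word = token.strip(string.punctuation)
        index = vocabulary.get(word)
        # Words that are not in the dictionary (like "dog") are skipped.
        if index is not None:
            vector[index] = 1
    return vector

print(vectorize("Dog is running", vocabulary))  # [0, 0, 1, 0, 0, 0, 0, 1]
```

Note that the vector length depends only on the dictionary, not on the sentence, so every sentence ends up as a list of 8 numbers.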

Thanks :slight_smile: