Doubt in cv.vocabulary_ statement

cv in this statement is an object of the CountVectorizer class, and when we run cv.vocabulary_ we get a dictionary of words mapped to the indices of a text. What is this text?

‘cv’ is an object of the CountVectorizer class. Once fitted, it stores a mapping from every unique word that occurs in a text to the index corresponding to that word. The word frequencies are not part of this mapping; they come from the count matrix returned by transform or fit_transform.
Suppose we have a paragraph. When we fit our CountVectorizer object (cv) on this paragraph, cv learns a mapping from each word to an index. cv.vocabulary_ is an attribute rather than a function, and reading it gives us a dictionary containing each word along with the index that represents that word. The text, then, is the set of unique words present in the paragraph on which you ‘fit’ your cv object.
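
To make the fit step concrete, here is a minimal sketch. The paragraph is made up for illustration, and the code assumes scikit-learn is installed and that CountVectorizer is used with its default settings (lowercasing, tokens of two or more characters).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical paragraph, invented for this example.
paragraph = ["The Indian team won the match and the fans cheered"]

cv = CountVectorizer()  # default settings assumed
cv.fit(paragraph)       # learns the word -> index mapping

# vocabulary_ is an attribute (no parentheses): a dict of word -> column index.
vocab = cv.vocabulary_
print(vocab)

# The frequencies live in the count matrix, not in vocabulary_.
counts = cv.transform(paragraph).toarray()[0]
for word, index in sorted(vocab.items(), key=lambda item: item[1]):
    print(index, word, counts[index])
```

Each index in the dictionary is the column of the count matrix that holds that word's frequency, which is why the word "the" shows a count of 3 in its column here.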

I have run this. In my text the word Indian is at index 0, but in the output of cv.vocabulary_ it has index 9. I don't understand what the 9 is.

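The index of a word in the CountVectorizer vocabulary is independent of the position of the word in the text. That’s why it is called a ‘Bag of words’ model: the position of the words in the text doesn’t matter. With scikit-learn's default settings, the indices follow the alphabetical order of the unique, lowercased words, so 9 simply means that ‘indian’ is the tenth entry in that sorted vocabulary. It is also the column that holds the word's count in the output matrix.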
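
The difference between position in the text and vocabulary index can be checked with a short sketch. The sentence below is hypothetical (it is not the text from the question), and the code assumes scikit-learn's default CountVectorizer settings.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sentence in which "Indian" is the first word (position 0).
text = ["Indian cricket fans celebrate a big win across every city"]

cv = CountVectorizer().fit(text)

# Tokenize the sentence the same way the vectorizer does.
tokens = cv.build_analyzer()(text[0])
for position, token in enumerate(tokens):
    print(f"{token}: position in text = {position}, "
          f"vocabulary index = {cv.vocabulary_[token]}")

# Words ordered by vocabulary index; with the defaults this is alphabetical.
by_index = sorted(cv.vocabulary_, key=cv.vocabulary_.get)
print(by_index)
```

In this sentence "indian" is at position 0 but gets vocabulary index 7, because seven words in the vocabulary sort before it alphabetically. The single-letter word "a" is dropped by the default tokenizer, so it has no index at all.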


I hope your query has been resolved?

I hope I’ve cleared your doubt. I ask you to please rate your experience here
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.

If you still have questions or do not find the answers satisfactory, you may reopen
the doubt.