In the quiz the answer was 9, but when I actually ran the code, it only counts up to 8. How? Why is the word 'a' not showing up?

Hey @Joy-Gupta-2763139277246091, this happens because sklearn's CountVectorizer has a parameter named token_pattern whose default value, r"(?u)\b\w\w+\b", only keeps words that are at least two characters long. That is why the pair 'is a' appears when we calculate manually, but when we use sklearn's CountVectorizer, it removes 'a' from the sentence and hence this pair is missing.
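
Here is a rough sketch of the effect using only Python's re module, not sklearn itself. The sentence and the bigrams helper are made up for illustration (they are not from the quiz), and the lowercasing mimics CountVectorizer's default lowercase=True.

```python
import re

DEFAULT_PATTERN = r"(?u)\b\w\w+\b"  # CountVectorizer's default token_pattern
CUSTOM_PATTERN = r"(?u)\b\w+\b"     # also keeps one-character words like 'a'

def bigrams(text, pattern):
    """Hypothetical helper: tokenize with the regex, then pair up neighbours."""
    tokens = re.findall(pattern, text.lower())
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

sentence = "This is a cat"  # made-up example sentence

print(bigrams(sentence, DEFAULT_PATTERN))  # ['this is', 'is cat']
print(bigrams(sentence, CUSTOM_PATTERN))   # ['this is', 'is a', 'a cat']
```

Notice that with the default pattern 'a' is dropped before the pairs are formed, so 'is a' disappears and 'is cat' shows up instead.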

Also, if you want to check this yourself, you can pass a token_pattern that keeps single-character words too:

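from sklearn.feature_extraction.text import CountVectorizer
# corpus is the list of sentences from the quiz; \w+ keeps one-character words like 'a'
vectorizer = CountVectorizer(ngram_range=(2, 2), token_pattern=r"(?u)\b\w+\b")
vectorizer.fit_transform(corpus).toarray()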

Hope this resolves your doubt. :blush: