Creating a vocab

sriyash2k · April 20, 2020, 6:17pm

Previously in this course we learned NLTK preprocessing. There, Prateek bhaiya says that we create a vocab of unique words, i.e. the words left after stopword removal and stemming. But in this video Prateek bhaiya says that the vocab is the list of the 10,000 most frequent words. So is this vocab the same as the vocab referred to in NLTK preprocessing, or are the two different?

S18CRX0120 · April 20, 2020, 7:13pm

Hey @sriyash2k, yes, this vocab is nearly the same as the one in NLTK preprocessing. The difference is that the number of unique words can sometimes grow very large (say 15K or more), so we keep only the most frequent words and drop the rest. Otherwise the computation and memory requirements become too high.
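Here is a rough sketch of the idea, not the exact code from the video. It assumes the documents are already tokenized, with stopwords removed and words stemmed. The `docs` data, the `build_vocab` helper and its `max_size` parameter are made up for illustration.

```python
from collections import Counter

# Hypothetical, already-preprocessed documents
# (stopwords removed, words stemmed).
docs = [
    ["movi", "great", "act", "great"],
    ["movi", "bad", "plot"],
    ["great", "plot", "movi"],
]

def build_vocab(tokenized_docs, max_size=None):
    """Map each kept word to an index, keeping only the max_size most frequent words."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    # most_common(None) returns every word, i.e. the full unique-word vocab.
    most_frequent = counts.most_common(max_size)
    return {word: idx for idx, (word, _) in enumerate(most_frequent)}

full_vocab = build_vocab(docs)                # all unique words
small_vocab = build_vocab(docs, max_size=3)   # capped vocab, same idea as the 10,000 limit

print(len(full_vocab), len(small_vocab))
```

The capped vocab is just a subset of the full one, so both come from the same preprocessing. Only the rare words are dropped.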

Hope this resolved your doubt.
Please mark the doubt as resolved in the My Doubts section.