MemoryError when vectorizing a large dataset

Hello!
Okay, so I was trying my hand at the movie sentiment analysis assignment. This is a fairly large dataset, with about 40k rows. When I run a cleaning function on it using NLTK pipelines, it works fine, but as soon as I use the CountVectorizer method to transform the text into an array, I get a MemoryError (it happens sometimes, not every time). Is this a hardware issue? I have 8 GB of RAM and have used it on fairly large datasets before, but this has never happened.
Any suggestions on what the problem is and how I can resolve this issue?
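
For reference, the failing step looks roughly like this (a sketch, not my exact code; variable names are placeholders, and I'm assuming the dense conversion happens via `.toarray()`):

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
# fit_transform returns a memory-efficient sparse matrix
X_sparse = vectorizer.fit_transform(cleaned_reviews)
# converting it to a dense array allocates 40k rows x vocabulary-size
# columns in one shot; this is the step that raises the MemoryError
X_dense = X_sparse.toarray()
```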

For large datasets, we typically create Pipelines that load small chunks of data into memory, transform them into vectors, and feed them into the network.
At any point in time, only two batches of data persist in memory: the current batch, and (optionally, but recommended) the next batch, prefetched so it is ready to be fed to the network.
I have covered one example in Image Pipelines.
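
Here is a minimal sketch of such a chunked pipeline for a text dataset (the file name reviews.csv, the column names, and the chunk size are all assumptions for illustration; HashingVectorizer is one stateless alternative to CountVectorizer that can work chunk by chunk):

```python
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer

# HashingVectorizer is stateless, so each chunk can be vectorized
# independently, without fitting a vocabulary over the whole corpus first.
vectorizer = HashingVectorizer(n_features=2**18)

def batch_generator(path, chunk_size=1000):
    """Yield one (features, labels) batch at a time, so only the
    current chunk of raw text is ever held in memory."""
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        X = vectorizer.transform(chunk["review"])  # sparse matrix, small footprint
        y = chunk["sentiment"].values
        yield X, y

# Train chunk by chunk instead of materialising all 40k rows at once.
# A framework's input pipeline would additionally prefetch the next batch
# while the current one trains, giving the two-batch behaviour described above.
for X_batch, y_batch in batch_generator("reviews.csv"):
    pass  # e.g. model.train_on_batch(X_batch, y_batch) or clf.partial_fit(...)
```

With this setup, peak memory is bounded by the chunk size rather than by the full dataset.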