How to pre-process data for the House Price Prediction challenge

In the House Price Prediction challenge, we are given 80 columns of tabular data consisting of both numeric and text columns. Some columns also contain missing data.

  1. How do we convert columns consisting of text data into numbers? Do we apply OneHotEncoding to them?

  2. How do we take care of missing data? I’ve read that for numeric columns, we fill each empty cell with the mean of the values available in that column. Is this the right way? Also, how do we deal with missing data in text columns?

  3. Is normalization necessary for neural network datasets? Do we need to normalize the entire dataset once it has been converted into numbers?

Hey @preetishvij,

  1. Yes, you need to convert them to numbers with one-hot encoding. You may use label encoding as well; train with both and check the validation error for each case.
  2. For columns containing less than 15% missing values, replace missing numerical values with the mean of the column, and replace missing text values with the string "nan" so that they form their own category. If a column contains more than 15% missing values, drop the whole column; it is of no use to us.
  3. Normalization speeds up learning, so it is better to normalize the dataset before passing it to the neural network.
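
The three steps above can be sketched with pandas roughly as follows. This is only a minimal illustration, not the official solution: the function name `preprocess`, the `max_missing` parameter, and the toy column names (`LotArea`, `Street`, `Alley`, `SalePrice`) are assumptions made for the example, and the sketch standardizes only the originally numeric columns, leaving the 0/1 one-hot columns untouched.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str, max_missing: float = 0.15):
    """Hypothetical helper: drop sparse columns, impute, one-hot encode, standardize."""
    y = df[target]
    X = df.drop(columns=[target])

    # Drop every column whose fraction of missing values exceeds the threshold.
    missing_frac = X.isna().mean()
    X = X.loc[:, missing_frac <= max_missing].copy()

    num_cols = list(X.select_dtypes(include="number").columns)
    text_cols = [c for c in X.columns if c not in num_cols]

    # Numeric columns: fill gaps with the column mean (NaNs are skipped by mean()).
    X = X.fillna(X[num_cols].mean())
    # Text columns: treat a missing value as its own category, "nan".
    X = X.fillna({c: "nan" for c in text_cols})

    # Standardize numeric columns to zero mean and unit variance.
    mean = X[num_cols].mean()
    std = X[num_cols].std(ddof=0).replace(0, 1.0)  # avoid division by zero
    X[num_cols] = (X[num_cols] - mean) / std

    # One-hot encode the text columns.
    X = pd.get_dummies(X, columns=text_cols, dtype=float)
    return X, y


# Toy data standing in for the real challenge file (values are made up).
toy = pd.DataFrame({
    "LotArea": [1000.0, 2000.0, np.nan, 3000.0, 2000.0,
                1000.0, 3000.0, 2000.0, 2000.0, 2000.0],
    "Street": ["Pave", "Grvl", None, "Pave", "Pave",
               "Grvl", "Pave", "Pave", "Grvl", "Pave"],
    "Alley": [None, None, "Grvl", None, None,
              None, None, "Pave", None, None],
    "SalePrice": [100, 200, 150, 300, 210, 110, 290, 205, 195, 200],
})

X, y = preprocess(toy, target="SalePrice")
print(X.columns.tolist())
```

When there is a separate test file, the means, standard deviations, and one-hot columns should be computed on the training data and reused on the test data, so that both sets go through the same transformation.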

Hope this resolves your doubt.
Please mark the doubt as resolved in the my doubts section. :blush: