How to proceed with this challenge?

What type of data processing needs to be done? Do all the features need to be encoded first?

Hey @icoder18, in this challenge we are given 80 columns of tabular data, a mix of numeric and text columns. Some columns also contain missing values. Here are some hints:

  1. How to convert columns consisting of text data into numbers? Do we implement OneHotEncoding on them?
Yes, you need to convert them to numbers. One-hot encoding works, and you may use label encoding as well. Check the accuracy for both cases and keep whichever does better.
  2. How to take care of missing data?
For columns containing less than 15% missing values, replace missing numerical values with the mean of the column, and replace missing text values with the string "nan". If a column contains more than 15% missing values, drop the whole column; it is of no use to us.
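
To make the two hints concrete, here is a minimal pandas sketch. The function name `preprocess`, the toy column names (`area`, `street`, `alley`) and the toy values are made up for illustration and are not from the challenge data. It also assumes a column with exactly 15% missing values is kept, since the rule above does not cover that edge case.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, max_missing: float = 0.15) -> pd.DataFrame:
    """Drop sparse columns, fill the remaining gaps, one-hot encode text columns."""
    # Fraction of missing values in each column.
    missing_frac = df.isna().mean()
    # Keep columns with at most `max_missing` missing values
    # (keeping the exact-15% case is an assumption).
    df = df.loc[:, missing_frac <= max_missing].copy()

    num_cols = df.select_dtypes(include="number").columns
    text_cols = df.columns.difference(num_cols)

    # Numeric columns: fill gaps with the column mean.
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
    # Text columns: treat "missing" as its own category called "nan".
    df[text_cols] = df[text_cols].fillna("nan")

    # One-hot encode the text columns; numeric columns pass through unchanged.
    return pd.get_dummies(df, columns=list(text_cols))


# Toy data (hypothetical): 8 rows, so one missing value is 12.5% of a column.
toy = pd.DataFrame({
    "area": [100.0, np.nan, 300.0, 200.0, 400.0, 200.0, 100.0, 100.0],
    "street": ["paved", "gravel", None, "paved", "paved", "gravel", "paved", "paved"],
    "alley": [None, None, None, "gravel", None, None, "paved", None],  # 75% missing
})

out = preprocess(toy)
print(out.columns.tolist())
```

If you want to try label encoding instead, one option is `df[col].astype("category").cat.codes` on each text column in place of `pd.get_dummies`. Then train with both versions and compare the accuracy.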

Hope this gives you an idea :+1:
Happy Learning :slight_smile:

Great, thanks! I’ll try to do this


I hope I’ve cleared your doubt. Please rate your experience here.
Your feedback is very important. It helps us improve our platform and hence provide you
the learning experience you deserve.

On the off chance you still have some questions or do not find the answers satisfactory, you may reopen
the doubt.