I have two questions:
- What's the easiest way to fill NaN values in multiple columns, each based on its own values? For example, Age would be filled with Age.mean(), and Cabin with, say, Cabin.mode() or something else.
- I tried to fill the NaN values of the Cabin column using mode() (since a mean doesn't make sense there). However, even after applying fillna, only 2 values are filled. Please take a look at the following.
Before:
Data columns (total 8 columns):
Survived 891 non-null int64
Pclass 891 non-null int64
Sex 891 non-null object
Age 714 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Fare 891 non-null float64
Cabin 204 non-null object
   Survived  Pclass  Sex   Age  SibSp  Parch     Fare Cabin
0         0       3    1  22.0      1      0   7.2500   NaN
1         1       1    0  38.0      1      0  71.2833   C85
2         1       3    0  26.0      0      0   7.9250   NaN
3         1       1    0  35.0      1      0  53.1000  C123
4         0       3    1  35.0      0      0   8.0500   NaN
Then I ran the following:
data["Age"] = data["Age"].fillna(data["Age"].mean())
data["Cabin"] = data["Cabin"].fillna(data["Cabin"].mode())
After:
Data columns (total 8 columns):
Survived 891 non-null int64
Pclass 891 non-null int64
Sex 891 non-null int32
Age 891 non-null float64
SibSp 891 non-null int64
Parch 891 non-null int64
Fare 891 non-null float64
Cabin 206 non-null object
   Survived  Pclass  Sex   Age  SibSp  Parch     Fare    Cabin
0         0       3    1  22.0      1      0   7.2500  B96 B98
1         1       1    0  38.0      1      0  71.2833      C85
2         1       3    0  26.0      0      0   7.9250       G6
3         1       1    0  35.0      1      0  53.1000     C123
4         0       3    1  35.0      0      0   8.0500      NaN
Please let me know what's not correct.
Thanks a lot!
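
For anyone hitting the same thing, here is a minimal sketch of what seems to be going on, using a small made-up frame rather than the real Titanic data. mode() returns a Series, since there can be ties, and fillna with a Series appears to align on the index. Only NaN rows whose index label also occurs in the mode result get a value, which would explain why just rows 0 and 2 changed above. Taking the first mode as a scalar and passing a dict keyed by column name is one possible way to fill several columns in one call. The names cabin_mode and fill_values are just for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic frame (values are made up for illustration).
data = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "C85", np.nan],
})

# mode() returns a Series (ties are possible), indexed 0, 1, ...
cabin_mode = data["Cabin"].mode()

# Passing that Series to fillna aligns on the index, so only NaN rows whose
# label also appears in cabin_mode (here just label 0) receive a value.
partially_filled = data["Cabin"].fillna(cabin_mode)

# Taking the first mode as a scalar fills every NaN. A dict keyed by column
# name lets each column use its own fill value in a single fillna call.
fill_values = {
    "Age": data["Age"].mean(),
    "Cabin": cabin_mode.iloc[0],
}
filled = data.fillna(value=fill_values)

print(partially_filled.isna().sum())  # NaN count left after the Series-based fill
print(filled.isna().sum())            # NaN count per column after the dict-based fill
```

When several values tie for the mode, iloc[0] simply picks the first one that mode() returns, so whether that is a sensible fill for a column like Cabin is a separate judgment call.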