Making sense out of data

I am struggling a bit to figure out which columns to remove and which to keep, even after reading the description of the dataset.

  1. Sometimes the description of a column doesn’t make any sense, even after googling.
  2. Sometimes a column looks like it could be useful for predicting the output, but maybe it really isn’t. For example, in the Titanic dataset, the number of children could affect a person’s chances of survival, but maybe it doesn’t really matter. I don’t know.
    Can anyone suggest any tips on how to improve my analysis?

You should read about the feature_importances_ attribute of sklearn's tree-based models (for example, random forests).
You will also learn more with experience. For example, try the housing price prediction problem on Kaggle; it is one of the best problems and datasets for learning data analysis.
Also try plotting your data to see how each feature affects your output; it is one of the most important and basic methods of analyzing data.
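
To make the first suggestion concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The data is synthetic and the column names ("informative", "noise") are made up for illustration; this is not the Titanic dataset. It fits a random forest and reads the importance scores from the fitted model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption for illustration): one column that
# determines the target and one column of pure noise.
rng = np.random.default_rng(0)
n = 500
informative = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)  # target depends only on the first column

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ is available after fitting; the scores sum to 1.
importances = dict(zip(["informative", "noise"], model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

A column with a score close to zero is a candidate for removal, but treat the scores as a hint rather than a verdict: they depend on the model and can be misleading when features are correlated.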


We will discuss the Titanic dataset in the Decision Trees section.
I will also add some videos on feature engineering.
For the assignments given, you can use all columns as features.