Doubt in implementation of decision tree

Royal_Yashasvi · June 23, 2021, 9:07am

Hey,

Sir, my doubt is this: since we have split our data into left and right, shouldn't we use data_left.survived.mean() and data_right.survived.mean()? Please elaborate on why we are doing x_train.survived.mean() instead.
That gives us the mean of survived over the data in x_train, but we need the mean of survived over the data in data_left and data_right.

SIR’S CODE:-

github.com

coding-blocks-archives/machine-learning-online-2018/blob/master/13. Decision Trees/Decision_Tree_Class.ipynb

Thanks

prashant_ml · June 25, 2021, 4:18pm

Hey @Royal_Yashasvi,
You probably know what a binary tree is. The decision tree here is a binary tree, so at every node we need to split the data in such a way that we get two child nodes.
That is why we take the mean. It is not necessary to use the mean specifically; we could define any other formula too. We just need some method to divide the data.
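
To make the split concrete, here is a minimal sketch on toy data. It is not the notebook's actual code (the preview above is truncated): `divide_data`, `node_label`, the `age` column and the numbers are all hypothetical, and only `x_train`, `data_left`, `data_right` and `survived` come from the question. It assumes the tree is built recursively, so that each child is trained on its own half of the data.

```python
import pandas as pd


def divide_data(data, feature, threshold):
    """Split rows into (left, right) by comparing `feature` to `threshold`."""
    left = data[data[feature] <= threshold]
    right = data[data[feature] > threshold]
    return left, right


def node_label(node_data):
    """Label a node from the survival rate of the rows that reached it."""
    return "Survive" if node_data["survived"].mean() >= 0.5 else "Dead"


# Toy data (assumed values, not taken from the Titanic set).
x_train = pd.DataFrame({
    "age":      [4, 20, 30, 50, 60, 10],
    "survived": [1, 0, 0, 0, 1, 1],
})

# The mean is one possible threshold; the median (or any other rule) also works.
threshold = x_train["age"].mean()
data_left, data_right = divide_data(x_train, "age", threshold)

# Label for the current node, computed on the data that reached this node.
print(node_label(x_train))

# Under the recursion assumption, each child is trained with its half passed in
# as its own x_train, so the same expression then gives the child's own mean.
print(node_label(data_left), node_label(data_right))
```

Under that assumption, x_train.survived.mean() inside a node refers only to the rows that reached that node, so the left and right children each end up using the mean of their own subset.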

This is why we do that.
I hope this helps.