Overall doubts about k-nearest neighbours

  1. In `index = new_vals[1].argmax()` and `pred = new_vals[0][index]`, please explain once more: why are we indexing with 0 and 1? What is their use?

  2. I'm not able to understand the use of `.reshape(-1, 1)`. It basically converts a 1-D array to a 2-D array, right? But it isn't used everywhere. I was using sklearn's `train_test_split` method, and when I passed a 1-D array as a parameter it showed an error saying it expected a 2-D array. So where is the proper place to use this `.reshape()` method?

  3. Why aren't we using sklearn's built-in implementations? They make the computation much easier.

  4. So far I have studied two algorithms, KNN and linear regression. How do we know which algorithm to use when?

Sorry for asking so many doubts in one go… 🙂

  1. Here, `new_vals` is what `np.unique` returned when we ran it with `return_counts=True` on the labels of the k nearest neighbours. It is a tuple of two arrays: the first array (`new_vals[0]`) holds the unique labels, and the second (`new_vals[1]`) holds how many times each of those labels occurred.

Here `index = new_vals[1].argmax()` gives the position of the largest count, i.e. the index of the label that occurs most often among the unique classes. Then `pred = new_vals[0][index]` uses that same `index` to pick the corresponding label out of the first array.

Thus, these two statements give us the class to which most of the neighbouring points of our test point belong (a majority vote).
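To see what the two indexes pick out, here is a tiny sketch. The neighbour labels are made up for illustration and are not from the course notebook; the only assumption is that `new_vals` came from `np.unique(..., return_counts=True)`.

```python
import numpy as np

# Hypothetical labels of the k = 5 nearest neighbours of one test point
neighbour_labels = np.array([1, 0, 1, 2, 1])

# With return_counts=True, np.unique returns a tuple of two arrays:
# new_vals[0] -> sorted unique labels, new_vals[1] -> how often each occurs
new_vals = np.unique(neighbour_labels, return_counts=True)
print(new_vals[0])  # [0 1 2]
print(new_vals[1])  # [1 3 1]

# Position of the largest count (argmax returns the first position on a tie)
index = new_vals[1].argmax()
# The same position in the labels array is the majority label
pred = new_vals[0][index]
print(index, pred)  # 1 1
```

Note that on a tie `argmax` simply returns the first (smallest) label, so choosing an odd `k` for two classes helps avoid ties.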

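  2. scikit-learn estimators expect the feature matrix `X` as a 2-D array of shape `(n_samples, n_features)`, i.e. rows and columns; methods such as `fit()` and `predict()` raise the "Expected 2D array, got 1D array instead" error when given a 1-D array. That's why we reshape a 1-D array holding a single feature into a column with `.reshape(-1, 1)`, where `-1` tells NumPy to infer that dimension from the array's length. The target `y` can stay 1-D, and `train_test_split` itself accepts 1-D arrays. `.reshape((-1,))` does the opposite: it flattens an array back to 1-D.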
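A quick sketch of the shapes involved, using a hypothetical 4-value feature array (plain NumPy only, no model is fitted here):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # 1-D, shape (4,)

# Many samples, one feature: -1 lets NumPy infer the number of rows
X = x.reshape(-1, 1)                # 2-D column, shape (4, 1)

# One sample, many features: a single row instead
row = x.reshape(1, -1)              # shape (1, 4)

# reshape(-1), the same as reshape((-1,)), flattens back to 1-D
flat = X.reshape(-1)                # shape (4,)

print(x.shape, X.shape, row.shape, flat.shape)  # (4,) (4, 1) (1, 4) (4,)
```

So `reshape(-1, 1)` is the one to reach for when your `X` has a single feature, and `reshape(1, -1)` when you want to predict for a single sample.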

  3. We implement the algorithms ourselves because you need to understand the mathematics behind everything we do in ML; that's the only way to become a successful ML engineer or data scientist. scikit-learn just makes the implementation easy, but you need to understand the maths working behind it so that someday you can build a model yourself that solves real-life problems.

  4. KNN (as used here) is a classification algorithm, while linear regression is a regression algorithm. When you want to know which category a data point belongs to, use a classification algorithm; when you want a continuous numeric value as the answer (e.g. a price), use regression. (KNN can also be adapted for regression by averaging the neighbours' values.)
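The difference shows up in what each model returns. Below is a minimal from-scratch sketch, not the course's exact code: `knn_classify` and `fit_line` are hypothetical helper names, the toy data is invented, and the line is fitted with ordinary least squares via `np.linalg.lstsq`.

```python
import numpy as np


def knn_classify(X_train, y_train, x_query, k=3):
    """Classification: majority vote among the k nearest training points."""
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Labels of the k closest points
    nearest = y_train[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[counts.argmax()]  # the answer is a category


def fit_line(x, y):
    """Regression: least-squares fit of y = m*x + c, returns (m, c)."""
    A = np.column_stack([x, np.ones_like(x)])
    (m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    return m, c


# Hypothetical toy data: two well-separated clusters labelled 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
label = knn_classify(X_train, y_train, np.array([1.2, 1.4]), k=3)
print("KNN predicts class:", label)

# Hypothetical points lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
m, c = fit_line(x, y)
value = m * 5.0 + c  # the answer is a number
print("Linear regression predicts value:", round(value, 3))
```

The KNN call answers "which group?" (class `0`), while the fitted line answers "how much?" (about `11.0` at `x = 5`).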

I hope this clears your doubts.

Yes, thanks a lot, brother!