One common question

How do we decide how many layers to use and how many neurons each layer should have?

It's mostly a matter of experimentation and experience.

We start by choosing an appropriate model based on the size of our dataset, the input shape, and the features we want the model to learn. Then comes trial and error: we tune the hyperparameters (number of layers, neurons per layer, and so on) according to the loss we obtain. For example, if we are working with a CNN, a good way to tune these is to inspect the model summary, check the size of the input reaching each layer, and adjust the strides and filter sizes accordingly.
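To make the "check the model summary" step concrete, here is a minimal sketch that traces how the spatial size of a square input shrinks through a stack of conv/pool layers, using the standard formula out = floor((in + 2*padding - kernel) / stride) + 1. The layer list and the 64x64 input are hypothetical examples, not taken from any particular model; a framework's own summary would report the same shapes for layers with these settings.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a conv or pooling layer along one dimension."""
    if kernel > size + 2 * padding:
        raise ValueError("kernel is larger than the padded input")
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size: int, layers):
    """Walk (name, kernel, stride, padding) layers and record each output size."""
    size = input_size
    rows = []
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        rows.append((name, size))
    return rows


# Hypothetical architecture for a 64x64 input (illustration only).
layers = [
    ("conv1 3x3 s1 p1", 3, 1, 1),
    ("pool1 2x2 s2", 2, 2, 0),
    ("conv2 3x3 s1 p1", 3, 1, 1),
    ("pool2 2x2 s2", 2, 2, 0),
    ("conv3 5x5 s2 p0", 5, 2, 0),
]

for name, size in trace_shapes(64, layers):
    print(f"{name:<18} -> {size}x{size}")
```

If a feature map collapses to a very small size (or the function raises because the kernel no longer fits), that is a signal to reduce the stride or kernel size, add padding, or drop a layer.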

To reiterate, the biggest factors affecting all of this are the input size and the kind of data we have.
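One way to see why input size matters so much is to count parameters. The sketch below assumes a single fully connected layer with 128 neurons applied to a flattened RGB image, versus a single 3x3 convolution with 32 output channels; both counts include one bias per output. The image sizes are arbitrary examples.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # One weight per input-output pair, plus one bias per output neuron.
    return n_in * n_out + n_out


def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    # One kernel x kernel x c_in filter per output channel, plus one bias each.
    return kernel * kernel * c_in * c_out + c_out


for side in (28, 64, 224):
    flat = side * side * 3  # RGB image flattened for the dense layer
    print(
        f"{side}x{side}x3 input: dense(128) = {dense_params(flat, 128):,} params, "
        f"conv 3x3 -> 32 channels = {conv2d_params(3, 3, 32):,} params"
    )
```

The dense layer's parameter count grows with the number of input pixels, while the convolution's count depends only on kernel size and channel counts, which is one reason image inputs usually push us toward CNNs and toward fewer, narrower dense layers at the end.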

I hope this helps.