Hello @mohituniyal2010,
To answer this question, I would like to take you back to the time when NNs were evolving into what they look like today.
When NNs were introduced, scientists called them universal approximators, and almost every task at hand was solved with a single dense layer. Back then, many believed that one dense layer was all we would ever need to attain true AI.
But as time progressed, with the boom of the Internet and large datasets, researchers found that the tasks a computer could tackle were far more complex, and reluctantly accepted that the number of layers had to grow. At that point they got stuck at the optimization procedure, until Prof. Geoffrey Hinton, often called the father of modern AI, and his colleagues popularized backpropagation.
That's a little history of how we came to know that depth is a necessity.
Now comes the why.
- Have you ever asked yourself why we needed SVMs when plain logistic and linear regression were working well?
- Why did we build Neural Networks when SVMs did their job so well?
The cornerstone of neural computing and Neural Networks is the non-linearity they provide. Just think: what would happen if all the neurons had linear activations? The whole network collapses into a simple linear regression, a plain linear estimator. So ReLU, tanh, and sigmoid all gave us the non-linearity we were missing: squishing and stretching hyperspaces, looking for a hyperplane that helps with our task.
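To make that collapse concrete, here is a tiny NumPy sketch (the shapes and values are made up purely for illustration): two stacked linear layers compute exactly the same function as one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two dense layers stacked with NO activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both "layers".
deep = W2 @ (W1 @ x + b1) + b2

# The same map collapsed into a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
shallow = W @ x + b

print(np.allclose(deep, shallow))  # True: the extra layer bought us nothing
```

No matter how many linear layers you stack, the result is always one linear map; only the non-linear activations in between break that collapse.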
Now, ZFNet came up with some hyperparameter tuning of the AlexNet architecture and a better score that got them the prize that year. The next year, VGG came with more depth, showing the research world that greater depth produced better results.
How?
There can be multiple core interpretations of this, but the most significant and intuitive idea behind increasing depth is this:
As you move along the direction of depth, the task gets simpler at every step. Each layer summarizes its understanding and hands the result to the next one, while that next layer forms its own beliefs about this information and passes its understanding ahead.
All of this happens through linear space transformations and non-linear projections.
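Here is a minimal sketch of that idea, assuming a plain fully-connected network (the layer widths are made up for illustration): every step is a linear transformation followed by a non-linear projection, and the vector `h` that each layer hands forward is its "summary" for the next one.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# Made-up layer widths, purely for illustration: 8 -> 16 -> 8 -> 2.
sizes = [8, 16, 8, 2]
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

h = rng.normal(size=sizes[0])          # the raw input
for i, (W, b) in enumerate(params):
    h = W @ h + b                      # linear space transformation
    if i < len(params) - 1:
        h = relu(h)                    # non-linear projection between layers
print(h)                               # the final layer's "summary" of the input
```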
Conclusively, more depth means a simpler task at the end, more non-linearity, more freedom. Hence VGG scored better than ZFNet, but unfortunately those Google folks wouldn't stop. They did their part and won the title with GoogLeNet's groundbreaking Inception module.
Hope I have given you an understanding of why and when depth helps.
Happy Learning
Thanks