Hello @mohituniyal2010,
To answer this question, I would like to take you back to the time when NNs were evolving into what they look like today.
When NNs were introduced, scientists called them universal approximators, and almost every task at hand was solved with a single dense layer. Back then, many believed that one dense layer was all we would ever need to attain true AI.
But as time progressed, with the boom of the Internet and large datasets, researchers found that the tasks a computer could tackle were far more complex, and reluctantly accepted that the number of layers had to grow. At that point they got stuck at the optimization procedure, until Prof. Geoffrey Hinton, often called the father of modern AI, and his colleagues popularized backpropagation.
That's a little history of how we came to know that depth is a necessity.
Now comes the why.
- Have you ever asked yourself why we needed SVMs when plain logistic and linear regression were working well?
- Why did we build Neural Networks when SVMs did their job so well?
The cornerstone of neural computing and Neural Networks is the non-linearity they provide. Just think: what would happen if all the neurons had linear activations? The whole network collapses into a simple linear regression, a plain linear estimator. So ReLU, tanh, and sigmoid all gave us the non-linearity we were missing: squishing and stretching hyperspaces, looking for a hyperplane that helps with our task.
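To make that collapse concrete, here is a tiny NumPy sketch (the shapes and values are made up purely for illustration): two stacked linear layers compute exactly the same function as one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two dense layers stacked with NO activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both "layers".
deep = W2 @ (W1 @ x + b1) + b2

# The same map collapsed into a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
shallow = W @ x + b

print(np.allclose(deep, shallow))  # True: the extra layer bought us nothing
```

No matter how many linear layers you stack, the result is always one linear map; only the non-linear activations in between break that collapse.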
Now, ZFNet came up with some hyperparameter tuning of the AlexNet architecture and a better score that got them the prize that year. The next year, VGG came with more depth, showing the research world that greater depth produced better results.
How?
There can be multiple core interpretations of this, but the most significant and intuitive idea behind increasing depth is this:
As you move along the direction of depth, the task gets simpler at every step. Each layer summarizes its understanding and hands the result to the next one, while that next layer forms its own beliefs about this information and passes its understanding ahead.
All of this happens through linear space transformations and non-linear projections.
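Here is a minimal sketch of that idea, assuming a plain fully-connected network (the layer widths are made up for illustration): every step is a linear transformation followed by a non-linear projection, and the vector `h` that each layer hands forward is its "summary" for the next one.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)

# Made-up layer widths, purely for illustration: 8 -> 16 -> 8 -> 2.
sizes = [8, 16, 8, 2]
params = [(rng.normal(size=(m, n)) * 0.1, np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

h = rng.normal(size=sizes[0])          # the raw input
for i, (W, b) in enumerate(params):
    h = W @ h + b                      # linear space transformation
    if i < len(params) - 1:
        h = relu(h)                    # non-linear projection between layers
print(h)                               # the final layer's "summary" of the input
```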
Conclusively, more depth means a simpler task at the end, more non-linearity, more freedom. Hence VGG scored better than ZFNet, but unfortunately those Google folks wouldn't stop. They did their part and won the title with GoogLeNet's groundbreaking Inception module.
Hope I have given you an understanding of why and when depth helps.
Happy Learning
Thanks