AlexNet architecture

I didn’t understand DATA AUGMENTATION in the given paper.
Please give me a brief idea of how it is done.

Data Augmentation:

Data augmentation is usually done when we have less data than required to train and generalize a model. In order to make the most of our few training examples, we “augment” them via a number of random transformations, so that our model never sees exactly the same picture twice. This helps prevent overfitting and helps the model generalize better.

Coming to your question, two types of data augmentation techniques were used in AlexNet:

  1. The first form of data augmentation consists of image translations and horizontal reflections. This can be achieved using keras.preprocessing.image.ImageDataGenerator; see the first sketch after this list.

    This class can be used to perform random image shifts as well as horizontal and vertical flips. I’d posted the code in the Coding Blocks IDE for the same.

  2. The second form of data augmentation performed was random changes to the light level or brightness of the images.
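
Here is a minimal sketch of what the first form could look like, assuming TensorFlow 2.x’s bundled Keras (the 12.5% shift range, i.e. 32/256, and the dummy random image are illustrative choices, not values from the paper):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random translations plus horizontal flips: the first form of
# augmentation described above.
datagen = ImageDataGenerator(
    width_shift_range=0.125,   # shift by up to 12.5% of the width (32/256)
    height_shift_range=0.125,  # shift by up to 12.5% of the height
    horizontal_flip=True,      # randomly mirror left to right
)

# A dummy batch containing one 256x256 RGB image of random pixels.
x = np.random.randint(0, 256, size=(1, 256, 256, 3)).astype("float32")

# Every batch drawn from the generator is a differently transformed copy,
# so the model never sees exactly the same picture twice.
for i, batch in enumerate(datagen.flow(x, batch_size=1)):
    print("augmented batch", i, batch.shape)
    if i >= 3:
        break
```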

The second sketch below shows the random-brightness augmentation (the original code and its output image were posted in the IDE).
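
A minimal sketch of the second form, using tf.image.random_brightness (one common way to do it; the max_delta of 0.2 is an arbitrary illustrative value):

```python
import numpy as np
import tensorflow as tf

# One 256x256 RGB image with random pixel values scaled to [0, 1].
image = tf.convert_to_tensor(np.random.rand(256, 256, 3).astype("float32"))

# Each call shifts the brightness by a random amount in [-0.2, +0.2],
# producing a differently lit copy of the same image.
for i in range(4):
    variant = tf.image.random_brightness(image, max_delta=0.2)
    print("variant", i, "mean pixel value:", float(tf.reduce_mean(variant)))
```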

I hope this makes everything clear about data augmentation in AlexNet.

In the paper it is written that the "training set size is increased by a factor of 2048". How?

They are actually training on 1.2 million * 2048 training images.

For each training image of size 256x256, if you extract patches of size 224x224, you can get up to 1024 such patches from the image ((256-224) * (256-224) = 32 * 32). For each such patch you also take a horizontal reflection, giving 2048 patches in total from a single image.

I understood that the images (in the 2nd type) are formed by changing the brightness, but what is meant by horizontal reflection and patches here? I didn’t get that.

Here, in the first form of augmentation, you are going to create patches of size (224 x 224) from each image.
A patch is basically a crop of size (224 x 224) from the original image of size (256 x 256).

From a single image of size (256 x 256) you can create:
256 - 224 = 32 positions along the x axis
256 - 224 = 32 positions along the y axis

32 * 32 = 1024 patches.

For each patch you will also create a horizontally flipped copy. Therefore, for 1 image there will be 2 * 1024 = 2048 patches.
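
A tiny NumPy sketch of this counting (a dummy random image stands in for a real training example):

```python
import numpy as np

# Dummy 256x256 RGB "training image".
img = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

crop = 224
offsets = 256 - crop  # 32, matching the counting above

patches = []
for y in range(offsets):          # 32 positions along the y axis
    for x in range(offsets):      # 32 positions along the x axis
        patch = img[y:y + crop, x:x + crop]   # the (224 x 224) crop
        patches.append(patch)
        patches.append(patch[:, ::-1])        # its horizontal reflection

print(len(patches))  # 32 * 32 * 2 = 2048
```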

Conclusion:

  1. A patch is a crop from the original image.
  2. For each unique image there will be 2048 patches.
  3. Horizontal reflection means flipping an image left to right.

If a patch is a crop of the image, then it should be of size (224, 224), so shouldn’t the number of such patches be (256 - 224 + 1) * (256 - 224 + 1)?

Yes, you are absolutely right. In fact, in the paper they said they used random crops, so leaving out one or two crops won’t affect the dataset much.
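
For completeness, a quick check of the exhaustive count from the question:

```python
# Counting every valid 224x224 crop position in a 256x256 image.
crops = (256 - 224 + 1) ** 2   # 33 * 33 = 1089 positions
with_flips = 2 * crops         # 2178 counting horizontal reflections
print(crops, with_flips)       # vs. the 32 * 32 * 2 = 2048 counted above
```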