How to use the parameters in Neural Networks


Today I will discuss how to adjust the activation shape, the activation size, and the number of parameters of a neural network.

I will use a small test dataset from the video game Genshin Impact, to which we will apply an AlexNet network.

In order to select the appropriate parameters of a simple neural network, for example a Convolutional Neural Network (CNN), we should remember the meaning of each layer and understand the following quantities:

Activation Shape
Activation Size
Number of Parameters

Analysis of Neural Networks

To build the neural network, we should know the dimensions of the layers that are included in the network.

In this work we will use three types of layers in a convolutional network:

Convolution (CONV)
Pooling (POOL)
Fully connected (FC)

Parameters in Convolutional Neural Networks (CNNs)

Let us define several helper functions that help us understand how neural networks work and how to use them.

Convolution (CONV)

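def dim_valid_convolution(inputs, kernel):
    '''
    Output (height, width) of a "valid" convolution (no padding, stride 1).

    inputs = (nh, nw)
    nh : input height
    nw : input width

    kernel = (fh, fw)
    fh : filter height
    fw : filter width
    '''
    nh, nw = inputs
    fh, fw = kernel
    return (nh - fh) + 1, (nw - fw) + 1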

Let us assume that you have a 6x6 image on which you perform a convolution with a 3x3 filter (kernel). The output of this "valid" convolution (no padding) is then 4x4, since 6 - 3 + 1 = 4.

This example may be represented as:

inputs = 6 , 6  # nxn image
filters = 3 , 3 # fxf filter
dim_valid_convolution( inputs, filters)

(4, 4)
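As a quick numerical check of the \(n - f + 1\) rule, here is a minimal NumPy sketch of a "valid" convolution. The helper conv2d_valid is hypothetical (not part of any library) and, like CNN frameworks, it computes a cross-correlation, i.e. it does not flip the kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as in CNN libraries).

    The kernel only visits positions where it fits entirely inside the image,
    so the output is (nh - fh + 1) x (nw - fw + 1).
    """
    nh, nw = image.shape
    fh, fw = kernel.shape
    out = np.zeros((nh - fh + 1, nw - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 input
kernel = np.ones((3, 3))                           # toy 3x3 filter
result = conv2d_valid(image, kernel)
print(result.shape)  # (4, 4)
```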

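def dim_same_convolution(inputs, kernel, s, p):
    '''
    Output (height, width) of a convolution with stride s and padding p.
    With the right padding ("same" convolution) the output size equals the input size.

    inputs = (nh, nw)
    nh : input height
    nw : input width

    kernel = (fh, fw)
    fh : filter height
    fw : filter width

    s : stride
    p : padding
    '''
    nh, nw = inputs
    fh, fw = kernel
    # The stride must be taken into account; the result is rounded down
    return int((nh + 2*p - fh) // s) + 1, int((nw + 2*p - fw) // s) + 1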

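We choose the padding so that the output size equals the input size. With stride 1 this requires \(n + 2p - f + 1 = n\), i.e. \(p = (f-1)/2\), so a 3x3 filter needs \(p = 1\).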

inputs = 6 , 6  # nxn image
filters = 3 , 3 # fxf filter
stride=1.0    #stride s
padding=1.0   # padding p
parameters=dim_same_convolution( inputs, filters,stride,padding)

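def check_same(inputs, parameters):
    '''Assert that the output size equals the input size (a "same" convolution).'''
    # 2D: parameters is the (height, width) returned by dim_same_convolution
    if len(parameters) == 2:
        assert parameters[0] == inputs[0] and parameters[1] == inputs[1], \
            "It is not a same convolution, please fix the stride or padding for the input " + str(inputs) + " and parameters " + str(parameters)
    # 3D: parameters is the (activation shape, activation size, # parameters) tuple returned by nparameters_convolution
    if len(parameters) == 3:
        assert parameters[0][0] == inputs[0] and parameters[0][1] == inputs[1], \
            "It is not a same convolution, please fix the stride or padding for the input " + str(inputs) + " and parameters " + str(parameters)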

check_same(inputs,parameters)
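The padding that makes a stride-1 convolution "same" follows from solving \(n + 2p - f + 1 = n\), which gives \(p = (f-1)/2\). The sketch below, with a hypothetical same_padding helper, computes it and checks the result by zero-padding a 6x6 array with np.pad; it assumes an odd filter size, since an even one cannot be padded symmetrically.

```python
import numpy as np

def same_padding(f):
    """Padding that keeps the output size equal to the input size (stride 1).

    Solving n + 2p - f + 1 = n for p gives p = (f - 1) / 2,
    which is an integer only for odd filter sizes.
    """
    if f % 2 == 0:
        raise ValueError("an even filter size cannot give symmetric 'same' padding")
    return (f - 1) // 2

n, f = 6, 3
p = same_padding(f)                  # 1 for a 3x3 filter
padded = np.pad(np.ones((n, n)), p)  # zero-pad every border by p
out_size = padded.shape[0] - f + 1   # valid convolution on the padded input
print(p, padded.shape, out_size)     # 1 (8, 8) 6
```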

Now let us consider another example: we take a 7x7 image and perform a strided convolution with a 3x3 kernel, stride 2 and padding 0.

[Figure: strided convolution]

def dim_strided_convolution(inputs, kernel, s, p):
    '''
    inputs = (nh, nw)
    nh : input height
    nw : input width

    kernel = (fh, fw)
    fh : filter height
    fw : filter width

    p : padding
    s : stride

    The output size is floor((n + 2p - f) / s) + 1: a window that
    does not fit entirely inside the padded input is dropped.
    '''
    nh, nw = inputs
    fh, fw = kernel

    print("Activation Shape Strided")

    return int((nh + 2*p - fh) // s) + 1, int((nw + 2*p - fw) // s) + 1

You can describe this example as

inputs = 7,7    # nxn image
kernel = 3,3    # fxf filter
stride = 2.0    # stride s
padding = 0.0   # padding p
dim_strided_convolution(inputs, kernel, stride, padding)

which gives

Activation Shape Strided

(3, 3)
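When \(n + 2p - f\) is not a multiple of the stride, the output size is rounded down, because a window that would stick out of the padded input is skipped. This brute-force NumPy sketch (the name conv2d_strided is hypothetical) counts the windows explicitly: both a 7x7 and an 8x8 input give a 3x3 output with a 3x3 kernel and stride 2.

```python
import numpy as np

def conv2d_strided(image, kernel, s=1, p=0):
    """Naive strided 2D convolution with zero padding (cross-correlation)."""
    image = np.pad(image, p)  # zero padding on every border
    nh, nw = image.shape
    fh, fw = kernel.shape
    # The last window must fit entirely, hence the floor division
    oh = (nh - fh) // s + 1
    ow = (nw - fw) // s + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * s:i * s + fh, j * s:j * s + fw]
            out[i, j] = np.sum(patch * kernel)
    return out

kernel = np.ones((3, 3))
print(conv2d_strided(np.ones((7, 7)), kernel, s=2).shape)  # (3, 3)
print(conv2d_strided(np.ones((8, 8)), kernel, s=2).shape)  # (3, 3): 3.5 is floored
```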

def dim_rgb_convolution(inputs, kernel, stride, padding, filters):
    '''
    inputs = (nh, nw, nc)
    where
    nh: height
    nw: width
    nc: channels

    output = (nhl, nwl, ncl)

    nhl = (nh + 2*p - fh)/s + 1   (rounded down)
    nwl = (nw + 2*p - fw)/s + 1   (rounded down)
    ncl = filters

    where
       fh, fw : filter height and width
       p : padding
       s : stride
       ncl : number of filters
    '''
    nh, nw, nc = inputs
    fh, fw = kernel

    s   = stride
    p   = padding
    ncl = filters

    # height goes with the filter height, width with the filter width
    nhl = (nh + 2*p - fh)/s + 1
    nwl = (nw + 2*p - fw)/s + 1
    output = (int(nhl), int(nwl), int(ncl))

    print("Activation Shape")

    return output

Let us define the number of parameters used in each convolution.

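The number of parameters is:

\[\text{parameters} = (f_h \times f_w \times n_C^{[l-1]} + 1) \times n_C^{[l]}\]

where \(f_h \times f_w\) is the filter size, \(n_C^{[l-1]}\) is the number of channels (filters) of the previous layer, \(n_C^{[l]}\) is the number of filters of the current layer, and the +1 is the bias of each filter.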
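One way to convince yourself of this formula is to allocate the weights and biases explicitly and count them. The sketch below assumes the kernel layout (fh, fw, nc_prev, n_filters), the order TensorFlow uses for conv kernels, although only the total size matters here; conv_layer_params is a hypothetical helper.

```python
import numpy as np

def conv_layer_params(fh, fw, nc_prev, n_filters):
    """Allocate the weights and biases of a conv layer and count them.

    Each filter has fh * fw * nc_prev weights plus one bias,
    so the total is (fh * fw * nc_prev + 1) * n_filters.
    """
    W = np.zeros((fh, fw, nc_prev, n_filters))  # one fh x fw x nc_prev kernel per filter
    b = np.zeros(n_filters)                      # one bias per filter
    return W.size + b.size

print(conv_layer_params(5, 5, 3, 8))  # 608, matching (5*5*3 + 1) * 8
```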

def nparameters_convolution(inputs, kernel, stride, padding, filters):
    '''
    inputs = (nh, nw, nc)
    where
    nh: height
    nw: width
    nc: channels

    activation_shape = (nhl, nwl, ncl)

    nhl = (nh + 2*p - fh)/s + 1   (rounded down)
    nwl = (nw + 2*p - fw)/s + 1   (rounded down)
    ncl = filters

    where
       fh, fw : filter height and width
       p : padding
       s : stride
       ncl : number of filters
    '''
    nh, nw, nc = inputs
    fh, fw = kernel

    s   = stride
    p   = padding
    ncl = filters

    # activation shape
    nhl = (nh + 2*p - fh)/s + 1
    nwl = (nw + 2*p - fw)/s + 1
    activation_shape = (int(nhl), int(nwl), int(ncl))

    # activation size
    activation_size = int(nhl)*int(nwl)*int(ncl)

    # number of parameters: fh*fw*nc weights plus one bias per filter
    nparameters = ((fh*fw*nc) + 1)*ncl

    print("Activation Shape,", "Activation Size,", "# Parameters")

    return activation_shape, activation_size, nparameters

inputs  = 32,32,3   # nh x nw x nc image
kernel  = 5,5       # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p
filters = 8       #number of filters ncl
nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters
((28, 28, 8), 6272, 608)

For example, let us consider a simple convolutional neural network, the ConvNet example from the Coursera Deep Learning course:

[Figure: ConvNet example from the Coursera Deep Learning course]

Its first convolution layer is:

inputs  = 39,39,3   # nh x nw x nc image
kernel  = 3,3       # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p
filters = 10       #number of filters ncl

dim_rgb_convolution(inputs, kernel,stride,padding,filters)

Activation Shape
(37, 37, 10)

The activation size of a layer is simply the product of its height, width and number of channels.

The output of this layer has shape (37, 37, 10), so its activation size is \(37 \times 37 \times 10 = 13690\).

37* 37* 10 

inputs  = 37,37,10  # nh x nw x nc image
kernel  = 5,5       # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p
filters = 20       #number of filters ncl


dim_rgb_convolution(inputs, kernel,stride,padding,filters)

Activation Shape

(17, 17, 20)

The same holds for the activation size of this convolution: we just multiply the dimensions of (17, 17, 20), i.e. \(17 \times 17 \times 20 = 5780\).

17* 17* 20

inputs  = 17,17,20  # nh x nw x nc image
kernel  = 5,5       # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p
filters = 40       #number of filters ncl

dim_rgb_convolution(inputs, kernel,stride,padding,filters)

Activation Shape

(7, 7, 40)

nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters

((7, 7, 40), 1960, 20040)

The number of parameters of a layer is the count of its learnable elements: the weights and biases of its filters. These parameters are learned during training; they form the weight matrices that give the model its predictive power, and they are updated during back-propagation.
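The three ConvNet layers above can also be chained in a loop, feeding each output shape into the next layer. This is a small sketch with a hypothetical conv_summary helper that uses integer floor division instead of int() on a float, and assumes square filters:

```python
def conv_summary(shape, f, s, p, n_filters):
    """Return (output shape, activation size, #params) of one conv layer."""
    nh, nw, nc = shape
    oh = (nh + 2 * p - f) // s + 1
    ow = (nw + 2 * p - f) // s + 1
    out_shape = (oh, ow, n_filters)
    return out_shape, oh * ow * n_filters, (f * f * nc + 1) * n_filters

# ConvNet example: 39x39x3 -> CONV(f=3, s=1, 10) -> CONV(f=5, s=2, 20) -> CONV(f=5, s=2, 40)
layers = [(3, 1, 0, 10), (5, 2, 0, 20), (5, 2, 0, 40)]  # (f, s, p, filters)
shape = (39, 39, 3)
summary = []
for f, s, p, k in layers:
    shape, size, n_params = conv_summary(shape, f, s, p, k)
    summary.append((shape, size, n_params))
    print(shape, size, n_params)
```

It also gives the parameter counts of the first two layers, 280 and 5020, which the walk-through above does not compute.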

Pooling (POOL)

A pooling layer has the following hyperparameters:

f: filter size
s: stride
Max or average pooling

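Given an input of dimensions

\[n_H \times n_W \times n_C\]

the output of max or average pooling has dimensions

\[\left(\left\lfloor\frac{n_H+2p-f}{s}\right\rfloor+1\right) \times \left(\left\lfloor\frac{n_W+2p-f}{s}\right\rfloor+1\right) \times n_C\]

The number of channels remains \(n_C\), and a pooling layer has no learnable parameters (usually \(p = 0\)).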
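A naive NumPy sketch of pooling makes these two properties concrete: each channel is pooled on its own, so \(n_C\) is unchanged, and there are no weights to learn. The helper pool2d is hypothetical and assumes \(p = 0\), the usual choice for pooling.

```python
import numpy as np

def pool2d(x, f, s, mode="max"):
    """Naive pooling over an (nh, nw, nc) array, applied to each channel separately."""
    nh, nw, nc = x.shape
    oh = (nh - f) // s + 1
    ow = (nw - f) // s + 1
    reduce = np.max if mode == "max" else np.mean
    out = np.zeros((oh, ow, nc))
    for i in range(oh):
        for j in range(ow):
            window = x[i * s:i * s + f, j * s:j * s + f, :]
            out[i, j, :] = reduce(window, axis=(0, 1))  # one value per channel
    return out

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
y = pool2d(x, f=2, s=2)
print(y.shape)  # (2, 2, 2): the channel count is unchanged
```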

def dim_pool(inputs, kernel, stride, padding):
    '''
    inputs = (nh, nw, nc)
    where
    nh: height
    nw: width
    nc: channels

    activation_shape = (nhl, nwl, ncl)

    nhl = (nh + 2*p - fh)/s + 1   (rounded down)
    nwl = (nw + 2*p - fw)/s + 1   (rounded down)
    ncl = nc
    where
       fh, fw : filter height and width
       p : padding
       s : stride
    '''
    nh, nw, nc = inputs
    fh, fw = kernel

    s   = stride
    p   = padding
    ncl = nc

    # height goes with the filter height, width with the filter width
    nhl = (nh + 2*p - fh)/s + 1
    nwl = (nw + 2*p - fw)/s + 1
    activation_shape = (int(nhl), int(nwl), int(ncl))

    # activation size
    activation_size = int(nhl)*int(nwl)*int(ncl)

    # number of parameters: pooling has nothing to learn
    nparameters = 0

    print("Activation Shape,", "Activation Size,", "# Parameters")

    return activation_shape, activation_size, nparameters

inputs  = 5,5,5     # nh x nw x nc image
kernel  = 3,3       # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p

dim_pool(inputs, kernel,stride,padding)

Activation Shape, Activation Size, # Parameters

((3, 3, 5), 45, 0)

inputs  = 7,7,1000  # nh x nw x nc image
kernel  = 2,2       # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p


dim_pool(inputs, kernel,stride,padding)

Activation Shape, Activation Size, # Parameters

((3, 3, 1000), 9000, 0)

Another example we can consider is a LeNet-5-style network:

[Figure: LeNet-5-style network]

The input layer’s shape is (32, 32, 3), so its activation size is 32 * 32 * 3 = 3072.

CONV 1

inputs  = 32,32,3   # nh x nw x nc image
kernel  = 5,5       # fh x fw filter
stride  = 1.0       # stride s
padding = 0.0       # padding p
filters = 8         # number of filters ncl (the original LeNet-5 uses 6)

newinput=dim_rgb_convolution(inputs, kernel,stride,padding,filters)
newinput

Activation Shape

(28, 28, 8)

The activation size for CONV1.

28* 28* 8

Parameters CONV1

((fh x fw x nc + 1) x ncl)

(((5*5*3)+1)*8) 

inputs  = 32,32,3   # nh x nw x nc image
kernel  = 5,5       # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p
filters = 8       #number of filters ncl

nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters

((28, 28, 8), 6272, 608)

POOL 1

inputs  = 28, 28, 8  # nh x nw x nc
kernel  = 2,2        # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p

dim_pool(inputs, kernel,stride,padding)

Activation Shape, Activation Size, # Parameters

((14, 14, 8), 1568, 0)

The activation size for POOL1.

14* 14* 8

CONV 2

inputs  = 14, 14, 8  # nh x nw x nc image
kernel  = 5,5        # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p
filters = 16       #number of filters ncl

newinput=dim_rgb_convolution(inputs, kernel,stride,padding,filters)
newinput

Activation Shape

(10, 10, 16)

The activation size for CONV2.

10*10*16 

inputs  = 14, 14, 8  # nh x nw x nc image
kernel  = 5,5        # fh x fw filter
stride  = 1.0      #stride s
padding = 0.0      #padding p
filters = 16       #number of filters ncl


nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters

((10, 10, 16), 1600, 3216)

POOL 2

inputs  = 10, 10, 16  # nh x nw x nc
kernel  = 2,2         # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p

dim_pool(inputs, kernel,stride,padding)

Activation Shape, Activation Size, # Parameters

((5, 5, 16), 400, 0)

The activation size for POOL2.

5* 5* 16


FULLY CONNECTED LAYER

For a fully connected layer, every unit of the current layer is connected to every activation of the previous layer. The number of learnable parameters is therefore the current layer size c times the previous layer's activation size p, plus one bias per unit: \(c \times p + c\).

def nparameters_fully_connected(c, p):
    '''
    c : number of units in the current layer
    p : activation size of the previous layer
    '''

    # activation shape
    activation_shape = (c, 1)

    # activation size
    activation_size = c

    # number of parameters: c*p weights plus one bias per unit
    number = c*p + c
    print("Activation Shape,", "Activation Size,", "# Parameters")

    return activation_shape, activation_size, number
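The same count can be checked by building the weight matrix and bias vector of FC3 in NumPy. This is a sketch with random toy values; dense_params is a hypothetical helper, and the (units, inputs) layout of W is an assumption that makes W @ a_prev work.

```python
import numpy as np

def dense_params(c, p):
    """Count the weights and biases of a fully connected layer (c units, p inputs)."""
    W = np.zeros((c, p))  # one weight per (unit, input) pair
    b = np.zeros(c)       # one bias per unit
    return W.size + b.size

# Forward pass of FC3 on the flattened 5x5x16 output of POOL2 (random toy values)
rng = np.random.default_rng(0)
a_prev = rng.random(5 * 5 * 16)              # 400 activations
W, b = rng.random((120, 400)), np.zeros(120)
z = W @ a_prev + b                           # 120 pre-activations
print(z.shape, dense_params(120, 400))       # (120,) 48120
```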

FC3

nparameters_fully_connected(120 , 400)

Activation Shape, Activation Size, # Parameters

((120, 1), 120, 48120)

FC4

nparameters_fully_connected(84 , 120)

Activation Shape, Activation Size, # Parameters

((84, 1), 84, 10164)

Softmax

nparameters_fully_connected(10 , 84)

Activation Shape, Activation Size, # Parameters

((10, 1), 10, 850)
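Putting the pieces together, the whole LeNet-5-style network can be summarised in one table. This sketch reuses the formulas of this post in compact, hypothetical helpers (conv, pool, fc), assumes no padding and square filters, and adds up the parameters of all layers:

```python
from math import prod

def conv(shape, f, s, k):
    """Valid convolution (p = 0): new shape and (f*f*nc + 1)*k parameters."""
    nh, nw, nc = shape
    return ((nh - f) // s + 1, (nw - f) // s + 1, k), (f * f * nc + 1) * k

def pool(shape, f, s):
    """Pooling keeps the channel count and has no parameters."""
    nh, nw, nc = shape
    return ((nh - f) // s + 1, (nw - f) // s + 1, nc), 0

def fc(shape, units):
    """Fully connected: one weight per input per unit plus one bias per unit."""
    return (units,), (prod(shape) + 1) * units

layers = [
    ("CONV1", conv, (5, 1, 8)),
    ("POOL1", pool, (2, 2)),
    ("CONV2", conv, (5, 1, 16)),
    ("POOL2", pool, (2, 2)),
    ("FC3", fc, (120,)),
    ("FC4", fc, (84,)),
    ("Softmax", fc, (10,)),
]

shape = (32, 32, 3)
total = 0
rows = []
for name, layer, args in layers:
    shape, n_params = layer(shape, *args)
    total += n_params
    rows.append((name, shape, prod(shape), n_params))
    print(f"{name:8} {str(shape):14} {prod(shape):6} {n_params:6}")
print("Total parameters:", total)  # 62958
```

Almost all of the parameters sit in the fully connected layers (FC3 alone has 48120 of the 62958), while the pooling layers contribute none.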

So far we have seen how to compute the activation shape, the activation size and the number of parameters of each layer. Let us put this knowledge into practice.

How to use the AlexNet network

For this project I will take two different AlexNet models and apply them to an unknown dataset from the MMORPG-AI problem.

The models to analyze are:

Non adapted model
Adapted model

The non-adapted model simply takes the “raw” definition of the AlexNet network from the standard Python code here.

The adapted model is the version in which we modify the parameters of the non-adapted model according to the analysis done earlier in this post.

Let us first load the libraries we need:

#Importing Gamepad library
from mmorpg import *

The important part is the size of the pictures:

# We define the size of the pictures
WIDTH = 480
HEIGHT = 270

We load the project data:

# pd, np, train_test_split and the df_to_numpy_* helpers are assumed to come from `from mmorpg import *`
# We load the images of the gameplay
x_training_data = pd.read_pickle('data/dfx-0.pkl')
# We load the inputs of the gameplay
y_training_data = pd.read_pickle('data/dfy-0.pkl')
X_train, X_valid, y_train, y_valid = train_test_split(x_training_data, y_training_data, test_size=0.2, random_state=6)
# Train image part (4-dimensional)
X_image = np.array([df_to_numpy_image(X_train, i) for i in X_train.index])
X = X_image.reshape(-1, WIDTH, HEIGHT, 3)
# Train input part (1-dimensional)
Y = [df_to_numpy_input(y_train, i) for i in y_train.index]
# Test image part (4-dimensional)
test_image = np.array([df_to_numpy_image(X_valid, i) for i in X_valid.index])
test_x = test_image.reshape(-1, WIDTH, HEIGHT, 3)
# Test input part (1-dimensional)
test_y = [df_to_numpy_input(y_valid, i) for i in y_valid.index]

AlexNet Model - Non-adapted model

We define the standard, non-adapted AlexNet:

LR = 1e-3
MODEL_NAME = 'mmorpg-{}-{}.model'.format(LR, 'alexnet-non-adapted') 

def alexnet(width, height, lr, output=29):
    # Building 'AlexNet'                                                  #line
    network = input_data(shape=[None, width, height, 3])                  #0
    network = conv_2d(network, 96, 11, strides=4, activation='relu')      #1
    network = max_pool_2d(network, 3, strides=2)                          #2
    network = local_response_normalization(network)                       #3
    network = conv_2d(network, 256, 5, activation='relu')                 #4
    network = max_pool_2d(network, 3, strides=2)                          #5
    network = local_response_normalization(network)                       #6
    network = conv_2d(network, 384, 3, activation='relu')                 #7
    network = conv_2d(network, 384, 3, activation='relu')                 #8
    network = conv_2d(network, 256, 3, activation='relu')                 #9
    network = max_pool_2d(network, 3, strides=2)                          #10
    network = local_response_normalization(network)                       #11
    network = fully_connected(network, 4096, activation='tanh')           #12
    network = dropout(network, 0.5)                                       #13
    network = fully_connected(network, 4096, activation='tanh')           #14
    network = dropout(network, 0.5)                                       #15
    network = fully_connected(network, output, activation='softmax')      #16
    network = regression(network, optimizer='momentum',                   #17
                         loss='categorical_crossentropy',
                         learning_rate=lr)

    # Wrap the network in a trainable TFLearn model (checkpoints and TensorBoard logs)
    model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                        max_checkpoints=1, tensorboard_verbose=2, tensorboard_dir='log')

    return model

model = alexnet(WIDTH, HEIGHT, LR, output=29)

We train the model

model.fit(X, Y, n_epoch=5, validation_set=0.1, shuffle=True,
              show_metric=True, batch_size=64, snapshot_step=200,
              snapshot_epoch=False, run_id=MODEL_NAME)

Training Step: 15  | total loss: 1.97406 | time: 21.022s
| Momentum | epoch: 005 | loss: 1.97406 - acc: 0.4897 -- iter: 180/180

The accuracy is below 0.5 and the loss is close to 2.0. Using the dimension analysis from the previous sections, we will adapt the AlexNet model to try to improve these results.

Understanding the parameters of AlexNet

The standard AlexNet network can be depicted as in the Coursera Deep Learning course:

[Figure: AlexNet architecture from the Coursera Deep Learning course]

From this picture we obtain the essential parameters of each layer.

The input of the neural network in TFLearn (on top of TensorFlow) is given by

input_data(shape=[None, width, height, 3])                  #0

#CONV 1
inputs  = 227,227,3  # nh x nw x nc image
kernel  = 11,11      # fh x fw filter
stride  = 4.0      #stride s
padding = 0.0      #padding p
filters = 96       #number of filters ncl
nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters

((55, 55, 96), 290400, 34944)

In the TFLearn code this part corresponds to

conv_2d(network, 96, 11, strides=4, activation='relu')      #1

#POOL1
inputs  = 55, 55, 96  # nh x nw x nc
kernel  = 3,3         # fh x fw filter
stride  = 2.0      #stride s
padding = 0.0      #padding p
dim_pool(inputs, kernel,stride,padding)

Activation Shape, Activation Size, # Parameters

((27, 27, 96), 69984, 0)

In the TFLearn code this part corresponds to

max_pool_2d(network, 3, strides=2)                          #2

After the pooling layer, the network applies local response normalization, which keeps the activation shape and adds no learnable parameters:

local_response_normalization(network)                       #3
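Local response normalization is not covered by the formulas above, so here is a minimal NumPy sketch of the across-channel version to show that it leaves the shape unchanged and has nothing to learn. The function name local_response_norm and its default hyperparameters (depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75) are assumptions for illustration; check the TFLearn documentation for the defaults its local_response_normalization layer actually uses.

```python
import numpy as np

def local_response_norm(a, depth_radius=5, bias=1.0, alpha=1e-4, beta=0.75):
    """Across-channel local response normalization of an (nh, nw, nc) array.

    Each activation is divided by (bias + alpha * sum of squares of the
    activations in the neighbouring channels) ** beta.
    """
    nc = a.shape[-1]
    out = np.empty_like(a)
    for c in range(nc):
        lo, hi = max(0, c - depth_radius), min(nc, c + depth_radius + 1)
        sq_sum = np.sum(a[..., lo:hi] ** 2, axis=-1)  # neighbouring channels only
        out[..., c] = a[..., c] / (bias + alpha * sq_sum) ** beta
    return out

a = np.random.rand(27, 27, 96)  # shape after POOL1 in the AlexNet example
b = local_response_norm(a)
print(b.shape)                  # (27, 27, 96): same shape, no learnable parameters
```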

#CONVOLUTION SAME 1
inputs  = 27, 27, 96  # nh x nw x nc image
kernel  = 5,5         # fh x fw filter
stride  = 1.0      #stride s
padding = 2.0      #padding p
filters = 256       #number of filters ncl
nparameters_convolution(inputs, kernel,stride,padding,filters)

Activation Shape, Activation Size, # Parameters

((27, 27, 256), 186624, 614656)
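To see why the 480x270 gameplay frames matter, the sketch below traces the AlexNet layers analysed here with the same floor((n + 2p - f)/s) + 1 rule and counts the parameters of the first 4096-unit fully connected layer (line #12 in the code). It assumes the padding values of the Coursera figure (0 for CONV1 and the pools, "same" for the other convolutions); as far as I know, TFLearn's conv_2d and max_pool_2d default to padding='same', so the model built above will produce different shapes. The helper names out_dim and trace are hypothetical.

```python
def out_dim(n, f, s, p):
    """floor((n + 2p - f) / s) + 1 for one spatial dimension."""
    return (n + 2 * p - f) // s + 1

# (filter size, stride, padding, filters); filters=None marks a pooling layer
ALEXNET = [
    (11, 4, 0, 96),    # CONV1
    (3, 2, 0, None),   # POOL1
    (5, 1, 2, 256),    # CONV2 (same)
    (3, 2, 0, None),   # POOL2
    (3, 1, 1, 384),    # CONV3 (same)
    (3, 1, 1, 384),    # CONV4 (same)
    (3, 1, 1, 256),    # CONV5 (same)
    (3, 2, 0, None),   # POOL3
]

def trace(h, w, c=3, fc_units=4096):
    """Return the POOL3 output shape and the parameters of the first FC layer."""
    for f, s, p, k in ALEXNET:
        h, w = out_dim(h, f, s, p), out_dim(w, f, s, p)
        if k is not None:  # pooling keeps the channel count
            c = k
    flat = h * w * c       # activation size fed to the fully connected layer
    return (h, w, c), (flat + 1) * fc_units

print(trace(227, 227))  # ((6, 6, 256), 37752832)
print(trace(480, 270))  # ((13, 7, 256), 95424512)
```

Under these assumptions the larger frames leave a 13x7x256 volume instead of 6x6x256 before flattening, which multiplies the parameters of that first fully connected layer by roughly 2.5.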