Classifying images of everyday objects using a neural network

MOHSEN DEHHAGHI
4 min read · Dec 19, 2020

In this post I trained a feed-forward neural network to classify images of everyday objects from the CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html, reaching a validation accuracy of just over 50% (50.01%).

However, I also found it quite challenging to push the accuracy much beyond 50%, due to the limited capacity of a plain feed-forward model on image data.

Base Model class definition for training on GPU

To start off, I created a base model class that contains everything except the model architecture itself (i.e. it defines neither the __init__ nor the forward method), so that I can later extend it to try out different architectures. In fact, this base class can be reused for any image classification problem:
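A minimal sketch of that base class, following the standard PyTorch pattern of per-batch training and validation steps (the helper names here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of predictions that match the labels
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # forward pass
        return F.cross_entropy(out, labels)  # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {'val_loss': F.cross_entropy(out, labels),
                'val_acc': accuracy(out, labels)}

    def validation_epoch_end(self, outputs):
        # Average the per-batch metrics over the whole validation set
        epoch_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        epoch_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))
```

Because the architecture-specific pieces (__init__ and forward) are left out, any concrete model only needs to supply those two methods.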

I then extended the ImageClassificationBase class to complete the model definition:
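A sketch of the extended class (the class and argument names are mine; the two hidden-layer sizes are exactly what the experiments below vary):

```python
class CIFAR10Model(ImageClassificationBase):
    # Feed-forward classifier: 3*32*32 input pixels -> 10 classes
    def __init__(self, hidden1=16, hidden2=32, in_size=3*32*32, out_size=10):
        super().__init__()
        self.linear1 = nn.Linear(in_size, hidden1)
        self.linear2 = nn.Linear(hidden1, hidden2)
        self.linear3 = nn.Linear(hidden2, out_size)

    def forward(self, xb):
        out = xb.view(xb.size(0), -1)     # flatten each 3x32x32 image
        out = F.relu(self.linear1(out))
        out = F.relu(self.linear2(out))
        return self.linear3(out)          # logits; cross_entropy applies softmax
```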

The following is the path I took while experimenting with different network architectures (number of hidden layers, size of each hidden layer, activation function) and hyperparameters (number of epochs, learning rate) until I reached the desired validation loss and accuracy. Every experiment reuses the same multi-phase training loop, sketched below:
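Here is a sketch of that loop (the train/validation split, batch sizes, and the 5 epochs per phase are assumptions on my part):

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Load CIFAR-10 and hold out a validation split (split and batch sizes
# are my assumptions, not stated in the post)
dataset = datasets.CIFAR10(root='data/', download=True,
                           transform=transforms.ToTensor())
train_ds, val_ds = random_split(dataset, [45000, 5000])
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=256)

# Run on a GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def to_device(data, device):
    # Move a tensor, or a list/tuple of tensors, to the target device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(to_device(batch, device))
               for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader,
        opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(to_device(batch, device))
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history

# Each experiment trains in phases with a decaying learning rate, e.g.:
model = CIFAR10Model().to(device)
history = [evaluate(model, val_loader)]   # metrics before any training
for lr in [0.5, 0.1, 0.01, 0.001]:
    history += fit(5, lr, model, train_loader, val_loader)
```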

Experiment#1: Start with the two hidden layers sized (16, 32), resulting in an initial model accuracy of {‘val_acc’: 0.097} and an initial model loss of {‘val_loss’: 2.303}.
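These initial numbers are exactly what an untrained 10-class classifier should produce: random guessing gives roughly 10% accuracy, and a cross-entropy loss of −ln(1/10) ≈ 2.303. A quick sanity check, using the helpers sketched above:

```python
model = CIFAR10Model(hidden1=16, hidden2=32).to(device)
print(evaluate(model, val_loader))
# -> {'val_loss': 2.303, 'val_acc': 0.097}  (matching the figures above)
```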

Experiment#2: Doubling the two hidden layer sizes from (16, 32) to (32, 64) nearly tripled the accuracy, from {‘val_acc’: 0.097} to {‘val_acc’: 0.268}, and brought the loss down from {‘val_loss’: 2.303} to {‘val_loss’: 1.941}.

Experiment#3: Quintupling the number of epochs from 5 to 25, while keeping the previous architecture and other hyperparameters, made the results slightly worse: accuracy dipped from {‘val_acc’: 0.268} to {‘val_acc’: 0.239} and loss rose from {‘val_loss’: 1.941} to {‘val_loss’: 1.968}.

Experiment#4: Doubling the learning rates from [0.5, 0.1, 0.01, 0.001] to [1, 0.2, 0.02, 0.002] across the training phases, with everything else unchanged, did not help either: accuracy fell from {‘val_acc’: 0.239} to {‘val_acc’: 0.173} and loss climbed from {‘val_loss’: 1.968} to {‘val_loss’: 2.120}.

Experiment#5: Changing the activation function from F.relu() to torch.sigmoid(), and halving the learning rates back to [0.5, 0.1, 0.01, 0.001], with the remaining hyperparameters unchanged, improved the results substantially: accuracy jumped from {‘val_acc’: 0.173} to {‘val_acc’: 0.475} and loss dropped from {‘val_loss’: 2.120} to {‘val_loss’: 1.469}.
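In the sketched model above, the change amounts to swapping the activation in forward:

```python
class CIFAR10SigmoidModel(CIFAR10Model):
    # Same layers as the ReLU sketch; only the activation differs
    def forward(self, xb):
        out = xb.view(xb.size(0), -1)
        out = torch.sigmoid(self.linear1(out))
        out = torch.sigmoid(self.linear2(out))
        return self.linear3(out)
```

One plausible reason this helped here: sigmoid squashes activations into (0, 1), which tames the unstable updates that a large SGD learning rate like 0.5 can produce with unbounded ReLU activations.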

Experiment#6: Doubling the total number of layers from 3 to 6, with the rest of the hyperparameters as before, hurt the results badly: accuracy collapsed from {‘val_acc’: 0.475} to {‘val_acc’: 0.168} and loss shot up from {‘val_loss’: 1.469} to {‘val_loss’: 2.280}.

Experiment#7: Halving the total number of layers from 6 back to 3, and lowering the learning rates from [0.5, 0.1, 0.01, 0.001] to [0.2, 0.02, 0.002, 0.0002], with the rest as before, recovered most of the lost ground: accuracy bounced from {‘val_acc’: 0.168} back up to {‘val_acc’: 0.430} and loss fell from {‘val_loss’: 2.280} to {‘val_loss’: 1.593}.

Experiment#8: Changing the optimizer from torch.optim.SGD to torch.optim.Adam, with the rest of the hyperparameters as before, wrecked the results: accuracy plummeted from {‘val_acc’: 0.430} to {‘val_acc’: 0.094} and loss rose from {‘val_loss’: 1.593} back up to {‘val_loss’: 2.303}.
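The switch is a one-argument change to the fit calls from the training-loop sketch:

```python
# Same training phases as before, but with Adam instead of SGD
for lr in [0.2, 0.02, 0.002, 0.0002]:
    history += fit(5, lr, model, train_loader, val_loader,
                   opt_func=torch.optim.Adam)
```

Learning rates tuned for SGD are generally far too large for Adam (whose default is around 1e-3), so a first phase at 0.2 plausibly destabilized training and explains the collapse back to random-guessing performance.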

Experiment#9: Reverting the optimizer from torch.optim.Adam back to torch.optim.SGD, with the rest as before, restored the good results: accuracy jumped from {‘val_acc’: 0.094} back up to {‘val_acc’: 0.465} and loss fell from {‘val_loss’: 2.303} to {‘val_loss’: 1.497}.

Experiment#10: Reverting the activation function from torch.sigmoid() back to F.relu(), with the rest of the hyperparameters unchanged, slightly improved the accuracy, from {‘val_acc’: 0.475} to {‘val_acc’: 0.483}, at the cost of a marginal uptick in loss, from {‘val_loss’: 1.469} to {‘val_loss’: 1.472}.

Experiment#11: Halving the learning rates from [0.2, 0.02, 0.002, 0.0002] to [0.1, 0.01, 0.001, 0.0001], with the rest as before, improved the results slightly again: accuracy rose from {‘val_acc’: 0.483} to {‘val_acc’: 0.490} and loss dropped from {‘val_loss’: 1.472} to {‘val_loss’: 1.451}.

Experiment#12: Increasing the total number of layers from 3 to 4, with the rest of the hyperparameters as before, improved the results once more: accuracy rose from {‘val_acc’: 0.490} to {‘val_acc’: 0.500097}, finally crossing the 50% mark, and loss fell from {‘val_loss’: 1.451} to {‘val_loss’: 1.407}.
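A sketch of a four-layer network matching this description (the experiments fix only the layer count, so the hidden sizes below are my guesses):

```python
class CIFAR10FinalModel(ImageClassificationBase):
    def __init__(self, in_size=3*32*32, out_size=10):
        super().__init__()
        # Three hidden layers + output layer = 4 Linear layers in total.
        # Hidden sizes (256, 128, 64) are illustrative, not from the post.
        self.network = nn.Sequential(
            nn.Linear(in_size, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, out_size))

    def forward(self, xb):
        return self.network(xb.view(xb.size(0), -1))
```

This is roughly where fully connected models tend to plateau on CIFAR-10, since they cannot exploit the spatial structure of images; architectures that can, such as convolutional networks, are the usual next step.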
