First Attempt

Monday August 20, 2018 at 11:29 am CST

The past couple of weeks have been spent sorting data and building out the network. The following script was used to extract frames from the webm files I recorded; part of it has already been shared in a previous post.
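Roughly, it looks like this - a sketch that assumes ffmpeg is installed, with the output format and exact flags being illustrative rather than the ones in my actual script:

```python
# Sketch of the frame-extraction loop (assumes ffmpeg is on the PATH).
import glob
import os
import re
import subprocess

os.makedirs("training_set", exist_ok=True)

for path in sorted(glob.glob("webmfiles/*.webm")):
    # Each recording ends in _NNN.webm, where NNN is the zero-padded file number.
    webm_number = re.search(r"_(\d{3})\.webm$", path).group(1)

    # Extract 105 frames at 21 frames per second, each scaled to 42 x 32 pixels.
    # The image number is zero-padded (%03d) so later sorts keep frames in order.
    subprocess.run([
        "ffmpeg", "-i", path,
        "-vf", "fps=21,scale=42:32",
        "-vframes", "105",
        "training_set/train_img_{}_%03d.png".format(webm_number),
    ], check=True)
```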

This iterates through all of the recordings in the webmfiles/ directory and extracts 105 frames from each at a rate of 21 frames per second. Each extracted image is 42 x 32 pixels, is placed in a directory called training_set/, and is given a name following the pattern train_img_[webm_file_number]_[image_number]. Note that all of the webm files end in _[webm_file_number], where webm_file_number is 3 digits long (zero-padded for numbers less than 100).

After this, the frames have to be separated into two categories - features and labels (the targets). I used the script below to place each frame in the appropriate directory:
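A sketch of that step, assuming the frames were extracted as .png files with the naming pattern above (my actual script may differ in the details):

```python
# Sketch of the feature/label split: every third frame becomes a target.
import glob
import os
import re
import shutil

os.makedirs("features", exist_ok=True)
os.makedirs("labels", exist_ok=True)

for path in sorted(glob.glob("training_set/train_img_*.png")):
    # The image number is the last underscore-separated field of the filename.
    image_number = int(re.search(r"_(\d+)\.png$", path).group(1))

    if image_number % 3 == 0:
        shutil.move(path, "labels/")
    else:
        shutil.move(path, "features/")
```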

The script iterates through all of the images created. If the image number is divisible by three, the frame is placed in the labels/ directory; all other frames are placed in the features/ directory. Ultimately, the files in the features/ directory will be concatenated in pairs so that each result has dimensions of 42 x 32 x 6. Each image starts with dimensions of 42 x 32 x 3, where the 3 refers to the three RGB channels each image has.

The data is loaded and transformed like so:
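Here is a minimal sketch of that step, assuming NumPy and Pillow, and assuming each label frame is paired with the two feature frames extracted just before it (the exact pairing and preprocessing in my code may differ):

```python
# Sketch: build 6-channel inputs from pairs of feature frames, with the
# corresponding label frame as the 3-channel target.
import glob
import numpy as np
from PIL import Image

def load(path):
    # Scale pixel values to [0, 1]; each frame has 3 RGB channels.
    return np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0

feature_paths = sorted(glob.glob("features/*.png"))
label_paths = sorted(glob.glob("labels/*.png"))

# Concatenate consecutive pairs of feature frames along the channel axis,
# producing one 42 x 32 x 6 input per 42 x 32 x 3 label frame.
x = np.stack([np.concatenate([load(a), load(b)], axis=-1)
              for a, b in zip(feature_paths[0::2], feature_paths[1::2])])
y = np.stack([load(p) for p in label_paths])
```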

The concatenated images are then run through two layers of convolution and max pooling. Each kernel is 5 x 5 pixels.
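Something along these lines - a tf.keras sketch where the filter counts are illustrative, and the framework itself is an assumption on my part:

```python
# Sketch of the two convolution + max-pooling stages with 5 x 5 kernels.
import tensorflow as tf

conv_stack = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (5, 5), padding="same", activation="relu",
                           input_shape=(32, 42, 6)),  # height x width x channels
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
])
```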

The output of the convolutional layers is fed into a four-layer autoencoder.
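In the same sketched style, with layer widths and activations as placeholders rather than the values actually used:

```python
# Sketch of the four-layer autoencoder sitting on top of the conv_stack above.
import tensorflow as tf

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),            # encoder
    tf.keras.layers.Dense(256, activation="relu"),             # bottleneck
    tf.keras.layers.Dense(1024, activation="relu"),            # decoder
    tf.keras.layers.Dense(32 * 42 * 3, activation="sigmoid"),  # output frame
    tf.keras.layers.Reshape((32, 42, 3)),
])

# Chain the convolutional front end and the autoencoder, then train the whole
# thing to reconstruct the held-out label frame from each concatenated pair.
model = tf.keras.Sequential([conv_stack, autoencoder])
model.compile(optimizer="adam", loss="mse")
# model.fit(x, y, epochs=..., batch_size=...)
```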

The results after training the network are rather underwhelming. This upcoming week will be spent making improvements to the network and displaying the output of the autoencoder. Hopefully, seeing the output will give me some insight into the sorts of improvements that need to be made.

