Access this project’s GitHub page:

Description:
- Trained three models (a Deep Bag of Frames “DBoF” network on detailed frame-level features, plus logistic regression and multilayer perceptron “MLP” models on summarized video-level features) and evaluated each on three styles of feature input (visual only, audio only, and combined audio/visual) to determine the most accurate way to assign multiple “tag” labels to videos
- Found the video-level logistic regression approach to be optimal, achieving a gAP score of 0.37 on the combined audio/visual input and an 8% improvement in the RGB-only input’s Hit@1 score compared to our benchmark paper (a minimal sketch of this kind of video-level model is shown below)
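For illustration only, here is a minimal, hypothetical sketch of a video-level multi-label logistic regression of the kind described above: one sigmoid output per tag over concatenated audio/visual features. This is not the project’s actual code; the feature dimensions (1024-d RGB, 128-d audio) and the tag-vocabulary size are assumptions in the style of the YouTube-8M setup, not figures from the report.

```python
# Hypothetical sketch: a video-level multi-label logistic regression,
# i.e. one independent sigmoid "tag" classifier per label.
import numpy as np
import tensorflow as tf

RGB_DIM, AUDIO_DIM = 1024, 128   # assumed video-level feature sizes
NUM_TAGS = 3862                  # assumed tag-vocabulary size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(RGB_DIM + AUDIO_DIM,)),          # concatenated audio/visual features
    tf.keras.layers.Dense(NUM_TAGS, activation="sigmoid"),  # one logistic unit per tag
])
# Binary cross-entropy trains each tag's classifier independently,
# which is what allows a video to receive multiple tags at once.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-in data: random features and sparse multi-hot labels.
x = np.random.rand(64, RGB_DIM + AUDIO_DIM).astype("float32")
y = (np.random.rand(64, NUM_TAGS) > 0.999).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# Per-tag probabilities; thresholding or taking the top-k yields the final tags.
tag_probs = model.predict(x, verbose=0)
```

The visual-only and audio-only variants would differ only in the input width, which is what makes comparing the three feature-input styles straightforward.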
Report File:
Access this project’s Google Slides presentation: