Access this project’s GitHub page:

Description:
- Trained three models (a Deep Bag of Frames “DBoF” network on detailed frame-level features, plus logistic regression and multilayer perceptron “MLP” models on summarized video-level features) and evaluated each on three styles of feature input (visual only, audio only, and combined audio/visual) to determine the most accurate way to assign multiple “tag” labels to videos
- Found the video-level logistic regression approach to be optimal, achieving a gAP score of 0.37 on the combined audio/visual input and an 8% improvement in the RGB-only input’s Hit@1 score compared to our benchmark paper (a minimal sketch of this kind of video-level model is shown below)
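For illustration only, here is a minimal, hypothetical sketch of a video-level multi-label logistic regression of the kind described above: one sigmoid output per tag over concatenated audio/visual features. This is not the project’s actual code; the feature dimensions (1024-d RGB, 128-d audio) and the tag-vocabulary size are assumptions in the style of the YouTube-8M setup, not figures from the report.

```python
# Hypothetical sketch: a video-level multi-label logistic regression,
# i.e. one independent sigmoid "tag" classifier per label.
import numpy as np
import tensorflow as tf

RGB_DIM, AUDIO_DIM = 1024, 128   # assumed video-level feature sizes
NUM_TAGS = 3862                  # assumed tag-vocabulary size

model = tf.keras.Sequential([
    tf.keras.Input(shape=(RGB_DIM + AUDIO_DIM,)),          # concatenated audio/visual features
    tf.keras.layers.Dense(NUM_TAGS, activation="sigmoid"),  # one logistic unit per tag
])
# Binary cross-entropy trains each tag's classifier independently,
# which is what allows a video to receive multiple tags at once.
model.compile(optimizer="adam", loss="binary_crossentropy")

# Toy stand-in data: random features and sparse multi-hot labels.
x = np.random.rand(64, RGB_DIM + AUDIO_DIM).astype("float32")
y = (np.random.rand(64, NUM_TAGS) > 0.999).astype("float32")
model.fit(x, y, epochs=1, verbose=0)

# Per-tag probabilities; thresholding or taking the top-k yields the final tags.
tag_probs = model.predict(x, verbose=0)
```

The visual-only and audio-only variants would differ only in the input width, which is what makes comparing the three feature-input styles straightforward.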
Report File:
Access this project’s Google Slides presentation: