The world may have pivoted to video, but not everyone in the world can see or hear those videos. Facebook, which has been captioning the content of images for a while, is now moving on to video captioning. In a paper presented at the EMNLP 2018 conference in Brussels, Facebook researchers introduced VideoStory, a new dataset of annotated videos, built by painstakingly applying 160,000 timestamped descriptions to about 20,000 high-engagement Facebook videos. The team then used the dataset to train a recurrent neural network to generate its own descriptions of videos, with a second neural network that looked at earlier and later descriptions to keep the final outputs in context. The results were mixed (babies mistaken for dogs, that sort of thing), but it’s a fascinating and promising foray into what could one day make the appreciation of viral video more inclusive.
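To make the idea of timestamped descriptions concrete, here is a minimal sketch of what such annotations might look like and how a player could surface them; the function name, field layout, and example captions are invented for illustration and are not from the VideoStory paper.

```python
# Illustrative only: a toy "timestamped descriptions" structure, where each
# annotation pairs a description with the span of the video it covers.
# All names and spans here are hypothetical, not drawn from the dataset.

def captions_at(annotations, t):
    """Return the descriptions whose [start, end) span covers time t (in seconds)."""
    return [text for start, end, text in annotations if start <= t < end]

# A toy annotated video with three timestamped descriptions.
video_story = [
    (0.0, 4.0, "A dog runs into the yard."),
    (4.0, 9.0, "The dog catches a frisbee."),
    (9.0, 12.0, "A child laughs and claps."),
]

print(captions_at(video_story, 5.0))  # → ['The dog catches a frisbee.']
```

In the paper's setup, annotations like these serve as training targets, and the surrounding (earlier and later) descriptions supply the context the second network uses to keep each generated caption coherent with the rest of the video.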