Generating a concise and informative video summary from a long video is important, yet subjective, because the importance of a scene varies from user to user. Letting users specify scene importance through text queries makes such summaries more relevant. This paper addresses query-focused video summarization, which aims to align video summaries closely with user queries. To this end, we propose the Fully Convolutional Sequence Network with Attention (FCSNA-QFVS), a novel model designed for this task. Leveraging temporal convolution and attention mechanisms, our model extracts and highlights the content relevant to a user-specified query. Experimental validation on a benchmark dataset for query-focused video summarization demonstrates the effectiveness of our approach.
Overview of FCSNA-QFVS. Given a long video and a text query as input, we first divide the video into non-overlapping shots and group them into non-overlapping segments. Next, we pass the segmented video features to the feature learning module, where we learn visual features using eight sequential convolutional blocks. We then process these learned visual features through Local Self-Attention (LSA), Query-Guided Segment Attention (QGSA), and Global Attention (GA) to obtain locally important and globally query-guided features. We restore the original temporal length using two sequential deconvolutional layers. The feature learning network outputs the learned shot features, which we then pass to the shot scoring module to obtain a query relevance score for each shot. Finally, we generate the query-focused video summary based on these shot scores.
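The input partitioning and the final selection step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the fixed shot length, the fixed number of shots per segment, and the top-k selection rule are all assumptions made for illustration, and the feature learning and attention modules are omitted entirely.

```python
from typing import List, Sequence


def split_into_shots(num_frames: int, shot_len: int) -> List[range]:
    """Divide frame indices into consecutive, non-overlapping shots.

    Assumption: shots have a fixed length; the last shot may be shorter.
    """
    return [
        range(start, min(start + shot_len, num_frames))
        for start in range(0, num_frames, shot_len)
    ]


def group_into_segments(num_shots: int, shots_per_segment: int) -> List[List[int]]:
    """Group shot indices into consecutive, non-overlapping segments.

    Assumption: segments hold a fixed number of shots; the last may hold fewer.
    """
    return [
        list(range(start, min(start + shots_per_segment, num_shots)))
        for start in range(0, num_shots, shots_per_segment)
    ]


def select_summary(scores: Sequence[float], budget: int) -> List[int]:
    """Pick the `budget` highest-scoring shots and return them in temporal order.

    Assumption: top-k selection stands in for the summary generation step,
    which the overview does not specify.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical example: a 10-frame video, 4 frames per shot, 2 shots per segment.
shots = split_into_shots(num_frames=10, shot_len=4)
segments = group_into_segments(num_shots=len(shots), shots_per_segment=2)

# Placeholder query relevance scores, one per shot (not model output).
shot_scores = [0.2, 0.9, 0.5]
summary_shots = select_summary(shot_scores, budget=2)
```

In this sketch the summary is returned as shot indices in temporal order, so the selected shots can be concatenated directly into a summary video.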
We thank the Computer Engineering Department at L. D. College of Engineering, Ahmedabad, for providing access to NVIDIA GPUs, which were used extensively for conducting our experiments.
@misc{patel2024interestsummariesqueryfocusedlong,
title={Your Interest, Your Summaries: Query-Focused Long Video Summarization},
author={Nirav Patel and Payal Prajapati and Maitrik Shah},
year={2024},
eprint={2410.14087},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.14087},
}