Journal
Publication Date
Dec 24, 2025
Authors
Abstract
The rapid growth of video content across domains calls for intelligent systems that can automatically analyze videos and summarize their key events. This work presents an end-to-end framework that combines 5G connectivity, Automatic Speech Recognition, Optical Character Recognition, and Large Language Models to generate concise, context-aware video summaries and textual overviews, so that users can access and understand the content without watching the full videos.