Journal

Publication Date

Dec 24, 2025

Authors

Abstract

The rapid growth of video content across domains calls for intelligent systems that can automatically analyze and summarize the key events in that content. This work develops an end-to-end framework that combines 5G connectivity, Automatic Speech Recognition (ASR), Optical Character Recognition (OCR), and Large Language Models (LLMs) to generate concise, context-aware video summaries and textual overviews, so that users can grasp a video's content without watching it in full.