Podcasts are among today's fastest-growing forms of digital content. They carry rich information, but exploiting it at scale remains challenging: they consist of unstructured audio that cannot be easily searched, analyzed, or filtered.
As part of our recent research and technological work, we developed a fully automated system that transforms podcasts into structured, analyzable, and recommendation-ready content, leveraging state-of-the-art Artificial Intelligence technologies.
From audio to written information
Our system is built on an end-to-end processing pipeline that:
- converts audio into text through automatic speech recognition,
- processes and cleans the data,
- analyzes the text using NLP techniques to extract topics and key concepts, and
- recommends relevant content to users.
The result is a collection of podcasts that can now be searched and organized based on their actual meaning, rather than just titles or tags.
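To make the pipeline steps above more concrete, here is a minimal Python sketch of the text-side stages (cleaning, key-term extraction, and recommendation). It is an illustration under stated assumptions, not the production system: the transcripts are invented stand-ins for speech recognition output, the stopword list and all function names are hypothetical, and plain TF-IDF with cosine similarity is used as a simple substitute for whatever models the real system employs.

```python
import math
import re
from collections import Counter

# Hypothetical transcripts standing in for the output of the
# speech recognition step (the real ASR stage is not shown here).
TRANSCRIPTS = {
    "ep1": "Students discuss climate change and recycling at school.",
    "ep2": "A radio show about recycling, climate action and clean energy.",
    "ep3": "Interview with a football coach about training and teamwork.",
}

# Tiny illustrative stopword list (assumption, not from the source).
STOPWORDS = {"a", "an", "and", "the", "about", "at", "with", "of", "to", "in"}


def clean(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = {key: clean(text) for key, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for key, tokens in tokenized.items():
        term_freq = Counter(tokens)
        vectors[key] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_terms(vector, k=3):
    """Return the k highest-weighted terms as rough 'key concepts'."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def recommend(episode_id, vectors, k=1):
    """Rank the other episodes by similarity to the given one."""
    scores = [
        (other, cosine(vectors[episode_id], vec))
        for other, vec in vectors.items()
        if other != episode_id
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scores[:k]]


VECTORS = tfidf_vectors(TRANSCRIPTS)

if __name__ == "__main__":
    print(top_terms(VECTORS["ep1"]))
    print(recommend("ep1", VECTORS))
```

In this toy data, the first two episodes share the terms "climate" and "recycling", so they are ranked as similar to each other, while the sports interview shares no vocabulary with either. A real deployment would replace the toy transcripts with actual speech recognition output and would likely use richer text representations than word counts.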
Artificial Intelligence with real-world impact
This project is a clear example of how the combination of Data Engineering, Machine Learning, and NLP can deliver meaningful solutions to real-world problems. Rather than relying on isolated models, we designed an architecture that operates at scale and is capable of supporting production environments.
For organizations that manage large volumes of audio or multimedia content, solutions of this kind enable improved content discovery, enhanced user experience, and new opportunities for data-driven value creation.
A collaboration with tangible results
The development of the system was carried out through close collaboration between a research team from the International Hellenic University and European School Radio, within the framework of the European Kids Radio Europe project, combining expertise in data analytics, artificial intelligence, and distributed systems. The outcome is not merely a research study, but a functional technological solution with clear practical and business value. The technology developed by our team is already in production on europeanschoolradio.eu and youthradio.eu.
For those interested in the technical details of the approach, including the methodology and system architecture, the full publication is available here:
https://www.mdpi.com/3042-6308/2/1/1

