Quantification of audio/video de-synchronization with reliable confidence scores
de Thibault, Adrien
Promoter(s) : Louppe, Gilles
Date of defense : 30-Jun-2025/1-Jul-2025 • Permalink : http://hdl.handle.net/2268.2/23370
Details
| Title : | Quantification of audio/video de-synchronization with reliable confidence scores |
| Translated title : | [fr] Quantification de la désynchronisation audio/vidéo avec des indices de confiance fiables |
| Author : | de Thibault, Adrien |
| Date of defense : | 30-Jun-2025/1-Jul-2025 |
| Advisor(s) : | Louppe, Gilles |
| Committee's member(s) : | Massoz, Quentin; Cioppa, Anthony; Leduc, Guy |
| Language : | English |
| Number of pages : | 66 |
| Keywords : | [en] AV synchronization; computer vision; Synchformer; confidence score; deep learning; fine-tuning |
| Discipline(s) : | Engineering, computing & technology > Computer science |
| Target audience : | Researchers; Professionals in the field; Students |
| Institution(s) : | Université de Liège, Liège, Belgique |
| Degree: | Master en ingénieur civil en informatique, à finalité spécialisée en "intelligent systems" |
| Faculty: | Master thesis of the Faculté des Sciences appliquées |
Abstract
[en] Audio-video desynchronization remains a significant challenge in broadcast and multimedia applications, as even a slight offset can degrade the viewer experience. This master's thesis focuses on the evaluation and improvement of Synchformer, a state-of-the-art classifier model for estimating temporal offsets in 'in the wild' video content. A second focus is placed on assessing the uncertainty associated with the model's predictions.
To achieve finer and more reliable predictions, Synchformer was fine-tuned for temporal offset regression. These fine-tuned models offer consistent performance across the full range of temporal offsets, addressing a limitation of the original discrete-class approach. In parallel, various confidence measures were explored to quantify prediction certainty. Some measures are based on predictions from Synchformer, while others use a history of predictions to assess the consistency of the model over time. Based on these measures, classification techniques were developed to distinguish between reliable and unreliable predictions.
Experiments conducted on SyncST, a newly introduced dataset of over 1500 broadcast video clips, showed that, when considering discrete offsets, reliable predictions of Synchformer used as a classifier can be identified with a precision of 92%, corresponding to an error tolerance of 200 ms.
In a regression setting, where offsets are continuous and the error tolerance is set to 170 ms to match the limit of what humans can detect, this reliability classification achieves a precision of 75% on fine-tuned versions of Synchformer. With an error tolerance of 275 ms, which is an acceptable error for viewers, the precision reaches 90%. Given the low performance of these models at predicting desynchronization below what is humanly perceivable or acceptable, these contributions improve the trust that can be placed in such a synchronization model, making it better suited to real-world applications where low error tolerance and interpretability are critical.
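As an illustration of the evaluation described in the abstract, the sketch below computes the precision of a reliability classifier under a given error tolerance, together with one possible history-based confidence score. It is a minimal sketch and not the thesis implementation: the function names, the `scale_ms` parameter, the spread-based confidence formula, and the sample values are all assumptions made for this example.

```python
from statistics import pstdev


def history_confidence(offset_history_ms, scale_ms=100.0):
    """Hypothetical consistency score in (0, 1].

    Assumption: a model whose recent offset predictions agree with each
    other is more trustworthy. The score is 1 when all past predictions
    are identical and decreases as their spread grows.
    """
    if not offset_history_ms:
        raise ValueError("need at least one past prediction")
    spread = pstdev(offset_history_ms)  # population standard deviation, in ms
    return 1.0 / (1.0 + spread / scale_ms)


def reliability_precision(pred_ms, true_ms, confidence, threshold, tolerance_ms):
    """Precision of the 'reliable' label.

    Among predictions whose confidence is at least `threshold`, return the
    fraction whose absolute error is within `tolerance_ms`. Returns None
    when no prediction is flagged as reliable.
    """
    kept = [(p, t) for p, t, c in zip(pred_ms, true_ms, confidence) if c >= threshold]
    if not kept:
        return None
    within = sum(1 for p, t in kept if abs(p - t) <= tolerance_ms)
    return within / len(kept)


# Made-up values, for illustration only (offsets in milliseconds).
preds = [100, -300, 50, 400]
truths = [0, -250, 300, 380]
confs = [0.9, 0.8, 0.95, 0.3]

for tolerance in (170, 275):
    precision = reliability_precision(preds, truths, confs, threshold=0.5, tolerance_ms=tolerance)
    print(f"tolerance {tolerance} ms: precision {precision:.2f}")
```

Widening the tolerance from 170 ms to 275 ms can only keep or increase the number of flagged predictions counted as correct, which is consistent with the higher precision reported at the larger tolerance.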
File(s)
Document(s)
TFE_report_adth.pdf
Description:
Size: 6.43 MB
Format: Adobe PDF
Annexe(s)
classification_impossible_offsets.png
Description:
Size: 32.78 kB
Format: image/png
confidence_classification_diagram.png
Description:
Size: 560.94 kB
Format: image/png
mae_vs_undetectable_accuracy.png
Description:
Size: 258.47 kB
Format: image/png
performance_thresholding_regression.png
Description:
Size: 75.08 kB
Format: image/png
synchformer_model.png
Description:
Size: 72 kB
Format: image/png
Synchformer_without_git.zip
Description: The archive containing the code for this master's thesis
Size: 119.39 MB
Format: Unknown
TFE_AV_synchronization_abstract.pdf
Description:
Size: 273.88 kB
Format: Adobe PDF
The University of Liège does not guarantee the scientific quality of these students' works or the accuracy of all the information they contain.