We propose an in-depth analysis of ASR errors from the original point of view of the end user. This work package is therefore a direct continuation of the work carried out to build the new corpus of the DIETS project (WP1), including the many perceptual tests on how humans receive errors. In the first sub-task, we will perform a detailed qualitative analysis of how users receive these errors, first analyzing human-made errors (manual transcriptions), then automatic transcription errors from both classical pipeline and end-to-end systems. The objective is to better understand transcription errors so that, in sub-task 2.2, we can investigate how errors are constructed in end-to-end ASR frameworks, in particular in the different layers of these deep learning systems. Finally, in the last sub-task, we propose new approaches for visualizing errors in end-to-end ASR systems. Note that the work carried out in sub-tasks 2.2 and 2.3 will be based on the LIA ASR systems, for which we control the complete processing chain and from which we can easily extract internal information, or which we can adapt if necessary.
Sub-task 2.1 Qualitative analysis of transcription errors. We will provide a detailed analysis of transcription errors, based on the corpus produced in WP1. Different levels of error analysis will be proposed using three types of transcriptions: manual ones, and automatic ones from both classical and end-to-end ASR systems.
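As an illustration of the most basic level of such an analysis, the sketch below aligns a reference transcription with a hypothesis at the word level and labels each position as correct, substitution, deletion, or insertion. It is a minimal example under stated assumptions, not the analysis protocol of the project: the function names `align_words` and `error_summary` are hypothetical, tokenization is a plain whitespace split, and all edit operations have unit cost.

```python
from collections import Counter


def align_words(reference, hypothesis):
    """Align two transcriptions word by word (unit-cost edit distance).

    Returns a list of (label, ref_word, hyp_word) tuples, where label is
    "correct", "substitution", "deletion", or "insertion".
    Hypothetical helper: whitespace tokenization is an assumption.
    """
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)

    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            label = "correct" if ref[i - 1] == hyp[j - 1] else "substitution"
            ops.append((label, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_summary(ops):
    """Count errors per category and compute the word error rate."""
    counts = Counter(label for label, _, _ in ops if label != "correct")
    # Reference length = every aligned position that consumes a reference word.
    n_ref = sum(1 for label, _, _ in ops if label != "insertion")
    wer = sum(counts.values()) / n_ref if n_ref else 0.0
    return counts, wer
```

Such an alignment only gives the surface category of each error; the qualitative levels targeted in this sub-task (for example, how severely an end user perceives a given error) would be layered on top of it.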
Sub-task 2.2 Construction of errors in end-to-end ASR systems. The information contained and conveyed in end-to-end ASR systems is harder to identify than in classical ASR systems, because of their deep architectures. In this sub-task, we will start by exploring the approaches currently proposed for understanding the information contained in these end-to-end systems.
Sub-task 2.3 Visualization of end-to-end ASR transcription errors. We will propose approaches to visualize transcription errors in end-to-end systems. The main objective is to build a cartography of errors based on the understanding and analysis of how end-to-end ASR systems construct them (sub-task 2.2). We want to detect the areas of the neural networks that are likely to produce errors. This would help to give a general idea of how well a system could perform, and to point out the potential error categories and how poorly they are received by end users.