A major issue with evaluation metrics in language processing is that they are designed to measure a proposed solution globally against a reference, with the main objective of making systems comparable with one another. The choice of evaluation metric is often crucial, since the research undertaken to improve these systems relies on it. Although automatic systems such as speech transcription are aimed at end users, those users are ultimately little studied: the impact of automatic errors on humans, and the way these errors are perceived at the cognitive level, has not been studied and has therefore not been integrated into the evaluation process.
The DIETS project, funded by the Agence Nationale de la Recherche (2021-2024) and led by the Laboratoire Informatique d’Avignon, focuses on the diagnosis and evaluation of end-to-end automatic speech recognition (ASR) systems based on deep neural network architectures, integrating how humans receive transcription errors from a cognitive point of view. The challenge is twofold:
1) To analyze ASR errors in detail from the perspective of human reception.
2) To understand and detect how these errors manifest themselves in an end-to-end ASR framework, whose operation is inspired by how the human brain works.
The DIETS project aims to push the current limits of our understanding of end-to-end ASR systems and to initiate new research that takes a cross-disciplinary approach (computer science, linguistics, cognitive sciences…) by putting humans back at the center of the development of automatic systems.