In this last work package, we propose to work on the automatic diagnosis of transcription errors in end-to-end systems. The results obtained from WP 2, which is mainly analysis-oriented, will guide the automatic approaches we propose here. We will first define automatic strategies for detecting errors in end-to-end ASR systems (sub-task 3.1). Given the known limits of the word error rate (WER) as an evaluation metric, we will then design new evaluation strategies, in particular oriented toward the reception and cognitive impact of errors on users (sub-task 3.2). Finally, complementing sub-task 3.1, we will work on an area still little explored in end-to-end systems, namely confidence measures (sub-task 3.3).
Sub-task 3.1 Automatic detection of transcription errors in end-to-end systems. We will propose original approaches for the automatic detection of errors in end-to-end transcription systems. The motivation for this task is twofold: 1) to provide initial results, currently non-existent, on error detection in end-to-end ASR systems, and 2) to validate the observations and analyses made in WP 2 concerning how transcription errors manifest themselves in end-to-end ASR systems.
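As a minimal illustration of the supervision such a detector could be trained on (our own sketch, not a method proposed in this work package), the snippet below aligns a reference transcript with an ASR hypothesis to derive word-level correct/erroneous labels; the function name and example sentences are hypothetical.

```python
from difflib import SequenceMatcher

def label_hypothesis_errors(reference: str, hypothesis: str):
    """Align reference and hypothesis words and label each
    hypothesis word as correct (0) or erroneous (1).
    Insertions count as errors; deletions leave no hypothesis
    word to label."""
    ref, hyp = reference.split(), hypothesis.split()
    labels = [1] * len(hyp)  # erroneous until matched to the reference
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 0  # exact match with a reference word
    return list(zip(hyp, labels))

print(label_hypothesis_errors(
    "the cat sat on the mat",
    "the cat sat in the bat"))
# [('the', 0), ('cat', 0), ('sat', 0), ('in', 1), ('the', 0), ('bat', 1)]
```

Labels of this kind are exactly what is needed both to train an error detector and to check WP 2's observations against system output at scale.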
Sub-task 3.2 Proposal of evaluation metrics. Many methods have been proposed to evaluate automatic transcription systems. Even if it is of course difficult to do without the classic WER metric, since it is so widely used by the scientific community to evaluate systems, we wish to position our work in the perspective of metrics complementary to the WER, as is done in machine translation, where classical metrics such as BLEU coexist with more qualitative ones based on manual evaluation by experts. The main objective of this sub-task is above all a more user-oriented evaluation, one that really puts the actual performance of ASR systems into perspective.
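For reference, the classic WER counts substitutions (S), deletions (D), and insertions (I) against the number of reference words (N): WER = (S + D + I) / N. A minimal, self-contained implementation via edit-distance dynamic programming could look as follows (our own sketch of the standard metric, not one of the complementary metrics to be proposed):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions)
    divided by the number of reference words, computed with
    standard Levenshtein dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat in the bat"))  # 2/6 = 0.333...
```

The limitation motivating this sub-task is visible in the example: the two errors are weighted identically, regardless of how differently a reader would receive them.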
Sub-task 3.3 Confidence measures for end-to-end ASR systems. The final task we want to carry out concerns the proposal of confidence measures for end-to-end ASR systems. The advantage of having worked on evaluation upstream of this task is that we will be able to compare our confidence scores against what is currently done, such as Normalized Cross Entropy (NCE), but also to integrate a user-oriented evaluation. Indeed, we consider that having good word-level confidence measures is important.
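For comparison purposes, NCE measures how much a set of word confidence scores improves over a baseline that always predicts the empirical correct-word rate. A minimal sketch, assuming word confidences in (0, 1) and binary correctness labels with both classes present (all names are ours):

```python
import math

def nce(confidences, correct):
    """Normalized Cross Entropy of word confidence scores.
    confidences: per-word scores in (0, 1); correct: matching
    booleans (True if the word is correctly transcribed).
    NCE = (H_max - H_conf) / H_max, where H_max is the entropy
    of the baseline that always outputs the average correct rate.
    Assumes 0 < p_c < 1, i.e. both correct and incorrect words occur."""
    n = len(confidences)
    p_c = sum(correct) / n  # empirical probability a word is correct
    h_max = -(p_c * math.log2(p_c) + (1 - p_c) * math.log2(1 - p_c))
    h_conf = -sum(
        math.log2(c) if ok else math.log2(1 - c)
        for c, ok in zip(confidences, correct)
    ) / n
    return (h_max - h_conf) / h_max

# Confident and mostly right: NCE > 0, i.e. better than the baseline.
print(nce([0.9, 0.8, 0.95, 0.3], [True, True, True, False]))
```

Positive NCE means the confidence scores are better calibrated than the constant baseline; perfect scores give NCE = 1.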