Modeling speech errors by analogy

Abstract

Analogical modeling (AM) is a paradigm for investigating analogy-based reasoning, including but not restricted to natural language use. A wide array of AM applications has studied phenomena in phonology, morphology, language evolution, diacritization, and vocabulary usage. To our knowledge, no work has used AM to generate full sentences or to model speech production. In this paper we present efforts to model speech errors by second-language learners of Japanese (J2L). We describe the J2L spoken language data that we have collected and characterize the kinds of speech errors the data exhibit in full sentences. We then discuss how we overcame the data paucity problem by using AM to bootstrap our data into a much larger J2L error corpus. To quantify the effectiveness of our approach, we use automatic speech recognition (ASR) to score the spoken utterances and compare the results with human scoring of the same items. We show that, when the ASR system is enhanced with the extended speech error corpus, it does a better job of scoring the test responses.

