

IM2.AP (Audio processing) - lay summary


Audio processing

 

IM2.AP
IP Head: John Dines (IDIAP)
Partners: IDIAP, ICSI.


The primary objective of the audio processing individual project (IP) is the research and development of audio processing techniques within the domain of IM2, with an emphasis on the meeting room environment. Work in audio processing is divided into the four key tasks listed below:

  • Automatic speech recognition

The automatic recognition of spoken language by machine, that is, speech-to-text transcription. Research in this task is primarily concerned with improving acoustic and language models. The main task setting is the meeting room environment, though fundamental research is also carried out in other domains. In addition to developing new and applied technologies in speech recognition, we are also actively involved in developing and disseminating software for speech processing research.

  • Speaker recognition, clustering and segmentation

The recognition of people from recordings of their speech. Related technologies also researched in this task are speaker diarization (determining who spoke when) and speaker localisation (determining where the speaker is located).

  • Microphone array processing for signal and ASR feature enhancement

Improving the quality of corrupted speech recordings, either for human listeners or for subsequent speech recognition. A major focus of this work is microphone array processing, in which the audio from several microphones is combined. Such processing is particularly useful for separating speech when several people are speaking simultaneously.

  • High-level modelling of spoken language

This task includes a variety of problems, such as semantic and syntactic modelling of spoken language and the classification of paralinguistic vocal features (e.g. sentence boundaries, prosody and laughter detection).
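This summary says only that microphone array processing combines the audio from several microphones. It does not say which algorithm IM2.AP uses. As a hedged illustration, here is delay-and-sum beamforming, the simplest textbook way of combining array channels. The idea is to shift each microphone's signal so that the target speaker's speech lines up across channels, then average. The aligned speech adds coherently, while uncorrelated noise and interfering talkers partly average out.

The sketch makes several assumptions:
- The per-microphone delays toward the target speaker are already known, in whole samples. In practice they are estimated, for example from time differences of arrival, and fractional delays need interpolation.
- The function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Hypothetical delay-and-sum beamformer with integer sample delays.

    channels[m][n] is sample n of microphone m. delays[m] is the (assumed
    known) number of samples by which the target speech arrives later at
    microphone m than at the reference. Each channel is advanced by its
    delay so the target speech lines up, then the channels are averaged.
    """
    if not channels:
        raise ValueError("need at least one channel")
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")

    num_mics = len(channels)
    # Keep only output samples for which every channel has data at n + delay.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []

    output = []
    for n in range(length):
        # Sum the time-aligned samples, then average over microphones.
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / num_mics)
    return output
```

For example, if a second microphone hears the same speech two samples later, `delay_and_sum([mic0, mic1], [0, 2])` realigns the two recordings before averaging. Integer delays and plain averaging keep the idea visible. Practical beamformers typically add fractional-delay filtering and per-channel weighting.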


Keywords: Speech recognition, speaker recognition, speaker diarization, microphone array processing, meeting room processing

Last modified 2008-05-19 08:35
 
