MI: Multimodal Input and Modality Integration

IDIAP, Speech Processing Group (IP Leader): Samy Bengio
ETHZ/TIK, Speech Processing Group: Beat Pfister
CSEM: P. Celka
UniGe/FBML (Functional Brain Mapping Lab): R. Grave, S. Gonzalez, C.M. Michel
UniGe/CVML (Computer Vision and Multimedia Lab): S. Voloshynovsky, T. Pun
Other: J. Millan (currently EPFL)

Context and Goals
The goals of IM2.MI are the research and development of principled methods for the fusion and efficient decoding of different input modalities (multi-channel processing). The objectives can thus be decomposed as follows:

  • Development of new multi-channel, multi-rate, signal processing techniques (including EEG processing)
  • Development of new data fusion algorithms, as well as new decision strategies
  • Development of a multichannel statistical model for the combination of asynchronous input streams
  • Development of an efficient multimodal decoder
  • Implementation of all these algorithms into a common software platform.
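As a concrete illustration of the data fusion objective, the sketch below shows one classical decision-level strategy: combining the class posteriors produced by two modalities with per-stream reliability weights. This is a minimal, hypothetical example; the stream names, weights, and log-linear combination rule are illustrative assumptions, not the project's actual fusion algorithm.

```python
import math

def fuse_posteriors(streams, weights):
    """Weighted log-linear fusion of per-modality class posteriors.

    streams: list of dicts mapping class label -> posterior probability
    weights: per-stream reliability weights (e.g. tuned on held-out data)
    """
    labels = streams[0].keys()
    # Log-linear combination: weighted sum of log posteriors per class.
    scores = {label: sum(w * math.log(s[label])
                         for s, w in zip(streams, weights))
              for label in labels}
    # Renormalise the combined scores back to a probability distribution.
    m = max(scores.values())
    exp_scores = {l: math.exp(v - m) for l, v in scores.items()}
    z = sum(exp_scores.values())
    return {l: v / z for l, v in exp_scores.items()}

# Hypothetical posteriors from an acoustic and a visual (lip) stream.
audio = {"yes": 0.7, "no": 0.3}
video = {"yes": 0.6, "no": 0.4}
fused = fuse_posteriors([audio, video], weights=[0.8, 0.2])
```

Weighting the acoustic stream more heavily reflects a common design choice in audio-visual tasks, where the reliability of each modality can be estimated on a development set.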

Research Issues
Over the next two years, IM2.MI will address the above goals by focusing on the following tasks:

  • Investigation of new data fusion algorithms. Beyond the classical (but static) data fusion algorithms such as Artificial Neural Networks or Support Vector Machines, we intend to adapt models such as HMMs (and their many hybrid variants) or Bayesian Networks to handle dynamic data fusion problems.
  • Development and testing of new multi-channel processing schemes for multimodal inputs, including further research and development of state-of-the-art fusion models based on multichannel approaches, such as those recently proposed at IDIAP, which have started to be used for audio-visual speech recognition.
  • Efficient multimodal decoding taking into account asynchronous sequences. As the different input streams might be asynchronous (i.e. convey related information at a different time or even at a different time scale), it is important to search for efficient decoding algorithms that can handle such asynchrony. Existing decoding algorithms can become exponential with respect to the number of streams and time duration.
  • Investigation of asynchronous fusion decision strategies. Once again, multiple decisions can be taken at different times, and should take into account the asynchrony of the streams in order to optimize a global criterion.
  • Preparation of software for multimodal input integration as well as multimodal decoding, based on an existing platform: IDIAP has recently released a new machine learning platform which already contains many classical algorithms (HMMs, GMMs, MLPs, SVMs, etc.), and we intend to integrate new multimodal input integration algorithms into this platform in order to obtain an integrated testing environment.
  • Interaction between multimodal inputs and the dialogue module (IM2.MDM), using the provided constraints to simplify the search during decoding.
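The asynchronous decoding task above hinges on keeping the search polynomial rather than exponential. A minimal sketch of this idea, assuming a simple DTW-style dynamic program rather than the project's actual HMM-based decoder, aligns two streams in O(T·U) time by letting either stream stall while the other advances:

```python
def dtw(stream_a, stream_b, cost):
    """Align two asynchronous streams by dynamic programming.

    Enumerating all alignments grows exponentially with stream length;
    the table below reduces this to O(len(a) * len(b)).
    """
    T, U = len(stream_a), len(stream_b)
    INF = float("inf")
    # dp[i][j]: best cost of aligning the first i frames of stream_a
    # with the first j frames of stream_b.
    dp = [[INF] * (U + 1) for _ in range(T + 1)]
    dp[0][0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            step = min(dp[i - 1][j],      # stream_b stalls (asynchrony)
                       dp[i][j - 1],      # stream_a stalls (asynchrony)
                       dp[i - 1][j - 1])  # both streams advance together
            dp[i][j] = cost(stream_a[i - 1], stream_b[j - 1]) + step
    return dp[T][U]

# Illustrative use: the second stream conveys the same content more slowly.
frame_cost = lambda x, y: abs(x - y)
d = dtw([1, 2, 3], [1, 2, 2, 3], frame_cost)
```

The same table structure underlies more elaborate decoders: replacing the frame cost with per-state log-likelihoods and adding transition scores turns this into a two-stream Viterbi search over a product state space.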

In the framework of this project, a White Paper project on Brain Machine Interface has also been defined. Addressing a particular kind of (advanced) man-machine interface, this project aims at investigating the possibility of classifying spontaneous brain activity based either on reconstructed brain activity maps or directly on EEG recordings (thus, a particular kind of multi-channel processing).
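To make the classification step concrete, the following is an illustrative sketch only, not the project's actual pipeline: each multi-channel EEG window is reduced to a per-channel variance feature and labelled by a nearest-class-mean rule. The feature choice, labels, and class means are all hypothetical.

```python
def features(window):
    """Reduce an EEG window (list of channels, each a list of samples)
    to one variance value per channel."""
    feats = []
    for channel in window:
        mean = sum(channel) / len(channel)
        feats.append(sum((x - mean) ** 2 for x in channel) / len(channel))
    return feats

def nearest_mean(feats, class_means):
    """Return the label whose mean feature vector is closest (squared
    Euclidean distance) to the observed features."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(class_means, key=lambda label: sq_dist(feats, class_means[label]))

# Hypothetical two-channel window and class prototypes.
window = [[0, 2, 0, 2], [0, 2, 0, 2]]
prototypes = {"rest": [0.0, 0.0], "move": [1.0, 1.0]}
label = nearest_mean(features(window), prototypes)
```

In practice, spatial and spectral features (e.g. band power over motor-related frequency bands) and stronger classifiers would replace these placeholders, but the window-to-features-to-decision structure is the same.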

Regarding multi-channel processing and multimodal integration, the expected outcomes of this project over the first two years can be summarized as follows:

  • Development of a baseline multimodal integration system and tests on a predefined task.
  • Addition of new fusion algorithms and decision strategies to the baseline system.
  • Integration of minimum syntactic constraints into the baseline system.
  • Addition of new decoding algorithms for asynchronous sequences under various constraints.

Regarding the related project on Brain Machine Interface, the expected deliverables for the first two years have been set as follows:

  • Development of basic instrumentation and pre-processing
  • Three dimensional electromagnetic imaging
  • Analysis and classification of brain activity signals

Software download:



White Papers

Quarterly status reports
Available on the local site (password protected).

Last modified 2006-02-03 15:56
