Research

In the preface of the initial IM2 Proposal, dated March 15, 2000, we stated that:

"By studying, inventing and developing technologies that support multimodal interaction with computers, this NCCR will become a focal point of excellence in the development of related technologies such as human-computer dialogue, computer vision, machine learning, and automatic speech and speaker recognition. By doing so it will carry out the highest quality computer science research on problems of importance to business and society."

We believe that most of the goals set forth in the initial IM2 Proposal have been met, and often surpassed, during the first four years of the NCCR. The first years of IM2 have contributed significantly to the development of a new research field, multimodal processing, which is now viewed as increasingly important at the international level. This is particularly true in the EU, where the topic is clearly present in the 6th Framework Programme (FWP) and should be further emphasized in the 7th FWP. IM2 is now recognized worldwide for its contributions in related areas such as speech and language understanding, computer vision, multi-channel processing and fusion, and multimedia indexing. Furthermore, IM2 was among the first projects to focus on multimodal meeting recordings, a topic that is now attracting growing attention and has led to several new projects worldwide. IM2 is also among the first to work on large multimodal databases and to make them publicly and readily available. Thus, IM2 is not only contributing significantly to the field, but is also in a good position to establish international research and development standards. Finally, while the general theme of IM2 (multimodal processing and multimedia information management) is of fundamental interest to both the scientific and industrial worlds, each of its research components has its own potential for advanced research and technology transfer, as has already been concretely demonstrated within IM2.

This success was mainly due to the definition of a common, challenging application, namely smart meeting rooms, which fostered a great deal of collaboration between several of the IM2 partners. Consequently, to build upon these achievements, IM2 will keep the same focus, further emphasizing the highest possible quality, cross-disciplinarity, collaboration between the IM2 partners, and large-scale (international) evaluations.

In the coming years, IM2 will thus address the following themes:

Multimodal input interface, including speech signal processing (natural speech recognition, speaker tracking, segmentation, and recognition) and visual input (e.g., shape tracking, face and gesture recognition, printed document processing, and handwriting recognition).
Integration of modalities and coordination among modalities, including (asynchronous) multi-channel processing (e.g., audio-visual tracking), integration of knowledge sources (expert fusion), and multimodal language modeling.
Meeting dynamics and human-human interaction modeling, including the definition of meeting scenarios, the analysis of human interaction, and multimodal dialogue modeling.
Content abstraction, including multimodal information indexing, summarizing, and retrieval.
Technology transfer through exploration and evaluation of advanced end-user applications, evaluating the advantages and drawbacks of the above functionalities in different prototype systems.
Internal and external training activities, also integrating large international exchange training programs (e.g., EU projects and ICSI/Berkeley).

Last modified 2005-11-08 14:52
