
Scene Analysis

SA: Scene Analysis

EPFL, Signal Processing Institute, Prof. Pierre Vandergheynst and Dr. Jean-Philippe Thiran
EPFL, Mathematics Institute, Dr. Michel Bierlaire
University of Geneva, Computer Vision and Multimedia Laboratory, Dr. Slava Voloshynovskiy and Prof. Thierry Pun
University of Bern, Institut für Informatik, Prof. Horst Bunke
EPFZ, Institut für Bildverarbeitung, Prof. Luc Van Gool
IDIAP, Martigny, Computer Vision Group, Dr. Daniel Garcia-Perez and Dr. Jean-Marc Odobez

Context and Goals
Automated scene analysis is a long-standing goal that is being realized in small, incremental steps. These steps are becoming increasingly sophisticated, yet the ultimate goal remains out of reach. Any working solution, even in a restricted domain, will help solve everyday problems that cannot yet be fully entrusted to a machine.

Research Issues
The initial research issues of IM2.SA have been on:

  • Intelligent segmentation
    Multimodal segmentation may be achieved by progressively integrating unimodal segmentation techniques where each performs best. This integration will be performed either by combining the outputs of the unimodal techniques through a voting scheme, or by defining rich feature vectors that capture the global significance of the data at once.
  • Grouping
    Grouping is a crucial stepping stone in vision between low-level feature extraction and high-level scene interpretation, as it binds together image parts into entities with a higher semantic content.
  • Face and gesture analysis
    We will address the problem of robust face detection and recognition in adverse conditions, e.g. under varying pose, with complex backgrounds, in non-uniform lighting, or under occlusion.
  • Handwriting recognition
    The aim is to recognize general, unconstrained handwritten text in a writer-independent fashion.
  • Assessment
    It is difficult to assess segmentation strategies independently of a particular application context. Results from intelligent and multimodal segmentation, and from handwriting recognition, will be assessed on individual databases, including the one to be developed during the course of the project. They will be included in the search engines of IP.IIR and evaluated by means of the common benchmarks that will be developed in this IP (in common with various international research groups). Results from the grouping module will be evaluated using already established databases. Results from face and gesture analysis will be assessed by means of existing databases (for faces) and new ones that will be developed in Year 1 of the project.
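As an illustration of the voting scheme mentioned under intelligent segmentation, unimodal segmentation results can be fused by a per-pixel majority vote over their label maps. This is a minimal sketch under our own assumptions; the function name and toy data are illustrative, not project code:

```python
import numpy as np

def majority_vote(label_maps):
    """Combine several unimodal segmentation label maps by a per-pixel
    majority vote; each map assigns one integer class label per pixel."""
    stack = np.stack(label_maps, axis=0)  # shape: (n_maps, H, W)
    # For every pixel, pick the label that most maps agree on.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, stack)

# Toy example: three 2x2 label maps from different modalities.
a = np.array([[0, 1], [1, 1]])
b = np.array([[0, 1], [0, 1]])
c = np.array([[1, 1], [1, 0]])
print(majority_vote([a, b, c]))  # -> [[0 1]
                                 #     [1 1]]
```

Ties are broken in favour of the lowest label here; a weighted vote (e.g. by per-modality confidence) would be a natural refinement.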

These research issues were recently reinforced by five new White Paper (WP) proposals on:

  • Applied stochastic image modeling
    The goal is to develop a new generalized framework for joint transform-domain stochastic image modeling, serving as a theoretical basis for practical solutions to classical image-processing tasks such as compression, denoising and segmentation, which are of high importance for IM2.
  • Information Theoretical Multimodal Signal Processing
    Multimodal information is not merely the aggregation of several signals from different modalities; it is also characterised by their mutual relationships. It is therefore important to investigate a general framework for extracting feature subspaces with mutual correspondence among signals and for analyzing their statistical relationships in detail.
  • Behavioural model-based scene analysis
    The goal is to define a new methodology for designing behavioural models based on discrete choice theory to predict the movement of individuals in a scene, to calibrate them using real video sequences, and to include these mathematical models in segmentation algorithms for the robust analysis of scenes involving human beings.
  • People tracking and activity recognition using multiple cameras
    The goal is to develop new methods for simultaneously tracking people and recognizing their activity. The system will be based on multiple camera input and will use sequential Monte Carlo methods and graphical models.
  • A self learning visual tracker
    The idea here is to design a tracker that starts off as a generic tracker but evolves towards optimized tracking of specific objects that have been indicated to the system by the user. This type of tracker will be more robust with respect to losing the tracked object or to changes in lighting conditions.
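The information-theoretical WP above rests on quantifying statistical relationships between modalities. One standard measure is mutual information; the histogram estimator below is a minimal sketch under our own assumptions (function name and synthetic data are illustrative, not project code):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information I(X;Y), in bits,
    between two paired 1-D signals (e.g. features from two modalities)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y
    nz = pxy > 0                              # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(1)
a = rng.normal(size=5000)
b = a + 0.1 * rng.normal(size=5000)  # strongly dependent on a
c = rng.normal(size=5000)            # independent of a
# The dependent pair shares far more information than the independent one.
print(mutual_information(a, b), mutual_information(a, c))
```

Histogram estimators have a small positive bias for independent signals; more refined estimators (e.g. kernel-based) reduce it, but the histogram version suffices to compare feature subspaces.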
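The people-tracking WP relies on sequential Monte Carlo methods. The bootstrap particle filter below is a minimal 1-D sketch under assumed Gaussian noise models; all names, parameters and data are illustrative, not the project's actual tracker:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=500,
                    process_std=1.0, obs_std=2.0):
    """Minimal 1-D bootstrap particle filter: random-walk motion model
    with Gaussian process noise and a Gaussian observation likelihood."""
    # Initialize particles around the first observation.
    particles = rng.normal(observations[0], obs_std, n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate particles through the motion model.
        particles = particles + rng.normal(0.0, process_std, n_particles)
        # Update: weight particles by the observation likelihood.
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        estimates.append(float(np.sum(w * particles)))  # posterior mean
        # Resample: draw particles in proportion to their weights.
        particles = rng.choice(particles, n_particles, p=w)
    return estimates

# Track a target moving at constant velocity from noisy measurements.
true_pos = np.arange(0.0, 20.0, 1.0)
obs = true_pos + rng.normal(0.0, 2.0, true_pos.size)
est = particle_filter(obs)
```

The real system couples such filters with graphical models over multiple cameras and activity states; the 1-D version only shows the predict/update/resample cycle.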

Year 1:
Provide the basis of unimodal interactive segmentation techniques (consistent with the goal of IM2.IIR). Define and implement environment-sensing features using order-statistic filters, mathematical morphology and fuzzy logic. Obtain the automated extraction of ordered groupings consisting of a planar pattern repeated in an ordered way (the repetitions need not be coplanar). Define and collect a database including visual and acoustic data for lipreading and handwritten queries. Develop tools for graph extraction from the images in the database, and for transforming a user's query into a graph. Develop tools for basic handwriting recognition. Determine features for the self-learning tracker and the activity recognizer. Collect data for crowd environments. Decompose multimodal signals into various feature subspaces.
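The order-statistic filters planned for Year 1 can be illustrated with a direct (unoptimized) k×k rank filter, which defaults to the familiar median filter. This is a sketch under our own assumptions; the function name and toy image are illustrative:

```python
import numpy as np

def order_statistic_filter(img, k=3, rank=None):
    """Apply a k x k order-statistic filter: each output pixel is the
    rank-th smallest value in its neighbourhood (the median by default)."""
    if rank is None:
        rank = (k * k) // 2          # middle rank -> median filter
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + k, j:j + k].ravel()
            out[i, j] = np.sort(window)[rank]
    return out

# An isolated salt-noise pixel is removed by the 3x3 median.
img = np.zeros((5, 5), dtype=int)
img[2, 2] = 255
print(order_statistic_filter(img))  # the isolated spike disappears
```

Setting `rank=0` or `rank=k*k-1` yields the min and max filters, i.e. greyscale erosion and dilation, which connects order statistics to the mathematical-morphology features mentioned in the same work item.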

Year 2:
Implement various image processing algorithms and feature extraction methods. Develop various classifiers for the visual signal; their performance will be measured on the database collected in the first year of the project. Detect grouping hierarchies in the repetitions, e.g. a mirror symmetry both halves of which in turn consist of a periodic pattern. Develop tools for unconstrained handwriting recognition that go beyond the current state of the art. Apply the self-learning tracker to the smart meeting room to track documents. Integrate the sequential Monte Carlo filters into the Torch environment. Design multidimensional data approximation and stochastic/geometric model specification. Perform crowd segmentation and calibrate the behavioural models.

Year 3:
Integrate full multimodality into the segmentation process and develop new high-level scene understanding features (e.g. filters for texture analysis based on further modelling of the human visual system, HVS). Develop elastic graph matching procedures. Develop and test classifier combination methods for the visual signals, using the individual classifier performances of the previous year as a reference. Further develop sophisticated handwriting recognition tools. Generalize the self-learning tracker. Recognize the activity of a person in multi-camera scenarios. Address face image compression and denoising. Segment dense crowd scenes. Derive error probabilities for media conversion, multimodal data fusion and compression.

Year 4:
Establish full-fledged geometry-based grouping for ordered repetitions, yielding descriptions of their structures and unit cells. Benchmark and test the sketch query system and evaluate its performance depending on the settings of various system parameters. Combine, implement and test classifiers using both visual and acoustic information for lipreading. Develop robust handwriting recognition tools that can be used in a writer-independent fashion, based on large vocabularies, for database queries and other applications.


- EPFL, Signal Processing Institute
- UniGe, Computer Vision and Multimedia Laboratory
- UniBe, Institute of Computer Science and Applied Mathematics (IAM)
- EPFZ, Institut für Bildverarbeitung
- IDIAP, Martigny, Computer Vision Group
- EPFL, Mathematics Institute

A list of publications is available on this site.

Quarterly status reports
Available on the local site (password protected).

Last modified 2006-02-03 15:57
