Call: Grand Challenge at ACM Multimedia 2011 on “Realistic Interaction in Online Virtual Environments”


EMC2 have teamed up with Huawei to present a Grand Challenge at ACM Multimedia 2011 on “Realistic Interaction in Online Virtual Environments”, which we believe may be of interest to you and your colleagues. The challenge provides a captured dataset that offers scope for research in a variety of areas, including the challenge’s main focus: “real-time realistic interaction between humans in online virtual environments”. The dataset consists of recordings of a number of Salsa dancers of differing expertise, captured in a variety of modalities, including microphones, cameras, inertial sensors and depth sensors. In addition, ground-truth annotations of the choreographies have been made. More details are given hereafter, or visit

Challenge Scenario

Consider an online dance class given by an expert Salsa dance teacher and delivered via the web. The teacher will perform the class with all movements captured by a state-of-the-art optical motion capture system. The resulting motion data will be used to animate a realistic avatar of the teacher in an online virtual dance studio. Students attending the online master-class will do so through their own individual avatars in the virtual dance studio. The real-time animation of each student’s avatar will be driven by whatever 3D capture technology is available to him/her. This could be visual sensing using a single camera or a camera network, wearable inertial motion sensing, or recent gaming controllers such as the Nintendo Wii or the Microsoft Kinect. The animation of the student’s avatar in the virtual space will be real-time and realistically rendered, subject to the granularity of representation and interaction available from each capture mechanism.

Available Dataset

Of course, EMC2 partners are not expecting participants in this challenge to recreate this scenario, but rather to work with a provided dataset to illustrate key technical components that would be required to realize this kind of online interaction and communication. To this end, a dataset consisting of multimodal recordings of Salsa dancers of differing expertise has been captured and is now available for download (see below for link). So far, this dataset consists of 15 dancers, each performing 2 to 5 fixed choreographies, but it is expected to grow to include more dancers over the next few weeks. Each dancer has been captured with:

  • Synchronised 16-channel audio capture of dancers’ step sounds, voice and music;
  • Synchronised 5-camera video capture of the dancers from multiple viewpoints covering the whole body, plus 4 additional non-synchronised video captures;
  • Inertial (accelerometer + gyroscope + magnetometer) sensor data captured from multiple sensors on the dancer’s body;
  • Depth maps for dancers’ performances captured using a Microsoft Kinect;
  • Original music excerpts;
  • Different types of ground-truth annotations, for instance annotations of the choreographies with reference step time codes relative to the music, and ratings of the dancers’ performances (by the Salsa teacher).

Full details, including instructions for downloading the data, can be found at
