Multimodal First-Person Activity Recognition and Summarization
If you have a question about this talk, please contact Dr. Philip Weber.

First-person (egocentric) videos are captured by a camera worn on a person and reflect the first-person perspective. In these videos, the observer is directly involved in the events, and the camera undergoes large amounts of ego-motion. In typical third-person videos, by contrast, the camera is usually stationary and positioned away from the actors involved in the events. These characteristics make it difficult to apply existing approaches directly to first-person videos and call for different solutions. Beyond the video stream itself, additional modalities can contribute complementary information. Audio is an important example: it is readily accessible and supports the detection of different activities and interactions. On the other hand, fusing different modalities also introduces new challenges. In this talk, I will discuss the current state of the art and the particular challenges in analysing multimodal first-person data.

This talk is part of the Speech Recognition by Synthesis Seminars series.