Kristen Grauman
The University of Texas at Austin
Score: 9 · h-index: 99 · 36,396 citations
Top CV Researcher — Rank #6 (top 10)
Professor of Computer Science
Contributions
visual recognition, egocentric vision, embodied AI, video understanding
Why Selected
A leading figure in visual recognition and egocentric/embodied vision, with strong recent visibility in video-centered vision research.
Score Breakdown
- historical impact: 2
- recent visibility: 3
- current influence: 2
- asset availability: 2
- total: 9
Frontier Research Map
Why Now
Useful because it highlights that the frontier is not just image-text; it is temporally grounded multimodal perception rooted in action.
Key Ideas
- Egocentric video is a privileged route to affordances, intent, and physical interaction.
- Audio is a strong but underused signal for grounding actions and state changes in video.
- Current large multimodal models still miss temporal structure and evidence grounding in long video.
Open Questions
- What is the right unit of understanding in video: frame, clip, state change, or action program?
- How should multimodal systems represent causally meaningful sound rather than mere correlation?
- Can egocentric video become for embodiment what web text was for language models?
Canonical CV Leaders (high confidence)