Kristen Grauman
The University of Texas at Austin
Score: 9 · h-index: 99 · 36,396 citations
Top CV Researcher — Rank #6 (top 10)
Professor of Computer Science
Contributions
visual recognition, egocentric vision, embodied AI, video understanding
Why Selected
A leading figure in visual recognition and egocentric/embodied vision, with strong recent visibility in video-centered vision research.
Score Breakdown
- historical impact: 2
- recent visibility: 3
- current influence: 2
- asset availability: 2
- total: 9
Frontier Research Map
Why Now
Useful because it highlights that the frontier is not just image-text; it is temporally grounded multimodal perception rooted in action.
Key Ideas
- Egocentric video is a privileged route to affordances, intent, and physical interaction.
- Audio is a strong but underused signal for grounding actions and state changes in video.
- Current large multimodal models still miss temporal structure and evidence grounding in long video.
Open Questions
- What is the right unit of understanding in video: frame, clip, state change, or action program?
- How should multimodal systems represent causally meaningful sound rather than mere correlation?
- Can egocentric video become for embodiment what web text was for language models?
Canonical CV Leaders (high confidence)