Cees Snoek, a member of the Intelligent Systems Lab at the University of Amsterdam, talked about the future of video search. He begins by explaining how traditional, familiar video search engines work: via text queries. But Snoek points out that an interface working with text queries is insufficient to produce satisfying results. This way of searching may work for a simple query like "flower". Yet for a more complicated query like "Find shots of one or more helicopters in flight", the classical text-based search interface would not generate adequate results.
Furthermore, he explains that the problem with image or video search is that human beings, as cognitive animals, perceive semantic patterns when looking at something. Computers lack this ability, and Snoek therefore speaks of a semantic gap between machines and human beings, who can interpret what they perceive and translate it into semantic patterns. That human visual perception is a very complex task requiring considerable resources is underlined by the fact that visual perception takes up 50% of our cognitive capacity, while playing chess requires only 5%. In his research, Snoek tries to find ways to close this semantic gap and to label and name the world's visual information.
Cees Snoek presents a modern form of semantic video search engine called MediaMill. In his model he supplies the search engine with a large number of image fragments that can be connected with a particular search query. The engine then analyzes every image or video with respect to many different distinguishing features, such as color, texture, and shape. After this step the search engine determines a distinctive correlation between those features and the search query supplied by the user. This analysis is the basis for a statistical model – the semantic concept detector – which can be used to search a database for other pictures fitting the model (Example Video of Semantic Pathfinder). The results found by this model are presented to the user in what Snoek introduces as a CrossBrowser (Example Video here). The vertical axis shows the parts of a video detected by the search engine, while the horizontal axis presents the timeline of the specific video clip.
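MediaMill's real detectors combine many features with learned classifiers, but the basic idea of the pipeline described above can be illustrated with a loose sketch. Everything here is an assumption for illustration: synthetic 16×16 "images", color histograms as the only feature, and a simple nearest-centroid classifier standing in for the actual statistical model.

```python
import numpy as np

def color_histogram(image, bins=8):
    # Reduce an H x W x 3 RGB image to one normalized histogram per channel.
    feats = []
    for channel in range(3):
        counts, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
        feats.append(counts)
    vec = np.concatenate(feats).astype(float)
    return vec / vec.sum()

class ConceptDetector:
    """Toy stand-in for a trained semantic concept detector."""

    def fit(self, pos_feats, neg_feats):
        # "Training": remember the mean feature of positive and negative examples.
        self.pos = np.mean(pos_feats, axis=0)
        self.neg = np.mean(neg_feats, axis=0)
        return self

    def score(self, feat):
        # Higher score = closer to the positive examples than to the negatives.
        return np.linalg.norm(feat - self.neg) - np.linalg.norm(feat - self.pos)

rng = np.random.default_rng(0)
# Synthetic "flower" shots skew red; background shots skew blue.
flowers = [rng.integers(0, 256, (16, 16, 3)) for _ in range(5)]
for img in flowers:
    img[:, :, 0] = 220  # dominant red channel
others = [rng.integers(0, 256, (16, 16, 3)) for _ in range(5)]
for img in others:
    img[:, :, 2] = 220  # dominant blue channel

detector = ConceptDetector().fit(
    [color_histogram(i) for i in flowers],
    [color_histogram(i) for i in others],
)

# Retrieval: rank a small "database" by detector score, best match first.
database = others[:2] + flowers[:2]
ranked = sorted(database, key=lambda i: detector.score(color_histogram(i)), reverse=True)
```

In the real system the ranked shots would then feed a browsing interface like the CrossBrowser; here the sorted list simply puts the reddish "flower" images ahead of the bluish ones.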
In addition, Cees Snoek presents the VideOlympics, a contest in which search engine researchers compete in video searching. In front of a live audience, different teams try to get the best results in video retrieval for a given set of search queries (VideOlympics showcase video).