Given all of these factors that make video search more difficult than text search, together with the fact that text search engines are far from perfect themselves, it may be surprising that successful video search systems have been deployed at all. This may be explained by considering areas where video search is less problematic than text search. Although browsing video results sets is more time-consuming than browsing text results, the human visual system can process images more quickly than text.
Therefore a first level of results set filtering can be nearly instantaneous. Of course, this assumes that a reasonable set of representative key frames has been extracted and can be rendered quickly. Users can scan arrays of these images to quickly select potentially relevant video segments. This process is not without error, since a single key frame cannot convey all of the information in a video clip, which is a sequence of frames.
However, users can make reasonably accurate general assessments of the global nature of the video given a single frame. For example, one can differentiate easily between broadcast television content and amateur video blog postings based on the quality and content of a single frame, particularly if certain cues such as text overlays are present.
Another factor in video search engines’ favor relates to the application areas and user expectations. Often video search is used for entertainment purposes in which an irrelevant video may be less of a problem than in the text domain.
Text search can also be used in a less task-oriented, more entertainment-like mode in which the user meanders away from the original search topic. With video search, however, the user fully expects that consuming the results of the search will take time, given the linear nature of the media. In this sense video search is a more forgiving task than text search, and users may be more tolerant of error in some applications.
On the other hand, applications such as education or research are not error-tolerant, and even entertainment applications will be improved by more accurate or personalized video search. If desired, some controlled semi-randomness can still be injected into the results set. Search activities can be classified into three broad categories: (1) browsing or exploring the collection; (2) finding an arbitrary video that satisfies the query; and (3) finding all relevant videos.
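One way to see how these three categories differ is in how much of a ranked results list each one needs. The following is only an illustrative sketch, not a description of any deployed system: the `Hit` record, the relevance `score` in [0, 1], the `threshold`, and the `page_size` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class SearchMode(Enum):
    BROWSE = "browse"      # (1) browsing or exploring the collection
    FIND_ANY = "find_any"  # (2) any one video that satisfies the query
    FIND_ALL = "find_all"  # (3) all relevant videos


@dataclass(frozen=True)
class Hit:
    video_id: str
    score: float  # hypothetical relevance score in [0, 1]


def select_results(hits, mode, threshold=0.5, page_size=3):
    """Return the portion of the ranked list that a given search mode needs."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if mode is SearchMode.BROWSE:
        # Browsing: show a page of top-ranked items, relevant or not.
        return ranked[:page_size]
    relevant = [h for h in ranked if h.score >= threshold]
    if mode is SearchMode.FIND_ANY:
        # One satisfying video is enough, so precision at the top matters most.
        return relevant[:1]
    # Find-all: every item above the threshold is wanted, so recall matters most.
    return relevant


hits = [Hit("a", 0.9), Hit("b", 0.2), Hit("c", 0.7), Hit("d", 0.6)]
any_one = select_results(hits, SearchMode.FIND_ANY)
everything = select_results(hits, SearchMode.FIND_ALL)
```

Under these assumptions, a find-any search is satisfied by the single best hit, while a find-all search must return every hit above the threshold, which is why the latter is far less tolerant of missed results.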