When generating search results, search engines represent documents by metadata such as title and URL, but they also include a brief summary or extract so that users can quickly determine whether the document is relevant to their query. In the text domain, extracting representative text segments is straightforward. Regular expressions can be used to efficiently identify text segments matching the user’s query terms, to highlight them with markup, and to locate blanks between words where long sentences can be broken.
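The text-domain operations described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular search engine works: the function name `make_snippet`, the `width` parameter, and the choice of `<b>` tags for markup are all assumptions made for the example, and the input is assumed to be plain text.

```python
import re


def make_snippet(text, query_terms, width=40):
    """Return a short extract around the first query-term hit, with hits marked up."""
    terms = [re.escape(term) for term in query_terms if term]
    if not terms:
        return text[: 2 * width].strip()

    # One case-insensitive pattern that matches any query term as a whole word.
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)

    if match is None:
        # No hit: fall back to the opening of the document.
        start, end = 0, min(len(text), 2 * width)
        keep_from, keep_to = end, start
    else:
        # Take roughly `width` characters of context on each side of the hit.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        keep_from, keep_to = match.start(), match.end()

    # Snap both cut points to blanks so that no word is split.
    if start > 0:
        blank = text.find(" ", start, keep_from)
        if blank != -1:
            start = blank + 1
    if end < len(text):
        blank = text.rfind(" ", keep_to, end)
        if blank != -1:
            end = blank

    # Highlight every query-term hit inside the extract (assumed <b> markup).
    body = pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text[start:end])
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + body + suffix
```

For example, `make_snippet("The quick brown fox jumps over the lazy dog", ["fox"], width=10)` returns `...brown <b>fox</b> jumps...`. Nothing comparably simple exists for video, which is the contrast the rest of this section draws.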
More sophisticated processing can remove redundancy to form more meaningful extracts. In the video domain, extraction or summarization methods are not well defined and require complex video processing. The time required to preview video limits the number of search results that a user is willing to view. Evaluating the relevance of a particular document is more time-consuming for video than for text.
Neglecting HTTP site response time, text documents load within a second or two, and users may be able to judge almost instantly whether the page is worth reading. If it is, quickly spot-checking several points in the document is usually enough to determine whether it satisfies the query. For video, a much larger amount of data must be downloaded and buffered before playback starts. Once the video starts, the relevant content that the user is looking for is typically not in the first few seconds of playback.
Video is normally consumed in a lean-back mode, so content creators devote more time to lead-in material to pique the viewer’s interest. If a viewer attempts to seek past this content, re-buffering must take place, and the viewer is unlikely to arrive at the desired location on the first attempt. The long lead time required to evaluate document relevance frustrates users of video search.
Duplicate or near-duplicate pages in Web search results can frustrate users, who repeatedly see pages that they have already rejected as irrelevant to their query intent. In the text domain, exact duplication is trivial to detect, and there are well-accepted methods for determining document similarity (e.g., based on edit distance) that are reasonably efficient to compute and can be used to detect near duplicates.
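As an illustration of the edit-distance approach mentioned above, the following sketch computes the Levenshtein distance between two strings and turns it into a normalized similarity score. The function names and the 0.9 threshold are assumptions chosen for the example, not values taken from any particular system. The cost is proportional to the product of the two string lengths, so the sketch is best suited to short texts such as titles or extracts.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def is_near_duplicate(a, b, threshold=0.9):
    """Flag two texts as near duplicates (the threshold is an assumed value)."""
    return similarity(a, b) >= threshold
```

With these definitions, `edit_distance("kitten", "sitting")` is 3, and two strings that differ by a single character out of about thirty are flagged as near duplicates at the assumed threshold.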
Duplicate videos in query result lists present even more of a problem for video search engine users. Videos take a significant amount of time to start playing, and the delay becomes intolerable when users encounter duplicates in query result sets. Sometimes cues from metadata and thumbnails are enough for users to recognize duplicates, but not always.
Duplicates are common in video search applications, since a single source of video, say a television broadcast, may be captured by several viewers and posted to numerous sites. Also, the same video may be broadcast repeatedly or at different times for different television markets, so even if the recording time and broadcast channel of a captured video clip are available and accurate, that may not be enough to determine whether the content is duplicated.
Twenty-four-hour news channels often rebroadcast footage of breaking news and may intersperse this with new video as it becomes available. Video duplicate detection is an algorithmic challenge, and proposed algorithms are computationally intensive. Often a duplicate clip is posted to sharing sites with differing metadata.
Ranking and Indexing
Text information retrieval techniques, including ranking and document indexing algorithms, are mature, and off-the-shelf solutions that perform efficiently at scale are available. Video indexing is an emerging technology: widely accepted techniques are not yet available, and those that exist may not operate at the scale necessary for practical Web video search. Often the algorithms are domain-specific and cannot be applied to arbitrary, unknown video content.