Search Distance

[Image: Statue in Rodin's garden in Paris, France]

In thinking about how metadata is necessarily used in searching for most non-text items, I keep coming back to a related concept: how far we are from what we seek in a search. In some cases all we can use is metadata. For example, when we search for an image, we cannot present an image to a search engine and say, “Find me one that’s exactly like or similar to the one I showed you.” Image interpretation is not regularly successful.

The notion of search distance is not the same as precision. For example, giving the latitude and longitude of a location to mapping software always yields precise results, and we are using metadata in that case. Supplying enough quoted material to uniquely identify a poem to a full-text search engine will return a relevant result, provided the poem is in the engine’s database. Finally, consider a database in which only keyword searching is permitted and the keywords are established by experts. We can expect precise results here too, even though we may be using terms that do not appear directly in the object we seek. If and when we get to the point where we can search based on meaning rather than only on matching, we’ll be able to get relevant results regularly.

For resources where we search only the primary items in the collection, using no metadata, we are doing a search of zero distance. Note that this does not guarantee relevance or precision. Where we use metadata that is part of the encoding process, the search distance is two. For example, looking for files of type mp3 with a bit rate of 128 kbps, a sample rate of 44.1 kHz, and the title imagine.mp3 can yield a precise result using the metadata.

As a side note, there is some progress in doing contextual searches in multimedia. The site blinkx “uses a unique combination of patented conceptual search, speech recognition and video analysis software to efficiently, automatically and accurately find and qualify online video.” That said, consider this quote from the Wikipedia entry on Video Search Engine: “It is generally acknowledged that visual search into video does not work well and that no company is using it publicly. Researchers at UC San Diego and Carnegie Mellon University have been working on the visual search problem for more than 15 years, and admitted at a “Future of Search” conference at UC Berkeley in the Spring of 2007 that it was years away from being viable even in simple search.”
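To make the mp3 example above a bit more concrete, here is a minimal Python sketch of a search that looks only at encoding metadata and never at the audio itself. The `AudioFile` record, the `CATALOG` entries, and the `search_by_metadata` helper are all made up for illustration; a real collection would read these fields from the files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFile:
    """Encoding metadata for one file (hypothetical record layout)."""
    title: str
    file_type: str
    bit_rate_kbps: int
    sample_rate_khz: float


# Hypothetical catalog; the entries are invented for this example.
CATALOG = [
    AudioFile("imagine.mp3", "mp3", 128, 44.1),
    AudioFile("imagine.mp3", "mp3", 320, 44.1),
    AudioFile("imagine.ogg", "ogg", 128, 44.1),
    AudioFile("yesterday.mp3", "mp3", 128, 48.0),
]


def search_by_metadata(catalog, **criteria):
    """Return the files whose metadata fields equal every given criterion."""
    return [
        f for f in catalog
        if all(getattr(f, field) == value for field, value in criteria.items())
    ]


matches = search_by_metadata(
    CATALOG,
    file_type="mp3",
    bit_rate_kbps=128,
    sample_rate_khz=44.1,
    title="imagine.mp3",
)
print(matches)
```

With enough metadata fields supplied, only one record in this toy catalog survives the filter, which is the sense in which a metadata-only search can still be precise.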

A search distance of three is where text or other items are added to a resource, and we search only the added description, summary, or tags. This seems to be possibly the least precise case. One can cite exceptions where the tags come from a limited vocabulary and everyone applying the terms agrees on the definitions of the items in the vocabulary. In the earlier days of the Web, some folks would fill Web pages with inappropriate invisible terms. This would make the pages turn up in naive search engines, even though the pages had no visible relation to the search terms. The same can happen if the vocabulary is applied in a haphazard or even deceitful way. Yet we need not conclude that tags applied by anyone will yield completely irrelevant results. Indeed, we’ve seen through several examples how the collective wisdom of a group tagging bookmarks, photos, or other items yields a useful set of tags, even if the process is unwieldy.
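Here is a rough Python sketch of the tag-only case, contrasting a naive tag match with one that leans on group agreement. Everything in it is an assumption made for illustration: the `TAGGINGS` data is invented, and the rule "a tag counts only if at least `min_users` people applied it" is just one simple way to model collective tagging, not something any particular site is known to do.

```python
from collections import Counter

# Hypothetical data: for each item, one set of tags per user who tagged it.
TAGGINGS = {
    "photo_rodin_garden": [
        {"rodin", "sculpture", "paris"},
        {"sculpture", "garden", "paris"},
        {"rodin", "statue", "paris"},
    ],
    "spam_page": [
        # A single user stuffing popular terms onto an unrelated page.
        {"rodin", "paris", "sculpture", "cheap", "tickets"},
    ],
}


def consensus_tags(user_tag_sets, min_users=2):
    """Keep only the tags that at least `min_users` different users applied."""
    counts = Counter(tag for tags in user_tag_sets for tag in tags)
    return {tag for tag, n in counts.items() if n >= min_users}


def search_tags(taggings, term, min_users=1):
    """Return the items whose (optionally consensus-filtered) tags include `term`."""
    return sorted(
        item for item, tag_sets in taggings.items()
        if term in consensus_tags(tag_sets, min_users)
    )


naive = search_tags(TAGGINGS, "paris")                 # any single tagger counts
filtered = search_tags(TAGGINGS, "paris", min_users=2)  # needs two taggers to agree
print(naive)
print(filtered)
```

The naive search returns the stuffed page alongside the photo, while requiring agreement between two taggers drops it. The threshold is crude and would also discard good tags on items that few people have tagged.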

Sorry for the ramble. Just trying to get my hands on the issue.
