Please – No More Questions! I’ll Just Rewrite the Document.
Mark Anderson, writing for The Stack, wrote a very interesting article, ‘Science Goes in Search of Your Lost Files’. Gangli Liu of the Department of Computer Science at Tsinghua University in China has developed a wizard-based, interactive tool that poses questions to end users who can’t find what they are looking for. Based on the responses, the tool modifies its algorithms to home in on the possibilities.
The first set of criteria is based on existing metadata: does the end user remember the author, the date, whether the document was printed, any keywords, the file name, or where they last saw it? For most searches this would be the first iteration in the search process. I’m not sure a wizard would do any more than end users would do themselves, except prompt for any additional information they recall.
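To make that first round of questions concrete, here is a minimal Python sketch of narrowing a candidate list by whatever the user remembers. It is only an illustration of the idea, not the tool described in the article: the `Doc` record, its field names, and the `narrow` function are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    # Hypothetical metadata record; the field names are illustrative only.
    file_name: str
    author: str
    modified: date
    printed: bool
    keywords: set[str] = field(default_factory=set)


def narrow(docs, answers):
    """Keep documents that match every answer the user was able to give.

    `answers` maps a field name to the remembered value. Questions the
    user skipped are simply absent, so they do not filter anything.
    """
    def matches(doc):
        for name, remembered in answers.items():
            actual = getattr(doc, name)
            if name == "keywords":
                # Every remembered keyword must be present on the document.
                if not set(remembered) <= actual:
                    return False
            elif actual != remembered:
                return False
        return True

    return [d for d in docs if matches(d)]


docs = [
    Doc("budget_q1.xlsx", "Lee", date(2015, 3, 2), True, {"budget", "forecast"}),
    Doc("budget_q2.xlsx", "Lee", date(2015, 6, 1), False, {"budget"}),
    Doc("roadmap.docx", "Patel", date(2015, 4, 9), False, {"roadmap"}),
]

# The user remembers the author and that the file was printed.
hits = narrow(docs, {"author": "Lee", "printed": True})
print([d.file_name for d in hits])
```

Each extra answer can only shrink the list, which is essentially what a user does by hand when adding terms to a search box.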
The second set of criteria digs a little deeper, and also involves gathering metadata that is not currently available as standard. The example in the article was estimating the percentage of the document the end user read. The tool also tries to identify links between existing metadata, such as a ‘link between a file author’s name and the same person in the user’s contacts’.
I’m quite sure I have oversimplified the tool. Even so, have we come to expect that we need a wizard to ask us a series of questions to find a document? In a nutshell, the wizard aggregates existing metadata and then tries to find links that would identify a relationship between the query and the answer. However, this also involves indexing to find the relationship. For an organization with millions of documents, I would question the scalability, as well as the performance cost of indexing millions of documents for all end user searches.
Would end users be willing to submit to several questions to locate the appropriate document? The effectiveness relies on the quality of the metadata available. In this instance, if the basic metadata doesn’t return the correct response, then the wizard must use (or create) a new algorithm to answer the user’s query. We solve the metadata generation issue through the ability to automatically generate compound term metadata, which is metadata that consists of four to five words that represent a concept. The end user is not involved in the tagging, unless authorized. The end result is the ability to identify what the user is seeking simply from the concept string, or keyword, the end user entered. If the result isn’t quite what the user is looking for, faceted search will display the hierarchy of the taxonomy so the user can find possibilities and explore relationships among documents that otherwise would never have occurred to them.
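As a rough illustration of how compound term tags and faceted search fit together, consider the Python sketch below. It is not our product's implementation, only a toy under stated assumptions: the tags, the taxonomy branches, and the `search` and `facets` functions are invented for the example, and a plain substring match stands in for real concept matching.

```python
from collections import Counter

# Hypothetical data: each document carries automatically generated
# compound term tags, and each tag sits under one taxonomy branch.
TAGS = {
    "doc1": ["supplier contract renewal terms", "annual departmental budget forecast"],
    "doc2": ["supplier contract renewal terms"],
    "doc3": ["employee travel expense policy"],
}
TAXONOMY = {
    "supplier contract renewal terms": "Legal",
    "annual departmental budget forecast": "Finance",
    "employee travel expense policy": "Finance",
}


def search(query):
    """Return ids of documents that have a tag containing the query text."""
    q = query.lower()
    return sorted(d for d, tags in TAGS.items() if any(q in t for t in tags))


def facets(doc_ids):
    """Count the matching documents under each taxonomy branch."""
    counts = Counter()
    for d in doc_ids:
        # A document counts once per branch, however many tags it has there.
        for branch in {TAXONOMY[t] for t in TAGS[d]}:
            counts[branch] += 1
    return dict(counts)


hits = search("contract")
print(hits)
print(facets(hits))
```

Here a search for "contract" matches two documents, and the facet counts show that one of them also sits under Finance, which is the kind of relationship a user can follow without answering any questions.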