
Dysfunctional Search and the FBI, any of it sound familiar?

I think ‘dysfunctional search’ is a great name. Unfortunately, it appears that the FBI wins the prize, though I am sure many other organizations also feel that their search is dysfunctional. An article in Techdirt, ‘How The FBI’s Dysfunctional Search Systems Keep Information Out Of FOIA Requesters’ Hands’, did provide a chuckle, simply because it is just too late to take the US government seriously anymore.

To keep this short: Trentadue versus the FBI deals with a requested release of videotapes containing footage of the Oklahoma City bombing. During the first four days of testimony, it was revealed that the FBI has ‘convenient’ information silos instead of a cohesive repository for search. The problem is that the person requesting the information must specify the correct records system for a comprehensive search to take place; otherwise, the FBI typically searches only the main repository. In addition, the requester must explicitly ask for a ‘cross-reference’ check, which covers records that may mention the subject but are not stored in the main repository. Finally, the now beleaguered requester must also send a request to each field office involved, because the FBI’s ‘Records Information Dissemination’ has no cross-links to any office other than the original field office.

What about internal search at the FBI? The Central Records System (CRS), as it turns out, is not really a central repository. It accepts three different methods of search, which return three different sets of documents. One of those methods, Automated Case Support (ACS), is used to search the CRS, but that search isn’t unified either. To make matters worse, ACS is itself split into three components. I think I’ll stop there, as it just gets worse and worse, really it does. Oh, one more tidbit: the FBI decides what keywords to use.

I would imagine, or at least sincerely hope, that most organizations do not have a search environment like the FBI’s. But enterprises do have silos of information, and many have no integrated way to search across multiple repositories, whether through a software product that spans repositories or through federated search. This should be a basic function. According to an AIIM study, only 18% of organizations have cross-repository search capabilities. Maybe the FBI should provide training lessons.
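For readers who have not seen federated search up close, here is a minimal sketch of the idea in Python: one query is sent to every repository, and the results are merged into a single ranked list. Everything in it is an assumption made for illustration, including the `Hit` and `Connector` names, the toy keyword scoring, and the sample documents. It is not how any particular product, or the FBI, does it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    repository: str  # which silo the document came from
    doc_id: str
    score: float     # fraction of query terms found in the document


# A "connector" (hypothetical name) is any function: query -> list of hits.
Connector = Callable[[str], List[Hit]]


def make_keyword_connector(name: str, docs: Dict[str, str]) -> Connector:
    """Build a toy connector over an in-memory dict of doc_id -> text."""
    def search(query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            matched = sum(1 for term in terms if term in words)
            if matched:
                hits.append(Hit(name, doc_id, matched / len(terms)))
        return hits
    return search


def federated_search(query: str, connectors: List[Connector]) -> List[Hit]:
    """Send one query to every repository and merge the results."""
    merged: List[Hit] = []
    for connector in connectors:
        merged.extend(connector(query))
    # Highest score first; ties broken by repository and id for stable output.
    return sorted(merged, key=lambda h: (-h.score, h.repository, h.doc_id))


# Two illustrative silos with made-up content.
main_repo = make_keyword_connector("main", {
    "m1": "bombing investigation summary",
})
field_office = make_keyword_connector("field_office", {
    "f1": "videotape evidence log",
    "f2": "bombing videotape chain of custody",
})

results = federated_search("bombing videotape", [main_repo, field_office])
for hit in results:
    print(hit.repository, hit.doc_id, hit.score)
```

The point of the sketch is that the requester asks once. Searching only `main_repo` would miss the best match, which lives in the field office silo.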

Does your organization have cross-repository search capabilities or federated search?


Office 365 Compliance Search for Email and Content - Good but not Good Enough

According to our third annual Microsoft Survey, the use of Exchange is almost a given. So is the rise of data breaches, which are most likely caused by your own employees. In Exchange, potential exposure can be identified through Compliance Search, which enables administrators to search for common strings such as social security numbers, credit card numbers, or account numbers. The searches can be saved and re-executed. Concept Searching adds value to the identification of private or confidential information, regardless of where it resides, because it is not limited to defined descriptors such as a social security number but can include any descriptor and verbiage that you want secured.

Most security products, including Office 365 Compliance Search, will identify the most likely standard descriptors used by most organizations. That doesn’t always work. For Official Use Only (FOUO) material, new product information, competitive information, intellectual property, patents, or specific customer information may all be confidential, but identifying them is not easy, as each subject may not have a common denominator to use as a rule. What to do then?

Concept Searching lets the organization quickly define rules that contain descriptors (such as a social security number) and/or associated verbiage. Because we generate multi-term metadata that forms a concept, the organization faces no limits or bottlenecks when trying to secure specific information. Once found, the content can be redirected to a secure repository, removed from search, and have its portability prevented, all using Office 365 or SharePoint tools. Pretty cool. Rules are easily added, deleted when no longer necessary, and changed as the content the organization considers confidential changes. In SharePoint, taxonomies can be deployed, and when a document is found to contain a potential exposure, its content type is automatically changed and it is classified against the taxonomy. This works in real time, when content is created or ingested, and across diverse repositories, including SharePoint and Office 365. You name it, you’re totally covered.
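To make the idea of a rule that combines descriptors and/or verbiage concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the `Rule` class, the `classify` helper, the social security number pattern, and the sample phrases (including the made-up ‘Project Falcon’) are all hypothetical. It is not the Concept Searching rule engine or the Office 365 Compliance Search API, and plain phrase matching is far simpler than multi-term concept metadata.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    """A hypothetical rule: fires on any pattern OR any phrase."""
    name: str
    patterns: List[re.Pattern] = field(default_factory=list)  # descriptors
    phrases: List[str] = field(default_factory=list)          # verbiage

    def matches(self, text: str) -> bool:
        if any(p.search(text) for p in self.patterns):
            return True
        lowered = text.lower()
        return any(phrase in lowered for phrase in self.phrases)


# Descriptor: digits shaped like a US social security number (illustrative;
# it does not validate that the number was ever actually issued).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

rules = [
    Rule("ssn", patterns=[SSN_PATTERN]),
    # Verbiage with no common numeric descriptor, e.g. an unreleased product.
    Rule("unreleased_product", phrases=["project falcon", "launch pricing"]),
]


def classify(text: str, active_rules: List[Rule]) -> List[str]:
    """Return the names of every rule the text triggers."""
    return [rule.name for rule in active_rules if rule.matches(text)]


print(classify("Employee SSN: 123-45-6789", rules))
print(classify("Draft launch pricing for Project Falcon", rules))
print(classify("Lunch menu for Friday", rules))
```

Because the rules are just data, adding, deleting, or changing one is a one-line edit, which is the property the paragraph above describes.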


Precision versus Recall – What is old becomes new again

During my research I often find little snippets of information that make me stop and think about how ideas, theories, and processes are repeated. I imagine a highway being built that stretches endlessly toward the horizon, only for us to return to the starting point. It seems to be happening more often lately.

Even with technology we are still seeing history repeat itself. Enterprise search has been around for about 67 years, as described by J.E. Holmstrom in 1948. Machine learning, or artificial intelligence, has been around for 61 years and is now becoming the newest buzzword and must-have technology. Precision and recall were introduced in 1955, when a gentleman named Allen Kent joined Case Western Reserve University. That same year, Kent and his colleagues published a paper in American Documentation describing the precision and recall measures and detailing a proposed “framework” for evaluating an information retrieval system, which included statistical sampling methods for determining the number of relevant documents not retrieved.

Over three generations have passed, and what is ‘old’ is now ‘new’. Precision and recall are now back in the news, at least in the legal industry. What brought this to mind is an article I read in Legaltech News, written by Zach Warren. It’s a good read regardless of industry, as on almost every point he hits the nail on the head.

Years ago, the accuracy of search was measured by precision versus recall. In fact, we have several clients who use our tools to tweak and manage precision versus recall. Why? One is considered one of the top three global analyst firms, and it needs precision and recall on its external client web site: poor search results equal lost revenue. Another client has 170K global users and needs accurate search results. The image from Wikipedia illustrates precision and recall in an easy-to-understand graphic.
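For anyone who wants the arithmetic behind the two measures, here is a short worked example in Python. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name `precision_recall` and the document ids are made up for illustration.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Compute (precision, recall) for one query's result set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant AND retrieved

    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# The search returned four documents; five documents are actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision = 2/4 = 0.50, recall = 2/5 = 0.40
```

The tension between the two shows up as soon as you tune the search: returning more documents tends to raise recall and lower precision, and returning fewer tends to do the opposite.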

These days, our own clients aside, I don’t think it is used much. I also agree with the writer that most tools don’t let you easily manipulate precision versus recall. It seems to be a forgotten metric in search efficiency. Luckily, our tools are easy to use, and although precision and recall is a tough nut to crack, it’s not like it used to be. Nice to see it back around again, at least in the legal industry.

