This table provides an overview of all Concept Searching Technology Platforms and the components for each platform.
|Standard Components||conceptClassifier for SharePoint 2013||conceptClassifier for SharePoint 2010||conceptClassifier for SharePoint 2007||conceptClassifier for Office 365 Platform||conceptClassifier Platform||Concept Searching Technology Platform|
|Compound Term Processing Engine – licensed for concept extraction only||yes||yes||yes||yes||yes||Full search functionality included|
|SharePoint Feature Set||yes||yes||yes||yes||no||yes|
|APIs, custom controls, demonstration source code||no||no||no||no||yes||yes|
|Proprietary controls for SharePoint 2007||no||no||yes||no||no||yes|
|Optional Components||conceptClassifier for SharePoint 2013||conceptClassifier for SharePoint 2010||conceptClassifier for SharePoint 2007||conceptClassifier for Office 365 Platform||conceptClassifier Platform||Concept Searching Technology Platform|
|conceptSearch||yes||yes||yes||yes||yes||Included in Base Product|
|conceptSQL||yes||yes||yes||yes||yes||Included in Base Product|
|Content Enrichment Service for SharePoint 2013||yes||no||no||no||no||yes|
|FAST Pipeline Stage for SharePoint 2010||yes||yes||no||no||no||yes|
|conceptClassifier for OneDrive for Business||yes||no||no||yes||no||no|
|Additional Classification Servers||yes||yes||yes||yes||yes||yes|
|Additional Front End Web Servers||yes||yes||yes||N/A||yes||yes|
Compound Term Processing Engine for Concept Extraction
“Incorporating Concept Searching’s compound term processing engine capability into AFMS Knowledge Management operations has increased exposure of enterprise content and the precision by which that information is retrieved.”
J.D. WHITLOCK, Lt Col, USAF, MSC, CPHIMS (Retd)
Air Force Medical Service
The challenge with metadata is both obvious and elusive. To harness the meaning of content, tools must be utilized that enable content to be managed and retrieved at the same rate that it is being created, ingested, and distributed. The fundamental factor in extracting knowledge from content is the quality of metadata which is used by many applications within an organization.
A lack of metadata, ambiguous metadata, and subjective metadata form the crux of many challenges in an organization, impacting not only search and retrieval but also records management, data privacy, migration, enterprise social networking, and eDiscovery and litigation.
What are the problems associated with metadata tagging?
- Insufficient metadata
- Ambiguous metadata
- Subjective metadata
- No metadata
- Relying solely on system generated metadata
- Relying on the end user to consistently select the correct metadata tags, for example from a drop-down list
Compound Term Processing Engine
Concept Searching’s industry-unique compound term processing technology is an adaptive and scalable technology platform that enables the identification and the correct weighting of multi-word concepts in unstructured text. This enables the rapid creation of semantic metadata, which can be classified to organizationally defined taxonomies. The tagging and auto-classification of content can be aligned to business goals, and the semantic metadata generated can be easily integrated with any third-party application or platform that can interface via web services.
For example, in a typical search scenario, many words have multiple meanings as depicted in the graphic. Depending on the search engine, an end user query may be matched on keywords only, on query words that appear close together (proximity), on results influenced by other end users' queries (boosting), or through other search enhancement technologies typically used by Internet search engines.
In this example using compound term processing, a search for ‘survival rates following a triple heart bypass’ will locate documents about this topic even if this precise phrase is not contained in any document. Compound term processing can extract the key concepts, in this case ‘survival rates’ and ‘triple heart bypass’ and use these concepts to select the most relevant documents, such as those containing ‘heart attack’ or ‘coronary artery surgery’.
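The following is a minimal sketch of the idea in the example above, not Concept Searching's implementation. The concept vocabulary `KNOWN_CONCEPTS`, the relatedness map `RELATED`, and the scoring function are all hypothetical stand-ins chosen for illustration; a real engine would derive multi-word concepts and their weights from the content itself.

```python
import re

# Hypothetical vocabulary of multi-word concepts (assumed for illustration).
KNOWN_CONCEPTS = {
    "survival rates",
    "triple heart bypass",
    "heart attack",
    "coronary artery surgery",
}

# Hypothetical map of related concepts (assumed for illustration).
RELATED = {
    "triple heart bypass": {"coronary artery surgery", "heart attack"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_concepts(text, vocabulary=KNOWN_CONCEPTS, max_len=3):
    """Return known multi-word concepts in order, preferring the longest match."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):
            window = tokens[i:i + n]
            if len(window) == n and " ".join(window) in vocabulary:
                found.append(" ".join(window))
                i += n
                break
        else:
            i += 1  # no concept starts at this token
    return found


def score(query, document):
    """Count document concepts that match the query concepts or their relatives."""
    query_concepts = set(extract_concepts(query))
    expanded = set(query_concepts)
    for concept in query_concepts:
        expanded |= RELATED.get(concept, set())
    return len(expanded & set(extract_concepts(document)))
```

Under these assumptions, a document that mentions 'coronary artery surgery' and 'survival rates' scores against the query even though it never contains the phrase 'triple heart bypass', while a document that merely shares the single words 'heart' and 'rates' does not.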
The Benefits of Compound Term Processing
- Out-of-the-box capabilities
- Adaptive technology that can generate conceptual metadata at source
- Eliminates end user tagging
- Correctly weights multi-word concepts in unstructured and semi-structured data
- Managed through the taxonomy component by Subject Matter Experts
- No need for programming resources or technical expertise
- Through the taxonomy component supports use of folksonomies, controlled vocabularies, and normalization of vocabulary
“With more than 30,000 current users, the MyMoffitt Patient Portal has seen significant growth, and of the new patients that come to Moffitt, 87% register for a patient portal account”
Director of Portal Technologies and Data Management
H. Lee Moffitt Cancer Center & Research Institute
Taxonomies are not new. Yet many organizations do not even have a defined structure to organize unstructured and semi-structured content. Typically, many organizations have home-grown solutions based on departments, locations, or business function. In the past, although not optimal, they worked. That is no longer the case. The resulting gaps lead to non-compliance, increased risk, and reduced organizational performance. Those that use taxonomies typically find them hard to deploy and manage, and a drain on valuable resources.
For solutions that use auto-classification, the classification is either highly general (for example, 'this document comes from Finance') or dependent on end user metadata or system-defined metadata. Without the ability to identify ‘concepts in context’, the hierarchical structure contains little value and, more importantly, the metadata is rendered useless to other applications that it could improve.
conceptTaxonomyManager has the capability to automatically group unstructured content together based on an understanding of the concepts and ideas that share mutual attributes while separating dissimilar concepts. This approach is instrumental in delivering relevant information via the taxonomy structure as well as using the semantic metadata in any business application that requires metadata. Using one or more taxonomies, unstructured content can be leveraged to improve any application that uses metadata.
|conceptTaxonomyManager can be used for a wide range of applications. In records management, the most cited reason for failure is a lack of end user acceptance of the need to tag documents of record appropriately. Although an organization may have a robust file plan and retention schedule, it is the end users who ultimately make it succeed or fail by applying appropriate tags. Since Concept Searching technologies can automatically generate intelligent metadata and automatically classify the content as it is created or ingested, they effectively enforce governance at the desktop. Corporate compliance initiatives cover a wide range of laws such as HIPAA, Sarbanes-Oxley, ITAR, and federal mandates. The processes to identify potential non-compliance exposures need to protect the organization, and reduce risk and legal ramifications. conceptTaxonomyManager can be used to automatically identify potential exposures within unstructured content. The taxonomy creates the standard for all content within the organization, regardless of whether it is used for search, records management, compliance, data protection, text analytics, or eDiscovery/FOIA.|
Workflow in conceptTaxonomyManager
conceptTaxonomyManager remains unique in the industry. It was designed for Subject Matter Experts, without the need for consultants, training algorithms, or technically trained staff. This graphic depicts the information flow using Concept Searching technologies.
Unique features not available in any other product include: automatic clue suggestion, document movement feedback, automated classification, and automatic generation of conceptual metadata. The features of conceptTaxonomyManager include the following:
conceptClassifier for SharePoint
conceptClassifier for Office 365
Concept Searching Technology Platform
conceptTaxonomyWorkflow is an optional Concept Searching component that can perform actions on a document following a classification decision when certain criteria are met. These actions enhance organizational performance and drive down costs, but more importantly enforce corporate and legal compliance guidelines.
The workflow source type works in SharePoint 2007, 2010, and 2013, as well as with all document types, including FILE and HTTP. This product is available in both SharePoint and non-SharePoint environments and has a plugin architecture enabling clients and integration partners to easily build plugins for both content sources and destination sources.
In any environment, metadata-driven policy actions on content, such as those used in migration, in identifying sensitive information, or in identifying documents of record, result in content being moved automatically to the environment of choice.
conceptTaxonomyWorkflow is available in an on-premise, cloud, or hybrid environment. The solution is also available in non-SharePoint and heterogeneous environments.
The conceptTaxonomyWorkflow module delivers workflow capabilities that enable intelligent automatic classification decisions during and after migration. To migrate document collections effectively, the text content of each document needs to be searched to determine its value. Migration must also consider the security of the documents as they are moved to their new location, apply the same security in the new location, and identify sensitive documents that may not currently be in a secure location.
conceptTaxonomyWorkflow delivers workflow capabilities that enable intelligent automatic detection of sensitive or confidential information in a document so that the document can be routed to a secure location. This process interrogates the text content of all documents for any organizationally defined sensitive terminology or vocabulary. Once identified, these sensitive documents are secured by routing them to the appropriate repository.
conceptTaxonomyWorkflow is also used to facilitate Records Management. This is accomplished by creating a taxonomy that mirrors the file plan. Content is auto-classified by identifying and assigning the correct record identifier and other organizationally defined descriptors, and is then automatically routed to an organization’s records management application via conceptTaxonomyWorkflow.
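The pattern described above, an action taken on a document after a classification decision when certain criteria are met, can be sketched as a small rule table. This is only an illustration: the `Classification` and `Rule` types, the taxonomy node names, the score thresholds, and the destination names are all hypothetical and do not reflect the product's actual configuration or plugin interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Classification:
    document: str   # document identifier or path
    node: str       # taxonomy node the document was classified to
    score: float    # classification score (a 0-100 scale is assumed here)


@dataclass
class Rule:
    node: str                                   # taxonomy node that triggers the rule
    min_score: float                            # criterion: minimum classification score
    action: Callable[[Classification], str]     # what to do when the rule fires


def route_to(destination: str) -> Callable[[Classification], str]:
    """Build an action that describes moving a document to a destination."""
    def action(c: Classification) -> str:
        return f"move {c.document} -> {destination}"
    return action


# Hypothetical rules: sensitive content goes to a secure repository,
# documents of record go to the records management application.
RULES: List[Rule] = [
    Rule("Sensitive/PII", 70, route_to("secure-repository")),
    Rule("Records/Contracts", 60, route_to("records-management")),
]


def apply_rules(c: Classification, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions triggered by one classification decision."""
    return [
        rule.action(c)
        for rule in rules
        if rule.node == c.node and c.score >= rule.min_score
    ]
```

In this sketch a destination is just a string; in a plugin-style design each destination would be a separate component that knows how to write to its own repository.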
conceptClassifier for OneDrive for Business
From a user, business, and hardware acquisition cost perspective, the sudden availability of storage free of charge is a very definite plus. For information security, governance, and systems management, the introduction of OneDrive for Business has created new challenges, as more information will now need to be managed according to enterprise policies. For the IT support team, considerations of security and non-compliance are issues of growing concern.
For the business user, business content can be accessed via any device, from any location. In addition, the business user can share documents by specifying users, and the group can work concurrently on the same document, while OneDrive for Business maintains the integrity of the content. The previous space limitation required business users to pick and choose which documents could be stored in the cloud. With the new size allocation, this is eliminated.
The benefits and features include the following.
Governance, Compliance, Security, and Records Management
- Ability to automatically identify, tag, and classify sensitive information, and automatically take action on that content supporting organizational governance policies
- Ability to automatically identify, tag, and classify content by document security level, apply content types and attributes, and invoke out-of-the-box SharePoint information rights management, enforcing document security policies such as security of content within the domain
- Ability to automatically identify, tag, and classify documents of record with semantic metadata and retention codes, automatically apply content types, send notifications, and optionally move content in alignment with corporate records policies
- Ability to automate the scanning, identification, notification, and reporting of documents being shared with third parties outside the domain
Productivity and Collaboration
- Full integration with search, the refinement panel and the Term Store, improving findability and collaboration in a secured environment
- Ability for business users to access their own documents or shared enterprise content regardless of location or device
- Ability to share and edit content simultaneously while preserving the integrity of the content
- Removes the current limitation of storage size, so end users no longer have to pick and choose what gets saved to the cloud
- Classification and auto tagging of the OneDrive site content, either as a batch process or classification on demand
- Global templating solution for automating settings and deployment to large user populations
- Customized administration pages for selection of managed metadata fields to be auto-classified
- Automatic addition of classification status columns to content types and full support for the content type hub
- Fully integrated with the Term Store and Managed Metadata Service
- Ability to automate OneDrive scans for sensitive or specific information types on a global basis
- Identification, reporting and notification of sharing of documents outside of the domain
- Writing back of semantic metadata to the managed metadata fields
conceptClassifier for OneDrive for Business
“Our previous search system restricted our access to the information by a factor of at least 50%. Something that would have taken weeks is now taking just a few days. Furthermore, the intelligence in the search has meant that sometimes the database will link papers that we wouldn’t have linked in a million years. I am confident now that we don’t skip or ignore important information.”
T Longland CVO OBE, Brigadier (Retd) DCDC
Concept Searching Technology Platform
Optional Component: conceptClassifier for SharePoint
conceptSearch is an enterprise search engine based on a unique, language independent technology. Unlike other enterprise search engines, which require significant customization with marginal results, conceptSearch is delivered as an out-of-the-box application that demonstrates a simple search interface and indexing facilities for internal content, web sites, file systems and XML documents. Consequently, application developers experience a minimal learning curve and the organization can look forward to a rapid return on investment.
The product is based on an open architecture with all APIs based on XML and Web Services. Transparent access to system internals including the statistical profile of terms is standard.
Precision versus Recall
The relevance of content delivered plays a pivotal role in effective search. People explore concepts whereas computers look for keywords. Relevancy will always be subjective to the individual who is performing the search. Only each individual can evaluate how relevant a specific piece of content is. Content retrieved can be inappropriate for myriad reasons: too technical, not technical enough, out-of-date, or completely misaligned to the query.
Because of its innovative technology, conceptSearch delivers both high precision and high recall. Precision and recall are the two key performance measurements for information retrieval. Precision measures how many of the retrieved items are relevant to the query; ideally, only relevant items are retrieved. Recall measures how many of the relevant items are retrieved; ideally, all of them are. Yet most information retrieval technologies are less than 22% accurate in both precision and recall. The ideal goal is for these two measures to be balanced. Compound term processing has the ability to increase precision with no loss of recall.
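As a worked illustration of the two measurements, the sketch below computes precision and recall for one query from a set of retrieved document identifiers and a set of identifiers judged relevant. The function name and the sample identifiers are assumptions made for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents come back, three documents are actually relevant,
# and two of the relevant ones are among those returned.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning fewer, better-targeted documents raises precision, while returning more documents tends to raise recall, which is why the two are usually traded off against each other.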
Navigation and Discovery
Retrieval of content can be based on location or discovery. In location based queries the users know what they are seeking. Discovery is based on the premise that users do not necessarily know precisely what they are seeking. In the first scenario the search engine must retrieve exactly and only the content that is required. In the second scenario, the search engine must identify content that appears to answer the search query. Both scenarios require the ability to retrieve what the end users are seeking, and navigation plays a critical role in both.
The hierarchy provided by a taxonomy addresses the two different search approaches. Location based searches appear simple, but in fact are not. Content is dynamic, additions are being made to the repository, content is changed, and content is frequently deleted. In a location based search, if end users do not immediately find what they are looking for, they can use the hierarchical structure to drill down by searching the concepts or taxonomy nodes.
Hierarchical presentation of content can identify associations and relationships that are typically not obvious in searching. This distinction is important and allows users to identify the parent/child relationships, resulting in more relevant information being found more quickly. Accessing inter-related ideas and concepts supports a fundamental change in user focus and activity and transforms it from searching to insight and discovery.
The technology can isolate the key meaning that is normally expressed as proper nouns, noun phrases, and verb phrases. Although linguistic products can do this, their performance is highly variable depending upon the vocabulary and language used. Concept Searching technologies are based on a statistical, language-independent model that can accept queries in natural language, with the user typing words, phrases, or whole sentences. The system then analyzes the natural language query to extract the keywords and phrases, identify the main concepts, and retrieve content that is highly relevant.
- Compound terms are extracted when content is indexed, enabling the delivery of relevant content at the top of the search results
- Relevance-ranked extracts from the documents, based on the query, are returned to the user
- Search refinement delivers to the end users highly correlated suggested concepts that may be used to refine the search
- Documents can be classified to one or more taxonomy nodes, enhancing the precision of documents returned
- In addition to static summaries, Dynamic Summarization, a modified weighting system, can be applied that will identify real time short extracts that are most relevant to the user’s query
- Taxonomy and faceted navigation
- Text preview capability for attachments such as email or PDF files without having to open the originating application. Search results will be highlighted in the attachments
- Related topics will return results based on the conceptual meaning of the search terms used
- Based on previous queries, or on extracts retrieved, end users can use the text to perform additional searches to retrieve more granular results
- Presents a single integrated view of content regardless of where it resides
- High scalability and excellent performance
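The Dynamic Summarization feature in the list above selects short extracts that are most relevant to the user's query. The sketch below shows one simple way such query-biased extracts can be chosen, by picking the sentence that shares the most terms with the query. It is an assumption-laden illustration, not the product's modified weighting system; the function name and the overlap scoring are hypothetical.

```python
import re


def terms(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_extract(document, query):
    """Return the sentence of the document that shares the most terms with the query."""
    query_terms = terms(query)
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if not sentences:
        return ""
    return max(sentences, key=lambda s: len(query_terms & terms(s)))
```

A static summary would return the same text for every query; here the extract changes with the query because the sentence scores depend on the query terms.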
Search Optional Components
Content Enrichment Service for SharePoint 2013
conceptClassifier for SharePoint 2013
This product can be used with Microsoft Search for SharePoint 2013 to classify any document that is being indexed by this search engine. The product integrates with Microsoft Search via the web service callout service which is designed to allow custom processing of documents as they are indexed. The resulting classifications are stored directly in the SharePoint index and will be available in the SharePoint 2013 search refinement panel. This product does not build a conceptSearch index and so its disk usage is zero during classification operations.
FAST Pipeline Stage for SharePoint 2010
“Concept Searching offered a very compelling solution based on taxonomy tools, clue-based suggestions, and the capability of integrating within our own existing FAST framework.”
Valerio Zanini, VP of Technology, MarketResearch.com
FAST (any version)
This product can be used with FAST Search to classify any document that is being indexed by this search engine. The product integrates with FAST via the pipeline and is designed to allow custom processing of documents as they are indexed. The resulting classifications are stored directly in the FAST index and, if SharePoint is used, will be available in the SharePoint 2010 Search Refinement Panel. This product does not build a conceptSearch index so its disk usage is zero during classification operations.
- Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
- Enables import of FAST Entities into the conceptTaxonomyManager to fine-tune them with metadata generated from an organization’s content and nomenclature
- Runs natively as a FAST Pipeline Stage eliminating integration and customization issues
- Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
- Improves faceted search results as facets are based on concepts aligned with the taxonomy
“Search feedback metrics jumped up the last two months from a fairly steady 60-62% positive feedback to 75-80% positive.”
Director Enterprise Solutions Planning
Concept Searching Technology Platform
Optional Component:
conceptClassifier for SharePoint
conceptClassifier for Office 365
conceptSQL provides the ability to define a document structure based on information held in a Microsoft SQL Server or Oracle database. A document can include any number of text and metadata fields and can span multiple tables if required. conceptSQL supports SQL Server 2005, 2008, and 2012. A powerful but easy-to-use configuration tool is supplied, eliminating the need for any programming. Templates are provided for out-of-the-box support for Documentum, Hummingbird, and Worksite/Interwoven DMS.
conceptSQL enables a database administrator to connect to a database and select which tables and fields should be read and indexed for content. The two main components are conceptCollector and conceptIndexer. The conceptCollector service is run after conceptSQL has been configured for the first time. conceptIndexer extracts database content and metadata according to the configuration settings such as access criteria, re-indexing frequency, inclusions and exclusions. Duplicate data checks, language detection and file type detection are used when the data is collected and indexed.
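To illustrate what defining a document across several tables and checking for duplicates might look like, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for SQL Server or Oracle. The table names, the `CONFIG` dictionary, and the hash-based duplicate check are all assumptions made for the example; conceptSQL itself is configured through its own tool and requires no programming.

```python
import hashlib
import sqlite3

# Hypothetical configuration: one "document" spans two tables.
CONFIG = {
    "query": """
        SELECT r.id, r.title, b.body
        FROM reports AS r
        JOIN report_bodies AS b ON b.report_id = r.id
        ORDER BY r.id
    """,
    "metadata_fields": ["title"],
    "text_fields": ["body"],
}


def collect_documents(conn, config=CONFIG):
    """Read rows per the configuration and skip documents with duplicate text."""
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    seen = set()
    documents = []
    for row in conn.execute(config["query"]):
        text = " ".join(str(row[f]) for f in config["text_fields"])
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate content, do not collect it again
        seen.add(digest)
        documents.append({
            "id": row["id"],
            "metadata": {f: row[f] for f in config["metadata_fields"]},
            "text": text,
        })
    return documents


def demo():
    """Build a small in-memory database and collect its documents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE report_bodies (report_id INTEGER, body TEXT);
        INSERT INTO reports VALUES (1, 'Q1 review'), (2, 'Q1 review copy'), (3, 'Q2 review');
        INSERT INTO report_bodies VALUES (1, 'Revenue grew.'), (2, 'Revenue grew.'), (3, 'Costs fell.');
    """)
    return collect_documents(conn)
```

In this sample data, report 2 has the same body text as report 1, so only reports 1 and 3 are collected.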
Additional Web Front End Servers
“The RMDA can now easily load millions of records and automatically classify these against multiple taxonomies simultaneously, while having the ability to easily expand the system configuration through its modular and scalable architecture.”
Provides scalability to accommodate size of end user community.
Additional Classification Servers
Provides scalability of classification to increase the speed of classification throughput, especially when classification on the fly is an important requirement.