Content Clarity

Content Clarity allows you to locate, analyze, and act upon your web content.
Content Clarity scans, inventories, indexes, and classifies unstructured content to help you cope with the overwhelming amount of content that flows through a company daily. It is the only classifier on the market that can index both text and image content. Find essential content. Identify stale, duplicate, near-duplicate, and obsolete content. Integrate with leading content repositories, such as TeamSite™, SharePoint™, and Lotus Notes™. Content Clarity indexes text, file, and graphic (JPEG, GIF, Flash™, and similar) assets by similarity. Unlike traditional crawlers and mirror tools, it analyzes and catalogs only the content that you specify, making it invaluable for rapid, targeted application at the business unit level.


Our patent-pending Perceva™ technology powers all of our solutions.

Solution Overview

Enterprises worldwide must rise to the challenge of unstructured content. Most organizations don’t have a good grasp of the content that they have. In the decade since the Internet revolution brought new forms of content to our fingertips, content has silently undergone revisions, and along the way, copies of copies have become stale and obsolete. Duplicates and near-duplicates exist in many locations, scattered across web servers, lurking in file stores, and buried within email attachments. In many organizations, this content remains online, and its existence undermines efforts to publish and maintain one authoritative version.

Policies, procedures, requirements, specifications, manuals, glossaries, and departmental directories are common examples of content types that require a single authoritative source. Although their proper home is a controlled, versioned content repository, we often find these documents scattered throughout an organization, with inadequate attention paid to maintenance. The challenges are threefold. First, organizations must find the desired content. Second, they must identify and eliminate duplicate, stale, and obsolete versions. Third, they need to continually monitor repositories for duplicates and near-duplicates, which can arise through inadvertent insertion of a copy or through unintended recreation of content that already exists somewhere else. As organizations strive to create fresh content, their best efforts are undermined by stale, obsolete, duplicate, and near-duplicate content.
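To make the distinction between duplicates and near-duplicates concrete, here is a minimal sketch in Python. It is not the Perceva™ technology, whose internals are not described here; it assumes a generic textbook approach in which exact duplicates share a hash of their normalized text and near-duplicates share a large fraction of their three-word shingles (Jaccard similarity). The function names, the shingle size, and the 0.5 threshold are all illustrative assumptions.

```python
import hashlib
from itertools import combinations


def fingerprint(text):
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word sequences (assumed shingle size: 3 words)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs, near_threshold=0.5):
    """docs maps a name to its text; returns (name, name, label, score) tuples.

    The threshold is an illustrative assumption, not a recommended value.
    """
    prints = {name: fingerprint(text) for name, text in docs.items()}
    sets = {name: shingles(text) for name, text in docs.items()}
    findings = []
    for a, b in combinations(sorted(docs), 2):
        if prints[a] == prints[b]:
            findings.append((a, b, "duplicate", 1.0))
            continue
        score = jaccard(sets[a], sets[b])
        if score >= near_threshold:
            findings.append((a, b, "near-duplicate", score))
    return findings


# Hypothetical sample content for illustration only.
docs = {
    "intranet/policy.txt": "The travel policy requires manager approval for all trips",
    "share/policy_copy.txt": "the travel   policy requires manager approval for all trips",
    "mail/policy_draft.txt": "The travel policy requires manager approval for all international trips",
    "intranet/sales.txt": "Quarterly sales figures are posted on the intranet",
}
findings = find_duplicates(docs)
for a, b, label, score in findings:
    print(f"{label}: {a} <-> {b} (similarity {score:.2f})")
```

Comparing every pair of documents, as this sketch does, is practical only for small collections; continuous monitoring of a large repository would need an index rather than pairwise comparison.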

One way to cope with unstructured content is to apply search technology. But using search without Content Clarity can mislead in a number of ways. It can return too many results. It can return no results. Or worse yet, it can fail to find the intended document and instead return stale and obsolete versions. In addition, search performance doesn’t scale to match the volume of content encountered in continuous content monitoring.

Key Benefits

Feature Highlights


What's the secret sauce in Video Clarity's speed breakthrough in piracy detection?

Video Clarity uses a patent-pending technique to index the millions of individual frames that make up a collection of reference videos. Each video frame is transformed into a vector of numbers and then rapidly indexed along with other frames by a proprietary Nahava technique that partitions high-dimensional spaces. Building on that breakthrough, it achieves unprecedented scalability by employing the same height-balanced tree technology that powers modern, high-performance database engines.
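The general idea of turning frames into vectors and indexing them in a balanced tree can be sketched as follows. This is not the Nahava technique, which is proprietary and not described here; the sketch substitutes a plain k-d tree built with median splits, a well-known way to partition a vector space while keeping the tree balanced. The frame representation (a small grid of pixel intensities), the function names, and the clip labels are assumptions made for illustration, and a simple k-d tree is not expected to perform well at the dimensionality of real video frames.

```python
import math


def frame_to_vector(frame):
    """Flatten a 2-D grid of pixel intensities into a unit-length vector."""
    flat = [float(p) for row in frame for p in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return tuple(v / norm for v in flat)


class Node:
    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build(items, depth=0):
    """Build a k-d tree from (vector, label) pairs.

    Splitting at the median on each axis keeps the two halves the same
    size (within one item), so the tree stays balanced.
    """
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Return (distance, label) of the indexed vector closest to query."""
    if node is None:
        return best
    dist = math.dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Search the other side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2x2 reference frames, for illustration only.
reference = {
    "clip_a:frame_0": [[255, 0], [0, 0]],
    "clip_a:frame_1": [[0, 255], [0, 0]],
    "clip_b:frame_0": [[0, 0], [255, 255]],
}
tree = build([(frame_to_vector(f), label) for label, f in reference.items()])

# A slightly altered copy of the first reference frame.
suspect = frame_to_vector([[250, 10], [0, 5]])
distance, match = nearest(tree, suspect)
print(f"closest reference frame: {match} (distance {distance:.3f})")
```

Each lookup prunes the branches of the tree that cannot contain a closer match, which is why a balanced index avoids comparing a suspect frame against every reference frame.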