DCL's Harmonizer™ is a software suite that identifies and consolidates redundant content in document collections. It analyzes thousands of documents at a time to objectively measure the extent of content duplication, locate duplicates and "near duplicates," and condense them into a single reusable chunk that harmonizes text variations.
Assessment Highlights
Determine reuse potential for ROI calculation
Harmonize content (reduce "near duplicate" content) to provide consistent information throughout a document set
Clean up typographical errors
Implement reuse, which reduces the size of the data set, lowers conversion and translation costs, and improves the efficiency of updating information
The user provides documents via FTP or CD, defines the granularity, and selects matching criteria. Harmonizer™ processes the batch and returns statistics on the degree of redundancy, along with the redundant data it identified and the locations of that data within the document set. Automated tools are available to facilitate correcting, editing, and consolidating redundant data.
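The kind of batch statistics described above can be illustrated with a minimal Python sketch. It is not Harmonizer™'s actual implementation: it assumes paragraph-level granularity and a simple whitespace- and case-insensitive exact match as the matching criterion, and the names `normalize`, `split_granules`, and `redundancy_report` are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and lowercase so trivial variations still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_granules(document):
    """Assumed granularity: one granule per blank-line-separated paragraph."""
    return [p for p in re.split(r"\n\s*\n", document) if p.strip()]


def redundancy_report(documents):
    """Return redundancy statistics and duplicate locations for a batch.

    `documents` maps a document name to its plain-text content.
    """
    locations = defaultdict(list)  # normalized granule -> [(doc name, index)]
    total = 0
    for name, text in documents.items():
        for index, granule in enumerate(split_granules(text)):
            locations[normalize(granule)].append((name, index))
            total += 1

    # A granule is redundant when it appears at more than one location.
    duplicates = {g: locs for g, locs in locations.items() if len(locs) > 1}
    redundant = sum(len(locs) - 1 for locs in duplicates.values())
    return {
        "total_granules": total,
        "unique_granules": len(locations),
        "redundant_granules": redundant,
        "redundancy_ratio": redundant / total if total else 0.0,
        "duplicates": duplicates,
    }


# Illustrative input only; these file names and texts are made up.
batch = {
    "manual_a.txt": "Wear eye protection.\n\nRemove the access panel.",
    "manual_b.txt": "Wear  eye protection.\n\nInstall the filter.",
}
report = redundancy_report(batch)
```

In this sketch, `report["duplicates"]` maps each repeated granule to every document and position where it occurs, which is the information a consolidation step would need.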
Metrics on reuse potential in document set
DCL Experience
Identifies exact match, similar match, and dissimilar match "granules"
User interface to resolve "near matches"
Works with SGML, XML, HTML, Word, RTF, FrameMaker, MIF, Interleaf, ASCII and many other formats
Works with large document sets
Updates files, inserts effectivity and conditional information, and produces files suitable for direct loading into a Content Management System (CMS) or Interactive Electronic Technical Manual (IETM)
Optional service to resolve and correct variations in documents
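The three match categories listed above can be sketched in Python with the standard library's `difflib.SequenceMatcher`. This is only an illustration under stated assumptions: the source does not describe how Harmonizer™ scores similarity, and both the function name `classify_match` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def classify_match(granule_a, granule_b, similar_threshold=0.8):
    """Label a pair of granules as 'exact', 'similar', or 'dissimilar'.

    The 0.8 threshold is an illustrative assumption, not a documented setting.
    """
    if granule_a == granule_b:
        return "exact"
    # ratio() is 2*M/T: matched characters over the combined length.
    ratio = SequenceMatcher(None, granule_a, granule_b).ratio()
    return "similar" if ratio >= similar_threshold else "dissimilar"


# Made-up example granules.
pairs = [
    ("Wear eye protection.", "Wear eye protection."),
    ("Wear eye protection at all times.", "Wear eye protection at all time."),
    ("Wear eye protection.", "0123456789"),
]
labels = [classify_match(a, b) for a, b in pairs]
```

Pairs labeled "similar" are the ones a reviewer would resolve in a "near match" interface, choosing a single wording to reuse.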
Differentiators
The solution is unique, with a patent pending
The process automatically produces in minutes what would take weeks or months to do manually
DCL has been in the conversion business for over 30 years
Serving all industries
Authored chapters on 'Legacy Data Conversion' in Bill Kasdorf's Columbia Guide to Digital Publishing
Authored chapters on 'Data Conversion' in Charles Goldfarb's The XML Handbook
Experienced with most foreign languages, including those that use Latin-based and double-byte character sets











