Questions asked during the "Put your Data on a Diet; Implementing Content Reuse" webinar by Arbortext/Data Conversion Laboratory (March 16, 2005). If you have further questions, please write us at reuse@dclab.com.


Harmonizer™ is a DCL process to eliminate redundant content in document sets of any size. DCL's Harmonizer™ allows you to objectively measure and locate duplication and “near duplicates”, eliminate extraneous content, and harmonize text variations to your standard.

Question Topics

What is Harmonizer™ and what kinds of files can it handle?
When to use Harmonizer™?
Granules – What are they?; How are they used?
Detailed questions on the Demo
Special Cases
Configurations
Pricing

Q&A


Put your Data on a Diet; Implementing Content Reuse

What is Harmonizer™ and what kinds of files can it handle?


Q: Is Harmonizer™ a DCL product or an Arbortext product?
A: Harmonizer™ is a DCL product, and is available through Arbortext.

Q: What types of files will Harmonizer™ analyze besides .doc or xml? Will it analyze HTML files? Can it process legacy PDF files?
A: Harmonizer™ can analyze any data format that contains text, for which DCL can read the content. This includes almost all formats with the exception of ‘page image PDF’ and images. Image formats can also be analyzed but they would need to first be scanned and OCR’d, which adds to the cost.

Q: What format(s) must documents be in for Harmonizer™ to evaluate them? For example, can they be graphics?
A: Harmonizer™ is designed to analyze textual content in any data format whose characters it can ‘read’. This includes almost all formats (word processors, publishing systems, PDF Normal, etc.), with the exception of ‘page image PDF’ and images.

Q: Can it compare between file types in a single analysis?
A: Yes, Harmonizer™ can include multiple file types in a single analysis.

Q: Does Harmonizer™ only work on the English language?
A: No. It can work on any Latin-based language set.

Q: Harmonizer appears to understand English pluralization rules. You stated earlier that it works just as well in other languages. Does it have an understanding of such rules for those other languages? Or is it simply looking at words that are similar for whatever reason, so that it might not catch less obvious similarities?
A: Harmonizer™ does not understand pluralization rules per se. It understands strings of text and can compare strings of text (regardless of the language).
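
To illustrate how plain string comparison can catch plural variants without any knowledge of grammar, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher, which is an assumption chosen for illustration and not a description of how Harmonizer™ works internally. The function name similarity and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 character-level similarity ratio between two strings."""
    # Lowercase and collapse whitespace so trivial differences do not count.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


# A singular/plural pair scores high in English and in Spanish alike,
# because only the characters are compared, not the grammar.
english = similarity("Remove the filter before cleaning.",
                     "Remove the filters before cleaning.")
spanish = similarity("Retire el filtro antes de limpiar.",
                     "Retire los filtros antes de limpiar.")
unrelated = similarity("Remove the filter before cleaning.",
                       "Pricing is based on the volume of data.")
print(round(english, 2), round(spanish, 2), round(unrelated, 2))
```

A purely character-based score like this one would likely miss irregular forms (for example "mouse" and "mice"), which matches the caveat raised in the question.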

Q: How complex can the document set be? That is, must it be a single directory, or can Harmonizer™ work with a complex directory tree of documents?
A: Harmonizer™ works with a complex directory tree of documents.
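
As a rough picture of what gathering a mixed-format document set from a nested directory tree can look like, the following Python sketch walks a root folder recursively and groups files by extension. It is only an illustration under stated assumptions: the function name collect_documents and the extension list are hypothetical, and nothing here reflects Harmonizer™'s actual input handling.

```python
from collections import defaultdict
from pathlib import Path


def collect_documents(root: str, extensions: set[str]) -> dict[str, list[Path]]:
    """Walk root recursively and group matching files by lowercase extension."""
    found: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            found[path.suffix.lower()].append(path)
    return dict(found)


if __name__ == "__main__":
    # "." (the current directory) stands in for the root of a document set.
    for ext, paths in collect_documents(".", {".xml", ".doc", ".html"}).items():
        print(ext, len(paths))
```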

Q: How large do Harmonizer reports get? Is there a recommended amount of documents to review at a time to prevent a too large and cumbersome report?
A: They can get large for very large document sets. Depending on how the document set is organized, there are “divide and conquer” strategies that would help you get a better handle on the collection. Regardless, working with the reports is much better than working without them.

 
When to use Harmonizer™?


Q: Don mentioned that you need a Content Management System & a "push out" system. Does the CMS need to be in place before using Harmonizer™? If not, what are the implications of using Harmonizer™ and continuing without a CMS? (presumably likelihood of losing control of the granules and duplicating or producing "similar" items)
A: Harmonizer™ can be used to resolve all of your ‘close’ matches and clean up your data prior to implementation of a CMS. In fact, we think this is a good idea. It can be done in parallel with system implementation and can ensure that the data is in good shape before loading, which can save months of after-the-fact cleanup. If you continue without a CMS, you still have the advantage of consistent data, but you don’t leverage the content reuse capabilities of XML.

Q: Once a company has conducted a DCL analysis, how does a company move the determined granules from DCL to their operating software e.g. Arbortext, Interleaf, Frame, etc.? Is it automated, or manual?
A: Once you have completed the analysis, and cleaned up your data, you are ready to load (or re-load) your content into a CMS. Many CMSs that support reuse will automatically burst your documents into reusable fragments based on rules that you define during the load process (which itself is an automated process). Alternatively, DCL can mark off the reusable elements according to rules that would be mutually defined.

Q: We've already developed and implemented an in-house XML based CMS, and are now just starting to realize the duplication within our document set. Does Harmonizer™/Arbortext work in conjunction with document sets that have already been imported?
A: Yes. Analysis of XML files is very effective and can take advantage of the structure inherent in XML.

Q: Is the use of Harmonizer™ pretty much complete once the conversion is done?
A: It may make sense to have periodic audits to ensure that your content is not creeping as you revise material and introduce more material.

 
Granules – What are they?; How are they used?


Q: What's the definition of a granule? And what is the typical granule size?
A: A granule is the revisable unit on which you want to standardize. For most uses it is a paragraph, but it could be a larger unit like a section or a procedure, or a smaller unit like a sentence.
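
The idea of choosing a granule size can be sketched in a few lines of Python. This is a hedged example only: the function name split_granules, the blank-line paragraph rule, and the naive sentence rule are all assumptions made for illustration, not Harmonizer™ behavior.

```python
import re


def split_granules(text: str, level: str = "paragraph") -> list[str]:
    """Split text into granules at the chosen level: 'paragraph' or 'sentence'."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]
    if level == "paragraph":
        return paragraphs
    if level == "sentence":
        # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
        return [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s]
    raise ValueError(f"unknown granule level: {level}")


sample = """Disconnect the power. Remove the cover.

Warning: the surface may be hot."""

print(split_granules(sample, "paragraph"))
print(split_granules(sample, "sentence"))
```

With the sample above, the paragraph level yields two granules and the sentence level yields three, so the same text can be compared at whichever size suits the reuse strategy.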

Q: When the Harmonizer™ identifies the similar matches, can you select one of the identified matches to use as the content granule?
A: If you are working with legacy materials (Word, Frame, Interleaf, etc.) you would make your changes in the legacy environment. If you have your data in XML or SGML, we are working on a tool to input your changes within a Harmonizer™ environment.

Q: Can granules vary in size and composition or is this fixed by the Harmonizer™ tool? Does this come with a manual override?
A: There is a lot of flexibility as to what can be done with granule sizes in the analysis phase, and this can be tailored for specific projects.

Q: So if I understand what you're saying about the size of granules...I can have some granules that are the size of a warning or caution, while I can save other granules in chunks the size of an introduction?
A: Yes. One of the advantages of XML and content reuse is that there is no single minimum revisable unit size. You need to define a reuse strategy that meets your business and technical requirements and typically that does involve multiple sizes of granules.

 
Detailed questions on the Demo


Q: Is a live demo of Harmonizer™ available?
A: Yes. For the presentation we used screen shots to allow us to move quickly. We can schedule a live WebEx demo if you are interested – please write us at reuse@dclab.com.

Q: Do the line numbers equate to the line number in the document or line number in the "processing document"?
A: The line numbers refer to line numbers in the processed document. Future releases of the product will have additional options to help navigate the documents being analyzed.

 
Special Cases

Q: Most of your examples were very simple. Does the Harmonizer™ have the ability to analyze complex paragraphs - for example, to see if they contain a certain percentage of similar words?
A: Yes. Harmonizer™ can be custom tailored to adjust the definition of a ‘close’ or ‘similar’ match based on the special needs of a project.
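
One way to picture an adjustable definition of a ‘close’ match is a word-overlap score with a configurable threshold. The Python sketch below uses Jaccard overlap of distinct words; the measure, the function names word_overlap and is_close_match, and the threshold values are assumptions for illustration and do not describe Harmonizer™'s actual matching rules.

```python
import re


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap: shared distinct words divided by all distinct words."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def is_close_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two paragraphs as 'close' when their overlap meets the threshold."""
    return word_overlap(a, b) >= threshold


p1 = "Turn off the pump and wait until the pressure gauge reads zero."
p2 = "Turn off the pump and wait until the pressure gauge shows zero."
print(word_overlap(p1, p2))
print(is_close_match(p1, p2, threshold=0.8))
print(is_close_match(p1, p2, threshold=0.95))
```

The two sample paragraphs share 10 of 12 distinct words, so they count as a close match at a threshold of 0.8 but not at 0.95. Tightening or loosening the threshold is the kind of per-project tuning the answer describes.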

Q: Given XML source documents, can you tell Harmonizer™ to exclude certain elements from its comparison? For example, we have elements that support our cross referencing system, and comparison within those elements wouldn't be useful, although comparison of the surrounding sentences would be.
A: Yes. Harmonizer™ has flexibility to exclude defined structures from an analysis. For example, it can be set to exclude specified elements and tags in XML, and specified styles in Word.
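
For XML sources, excluding certain elements before comparison might look like the following Python sketch, which uses the standard library's xml.etree.ElementTree. The element names (para, xref, b) and the function name text_without are hypothetical, and this is not a description of Harmonizer™'s configuration mechanism.

```python
import xml.etree.ElementTree as ET


def text_without(elem: ET.Element, excluded: set[str]) -> str:
    """Collect text under elem, skipping the content of excluded elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in excluded:
            parts.append(text_without(child, excluded))
        # The tail is text after the child's closing tag; it belongs to the
        # surrounding sentence, so it is kept even when the child is excluded.
        parts.append(child.tail or "")
    return "".join(parts)


doc = ET.fromstring(
    "<para>Tighten the bolt <xref target='fig-12'>Figure 12</xref> "
    "to <b>40 Nm</b>.</para>"
)
# Collapse runs of whitespace left behind by the removed element.
print(" ".join(text_without(doc, {"xref"}).split()))
```

Here the cross-reference content is dropped while the surrounding sentence, including the text inside the non-excluded b element, is kept for comparison.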

Q: Can Harmonizer™ detect situations where similar tags or attributes are used between two documents, or is it limited to looking at the element content?
A: There is flexibility within Harmonizer™ to include or ignore tagging that exists within a document, with many options in between.

 
Configurations

Q: What kind of database at the back end is required?
A: None. Harmonizer™ is self-contained and does not require a database.

Q: How does this work within Epic?
A: This technology is complementary to Epic but not a part of Epic.

Q: What other environments does Harmonizer™ work in? (XML, Word,...)
A: Harmonizer™ works on Windows NT, but can accept data from anywhere, and is specifically designed to handle almost any kind of data.

Q: Are there any CMSs that Harmonizer™ won't work with?
A: Harmonizer™ is independent of the CMS. Content Reuse works with any CMS that understands XML and recognizes granules. These include, among many others, Astoria, Documentum, SiberLogic, Vasont, and XyEnterprise.

Q: Can I simply use Epic Print Composer to produce PDFs as my "push out" system or must I use E3?
A: If you are "pushing out" to PDF, then you will need E3, as the standalone Print Composer does not include Adobe Distiller.

 
Pricing


Q: Is Harmonizer™ priced at one price or are there options that would make it affordable to smaller organizations?
A: Pricing is based on the volume of data, number of formats, and types of formats, and we believe it is quite affordable for smaller organizations.


Corporate office:
61-18 190th Street, 2nd Floor, Fresh Meadows, NY 11365
718-357-8700
Copyright © 1997-2011  Data Conversion Laboratory, Inc. All rights reserved.