Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
An Egg Too Far

The Perils of Converting a Lot of Data In-House.


The Volume Problem

Most of us know how well Jack fared after he cut the beanstalk. After all, he walked away with the goose that lays the golden egg. Every morning, another golden egg would be waiting for him. Those eggs saved him and his mother from poverty. Before long, they were contented suburban homeowners.

Until that fateful day when Jack took up rollerblading. He was having so much fun that he left the golden egg under the goose all day. That evening, the egg hatched! Jack was dejected about his lost revenue until the next day, when he discovered that both geese had laid golden eggs. He could hardly believe his good fortune. If he harvested the eggs every other day instead of every day, he would double the number of gold-laying geese every two days.

Forty days later, he had 1,048,576 geese to take care of, and gold was so common that nobody wanted it.

The lesson is simple: volume always complicates matters. Most recipes will work if you double the ingredients. But try multiplying by 50 or 100 and all you'll have is a mess in the kitchen and a big room full of hungry people.

The SGML Expert

High technology is no exception to the problem of volume. Consider Gus, for example. He is Acme Corporation's resident SGML expert, hired as part of Acme's initiative to have all of its product documentation stored as SGML. Gus is a technical wizard. He designed a DTD for Acme in two weeks, and proudly shows off chapter 1 of the Acme Dustscraper Repair Manual, which he tagged himself in just one day.

A commendable effort, but there are 10 chapters in the Acme Dustscraper Repair Manual and Acme has 100 manuals. At one chapter per working day, it would take Gus roughly four years to get all that documentation into SGML. Even if Acme could wait four years, they need Gus for other things. After all, he's crucial to ramping up the rest of the company to the new SGML system.

Gus Days

So far we've determined that having Gus convert all the data is unacceptable. But what are the other options? Well, the work can be divided up among Acme's staff, or temporary employees can be hired specifically for this project. Before we make any such decisions, however, it's important to determine just how much effort is involved.

About 1,000 chapters need to be converted. It takes Gus one day to tag a chapter. We can therefore assume an effort of 1,000 Gus-days (the four years mentioned above). So, hire 100 Guses and you'll be done in two weeks. Easy!

Except for the volume problem. Where are you going to find 100 SGML experts who are willing to work for only two weeks? And even if you could, can you afford to pay 100 people what you're paying Gus? And when you do hire them, how are you going to get all 100 to tag the data the same way? Everyone will have his or her own interpretation. The only way to get usable SGML from these experts is to have Gus train them in his DTD.

Aha! If you're going to need training anyway, hire unskilled or semi-skilled workers at one-third the cost of Gus. That's fine, but it will take them three times as long.
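The arithmetic behind these staffing options can be sketched in a few lines of Python. This is only an illustration: the 250 working days per year is an assumption (the article does not give one), the function names are made up for this example, and training time is ignored.

```python
from fractions import Fraction

CHAPTERS_PER_MANUAL = 10
MANUALS = 100
WORKDAYS_PER_YEAR = 250  # assumption: 50 weeks x 5 days


def calendar_workdays(effort_gus_days, workers, speed=Fraction(1)):
    """Working days needed when `workers` people each work at `speed` relative to Gus."""
    return effort_gus_days / (workers * speed)


def total_cost(effort_gus_days, speed=Fraction(1), rate=Fraction(1)):
    """Total pay, measured in units of Gus's daily pay."""
    return effort_gus_days / speed * rate


chapters = CHAPTERS_PER_MANUAL * MANUALS
effort = chapters * 1  # one Gus-day per chapter

print(effort)                                             # 1000 Gus-days
print(calendar_workdays(effort, 1) / WORKDAYS_PER_YEAR)   # 4 years for Gus alone
print(calendar_workdays(effort, 100))                     # 10 working days for 100 experts

# Workers at one-third the pay who take three times as long
third = Fraction(1, 3)
print(calendar_workdays(effort, 100, speed=third))        # 30 working days
print(total_cost(effort, speed=third, rate=third))        # 1000, same bill as before
```

Under these assumptions the cheaper workers stretch the schedule without lowering the total bill, which is why changing the head count or the pay rate alone does not solve the volume problem.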

The point is, what works for low volume doesn't work for high volume. New solutions are required.

Software

An automated solution is ideally suited for high volumes of data. The computer is about 1,000 times faster than Gus. You've finally solved the volume problem. All you have to do is find or develop software that will completely and accurately convert your data to SGML.

Guess what? You'd have an easier time cloning Gus than getting such a program. Why? Because this isn't just a conversion. You are adding structure to your documents, which requires inference and subjective decision-making.

The Best of Both Worlds

Ah, but surely the computer can do most of the grunt work and then Gus can fix it up afterwards. Yes, combining automation with expert review seems to be the best approach. But only if it's done right.

If you do enough damage to your car, the insurance company will give you money to buy another one rather than fix the one you have. Similarly, fixing cookie-cutter SGML can actually take longer than tagging it by hand. It's clear that one key to a successful conversion is to automate as much as you can as cleanly as you can.

Here is where Acme makes a frightening discovery: an SGML expert is not a conversion expert. Gus doesn't know how best to develop or configure a conversion program. Why should he? That's like asking a race car driver to fix your car: it's simply a different field of expertise.

What Does a Conversion Expert Do?

Conversion is not a standard field of knowledge. As far as I know, there are no degrees available: the most reliable indicator of expertise is a track record. So, even though there is no universally accepted methodology, I can cover some guiding principles used at DCL for managing a large conversion.

Standardization

Large volumes require standardization to prevent chaos. Otherwise, different interpretations will generate inconsistent results. DCL implements "conversion specifications," which detail every element in a document and how it should be coded in the new format. These specifications are used as a standards document throughout the project. Also, DCL uses a project team approach, with one data analyst per project. This analyst is solely responsible for interpreting how data should be coded. All exceptions to the written rules are brought to that analyst. Even details such as file naming conventions are standardized, because the smallest discrepancy can snowball at large volumes.

Customized Software

One key to successfully using conversion software is to customize it. DCL has developed its own suite of conversion filters that it configures to the specifications of each project. It has even created its own generic intermediate formats. These robust "hub" formats divide the conversion in half so that changes in specs require only partial rework of data that's already been converted.
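To see why a hub format limits rework, consider the following minimal Python sketch. It is not DCL's software: the `HubDoc` structure, the "# " title convention in the source text, and the tag names in both specs are all hypothetical, chosen only to show a conversion split into a front half (source to hub) and a back half (hub to target).

```python
from dataclasses import dataclass


@dataclass
class HubDoc:
    """Hypothetical neutral intermediate form: a list of (role, text) blocks."""
    blocks: list


def source_to_hub(raw):
    """Front half: read the (made-up) source format into the hub form."""
    blocks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            blocks.append(("title", line[2:]))
        else:
            blocks.append(("para", line))
    return HubDoc(blocks)


def hub_to_target(doc, tag_map):
    """Back half: write hub blocks with the tags a given spec calls for.

    Markup characters in the text are not escaped here; a real filter would need to.
    """
    return "\n".join(
        f"<{tag_map[role]}>{text}</{tag_map[role]}>" for role, text in doc.blocks
    )


# Two versions of a hypothetical conversion spec for the same target format
SPEC_V1 = {"title": "TITLE", "para": "P"}
SPEC_V2 = {"title": "HEAD", "para": "PARA"}

hub = source_to_hub("# Filter Replacement\n\nRemove the cover.\n")
print(hub_to_target(hub, SPEC_V1))
print(hub_to_target(hub, SPEC_V2))  # spec changed: only the back half is rerun
```

When the output spec changes from `SPEC_V1` to `SPEC_V2`, the stored hub documents are reused and only `hub_to_target` runs again; the work of interpreting the source is not repeated.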

Quality Control

As discussed earlier, it is crucial to minimize the amount of cleanup necessary after the conversion is finished. While it is true that DCL's editors know nothing about Acme Dustscrapers, they know plenty about SGML (and all the other standard electronic formats). These editors parse the new SGML and then do a "format review." This second review is necessary because parsed SGML is not necessarily correct SGML.

The SGML is filtered into a viewing package. Tags, which require slow, tedious checking, are converted to visual cues. It then becomes immediately apparent to an editor whether something is tagged correctly, simply by comparing it to the original hard copy.
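The idea of turning tags into visual cues can be illustrated with a small Python sketch. It is a toy, not DCL's viewing package: the element names and the cue styles are invented, and the regular expression handles only flat, non-nested elements without attributes.

```python
import re

# Hypothetical element names and display cues; not taken from any real DTD.
CUES = {
    "TITLE": "== {} ==",
    "WARNING": "!! {} !!",
    "PARA": "   {}",
}

# Matches <NAME>text</NAME> for flat elements only.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def render_for_review(sgml):
    """Replace each tagged element with a visual cue an editor can scan quickly."""
    lines = []
    for name, text in ELEMENT.findall(sgml):
        cue = CUES.get(name, "?? {} ??")  # unknown tags stand out
        lines.append(cue.format(" ".join(text.split())))
    return "\n".join(lines)


sample = (
    "<TITLE>Replacing the Filter</TITLE>\n"
    "<PARA>Disconnect power before opening the housing.</PARA>\n"
    "<PARA>Remove the four screws.</PARA>"
)
print(render_for_review(sample))
```

The sample is structurally valid markup, yet if the hard copy prints the power instruction as a warning, the rendered view shows it as an ordinary indented paragraph rather than between `!!` markers, so the mistagging is visible at a glance. This is the sense in which parsed SGML is not necessarily correct SGML.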

Customer Feedback

The most critical element of quality control is customer feedback. DCL keeps the entire conversion process open to Acme, so that a misunderstanding doesn't result in thousands of mistagged pages. Normally, two samples are provided to the customer before the volume work begins. These samples, along with the conversion specifications, must be approved by the client at the start.

Once the conversion is underway, partial deliveries are sent to the client as they are completed. This is more than just checking DCL's work. "Live" data gives Acme a better understanding of how it will best implement new data on its new system.

Experience

For most companies, conversion is a rare occurrence. Therefore, no past experience exists to provide guideposts and warning signs. DCL has converted millions of pages to and from every major format. Which brings us to our conclusion.

No Surprises

Perhaps the most pernicious problem of large volumes is that the work involved is impossible to predict. In other words, even if you do budget for all the Gus-days you think you need, you might very well need more. This could lead to disgruntled workers and even more disgruntled executives.

DCL has learned, through experience, to make its process flexible enough to stay on schedule. Problems are either avoided or prepared for in advance. Potential concerns are brought to the customer before they multiply. To put it simply, you can get away with a little sloppiness when you have one goose, but a million geese demand serious attention.

Your company is not set up to be a conversion house. I recommend you hire someone who is. Otherwise, you just might lay an egg.

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.