Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
Fueling up your Content Management System

It’s more than just full service vs. self-service

Installing a new content management system is only half the story; the other half is loading the content you’re going to manage. DCL’s Don Bridges reports.

Congratulations! You’ve decided on a new documentation system. Single-source publishing, here we come! Whether you’re in the early implementation stages or the system is already installed, there are still details to resolve, and a big one is how to load the content you want to manage.

First, the hard decision: how much of your old material do you really need? As in every move, the more you can toss out, the easier the move. But if you’ve been in business for a while and have ongoing projects, it would be nice to get all your live materials under the control of that new CMS. That means deciding which materials you really need and which ones you’ll never use again.

Once you’ve decided what you need, the next step is to examine your options for getting the materials from your old system into the new one. What’s the best approach? How long will the conversion take? And what will it cost? Do you do it yourself, or outsource?

You may want to take a leaf out of Benjamin Franklin’s book when deciding which option is best for you. He would list the issues and the pros and cons of each, then make his decision, giving the most important issues precedence.

In our experience, the key issues to consider are:

  • Quality
  • Schedule
  • Cost
  • Security
  • Risk
  • Scalability
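To make Franklin’s approach concrete, here is a minimal Python sketch of a weighted decision matrix, a common modern variant of his pros-and-cons list rather than his exact method. The weights, the scores, and the option names `in_house` and `outsourced` are hypothetical placeholders, not recommendations; substitute your own.

```python
# A weighted decision matrix over the issues discussed in this article.
# All weights and scores below are hypothetical placeholders.

# How much each issue matters to your project (1 = barely, 5 = critical).
WEIGHTS = {
    "quality": 5,
    "schedule": 4,
    "cost": 3,
    "security": 2,
    "risk": 4,
    "scalability": 3,
}


def weighted_score(scores: dict[str, int], weights: dict[str, int] = WEIGHTS) -> int:
    """Sum of weight x score over every weighted issue; higher is better."""
    return sum(weights[issue] * scores[issue] for issue in weights)


def best_option(options: dict[str, dict[str, int]], weights: dict[str, int] = WEIGHTS) -> str:
    """Name of the option with the highest weighted score."""
    return max(options, key=lambda name: weighted_score(options[name], weights))


# How well each option does on each issue (1 = poorly, 5 = very well).
options = {
    "in_house": {"quality": 3, "schedule": 2, "cost": 3, "security": 5, "risk": 2, "scalability": 2},
    "outsourced": {"quality": 4, "schedule": 5, "cost": 4, "security": 4, "risk": 4, "scalability": 5},
}

for name, scores in options.items():
    print(name, weighted_score(scores))
print("best:", best_option(options))
```

With these placeholder numbers the outsourced option scores 91 against 56, but the outcome depends entirely on the weights: if security were the only issue that mattered (`best_option(options, {"security": 1})`), the in-house option would win.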

Quality

The quality of a conversion depends on how accurate it is and how it handles ambiguous elements such as tables and footnotes. Quality can be divided into two areas: textual accuracy and tagging accuracy.

Textual accuracy is a measure of how faithfully the words are reproduced. It is often an issue when the source materials are hardcopy (paper or PDF page images). Textual accuracy suffers when data has to be rekeyed or OCR’d, because both humans and computers make mistakes. The typical standard for hardcopy conversion is 99.95% textual accuracy. While this is very good, it still means that one character in every 2,000 may be wrong, so additional QC and review is frequently a good idea. The story is better for electronic conversions, where the materials are already available in a digital format. For these, 100% accuracy is often obtainable when the conversion process accounts for all fields and carries all of the text through intact.
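The arithmetic behind the 99.95% figure is easy to check. This sketch uses Python’s standard `decimal` module so the percentages are handled exactly; the 5,000-page repository and the figure of 3,000 characters per page are assumptions for illustration only.

```python
from decimal import Decimal


def error_rate(accuracy_pct: str) -> Decimal:
    """Fraction of characters expected to be wrong, e.g. "99.95" -> 0.0005."""
    # Decimal avoids binary floating-point rounding (1 - 0.9995 is inexact as a float).
    return (Decimal(100) - Decimal(accuracy_pct)) / Decimal(100)


def chars_per_error(accuracy_pct: str) -> Decimal:
    """On average, one error occurs in this many characters (undefined at 100%)."""
    return 1 / error_rate(accuracy_pct)


def expected_errors(total_chars: int, accuracy_pct: str) -> Decimal:
    """Expected number of wrong characters in a body of text."""
    return total_chars * error_rate(accuracy_pct)


print(int(chars_per_error("99.95")))  # 2000
# Hypothetical repository: 5,000 pages at an assumed 3,000 characters per page.
print(int(expected_errors(5_000 * 3_000, "99.95")))  # 7500
```

At that assumed page density, even a 99.95% accurate conversion leaves on the order of 7,500 wrong characters across 5,000 pages, which is why a review pass is worth budgeting for.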

Tagging accuracy is more subjective: it indicates how faithfully the materials follow the defined tagging scheme. It is much harder to measure, both because of ambiguities in the schema or DTD and because the materials were probably not written with that schema in mind. In the XML world it is tempting to equate tagging accuracy with whether a file validates against the DTD or schema. But a file that validates is not necessarily tagged correctly.
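The gap between validating and being correct can be shown with a toy example. The content model below is a hypothetical stand-in for a DTD rule, checked by hand using the standard-library `xml.etree.ElementTree` parser (which does not itself validate against DTDs). Both documents pass the structural check, but only one is tagged correctly.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD or schema rule:
# an <address> holds exactly one <number> followed by one <street>.
CONTENT_MODEL = {"address": ["number", "street"]}


def is_structurally_valid(xml_text: str) -> bool:
    """True if the root's child elements match the content model."""
    root = ET.fromstring(xml_text)
    expected = CONTENT_MODEL.get(root.tag)
    return expected is not None and [child.tag for child in root] == expected


def number_looks_right(xml_text: str) -> bool:
    """A content-level check that this structure-only model does not express."""
    root = ET.fromstring(xml_text)
    return root.findtext("number", default="").isdigit()


good = "<address><number>123</number><street>Main Street</street></address>"
swapped = "<address><number>Main Street</number><street>123</street></address>"

for doc in (good, swapped):
    print(is_structurally_valid(doc), number_looks_right(doc))
```

A schema language such as XML Schema could constrain `<number>` to digits, but many tagging errors (a heading tagged as a paragraph, a note tagged as body text) satisfy every rule a schema can state.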

How do you maximize tagging accuracy?

The most common approach is to combine computer scripts that perform an automated conversion with manual cleanup to fix whatever the scripts did not convert correctly.

Most conversion scripts handle simple material, such as paragraph tags, without a problem. The difficulty comes with tables, multiple-column layouts, equations, cross-references, graphics, headers and footers, footnotes, and other such complexities, which most scripts have a really tough time getting right.
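A minimal sketch of the script-plus-cleanup approach, assuming plain-text input: simple paragraphs are tagged automatically, while lines that look like table rows or footnotes are routed to a manual cleanup queue. The `<para>` tag name and the detection rules are hypothetical; production conversion scripts are far more elaborate.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical heuristics for content the script will not try to tag itself:
# tab- or pipe-separated lines (likely table rows) and footnote-style lines.
TABLE_ROW = re.compile(r"[\t|]")
FOOTNOTE = re.compile(r"^(\[\d+\]|\*)")


def convert(lines: list[str]) -> tuple[list[str], list[tuple[int, str]]]:
    """Tag easy paragraphs automatically; queue everything else for manual cleanup."""
    tagged: list[str] = []
    needs_cleanup: list[tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines carry no content
        if TABLE_ROW.search(text) or FOOTNOTE.match(text):
            needs_cleanup.append((number, line))  # keep line number for the reviewer
        else:
            tagged.append(f"<para>{escape(text)}</para>")  # escape &, <, >
    return tagged, needs_cleanup


source = [
    "Plan time for cleanup & review.",
    "Part\tQty\tPrice",
    "[1] See the appendix.",
]
tagged, queue = convert(source)
print(tagged)
print(queue)
```

Everything that lands in the queue is the manual cleanup the article warns about; the better the rules, the shorter that queue.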

Another issue is contextual tagging (often referred to as content tags or implicit tags). When I write “123 Main Street”, almost all of us know that this is an address and that “123” is the street number. But teaching a program to recognize these nuances 100% of the time is a whole other matter.
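To see why contextual tagging is hard, consider a deliberately naive, hypothetical pattern-matching rule for addresses. It tags “123 Main Street” correctly, misses addresses written with words or abbreviations (including DCL’s own “61-18 190th St.”), and wrongly tags a chapter heading as an address.

```python
import re

# A deliberately naive, hypothetical rule: a house number, then capitalized
# words ending in a spelled-out street type.
ADDRESS = re.compile(r"\b(\d+)\s+((?:[A-Z][a-z]+\s)+(?:Street|Avenue|Road))\b")


def tag_addresses(text: str) -> str:
    """Wrap anything matching the rule in <address>, <number>, and <street> tags."""
    return ADDRESS.sub(r"<address><number>\1</number><street>\2</street></address>", text)


samples = (
    "Ship to 123 Main Street today.",  # tagged correctly
    "Ship to One Main Street today.",  # missed: number spelled out
    "61-18 190th St.",                 # missed: hyphenated number, abbreviation
    "Chapter 3 Safety On The Road",    # false positive: a heading, not an address
)
for sample in samples:
    print(tag_addresses(sample))
```

Each fix to the rule tends to open a new hole, which is why contextual tags usually need human review.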

Remember, whatever the script isn’t able to tag correctly will have to be cleaned up manually. (Manual cleanup will be the topic of a future newsletter article; to be informed when it appears, please email us at convert@dclab.com.)

Schedule

The length of time a conversion takes depends on (a) how well the automated conversion tags the materials; and (b) how many people are available to handle the manual cleanup.

When I speak about legacy data conversion, I invariably ask audience members who have tackled a legacy data conversion project internally: “On average, how long did it take to clean up a page of your materials?” The typical response is four to six minutes per page. And that’s just cleanup. It doesn’t include the time needed to analyze the data, create the scripts, and process the data, which is likely to be at least a few weeks.

At first glance, that may not seem too bad. But for a repository of 5,000 pages, five minutes per page adds up to more than 400 hours, so one person can plan on approximately three months’ worth of cleanup. And of course, the more you have, the longer it takes: for 10,000 pages you’ve got a six-month project on your hands.
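The in-house arithmetic can be reproduced in a few lines. Five minutes per page comes from the audience responses above; the figure of 32 productive cleanup hours per person per week is our own assumption, standing in for time lost to meetings and other duties.

```python
def cleanup_hours(pages: int, minutes_per_page: float = 5) -> float:
    """Total hands-on cleanup time, in hours."""
    return pages * minutes_per_page / 60


def cleanup_weeks(pages: int, minutes_per_page: float = 5,
                  staff: int = 1, productive_hours_per_week: float = 32) -> float:
    """Calendar weeks of cleanup for a given team size.

    productive_hours_per_week is an assumption: hours per person per week
    actually spent cleaning up pages.
    """
    return cleanup_hours(pages, minutes_per_page) / (staff * productive_hours_per_week)


for pages in (5_000, 10_000):
    print(pages, round(cleanup_hours(pages)), round(cleanup_weeks(pages), 1))
```

At that assumed rate, one person needs about 13 weeks for 5,000 pages and about 26 weeks for 10,000, in line with the three- and six-month estimates. Adding staff divides the time but adds management overhead.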

So how does this compare to outsourcing your conversion? The process may seem to move slowly at first, because extra time goes into making the automated steps work as well as possible, but it then speeds up considerably because a large labor pool is available to clean up the data. In our experience, the average project needs six weeks to work out and test the process, and then one week for each 3,000 to 5,000 pages. So for the same 5,000 pages you can typically expect about a seven-week effort, and about eight weeks for the 10,000-page project (somewhat longer if throughput is nearer 3,000 pages a week). That difference matters if schedule is important and you’re trying to hit your implementation goals sooner rather than later.
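The outsourced timeline follows the same kind of arithmetic: a fixed setup and testing period, then steady production. The six-week setup and the throughput of 3,000 to 5,000 pages per week come from the paragraph above; treating throughput as constant is a simplifying assumption.

```python
def outsourced_weeks(pages: int, pages_per_week: int = 5_000, setup_weeks: float = 6) -> float:
    """Setup and testing period, then production at a steady weekly throughput."""
    return setup_weeks + pages / pages_per_week


for pages in (5_000, 10_000):
    fast = outsourced_weeks(pages, pages_per_week=5_000)
    slow = outsourced_weeks(pages, pages_per_week=3_000)
    print(f"{pages} pages: {fast:.1f} to {slow:.1f} weeks")
```

Because the setup period is fixed, doubling the page count adds only one to about one and a half weeks here, whereas the in-house cleanup estimate doubles.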

Cost

The cost of a conversion project will vary widely depending on the complexity and readability of the material, and on whether you convert in-house or outsource. Most people expect an internal effort to be cheaper, but this is not usually the case.

While there are various startup costs associated with conversion, the cost of the software and hardware you use is usually small by comparison. The real expense is the labor to clean up the materials and the cost of managing the process. If you are doing the conversion in-house, remember that you are taking staff away from their usual work; and if you are hiring people for the project, there are real costs associated with that too. If you are hiring temps, realize that people already fluent in the intricacies of a given target format are not easily found, and temps who need training will often require significant management resources.
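Because labor dominates, a rough in-house estimate needs only a few inputs. This sketch leaves out software and hardware costs; the hourly rate and the 20% management overhead are hypothetical values, not DCL figures.

```python
def inhouse_labor_cost(pages: int, minutes_per_page: float, hourly_rate: float,
                       management_overhead: float = 0.0) -> float:
    """Cleanup labor plus a proportional allowance for managing the work.

    management_overhead is a fraction: 0.2 adds 20% on top of cleanup labor.
    """
    cleanup_cost = pages * minutes_per_page / 60 * hourly_rate
    return cleanup_cost * (1 + management_overhead)


# Hypothetical inputs: 5,000 pages, 5 minutes a page, $40 an hour, 20% overhead.
print(round(inhouse_labor_cost(5_000, 5, 40, management_overhead=0.2)))  # 20000
```

The estimate scales linearly with both page count and minutes per page, so an optimistic cleanup rate understates the cost just as it understates the schedule.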

In an upcoming issue of DCLnews we’ll introduce a calculator to help you estimate your own relative costs. (To be notified when it is available, email convert@dclab.com; you can also subscribe to DCLnews.)

Security

Many organizations have sensitive or classified materials that they wish to convert. If you outsource such material, it is vital that the vendor has strict safeguards in place for sensitive materials. The vendor should also have previous experience with information from ultra-competitive markets such as pharmaceuticals, high-tech, or finance, or from ultra-sensitive markets such as the military, legal, or nuclear utility sectors. A company that has handled these kinds of materials will understand your concerns and have the safeguards in place to keep your materials secure at all times.

Risk

Reducing your risk comes down to the competency and experience of those undertaking the conversion. If this is the first time that a particular type of conversion has been attempted, that’s likely to increase your risk. If you are looking to minimize risks, it helps to use a proven approach.

Most organizations that attempt a do-it-yourself approach are driven by either cost or an innate desire to have more control of the process. While these are commendable reasons, what’s often missed is the risk factor. One of the risks is that the process will take longer than predicted and that the amount of manual cleanup might be significantly more than expected.

Experience goes a long way towards mitigating risk. If risk is an issue, it’s important to take a candid view of your experience and your vendor’s experience with this type of project.

Scalability

The final issue is scalability. Can you run a production conversion the same way you ran your pilot? Sure, you can stay up all night and crunch through a hundred pages with stellar results. But is that the process you would want to use for thousands of pages?

To gain any degree of scalability, you want to build as much automation into the process as you can. This is where you will want to leverage computers, because once configured correctly they work very fast and very cheaply. A scalable process needs automated tagging that is as accurate and complete as possible.
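The payoff from better automation can be sketched with a simple linear model, offered as an assumption rather than a measured relationship: every element the scripts tag wrongly costs a fixed amount of manual time. The page count, elements per page, and minutes per fix are hypothetical.

```python
def manual_cleanup_hours(pages: int, elements_per_page: int,
                         auto_accuracy: float, minutes_per_fix: float) -> float:
    """Hours of manual work left after automated tagging (simple linear model)."""
    fixes = pages * elements_per_page * (1 - auto_accuracy)
    return fixes * minutes_per_fix / 60


# Hypothetical: 10,000 pages, 50 tagged elements a page, 1 minute per fix.
for accuracy in (0.90, 0.95, 0.98):
    hours = manual_cleanup_hours(10_000, 50, accuracy, minutes_per_fix=1)
    print(f"{accuracy:.0%} automated accuracy -> {hours:,.0f} hours of cleanup")
```

Under these assumptions, raising automated accuracy from 90% to 98% cuts manual cleanup from roughly 833 hours to roughly 167, a fivefold reduction, and the absolute saving grows in proportion to the size of the repository.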

The above covers the key factors to consider when deciding how to get a conversion project done. Answering these questions honestly will go a long way toward ensuring that your migration to a new content management system stays on schedule and is free of hitches.

DCLnews Editorial
May 24th, 2005


Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.