Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 

JoAnn Hackos on Moving Legacy Documentation into DITA: An Interview

In this exclusive interview, Dr. JoAnn Hackos, content management and information design expert, gives her best advice on what organizations need to know about moving legacy documentation to DITA. Dr. Hackos, president of Comtech Services, Inc. and the director of the Center for Information-Development Management, has written five books on the field of content management and Web communication. Her latest book, Information Development: Managing Your Documentation Projects, Portfolio and People (Wiley 2006), is the new bible on managing the document development life cycle.

DCL: First, can you define legacy documentation? What is it?

JH: There are two ways people define legacy documentation. When you are moving to a content management system, using DITA and XML, everything that exists at this point is legacy documentation. But there's a second definition: Among your previously existing information, some of it we may call legacy because it documents products that are not changing much. Much of this information isn't worth changing. There's low value in converting or updating it.

However, many companies have a document suite in which some information is very changeable and volatile, and some is not. They still want everything eventually to be in XML and DITA so that it is all compatible and in a uniform style. They don't want some of it in Word, some in FrameMaker, and some of it in XML. That's legacy documentation that they need to change into a consistent new form.

Why would anyone want to convert legacy documents to DITA?

If you look across a documentation suite, you find basically the same content in various contexts except that they've been written a bit differently in each case. Maintenance costs are higher because you have to update the same information in multiple places. The likelihood of error increases because it's likely you are going to miss something that needs to be updated. And your translation costs skyrocket because you are translating the same content into multiple languages for multiple outputs. And because you are also translating "slightly different" content, translation costs you even more.

In addition, your development costs are higher because your subject matter experts have to review the same information over and over. You have many writers writing the same materials. You also have a clutter of topics that everyone has to sort through every time they look for something. The best advice is to pursue minimalism: Figure out how you want to say it. Say it once. Get it into your repository, and lock it down.

You have an organization with thousands of pages of documents. You can't possibly afford, or want, to convert them all. How does an organization decide what to convert?

That's a question on everyone's minds. How am I ever going to get this content moved to XML, and presumably, to the DITA model? Consider prioritizing. First, you have legacy data that is highly unlikely to change. You have documents that are at the end of their life cycle. That's what you don't convert, or you convert last. Such information should be very stable and have a stable customer base. You want to leave that where it is. It may be happy there for a very long time.

On the other side are your high-priority materials. This material is highly volatile; it will be changing a lot. You are working on new releases, and you have new content to add. For example, you might have to develop a lot of new topics, but you have existing material that you think is in good shape. You can convert some and get a lot of value out of that conversion. The outcome depends on the information being well structured.

Next, you have the middle ground of information that is neither well structured nor what you want it to be. Still, you would like to avoid retyping or cutting and pasting everything. It would be valuable to go through a multi-step process to see if you can create some active source in DITA. But you must recognize that you might want to make some significant changes.

How do you know what you need to do? How do you evaluate what a vendor can do for you?

The first step is to look at your own content. Make some decisions and set priorities. Find the right process to use to get the highest value. Do some test cases with a group like DCL that has some intelligent conversion tools that may improve the quality after the first tests. DCL actually refines the process so you get better output. Be certain you get out the three core DITA topic types: Concept, Task, and Reference, not some generic topics with few semantics.

Sometimes what you get is useless, because it's exactly what you put in it. I know conversion systems that we can call "handy-dandy topic splitters." The software is simplistic. In an unstructured document, it merely accounts for heading levels. A heading defines a topic. If you happen to have two headings in line without content between them, you would get one topic with no content. Another danger is that a vendor might convert unstructured content into a DITA base. It's minimal DITA, with little semantic structure. To get to DITA, you have to start all over again.
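The empty-topic problem described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that headings are marked with a leading `#`; the function names are hypothetical and this is not any vendor's actual tool.

```python
def split_by_headings(lines, heading_prefix="#"):
    """Naive splitter: every heading line starts a new topic.

    The '#' heading marker is an assumption for this sketch; a real
    source document would use styles or tags instead.
    """
    topics = []
    current = None
    for line in lines:
        if line.startswith(heading_prefix):
            # A heading always opens a new topic, even if the previous
            # topic never received any body text.
            current = {"title": line.lstrip(heading_prefix).strip(), "body": []}
            topics.append(current)
        elif current is not None and line.strip():
            current["body"].append(line.strip())
    return topics


def empty_topics(topics):
    """Return the titles of topics that ended up with no content."""
    return [t["title"] for t in topics if not t["body"]]


sample = [
    "# Installing the product",
    "## Before you begin",
    "Check the system requirements.",
    "## Running the installer",
    "Double-click setup and follow the prompts.",
]
topics = split_by_headings(sample)
print(empty_topics(topics))  # ['Installing the product']
```

The first heading is followed directly by a second heading, so the splitter emits a topic with a title and nothing else. A check like `empty_topics` is one simple way to spot such output in a test conversion.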

Should you rewrite content first or convert it right away?

One school of thought suggests doing some restructuring of poorly structured documentation first. I want writers to look at their content from a minimalist point of view. Ask yourself, 'Have you been maintaining information that has low value to customers, which your customers don't use?' Perhaps it's out of date. Or it's become common industry knowledge, and yet you are still maintaining it. Or perhaps it has more technical detail than customers want. If it's no longer relevant, let it go.

Then look at the relevant content and ask, 'Is this content in a form that is best going forward?' Perhaps it was written with a novice user in mind and now you have experienced users. You may want to streamline the procedures and design a different structure. Then, for conversion, consider determining: 'Here's how we want it organized. Can we use a script to get us half-way there?'

Do not accept the old structures for the future. The new structure not only should be DITA; it's going to be more structured and more effective. In the process you may find information that can move to DITA and information that could be dropped. You could devise a fairly sophisticated conversion script to do some of the work.

You might have content that is already well structured in the original. Conversion to DITA is going to give you a lot of value. Then you have some content that is so badly done that you don't want to use it. This information may have to be completely rewritten. You might even need to start from scratch.

You will likely have a considerable middle ground; that's where most people are. You have valuable information with good content. You want to move to DITA, but in the process of getting there you want to make some intelligent decisions about using that content in the future. Put all your decisions about your content in at least three buckets: what to leave back as legacy, what to convert, and what to rewrite. One caution: if you convert information that is badly structured, it becomes even harder to fix later.

Let's say that you have a training manual, online help information, and a user manual. These sources might have a lot of the same information. After conversion, are you left with the same chunks of information all over the place?

That's redundant content. DCL has a process for analyzing that. They take a large body of content and locate chunks that are either absolutely redundant or close to a match. I think this process can be valuable. Otherwise, you have to do it by hand. We call that a commonality analysis. DCL offers an important service, one that is more intelligent than some I've seen.
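As a rough illustration of what a commonality analysis looks for, the sketch below compares every pair of chunks and flags exact and near matches. It is not DCL's process, which the interview does not describe; the chunk identifiers, the `near_threshold` value, and the use of Python's standard `difflib` are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_commonalities(chunks, near_threshold=0.85):
    """Compare every pair of chunks; report exact and near matches.

    chunks: dict mapping a chunk id to its text.
    near_threshold: hypothetical similarity cutoff, chosen for illustration.
    """
    results = []
    for (id_a, text_a), (id_b, text_b) in combinations(chunks.items(), 2):
        a, b = normalize(text_a), normalize(text_b)
        if a == b:
            results.append((id_a, id_b, "exact", 1.0))
            continue
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= near_threshold:
            results.append((id_a, id_b, "near", round(ratio, 2)))
    return results


# Hypothetical chunks from online help, a user manual, and a training manual.
chunks = {
    "help/save": "Click Save to store your changes.",
    "manual/save": "Click  Save to store your changes.",
    "training/save": "Click Save to store all your changes.",
    "manual/print": "Select Print from the File menu.",
}
matches = find_commonalities(chunks)
for match in matches:
    print(match)
```

Pairwise comparison grows quadratically with the number of chunks, so a sketch like this only suits small samples; it is meant to show the idea of "absolutely redundant or close to a match", not to scale to a full documentation suite.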

Without this analysis, will you have the same pieces of information multiple times in your database?

Yes, you would then have a bigger problem than if you tried to know where the commonalities are ahead of time. As you are doing your analysis and prioritizing, you need to step in to say, 'We're trying to get maximum reuse out of our content. We need to do commonality analysis as well as a prioritization.' You can do the commonality analysis ahead of time by hand (a lot of hard copy reading), use the process DCL has, or you can put it all in your database and have a big mess.

One of the measures of success of a content management implementation is to minimize the number of words in the repository needed to create the most output. If you have 100 words in your repository and you have an HTML output and a PDF output that total 200 words, you would have 100% reuse. You don't want anything in the repository to be redundant.
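The arithmetic behind that figure can be written out as a short sketch. The formula here is an assumption: it is one definition that reproduces the 100-word and 200-word example, and the function name is hypothetical.

```python
def reuse_percent(repository_words, output_words):
    """Output produced beyond the repository size, as a percentage of the repository.

    Assumed formula: (output - repository) / repository * 100.
    """
    if repository_words <= 0:
        raise ValueError("repository_words must be positive")
    return (output_words - repository_words) / repository_words * 100


# 100 words in the repository feeding HTML and PDF outputs totaling 200 words.
print(reuse_percent(100, 200))  # 100.0
```

Under this definition, a repository whose words each appear in only one output would score 0 percent, and every additional output built from the same words raises the figure.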

How much rewriting can an organization expect to do to make documents DITA ready?

Everyone is struggling with the task ahead. If the content is in really bad shape, you might want to outsource to a writing group. There's nothing to convert, so you have to rewrite. Or you can hire people to restructure the content. You have to decide if there's enough return on investment to justify the rewriting and restructuring effort and cost.

What's the alternative?

Staying where you are.

That doesn't sound like a very good alternative.

It's probably not. But the mix of conversion, restructuring, and rewriting has to be carefully determined.

So that's part of the decision making on how much to convert? How much do you have to rewrite to get to the point where you can convert?

Well, it's all about the prioritization. One thing you do is start analyzing your existing information. You may be shocked by what you have. It may have been written a long time ago by someone who doesn't work for you any more. You've always just kept it going. It may even be technically wrong.

I know every project is different, but is there an average amount of work that can be done by software versus the amount of human work that has to be done?

They are all very different, so you have to take this with that proviso in mind. A company whose information is in pretty good structural shape could probably convert 50 percent and restructure 50 percent. That's providing the content is well structured and reasonably consistent.

Is this something an organization can take on alone, or is it better to bring in the experts?

First, for an organization to do it all themselves, they may have a martyr complex. But really, they should consult experts and bring them in early. In so many cases, if you've gone off in the wrong direction, it's very hard to recover. It's difficult to say, 'Let's throw out everything we've done so far.' You want to start in a good direction. And I think people understand that. You will save money and time by working with experts. I've had clients who have said, 'We could have done this ourselves, but we avoided the pitfalls we couldn't have anticipated.'

Can you give us a broad sense of where content reuse is being applied on a large scale in industry?

The interest in content reuse has been strong and continues to grow because there have been so many staff reductions and so many increases in the volume of work. Organizations simply cannot afford to maintain multiple sources. Every day I hear someone say they are maintaining the same thing in a help system and in print. They can't afford to do it anymore.

And DITA? How fast is it being adopted? Is there a long way to go, or is there anything new on the horizon that might compete with DITA by offering something it doesn't?

DITA is still new. But I believe it's close to tipping. The costs of change are still high, but the tools are getting significantly better. The level of interest is enormous. We've never seen anything like this before, certainly not since the adoption of desktop publishing in the early 1980s. The number of companies interested is high. People are trying to understand it. The attendance at conferences and workshops is strong. I don't really see anything else on the horizon. There's interest in the S1000D aerospace standard, but many find this standard to be very complex and specialized.

Would you advise a company that can't quite make the move to transforming everything to at least start using DITA in some way?

Yes, choose a new project and start it outright. In most companies, you can't say that you're going to start now and not have anything for five years (while you change over all your documentation). You have to start with a small project and prove that you can do something effective.

Comtech Services was founded in 1978 and focuses on helping customers provide effective products and information to their customers and employees. JoAnn can be reached through her web site at http://www.comtech-serv.com/index.shtml.

DCLNews Editorial
May 2007

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.