JoAnn Hackos on Moving Legacy Documentation into DITA: An Interview

In this exclusive interview, Dr. JoAnn Hackos, content management and information design expert, gives her best advice on what organizations need to know about moving legacy documentation to DITA. Dr. Hackos, president of Comtech Services, Inc. and the director of the Center for Information-Development Management, has written five books on the field of content management and Web communication. Her latest book, Information Development: Managing Your Documentation Projects, Portfolio and People (Wiley 2006), is the new bible on managing the document development life cycle.

DCL: First, can you define legacy documentation? What is it?

JH: There are two ways people define legacy documentation. When you are moving to a content management system, using DITA and XML, everything that exists at this point is legacy documentation. But there's a second definition: Among your previously existing information, some of it we may call legacy because it documents products that are not changing much. Much of this information isn't worth changing. There's low value in converting or updating it.

However, many companies have a document suite in which some information is very changeable and volatile, and some is not. They still want everything eventually to be in XML and DITA so that it is all compatible and in a uniform style. They don't want some of it in Word, some in FrameMaker, and some of it in XML. That's legacy documentation that they need to change into a consistent new form.

Why would anyone want to convert legacy documents to DITA?

If you look across a documentation suite, you find basically the same content in various contexts, except that it's been written a bit differently in each case. Maintenance costs are higher because you have to update the same information in multiple places. The likelihood of error increases because you are likely to miss something that needs to be updated. And your translation costs skyrocket because you are translating the same content into multiple languages for multiple outputs. And because you are also translating "slightly different" content, translation costs you even more.

In addition, your development costs are higher because your subject matter experts have to review the same information over and over. You have many writers writing the same materials. You also have a clutter of topics that everyone has to sort through every time they look for something. The best advice is to pursue minimalism: Figure out how you want to say it. Say it once. Get it into your repository, and lock it down.

You have an organization with thousands of pages of documents. You can't possibly afford, or want, to convert them all. How does an organization decide what to convert?

That's a question on everyone's minds. How am I ever going to get this content moved to XML, and presumably, to the DITA model? Consider prioritizing. First, you have legacy data that is highly unlikely to change. You have documents that are at the end of their life cycle. That's what you don't convert, or you convert last. Such information should be very stable and have a stable customer base. You want to leave that where it is. It may be happy there for a very long time.

On the other side are your high priority materials. This material is highly volatile and will be changing a lot. You are working on new releases, and you have new content to add. For example, you might have to develop a lot of new topics, but you have existing material that you think is in good shape. You can convert some and get a lot of value out of that conversion. The outcome depends on the information being well structured.

Next, you have the middle ground of information that is neither well structured nor what you want it to be. Still, you would like to avoid retyping or cutting and pasting everything. It would be valuable to go through a multi-step process to see if you can create some active source in DITA. But you must recognize that you might want to make some significant changes.

How do you know what you need to do? How do you evaluate what a vendor can do for you?

The first step is to look at your own content. Make some decisions and set priorities. Find the right process to use to get the highest value. Do some test cases with a group like DCL that has some intelligent conversion tools that may improve the quality after the first tests. DCL actually refines the process so you get better output. Be certain you get out the three core DITA topic types: Concept, Task, and Reference, not some generic topics with few semantics.

Sometimes what you get is useless, because it's exactly what you put in it. I know conversion systems that we can call "handy-dandy topic splitters." The software is simplistic. In an unstructured document, it merely accounts for heading levels. A heading defines a topic. If you happen to have two headings in line without content between them, you would get one topic with no content. Another danger is that a vendor might convert unstructured content into a DITA base. It's minimal DITA, with little semantic structure. To get to DITA, you have to start all over again.
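
The empty-topic problem described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that headings are lines starting with "# "; the function names and sample text are hypothetical and do not represent any vendor's tool.

```python
import re

# Assumed heading convention for this sketch: a line starting with "# ".
HEADING = re.compile(r"^#\s+(.*)$")

def split_by_headings(text):
    """Naively start a new topic at every heading line.

    Text that appears before the first heading is ignored.
    """
    topics = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            topics.append({"title": match.group(1).strip(), "body": []})
        elif topics and line.strip():
            topics[-1]["body"].append(line.strip())
    return topics

def empty_topics(topics):
    """Return the titles of topics that ended up with no body content."""
    return [t["title"] for t in topics if not t["body"]]

# Hypothetical sample: two headings in a row with nothing between them.
sample = """# Installing the pump
# Before you begin
Check that the power is off.
# Specifications
Voltage: 120 V
"""

topics = split_by_headings(sample)
print([t["title"] for t in topics])
print(empty_topics(topics))  # ['Installing the pump']
```

The first heading becomes a topic with no content, and none of the topics is typed as a Concept, Task, or Reference. A check like `empty_topics` is one simple way to spot this kind of output in a test conversion.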

Should you rewrite content first or convert it right away?

One school of thought suggests doing some restructuring of poorly structured documentation first. I want writers to look at their content from a minimalist point of view. Ask yourself, 'Have you been maintaining information that has low value to customers, which your customers don't use?' Perhaps it's out of date. Or it's become common industry knowledge, and yet you are still maintaining it. Or perhaps it has more technical detail than customers want. If it's no longer relevant, let it go.

Then look at the relevant content and ask, 'Is this content in a form that is best going forward?' Perhaps it was written with a novice user in mind and now you have experienced users. You may want to streamline the procedures and design a different structure. Then, for conversion, consider determining: 'Here's how we want it organized. Can we use a script to get us half-way there?'

Do not accept the old structures for the future. The new structure not only should be DITA; it's going to be more structured and more effective. In the process you may find information that can move to DITA and information that could be dropped. You could devise a fairly sophisticated conversion script to do some of the work.

You might have content that is already well structured in the original. Conversion to DITA is going to give you a lot of value. Then you have some content that is so badly done that you don't want to use it. This information may have to be completely rewritten. You might even need to start from scratch.

You will likely have a considerable middle ground; that's where most people are. You have valuable information with good content. You want to move to DITA, but in the process of getting there you want to make some intelligent decisions about using that content in the future. Put all your decisions about your content in at least three buckets: what to leave back as legacy, what to convert, and what to rewrite. One caution: if you convert information that is badly structured, it becomes even harder to fix later.

Let's say that you have a training manual, online help information, and a user manual. These sources might have a lot of the same information. After conversion, are you left with the same chunks of information all over the place?

That's redundant content. DCL has a process for analyzing that. They take a large body of content and locate chunks that are either absolutely redundant or close to a match. I think this process can be valuable. Otherwise, you have to do it by hand. We call that a commonality analysis. DCL offers an important service, one that is more intelligent than some I've seen.
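
As a rough illustration of what a commonality analysis looks for, the sketch below compares every pair of text chunks and flags exact and near matches. It is not DCL's process; it uses Python's standard difflib module, and the chunk ids, sample sentences, and 0.8 similarity threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def find_commonalities(chunks, threshold=0.8):
    """Compare every pair of chunks and report exact and near matches.

    chunks: dict mapping a chunk id to its text.
    threshold: assumed cutoff for "close to a match"; tune it on real content.
    """
    results = []
    for (id_a, text_a), (id_b, text_b) in combinations(chunks.items(), 2):
        a, b = normalize(text_a), normalize(text_b)
        ratio = SequenceMatcher(None, a, b).ratio()
        if a == b:
            results.append((id_a, id_b, "exact", ratio))
        elif ratio >= threshold:
            results.append((id_a, id_b, "near", ratio))
    return results

# Hypothetical chunks from a help system, a user manual, and a training manual.
chunks = {
    "help/save": "Click Save to store your changes.",
    "manual/save": "Click  Save to store your changes.",
    "training/save": "Click Save to store all your changes.",
    "manual/print": "Select Print from the File menu.",
}

for id_a, id_b, kind, ratio in find_commonalities(chunks):
    print(f"{kind:5} {ratio:.2f}  {id_a} <-> {id_b}")
```

Pairwise comparison like this grows quadratically with the number of chunks, so it only suits small samples. It is meant to show the idea of exact versus close matches, not to replace a production analysis.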

Without this analysis, will you have the same pieces of information multiple times in your database?

Yes, and you would then have a bigger problem than if you had identified the commonalities ahead of time. As you are doing your analysis and prioritizing, you need to step in to say, 'We're trying to get maximum reuse out of our content. We need to do commonality analysis as well as a prioritization.' You can do the commonality analysis ahead of time by hand (a lot of hard copy reading), use the process DCL has, or you can put it all in your database and have a big mess.

One of the measures of success of a content management implementation is to minimize the number of words in the repository needed to create the most output. If you have 100 words in your repository and you have an HTML output and a PDF output that total 200 words, you would have 100% reuse. You don't want anything in the repository to be redundant.
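
The arithmetic in this example can be written out as a small sketch. The formula is one interpretation that matches the 100-word and 200-word figures above (extra output words divided by repository words); the function name and the even split between HTML and PDF are assumptions for illustration.

```python
def reuse_percentage(repository_words, output_word_counts):
    """Return reuse as a percentage of repository size.

    Assumed formula: (total output words - repository words)
    / repository words * 100.
    """
    if repository_words <= 0:
        raise ValueError("repository_words must be positive")
    total_output = sum(output_word_counts)
    return (total_output - repository_words) / repository_words * 100

# 100 words in the repository; HTML and PDF outputs totalling 200 words.
print(reuse_percentage(100, [100, 100]))  # 100.0

# A single output the same size as the repository means no reuse.
print(reuse_percentage(100, [100]))  # 0.0
```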

How much rewriting can an organization expect to do to make documents DITA ready?

Everyone is struggling with the task ahead. If the content is in really bad shape, you might want to outsource to a writing group. There's nothing to convert, so you have to rewrite. Or you can hire people to restructure the content. You have to decide if there's enough return on investment to justify the rewriting and restructuring effort and cost.

What's the alternative?

Staying where you are.

That doesn't sound like a very good alternative.

It's probably not. But the mix of conversion, restructuring, and rewriting has to be carefully determined.

So that's part of the decision making on how much to convert? How much do you have to rewrite to get to the point where you can convert?

Well, it's all about the prioritization. One thing you do is start analyzing your existing information. You may be shocked by what you have. It may have been written a long time ago by someone who doesn't work for you anymore. You've always just kept it going. It may even be technically wrong.

I know every project is different but is there an average amount of work that can be done by software versus the amount of human work that has to be done?

They are all very different, so you have to take this with that proviso in mind. A company whose information is in pretty good structural shape could probably convert 50 percent and restructure 50 percent. That's providing the content is well structured and reasonably consistent.

Is this something an organization can take on alone, or is it better to bring in the experts?

First, for an organization to do it all themselves, they may have a martyr complex. But really, they should consult experts and bring them in early. In so many cases, if you've gone off in the wrong direction, it's very hard to recover. It's difficult to say, 'Let's throw out everything we've done so far.' You want to start in a good direction. And I think people understand that. You will save money and time by working with experts. I've had clients who have said, 'We could have done this ourselves, but we avoided the pitfalls we couldn't have anticipated.'

Can you give us a broad sense of where content reuse is being applied on a large scale in industry?

The interest in content reuse has been strong and continues to grow because there have been so many staff reductions and so many increases in the volume of work. Organizations simply cannot afford to maintain multiple sources. Every day I hear someone say they are maintaining the same thing in a help system and in print. They can't afford to do it anymore.

And DITA? How fast is it being adopted? Is there a long way to go, or is there anything new on the horizon that might compete with DITA by offering something it doesn't?

DITA is still new. But I believe it's close to tipping. The costs of change are still high, but the tools are getting significantly better. The level of interest is enormous. We've never seen anything like this before, certainly not since the adoption of desktop publishing in the early 1980s. The number of companies interested is high. People are trying to understand it. The attendance at conferences and workshops is strong. I don't really see anything else on the horizon. There's interest in the S1000D aerospace standard, but many find this standard to be very complex and specialized.

Would you advise a company that can't quite make the move to transforming everything to at least start using DITA in some way?

Yes, choose a new project and start it outright. In most companies, you can't say that you're going to start now and not have anything for five years (while you change over all your documentation). You have to start with a small project and prove that you can do something effective.

Comtech Services was founded in 1978 and focuses on helping customers provide effective products and information to their customers and employees. JoAnn can be reached through her web site at http://www.comtech-serv.com/index.shtml.

DCLNews Editorial

 


Corporate office:
61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365, P: 718-357-8700
Data Conversion Lab
Copyright © 1997-2009  Data Conversion Laboratory, Inc. All rights reserved.