DITA-izing Your Documents: Five Issues to Think About When Converting Your Legacy Publications to DITA
By Michael Gross, DCLnews
Converting your legacy document collections to any XML markup scheme presents challenges such as inconsistent input data, documents that don't fit the DTD/Schema, page-based references (such as "See Top of Next Page"), and documents "shoehorned" to fit the paper layout of a word processor or desktop publishing package. DITA adds a few more challenges.
The Darwin Information Typing Architecture (DITA), an XML markup scheme developed by IBM and targeted at technical documentation, has in the last few years moved to the forefront of XML tagging schemes. It promises greater flexibility, extensibility, and document reuse, and with them, cost savings. While XML has promised these benefits for years, DITA seems to bring us closer to realizing them.
But while DITA shows great promise, converting legacy documents to DITA presents some additional challenges. This article outlines five issues that you will likely have to face as you pour existing documentation into DITA.
Topic Breakdown: Topics are probably the most important new concept in DITA. The idea is to break conventional documents down into topics that can stand on their own, each stored as a separate unit or file. These topics can then be reassembled into more traditional manuals using a DITA feature called maps. Breaking documents into many standalone topics increases the reusability of your data. For example, if you manufacture digital cameras and produce manuals for many models that use the same memory card and card mechanism, you can isolate the discussion of changing the memory card in its own topic and then share that topic among all the manuals that need it. The challenge when converting legacy data is that the boundaries at which to break your content into stand-alone topics probably are not marked anywhere in the original document. Your documents were most likely designed as paper manuals, so you'll need someone who knows the data well, perhaps even a subject matter expert, to go through them and mark the pieces that make sense as separate topics. Because topics can contain subtopics, this process calls for serious thought.
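As a rough illustration of how maps reassemble standalone topics, the Python sketch below builds two minimal DITA map files that both reference the same memory-card topic. The file names and manual titles are hypothetical, and a production map would usually carry more metadata; the sketch only shows the `topicref` mechanism that lets one topic appear in many manuals.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared topic used by every camera manual.
SHARED_TOPIC = "changing_memory_card.dita"


def build_map(title, topic_hrefs):
    """Return the XML text of a minimal DITA map that assembles topics in order."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for href in topic_hrefs:
        # Each topicref points at one standalone topic file.
        ET.SubElement(root, "topicref", href=href)
    doctype = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'
    return ('<?xml version="1.0" encoding="UTF-8"?>\n' + doctype + "\n"
            + ET.tostring(root, encoding="unicode"))


# Two manuals reuse the same memory-card topic alongside model-specific topics.
manual_a = build_map("Camera A User Guide", ["camera_a_overview.dita", SHARED_TOPIC])
manual_b = build_map("Camera B User Guide", ["camera_b_overview.dita", SHARED_TOPIC])
print(manual_b)
```

Updating `changing_memory_card.dita` once would then update every manual whose map references it.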
Document Reauthoring: Sometimes, because of the way a legacy document is written, a section of text might be a perfect candidate for its own topic except that it contains side discussions of related issues. Ideally, well-structured DITA would have you re-author that section so that the topic truly stands on its own. In the digital camera example, the section on changing the memory card might digress into how many pictures the card can store, which may differ for each camera. If you isolate that discussion, you will have a cleaner topic. This might involve simply moving a few paragraphs, or it might require a significant amount of re-authoring, which can be time-consuming and expensive for a documentation set that has already been approved. In the initial legacy conversion you might decide to get as close as possible to a topic-based structure without re-authoring, leaving that task for a later stage; step-wise refinement is often not a bad idea.
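The paragraph-moving kind of re-authoring can be partly scripted once someone has decided which paragraphs are a digression. The sketch below assumes that decision has already been made (the `digression_indexes` argument is a hypothetical input from a human reviewer). It moves those paragraphs into a new concept topic and leaves a related link behind in the original. To keep the example short, the original section is written as a generic `<topic>`; the IDs and file names are illustrative.

```python
import xml.etree.ElementTree as ET


def split_digression(topic_id, title, paragraphs, digression_indexes,
                     new_id, new_title):
    """Move the paragraphs whose positions are in digression_indexes into a new
    concept topic, and leave a related link to it in the original topic."""
    keep = [p for i, p in enumerate(paragraphs) if i not in digression_indexes]
    moved = [p for i, p in enumerate(paragraphs) if i in digression_indexes]

    # Original section, now without the side discussion.
    core = ET.Element("topic", id=topic_id)
    ET.SubElement(core, "title").text = title
    body = ET.SubElement(core, "body")
    for text in keep:
        ET.SubElement(body, "p").text = text
    links = ET.SubElement(core, "related-links")
    ET.SubElement(links, "link", href=f"{new_id}.dita")

    # The digression becomes a stand-alone concept topic.
    side = ET.Element("concept", id=new_id)
    ET.SubElement(side, "title").text = new_title
    conbody = ET.SubElement(side, "conbody")
    for text in moved:
        ET.SubElement(conbody, "p").text = text
    return core, side


paragraphs = [
    "Open the card door on the side of the camera.",
    "Card capacity depends on the camera model and image quality setting.",
    "Push the new card in until it clicks.",
]
core, side = split_digression("changing_memory_card", "Changing the Memory Card",
                              paragraphs, {1}, "memory_card_capacity",
                              "Memory Card Capacity")
print(ET.tostring(core, encoding="unicode"))
print(ET.tostring(side, encoding="unicode"))
```

Deciding what counts as a digression stays a human judgment; the script only saves the mechanical cut-and-paste.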
Identifying the Topic Type: In addition to breaking documents into topics, DITA provides three built-in topic types: Tasks (such as "Changing the Memory Card"), Concepts ("How Lighting Affects Your Pictures"), and References ("Physical Specifications of the Camera"). Topic types are normally not identified in the source data. Just as with deciding where to break down your topics, you may need someone who knows the data to go through it and assign DITA topic types. DITA also provides an extensibility mechanism called specialization. If your tagging needs are very specific, specializations can be a good idea, but you'll need a way to decide when these specialized topics are called for.
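A script cannot reliably decide topic types, but a crude first-pass suggestion might speed up the reviewer's work. The heuristic below is purely an assumption for illustration: titles containing words such as "specifications" are suggested as references, titles that start with a gerund ("Changing...") as tasks, and everything else as concepts. The mapping from each type to its body element (`taskbody`, `conbody`, `refbody`) follows DITA's built-in topic types.

```python
import re

# Assumed keyword list; a real project would tune this to its own titles.
REFERENCE_WORDS = {"specification", "specifications", "reference", "parameters"}

# Body element used by each built-in DITA topic type.
BODY_ELEMENT = {"task": "taskbody", "concept": "conbody", "reference": "refbody"}


def suggest_topic_type(title):
    """Crude first-pass guess at a DITA topic type from a section title."""
    words = re.findall(r"[a-z]+", title.lower())
    if REFERENCE_WORDS.intersection(words):
        return "reference"
    if words and words[0].endswith("ing"):
        return "task"  # gerund titles such as "Changing ..." usually describe procedures
    return "concept"


for title in ["Changing the Memory Card",
              "How Lighting Affects Your Pictures",
              "Physical Specifications of the Camera"]:
    topic_type = suggest_topic_type(title)
    print(f"{title!r}: <{topic_type}> with <{BODY_ELEMENT[topic_type]}>")
```

Every suggestion still needs confirmation from someone who knows the content; the point is only to give them a pre-sorted list.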
Content Reuse: Beyond breaking documents into topics, DITA provides a mechanism called CONREF, which lets you reuse chunks of text by referring to them from other documents. Reducing duplicated text is always a good idea, and cleaning up your data may turn up more candidates for CONREF reuse. For instance, your manuals might repeat the warning "This camera is not waterproof. Please do not use this camera in the rain." CONREF allows you to store that text in one location and pull it into any number of others, so you only have to maintain one version of it. If you want to take advantage of CONREF as part of your legacy conversion, you'll need to plan how to search your document set for reuse candidates.
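One simple way to search for CONREF candidates is to flag paragraphs that appear, after whitespace and case normalization, in more than one document. The sketch below assumes the legacy documents have already been split into paragraphs (the dictionary input format is hypothetical). It also shows the shape of a conref value, `file#topicid/elementid`, pointing at a hypothetical shared warnings topic.

```python
import re
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_conref_candidates(documents, min_docs=2):
    """documents maps a document name to its list of paragraphs.

    Returns {normalized paragraph: sorted document names} for every paragraph
    that occurs in at least min_docs different documents.
    """
    seen_in = defaultdict(set)
    for name, paragraphs in documents.items():
        for para in paragraphs:
            seen_in[normalize(para)].add(name)
    return {text: sorted(names) for text, names in seen_in.items()
            if len(names) >= min_docs}


def conref_value(file_name, topic_id, element_id):
    """Build a DITA conref attribute value of the form file#topicid/elementid."""
    return f"{file_name}#{topic_id}/{element_id}"


WARNING = "This camera is not waterproof. Please do not use this camera in the rain."
documents = {
    "camera_a_manual": ["Insert the battery.", WARNING],
    # Same warning with an extra space, as often happens in legacy files.
    "camera_b_manual": ["Charge the battery fully.",
                        "This camera is not waterproof.  Please do not use this camera in the rain."],
}
candidates = find_conref_candidates(documents)
for text, names in candidates.items():
    print(names, text)

ref = conref_value("shared_warnings.dita", "shared_warnings", "not_waterproof")
print(f'<note type="warning" conref="{ref}"/>')
```

Exact matching after normalization finds only verbatim repeats; near-duplicates that differ in wording would still need a person (or a fuzzier comparison) to catch.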
Domain Elements: Standard DITA provides certain "domain elements" that IBM designed for software documentation. Some tags mark up software elements, while others indicate user interface elements of a piece of software. If your documentation covers software, using these tags will enrich your marked-up topics. In a legacy conversion, however, the right element may be hard to discern from the look of the source data alone: different kinds of elements may be formatted the same way, making it difficult to decide which tag to use. In addition, the DITA specialization mechanism lets you add your own domain elements. These, too, may be difficult to apply automatically and may require someone to go through the data manually and decide which tags to use.
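Because formatting alone is often ambiguous, an automated pass can at best suggest domain elements and route the rest to a person. The heuristic below is a hypothetical sketch for runs of bold text: path-like text becomes `<filepath>` (software domain), text after "Click" or "Select" becomes `<uicontrol>` (user interface domain), text after "Run" becomes `<cmdname>`, and anything else returns `None` for manual review. The rules and sample strings are assumptions, not part of DITA.

```python
import re

UI_VERBS = {"click", "select", "choose", "press"}
COMMAND_VERBS = {"run", "execute"}


def suggest_domain_element(text, preceding_word):
    """Suggest a DITA software or UI domain element for a run of bold text.

    Returns None when formatting and context do not settle the question,
    so the run can be queued for manual review.
    """
    word = preceding_word.strip().lower()
    if re.search(r"[/\\]", text):
        return "filepath"  # software domain: looks like a path
    if word in UI_VERBS:
        return "uicontrol"  # user interface domain: a button, menu item, etc.
    if word in COMMAND_VERBS:
        return "cmdname"  # software domain: name of a command
    return None


# (bold text, word just before it) pairs that all look identical on the page.
runs = [("/DCIM/100CAMERA", "Open"), ("OK", "Click"), ("fwupdate", "Run"), ("Burst", "the")]
for text, before in runs:
    print(text, "->", suggest_domain_element(text, before) or "needs review")
```

The same bold styling yields four different outcomes here, which is exactly why a human pass is usually still required.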
We do feel that DITA is a major breakthrough that offers a lot of promise and potential. Getting there may require more effort than a conventional XML conversion, but that investment should pay off significantly. Planning ahead and considering the issues discussed here will help you get there faster.
DCLnews Editorial