Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
DITA-izing Your Documents: Five Issues to Think About When Converting Your Legacy Publications to DITA


By Michael Gross, DCLNews

Converting your legacy document collections to any XML markup scheme presents challenges such as inconsistent input data, documents that don't fit the DTD/Schema, page-based references (such as "See Top of Next Page"), and documents "shoehorned" to fit the paper layout of a word processor or desktop publishing package. DITA adds a few more challenges.

The Darwin Information Typing Architecture (DITA), an XML markup scheme developed by IBM and targeted at technical documentation, has in the last few years moved to the forefront of XML tagging schemes. It promises greater flexibility, extensibility, and document reuse, and along with them, cost savings. While these are benefits that XML has promised for years, DITA seems to take us closer to realizing that promise.

But while DITA shows great promise, converting legacy documents to DITA presents some additional challenges. This article outlines five issues that you will likely have to face as you pour existing documentation into DITA.

  1. Topic Breakdown - Topics are probably the most important new concept within DITA. The idea is to break down conventional documents into topics that can stand on their own. Each topic becomes a separate unit or file. These topics can then be reassembled into more traditional manuals by using a feature of DITA called maps. By breaking down documents into many standalone topics, you increase the reusability of your data. For example, if you manufacture digital cameras and produce manuals for many cameras that each use the same memory card and mechanism, you can isolate the discussion of changing the memory card in its own topic. You can then more easily share that topic among the many manuals that need that information. The challenge when converting legacy data is that your original documents probably do not mark where the content should be broken into stand-alone topics. Your documents were probably designed for paper manuals, so you'll need someone who knows the data well, perhaps even a Subject Matter Expert, to go through the documents and mark the pieces that make sense as separate topics. Since topics can contain subtopics, some serious thought needs to be put into this process.

  2. Document Reauthoring - Sometimes, because of the way a legacy document is written, a particular section of text might be a perfect candidate for its own topic, except that the section contains side discussions of related issues. Ideally, well-structured DITA would have you re-author that section so that the topic can really stand on its own. For instance, in the previous example, the digital camera section on changing the memory card might digress into a discussion of how many pictures can be stored on the memory card, which might be different for each camera. If you can isolate that discussion, you will have a cleaner, more reusable topic. This might involve just moving some paragraphs, or it might involve a significant amount of re-authoring. With an already approved documentation set, this can be time-consuming and expensive. You might decide in the initial legacy conversion to get as close as possible to breaking the documents into topics without re-authoring, leaving that task for a later stage; step-wise refinement is often not a bad idea.

  3. Identifying the Topic Type - In addition to breaking the documents into topics, DITA provides three built-in topic types: Tasks (such as "Changing the Memory Card"), Concepts ("How Lighting Affects Your Pictures"), and References ("Physical Specifications of the Camera"). Topic types would normally not have been defined in the source data. Just as in deciding where to break your documents into topics, you may need someone who knows the data to go through it and assign DITA topic types. DITA also provides an extensibility mechanism called Specializations. If your tagging needs are very specific, specializations can be a good idea, but you'll need a way to decide when these specialized topics are called for.

  4. Content Reuse - In addition to the ability to break down documents into topics, DITA provides a mechanism called CONREF, which allows you to reuse chunks of text by referring to them from other documents. Reducing the amount of duplicated text is always a good idea, and cleaning up your data may result in more candidates for CONREF reuse. For instance, you might have a warning that appears throughout your manuals: "This camera is not waterproof. Please do not use this camera in the rain." CONREF allows you to place that text in one location and pull it into any number of others, so that you only have to maintain one version of the text. If you want to take advantage of the CONREF mechanism as part of your legacy conversion, you'll need to think about how you might look through your document set and find candidates for CONREF reuse.

  5. Domain Elements - Standard DITA includes certain "Domain Elements" that IBM has provided for software documentation. Some tags can be used to mark up Software Elements; others are provided to indicate User Interface Elements of a piece of software. If your documentation is written for software, then using these tags will enhance your marked-up topics. In legacy conversion this is something else that may be hard to discern just from the look of the source data. Different elements may have been given the same appearance, making it difficult to determine which tag to use. In addition, the DITA specialization mechanism allows you to add your own specific Domain Elements. These too may be difficult to apply in an automated fashion, and may require someone to manually go through the data and determine which tags are to be used.
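
To make issues 1 and 3 a little more concrete, here is a minimal Python sketch of what happens once someone who knows the data has marked where each topic starts and which type it is. It is an illustration under stated assumptions, not a conversion tool: the marked_sections list, the helper names (slug, build_topic, build_map), the file-naming rule and the placeholder paragraph text are all hypothetical, and real topics would also need DOCTYPE declarations and richer body markup such as steps in a task.

```python
import re
import xml.etree.ElementTree as ET

# For each built-in topic type: the body element, and (where the body may not
# hold paragraphs directly) an inner container element that may.
BODY = {
    "task": ("taskbody", "context"),
    "concept": ("conbody", None),
    "reference": ("refbody", "section"),
}

def slug(title):
    """Hypothetical rule for deriving an id / file-name stem from a heading."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")

def build_topic(title, topic_type, paragraphs):
    """Wrap one SME-marked section in the element structure of its topic type."""
    body_name, inner_name = BODY[topic_type]
    root = ET.Element(topic_type, id=slug(title))
    ET.SubElement(root, "title").text = title
    body = ET.SubElement(root, body_name)
    holder = ET.SubElement(body, inner_name) if inner_name else body
    for text in paragraphs:
        ET.SubElement(holder, "p").text = text
    return root

def build_map(map_title, topics):
    """Reassemble standalone topics into a manual via a map of topicrefs."""
    root = ET.Element("map", title=map_title)
    for topic in topics:
        ET.SubElement(root, "topicref",
                      href=topic.get("id") + ".dita", type=topic.tag)
    return root

# Assumed input: sections already delimited and typed by a subject matter
# expert. Titles come from the article; paragraph text is placeholder content.
marked_sections = [
    ("Changing the Memory Card", "task",
     ["Placeholder text for the memory card procedure."]),
    ("How Lighting Affects Your Pictures", "concept",
     ["Placeholder text for the lighting discussion."]),
    ("Physical Specifications of the Camera", "reference",
     ["Placeholder text for the specifications."]),
]

topics = [build_topic(*section) for section in marked_sections]
camera_map = build_map("Camera User Guide", topics)

for topic in topics:
    print(ET.tostring(topic, encoding="unicode"))
print(ET.tostring(camera_map, encoding="unicode"))
```

The point of the sketch is where the effort lies: generating the markup is mechanical, while the marked_sections input, meaning the decisions about topic boundaries and types, is what has to come from a person who knows the content.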
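
For issue 4, one simple way to start looking for CONREF candidates is to search the document set for paragraphs that repeat across manuals. The Python sketch below assumes the paragraphs have already been extracted as plain text; the manuals dictionary, the function names and the normalization rule (collapse whitespace, ignore case) are illustrative assumptions, and exact matching like this will miss near-duplicates that differ by a word or two.

```python
import re
from collections import defaultdict

def normalize(text):
    """Collapse whitespace and case so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def conref_candidates(manuals, min_docs=2):
    """Return normalized paragraphs that occur in at least min_docs manuals,
    mapped to the sorted names of the manuals that contain them."""
    seen = defaultdict(set)
    for name, paragraphs in manuals.items():
        for paragraph in paragraphs:
            seen[normalize(paragraph)].add(name)
    return {text: sorted(names)
            for text, names in seen.items()
            if len(names) >= min_docs}

# Hypothetical extracted paragraphs from two camera manuals. The warning is
# the article's example; the other lines are placeholder content.
manuals = {
    "camera_a": [
        "This camera is not waterproof. Please do not use this camera in the rain.",
        "Placeholder paragraph specific to camera A.",
    ],
    "camera_b": [
        "This camera is not waterproof.  Please do not use this camera in the rain.",
        "Placeholder paragraph specific to camera B.",
    ],
}

candidates = conref_candidates(manuals)
for text, names in candidates.items():
    print(names, "->", text)
```

Each candidate found this way could then be stored once, in an element with its own id, and pulled in elsewhere through the conref attribute, whose value takes the form file#topicid/elementid (for example conref="warnings.dita#warnings/not_waterproof", where the file and id names are made up for illustration).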

We do feel that DITA is a major breakthrough and offers a lot of promise and potential. Getting there may require more effort than traditional XML, but you'll get a significant return on that investment. Considering the issues discussed here and planning ahead will help you get there faster.

DCLNews Editorial
June 2007

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.