With DITA implementations on the rise and an entrenched DocBook community already in place, market interest in automated DocBook to DITA conversion is growing. I would therefore expect automated DocBook to DITA conversion scripts to be offered within the next 6-10 months. This article addresses the real questions, "What should I expect from automated tools?" and "Will they work for me?", from the viewpoint of hands-on experience with numerous DocBook to DITA conversions. The answers to these questions are not usually obvious. Because a number of recently published articles cover the differences between DocBook and DITA and the advantages and disadvantages of each (http://www.dclab.com/converting_to_dita, http://www.dclab.com/dita_legacy.asp, http://www.dclab.com/dita_docbook.asp), I'll move right to the topic at hand: how to evaluate, and improve, the quality of what you get out of DocBook to DITA conversion tools and services. The following questions are worth asking yourself during the evaluation:
1. Is my DocBook data compatible with an out-of-the-box automated conversion script?
DITA differs from traditional document layouts in its emphasis on modules of reusable content (topics) that can be strung together in different ways; it is not necessarily a linear presentation in the way traditional books are. With "pure DITA," the more modularity and reuse, the better. However, most DocBook documents were not authored with modular, stand-alone DITA topics in mind. For example, several procedures are often grouped under a single heading, or the same procedure may appear in several places with minor variations. The purist approach would require re-authoring everything as "true" DITA, but that is usually prohibitively expensive. The conversion process may deal with this issue at a number of levels and may require workarounds to preserve the existing data layout while still producing valid DITA. See http://www.dclab.com/dita_topic.asp for more examples of such incompatibilities.
3. How consistent is my source data?
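A practical first step toward answering both the compatibility question and the consistency question is to profile your own source files before talking to a vendor. The minimal Python sketch below uses only the standard library's `xml.etree.ElementTree`. It counts how often each element is used (a rough consistency check) and flags sections that hold more than one `<procedure>`, the pattern described above that rarely maps cleanly to a single DITA task. The set of section element names and the sample document are assumptions for illustration, not a complete DocBook model.

```python
"""Pre-conversion profile of a DocBook file (an illustrative sketch).

Assumption: sections are marked with the element names in SECTION_TAGS.
Namespace prefixes are stripped, so DocBook 5 (namespaced) and
DocBook 4.x (no namespace) files are counted the same way.
"""
from collections import Counter
import xml.etree.ElementTree as ET

SECTION_TAGS = {"chapter", "section", "sect1", "sect2", "sect3", "sect4", "sect5"}


def local_name(tag):
    # "{http://docbook.org/ns/docbook}section" -> "section"
    return tag.rsplit("}", 1)[-1]


def profile(root):
    """Return (element usage counts, titles of sections with >1 procedure)."""
    counts = Counter(local_name(el.tag) for el in root.iter())
    flagged = []
    for el in root.iter():
        if local_name(el.tag) not in SECTION_TAGS:
            continue
        # Only direct children count, so a procedure inside a nested
        # subsection is attributed to that subsection, not its parent.
        procs = [c for c in el if local_name(c.tag) == "procedure"]
        if len(procs) > 1:
            title = next(((c.text or "").strip() for c in el
                          if local_name(c.tag) == "title"), "(untitled)")
            flagged.append(title)
    return counts, flagged


# Hypothetical sample: two procedures under one heading.
SAMPLE = """<chapter><title>Setup</title>
  <section><title>Installing</title>
    <procedure><step><para>Unpack.</para></step></procedure>
    <procedure><step><para>Run setup.</para></step></procedure>
  </section>
</chapter>"""

if __name__ == "__main__":
    counts, flagged = profile(ET.fromstring(SAMPLE))
    print(counts.most_common(5))
    print("Sections with several procedures:", flagged)
```

Running the same profile over every file in the set and comparing the counts is a quick way to spot authors or periods that tagged similar content differently.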
4. What are the topic types within my document set?
Topics are groupings of information that one would consider a reasonable module, one that could be repeated elsewhere if necessary. The next factor to evaluate is how the content in your document set maps to the DITA topic types: task, concept and reference. Since DocBook does not distinguish its components by these types, content you want output as a task may have source tagging identical to content you want output as a concept. If the entire document consists of a single topic type (such as concept), or 80%-90% of it does, then an out-of-the-box conversion script may still be a good option. Otherwise, you will need a way to supply topic type information to the conversion to minimize after-the-fact rework.
5. What is my level of topic granularity?
Technical documentation often has multiple levels of heading hierarchy, and the same heading level may correspond to different levels of granularity in different parts of a document. Since most documents were not authored with modular, stand-alone DITA topics in mind, where new topics should begin will depend on the actual content. So "why don't you just…" rules, such as making each <block> a new DITA topic, are unlikely to work well. More sophistication is needed to get good results.
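The topic type and granularity questions can be explored together with a small experiment. The Python sketch below splits DocBook sections into candidate topics down to a fixed depth and guesses a DITA topic type for each. The rules (a `<procedure>` child means task; a table or `<variablelist>` child means reference; anything else is concept) and the `MAX_DEPTH` cut-off are hypothetical heuristics chosen for illustration. They are exactly the kind of simple rule that needs review against real content.

```python
"""Split DocBook sections into candidate DITA topics and guess their type.

All rules here are hypothetical heuristics for illustration:
  - a section with a <procedure> child          -> task
  - a section with a table/variablelist child   -> reference
  - anything else                               -> concept
Sections nested deeper than MAX_DEPTH stay inside their parent topic.
"""
import xml.etree.ElementTree as ET

SECTION_TAGS = {"chapter", "section", "sect1", "sect2", "sect3"}
REFERENCE_TAGS = {"table", "informaltable", "variablelist"}
MAX_DEPTH = 2  # hypothetical cut-off; real documents need per-part tuning


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def guess_type(section):
    children = {local_name(c.tag) for c in section}
    if "procedure" in children:
        return "task"
    if children & REFERENCE_TAGS:
        return "reference"
    return "concept"


def candidate_topics(elem, depth=1):
    """Yield (depth, title, guessed type) for each section at depth <= MAX_DEPTH."""
    if local_name(elem.tag) in SECTION_TAGS:
        if depth > MAX_DEPTH:
            return  # too deep: content stays in the parent topic
        title = next(((c.text or "").strip() for c in elem
                      if local_name(c.tag) == "title"), "(untitled)")
        yield depth, title, guess_type(elem)
        depth += 1
    for child in elem:
        yield from candidate_topics(child, depth)


SAMPLE = """<chapter><title>Widgets</title>
  <para>Widgets are reusable parts.</para>
  <section><title>Installing a widget</title>
    <procedure><step><para>Unpack it.</para></step></procedure>
  </section>
  <section><title>Widget options</title>
    <variablelist><varlistentry><term>size</term>
      <listitem><para>Width in pixels.</para></listitem></varlistentry></variablelist>
    <section><title>Advanced options</title><para>Rarely needed.</para></section>
  </section>
</chapter>"""

if __name__ == "__main__":
    for depth, title, kind in candidate_topics(ET.fromstring(SAMPLE)):
        print("  " * (depth - 1) + f"{title} -> {kind}")
```

A section that mixes a procedure with a reference table would be labeled "task" by these rules even though part of it is reference material, which is why the output of any such script is a starting point for review rather than a finished topic map.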
6. What are my data enrichment requirements?
The true benefits of DITA come from features and tagging that don't exist in your source documents. Adding those tags and features is what we mean here by data enrichment. It is best to understand exactly what kinds of added (or enriched) tags you want from your conversion and to find a solution that supports those new requirements. DITA has a large number of specialized tags, such as <uicontrol> for user interface elements, that add finer levels of granularity. For example, in your existing source data, UI controls may be marked up simply as bold text, making it difficult for software to tell when <emphasis role="bold"> should be converted to a <b> tag and when it should be converted to <uicontrol>. Another example is the use of conrefs to facilitate content reuse. If the variables you'd like to reuse via conrefs were entered as regular text, special conversion routines may need to be developed to add information that was not previously available. (NOTE: In most cases this is not a simple search/replace, as phrases often appear as part of a larger context that can be affected by the conversion. For example, you may want to replace "DCL" with a variable defined as "Data Conversion Laboratory Inc.", but if your text also contains the term in another context, such as Part No. 235-DCL-0001, a simple search/replace would introduce an error.) Understanding your data enrichment requirements before the conversion is an important checklist item for ensuring that you select an effective conversion solution. Sometimes an automated script will not be able to meet all of these requirements.
So if you are ready to consider converting your DocBook documents to DITA, make sure you thoroughly analyze your document set first, and get answers to these questions by asking yourself and your vendor. Doing so will take you a long way toward knowing whether and how you can move to DITA and enjoy all the benefits it offers.
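To make the two enrichment examples from question 6 concrete, here is a minimal Python sketch of both rules. It assumes a project-specific list of UI labels (`UI_LABELS`) and a conref target (`vars.dita#vars/company`); both are hypothetical names, not part of any standard tool. It also works on markup with regular expressions for brevity, whereas a production converter would normally operate on the parsed XML tree.

```python
"""Two illustrative enrichment rules (a sketch, not a production converter).

UI_LABELS and COMPANY_CONREF are hypothetical, project-specific values.
Regex over markup is a simplification: nested elements inside
<emphasis> are not handled.
"""
import re

# Rule 1: bold text -> <uicontrol> or <b>, decided by a list of UI labels
# (assumed to be harvested from the product's actual UI strings).
UI_LABELS = {"OK", "Cancel", "Save As"}
BOLD = re.compile(r'<emphasis role="bold">(.*?)</emphasis>')


def convert_bold(text):
    def pick(match):
        inner = match.group(1)
        tag = "uicontrol" if inner in UI_LABELS else "b"
        return f"<{tag}>{inner}</{tag}>"
    return BOLD.sub(pick, text)


# Rule 2: replace a term with a conref only where it stands alone.
# \bDCL\b is NOT enough: "-" is a non-word character, so \b still matches
# inside "235-DCL-0001". Instead, reject letters, digits, underscores and
# hyphens on either side of the term.
COMPANY_CONREF = '<ph conref="vars.dita#vars/company"/>'
TERM = re.compile(r"(?<![\w-])DCL(?![\w-])")


def insert_conrefs(text):
    return TERM.sub(lambda m: COMPANY_CONREF, text)


if __name__ == "__main__":
    sample = ('Click <emphasis role="bold">Save As</emphasis>, but '
              '<emphasis role="bold">never</emphasis> unplug the unit. '
              'DCL ships Part No. 235-DCL-0001.')
    print(insert_conrefs(convert_bold(sample)))
```

Even with these guards, each rule only encodes what the project already knows (the list of UI labels, the contexts where a term is a variable), which is why these requirements need to be settled before choosing a conversion solution.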
DCLNews Editorial