Mike Gross, Chief Technical Officer at DCL, reveals the secrets of using XML document conversion tools for effective roundtripping and legacy document conversion.

As XML has gained in popularity, there is growing interest in producing documents that are stored in XML. However, since most people aren't familiar with XML, there is a need to take documents originally authored in traditional formats like MS Word and convert them to XML. This is referred to as "legacy document conversion."
However, since many documents are updated on a regular basis (usually by authors who are experts in their field but have no XML knowledge), there is also a need for a "roundtrip conversion" capability. This involves converting XML documents to a proprietary publishing format (such as Word, WordPerfect, Quark, or InDesign) that authors can edit in their favorite word processor or desktop publishing (DTP) tool, with the intent that the documents will be converted back to XML when the edits are done.

Off-the-shelf Document Conversion Tools

Various tools on the market support converting back and forth between XML and DTP/word processing formats. These attempt to map XML tagging structures to the stylesheets found in publishing software. They also offer some ability to apply customized rules and conditions to the transformation.

Customized conversion rules and conditions are needed because there is rarely an exact mapping between XML tagging and stylesheets, and there are features supported on one side but not on the other. For example, the tag nesting and document hierarchies in XML are not easy to simulate with the much "flatter" structure of a stylesheet.

The conversion of a document from XML to a publishing format is usually straightforward - particularly if you have described each element of your material with XML tags, and have built a DTD or Schema to further constrain your content. The potential problems lie in converting documents back from publishing formats to XML - the "roundtripping" part. This is because publishing tools contain many features that allow users to create colorful and intricate designs - the ones you see in glossy magazines and corporate brochures. Most of these, however, are impossible to map directly into an XML tagging structure.

To successfully roundtrip documents you need to build a comprehensive publishing stylesheet. This will have "containers" that hold your XML structure.
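As a rough illustration of the container idea, the sketch below flattens a small nested XML document into a list of styled paragraphs (roughly how a word processor stores text) and then rebuilds the nesting from the style names alone. Everything here is an assumption made for illustration: the element names, the style names (`ChapterTitle`, `SectionTitle`, `SectionPara`), and the functions `xml_to_styled` and `styled_to_xml` are hypothetical and do not reflect how any particular conversion tool works.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written without whitespace between tags.
SOURCE_XML = (
    "<chapter><title>Install</title>"
    "<section><title>Setup</title><para>Run it.</para></section>"
    "<section><title>Usage</title><para>Type go.</para></section>"
    "</chapter>"
)

# Hypothetical stylesheet: one paragraph style ("container") per XML path.
STYLE_FOR_PATH = {
    "chapter/title": "ChapterTitle",
    "chapter/section/title": "SectionTitle",
    "chapter/section/para": "SectionPara",
}
PATH_FOR_STYLE = {style: path for path, style in STYLE_FOR_PATH.items()}


def xml_to_styled(root):
    """Flatten nested XML into a flat list of (style, text) paragraphs."""
    paragraphs = []

    def walk(elem, path):
        style = STYLE_FOR_PATH.get(path)
        if style is not None:
            paragraphs.append((style, (elem.text or "").strip()))
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return paragraphs


def styled_to_xml(paragraphs):
    """Rebuild nesting from style names; return (root, unmapped paragraphs)."""
    root = None
    open_elems = []  # currently open container elements, outermost first
    unmapped = []    # paragraphs whose style has no XML counterpart
    for style, text in paragraphs:
        path = PATH_FOR_STYLE.get(style)
        if path is None:
            unmapped.append((style, text))
            continue
        *containers, leaf = path.split("/")
        depth = len(containers)
        if leaf == "title" and len(open_elems) >= depth > 1:
            # Assumed rule: a title starts a fresh container at its depth.
            del open_elems[depth - 1:]
        else:
            del open_elems[depth:]
        # Open any containers that this paragraph needs but are not open yet.
        for tag in containers[len(open_elems):]:
            if open_elems:
                elem = ET.SubElement(open_elems[-1], tag)
            else:
                elem = ET.Element(tag)
                root = elem
            open_elems.append(elem)
        ET.SubElement(open_elems[-1], leaf).text = text
    return root, unmapped


if __name__ == "__main__":
    styled = xml_to_styled(ET.fromstring(SOURCE_XML))
    # Simulate an author adding a paragraph with ad hoc formatting.
    styled.append(("Normal", "WARNING: read this first"))
    rebuilt, lost = styled_to_xml(styled)
    print(ET.tostring(rebuilt, encoding="unicode"))
    print("Could not map:", lost)
```

In this sketch the return trip only works because every paragraph carries a style that names its place in the hierarchy. The paragraph styled `Normal` has no XML counterpart, so it ends up in the unmapped list instead of in the rebuilt document.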
With those containers in place, the structure will be reasonably intact when you convert documents back to XML from the DTP or word processor format. It is also important to define a set of authoring rules and enforce them among the authors - otherwise you risk "misplacing" information on the return trip. The following guidelines will help ensure smoother roundtripping:
The above guidelines will allow roundtripping tools to do a better job. But since each tool has its own unique capabilities, you'll need to assess the capabilities and limitations of the software available before setting up a roundtripping strategy.

Performing legacy document conversion using off-the-shelf tools

You might be tempted to use roundtripping tools for legacy document conversion as well. However, be aware that you'll only get good results if the legacy documents were written in a strict environment and the authors knew how to use the publishing software properly. This is very, very rare.

The harsh reality is that even getting a good document to convert in the controlled roundtripping environment is not always possible. Authors are usually experts in their own field, but have little knowledge of publishing tools. They know how to use the basic formatting buttons - such as bold, italic and indents - to make pages come out the way they want them to look. But they have little knowledge of how to set up even simple stylesheets. When given an "authoring spec" they are often bewildered.

Therefore, it is unrealistic to expect to easily convert legacy documents that were authored primarily with the intention of producing good-looking documents on paper. What's more, such documents were often created to tight deadlines, since documentation is often the last rung on the ladder to delivering a new product or service. Such pressure leaves little time to worry about the niceties of using word processors and DTP systems correctly. "Whatever works" is the maxim of the day.

In addition, the structure of the DTD or Schema may not have been built with these types of documents in mind, leaving you without a tagging structure to hold the content of documents created with publishing software.

Assess the effort involved

These are by no means all of the challenges you are likely to face.
But the key thing to remember here is that off-the-shelf tools are suitable for converting documents that were authored with a definite XML structure in mind. If you use them to convert all your legacy materials, you may well be able to get some (or even a lot) of the conversion right. But if your documents are somewhat complex, you will likely have to do a good deal of work on them before they are ready for prime time.

The bottom line is: these tools will work when you can carefully control the environment. However, if there is uncontrollable variation, more specialized or tailored tools may be a better choice.

Mike Gross
Data Conversion Laboratory, Inc. 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365 718-357-8700 convert@dclab.com Copyright © 1997-2008 Data Conversion Laboratory, Inc. All rights reserved.