Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 

Should I stay or should I go (from SGML to XML)?

Mike Gross, Chief Technical Officer at DCL, answers the big question of whether you should move from SGML to XML.

Mike Gross is responsible for solution engineering at DCL. He has been solving digital publishing conversion problems at DCL for almost 20 years and has overseen thousands of legacy conversion projects. One of the questions he is asked most often is whether to move from SGML to XML. People understandably want to know whether the move is necessary (i.e., will SGML fall out of use?) and whether it will benefit them.

DCLnews caught up with Mike during a quiet moment in his busy schedule and put these and other frequently asked questions to him.

Q: I'm already in SGML. Do I need to start moving over to XML?

A: If you are already in SGML with production systems in place, you've already gotten past the hard part, and there may be no immediate reason to leave SGML. Typically, you would need to contemplate such a move if you are adopting new technologies that support XML but not SGML, or if the vendors that support your current system are dropping SGML support. In general, the old adage "If it ain't broke, don't fix it" may apply here.

Q: I'm just starting out. Do I go to SGML or XML?

A: Although SGML has some useful features that were left out of XML, the reality is that XML has become a worldwide standard, with many powerful technologies built around it, such as XSLT, XLink, XSL-FO, XML Schema, and MathML, as well as many XML support tools and a growing trend toward enhanced XML support in popular mainstream software (such as Microsoft Office). Most applications can work around the features that were left out in the transition from SGML to XML. So if you are starting out today (and don't need to work with a preexisting SGML DTD), our recommendation in most cases would be to go with XML.

Q: If I've already put my data into SGML, and I want to tap some of XML's power, have I lost my investment?

A: Your investment in having built a database in SGML is certainly not lost, and as we said above, it may simply be unnecessary to stop existing SGML projects in order to retool in XML. In general terms, XML is a subset of SGML, and both allow you to represent your information using structured markup (and taking unstructured documents and transforming them into structured XML/SGML markup really is the hard part). Having said that, for large legacy document systems built around SGML, a complete migration to XML can be complex and involve substantial expense.

Migrating an SGML document set to XML involves two main tasks: porting the DTD to XML, and then porting the documents themselves from SGML-compliant markup to XML-compliant markup. Porting the documents, though nontrivial, can often be done with publicly available tools that transform SGML documents into XML-compliant ones, performing tasks such as normalizing the case of tag names (XML is case-sensitive), expanding minimized tagging into full start and end tags, and adding the extra slash (/) at the end of empty tags.
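
As a rough illustration of this instance-level clean-up, here is a minimal Python sketch based on regular expressions. It assumes the input has no omitted end tags and no minimized attributes (both need the DTD to resolve, which is why DTD-aware converters such as `osx` from OpenSP are the better choice for real document sets), that the target XML DTD uses lowercase names, and that the set of EMPTY elements is known in advance. `EMPTY_ELEMENTS`, `normalize_attrs`, and `sgml_to_xml` are hypothetical names chosen for this example.

```python
import re

# Hypothetical: elements declared EMPTY in the source SGML DTD. A real tool
# would read these from the DTD instead of hard-coding them.
EMPTY_ELEMENTS = {"graphic", "xref"}

# Start or end tag: optional slash, element name, raw attribute text.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^>]*)>")
# name=value pairs, where the value may be double-quoted, single-quoted, or bare.
ATTR_RE = re.compile(r"""([A-Za-z][\w.-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def normalize_attrs(attr_text: str) -> str:
    """Lower-case attribute names and quote any unquoted values."""
    leftover = ATTR_RE.sub("", attr_text).strip()
    if leftover:
        # e.g. <list compact>: a minimized attribute needs the DTD to resolve.
        raise ValueError(f"cannot normalize attribute text: {leftover!r}")
    parts = []
    for name, value in ATTR_RE.findall(attr_text):
        if value[0] not in "\"'":
            value = f'"{value}"'
        parts.append(f" {name.lower()}={value}")
    return "".join(parts)


def sgml_to_xml(text: str) -> str:
    """Rewrite tags: lower-case names, quoted attributes, closed empty tags."""
    def fix(match: re.Match) -> str:
        slash, name, attrs = match.groups()
        name = name.lower()
        if slash:
            return f"</{name}>"
        attrs = normalize_attrs(attrs.rstrip().removesuffix("/"))
        if name in EMPTY_ELEMENTS:
            return f"<{name}{attrs}/>"
        return f"<{name}{attrs}>"

    return TAG_RE.sub(fix, text)
```

For example, `sgml_to_xml('<PARA ID=p1>See <XREF LINKEND=fig1></PARA>')` returns `<para id="p1">See <xref linkend="fig1"/></para>`. The sketch deliberately raises an error on markup it cannot fix without the DTD rather than guessing.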

There may be some infrequent cases where you run into problems making an SGML document XML compliant, but typically this is straightforward. In some cases, you might even have SGML documentation sets that are already XML compliant, so that nothing needs to be done other than fixing empty tags.

Converting SGML DTDs to XML

The more difficult issue is actually migrating the legacy SGML DTD(s) to conform to the restricted set of DTD features that the XML standard allows. To simplify XML, certain features allowed in SGML DTDs, such as inclusions, exclusions, and the "&" content model connector, were left out of XML DTDs (partly to make XML parsers easier to build). Because inclusions are not supported (an inclusion allows an element to occur anywhere inside another element, at any depth), a DTD that made heavy use of this feature will need a partial overhaul to support the same functionality in XML.
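
To see why inclusions turn into an overhaul, consider a hypothetical DTD in which `chapter` carries the inclusion `+(footnote)`. An XML DTD has no equivalent, so `footnote` has to be added explicitly to the content model of every element that can occur inside `chapter`. The Python sketch below finds those elements with a breadth-first walk. It assumes content models are simplified to lists of allowed child names (ignoring sequence, occurrence, and #PCDATA details), and `widen_for_inclusion` and `MODELS` are hypothetical.

```python
from collections import deque


def widen_for_inclusion(models, root, included):
    """Return new content models in which `included` is allowed inside `root`
    and inside every element that can occur, at any depth, within `root`.

    `models` maps an element name to the names of its allowed children; this is
    a simplification of real content models (no sequence or occurrence info).
    """
    widened = {name: set(children) for name, children in models.items()}
    seen = set()
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        widened.setdefault(name, set()).add(included)
        # Follow the original models so only real descendants are visited.
        queue.extend(models.get(name, ()))
    return widened


# Hypothetical DTD fragment; in SGML, chapter has the inclusion +(footnote).
MODELS = {
    "chapter": ["title", "para"],
    "appendix": ["title", "para"],
    "title": [],
    "para": ["emph", "list"],
    "list": ["item"],
    "item": ["para"],
    "emph": [],
    "footnote": ["para"],
}
```

Running `widen_for_inclusion(MODELS, "chapter", "footnote")` adds `footnote` to `chapter`, `title`, `para`, `emph`, `list`, and `item`. Because `appendix` shares `title` and `para`, footnotes also become legal inside appendix titles and paragraphs, so the XML DTD ends up looser than the SGML original unless new, context-specific element names are introduced, which is exactly the kind of change that reaches into document tagging.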

Exclusions (the ability to exclude particular tags in specific contexts) are a bigger problem. It may not be possible to write an XML DTD that specifies tagging requirements exactly as the SGML DTD did, and the rewrite may even require new tagging to be inserted into the XML documents.
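
Where an exclusion cannot be expressed in the XML DTD, one hedged option is to keep the rule but enforce it outside the DTD, after parsing (a Schematron rule is a common alternative). The Python sketch below assumes a hypothetical SGML declaration such as `<!ELEMENT footnote - - (para+) -(footnote)>`, which forbids a footnote anywhere inside another footnote, and reports violations using only the standard library's ElementTree. `find_exclusion_violations` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def find_exclusion_violations(xml_text, context, excluded):
    """Return every `excluded` element found at any depth inside a `context`
    element, mimicking an SGML exclusion such as -(footnote) on `context`."""
    root = ET.fromstring(xml_text)
    found = []
    seen = set()
    for ctx in root.iter(context):
        # Element.iter() includes ctx itself, so skip it when the names match.
        for elem in ctx.iter(excluded):
            if elem is not ctx and id(elem) not in seen:
                seen.add(id(elem))
                found.append(elem)
    return found
```

Running it on `<doc><footnote>Note <footnote>nested</footnote></footnote><para><footnote>ok</footnote></para></doc>` with `context="footnote"` and `excluded="footnote"` returns a single element, the nested footnote; the footnote inside `para` is allowed.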

The other SGML DTD features left out of XML can be problematic as well, and may require DTD changes that in turn require changes to the document tagging itself.
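
The "&" connector is a good example. In SGML, `(title & author & date)` means each element occurs exactly once, in any order. An XML DTD can only say this by spelling out every order, and XML (for compatibility with SGML) expects content models to be deterministic, so the orders have to be factored by their first element rather than simply listed side by side. The Python sketch below, built around the hypothetical helper `expand_and_group`, generates such a factored choice group, assuming the group contains plain element names.

```python
def expand_and_group(names):
    """Expand the SGML group (n1 & n2 & ...) into an XML content model that
    accepts each name exactly once, in any order.

    Each branch of every choice starts with a different name, so the result
    stays deterministic. A single name is returned bare, without parentheses.
    """
    if not names:
        raise ValueError("an & group needs at least one name")
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        branches.append(f"({first},{expand_and_group(rest)})")
    return "(" + "|".join(branches) + ")"
```

For two names this yields `((a,b)|(b,a))`. For n names it encodes n! orderings (24 for four names), which is why many migrations instead loosen the model or fix a single order, and either choice can mean changing the tagging of existing documents.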

What this means in practice is that for many complex DTDs (including some industry-standard ones), the DTD overhaul itself may force you to modify some document tagging. That is a "can of worms" in itself: if you have approved documents, you may have to put them through a whole re-approval process, and if you have thousands of documents, you could be looking at an expensive migration.

Are there any shortcuts?

So it is probably prudent to ask yourself why you are doing the migration. If management has mandated XML, then you may have no choice. If you have publishing tools that are no longer supported by their vendors (or the vendor is dropping SGML support), then you may also have no choice. If you want to make use of certain XML-only technologies, then a full migration may also be wise.

But if you need XML simply to make use of XML document publishing tools, a compromise solution may be workable. You can continue to author in SGML (assuming your authoring/editing environment still includes SGML support, as many still do), validate the documents against the SGML DTD, and then use migration tools to convert each SGML document instance into a well-formed XML document. (Remember that XML requires only that a document be well-formed; it does not actually require a DTD.)
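
In this hybrid setup, validation stays on the SGML side, and the only requirement on the XML output is that it be well-formed. As a minimal sketch, assuming Python's standard library is available in the publishing pipeline and that entity references have already been resolved during conversion, a well-formedness gate can be as simple as attempting a parse: ElementTree checks well-formedness but does not validate against a DTD. `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    No DTD is needed or consulted: ElementTree checks well-formedness only.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A converted instance such as `<para>See <graphic file="a.gif"/></para>` passes, while leftover SGML-style markup, such as an unquoted attribute value (`<graphic file=a.gif>`) or mismatched tag case (`<PARA>...</para>`), fails the parse.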

This XML version of the document can then be used with readily available publishing tools. It could, for instance, be rendered on the web in Internet Explorer (and other XML-enabled web browsers), or published as a PDF document via XSL-FO rendering tools. This hybrid solution may therefore let you avoid the pain of migrating complex DTDs to XML while still making use of some of XML's publishing power. It will not work for everybody, but it could allow you to tap into the world of XML while minimizing time and expense.

For more information on the specific details of migrating from SGML to XML, we refer you to an excellent article by Norman Walsh which can be found at http://www.xml.com/pub/a/98/07/dtd/index.html.

Automated transition

The transition from SGML to XML, while not trivial, need not be complex. Most of the work can be done automatically through software filters, the DTD can be made XML compliant, and much existing SGML software has already been ported to support XML. The main clean-up tasks are things like putting quotes around attribute values, using consistent casing for tag names, and removing tag minimization. More details may be found in "SGML to XML Conversion Strategies" by Richard Lander.

DCLnews editorial
March 3rd, 2004

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.