|
The Real Story on XML
THERE HAS BEEN a tremendous amount of buzz in the last few years about XML and how it will revolutionize how information is used, managed, exchanged, and presented. A 1998 technology report went so far as to say that "XML will revolutionize the exchange of business information similar to the way the phone, fax machine, and photocopier did when those devices were invented." But talk of revolution is bold talk indeed. In the next few pages, we will attempt to provide a high-level overview of what XML is, how it compares to its predecessor HTML, and what we see in the near future. Please remember that predicting the future is a dangerous business (if you don't believe us, just ask your local TV weatherman!). These are our opinions, based on 20 years in the industry (since long before XML, HTML, or SGML, for that matter).
Data Formats
Data Issues
This is a concise list of data use issues. (As an aside, if you feel that we are missing one, your feedback is welcome.)
Data "Consumer Reports"
It is critical to emphasize that each organization should evaluate data formats based only on the issues that are important to it. For instance, if "Distributing Page Image Representations" is the ONLY issue that is important, then PDF is a very good option (maybe the best option). However, when you look at all of the data issues (most of which ARE important to 'high-tech' companies), you start to understand why there is such a buzz around XML. But if XML is rated so highly, why is HTML still around? To answer that question, let's take a closer look at HTML and how it compares to XML.
HTML vs. XML
Both HTML and XML are "mark-up languages", meaning that tags are applied to impart meaning to the data.
HTML (Hypertext Markup Language) is:
XML (Extensible Markup Language) is:
To illustrate this, let's look at an example of the tagging for HTML vs. XML.
In HTML:
In XML:
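As a rough illustration of the difference, the sketch below pairs a presentation-only HTML fragment with an XML fragment for the same product record, then reads values out of the XML by name using Python's standard `xml.etree.ElementTree` module. The fragments, the element names (`product`, `name`, `partNumber`, `price`), and the values are all hypothetical and were chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML fragment: the tags say how the text should look
# (bold, italic, line breaks), not what each piece of text means.
html_fragment = "<p><b>Widget A</b><br><i>Part No. 1234</i><br>$19.95</p>"

# Hypothetical XML fragment for the same record: the tags say what
# each value is, and say nothing about how it should be displayed.
xml_fragment = """
<product>
  <name>Widget A</name>
  <partNumber>1234</partNumber>
  <price currency="USD">19.95</price>
</product>
""".strip()

# Because the XML tags carry meaning, a program can ask for values by name.
product = ET.fromstring(xml_fragment)
name = product.findtext("name")
part_number = product.findtext("partNumber")
price = float(product.findtext("price"))
currency = product.find("price").get("currency")

print(name, part_number, price, currency)
```

To get the price out of the HTML version, a program would have to guess that the text after the second line break is a price; in the XML version, it simply asks for the `price` element.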
XML typically tells us about the data; HTML tells us about the formatting.
SGML is the Foundation
This is a result of two main issues that are particularly true of HTML and XML:
Today's Markup REQUIREMENTS are defined by content creation, management, and distribution requirements, which are currently defined as:
Today's Markup ACCEPTANCE is driven by effectiveness and ROI.
XML meets the business need
The reality is that XML is simpler and easier to create and distribute than SGML. Features that are important for web delivery have been retained (elements, attributes, linking, validation), while the least-used and most difficult-to-implement features have been dropped (marked sections, inclusions, exclusions). In addition, XML is extensible, which means transformation capabilities and data-type standards are inherent to the format. So is XML the 'Silver Bullet' for content? Not so fast. XML is:
These limitations can make XML a difficult format to migrate to. This is particularly true of large and/or complex materials that are typically characterized by elaborate tables, equations, cross-referencing, special characters, footnotes, and complex imaging requirements, including hotspots. Another issue is that there is no single XML standard like there is for HTML. There are several reasons for this, including:
To take the point further, data models tend to be tuned to internal processes and priorities. Since every company differs in those areas, it's natural that the data models would differ as well. At the same time, it's important for industries to strive to establish interchange data models, which will be subsets of the internal data models of the participating companies.
So what about SGML?
Does XML replace SGML today? MAYBE!
If you're starting up now, XML is easier to implement, and the tools are pretty much in existence. At the very least, you should make your application XML-ready (meaning that the data should be structured in a manner that will allow it to meet, or almost meet, the restrictions of XML if that is desired in the future). However, if your project is already in process (e.g., you've already defined a DTD, or are using an industry-standard DTD that works for you), there's no reason to change to XML in midstream, as you already get the same benefits and you've done most of the hard work. Also, some data sets use 'Exclusions' and 'Inclusions' (rules that say the data is only applicable to some models or parts, but not all), and these are not currently allowed in XML (but are allowed in SGML).
Does XML replace SGML in the future? PROBABLY!
Does XML replace HTML today? NO!
Does XML replace HTML in the future? YES!
XML will not replace HTML as a formatting language. But XML should and certainly will take the place of HTML as a source language for many types of applications.
Conclusion
Clearly, XML is not for everyone. Each organization has to evaluate the benefits and make a thorough analysis to understand whether the business case justifies the expense and effort of migrating to XML. As the technology matures (away from the bleeding edge) and tools become easier, cheaper, and more powerful, the business case will become easier to validate.
Post Script
Data Conversion Laboratory's expertise in SGML and XML is recognized in a variety of forums. DCL's president, Mark Gross, recently authored the chapter on legacy document conversion to XML for Charles Goldfarb's XML Handbook (Prentice Hall), and is currently authoring the Conversion chapter for Columbia University's The Columbia Guide to Digital Publishing. DCL staff frequently speak on document conversion at leading industry conferences.
You can learn more about XML by going to our Technical Library, which is a collection of resources about data conversion and related topics gathered from past issues of DCLnews, various papers and presentations from DCL, and materials available in other places. The Library is in a state of evolution and is being updated frequently, so stop by often.
If you are planning to migrate your data to XML (or are just thinking about it), we would be happy to discuss your project with you and explain how DCL can help put you on track to getting the most out of XML by fully integrating all your existing documents and data in the most efficient and cost-effective way possible.
Don Bridges
You can contact us at (800) 321-2816 x267 or send us an e-mail at sales@dclab.com |