The Real Story on XML
DCL's Don Bridges provides the lowdown on XML, looking at how it
compares to its predecessors, SGML and HTML, and at what to expect in
the near future...
There has been a tremendous amount of buzz in the last few years about XML and
how it will revolutionize how information is used, managed, exchanged,
and presented. A 1998 technology report went so far as to say that "XML
will revolutionize the exchange of business information similar to the
way the phone, fax machine, and photocopier did when those devices were
invented." But talk of revolution is bold talk indeed.
In the next few pages, we will attempt to provide a high-level overview
of what XML is, how it compares to its predecessors SGML and HTML, and what
we see in the near future. Please remember that predicting the future
is dangerous business (if you don't believe us, just ask your local
TV weatherman!). These are our opinions, based on 20 years in the industry
(a history that began long before XML, HTML, or SGML, for that matter).
Data Formats
Before we get too far, it's probably a good idea to map out the data format
landscape as it exists today. It's a bit of an "alphabet soup"
with lots of acronyms to choose from.
| TIFF | A common format for exchanging raster (bitmapped) images between application programs. The equivalent of a photographic image of the page, usually produced through scanning. Identifying metadata may also be attached to the image, but the text appearing in the image is not available for searching. |
| PDF | Created by Adobe and an acronym for "Portable Document Format," PDF is a proprietary print format intended to reproduce documents as originally composed. The freely available Adobe Acrobat Reader is required to view, print, and search PDF documents. |
| HTML | Hypertext Markup Language is a set of "markup" tags, loosely modeled on SGML and specifically intended for files displayed on the web. This markup tells the web browser how to display a web page's text and images. |
| SGML | Standard Generalized Markup Language is an internationally agreed standard for information representation. It provides an architecture for defining document tag sets for a wide variety of applications. The tag sets allow the appearance and text to be separated and reformatted for different uses. |
| XML | Extensible Markup Language is a streamlined version of SGML which makes it possible to use and display information in different ways by defining its structure and elements. |
Data Issues
With that bit of housekeeping behind us, we can look at data use issues
(which, in the final analysis, are why one format is considered 'better'
than another). DCL feels that the requirements should always drive the
solution, so we have put together a concise list of six areas on which data
formats can be evaluated. Of course, your organization may feel
that some areas are more important than others, but more about that
later. The six areas are:
| Distributing Page Image Representations | Ability to distribute and produce an exact page image with exact fonts, composition, and page integrity. |
| Repurposing | Ability to create new versions of data suitable for derivative uses (e.g., the web, diagnostic equipment, hand-held devices, etc.) or customized applications (e.g., showing one "view" of a data set to a mechanic and a different "view" of the same data set to an operator). |
| Searching | Ability to find information through text searches and through more advanced (e.g., Boolean) searches that depend on context and "understanding." |
| Component Re-use | Ability to use portions of data for different products and different documentation sets. Automation of the data process comes into play here. |
| Enforce Data Standards | Ability to assure that the information is produced consistently and meets corporate standards. |
| Interchange with Vendors, Customers, and the World | Ability for others to use your information in their own communications and to incorporate it into products belonging to other organizations. |
This is a concise list of data use issues. (As an aside, if you feel that we
are missing one, your feedback is welcome.)
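To make the "Searching" issue a little more concrete, here is a minimal Python sketch of the difference between a plain text search and a search that depends on context. The catalog, its tag names, and the two helper functions are hypothetical and were invented for this illustration; it shows the general idea only, not any particular search product.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this illustration.
CATALOG = """
<catalog>
  <product>
    <model>P266 Laptop</model>
    <dealer>Friendly Computer Shop</dealer>
  </product>
  <product>
    <model>D400 Desktop</model>
    <dealer>Laptop Warehouse</dealer>
  </product>
</catalog>
"""


def full_text_hits(root, word):
    """Products where the word appears anywhere in the product's text."""
    word = word.lower()
    return [p for p in root.iter("product")
            if word in "".join(p.itertext()).lower()]


def model_hits(root, word):
    """Products where the word appears inside the <model> element only."""
    word = word.lower()
    return [p for p in root.iter("product")
            if word in (p.findtext("model") or "").lower()]


root = ET.fromstring(CATALOG)
print(len(full_text_hits(root, "laptop")))  # matches on model OR dealer text
print(len(model_hits(root, "laptop")))      # matches on model text only
```

The plain text search also returns the desktop, because its dealer happens to have "Laptop" in its name. The tag-aware search can restrict the match to the model, which is the kind of context that a page image or purely visual markup cannot supply.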
Data "Consumer Reports"
So the natural question is "How do different technologies compare
against these issues?" Glad you asked. With all due respect to
Consumer Reports, we present the results of our "Battle of the
Data Formats":
[Table: "Battle of the Data Formats." Columns: TIFF, PDF, HTML, XML, SGML. Rows: Distributing Page Images, Re-purposing, Searching, Component Reuse, Enforce Standards, Interchange. Each cell holds a rating symbol on a five-step scale (None, Limited, Good, Very Good, Excellent); the symbols themselves are not reproduced here.]
It is critical to emphasize that each organization should evaluate data formats
only on the issues that are important to it. For instance, if "Distributing
Page Image Representations" is the ONLY issue that is important,
then PDF is a very good option (maybe the best option). However, when
you look at all of the data issues (most of which ARE important to 'high-tech'
companies), you start to understand why there is such a buzz around
XML.
But if XML is rated so highly, why is HTML still around? To answer that
question, let's take a closer look at HTML and how it compares to XML.
HTML vs. XML
Both HTML and XML are "markup languages," meaning that tags
are applied to impart meaning to the data.
HTML (Hypertext Markup Language):
- Is a pervasive, widely supported means of describing information for web transmission
- Offers limited structure, reuse, interchange, and automation
- Uses tags to describe how information should appear
XML (Extensible Markup Language):
- Is destined to become the mainstream technology in web applications where high degrees of reuse, interchange, and automation are required
- Separates tags from formatting, which means that the tags tell you what the data means, not how it looks
To illustrate this, let's look at an example of the tagging for HTML vs. XML.
In HTML:
<p>
<b>P266 Laptop</b>
<br>
<i>Friendly Computer Shop</i>
<br>$1438
</p>
In XML:
<product>
<model>P266 Laptop</model>
<dealer>Friendly Computer Shop</dealer>
<price>$1438</price>
</product>
XML typically tells us about the data; HTML tells us about the formatting.
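One practical consequence is that a program can pick the fields out of the XML record by name. The following is a minimal sketch using Python's standard library; the function name read_product and the choice to return a dictionary are our own assumptions for illustration, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# The XML record from the example above.
PRODUCT_XML = """
<product>
  <model>P266 Laptop</model>
  <dealer>Friendly Computer Shop</dealer>
  <price>$1438</price>
</product>
"""


def read_product(xml_text):
    """Return the product's fields as a dict keyed by tag name."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


product = read_product(PRODUCT_XML)
# The price is still text; strip the currency sign to get a number.
price = int(product["price"].lstrip("$"))
print(product["model"], price)
```

Doing the same with the HTML version would mean guessing that "the bold text is the model" and "the italic text is the dealer," a guess that breaks as soon as the page is restyled.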
SGML is the Foundation
Before there was HTML or XML, there was SGML. SGML became an ISO standard
in 1986 (ISO 8879). SGML has been adopted and implemented by many industries
in many applications (DCL performed one of the first large-scale SGML
conversions for General Motors in 1986). SGML is rich in syntax and
very extensible, and today's markup language implementations (HTML)
and variations (XML) owe their usefulness to SGML. But if SGML is the
foundation, HTML and XML are the evolutionary applications that are
strong, reliable, and cost effective.
This is a result of two factors that are particularly true of HTML and XML:
- Content creation and rendering are easy
- Content management and distribution tools are available and affordable
Today's markup REQUIREMENTS are defined by content creation, management, and
distribution needs, and the delivery targets are currently:
- Paper
- Web
- Custom applications
Today's markup ACCEPTANCE is driven by effectiveness and ROI.
XML meets the business need
The reality is that XML is simpler and easier to create and distribute than
SGML. Features that are important for web delivery have been retained
(elements, attributes, linking, validation), while the least used and most
difficult to implement features have been dropped (marked sections, inclusions,
exclusions). In addition, XML is extensible, which means that transformation
capabilities and data-type standards can be layered on top of the format.
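To give a feel for what "validation" buys you, here is a minimal Python sketch that checks a record against a simple structural rule. Please note the assumptions: real XML validation is done against a DTD or schema by a validating parser, which Python's standard library does not include, so the hand-written rule below (a hypothetical REQUIRED_CHILDREN list and check_product function) is only a stand-in for that mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule, standing in for a DTD or schema:
# a <product> must contain exactly these children, in this order.
REQUIRED_CHILDREN = ["model", "dealer", "price"]


def check_product(xml_text):
    """Return a list of problems; an empty list means the record passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The text is not even well-formed XML.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "product":
        problems.append(f"unexpected root element <{root.tag}>")
    found = [child.tag for child in root]
    if found != REQUIRED_CHILDREN:
        problems.append(f"expected children {REQUIRED_CHILDREN}, found {found}")
    return problems


good = ("<product><model>P266 Laptop</model>"
        "<dealer>Friendly Computer Shop</dealer>"
        "<price>$1438</price></product>")
bad = "<product><model>P266 Laptop</model><price>$1438</price></product>"

print(check_product(good))  # no problems reported
print(check_product(bad))   # reports the missing <dealer>
```

The point is that the rule lives outside the data and can be applied automatically to every record, which is how a structured format helps enforce data standards.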
So is XML the 'Silver Bullet' for content? Not so fast.
XML:
- Is not a print format
- Is not suitable for unstructured information
- Requires planning
These limitations can make XML a difficult format to migrate to. This is particularly
true of large and/or complex materials that are typically characterized
by elaborate tables, equations, cross-referencing, special characters,
footnotes, and complex imaging requirements, including hotspots.
Another issue is that there is no single standard XML tag set the way there is for HTML. There
are several reasons for this, including:
- Everyone uses data differently
- Each industry has its issues
- Not all XML is created equal
- There will always be new ways to use data
- Creative approaches lead to a competitive advantage
To take the point further, data models tend to be tuned to internal processes
and priorities. Since every company differs in those areas, it's natural
that the data models would differ as well. At the same time, it's important
for industries to strive to establish interchange data models, which
will be subsets of the internal data models of the participating companies.
So what about SGML?
Does XML replace SGML today? MAYBE!
- XML is designed for data delivery, not authoring
- XML simplifies the delivery and rendering of complex data
If you're starting up now, XML is easier to implement, and the tools are
pretty much in existence. At the very least, you should make your application
XML-ready (meaning that the data should be structured in a manner that
will allow it to meet, or almost meet, the restrictions of XML if that
is desired in the future).
However,
if your project is already in process - e.g. you've already defined
a DTD, or are using an industry standard DTD that works for you - there's
no reason to change in midstream to XML, as you do get the same benefits,
and you've done most of the hard work already.
Also, some data sets rely on SGML 'Exclusions' and 'Inclusions' (DTD rules that
forbid or permit an element anywhere inside another element, regardless of
nesting depth), and these are not currently allowed in XML (but are allowed in SGML).
Does XML replace SGML in the future? PROBABLY!
- XML tools will become plentiful, powerful, and cheap
- XML data structures (schemas) will become standards
- Web-based interfaces will become reliable
So what about HTML?
Does XML replace HTML today? NO!
- HTML is easy and free
- HTML works well for a majority of web users
- HTML is universal
Does XML replace HTML in the future? YES!
- Users will expect more from their Web experience
- Web-based interfaces will become reliable
XML will
not replace HTML as a formatting language. But XML should and certainly
will take the place of HTML as a source language for many types of applications.
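As a rough illustration of XML acting as the source and HTML as one of several outputs, the Python sketch below renders the laptop record from the HTML vs. XML comparison in two ways. The function names to_html and to_plain_text are hypothetical, and in practice this job is often handled by a transformation language such as XSLT rather than hand-written code.

```python
import xml.etree.ElementTree as ET
from html import escape

# The XML source record from the HTML vs. XML comparison.
PRODUCT_XML = ("<product><model>P266 Laptop</model>"
               "<dealer>Friendly Computer Shop</dealer>"
               "<price>$1438</price></product>")


def to_html(xml_text):
    """Render the product for a browser, reproducing the HTML example."""
    root = ET.fromstring(xml_text)
    model = escape(root.findtext("model", default=""))
    dealer = escape(root.findtext("dealer", default=""))
    price = escape(root.findtext("price", default=""))
    return f"<p><b>{model}</b><br><i>{dealer}</i><br>{price}</p>"


def to_plain_text(xml_text):
    """Render the same product for a text-only device."""
    root = ET.fromstring(xml_text)
    return (f"{root.findtext('model', default='')} from "
            f"{root.findtext('dealer', default='')}: "
            f"{root.findtext('price', default='')}")


print(to_html(PRODUCT_XML))
print(to_plain_text(PRODUCT_XML))
```

Both outputs come from one source record, so a change to the price is made once and shows up everywhere. That is the sense in which HTML remains the formatting language while XML becomes the source.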
Conclusion
There is a buzz about XML in the market, and for good reason. Is it a revolution? Technically,
no. Is it the final answer for data formats forever? History tells
us no. But it may revolutionize the way that we share
and leverage information.
Clearly,
XML is not for everyone. Each organization has to evaluate the benefits
and make a thorough analysis to understand if the business case justifies
the expense and effort to migrate to XML. As the technology matures
(away from the bleeding edge) and tools become easier, cheaper, and
more powerful, the business case will become easier to validate.
Post Script
Data Conversion Laboratory's expertise in SGML and XML is recognized in a variety
of forums. DCL's president, Mark Gross, recently authored the chapter
on legacy document conversion to XML for Charles Goldfarb's XML
Handbook (Prentice Hall), and is currently authoring the Conversion
chapter for Columbia University's The Columbia Guide to Digital Publishing.
DCL staff frequently speak on
document conversion at leading industry conferences.
You can learn more about XML by going to our Technical
Library, which is a collection of resources about data conversion
and related topics gathered from past issues of DCLnews, various papers
and presentations from DCL, and materials available in other places.
The Library is in a state of evolution and is being updated frequently,
so stop by often.
If you
are planning to migrate your data to XML (or are just thinking about
it), we would be happy to discuss your project with you, and explain
how DCL can help put you on track to getting the most out of XML, by
fully integrating all your existing documents and data in the most efficient
and cost-effective way possible.
Don
Bridges
Account Manager for Technical Documents
Data Conversion Laboratory