WHITE PAPER
Tell me again ... Why should I care about XML?
Converting to XML not only
gives you the ability to publish documents to the Web, print, CD-ROM, and
handheld devices at the click of a button, it also brings very real cost savings
...
In this white paper we look at the benefits of XML and discover how much it costs to
get those benefits. We also look at strategies for increasing benefits
while at the same time keeping costs down. Plus we touch on PDF -- often
viewed as an XML alternative -- and discuss when it is appropriate to use
it instead of XML. Note that much of the information in this document
applies to SGML, which is the "parent" of XML.
What is XML?
XML (eXtensible Markup Language) is a means of
representing text information so that:
- Only standard text
(both ASCII and
Unicode) is used within a document
- No formatting
information needs to be contained in the document. (A Document Type Definition,
or DTD, can nevertheless be set up to allow formatting information to be included in
the XML tagging.)
- All document
elements are clearly identified (for example, <title>Why
XML?</title>).
- The document
typically conforms to a predefined template or DTD. Strictly speaking
you don't have to use a DTD, but it is highly recommended that you do.
- Mechanisms are
provided for linking text within a document to information within the
same or other documents. The information being linked can be any XML
structure including tables, figures, paragraphs, headings, and so
on.
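As a rough illustration of the properties listed above, the following Python sketch parses a small XML fragment with the standard library's xml.etree.ElementTree module. The element names (manual, section, warning, xref) and the id/ref attributes are invented for this example and do not come from any particular DTD. The internal link is resolved by a simple attribute lookup, which is only one possible way to follow an ID-style reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag and attribute names are made up for illustration.
DOC = """<manual>
  <title>Why XML?</title>
  <section id="safety">
    <heading>Safety</heading>
    <warning>Disconnect power before servicing.</warning>
  </section>
  <section id="install">
    <heading>Installation</heading>
    <para>Before you begin, read <xref ref="safety"/>.</para>
  </section>
</manual>"""

root = ET.fromstring(DOC)

# Elements are identified by what they are, not by how they look.
title = root.findtext("title")
warnings = [w.text for w in root.iter("warning")]

# Index every element that carries an id so references can be resolved.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

def resolve(xref):
    """Return the heading text of the section an <xref> points to."""
    target = by_id[xref.get("ref")]
    return target.findtext("heading")

link_targets = [resolve(x) for x in root.iter("xref")]

print(title)
print(warnings)
print(link_targets)
```

Because the warning is tagged as a warning rather than as "bold red text", a program can find every warning in a document set without guessing from appearance.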
(NOTE: Developers invented an XML sub-technology,
or "vocabulary," called XLink. XLink is a more powerful
way of linking from one item to another than is possible in the standard
XML mechanism. It allows you to link to an arbitrary place in a document.
In standard linking, like that found in HTML, you can only link to something
if you've got an anchor to it. If you've got a complex document this can
mean inserting thousands of anchors -- a laborious task. With XLink you
can point anywhere you like without anchors).
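The difference between anchor-based and anchor-free linking described in the note can be sketched in a few lines of Python. This is not an XLink implementation: it uses the limited XPath subset in the standard library's xml.etree.ElementTree purely to show the idea of addressing a location by its position in the structure. In the W3C specifications, this kind of addressing is the job of XPointer, which is used together with XLink. The document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only one paragraph carries an anchor (an id).
DOC = """<report>
  <chapter>
    <para>First paragraph.</para>
    <para id="key">Second paragraph, with an anchor.</para>
  </chapter>
  <chapter>
    <para>Third paragraph.</para>
    <para>Fourth paragraph, with no anchor.</para>
  </chapter>
</report>"""

root = ET.fromstring(DOC)

# Anchor-based linking: the target must carry an id that someone inserted.
anchored = root.find(".//para[@id='key']")

# Path-based addressing: the second <para> of the second <chapter>.
# No anchor had to be added to the target beforehand.
unanchored = root.find("./chapter[2]/para[2]")

print(anchored.text)
print(unanchored.text)
```

The trade-off is that a positional address can break if the document is restructured, whereas an anchor travels with the element it marks.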
Key benefits of XML
The benefits of using XML as a document representation format are
great and apply across all areas of industry. Let's look at what you gain when you adopt
XML:
- Content
identification - Perhaps the most important aspect of XML is that
text elements are identified, not on the basis of what they look like,
but on the basis of what they are -- that is, of their significance in
the context of a document. The <title> example above
illustrates this, but the concept goes well beyond identifying things
like titles, captions, or body text. Depending on need, warning
paragraphs can be identified, procedures can be identified in terms of
who they are applicable to, and assembly parts can be identified. Tags
are user-defined for each document set, so different documents can be
tagged in different ways.
- Databasing - An XML-tagged document can be viewed as
fielded text. The fielding makes it possible to break documents
down to their component parts to any degree of granularity for storage
in a document management system. The documents can then be re-assembled
in different ways, and for different audiences, without the need to track
multiple document versions. This is particularly important in cases where different audiences may need to see different
versions of a document (in the military, for example, you might have a "top security" clearance version
and a "standard" version).
In this way,
boilerplate text, such as a standard warning, can be stored once for use
in many manuals. When the warning text is changed, it is changed once,
not each time it appears. Also, the warning reads the same way
wherever it appears, thus avoiding the embarrassment of inconsistent
or outdated text.
- Enforced
structure - XML documents are composed in accordance with a DTD
or schema,
which defines the legal tag set for that document type. It also defines
valid and invalid relationships between elements (for example, a
<header2> tag might be defined as valid only when it comes
after a <header1> tag). This "enforced structure" ensures
that documents have uniformity -- even when coming from diverse sources.
- Merging
materials from diverse suppliers - The uniform structure and lack of
internal formatting make it easy to merge documents into seamless
document sets -- even if they are coming in from different facilities.
An XML-compliant document management system can track the individual
pieces by contributor, if necessary.
- International
Standard - XML is an international standard that is maintained by
an independent standards committee, which means it enjoys widespread
support across industry boundaries and gets extensive support from
vendors. Being an international standard also means that there is a
wide variety of XML editing, document management, validation, and
publishing tools available at a range of price and quality
levels.
- Industry
standardization - Many industries have adopted standardized XML
DTDs to allow documents to be easily exchanged across different areas of
industry. In fact,
developing inter-industry data exchange standards based on XML
is currently a major focus for developers and firms alike (Microsoft's BizTalk is an example). Aside
from industries coming up with standard DTDs, many organizations have developed
new tag sets to fit their subject field. The newspaper industry, for example,
recently came up with its own XML-based markup language, called SportsML, which makes it easier for sports writers and editors
to format, store, and publish sports information for newspapers,
websites, and other media. Plus there's MathML
and CML (Chemical Markup Language) for the
sciences.
- Platform
independent - Because "raw" XML consists only of ASCII and
Unicode characters
(the tags themselves are represented in ASCII), XML data can be moved
freely between all hardware and operating system platforms that support
these character sets. There are no hardware or operating systems
that do not support the ASCII character set and Unicode is now widely
supported. The Internet
Explorer and Netscape browsers, for example, support it, as do most plain
text editors.
- Software
independent - As noted, there is a wide variety of XML-compliant
tools available from many vendors. Because XML is an independent
standard, tool sets can be upgraded or changed without fear of data
incompatibility. Furthermore, many of the mainstream and "low-end" tools
are becoming XML compliant in response to market demand for
support of these formats. Such software includes WordPerfect,
FrameMaker+XML and Ventura Publisher, among others. Support for XML is already available to some
degree in most of the Office 2000 products. It is supported extensively
in Internet Explorer 5 and above, as well as in recent versions of Netscape. What's more,
any text editor that supports Unicode can be used to
view/edit XML. And the XSL (eXtensible Stylesheet Language) standard
will allow you to publish XML material to paper or a website using publicly
available software.
- Endurance
- Appearance-based text representations are constantly changing --
making conversion costly when migrating from one software package to
another or even when upgrading an existing software package. There is
also potential for data loss when performing such conversions. XML,
however, is a "permanent" representation. Even as the standard evolves,
there is no problem upgrading data. If the DTD is carefully selected or
designed, a conversion to XML will be the last conversion you'll ever
need. In a budget-sensitive environment, this is a very important
benefit.
- Repurpose
data for different publication media - With XML, formatting is done
on a "just in time" basis. As noted, tags identify content, not
appearance. Appearance decisions are therefore left until documents are
actually published, which means they can easily be modified based on the
publication platform. This is a big advantage because what looks good on
paper often won't look good on screen and vice versa. XML makes it easy to
develop different stylesheets based on the needs of individual
publications. The stylesheets map the tags to a set of formatting
directives. Thus the same document can easily be published to paper and
to the web -- and be customized for each rendition -- simply by
customizing stylesheets. When publishing to paper, <title>
can be rendered as Times-Roman, 12 point bold. On the web, titles might
look better in a more web-friendly typeface, like Verdana, in a larger
size. They would simply be defined that way in the web stylesheet,
without the need to change the document at all. Because XML data is
well-fielded ... (continued on next page)
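The stylesheet idea in the repurposing item above, where tags are mapped to formatting directives at publication time, can be sketched in Python as follows. This is a minimal illustration, not XSL: the two "stylesheets" are plain dictionaries, the render function is hypothetical, and every font size other than the 12 point Times-Roman title is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The document carries no formatting, only content tags.
DOC = "<article><title>Why XML?</title><para>Tags identify content.</para></article>"

# Two hypothetical stylesheets: tag -> (typeface, size in points, bold?).
PRINT_STYLE = {
    "title": ("Times-Roman", 12, True),
    "para": ("Times-Roman", 10, False),
}
WEB_STYLE = {
    "title": ("Verdana", 16, True),
    "para": ("Verdana", 11, False),
}

def render(xml_text, stylesheet):
    """Pair each element's text with the directives its tag maps to."""
    root = ET.fromstring(xml_text)
    out = []
    for el in root:
        font, size, bold = stylesheet[el.tag]
        weight = " bold" if bold else ""
        out.append(f"[{font} {size}pt{weight}] {el.text}")
    return out

# Same document, two renditions; only the stylesheet changes.
print_out = render(DOC, PRINT_STYLE)
web_out = render(DOC, WEB_STYLE)

print("\n".join(print_out))
print("\n".join(web_out))
```

The document string is never edited between the two calls, which is the point of "just in time" formatting: appearance lives in the stylesheet, not in the content.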