
Pharmaceutical Industry:
Why XML?
ABSTRACT:
The recent Drug Information Association conference focused on the future of
document management in the pharmaceutical industry. DCL President Mark Gross
provided an overview of data use issues, while comparing XML, an emerging industry
standard, to other technologies.
Traditionally,
distributing information has most commonly meant distributing page image
representations. Today, however, there is increasingly a need to re-purpose or
re-organize information that was originally intended for paper, so that it can
be utilized in many other ways. Features
that have become increasingly important to data utility are the ability to
display well on a computer screen and the ability to do full-text searching, both
inside documents and across document sets. Added
to this is the need to enforce consistent standards across data sets so that
information can be reused and re-purposed many times over with
vendors, customers, and the world at large.
Several
alternative technologies to XML exist for electronic distribution.
In addition to the native formats that authoring software produces, there are
also TIFF, PDF, SGML, and HTML. Native
formats are the files as they were originally developed for word processing or
publishing; these include typesetting formats and the formats of the desktop
publishing packages that became popular with the advent of the PC.
For print purposes these work well; no new investment or training is
needed, and the systems are generally already in place. But the files are proprietary, and they are quite limited when
it comes to the enforcement of standards. Furthermore,
from an electronic distribution perspective (Internet, CD-ROM,
eCommerce), they offer virtually no re-purposing capabilities.
In today’s marketplace, that’s a problem.
TIFF
is an image format produced by scanning pages.
It provides an exact representation of the page, and is inexpensive to
produce. But files are large, and
the format’s not suitable for searching or re-organizing information.
PDF
is an almost exact representation of the original document, and it’s also
quite inexpensive to produce. But
it’s a proprietary format, it offers limited searching capability, and you
can’t edit or modify files. Added
to this, documents intended for print are difficult to read on a computer
screen.
SGML
is an international ISO standard for information representation. It’s robust
and adaptable to many applications, well established in many industries, and
allows for full content searching. It
also enforces standards and consistency. But
it’s hard to implement, and it requires a significant investment and professional
support staff. SGML tools aren’t suitable for casual users, and creating
unstructured documents and forms is difficult.
HTML
was designed for the web; 90% of information on the web today is housed in the
HTML format. It’s much easier to
use than SGML, it’s widely supported, and it’s automatically generated by
some software packages. But it’s an appearance-based language; tags relate to
appearance, not content. Compounding
this problem, it has limited
formatting capabilities, it’s difficult to edit, and it’s truly a moving
standard.
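
To make the "appearance, not content" point concrete, here is a minimal Python sketch. The HTML fragment, including the drug name "Examplex", is invented for illustration and does not come from the presentation; the code uses only Python's standard html.parser module. It collects every phrase marked bold, which is about all the markup can tell a program.

```python
from html.parser import HTMLParser

# Hypothetical fragment, invented for illustration only.
HTML_FRAGMENT = (
    "<p><b>Examplex</b> is supplied as 10 mg tablets. "
    "<b>Do not exceed 40 mg per day.</b></p>"
)


class BoldCollector(HTMLParser):
    """Collects the text found inside <b> tags."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.bold_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True
            self.bold_text.append("")  # start a new bold phrase

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold:
            self.bold_text[-1] += data  # text may arrive in several chunks


def bold_phrases(html):
    """Return every bold phrase in the given HTML string, in document order."""
    collector = BoldCollector()
    collector.feed(html)
    collector.close()
    return collector.bold_text


if __name__ == "__main__":
    for phrase in bold_phrases(HTML_FRAGMENT):
        print(phrase)
```

Both phrases come back as plain bold text. Nothing in the tags says that the first is a product name and the second a safety warning, so a program that wants to treat them differently has to guess.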
XML
has all the advantages of SGML, but it’s much easier to get started with.
You can start gradually and add functionality with experience. It has features that are important for the web and e-commerce
and it’s already become widely supported. While
people need training and tools aren’t yet fully developed, developers have
rushed to support the standard. XML
is typically used for large web applications, e-commerce, technical
documentation, catalogs, and living documents.
XML
separates text and tags; that is, it separates content from structure.
This is the key to its versatility, and the reason it’s taken off. Its uses are many. You
can reformat and restyle for different media, identify components, interchange
data, reuse parts, and maintain and output multiple versions of the same
document. “You’ve taken what
has traditionally been text used on paper and turned it into a bunch of data
elements you can use in various ways,” said Mr. Gross.
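
The re-purposing described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the element names (monograph, drug-name, dosage, warning), the drug "Examplex", and the output layouts are invented and are not taken from the presentation or from any industry schema. The sketch uses only Python's standard xml.etree.ElementTree and html modules.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document; element names and values are invented for illustration.
XML_SOURCE = """
<monograph>
  <drug-name>Examplex</drug-name>
  <dosage unit="mg">10</dosage>
  <warning>Do not exceed 40 mg per day.</warning>
</monograph>
"""


def parse(xml_text):
    """Parse the XML text and return its root element."""
    return ET.fromstring(xml_text)


def to_plain_text(root):
    """Render the content as one line of plain text, e.g. for a text index."""
    dosage = root.find("dosage")
    return (
        f"{root.findtext('drug-name')}: {dosage.text} {dosage.get('unit')}. "
        f"WARNING: {root.findtext('warning')}"
    )


def to_html(root):
    """Render the same content as HTML; styling is decided here, not in the source."""
    dosage = root.find("dosage")
    name = escape(root.findtext("drug-name"))
    dose = escape(dosage.text)
    unit = escape(dosage.get("unit"))
    warning = escape(root.findtext("warning"))
    return f"<h1>{name}</h1><p>Dose: {dose} {unit}</p><p><b>{warning}</b></p>"


def warnings(root):
    """Identify one kind of component: every warning in the document."""
    return [w.text for w in root.iter("warning")]


if __name__ == "__main__":
    doc = parse(XML_SOURCE)
    print(to_plain_text(doc))
    print(to_html(doc))
    print(warnings(doc))
```

The source document is written once and never mentions fonts or layout. Each output function decides how a drug name or a warning should look in its own medium, and the warnings function pulls out a single kind of component for reuse elsewhere.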
View
the Presentation Slideshow here
ABOUT DCL: Data Conversion
Laboratory is the leader in implementing complex data conversion solutions
for Web- and electronic-based publishers and organizations, B2B applications and
evolving new technologies. The company supports XML, SGML and all major
electronic formats, and since 1981, has extracted, reorganized and repurposed
data for a wide range of pharmaceutical clients including: Glaxo Wellcome,
SmithKline Beecham, Merck, Genentech, Allergan and Eli Lilly.
The company is located at 61-18 190th St., 2nd Floor, in
Fresh Meadows, N.Y. and is privately owned. For more information about Data
Conversion Laboratory and its services, please visit the company website at www.dclab.com,
or call 1-718-357-8700.