Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
National Library of Medicine

Making Medical Information Available On-Line


ABSTRACT: In the medical industry, the accuracy of data is a matter of life and death, so accurate data conversion is a necessity whenever a new system is implemented. This article describes how the National Library of Medicine converted its materials for an on-line system, and will be of interest to anyone concerned with the accuracy and quality of data.

The New Library

There's a maxim in the conversion business: "Information is an asset." Nowhere is this more true than in the medical industry, where information can mean the difference between life and death. In such a milieu, the importance of the National Library of Medicine (NLM) can hardly be exaggerated. It is, after all, the largest medical research library in the world. In fact, it's the world's largest research library in a single scientific and professional field, with a collection of 5 million items: books, journals, technical reports, manuscripts, microfilms, and pictorial materials.

But what does it mean to be a library today? New information technology is expanding the possibilities of how much information can be stored and how it can be disseminated. In the field of health care, new opportunities mean new obligations. To quote the Hippocratic Oath, "into whatsoever house you shall enter, it shall be for the good of the sick to the utmost of your power." High technology has expanded "utmost" to new levels and has made it possible to enter more houses than ever before without even getting out of one's chair.

NLM's Electronic Resource

The NLM's latest response to this challenge is HSTAT (Health Services/Technology Assessment Text), an electronic resource that includes the full text of clinical practice guidelines, quick-reference guides for clinicians, and consumer brochures. Its contents include material from the Agency for Health Care Policy and Research (AHCPR), National Institutes of Health (NIH) consensus development conference and technology assessment reports, and the U.S. Preventive Services Task Force Guide to Clinical Preventive Services (1989 edition).

HSTAT is part of an initiative called the Health Services Research Information Program coordinated by NLM's National Information Center on Health Services Research and Health Care Technology (NICHSR). The actual development of HSTAT was left to the Information Technology Branch of the Lister Hill Center, also part of NLM. It was the Lister Hill Center that called DCL.

The Importance of Access

"HSTAT can be accessed several different ways," explains Maureen Prettyman at the Center. "In fact, it's currently in three different databases. Users can do full-text search and retrieval on character-based terminals, they can download over the Internet with gopher or ftp, and then there's the World Wide Web. But when we started, all we had were WordPerfect documents, ASCII, and the books themselves. We knew we needed to go to SGML."

SGML tagging would provide the cues search engines need, and SGML documents could readily be converted to HTML, the SGML-based format accepted for World Wide Web access. But once a Document Type Definition (DTD) had been developed to define the structural rules for the SGML documents, hundreds of pages from the first set of books still had to be converted, with more sets to follow.
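The step from tagged text to HTML can be pictured as a renaming pass over the tagged document tree. The sketch below is a hypothetical illustration, not the HSTAT conversion itself: the article does not give the DTD's element names, so chapter, title, para, list, and item (and the TAG_MAP table) are invented for the example. It also assumes the tagged text is well-formed enough for an XML parser, which real SGML, with its omissible end tags, need not be.

```python
import xml.etree.ElementTree as ET

# Hypothetical source element names mapped to HTML elements; the real
# HSTAT DTD is not described in the article.
TAG_MAP = {
    "chapter": "div",
    "title": "h2",
    "para": "p",
    "list": "ul",
    "item": "li",
}


def to_html(elem: ET.Element) -> ET.Element:
    """Copy a tagged element tree, renaming each element for HTML display.

    Unmapped elements fall back to <span>; attributes are not carried over.
    """
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(to_html(child))
    return out


source = "<chapter><title>Screening</title><para>Text of the guideline.</para></chapter>"
html = ET.tostring(to_html(ET.fromstring(source)), encoding="unicode")
print(html)  # <div><h2>Screening</h2><p>Text of the guideline.</p></div>
```

Keeping the mapping in one table means that changing how an element should display touches only that table, not the tagged source.
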

Don't Try This At Home

Norman Barth, DCL Project Manager for this conversion, talked about the difficulty of such a project. "Some companies think they can save money by converting files in-house, but with a complex conversion like this one, a company will find itself draining more and more of its resources as the project continues. NLM came to us right away, and we were able to give them a cost estimate that allowed them to make a realistic budget."

But why is the conversion so difficult? Norman continues, "In this conversion, we are adding information. There are no tags in the original material. Where does this information come from? Three places: appearance, context, and content. If all chapter titles are the same font size, then we can use appearance cues to tag chapters.

"But because SGML is so concerned with how a document is structured, context becomes important, too. A blank line in a sample form, for example, might be considered different from a blank line after a study question in the back of a chapter. The only information source we didn't use for the NLM job was content. In this case, it was more cost-effective to have their own people do that tagging, since they were subject-matter experts. Still, most of the tagging was accomplished using appearance and context cues alone.

"Your original question was about difficulty. Let me just say that we've had to expand our development and editorial departments twice as we've increased the number of SGML conversions we do. Most of our editors are trained specifically for SGML. My advice: Don't try this at home!"

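DCL chose a manual approach for this job, but the appearance and context cues Norman describes can be expressed as rules. The sketch below is a hypothetical rule-based tagger, not DCL's process: it assumes each line arrives with formatting already recovered from the source file (font size, whether it sits in a ruled box, whether it is italic), and the element names chaptitle, formfield, question, para, fillin, and answerspace are invented. Appearance decides what a text line is; context (the kind of the preceding text line) decides what a blank line means, as in the sample-form versus study-question example.

```python
from dataclasses import dataclass
from html import escape

# Hypothetical appearance cue: the single font size used for every chapter title.
CHAPTER_TITLE_SIZE = 18.0


@dataclass
class Line:
    text: str
    font_size: float = 11.0
    in_box: bool = False  # assumed: sample forms are set inside a ruled box
    italic: bool = False  # assumed: study questions are set in italics


def classify(line: Line) -> str:
    """Appearance cue: decide what a non-blank line is from how it looks."""
    if line.font_size >= CHAPTER_TITLE_SIZE:
        return "chaptitle"
    if line.in_box:
        return "formfield"
    if line.italic:
        return "question"
    return "para"


def tag_lines(lines: list[Line]) -> list[str]:
    tagged: list[str] = []
    previous = "para"  # context: the kind of the last non-blank line seen
    for line in lines:
        text = line.text.strip()
        if not text:
            # Context cue: the same blank line is tagged by what precedes it.
            if previous == "formfield":
                tagged.append("<fillin/>")
            elif previous == "question":
                tagged.append("<answerspace/>")
            continue  # a blank line after ordinary text is only spacing
        previous = classify(line)
        tagged.append(f"<{previous}>{escape(text)}</{previous}>")
    return tagged


sample = [
    Line("Chapter 1. Screening", font_size=18.0),
    Line("Patient name:", in_box=True),
    Line(""),
    Line("What is the recommended interval?", italic=True),
    Line(""),
    Line("Ordinary body text."),
    Line(""),
]
for tag in tag_lines(sample):
    print(tag)
```

The two blank lines in the sample come out as different elements (<fillin/> and <answerspace/>) even though they look identical, which is the point Norman makes about context.
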
The Importance of Communication

DCL is able to offer any combination of manual and automated processes to convert legacy documents as cost-effectively as possible. In this case, a manual approach was chosen. Jennifer Ruckdeschel was put in charge of the editing process.

"Maureen [Prettyman] sent the DTD and narrowed down what she wanted us to do. From that information, I came up with keying specs for outside editors, who tagged the WordPerfect files. When they came back, we parsed them and had in-house editors do the final cleanup.

"Communication with the client was very good on this project. Whenever Maureen had a question or concern about our tagging, she didn't hesitate to call. Our priority is always to create a good feedback cycle with our clients. By sending them materials early and often, and then making them feel comfortable when they call, we stay in touch with what the clients want and the clients don't get any surprises."

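Parsing the returned files means checking the tags against the structural rules in the DTD. Python's standard-library XML parsers do not validate against a DTD, so the sketch below substitutes a deliberately simplified, hypothetical check: ALLOWED_CHILDREN stands in for the DTD and records only which child elements each element may contain, ignoring the ordering, occurrence, and attribute rules that a real validating SGML parser would also enforce. The element names are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: each element maps to the children it may contain.
ALLOWED_CHILDREN = {
    "guideline": {"chapter"},
    "chapter": {"chaptitle", "para", "question"},
    "chaptitle": set(),
    "para": set(),
    "question": set(),
}


def check(elem: ET.Element, path: str = "") -> list[str]:
    """Return structural errors found in elem and its descendants."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in ALLOWED_CHILDREN:
        return [f"{here}: undeclared element"]
    errors: list[str] = []
    for child in elem:
        if child.tag in ALLOWED_CHILDREN[elem.tag]:
            errors.extend(check(child, here))
        else:
            errors.append(f"{here}: <{child.tag}> not allowed here")
    return errors


tagged = ET.fromstring(
    "<guideline><chapter><chaptitle>Screening</chaptitle>"
    "<para>Text.</para><fillin/></chapter></guideline>"
)
print(check(tagged))  # ['/guideline/chapter: <fillin> not allowed here']
```

In a workflow like the one described above, reports of this kind are what an in-house cleanup pass would work through.
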
Even though few, if any, of DCL's employees have taken the Hippocratic Oath, they worked to the utmost of their power for the NLM, which has already begun to make HSTAT information available. For more information on how to access HSTAT, please call the NICHSR at (301) 496-0176 or e-mail them at NICHSR@NLM.NIH.GOV.

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.