Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
WHITE PAPER

"Department of Defense and the Power of XML"

Fifty million pages of U.S. military technical data are a powerful resource for the defense of this nation. They would be even more powerful if they were converted into a single-source data format such as XML or SGML, argues David Skurnik, VP of sales at Data Conversion Laboratory.

Currently, US military technical data (Technical Manuals and Technical Orders) is available in a multitude of electronic formats or, in many instances, only on paper. Integrating these formats into a single data format would involve converting approximately 50 million pages. However, a single data format would greatly increase efficiency and inter-service communication.

This paper discusses why the US Department of Defense (DOD) should convert its Technical Data to SGML and XML, and specifically how having Technical Data in these formats would directly contribute to the DOD's mission of Readiness and Safety.

Definition and History of SGML, HTML and XML
SGML (Standard Generalized Markup Language) was developed in the 1980s as a non-proprietary, platform-independent method of describing the structure of a document rather than its appearance. An early adopter of SGML was the Military's Computer Aided Logistics Support (CALS) Program.

What does it mean to tag a document based on structure?

In a resume, for example, you have the applicant name, address, telephone number, job history, etc. A person looking at the resume can distinguish between the different types of information from the appearance of the data (the applicant's name may be bolded, centered and set in a larger typeface than the rest of the resume). A resume containing SGML mark-up instead has an "applicant name" tag surrounding the applicant name. Because SGML and XML element names cannot contain spaces, this could look like this: <applicant-name>John Smith</applicant-name>. Thus, a marked-up SGML document contains the contents of the document together with tags identifying the different structures within it.
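
As a minimal sketch of what structural tagging makes possible, the Python fragment below reads a small tagged resume and pulls out information by structure rather than by appearance. The markup, the element names (written with hyphens, since an element name cannot contain a space) and the sample values are all hypothetical, and the example uses XML because Python's standard library includes an XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; every element name and value is illustrative.
RESUME = """
<resume>
  <applicant-name>John Smith</applicant-name>
  <address>12 Main Street, Springfield</address>
  <telephone>555-0100</telephone>
  <job-history>
    <job>Technical writer</job>
    <job>Documentation manager</job>
  </job-history>
</resume>
"""

root = ET.fromstring(RESUME.strip())

# The tags, not the typography, say which text is the applicant's name.
name = root.findtext("applicant-name")

# Every <job> element can be collected wherever it appears in the document.
jobs = [job.text for job in root.iter("job")]

print(name)
print(jobs)
```

A program can answer "what is the applicant's name?" here without guessing from bold or centered type, which is the point of tagging by structure.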

Prior to applying tags to a document, you have to define some basic rules determining:

  1. What structures within the document are to be tagged. In our resume example, you would need to determine whether you will tag the first name, last name and title of the applicant separately or use only one tag for the entire applicant name. The decision will be based on what type of intelligence you will want to extract from the data.
  2. What the tags will be named. Using our previous example, you may wish to use "applicant-name" as the tag that describes the applicant's name, or a shorter form such as "appnam".
  3. The order in which these structures may appear within the document. In our resume example, you would want to ensure that the applicant name always precedes the applicant address. You may decide to place the education section before the job experience section, or decide it should follow job experience.

These rules are recorded in a document called a Document Type Definition (DTD). The DTD has to be developed before any conversion begins, because it supplies the basic rules the conversion must follow.
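
The ordering rule described above can be sketched in a few lines of Python. This is only a stand-in for real DTD validation, which a validating SGML or XML parser performs; the element names and the rule itself are hypothetical, and the comment shows roughly how the same rule would be declared in a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule for <resume>: exactly these children, in this order.
# In a DTD the equivalent declaration would be roughly:
#   <!ELEMENT resume (applicant-name, address, education, job-experience)>
RESUME_ORDER = ["applicant-name", "address", "education", "job-experience"]


def follows_rules(xml_text, expected_order=RESUME_ORDER):
    """Return True when the root's children match the expected names and order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == expected_order


good = (
    "<resume>"
    "<applicant-name>John Smith</applicant-name>"
    "<address>12 Main Street</address>"
    "<education>BS, Engineering</education>"
    "<job-experience>Technical writer</job-experience>"
    "</resume>"
)

# Same content, but the address comes before the applicant name.
bad = (
    "<resume>"
    "<address>12 Main Street</address>"
    "<applicant-name>John Smith</applicant-name>"
    "<education>BS, Engineering</education>"
    "<job-experience>Technical writer</job-experience>"
    "</resume>"
)

print(follows_rules(good))
print(follows_rules(bad))
```

The second document contains the same information as the first but breaks the ordering rule, so it is rejected. A real DTD expresses many such rules at once, including optional and repeatable elements.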

In the early days, the biggest obstacles to implementing an SGML solution were that it was complex and that there were not many tools on the market to support it.

In the infancy of the Internet, a universal DTD was developed for tagging documents designed to be viewed on the Internet. This DTD came to be known as Hypertext Markup Language (HTML). Since HTML was focused on presentation and not on structure, the HTML tag set was very limited, and was therefore much easier to implement.

But the simplicity that made HTML attractive was also its biggest drawback, since its ability to support complex searching, linking and document maintenance was very limited.

The challenge was to find a way of marking up documents that was not as complex as SGML but was more powerful than HTML. The solution was XML. XML stands for eXtensible Markup Language; it is a data format defined as a simplified subset of SGML. Since its introduction, many corporations and organizations such as IBM, Microsoft and General Electric have been converting their documentation to XML, and XML has become the de facto standard for data transfer.

How Can SGML and XML Assist in Improving Readiness and Safety at the DOD?
At this juncture, the DOD is at a crossroads. Many SGML DTDs have been developed within the DOD, and a small percentage of the Technical Manuals have been converted to SGML. Although it was an uphill battle, there has been a growing acceptance that SGML is essential to the DOD.

A new issue that is surfacing within the DOD is this: should they convert to SGML or XML or a mixture of both? The answer may not be obvious. But before we try to deal with this question, let's understand the benefits of SGML and XML.

It is important to understand that the benefits discussed in this section will not immediately be realized after the data has been converted to SGML/XML. Additional tools have to be put in place to take advantage of the intelligence contained in the data. SGML and XML are the foundation that will enable you to ultimately gain the functionality described in this section. Therefore, this paper will also discuss some of the tools necessary for exploiting the power of SGML and XML data that will ultimately result in the benefits contributing to overall Readiness and Safety.

The two general areas where SGML and XML based data can benefit the DOD mission of Readiness and Safety are Document Creation and Maintenance, and Weapon System Maintenance.

Document Creation and Maintenance: This refers to the task of creating, maintaining and modifying documents.

Weapon System Maintenance: This refers to the information needed to maintain a weapon system.

>>> Document Creation and Maintenance
The main tool necessary for achieving the functionality discussed in this section is an XML-based Content Management System (CMS). As its name suggests, the CMS is designed to manage the content contained within it. The basic features of an SGML- and XML-based CMS are:

  1. It identifies the original author of the document and grants permission to select individuals that may be required to edit the document.
  2. It tracks all the changes ever made to the document, identifying who made the changes and when the changes were made.
  3. The CMS doesn't store whole documents; it stores pieces or "chunks" of content. These chunks are then assembled by the CMS into a single document when the entire document is required. The level of granularity of the chunks is determined by the level of tagging that was done to the data. In our resume example, if the applicant was tagged only as <applicant-name>, the first name, last name and title will be represented and stored in the CMS as one chunk. If the applicant name is tagged as <title>, <first-name>, <last-name>, then all 3 pieces of information will be stored as separate chunks.
  4. A chunk of data that appears in multiple documents is stored only once.
  5. It stores all the information regarding who requested what data chunks and when they were sent.
  6. It allows new documents to be created from existing chunks stored within the CMS.
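
The chunk-based features listed above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not how any particular CMS product works: the class names, method names and sample content are hypothetical, and permissions and request tracking (features 1 and 5) are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    text: str
    author: str
    # Each history entry records (editor, timestamp, previous text).
    history: list = field(default_factory=list)


class ChunkStore:
    """Hypothetical chunk repository: content is stored once, documents are assembled."""

    def __init__(self):
        self.chunks = {}     # chunk id -> Chunk
        self.documents = {}  # document title -> ordered list of chunk ids

    def add_chunk(self, chunk_id, text, author):
        self.chunks[chunk_id] = Chunk(text, author)

    def define_document(self, title, chunk_ids):
        # A document is only a list of references to chunks.
        self.documents[title] = list(chunk_ids)

    def edit_chunk(self, chunk_id, new_text, editor):
        chunk = self.chunks[chunk_id]
        chunk.history.append((editor, datetime.now(timezone.utc), chunk.text))
        chunk.text = new_text

    def assemble(self, title):
        # The document is rebuilt from the latest version of each chunk.
        return "\n".join(self.chunks[cid].text for cid in self.documents[title])

    def where_used(self, chunk_id):
        return sorted(t for t, ids in self.documents.items() if chunk_id in ids)


store = ChunkStore()
store.add_chunk("safety", "Wear eye protection.", author="author-a")
store.add_chunk("intro-1", "Manual one introduction.", author="author-a")
store.add_chunk("intro-2", "Manual two introduction.", author="author-b")
store.define_document("Manual One", ["intro-1", "safety"])
store.define_document("Manual Two", ["intro-2", "safety"])

# One edit to the shared chunk is reflected in both manuals.
store.edit_chunk("safety", "Wear eye and ear protection.", editor="editor-c")

print(store.where_used("safety"))
print(store.assemble("Manual Two"))
```

Because both manuals hold a reference to the same chunk rather than a copy of its text, a single edit reaches every document that uses it, and the store can report which documents those are.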

If we apply these capabilities to the Technical Manual Maintenance Environment, the following advantages come into play:

  1. Manuals are always current - in military maintenance documentation, it is very common for the same information to be replicated across many manuals. For example, many manuals contain the torque level guidelines for tightening an aircraft engine bolt. If a maintainer notices cracks in the underbelly of an aging engine, the maintainer might determine that the torque level of the bolt is too great for the aging aircraft. The maintainer would then have to find every manual that contains torque level guidelines for engine bolts and modify each one. What usually happens, because of time constraints and lack of information (there is no way of tracking which manuals contain that torque level), is that the change is made only to the manual the maintainer was using when the problem was discovered. Aircraft maintained from the uncorrected manuals continue to develop cracks, possibly resulting in breakage.

    With an SGML or XML based CMS, the engine bolt torque level is stored as a chunk of information only once within the CMS. Changing the torque level in the CMS therefore updates every manual containing that chunk automatically, keeping them all current.

    Here are some other reasons why an SGML or XML based CMS makes manuals more current:

    A) The volume of changes is drastically reduced, allowing for quicker completion of the required changes. Also, depending on the volume of changes, the number of personnel required to implement the changes can be reduced.

    B) Whenever a manual is viewed using the CMS, the manual is rebuilt from the latest versions of the chunks. This ensures that only the most current version of the manual will be used. The system can even be configured to notify all the potential users of a manual when a change to the manual is made.

Current manuals result in more accurate maintenance instructions, thus reducing part breakage and increasing the probability that a maintenance procedure will be correctly and successfully completed. This will result in an increased capacity to perform maintenance of parts that need repair or replacement, thus increasing Readiness and Safety. Also, there would be less of a need for new parts, which would reduce the overall cost of parts.

  2. Reduce cost and time needed to produce new manuals - since a majority of new manuals contain a very significant percentage of verbiage from existing manuals, the time and cost of producing a new manual is greatly reduced. Prior to authoring a new manual, the author would view all the applicable chunks of information contained in the CMS and select the chunks applicable for the new manual. Only new information not contained within the CMS will have to be authored.
  3. Increased document security - you can control user access to documents at the chunk level. That way, only select personnel will be given access to view certain chunks, while others will not be shown those chunks even though they are viewing the same document. This would greatly reduce the need for "document scrubbing" and greatly increase document security.
  4. Multiple views of the same document - you can compose the document using any order of chunks. Therefore, you have the capability to produce different versions of the same document depending on what the user intends to do with it. For example, although a maintainer and a pilot would need to view the same manual, you would ideally want the manual to be structured differently and possibly to highlight different pieces of information to suit the differing end-user requirements.
  5. Bridging technical and training manuals - if the training manuals are also converted to SGML and XML and housed in the same CMS, then the chunks from training manuals can be used to enhance existing technical manuals and create new technical manuals. Also, chunks from the technical manuals can be used to create and enhance existing training manuals. This would improve the transition of maintainers from the classroom to the field, since they would be viewing familiar data. This would increase the effectiveness of new maintainers by reducing their learning curve.
  6. Regaining control of the manuals - although not related to a CMS, a prevalent issue facing the DOD is how to regain control of documentation authored and controlled by vendors. Many times the vendor is not meeting the performance criteria and the authoring of the documentation has to be brought in-house or transferred to another vendor. Also, the Weapon System may be old and some of the suppliers of the parts may be out of business. Documents that are authored by an SGML or XML based publishing system, or have been converted to SGML or XML, should be readily transferable to another location provided that they used a Military Standard DTD.
  7. Advanced search capability - although not related to a CMS, you can perform very sophisticated searches of SGML or XML documents. Continuing our example, the maintainer has the ability to search for all the instances of the phrase "torque level" contained in a Warning. This ensures that even within the manual being used by the maintainer, all the significant instances depicting torque levels are fixed.
  8. Separation of content & styling - XML disassociates content from styling. This enables personnel working with document content to concentrate solely on content while disregarding document styling.
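
The structure-aware search described in the list above (finding "torque level" only where it appears inside a Warning) can be sketched as follows. The manual fragment, its element names and the helper function are hypothetical; the point is only that the tags let a search be scoped to one kind of content.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual fragment; element names and wording are illustrative.
MANUAL = """
<manual>
  <step>Check the torque level of each engine bolt.</step>
  <warning>Do not exceed the stated torque level for engine bolts.</warning>
  <warning>Disconnect power before servicing.</warning>
  <caution>Record the torque level after tightening.</caution>
</manual>
"""


def find_in_element(xml_text, tag, phrase):
    """Return the text of every <tag> element that contains the phrase."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for elem in root.iter(tag):
        text = "".join(elem.itertext())
        if phrase.lower() in text.lower():
            hits.append(text.strip())
    return hits


warnings = find_in_element(MANUAL, "warning", "torque level")
print(warnings)
```

A plain text search of this fragment would match the phrase three times; the structured search returns only the one occurrence that sits inside a Warning.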

>>> Weapon System Maintenance

  1. Media independent - a critical factor for consideration is the ease of accessing the maintenance information via as many different media as possible. This is necessary because of the diverse environments in which maintainers have to operate. The maintainer may be in a maintenance hangar on a base, on the flight deck of an aircraft carrier in the middle of an ocean, in the middle of a desert, or on a jeep in the middle of a war zone. The common thread is that the maintenance instructions must be accessible in a medium optimized for the maintainer's specific environment. Therefore, it is absolutely necessary for the information to be available on paper, on computers via an intranet or extranet, on CD-ROMs, and on portable devices. SGML and XML are platform independent; thus, with the addition of publishing or "rendering" software tools, the data could be made available on any medium and could be given a custom appearance for each, ensuring maximum readability on all media.

  2. Interactive Electronic Technical Manuals (IETMs) - for a maintenance procedure to be successful, the problem has to be correctly identified, the correct part has to be ordered, and the instructions on how to fix the problem have to be easy to follow. This is the idea behind an IETM. Depending on the level of functionality, the IETM can diagnose the problem, order the part, and give "step by step" instructions on how to fix the part. IETMs are most effective when the source data is XML or SGML. The DOD can realize the following benefits when employing IETMs:

    1. The automated diagnostic capability would drastically reduce the time needed to diagnose a problem and drastically reduce the likelihood of the wrong part being replaced. This would increase the readiness and safety of weapons systems and decrease the cost incurred by unnecessarily replacing parts.
    2. The automated ordering of parts would dramatically reduce the errors resulting from manually entered part numbers. This would decrease the instances where the wrong part is delivered and thus increase readiness.
    3. The automated maintenance instructions would decrease the time spent repairing or replacing the part and increase the likelihood that the maintenance procedure is correctly performed, thus further increasing safety and readiness.
    4. The IETM can be enhanced to interface with back-end systems that track parts failure history, weapon system down time, parts approval time, parts delivery time, and similar data. This would make it possible to stock the appropriate number of parts in specific locations, thus decreasing inventory costs and increasing readiness and safety.

  3. Enforcement of standards - as discussed earlier, one of the aspects of the DTD is to define the order of when and where the different elements in a document can appear. Therefore, even if a maintainer is viewing a manual for the first time, it will always be known where to expect the Warnings, Cautions and the other elements of the manual. This would greatly reduce the amount of time needed to comprehend the manual and reduce the possibility that a critical phase of the maintenance was omitted or incorrectly followed.
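
The media-independence point above can be illustrated with a short Python sketch in which one tagged procedure is rendered two ways, once as plain text suitable for printing and once as HTML for a browser. The procedure, its element names and the two rendering functions are hypothetical; real publishing tools are far more capable, but the principle of one source and several outputs is the same.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical maintenance procedure; element names and text are illustrative.
PROCEDURE = """
<procedure>
  <title>Replace filter</title>
  <warning>Disconnect power before servicing.</warning>
  <step>Remove the access panel.</step>
  <step>Swap the filter.</step>
</procedure>
"""


def render_text(root):
    """Plain-text rendering, e.g. for a printed page."""
    lines = [root.findtext("title").upper()]
    lines += ["WARNING: " + w.text for w in root.iter("warning")]
    lines += [f"{n}. {s.text}" for n, s in enumerate(root.iter("step"), start=1)]
    return "\n".join(lines)


def render_html(root):
    """HTML rendering of the same source, e.g. for an intranet browser."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f'<p class="warning">{escape(w.text)}</p>' for w in root.iter("warning")]
    steps = "".join(f"<li>{escape(s.text)}</li>" for s in root.iter("step"))
    parts.append(f"<ol>{steps}</ol>")
    return "\n".join(parts)


root = ET.fromstring(PROCEDURE.strip())
text_version = render_text(root)
html_version = render_html(root)

print(text_version)
print(html_version)
```

Neither rendering function changes the source; each simply decides how warnings, steps and titles should look in its own medium. Because every Warning is identified by its tag, each output can also place and style warnings consistently.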

SGML vs XML
Since both SGML and XML can assist in realizing the DOD mission of Readiness and Safety, should the DOD convert the manuals to SGML or XML?

The favored approach seems to be to do both, because each format has advantages. It is easier to produce paper manuals from SGML than from XML, but commercial Web browsers support XML and not SGML. SGML is also the more feature-rich of the two, since XML is a simplified subset of it.

Ideally, the document repository should be in SGML format and one of the derivative formats should be XML. The beauty of SGML is that producing XML from SGML can be an automated process; so with the "touch of a button", the XML will be produced. This is similar to building a PDF rendering engine, which can automatically produce PDF from SGML. 

Therefore, the ultimate Publishing Environment will have its data produced and archived in SGML, with automatic outputs to XML, PDF, IETM and paper.

Summary
Although this sounds very complex, the actual conversion of the DOD's 50 million pages of technical data could take as little as three years. During this time the DTDs could be tweaked and the publishing environment built. Although SGML and XML data is not the complete solution, it is the foundation on which you can deploy tools such as a CMS and IETMs and build an industrial-strength publishing system. Once a complete SGML and XML based architecture is implemented, the DOD would have numerous capabilities that contribute to overall Safety and Readiness, while reducing overall Technical Manual production and maintenance costs.

5/9/2002
David Skurnik
E-mail: dskurnik@dclab.com


Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.