WHITE PAPER
"Department of Defense and the Power of XML"
Fifty million pages of U.S. military technical data are a powerful
resource to help with the defense of this nation. They would be even
more powerful if converted into a single-source data format like XML
or SGML, argues David Skurnik, VP of sales at Data Conversion Laboratory.
CURRENTLY, US military technical data (Technical Manuals and Technical
Orders) is available in a multitude of electronic formats or, in many
instances, only on paper. Integrating these formats into a single data
format would involve converting approximately 50 million pages. However,
a single data format would drastically increase efficiency and improve
inter-service communication.
This paper discusses the reasons why the US Department of Defense (DOD)
should pursue the conversion of its Technical Data into SGML and XML
formats. Specifically, it explains how having Technical Data in these
formats would directly contribute to the DOD's mission of Readiness and
Safety.
Definition and History of SGML, HTML and XML
SGML (Standard Generalized Markup Language) was developed in the 1980s
as a non-proprietary, platform-independent method of describing the
structure of a document rather than its appearance. An early adopter of
SGML was the Military's Computer Aided Logistics Support (CALS) Program.
What does it mean to tag a document based on structure?
In a resume, for example, you have the applicant name, address, telephone
number, job history, etc. A person looking at the resume can distinguish
between the different types of information from the appearance of the
data (the applicant's name may be bolded, centered and set in a larger
typeface than the rest of the resume). But a resume containing SGML
mark-up will have an "applicant name" tag surrounding the applicant
name. This could look like this: <applicant-name>John Smith</applicant-name>
(tag names cannot contain spaces, so the words are joined with a hyphen).
Thus, a marked-up SGML document contains the contents of the document
together with the tags identifying the different structures within it.
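To make the difference between structure and appearance concrete, here is a minimal sketch in Python, using XML syntax and the standard library's ElementTree parser. The element names and the resume contents are hypothetical and chosen only for illustration; they do not come from any Military Standard DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical resume markup; element names are illustrative only.
RESUME = """
<resume>
  <applicant-name>John Smith</applicant-name>
  <address>12 Main Street, Springfield</address>
  <telephone>555-0100</telephone>
  <job-history>
    <job>Technical writer</job>
    <job>Documentation manager</job>
  </job-history>
</resume>
"""

root = ET.fromstring(RESUME)

# The tag, not the typeface, tells the program which text is the name.
applicant_name = root.findtext("applicant-name").strip()

# Every <job> element is found regardless of how it would be displayed.
jobs = [job.text.strip() for job in root.iter("job")]

print(applicant_name)
print(jobs)
```

A program reading this resume never has to guess which text is the name from its position or font; it simply asks for the element by its tag.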
Prior
to applying tags to a document, you have to define some basic rules
determining:
- What
structures within the document are to be tagged? In our resume
example, you would need to determine whether you will be separately
tagging the first name, last name and title of the applicant or whether
you will only use one tag for the entire applicant name. The decision
will be based on what type of intelligence you will want to extract
from the data.
- What the tags will be called. Using our previous example, you may
wish to use "applicant-name" as the tag to describe the applicant's
name, or a shorter form such as "appnam".
- The order in which these structures appear and where they can be found
within the document. In our resume example, you would want to ensure
that the applicant name always precedes the applicant address. You may
decide to place the education section before the job experience section,
or decide it should follow job experience.
These
rules comprise a document called a Document Type Definition (DTD). Before
any conversion, the DTD has to be developed to give guidance on the
basic rules of the conversion.
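The three kinds of rule can be sketched for the resume example as follows. The DTD fragment is hypothetical and is included only for reference. Python's standard library does not validate documents against a DTD, so the check below is a hand-written stand-in that tests only the order of the top-level elements; the function name and element names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD fragment expressing the rules: which structures are
# tagged, what the tags are called, and the order in which they appear.
# It is shown for reference only; the code below does not parse it.
RESUME_DTD = """
<!ELEMENT resume (applicant-name, address, education, job-experience)>
<!ELEMENT applicant-name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT education (#PCDATA)>
<!ELEMENT job-experience (#PCDATA)>
"""

# The same ordering rule for <resume>, written as a Python list.
REQUIRED_ORDER = ["applicant-name", "address", "education", "job-experience"]


def follows_content_model(xml_text: str) -> bool:
    """Return True if the root's children match REQUIRED_ORDER exactly."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == REQUIRED_ORDER


good = (
    "<resume>"
    "<applicant-name>John Smith</applicant-name>"
    "<address>12 Main Street</address>"
    "<education>BS</education>"
    "<job-experience>Technical writer</job-experience>"
    "</resume>"
)

# Same elements, but the address comes before the applicant name.
bad = (
    "<resume>"
    "<address>12 Main Street</address>"
    "<applicant-name>John Smith</applicant-name>"
    "<education>BS</education>"
    "<job-experience>Technical writer</job-experience>"
    "</resume>"
)

print(follows_content_model(good))
print(follows_content_model(bad))
```

A real conversion project would rely on a validating SGML or XML parser rather than a hand-written check like this one.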
In the early days, the main objections to implementing an SGML solution
were that it was complex and that there were not many tools on the market
to support it.
In
the infancy of the Internet, a universal DTD for tagging documents designed
to be viewed on the Internet was developed. This DTD came to be known
as Hypertext Markup Language (HTML). Since HTML was focused on presentation
and not on structure, the HTML tag set was very limited, and was therefore
much easier to implement.
But HTML's simplicity was also its biggest drawback, since its ability
to support complex searching, linking and document maintenance was very
limited.
The challenge was to find a way of marking up documents that was not as
complex as SGML but was more powerful than HTML. The solution was XML.
XML (eXtensible Markup Language) is a data format derived from SGML.
Since its introduction, many corporations and organizations like IBM,
Microsoft and General Electric have been converting their documentation
to XML, and XML has become the de facto standard for data transfer.
How Can SGML and XML Assist in Improving Readiness and Safety at the DOD?
At this juncture, the DOD is at a crossroads. Many SGML DTDs have been
developed within the DOD, and a small percentage of the Technical Manuals
have been converted to SGML. Although it was an uphill battle, there has
been a growing acceptance that SGML is essential to the DOD.
A
new issue that is surfacing within the DOD is this: should they convert
to SGML or XML or a mixture of both? The answer may not be obvious.
But before we try to deal with this question, let's understand the benefits
of SGML and XML.
It is important to understand that the benefits discussed in this
section will not immediately be realized after the data has been converted
to SGML/XML. Additional tools have to be put in place to take advantage
of the intelligence contained in the data. SGML and XML are the foundation
that will enable you to ultimately gain the functionality described
in this section. Therefore, this paper will also discuss some of the
tools necessary for exploiting the power of SGML and XML data that will
ultimately result in the benefits contributing to overall Readiness
and Safety.
The
two general areas where SGML and XML based data can benefit the DOD
mission of Readiness and Safety are Document Creation and Maintenance,
and Weapon System Maintenance.
- Document Creation and Maintenance: This refers to the task of creating,
maintaining and modifying documents.
- Weapon System Maintenance: This refers to the information needed
to maintain a weapon system.
Document Creation and Maintenance
The main tool necessary for achieving the functionality discussed
in this section is an XML based Content Management System (CMS). As
its name suggests, the CMS is designed to manage the content contained
within it. The basic features of an SGML and XML based CMS are:
- It identifies the original author of the document and grants permission
to selected individuals who may be required to edit the document.
- It tracks all the changes ever made to the document, identifying
who made the changes and when the changes were made.
- The CMS doesn't store whole documents; it stores pieces or "chunks"
of content. These chunks are then assembled by the CMS into a single
document when the entire document is required. The level of granularity
of the chunks is determined by the level of tagging that was done
to the data. In our resume example, if the applicant was tagged only
as <applicant-name>, the first name, last name and title will
be represented and stored in the CMS as one chunk. If the applicant
name is tagged as <title>, <first-name> and <last-name>,
then all three pieces of information will be stored as separate chunks.
- All similar chunks of data are represented only once even though
they may appear in multiple documents.
- It stores all the information regarding who requested what data
chunks and when they were sent.
- It allows new documents to be created from existing chunks stored
within the CMS.
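Several of the features listed above (chunks stored once, change tracking, and documents assembled on demand) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any particular CMS product: the class names, method names, chunk IDs and manual titles are all hypothetical, and permissions and request logging are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    """One stored unit of content plus its revision history."""
    text: str
    # Each entry is (editor, time of change, text before the change).
    history: list = field(default_factory=list)


class ChunkStore:
    """Toy store: chunks are kept once; documents are lists of chunk IDs."""

    def __init__(self):
        self.chunks = {}     # chunk ID -> Chunk
        self.documents = {}  # document title -> ordered list of chunk IDs

    def add_chunk(self, chunk_id, text):
        self.chunks[chunk_id] = Chunk(text)

    def define_document(self, title, chunk_ids):
        self.documents[title] = list(chunk_ids)

    def update_chunk(self, chunk_id, new_text, editor):
        """Change a chunk in place and record who changed it and when."""
        chunk = self.chunks[chunk_id]
        chunk.history.append((editor, datetime.now(timezone.utc), chunk.text))
        chunk.text = new_text

    def assemble(self, title):
        """Rebuild a document from the latest version of each chunk."""
        return "\n".join(self.chunks[cid].text for cid in self.documents[title])

    def documents_using(self, chunk_id):
        """List every document that includes the given chunk."""
        return sorted(t for t, ids in self.documents.items() if chunk_id in ids)


store = ChunkStore()
store.add_chunk("power-warning", "Disconnect power before servicing.")
store.add_chunk("pump-steps", "Remove the pump cover.")
store.add_chunk("fan-steps", "Remove the fan guard.")
store.define_document("Pump Manual", ["power-warning", "pump-steps"])
store.define_document("Fan Manual", ["power-warning", "fan-steps"])

# One edit to the shared chunk; both manuals pick it up when reassembled.
store.update_chunk(
    "power-warning",
    "Disconnect and lock out power before servicing.",
    editor="jsmith",
)

print(store.documents_using("power-warning"))
print(store.assemble("Pump Manual"))
print(store.assemble("Fan Manual"))
```

Because each document holds only references to chunks, a single update reaches every document that uses the chunk, and the history list keeps the earlier wording together with the editor and time of the change.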
If
we apply these capabilities to the Technical Manual Maintenance Environment,
the following advantages come into play:
- Manuals are always current - in Military Maintenance Documentation, it
is very common for similar pieces of information to be replicated
across many manuals. For example, many manuals contain the torque level
guidelines for tightening an aircraft engine bolt. If a maintainer
notices cracks in the underbelly of an aging engine, he might determine
that they were caused by a bolt torque level that is too great for the
aging aircraft. He would then have to find every manual that contains
torque level guidelines for engine bolts and modify the guidelines in
each one. Because of time constraints and the lack of any way to track
which manuals contain that torque level, the change is usually made only
to the manual that the maintainer was using when the problem was
discovered. Maintainers working from the other manuals keep applying
the old torque level, which results in additional cracks forming and
possibly in breakage.
With
an SGML or XML based CMS, the engine bolt torque level will be stored
as a chunk of information only once within the CMS. Therefore changing
the torque level in the CMS will result in an automatic modification
to all manuals containing that similar chunk - making them more
current.
Here
are some other reasons why an SGML or XML based CMS makes manuals
more current:
A) The volume of changes is drastically reduced, allowing for quicker
completion of the required changes. Also, depending on the volume
of changes, the number of personnel required to implement the
changes can be reduced.
B) Whenever a manual is viewed using the CMS, the manual is rebuilt
from the latest versions of the chunks. This ensures that only
the most current version of the manual will be used. The system
can even be configured to notify all the potential users of a
manual when a change to the manual is made.
Current
manuals result in more accurate maintenance instructions, thus reducing
part breakage and increasing the probability that a maintenance procedure
will be correctly and successfully completed. This will result in
an increased capacity to perform maintenance of parts that need repair
or replacement, thus increasing Readiness and Safety. Also, there
would be less of a need for new parts, which would reduce the overall
cost of parts.
- Reduce cost and time needed to produce new manuals - since
a majority of new manuals contain a very significant percentage of
verbiage from existing manuals, the time and cost of producing a new
manual is greatly reduced. Prior to authoring a new manual, the author
would view all the applicable chunks of information contained in the
CMS and select the chunks applicable for the new manual. Only new
information not contained within the CMS will have to be authored.
- Increased document security - you can control user access
to documents at the chunk level. That way, only select personnel will
be given access to view certain chunks, while others will not be shown
those chunks even though they are viewing the same document. This
would greatly reduce the need for "document scrubbing" and
greatly increase document security.
- Multiple views of the same document - you can compose the
document using any order of chunks. Therefore, you have the capability
to produce different versions of the same document depending on the
user's intended use. For example, although a maintainer and a
pilot would need to view the same manual, you would ideally want the
manual to be structured differently and possibly to highlight different
pieces of information to suit the differing end-user requirements.
- Bridging technical and training manuals - if the training
manuals are also converted to SGML and XML and housed in the same
CMS, then the chunks from training manuals can be used to enhance
existing technical manuals and create new technical manuals. Also,
chunks from the technical manuals can be used to create and enhance
existing training manuals. This would improve the transition of maintainers
from the classroom to the field, since they would be viewing familiar
data. This would increase the effectiveness of new maintainers by
reducing their learning curve.
- Regaining control of the manuals - although not related
to a CMS, a prevalent issue facing the DOD is how to regain control
of documentation authored and controlled by vendors. Often the
vendor does not meet the performance criteria, and the authoring of
the documentation has to be brought in-house or transferred to another
vendor. Also, the Weapon System may be old and some of the suppliers
of the parts may be out of business. Documents that were authored in
an SGML or XML based publishing system, or have been converted to
SGML or XML, should be readily transferable to another location provided
that they used a Military Standard DTD.
- Advanced search capability - although not related to a
CMS, you can perform very sophisticated searches of SGML or XML documents.
Continuing our example, the maintainer has the ability to search for
all the instances of the phrase "torque level" contained
in a Warning. This ensures that even within the manual being used
by the maintainer, all the significant instances depicting torque
levels are fixed.
- Separation of content & styling - XML disassociates content from styling. This enables personnel working with document content to solely concentrate on content while disregarding document styling.
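The structure-aware search described in the list above (finding "torque level" only where it appears inside a Warning) can be sketched with Python's standard ElementTree parser. The manual fragment, its element names and the helper function are hypothetical and serve only to illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual fragment; element names are illustrative only.
MANUAL = """
<manual>
  <procedure>
    <warning>Exceeding the specified torque level can crack the housing.</warning>
    <step>Set the wrench to the specified torque level.</step>
    <caution>Clean the threads before installing the bolt.</caution>
  </procedure>
  <procedure>
    <step>Inspect the bolt for wear.</step>
    <warning>Wear eye protection during inspection.</warning>
  </procedure>
</manual>
"""


def find_in_element(xml_text, tag, phrase):
    """Return the text of every <tag> element that contains the phrase."""
    root = ET.fromstring(xml_text)
    wanted = phrase.lower()
    matches = []
    for element in root.iter(tag):
        text = "".join(element.itertext()).strip()
        if wanted in text.lower():
            matches.append(text)
    return matches


hits = find_in_element(MANUAL, "warning", "torque level")
print(hits)
```

A plain text search for the same phrase would also return the step that mentions the torque level; restricting the search to one element type is what the tagging makes possible.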
Weapon System Maintenance
- Media independent - a critical factor for consideration is the ease
of accessing the maintenance information through as many different media
as possible. This is necessary because of the diverse environments
in which maintainers have to operate. The maintainer may be in a
maintenance hangar on a base, on the flight line of an aircraft carrier
in the middle of an ocean, in the middle of a desert, or in a jeep
in the middle of a war zone. The common thread is that the maintenance
instructions must be accessible in a medium optimized for the maintainer's
specific environment. Therefore, it is absolutely necessary for the
information to be available on paper, on computers over an intranet
or extranet, on CD-ROMs, and on portable devices. SGML and XML
are platform independent; thus, with the addition of publishing or
"rendering" software tools, the data could be made available
on any medium and could provide custom appearances for each, ensuring
maximum readability on all media.
- Interactive Electronic Technical Manuals (IETMs) - for
a maintenance procedure to be successful, the problem has to be correctly
identified, the correct part has to be ordered, and the instructions
on how to fix the problem have to be easy to follow. This is the idea
behind an IETM. Depending on the level of functionality, the IETM
can diagnose the problem, order the part, and give "step by step"
instructions on how to fix the part. IETMs are most effective when
the source data is XML or SGML. The DOD can realize the following
benefits when employing IETMs:
- The automated diagnostic capability would drastically reduce
the time needed to diagnose a problem and drastically reduce the
likelihood of the wrong part being replaced. This would increase
the readiness and safety of weapons systems and decrease the cost
incurred by unnecessarily replacing parts.
- The automated ordering of parts would dramatically reduce the
errors resulting from manually entered part numbers. This would
decrease the instances where the wrong part is delivered and thus
increase readiness.
- The automated maintenance instructions would decrease the time
spent repairing or replacing the part and increase the likelihood
that the maintenance procedure is correctly performed, thus further
increasing safety and readiness.
- The IETM can be enhanced to interface with back-end systems that
can track parts failure history, weapon system down time, parts
approval time, and parts delivery time, etc. This would result in
the ability to stock the appropriate number of parts in specific
locations, thus decreasing inventory costs and increasing readiness
and safety.
- Enforcement of standards - as discussed earlier, one of the
functions of the DTD is to define where the different elements in a
document can appear and in what order. Therefore, even a maintainer
viewing a manual for the first time will know where to expect the
Warnings, Cautions and the other elements of the manual. This would
greatly reduce the amount of time needed to comprehend the manual and
reduce the possibility that a critical phase of the maintenance is
omitted or incorrectly followed.
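The media independence described in the list above can be illustrated with a short Python sketch in which one XML source is rendered two ways: as plain text, such as might be sent to a printer, and as HTML for a browser. The procedure content, the element names and both rendering functions are hypothetical; a production system would use dedicated publishing software rather than hand-written functions like these.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single-source procedure; element names are illustrative only.
PROCEDURE = """
<procedure>
  <title>Replace filter</title>
  <warning>Depressurize the system first.</warning>
  <step>Remove the cover.</step>
  <step>Swap the filter.</step>
</procedure>
"""


def render_text(root):
    """Plain-text rendering: title, then warnings, then numbered steps."""
    lines = [root.findtext("title").upper()]
    for warning in root.iter("warning"):
        lines.append("WARNING: " + warning.text)
    for number, step in enumerate(root.iter("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)


def render_html(root):
    """HTML rendering of the same content with different styling hooks."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for warning in root.iter("warning"):
        parts.append(f'<p class="warning">{escape(warning.text)}</p>')
    items = "".join(f"<li>{escape(step.text)}</li>" for step in root.iter("step"))
    parts.append(f"<ol>{items}</ol>")
    return "\n".join(parts)


root = ET.fromstring(PROCEDURE)
print(render_text(root))
print(render_html(root))
```

Both functions place the warnings ahead of the steps, so the reader finds them in the same place in every output. The source data itself carries no styling at all.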
SGML vs XML
Since both SGML and XML can assist in realizing the DOD mission
of Readiness and Safety, should the DOD convert the manuals to SGML
or XML?
The favored approach seems to be to do both, because each format has
advantages. It is easier to produce paper manuals from SGML than from
XML, and SGML is more robust than XML, which is a simplified subset of
it. On the other hand, commercial browsers for viewing on the Web support
XML and not SGML.
Ideally,
the document repository should be in SGML format and one of the derivative
formats should be XML. The beauty of SGML is that producing XML from
SGML can be an automated process; so with the "touch of a button",
the XML will be produced. This is similar to building a PDF rendering
engine, which can automatically produce PDF from SGML.
Therefore,
the ultimate Publishing Environment will have its data produced and
archived in SGML, with automatic outputs to XML, PDF, IETM and paper.
Summary
Although
this sounds very complex, the actual conversion of the DOD's 50-million
pages of technical data could take as little as three years.
During this time the DTDs could be tweaked and the publishing environment
built. Although SGML and XML data is not the complete solution, it is the
foundation on which you can utilize tools such as a CMS, IETMs, and
create an industrial strength publishing system. Once a complete SGML
and XML based architecture is implemented, the DOD would have numerous
capabilities that contribute to overall Safety and Readiness, while
reducing overall Technical Manual production and maintenance costs.
5/9/2002
David Skurnik
E-mail: dskurnik@dclab.com