By Naveh Greenberg, Director, US Defense Development, DCL
Legacy content throughout the US Department of Defense (DoD) exists in a variety of formats and structures. And the older the content, the less likely it is to conform to required standards such as S1000D or the Army's MIL-STD-40051. Quality Control (QC) becomes critical, but under the pressure of high volumes and rigorous standards, DoD agencies and offices can find themselves overwhelmed by the prospect of manual review of thousands of pages of content multiple times during a complex conversion process. Data Conversion Laboratory (DCL) recommends working with a vendor that not only understands the intricacies of DoD standards, but also has significant expertise in analyzing data and developing automated routines that reduce the need for manual quality control.
In fact, many government agencies implement data conversion initiatives where compliance with data standards is a critical requirement. But DoD projects tend to be more complex, because the DoD applies standards in a more content-centric way than other government agencies. To address these structural requirements, DCL works closely with the project team to perform a very detailed analysis of the data to be converted. Over decades of work across the defense sector, we have developed several tools that automate the processes to screen for duplicate content, map data to target schemas, extract data, apply business rules, and perform quality control at each stage of the conversion process.
DCL Harmonizer™ analyzes content to streamline conversion volumes
The intent of standards such as S1000D is to maximize content reuse throughout the organization. And maximizing reuse potential early in the project can significantly reduce the overall volume to be converted. Harmonizer™ looks for and flags content that appears in multiple places, and also checks for variations, or “near duplicates” that could mean typos, or even a similar version of content that applies to a different part number where applicability can be used. Harmonizer™ provides a comprehensive view of the entire document set and supports the team’s efforts to minimize the volume of data to be converted and/or translated, while estimating the potential for reuse.
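To make the idea of exact and "near duplicate" detection concrete, here is a minimal Python sketch. It is not how Harmonizer™ works internally; it simply assumes that paragraphs are compared pairwise with the standard library's `difflib.SequenceMatcher`, and the `normalize` and `find_duplicates` helpers, the 0.9 threshold, and the sample sentences are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def find_duplicates(paragraphs, threshold=0.9):
    """Return (i, j, kind, ratio) for every pair at or above the threshold.

    The threshold is an assumed value; a real project would tune it.
    """
    cleaned = [normalize(p) for p in paragraphs]
    matches = []
    for i, j in combinations(range(len(cleaned)), 2):
        ratio = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
        if ratio >= threshold:
            kind = "exact" if cleaned[i] == cleaned[j] else "near"
            matches.append((i, j, kind, round(ratio, 3)))
    return matches


# Illustrative sample: a spacing variant, a typo variant, and unrelated text.
sample = [
    "Remove the four bolts securing the access panel.",
    "Remove the  four bolts securing the access panel.",
    "Remove the four bolts securing the acess panel.",
    "Drain the hydraulic reservoir before removing the pump.",
]

results = find_duplicates(sample)
for i, j, kind, ratio in results:
    print(f"paragraph {i} ~ paragraph {j}: {kind} ({ratio})")
```

Pairwise comparison grows quadratically with the number of paragraphs, so a sketch like this suits only small samples; large document sets would need hashing or indexing to narrow the candidate pairs first.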
Content-driven extraction scripts build content inventories into spreadsheets for analysis
Source data exists in a variety of formats, from PDF, to MS Word, to various SGML or XML formats. Based on the target formats and schema, and the results of the Harmonizer™ analysis, DCL prepares a conversion specification that governs the tagging rules of the conversion process. Some of the items covered in the conversion specification are:
- The elements that were found in the document set that were used for analysis.
- Examples of where the elements can be found within the document.
- Concrete rules as to how these elements are identified.
- A sample of the tagging that will be used for that element.
- Any open items that are connected with this element.
A script is developed to extract all possible titles to a spreadsheet. The script can identify the data module or work package types (for example, maintenance/procedural or troubleshooting types in S1000D or the Army’s MIL-STD-40051 standard). These scripts are highly customized to the particular needs of the project and are designed to extract the content to spreadsheets to make analysis much easier for the Subject Matter Experts (SMEs). With a reduced set of content to review and a detailed map of the extracted content, the SMEs need to review only the items flagged as suspicious, instead of every line of content going through conversion. The result is a detailed mapping of the content before any content has been converted.
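As a rough illustration of such an extraction script, the Python sketch below pulls every title out of an XML source, guesses a content type from keywords, and writes the inventory as CSV for review in a spreadsheet. The `<title>` element name, the keyword lists, the column names, and the sample document are all hypothetical; a real project would take its rules from the conversion specification.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical keyword rules, standing in for project-specific ones.
TYPE_KEYWORDS = {
    "troubleshooting": ("troubleshoot", "fault", "malfunction"),
    "maintenance/procedural": ("remov", "install", "replace", "inspect", "service"),
}
FIELDS = ["position", "title", "type", "flag"]


def classify(title: str) -> str:
    """Guess a content type from keywords in the title."""
    lowered = title.lower()
    for content_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return content_type
    return "unclassified"


def build_inventory(xml_text: str):
    """Collect every <title> in document order and flag unclassified ones."""
    root = ET.fromstring(xml_text)
    rows = []
    for position, title in enumerate(root.iter("title"), start=1):
        text = " ".join("".join(title.itertext()).split())
        content_type = classify(text)
        flag = "review" if content_type == "unclassified" else ""
        rows.append({"position": position, "title": text,
                     "type": content_type, "flag": flag})
    return rows


def to_csv(rows) -> str:
    """Serialize the inventory as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE = """<manual>
  <section><title>Fuel Pump Removal</title></section>
  <section><title>Troubleshooting the Hydraulic System</title></section>
  <section><title>General Information</title></section>
</manual>"""

inventory = build_inventory(SAMPLE)
print(to_csv(inventory))
```

Only the rows marked `review` would need an SME's attention, which is the point of building the inventory before conversion starts.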
Automated QA software uses customized business rules to create data module mapping
Not so long ago, an army of proofreaders would be needed to check content to the detailed levels required to ensure accuracy of the content. Now, working closely with agencies’ SMEs and others on the content conversion project team, DCL develops highly customized business rules and integrates them into our automated QA software. Remember, any content conversion effort will have business rules that manage and control how data is mapped, extracted, converted, and quality-checked. These rules constitute the heart of the conversion specification. For example, for content that must comply with S1000D, DCL develops and implements a BREX (Business Rules Exchange) checker to validate that all rules applying to that project have been implemented.
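The sketch below shows, in simplified form, how business rules can be expressed as data and checked automatically. It is not DCL's QA software and does not read a real S1000D BREX data module; the `Rule` class, the two example rules, and the element names are assumptions made for illustration, using only the path syntax supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Rule:
    path: str      # ElementTree path expression selecting the objects to test
    allowed: bool  # True: at least one match must exist; False: none may exist
    message: str   # text reported when the rule is violated


# Hypothetical project rules, not taken from any published standard.
RULES = [
    Rule(".//title", True, "Every module needs a title."),
    Rule(".//emphasis[@type='underline']", False,
         "Underlined emphasis is not permitted."),
]


def check(xml_text: str, rules):
    """Return a list of violation messages for one document."""
    root = ET.fromstring(xml_text)
    violations = []
    for rule in rules:
        found = root.findall(rule.path)
        if rule.allowed and not found:
            violations.append(rule.message)
        elif not rule.allowed and found:
            violations.append(f"{rule.message} ({len(found)} occurrence(s))")
    return violations


SAMPLE = (
    "<module><title>Pump Removal</title>"
    "<para>Do <emphasis type='underline'>not</emphasis> reuse seals.</para>"
    "</module>"
)

violations = check(SAMPLE, RULES)
for violation in violations:
    print(violation)
```

Keeping the rules in a list rather than in code means SMEs can review and extend them without touching the checker itself.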
The QA software also automatically checks tables for conformance to rules on structures, cell validity, and column declarations. It can look for typos throughout the data using a customized dictionary database. Other checks that are especially important to DoD standards include checking for noun/verb agreement inside steps, and cross-reference checks for correct key words, spelling and links to the correct location.
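Two of these checks, column declarations and cross-reference targets, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual QA software: the `tgroup`/`row`/`entry` names follow the common CALS table convention, while the `xref` element, its `target` attribute, and the sample document are hypothetical. Spanned cells are ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def check_tables(root):
    """Return (table, row, actual, declared) for rows whose cell count
    differs from the declared column count. Spanned cells are not handled."""
    problems = []
    for t_index, tgroup in enumerate(root.iter("tgroup"), start=1):
        declared = int(tgroup.get("cols", "0"))
        for r_index, row in enumerate(tgroup.iter("row"), start=1):
            actual = len(row.findall("entry"))
            if actual != declared:
                problems.append((t_index, r_index, actual, declared))
    return problems


def check_xrefs(root):
    """Return the targets of cross-references that point at no existing id."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [ref.get("target") for ref in root.iter("xref")
            if ref.get("target") not in ids]


# Illustrative document: one short table row and one dangling reference.
SAMPLE = """<doc>
  <step id="s1">Remove the panel. See <xref target="s2"/>.</step>
  <step id="s2">Inspect the seal. See <xref target="s9"/>.</step>
  <table>
    <tgroup cols="2">
      <row><entry>Part</entry><entry>Quantity</entry></row>
      <row><entry>Bolt</entry></row>
    </tgroup>
  </table>
</doc>"""

root = ET.fromstring(SAMPLE)
table_problems = check_tables(root)
dangling = check_xrefs(root)

for table, row, actual, declared in table_problems:
    print(f"table {table}, row {row}: {actual} cell(s), {declared} declared")
for target in dangling:
    print(f"cross-reference to missing id: {target}")
```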
Upfront analysis and automation save time and improve accuracy
DCL works with civilian and DoD clients alike to address all of the unique needs that arise in a content digitization effort. We accomplish this by performing the detailed upfront analysis necessary to identify potential problems, maximize reuse potential, and automate as much of the project as is feasible. While the upfront analysis and implementation planning typically require additional budget over what other conversion vendors quote, this analysis, together with a commitment to quality and automation throughout the effort, saves time and money over the life of the project and going forward.
Contact the DoD experts at DCL to learn how to plan for high-quality, extremely accurate digitization of your legacy content.
Naveh Greenberg is a Project Manager and is the Director for US Defense Development at Data Conversion Laboratory.