Source & Target Formats

Source Formats

Years ago, the distinctions between word processors, desktop publishing systems, and composition applications were more clear-cut than they are today. As these applications have expanded their functionality in response to users' broadening needs, the previously discrete categories have come to overlap considerably. Although the categories keep shifting, we have grouped our source formats as follows:

Word Processors

Word processors are tools for composing, editing, and formatting documents. Popular word processors include proprietary applications like MS Word and WordPerfect as well as open-source applications.

Publishing Systems

Slightly broader in function than word processors, publishing systems allow for the management of the entire publishing process. They are useful for large documents and frequently include styling and tagging information. Common publishing systems include FrameMaker, QuarkXPress, InDesign, and PageMaker.

PDF

PDF files come in two main varieties: Image-only PDF and PDF Normal. Image-only PDF is little more than a scanned image of a page; in order for computer-readable text to be extracted, the files must first undergo an optical character recognition (OCR) process. On the other hand, PDF Normal files (such as those produced by word processors and publishing systems) already contain computer-readable text, and they frequently contain styling and tagging information as well.

Other Formats

There are hundreds of other possible source or target formats (including paper) in addition to those listed on our website.

Target Formats

DCL can convert data from any format to any format, but most clients choose to transition their content into high-performance electronic formats like XML. XML (or SGML) content is governed by DTDs, standards, or schemas, all of which are sets of rules that determine the structure of the content.

There are a multitude of public domain DTDs and standards from which to choose. Since these public domain DTDs are free and widely used, there are many tools available to make using them easier. Their widespread use also makes it possible to transfer data between individual projects or among multiple organizations.

While public DTDs are most commonly used, there are hundreds of different XML (or SGML) DTDs and standards written for specific document types or industries—and DCL can convert your data to conform to any of them.

XML

Original Purpose — XML was created to bridge the gap between HTML and SGML in terms of data storage and interchange by being both human and machine readable, while being flexible enough to support platform- and architecture-independent data interchange.

Current Use — eXtensible Markup Language (XML) is a way to organize and display textual data.
Benefits — XML is sometimes referred to as a high-performance data format, since its use allows for enhanced functionality of content. For example, using XML, content can be grouped into content modules, tagged, and reused.
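
The reuse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `library` and `module` element names, the `id` and `topic` attributes, and the helper functions are hypothetical and do not come from any particular DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical content library: each <module> carries an id and a topic tag.
LIBRARY = """
<library>
  <module id="safety" topic="warning">Disconnect power before servicing.</module>
  <module id="install" topic="procedure">Mount the unit on a level surface.</module>
  <module id="contact" topic="reference">Call support for replacement parts.</module>
</library>
"""

def load_modules(xml_text):
    """Index every <module> element by its id attribute."""
    root = ET.fromstring(xml_text)
    return {m.get("id"): m for m in root.iter("module")}

def assemble(modules, ids):
    """Return the text of the requested modules, in the requested order."""
    return [modules[i].text for i in ids]

modules = load_modules(LIBRARY)
# The same "safety" module is reused in two different documents.
install_guide = assemble(modules, ["safety", "install"])
service_guide = assemble(modules, ["safety", "contact"])
```

Because both guides pull the "safety" module from one place, a correction to that module would reach every document that uses it.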

DITA

Original Purpose — DITA was originally developed in 2001 as a modular, reuse-friendly DTD for software documentation.

Current Use — DITA is now being used for all kinds of technical documentation, including help guides and manuals, and is not limited to just software documentation.
Benefits — DITA has been described as being "a step beyond" other DTDs due to its use of document maps. These maps allow document structure to be created simply by arranging content modules in the desired order. This makes DITA a good choice for documentation in which similar chunks of content appear multiple times in different locations, since content modules can easily be managed from a centralized database and reused wherever necessary.

Indeed, DITA is best suited to content that is modular and context-independent in nature. It is ideal for projects in which you want to reuse some of the same content within a document, between documents, or even among different projects. Documentation that must be translated, for example, can benefit greatly from DITA's reusable modules; with DITA, a given chunk of content needs to be translated only once, no matter how many times it appears throughout a set of documentation.
Reasons for Using an Alternative DTD — Since DITA works by sorting content into small chunks that can be easily reused, it is not well-equipped to handle full-text articles or other context-dependent content.
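
A rough sketch of how a document map arranges reusable topics is shown below. The `map`, `topicref`, `href`, `topic`, and `title` names follow DITA, but everything else is a simplifying assumption: the topics live in a hypothetical in-memory dictionary rather than in separate files, and `titles_for` is an illustrative helper, not part of any DITA toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic store; a real project would keep each topic in its own file.
TOPICS = {
    "intro.dita": "<topic id='intro'><title>Introduction</title></topic>",
    "safety.dita": "<topic id='safety'><title>Safety Notice</title></topic>",
    "setup.dita": "<topic id='setup'><title>Setup</title></topic>",
}

# Two maps that arrange the same topics in different ways.
USER_GUIDE_MAP = """
<map>
  <topicref href="intro.dita"/>
  <topicref href="safety.dita"/>
  <topicref href="setup.dita"/>
</map>
"""

QUICK_START_MAP = """
<map>
  <topicref href="safety.dita"/>
  <topicref href="setup.dita"/>
</map>
"""

def titles_for(map_xml, topics):
    """Resolve each topicref in map order and return the topic titles."""
    titles = []
    for ref in ET.fromstring(map_xml).iter("topicref"):
        topic = ET.fromstring(topics[ref.get("href")])
        titles.append(topic.findtext("title"))
    return titles

user_guide = titles_for(USER_GUIDE_MAP, TOPICS)
quick_start = titles_for(QUICK_START_MAP, TOPICS)
```

Both deliverables draw on the same "Safety Notice" topic, so it would be written, reviewed, and translated once.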

DocBook

Original Purpose — Created in 1991 by HaL Computer Systems and O'Reilly & Associates, DocBook was designed for computer hardware and software documentation purposes.

Current Use — DocBook is now used for all types of documentation.
Benefits — DocBook's presentation-neutral form allows content to be published in numerous other formats (including, but not limited to, HTML, XHTML, EPUB, and PDF). DocBook is simple to download and set up, and it has been around long enough to be stable, well known, and well supported. A sophisticated set of rendering tools is available for use with this DTD.

While a DTD like DITA provides a general structure that can easily be specialized to your needs, DocBook comes with more options built-in. If the modifications you would make to DITA are already set within DocBook, then DocBook may be a better DTD for you, since it may let you do what you need without losing the benefits of a standardized structure.
Reasons for Using an Alternative DTD — Unlike DITA, DocBook does not allow for modular content organization or document mapping, so it is not well suited to content reuse. Possible public domain alternatives include DITA, S1000D, and NLM (for textbooks).
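
To illustrate what "presentation-neutral" means in practice, the sketch below renders one small source fragment two ways. It uses the DocBook element names `article`, `title`, and `para`, but the two renderer functions are hypothetical stand-ins written for this example; a real project would more likely rely on an established DocBook rendering toolchain.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny DocBook-style fragment with no presentation details in it.
SOURCE = """
<article>
  <title>Pump Maintenance</title>
  <para>Inspect seals monthly.</para>
  <para>Replace filters &amp; gaskets yearly.</para>
</article>
"""

def to_html(article):
    """Render the title as a heading and each para as an HTML paragraph."""
    parts = ["<h1>%s</h1>" % escape(article.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text) for p in article.findall("para")]
    return "\n".join(parts)

def to_text(article):
    """Render the same content as plain text with an underlined title."""
    title = article.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text for p in article.findall("para")]
    return "\n".join(lines)

article = ET.fromstring(SOURCE)
html_output = to_html(article)
text_output = to_text(article)
```

The source never states how a title should look; each output format decides that for itself.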

XHTML

Original Purpose — eXtensible Hypertext Markup Language (XHTML) is not technically a DTD, but rather a tag set. XHTML refers to a family of markup languages developed to build on HTML, the language used to write most web pages. XHTML was designed to make HTML more extensible, so while it is more restrictive than HTML, its requirement that documents be well-formed allows it the advantage of increased versatility.

Current Use — XHTML is a set of base tags for web rendering, and is not used for content tagging.
Benefits — XHTML is preferred to HTML 4.01 in many cases because of XHTML's versatility and tag minimization.
Reasons for Using an Alternative DTD — XHTML is about appearance, not content. If you are hoping to do more with your data than present it on a webpage, you will be better served by a richer content tag set.
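
The well-formedness requirement can be demonstrated with any XML parser. The sketch below uses Python's standard library; the `is_well_formed` helper and the two sample strings are made up for illustration, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Common in HTML: an unclosed <br> and an unclosed <p>.
html_style = "<div><p>First line<br>Second line</div>"
# XHTML style: every element is explicitly closed.
xhtml_style = "<div><p>First line<br />Second line</p></div>"

html_ok = is_well_formed(html_style)
xhtml_ok = is_well_formed(xhtml_style)
```

Markup that passes this check can be handled by generic XML tools, which is one source of the versatility mentioned above.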

SGML

Original Purpose — Standard Generalized Markup Language (SGML) was developed in the 1980s as a non-proprietary, platform-independent method of describing the structure of a document rather than its appearance.

Current Use — Though critics consider SGML an archaic format, it is still used in some government and military projects.
Benefits — SGML documents contain uniquely identifiable components that allow for information reuse.

XBRL

Original Purpose — eXtensible Business Reporting Language (XBRL) was created as an XML-based markup language for the electronic transmission of business and financial data.

Current Use — XBRL is used to define and exchange financial information such as statements and records. In June 2009, the SEC mandated a phase-in of XBRL filings for accelerated filers.
Benefits — XBRL tags increase the speed of data integration and exchange while simultaneously eliminating data redundancy and quality-related issues.
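
As a rough illustration of why tagged financial data is easy to integrate, the sketch below reads two facts from a heavily simplified XBRL-style instance. The `ex` taxonomy namespace, the `Revenue` and `NetIncome` element names, the figures, and the `read_facts` helper are all hypothetical, and the context and unit definitions that a real instance must include are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Simplified instance: real filings also define the contexts and units referenced here.
INSTANCE = """
<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <ex:Revenue contextRef="FY2009" unitRef="USD" decimals="0">1250000</ex:Revenue>
  <ex:NetIncome contextRef="FY2009" unitRef="USD" decimals="0">310000</ex:NetIncome>
</xbrl>
"""

# ElementTree exposes namespaced tags as "{namespace-uri}LocalName".
EX = "{http://example.com/taxonomy}"

def read_facts(xml_text):
    """Map (concept name, context id) to the reported integer value."""
    facts = {}
    for elem in ET.fromstring(xml_text):
        if elem.tag.startswith(EX):
            name = elem.tag[len(EX):]
            facts[(name, elem.get("contextRef"))] = int(elem.text)
    return facts

facts = read_facts(INSTANCE)
# Because each number is tagged, derived figures need no manual re-keying.
margin = facts[("NetIncome", "FY2009")] / facts[("Revenue", "FY2009")]
```

Each value arrives labeled with its concept and reporting period, so a receiving system can load it directly instead of re-entering it from a printed statement.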

EPUB

Original Purpose — Built on XHTML, EPUB was developed for eBook publishing and became an official standard in 2007.

Current Use — EPUB remains the publishing standard of choice for eBook publishers and eReader device and application manufacturers.
Benefits — As it was designed to handle "reflowable" content, the appearance of EPUB documents can be easily customized to suit the needs of different display devices.
Reasons for Using an Alternative DTD — If you are trying to present sophisticated content with complex formatting, EPUB may be too limited for your needs.

While EPUB has been widely adopted worldwide, not all media readers use or support EPUB the same way, and some specific media readers may require a different format altogether (Amazon's Kindle devices, for example, do not support EPUB and instead use a proprietary format based on MobiPocket).

MOBI

Original Purpose — MOBI was originally an extension of the PalmDOC format where certain HTML-like tags were added to the data. Currently the source files follow the guidelines of the Open eBook format (OeB).

Current Use — Although there are a few others, the MOBI format is used most widely by Amazon's Kindle series of eBook reading devices. The exception is the Kindle Fire, with which the device family is moving toward Kindle Format 8, an HTML5/CSS3-type format.
Benefits — This is the current format supported by Amazon's Kindle-series eBook reading devices, which accounts for a major portion of the eBook market.
Reasons for Using an Alternative DTD — It is a proprietary format, and requires adhering to Amazon's specifications. It does not support nearly as many styles of formatting as EPUB.

SPL

Original Purpose — The Structured Product Labeling (SPL) Standard was initially developed by a small group within the HL7 Regulated Clinical Research Information Management Technical Committee for healthcare industry product labels.

Current Use — SPL is an XML standard that the FDA has required since 2005 to facilitate communicating drug labeling data reliably among various groups such as hospitals, doctors, pharmacies, and the general public. SPL is both an HL7- and ANSI-approved standard.

NLM

Original Purpose — In 2003, the National Library of Medicine (NLM) created the Journal Archiving and Interchange DTD (also known as the NLM DTD), as a common format for medical journal articles, as well as the NLM Book DTD, designed specifically for textbooks.

Current Use — Used for various different types of journals and books—many having nothing to do with medical literature—the NLM DTD provides a useful structure for almost any kind of full-text article. It has been called the de facto standard DTD for full-text publishing.
Benefits — The NLM DTD was developed in response to publishers' needs, so this "reality-based" DTD provides a flexible structure that can accommodate many content irregularities without requiring customization. It is also well known, widely used, and allows for the creation of rich custom metadata.
Reasons for Using an Alternative DTD — Content-wise, the NLM DTD is somewhat more rigidly defined than TEI (the Text Encoding Initiative).

MARC

Original Purpose — MAchine-Readable Cataloging (MARC) is a data format and set of related standards used by libraries to encode and share information about books and other material they collect. It was first developed by Henriette Avram at the Library of Congress in the 1960s.

Current Use — MARC is still widely used today as the basis for most online public access catalogs.

S1000D

Original Purpose — S1000D was developed in the 1980s for the production of technical publications for military aircraft.

Current Use — S1000D has been modified for use with various different types of equipment documentation. It is now a popular DTD for maintenance and operations documentation for commercial equipment as well as for military technical publications of all sorts.
Benefits — As an international specification, S1000D is widely used in the military and aviation industries. Its complex hierarchical structure and predefined data modules leave little room for variation, so your content is well organized, standardized, and ready for reuse.
Reasons for Using an Alternative DTD — S1000D adheres to a rigid hierarchical structure with predefined data module codes designed to deal with equipment documentation. It is customizable to an extent, but if your needs fall outside the specific realm of equipment documentation, you may be better served by another DTD.

MIL-STD-2361

Original Purpose — In an effort to standardize digital development, acquisition, and delivery of equipment maintenance and operations information and training materials in consistent and identifiable chunks, the U.S. Army created MIL-STD-2361. It was developed to ensure compliance with existing Department of Defense, Army, and international policies and requirements.

Current Use — MIL-STD-2361 is used to facilitate the automated storage, retrieval, interchange, and processing of publications from varying data sources.

MIL-STD-38784

Current Use — The MIL-STD-38784 DTD is used exclusively in the conversion of legacy Technical Manuals, Repair Parts and Special Tool Lists, Depot Maintenance Work Requirements, Technical Bulletins, Supply Bulletins, Preventative Maintenance Technical Manuals, Modification Work Order Requirements, and Joint Technical Manuals.