DCL's Ask The Experts

Experts from Data Conversion Laboratory Inc. answer your data conversion and XML questions. This month you asked...

1) What's the difference between Searchable PDF and PDF Normal?

How does a PDF source format affect your per-page conversion cost? Read the DCLnews feature and its accompanying white paper.

PDF Normal - also referred to as Formatted Text & Graphics - is the usual PDF output produced from a word processing or authoring environment, such as MS Word, Quark, or FrameMaker. It contains the full text of the page, with appropriate coding to define fonts, font sizes, and so on.

Searchable PDF is usually produced from scanned documents. It consists of an image of the page, with the text portions of the image converted to text for search purposes and stored in a "text layer." This layer is generated through an Optical Character Recognition (OCR) process. While the image layer will be accurate, the accuracy of the text in the text layer will vary depending on the OCR and cleanup process that was used.

Searching is done by querying the text layer for matching text patterns. If the text is found, the image corresponding to the found text is displayed, and the materials can be read in context. Searchable PDF is created in two steps: (1) obtaining a page image (for example, by scanning a page), and (2) creating the text layer via OCR.
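As a rough illustration of the search behavior described above, here is a minimal Python sketch. It does not read real PDF files; the Word and SearchablePage classes, the file name, and the coordinates are all hypothetical stand-ins for an image layer plus an OCR-generated text layer.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str    # OCR output for one word; may contain recognition errors
    box: tuple   # (x0, y0, x1, y1) position of the word on the page image

@dataclass
class SearchablePage:
    image_file: str   # the scanned page image, which is what the reader sees
    text_layer: list  # hidden list of Word entries produced by OCR

def search(page, query):
    """Return the image regions whose text-layer word contains the query."""
    needle = query.lower()
    return [w.box for w in page.text_layer if needle in w.text.lower()]

# Hypothetical page: step 1 produced the image, step 2 (OCR) the text layer.
page = SearchablePage(
    image_file="page-001.tif",
    text_layer=[
        Word("Data", (72, 90, 110, 104)),
        Word("Conversion", (114, 90, 190, 104)),
        Word("Lab0ratory", (194, 90, 270, 104)),  # OCR misread "o" as "0"
    ],
)

print(search(page, "conversion"))  # region to highlight on the page image
print(search(page, "laboratory"))  # empty list: the OCR error hides the word
```

The second query finds nothing even though the word is plainly legible on the image, which is why the quality of the OCR and cleanup process matters for Searchable PDF.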

When it is available, PDF Normal is better than Searchable PDF for several reasons:

  • File sizes are smaller
  • Legibility of text is better on screen and on printouts, especially at high zoom
  • Textual accuracy is very high

When PDF Normal is available, that's the way to go. But when starting with images, Searchable PDF is a much less expensive process.

For a more extensive treatment of this topic, see the multi-part white paper "Adobe PDF Conversion: How, for Whom, and When?" by Data Conversion Laboratory's Lazar Weisz:

Overview of PDF formats
PDF Image Only
PDF Searchable Image
PDF Normal

Also refer to the following relevant items on our FAQ page:

Is all PDF created equal?
How do I choose between PDF or SGML conversions?
Converting PDF to XML - can it be done easily?

 

2) How does an investment in XML and related technologies bring Return on Investment (ROI)?

XML usually represents a new, different way of doing business and often requires a significant investment in tools, training, and the conversion of legacy materials. It should be considered a capital item that would be expensed over a number of years. It's usually not worth it unless you can gain significant benefits, and many of those benefits may not be easily quantifiable.

Adopting XML doesn't necessarily make business sense for every organization. What's more, the equation keeps changing as new tools and technology become available, so determining the ROI for your organization requires some homework. Factors suggesting XML may be cost-effective for you are:

  1. You have large amounts of materials
  2. Those materials are complex and change over time
  3. Content management is an issue
  4. You want to deliver data in multiple formats and media
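The quantifiable part of that homework can be sketched in a few lines of Python. This is only a simplified illustration: the function names are our own, the dollar figures are hypothetical placeholders rather than DCL data, and the arithmetic ignores factors such as the time value of money.

```python
def simple_roi(investment, annual_benefit, years):
    """Return net gain divided by investment over the given number of years."""
    total_benefit = annual_benefit * years
    return (total_benefit - investment) / investment

def payback_years(investment, annual_benefit):
    """Return the years of benefit needed to recover the up-front investment."""
    return investment / annual_benefit

# Hypothetical figures for illustration only.
investment = 200_000      # tools, training, and legacy conversion
annual_benefit = 80_000   # estimated yearly savings and new revenue

print(simple_roi(investment, annual_benefit, years=5))
print(payback_years(investment, annual_benefit))
```

Substituting your own estimates for the two inputs gives a first approximation of whether the investment pays for itself within the period over which it is expensed.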

But ROI is only one part of it. There are many intangibles associated with adopting XML. It allows you to provide a better service, which leads to loyal and satisfied customers and clients. Plus you have the ability to deliver products you just couldn't before - which means you can grow and generate profits in new ways.

Some resources for more information:

From a previous issue of DCLnews - "The Business Case For XML"
http://www.dclab.com/businessxml.asp

From a white paper on DCL's website - "Department of Defense and the Power of XML"
http://www.dclab.com/dodxml.asp

 

Got A Data Conversion Question?

If so, send it to DCL's experts at experts@dclab.com and see the reply in next month's issue.

 
Corporate office:
61-18 190th Street, 2nd Floor, Fresh Meadows, NY 11365
718-357-8700
Data Conversion Lab
Copyright © 1997-2010  Data Conversion Laboratory, Inc. All rights reserved.