DCL's Ask The Experts

Experts from Data Conversion Laboratory Inc. answer your data conversion and XML questions. This month you asked...

1) What's the difference between Searchable PDF and PDF Normal?

PDF Normal - also referred to as Formatted Text & Graphics - is the usual PDF output produced from a word processing or authoring environment, such as MS Word, Quark, or FrameMaker. It contains the full text of the page, with appropriate coding to define fonts, font sizes, and so on.

Searchable PDF is usually produced from scanned documents. It consists of an image of the page, with the text portions of the image converted to text for search purposes and stored in a "text layer." This layer is generated through an Optical Character Recognition (OCR) process. While the image layer will be accurate, the accuracy of the text in the text layer will vary depending on the OCR and cleanup process that was used.

Searching is done by querying the text layer for matching text patterns. If the text is found, the image corresponding to the found text is displayed, and the materials can be read in context. Searchable PDF is created in two steps: (1) obtaining a page image (for example, by scanning a page), and (2) creating the text layer via OCR.
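To make the two-layer idea concrete, here is a minimal sketch in Python. It is a toy model only, not how the PDF format or any particular OCR product stores its data: the names Word, SearchablePage, and find, the file name, and the pixel coordinates are all made up for illustration. It shows a search running against the text layer and returning the regions of the page image to display, and how an OCR error in the text layer can cause a miss even though the image itself is correct.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical model of one page of a Searchable PDF:
# a scanned image plus a hidden text layer produced by OCR.

@dataclass
class Word:
    text: str                        # OCR output for this word (may contain errors)
    box: Tuple[int, int, int, int]   # (left, top, right, bottom) on the page image


@dataclass
class SearchablePage:
    image_file: str                  # step 1: the scanned page image
    text_layer: List[Word]           # step 2: words recognized by OCR


def find(page: SearchablePage, query: str) -> List[Tuple[int, int, int, int]]:
    """Search the text layer and return the image regions that match."""
    needle = query.lower()
    return [w.box for w in page.text_layer if needle in w.text.lower()]


# Illustrative data only; the third word simulates an OCR error (zero for "o").
page = SearchablePage(
    image_file="page-001.tif",
    text_layer=[
        Word("Data", (100, 50, 160, 70)),
        Word("Conversion", (170, 50, 300, 70)),
        Word("Lab0ratory", (310, 50, 440, 70)),
    ],
)

print(find(page, "conversion"))  # one match: the region to highlight on the image
print(find(page, "laboratory"))  # no match: the OCR error hides the word from search
```

The second search comes back empty even though a reader looking at the page image would see the word plainly, which is why the quality of the OCR and cleanup step matters for Searchable PDF.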

PDF Normal is better than Searchable PDF for several reasons:

  • File sizes are smaller
  • Legibility of text is better on screen and on printouts, especially at high zoom
  • Textual accuracy is very high

When PDF Normal is available, that's the way to go. But when starting with images, producing Searchable PDF is much less expensive than converting to PDF Normal.

For a more extensive treatment of this topic see the multi-part white paper - "Adobe PDF Conversion: How, for Whom, and When?" by Data Conversion Laboratory's Lazar Weisz:

Overview of PDF formats
PDF Image Only
PDF Searchable Image
PDF Normal

Also refer to the following relevant items on our FAQ page:

Is all PDF created equal?
How do I choose between PDF or SGML conversions?
Converting PDF to XML - can it be done easily?

 

2) How does an investment in XML and related technologies bring Return on Investment (ROI)?

XML usually represents a new, different way of doing business and often requires a significant investment in tools, training, and the conversion of legacy materials. It should be considered a capital item whose cost is spread over a number of years. It's usually not worth it unless you can gain significant benefits, and many of those benefits may not be easily quantifiable.
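For the part of the benefit that can be quantified, the arithmetic is simple. The sketch below is in Python and uses invented figures purely for illustration; the function names simple_roi and payback_years are hypothetical, and a real analysis would use your own costs and would also have to weigh the benefits that don't reduce to a number.

```python
def simple_roi(upfront_cost: float, annual_benefit: float, years: int) -> float:
    """Net gain over the period, as a fraction of the upfront investment."""
    total_benefit = annual_benefit * years
    return (total_benefit - upfront_cost) / upfront_cost


def payback_years(upfront_cost: float, annual_benefit: float) -> float:
    """Years of benefit needed to recover the upfront investment."""
    return upfront_cost / annual_benefit


# Invented example figures, not data from any real project:
# 200,000 spent on tools, training, and legacy conversion,
# 80,000 per year in savings or new revenue, evaluated over 5 years.
roi = simple_roi(200_000, 80_000, 5)
payback = payback_years(200_000, 80_000)

print(f"ROI over 5 years: {roi:.0%}")
print(f"Payback period: {payback:.1f} years")
```

With these made-up inputs the investment is recovered partway through the third year; change the inputs and the picture changes, which is why the homework has to be done per organization.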

Adopting XML doesn't necessarily make business sense for every organization. What's more, the equation keeps changing as new tools and technology become available - and determining the ROI for you requires some homework. Factors suggesting XML may be cost effective for you are:

  1. You have large amounts of materials
  2. Those materials are complex and change over time
  3. Content management is an issue
  4. You want to deliver data in multiple formats and media

But ROI is only one part of it. There are many intangibles associated with adopting XML. It allows you to provide a better service, which leads to loyal and satisfied customers and clients. Plus you have the ability to deliver products you just couldn't before - which means you can grow and generate profits in new ways.

Some resources for more information:

From a previous issue of DCLnews - "The Business Case For XML"
http://www.dclab.com/businessxml.asp

From a white paper on DCL's website - "Department of Defense and the Power of XML"
http://www.dclab.com/dodxml.asp

 

Got A Data Conversion Question?

If so, send it to DCL's experts at experts@dclab.com and see the reply in next month's issue.

 

Corporate office:
61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365, P: 718-357-8700
Copyright © 1997-2009  Data Conversion Laboratory, Inc. All rights reserved.