Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
DCL's Ask The Experts

Experts from Data Conversion Laboratory Inc. answer your data conversion and XML questions. This month you asked...

1) What's the difference between Searchable PDF and PDF Normal?

PDF Normal, also referred to as Formatted Text & Graphics, is the usual PDF output produced from a text processing or authoring environment such as MS Word, Quark, or FrameMaker. It contains the full text of the page, with appropriate coding to define fonts, font sizes, and so on.

Searchable PDF is usually produced from scanned documents. It consists of an image of the page, with the text portions of the image converted to text for search purposes and stored in a "text layer." This layer is generated through an Optical Character Recognition (OCR) process. While the image layer will be accurate, the accuracy of the text in the text layer will vary depending on the OCR and cleanup process that was used.

Searching is done by querying the text layer for matching text patterns. If the text is found, the image corresponding to the found text is displayed, and the materials can be read in context. Searchable PDF is created in two steps: (1) obtaining a page image (for example, by scanning a page), and (2) creating the text layer via OCR.
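As a rough illustration of the search described above, the sketch below models a Searchable PDF page as a page image plus a text layer of OCR'd words, each with its position on the image. The class names, the find function, the file names and the pixel coordinates are all hypothetical and chosen only for this example; real PDF files store the text layer differently, and no actual OCR is performed here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Word:
    """One OCR-recognized word and where it sits on the page image."""
    text: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical pixels


@dataclass
class SearchablePage:
    """A page image plus the text layer produced by OCR."""
    image_file: str                                        # step 1: the page image
    text_layer: List[Word] = field(default_factory=list)   # step 2: OCR output


def find(pages, query):
    """Return (page_index, box) for every text-layer word containing the query."""
    needle = query.lower()
    hits = []
    for index, page in enumerate(pages):
        for word in page.text_layer:
            if needle in word.text.lower():
                hits.append((index, word.box))
    return hits


pages = [
    SearchablePage("page1.tif", [Word("Data", (10, 10, 60, 30)),
                                 Word("Conversion", (70, 10, 190, 30))]),
    # A simulated OCR misread: the image still shows "Conversion",
    # but the text layer holds "C0nversion", so the search misses it.
    SearchablePage("page2.tif", [Word("C0nversion", (10, 10, 130, 30))]),
]

print(find(pages, "conversion"))  # prints [(0, (70, 10, 190, 30))]
```

The second page shows why text-layer accuracy matters: the image is correct, but a single misrecognized character is enough to hide the word from a search.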

PDF Normal is better than Searchable PDF for several reasons:

  • File sizes are smaller
  • Legibility of text is better on screen and on printouts, especially at high zoom
  • Textual accuracy is very high

When PDF Normal is available, that's the way to go. But when you are starting from page images, producing Searchable PDF is a much less expensive process.

For a more extensive treatment of this topic see the multi-part white paper - "Adobe PDF Conversion: How, for Whom, and When?" by Data Conversion Laboratory's Lazar Weisz:

Overview of PDF formats
PDF Image Only
PDF Searchable Image
PDF Normal

Also refer to the following relevant items on our FAQ page:

Is all PDF created equal?
How do I choose between PDF or SGML conversions?
Converting PDF to XML - can it be done easily?

 

2) How does an investment in XML and related technologies bring Return on Investment (ROI)?

XML usually represents a new, different way of doing business and often requires a significant investment to bring together the tools, staff training, and conversion of legacy materials. It should be considered a capital item that would be expensed over a number of years. It's usually not worth it unless you can gain significant benefits, and many of those benefits may not be easily quantifiable.
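To make the idea of a capital item spread over several years concrete, here is a minimal sketch of the quantifiable part of the calculation. The simple_roi function and every figure in it are hypothetical, chosen only for illustration; it assumes a straight-line spread of the cost and a constant annual benefit, and it ignores the hard-to-quantify benefits mentioned above.

```python
def simple_roi(investment, annual_benefit, years):
    """Return (annual_expense, net_gain, roi) for a straight-line spread.

    investment     -- up-front cost of tools, training and legacy conversion
    annual_benefit -- estimated yearly savings or new revenue (assumed constant)
    years          -- number of years over which the investment is expensed
    """
    annual_expense = investment / years
    net_gain = annual_benefit * years - investment
    return annual_expense, net_gain, net_gain / investment


# Hypothetical figures, not taken from any real project.
expense, gain, roi = simple_roi(investment=300_000, annual_benefit=90_000, years=5)
print(f"{expense:.0f} {gain:.0f} {roi:.0%}")  # prints 60000 150000 50%
```

With these made-up numbers the investment costs 60,000 a year against a 90,000 yearly benefit; halve the benefit and the same calculation turns negative, which is why the estimate depends on doing the homework.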

Adopting XML doesn't necessarily make business sense for every organization. What's more, the equation keeps changing as new tools and technology become available, and determining the ROI for your organization requires some homework. Factors suggesting XML may be cost-effective for you are:

  1. You have large amounts of materials
  2. Those materials are complex and change over time
  3. Content management is an issue
  4. You want to deliver data in multiple formats and media

But ROI is only one part of it. There are many intangibles associated with adopting XML. It allows you to provide a better service, which leads to loyal and satisfied customers and clients. Plus you have the ability to deliver products you just couldn't before - which means you can grow and generate profits in new ways.

Some resources for more information:

From a previous issue of DCLnews - "The Business Case For XML"
http://www.dclab.com/businessxml.asp

From a white paper on DCL's website - "Department of Defense and the Power of XML"
http://www.dclab.com/dodxml.asp

 

Got A Data Conversion Question?

If so, send it to DCL's experts at experts@dclab.com and see the reply in next month's issue.


Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.