Adobe PDF Conversion: How, For Whom, And When?

PDF White Paper, Part 2: PDF Image Only.

Get the lowdown on how to convert documents to PDF from Lazar Weisz, PDF expert at Data Conversion Laboratory (DCL).

What is PDF Image Only?

PDF Image Only is simply a scanned, non-searchable image of the page inside a PDF wrapper. This limited approach to distributing documents is the cheapest: for simple text documents, prices range between $0.15 and $0.30 per page. It is an ideal solution for archiving legacy documents in digital format.

PDF Image Only File Sizes

In contrast to the relatively small PDF Normal documents authored in word processors or publishing platforms, PDF Image Only files raise the same file size concerns as all image formats, such as TIFF, JPEG, and BMP. Depending on the type of image, color range, and image resolution, file size is frequently a major concern. Two methods of reducing file size are Image Compression (see Appendix A) and Composite PDF (see Appendix B). A combination of the two might yield the best results.

Image Quality

When scanning to PDF Image Only, keep in mind that the quality of the final PDF is largely dependent on the initial capture to digital format. A high-quality scan that needs minimal post-scan clean-up will always yield better results than a low-quality scan followed by extensive image clean-up. Investing in an excellent scanner, or in better training for the people doing the scanning, pays off quickly compared to the cost of manually de-speckling, de-skewing, and otherwise fixing a bad scan.

Hyperlinking an Image Only PDF

While images are not searchable, other navigational aids can be used with Image Only PDF files. Adobe Acrobat and other tools can add hyperlinks to a PDF Image Only document. For example, you can provide a Table of Contents, Index, or other intra-document linking structure in which each entry links directly to the relevant page. Alternatively, you can use Acrobat's bookmarking feature, which lets you create your own Table of Contents-like list of headings linked to their respective pages. These bookmarks become part of the PDF file but are not an actual page in the file.


Appendix A:

Image Compression: An Overview

Image compression refers to any of several techniques used to reduce image file sizes, usually by removing either redundant information or information that can be recreated prior to display. Reducing file sizes is often important so that image-heavy files can be easily transmitted and stored.
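The idea of removing redundant information that can be recreated exactly can be seen in a small sketch. It uses Python's standard zlib module (DEFLATE) purely as a stand-in for a lossless method; it is not CCITT Group 4 or JBIG2, and the synthetic "page" below is a hypothetical pattern of long white runs, not a real scan.

```python
import zlib

# Hypothetical bitonal scan line: a long white run followed by a short black run.
# 319 bytes is roughly 2550 pixels at 1 bit per pixel.
row = b"\x00" * 300 + b"\xff" * 19

# Repeat the line 3300 times to imitate a highly redundant letter-size page.
page = row * 3300

# Lossless compression: the redundant runs are stored compactly...
packed = zlib.compress(page, 9)

# ...and recreated exactly on decompression.
restored = zlib.decompress(packed)

print("original bytes:  ", len(page))
print("compressed bytes:", len(packed))
print("identical after round trip:", restored == page)
```

Real scans are far less regular than this pattern, so actual savings will be smaller.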

The scope of the problem depends on a number of factors. For example, if the page contains only text or a few black-and-white (bitonal) images, the problem is limited, since bitonal images compress to very small sizes (typically using CCITT Group 4 compression at the industry-standard 300 DPI resolution). You can scan the entire page at a single resolution (300 DPI), color depth (bitonal), and compression (CCITT Group 4) and retain a small file size. Using JBIG2 compression, you can even achieve file sizes similar to those of PDF Normal. If the page contains grayscale or color images, however, file size increases dramatically. An 8½ by 11 inch page scanned at 300 DPI with 24-bit color depth would result, uncompressed, in a TIFF file of around 25 MB:

Width: 8.5 x 300 = 2,550 pixels
Length: 11 x 300 = 3,300 pixels

2,550 x 3,300 = 8,415,000 total page pixels

8,415,000 x 24 (color-depth bits) = 201,960,000 bits

201,960,000 / 8 = 25,245,000 bytes, or 25.2 MB.

A PDF file with 25.2 MB per page would take a long time to download and require a great deal of disk space to store. Image compression is intended to reduce these sizes.
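The arithmetic above can be repeated for other page sizes, resolutions, and color depths with a short sketch. The function name and parameters are illustrative only and do not come from any particular tool.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) image size in bytes for a scanned page."""
    width_px = round(width_in * dpi)    # 8.5 in x 300 DPI = 2550 pixels
    height_px = round(height_in * dpi)  # 11 in x 300 DPI = 3300 pixels
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8        # round up to whole bytes


# Letter-size page at 300 DPI in the three common color depths.
color = uncompressed_bytes(8.5, 11, 300, 24)   # 24-bit color
gray = uncompressed_bytes(8.5, 11, 300, 8)     # 8-bit grayscale
bitonal = uncompressed_bytes(8.5, 11, 300, 1)  # 1-bit black and white

for label, size in (("color", color), ("grayscale", gray), ("bitonal", bitonal)):
    print(f"{label:>9}: {size:>10,} bytes")
```

Even before compression, the same page is 24 times smaller as a bitonal image than as 24-bit color.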

Image compression can be categorized as lossy or lossless. Lossy compression discards some image information to achieve smaller files, while lossless compression reproduces the original image exactly. JPEG, for example, is a lossy compression method. It is frequently used for color images on the Web, where small file sizes and thus shorter download times are more important than high image quality. TIFF G4 (CCITT Group 4), on the other hand, is a lossless bitonal compression method often used to scan medical, legal, and governmental documents that must retain their original look and feel. Also, when converting to Searchable Image PDF, the OCR (Optical Character Recognition) process required to add the text layer to the PDF works much better on a lossless, purely bitonal scan, so TIFF G4 is often used for OCR. The right compression method for your conversion therefore depends on the following factors:

  1. Type of Information (medical, legal, etc.)
  2. Range of colors (bitonal, grayscale, color)
  3. Resolution required, in DPI (dots per inch)

The following table illustrates the most popular methods of compression and where they are commonly used:

Compression Method   Lossy/Lossless   Color Range Supported   Application                  Compression Ratio
TIFF G4              Lossless         Bitonal                 Legal, Defense, Government   90-95%
JBIG2                Supports both    Bitonal                 Legal, Defense, Government   95-98%
LZW                  Lossless         Color                   Medical, IT                  80-85%
Packbits             Lossless         Color                   Medical, IT                  75-80%
JPEG                 Lossy            Color                   WWW                          90-95%
GIF                  Lossless         Color                   WWW                          60-80%
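To get a rough feel for what these percentages mean, the sketch below applies the JPEG range to the 25.2 MB uncompressed color page computed earlier. It assumes the "Compression Ratio" column expresses the percentage by which file size is reduced, which the table does not state explicitly, and the function name is hypothetical.

```python
def size_after_reduction(original_bytes, reduction_percent):
    """Estimated size after shrinking a file by the given whole percentage.

    Assumes the percentage is the share of the original size that is removed.
    """
    return original_bytes * (100 - reduction_percent) // 100


page_bytes = 25_245_000  # uncompressed 8.5 x 11 inch page, 300 DPI, 24-bit color

# JPEG row of the table: 90-95% reduction.
jpeg_best = size_after_reduction(page_bytes, 95)
jpeg_worst = size_after_reduction(page_bytes, 90)

print(f"JPEG estimate: {jpeg_best:,} to {jpeg_worst:,} bytes per page")
```

Actual results depend heavily on image content and quality settings, so these figures are only indicative.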

While compressing the entire page with one method is the simplest approach, it does not necessarily give the best results. Different types of compression are frequently suited to different parts of the page. Areas containing text that will undergo OCR to produce Searchable PDF, for example, should be scanned at a resolution of at least 300 DPI and at bitonal color depth. Images on the same page, however, can't be scanned at bitonal color depth, since that would convert the color image to monochrome. Scanning the entire page at 300 DPI color will result in a large file size even when using image compression. So if a page contains both images and text, a dilemma arises: if the compression methods mentioned above allow only one color depth and one resolution setting, the final PDF will either contain color but be large and suffer from below-par OCR results, or it will have to be created bitonally to allow for small file sizes and good OCR. This problem is solved with Composite PDF.

Appendix B:

Composite PDF

Standard image file formats have a major drawback: you can have only one resolution and one color depth setting for the entire image. For example, to scan a page containing mostly text but also a few color images surrounded by text (think of a medical journal or a computer magazine), you'll typically scan the whole page either at a bitonal setting, which captures the text and white space optimally but converts all images to monochrome, or at a color setting, which picks up the color images beautifully but creates unnecessary gray shading in the text and white space and results in a huge file. You'd also be limited to one resolution.

As a solution, PDF allows you to combine many 'zones' on a single page. In the example above, you could scan the whole page to a 300 DPI bitonal TIFF, scan it again as a 150 DPI color JPEG, and combine the two in the final PDF to yield the perfect balance: Composite PDF. This PDF enjoys the best of both worlds: purely bitonal text and white space areas (which is important for the best OCR and print results) and true-color, compressed image areas. File size is kept to a minimum, since you can use G4 or JBIG2 compression on all text and white space areas and JPEG for the images.
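The size advantage of zoning can be sketched with uncompressed figures alone, before G4, JBIG2, or JPEG are applied. The 3 x 4 inch color zone below is a hypothetical example, as are the function and variable names.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of an image region."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8


# Option 1: the whole letter-size page scanned at 300 DPI, 24-bit color.
full_color = raw_bytes(8.5, 11, 300, 24)

# Option 2: composite of a 300 DPI bitonal page plus one color zone.
text_layer = raw_bytes(8.5, 11, 300, 1)  # text and white space, 1 bit per pixel
image_zone = raw_bytes(3, 4, 150, 24)    # hypothetical 3 x 4 inch photo at 150 DPI
composite = text_layer + image_zone

print(f"full color page: {full_color:,} bytes")
print(f"composite page:  {composite:,} bytes")
```

Compression then shrinks each zone further with the method best suited to it.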

Lazar Weisz
Data Conversion Laboratory

Read more of this white paper on how to convert documents to PDF:

  1. Overview: www.dclab.com/pdf_conversion.asp
  2. PDF Image Only: www.dclab.com/pdfwhitepaper2.asp
  3. PDF Searchable Image: www.dclab.com/pdfconversion3.asp
  4. PDF Normal: www.dclab.com/pdf_whitepaper_4.asp


© 2002/2003 Data Conversion Laboratory. All rights reserved.
This White Paper is for informational purposes only. Data Conversion Laboratory makes no warranties in this document, expressed or implied.
 