PDF Conversion: How, For Whom, And When?
PDF Conversion White Paper, Part 1: Overview
The lowdown on PDF conversion from Data Conversion Laboratory (DCL), by Lazar Weisz.
PDF, or Portable Document Format, is Adobe's flagship format for publishing and distributing documents. It has become the most widely used format for distributing documents in businesses and schools and on the Web. This white paper addresses PDF conversion and the issues that come with it.

One of the secrets behind PDF's success is its portability. Whatever the user's operating system, whether Windows, Linux, or Macintosh, Adobe's free Acrobat Reader makes PDFs readable and printable. In a world of constant struggle for compatibility, this is a tremendously powerful advantage. If you want your documents to be viewable by the largest number of people at low cost, Adobe PDF is the way to go.

If your primary goal is to disseminate information in its existing form and look, PDF will do an excellent job at much lower cost than the alternatives. PDF is an outstanding choice for reference documents that must retain their original look and for documents that would normally be printed. However, if your requirements include repurposing and normalizing your documents so that they can be republished and shared with other organizations, PDF may not be the ideal choice. PDF files are also typically larger than marked-up text.
Not all PDF files are equal. There are three forms of PDF files, each with its own characteristics: PDF Normal, Image Only, and Searchable Image PDF.
Let's look at each of them in turn.

PDF Normal
Adobe officially calls this Formatted Text & Graphics, but we'll continue to refer to it as PDF Normal. This is the best kind of PDF. You get it when your materials have been produced on a modern word processing or publishing system with PDF output capability. It contains the full text of the page, with appropriate coding to define fonts, sizes, and so on. Files are relatively small, and the document looks as good on screen as the printed version would. If PDF works for your application and you have the original word processing or publishing files, this is the best bet. However, if you are starting from legacy materials and don't have suitable electronic files, producing PDF Normal is complex and relatively costly: you usually have to convert to a word processing or publishing format first and then produce the PDF files from there.

Image Only
This type of PDF is the easiest to produce from legacy sources. It is an image of the page in a PDF wrapper and contains no searchable text. All you need to do is scan the materials and put the images through an automated PDF loading process. Image Only PDF can be seen as a replacement for microfilm: it is an archival format from which pages can be retrieved. However, the text cannot be searched, and files tend to be fairly large and therefore harder to store and download. Image quality depends on the quality of the source materials and of the scanning operation.

Searchable Image PDF
This is a good compromise for many legacy applications. It is an image of the page, but with the text portions of the image converted to text for search purposes. In a search application, when the text is found, the image corresponding to the found text is displayed, so the materials can be read in context.
This type of PDF is relatively inexpensive to produce, since the pages can be scanned and run through an automated optical character recognition (OCR) process. Raw OCR output is usually not accurate enough to serve as the document's text (raw OCR accuracy is only about 95-99% for most materials), but for search purposes it is good enough for the majority of applications. Also, because the image must be retained, file sizes are larger than for PDF Normal and other text formats. If you can live with these constraints, Searchable Image PDF can be a very good compromise. This approach is frequently suitable for library and legal applications.
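To put the 95-99% raw OCR accuracy figure in perspective, here is a small back-of-the-envelope sketch in Python. It rests on three assumptions of ours, not the paper's: accuracy is a per-character rate, errors are independent, and a page holds about 2,000 characters. All three are illustrative simplifications; real OCR errors tend to cluster.

```python
# Back-of-the-envelope model of raw OCR error rates.
# Assumptions (illustrative, not from the white paper): accuracy is a
# per-character rate, errors are independent, and a page holds ~2,000 characters.

PAGE_CHARS = 2000  # hypothetical characters per page


def expected_char_errors(chars: int, accuracy: float) -> float:
    """Expected number of misrecognized characters on a page."""
    return chars * (1.0 - accuracy)


def word_intact_probability(word_len: int, accuracy: float) -> float:
    """Probability that every character of a word is recognized correctly,
    assuming independent per-character errors."""
    return accuracy ** word_len


if __name__ == "__main__":
    for acc in (0.95, 0.99):
        errors = expected_char_errors(PAGE_CHARS, acc)
        intact = word_intact_probability(6, acc)
        print(f"{acc:.0%} accuracy: ~{errors:.0f} character errors per page, "
              f"6-letter word intact {intact:.0%} of the time")
```

Under these assumptions a 2,000-character page carries roughly 20 to 100 character errors, and a six-letter word comes through intact only about 74% to 94% of the time. That is too many errors for text meant to be read directly. Searchable Image PDF sidesteps the problem because the reader always sees the page image, and the imperfect text only has to be good enough to find the page.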
Table 1 gives a general overview of the prices you can expect when converting to the various types of PDF. These prices depend on a wide variety of factors, and each conversion project requires its own conversion methodology, so the figures should be regarded as benchmarks for the average project. Table 2 shows typical file sizes per PDF page generated from various paper and electronic sources.
1 Assumes scanning at 150 DPI with medium-strength 8-bit JPEG compression.
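The footnote's scan settings also help explain why Image Only and Searchable Image files are larger than PDF Normal: every page carries a bitmap. The Python sketch below estimates that bitmap's size before compression. The page size (US Letter), the single 8-bit grayscale channel, and the 10:1 and 20:1 compression ratios are assumptions for illustration only; actual JPEG ratios depend on page content and quality settings, and these figures are not the values in Table 2.

```python
# Rough uncompressed size of one scanned page image.
# Assumptions (illustrative): US Letter page, one 8-bit grayscale channel,
# and a hypothetical JPEG compression ratio supplied by the caller.

LETTER_WIDTH_IN = 8.5
LETTER_HEIGHT_IN = 11.0


def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size in bytes for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def compressed_estimate(raw_bytes: int, compression_ratio: float) -> float:
    """Size after compression at an assumed ratio (e.g. 10.0 for 10:1)."""
    return raw_bytes / compression_ratio


if __name__ == "__main__":
    raw = raw_image_bytes(LETTER_WIDTH_IN, LETTER_HEIGHT_IN, dpi=150, bits_per_pixel=8)
    print(f"Uncompressed 150 DPI grayscale page: {raw:,} bytes")
    for ratio in (10.0, 20.0):  # hypothetical ratios; real JPEG ratios vary with content
        print(f"At {ratio:.0f}:1 compression: ~{compressed_estimate(raw, ratio):,.0f} bytes")
```

A 150 DPI US Letter page works out to 1,275 × 1,650 pixels, or 2,103,750 bytes uncompressed; a color scan with three 8-bit channels would be three times that. Compression shrinks the page to a fraction of this, but the image still dominates the file, whereas a PDF Normal page stores mostly text and font information.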
Lazar Weisz
© 2002/2003 Data Conversion Laboratory. All rights reserved.
This White Paper is for informational purposes only. Data Conversion Laboratory makes no warranties in this document, expressed or implied.