Adobe PDF Conversion: How, For Whom, And When?
PDF White Paper, Part 4: PDF Normal

Get the lowdown on how to convert data to PDF from Lazar Weisz, PDF expert at Data Conversion Laboratory (DCL).

What is PDF Normal?
PDF Normal is an exact print-ready representation of the source format, whether paper or electronic. All page layout information, such as font properties, resolution and compression of images, and their location on the page, is contained within this format. The easiest way to understand PDF Normal is to think of it as a viewing platform for documents created in a word processing or publishing application: it displays exactly what the author has created. This allows for the most realistic representation of the source.

Text in a PDF Normal document is not a scanned bitmap representation of the original, as is the case in PDF Searchable Image. It comes directly from the application in which the document was authored, which ensures that text accuracy is extremely high. The absence of bitmapped text also keeps the PDF file as small as possible. This matters for eBooks, for example, because they are frequently downloaded and small file sizes are therefore essential.

How do I get PDF Normal?
If your source is already in a typeset, electronic format and has been created using a word processor such as MS Word or a desktop publishing application such as Quark, Interleaf or FrameMaker, going to PDF Normal is simple. These applications typically come with a 'Save As PDF' or 'Print To PDF' function, which allows the user to painlessly convert the document to PDF Normal. The author ensures that all text, images, hyperlinks, and other elements of the document are correctly formatted within the authoring application. Once that is done, the document is saved as PDF Normal.

If your source data is paper, however, creating PDF Normal becomes significantly more complicated and expensive.

Paper to PDF Normal conversion
Depending on the quality of the source paper documents, the information must be converted to electronic format either by scanning and OCR or by manual keying.
Other elements of the page, such as tables and images, will also have to be ported over to electronic format. OCR engines do a reasonably good job of detecting simple tables; however, expect to do post-OCR cleanup on complex tables. Raster (bitmap) images will have to be scanned, cleaned up, and adjusted to the right color space and resolution. If you would like to include vector images in your final PDF Normal file, you will have to draw them from scratch, since OCR is not able to create vector images from paper.

Typesetting and conversion to PDF Normal
Once all document elements have been captured from paper into electronic format, they need to be typeset in a desktop publishing or word processing environment. This is the step where all final PDF Normal components are created: text layout, hyperlinks, image properties, headers and footers, table structures, and so on. Remember: if you OCR'ed the text from paper, you will need to proofread it carefully to ensure it meets the high textual accuracy that PDF Normal users demand, typically 99.995%, or 5 errors in 100,000 characters. Unlike in PDF Searchable Image, any typo in the text will be immediately visible in the final PDF.

This is also a good place to add elements to the document that the paper did not have. For example, if the original paper document did not have a Table of Contents or an Index, you can create one now, link the various entries to the appropriate pages in the file, and thus add value to the overall project.

As mentioned earlier, once typesetting is complete, you can produce the final PDF Normal file simply by using the 'Save As PDF' or 'Print To PDF' function.

Why not scan and OCR straight to PDF Normal?
Most OCR applications are able to produce PDF Normal right out of the OCR stage. Why, then, go through the trouble of typesetting the document? The answer lies in a good understanding of PDF Normal: this format does not leave any room for textual inconsistencies.
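To put the textual-accuracy requirement in concrete terms, the 99.995% figure quoted in the typesetting step can be turned into an error budget for a whole document. The following is a minimal sketch, not a DCL tool: the function names `allowed_errors` and `meets_target` are hypothetical, and the 600,000-character document is an assumed example size.

```python
from fractions import Fraction


def allowed_errors(char_count: int, accuracy_percent: str) -> int:
    """Largest whole number of character errors that still meets the target.

    The accuracy is passed as a string (e.g. "99.995") so that it can be
    converted to an exact fraction without floating-point rounding.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return int(error_rate * char_count)  # truncates, i.e. rounds down


def meets_target(char_count: int, errors_found: int,
                 accuracy_percent: str = "99.995") -> bool:
    """True if the number of errors found in proofreading is within budget."""
    return errors_found <= allowed_errors(char_count, accuracy_percent)


# The figure from the text: 99.995% accuracy over 100,000 characters.
print(allowed_errors(100_000, "99.995"))   # 5

# Assumed example: a 600,000-character document.
print(allowed_errors(600_000, "99.995"))   # 30
print(meets_target(600_000, errors_found=42))  # False
```

Under these assumptions, raw OCR output that leaves more errors than the budget allows would need another proofreading pass before typesetting.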
If one line of text in the PDF is set in Times New Roman at font size 10 and the next line is at font size 9.5, the reader will notice it immediately, just as she would in a Word document. You cannot rely on the OCR engine to produce a 100% consistent representation of the original paper page in terms of font type and size, or in terms of textual accuracy.

Another reason: going directly from OCR to PDF Normal does not allow you to add any value to the project. What you see on paper is what you'll get in the PDF Normal file. This is a wasted opportunity.

PDF Normal: Summary
The complexity and cost of the journey to PDF Normal depend on the format of the source (paper or already typeset electronic format), the complexity of the page layout, and whether you would like to add value to the document you want to produce. Conversion from typeset electronic format to PDF is trivial; conversion from paper is difficult and expensive. However, once you have created PDF Normal from your documents, you are in possession of the best format possible for distributing and publishing your documents on the local network and the Web. For many companies this is an invaluable resource and one that may be critical to business success. It is therefore often worth the extra money to get the best quality PDF Normal.

PDF White Paper: Summary
The PDF format has become a primary choice for representing and distributing information at low cost, both on local networks and on the World Wide Web. The unique ability of PDF to enable documents to be viewed and printed easily has been a prime factor in its success. Just as Microsoft has done with its Windows family of operating systems, PDF has gained the critical mass of end users needed for a self-sustaining customer base. This ensures that the format will live on for many years to come. The sheer number of plug-ins available for Adobe's Acrobat application also allows users to manipulate their PDF files in any number of ways.
The PDF format is thus not a dead end. Using the many tools available, images and text in PDF files can be exported, changed, deleted, and adjusted. Additionally, the many security options that come with PDF permit documents to be protected from tampering, piracy, and fraud. All of these possibilities have contributed to PDF's popularity and success.

It is, however, important to point out that PDF is not a panacea for publishing. As noted in Part I of this White Paper, PDF is not in competition with markup languages such as SGML and XML. If you intend to normalize and repurpose your documents, PDF is not a solution, since the text in PDF files is not styled. Often the ideal solution is a combination of SGML/XML and PDF, where documents are first converted to SGML/XML, loaded into a publishing platform, and then printed to PDF.

For additional information on the relationship between PDF and SGML/XML, please see the following items on our FAQ page:
Lazar Weisz
© 2002/2003 Data Conversion Laboratory. All rights reserved.
This White Paper is for informational purposes only. Data Conversion Laboratory makes no warranties in this document, expressed or implied.
Corporate office: 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365, P: 718-357-8700
Copyright © 1997-2009 Data Conversion Laboratory, Inc. All rights reserved.