PDF files come in two main varieties: Image-only PDF and PDF Normal. An Image-only PDF is little more than a scanned image of each page; before any computer-readable text can be extracted from it, the file must first undergo an optical character recognition (OCR) process. PDF Normal files (such as those produced by word processors and publishing systems), on the other hand, already contain computer-readable text, and they frequently contain styling and tagging information as well.
How DCL can help
DCL has extensive experience in converting all types of PDF files into XML and other formats. DCL can complete every step necessary to get your documentation into the target format you need, and we will work with you to create a documentation strategy that ensures you get the most out of your data.
- PDF or SGML?
How do I choose between PDF or SGML conversions?
http://www.dclab.com/dclfaq.asp#pdforsgml
- PDFs
Is all PDF created equal?
http://www.dclab.com/dclfaq.asp#pdfequal
- PDF to XML conversion
I have a bunch of PDF documents that I need transformed into XML. Can this be done easily?
http://www.dclab.com/dclfaq.asp#pdfxml
- Converting Documents to PDF
Overview; PDF Image-Only; PDF Searchable Image; PDF Normal; Image Compression; Composite PDF (11/2002)
http://www.dclab.com/white_papers/pdf_conversion.a...
- The Importance of Standards in Our Lives
Everywhere you look, travel, and shop, our world is driven by standards developed by organizations, each responsible for its own sphere of influence, yet we take most of these standards for granted. (9/2010)
http://www.dclab.com/blog/2010/09/the-importance-o...
- From PDF to E-Book: Problems and Solutions for PDF to ePub Conversions
As the e-book business starts to boom in earnest, many publishers find themselves needing to convert their PDF documents to the e-reader-friendly ePub standard. In this article, DCL identifies some common problems found in PDF-to-ePub conversions and explains how best to avoid them. (4/2010)
http://www.dclab.com/blog/2010/04/problems-and-sol...
- Converting from PDF to XML & MS Word: Avoiding the Pitfalls (White Paper, Part 2)
Mike Gross, DCL's CTO, discusses the issues surrounding converting from PDF. This installment covers the issues related to specific target formats, including MS Word, HTML, XML/SGML, and RTF. (11/2003)
http://www.dclab.com/converting_from_pdf2.asp
- Shedding Those Excess Bytes
Scanned documents are notoriously large in size and take a long time to transmit by fax or the Internet. But new image compression software from CVISION Technologies is throwing off excess bytes in a big way. (4/2003)
http://www.dclab.com/cvision_compression_technolog...
- DCL Participates in Digital Record of a Vanishing People
University of Cincinnati Digital Press has republished on CD-ROM a rare pictorial record of Native Americans during the early to middle 19th century. (9/2002)
http://www.dclab.com/mckenney.asp
- Alphabet Soup
Or, What File Format Should I Really Use? (10/2000)
http://www.dclab.com/alphabetsoup.asp
- Adobe's PDF (Portable Document Format) (11/2002)
http://www.adobe.com/pdf
- Adobe eBooks Central
http://www.adobe.com/epaper/ebooks/main.html