Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
Quark to XML Conversion

Over the past several years, QuarkXPress has become one of the more popular desktop publishing packages. This should be no surprise. By combining ease of use with powerful features, Quark has brought the publishing process to the desktop.

But if the previous decade belonged primarily to Quark, this one surely belongs to XML, and Quark to XML conversion has become an issue. While Quark remains the desktop package of choice, the Internet and the world of e-commerce are already dictating that mark-up languages, and XML in particular, become the de facto standard.

The Web has revolutionized information delivery, and the publishing industry has had to adapt quickly. Naturally, many people who have already published books, journals, and technical documentation in Quark are now looking to convert these documents to XML. Converting Quark to XML presents several challenges. While formats like Interleaf and FrameMaker each support rich 'ASCII' formats that accurately represent the entire document, Quark does not. In fact, Quark's native file format is proprietary. The software can export to a tagged-text format Quark calls 'XPress Tags', but only one story at a time. Several commercial plug-ins attempt to export an entire document at once, but getting all the stories out, along with accurate graphics information, can still prove difficult. Our experience has shown that either the information comes out incorrectly or parts of it do not come out at all. Neither outcome is acceptable, so manual intervention of some kind will be required.
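
To make the one-story-at-a-time limitation concrete, here is a minimal Python sketch of stitching separately exported stories into a single XML document. It assumes a heavily simplified form of tagged text in which a paragraph may begin with `@StyleName:` and an untagged paragraph keeps the previous style; real exports also carry version headers and inline character codes that this sketch ignores. The element names (`document`, `story`, `para`), the default style `Normal`, and the sample stories are all hypothetical, and this is an illustration rather than a description of DCL's filters.

```python
import re
import xml.etree.ElementTree as ET

# Assumed, simplified paragraph form: "@StyleName:paragraph text".
# Real tagged-text exports contain much more than this sketch handles.
STYLE_PREFIX = re.compile(r"^@([^:]*):(.*)$")


def story_to_element(story_text, story_id):
    """Turn one exported story into a <story> element with <para> children."""
    story = ET.Element("story", id=story_id)
    current_style = "Normal"  # hypothetical default style name
    for line in story_text.splitlines():
        if not line.strip():
            continue  # skip empty paragraphs
        match = STYLE_PREFIX.match(line)
        if match:
            # A tagged paragraph switches the style for what follows.
            current_style = match.group(1) or "Normal"
            line = match.group(2)
        para = ET.SubElement(story, "para", style=current_style)
        para.text = line
    return story


def merge_stories(stories):
    """Combine separately exported stories under one <document> root."""
    root = ET.Element("document")
    for index, text in enumerate(stories, start=1):
        root.append(story_to_element(text, f"story{index}"))
    return root


# Two hypothetical stories, as if exported one at a time.
stories = [
    "@Heading:Chapter One\n@Body:First paragraph.\nSecond paragraph.",
    "@Caption:Figure 1. A sample caption.",
]
document = merge_stories(stories)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

Note that the merge order here is simply the order in which the stories are supplied. Nothing in the exported text says where a story sat on the page, which is one reason a person usually has to check the result.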

But that's not all you have to contend with. Perhaps the biggest problem with going from Quark to XML is converting tables. When you convert documents to XML, you'll ideally want to convert every table in the source document to a true table structure (such as the HTML or CALS table model). Unfortunately, the Quark program itself does not include a table editor. To simulate tables, many people simply use tabs and frames to achieve the look. This gets the job done for print, but it's not really a great solution, and it can cause tremendous problems when you decide to put your materials on the Web and need to convert.

What this means for conversion is that you don't necessarily know what is actually a table and what is not. What looks like a table on the printed page may turn out to be nothing more than a bunch of text separated by tabs, spaces, and forced spans. And if the materials were authored or formatted by multiple users at multiple locations (and they frequently are), everyone will have been making their own inconsistent decisions. What you're left with, effectively, is a "house of cards." If you're building software to help automate a large conversion project, you're stuck attempting to 'guess' at what is a table, as well as what the structure of that table really is. The result? While logic is king in the world of conversion programming, you'll end up needing to apply that logic to files that were often formatted without any logic at all!

Over the years, DCL has built a suite of filters that help deal with these issues. By analyzing the text and tab structures in the input file, along with the use of specific Quark style names, our process gets us much of the way to where we want to be. Our methodology works quite well for simple and moderately complex tables in Quark, but for complicated tables or badly styled materials, we've learned to anticipate manual cleanup after the software pass.
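
The kind of analysis described above can be sketched in a few lines of Python. This is not DCL's filter suite, only a toy heuristic under stated assumptions: paragraphs arrive already paired with their style names, consecutive paragraphs containing tabs are treated as one candidate table, and a candidate is flagged for manual review when its rows have differing cell counts or use styles outside a known set. The style names `TableHead` and `TableBody`, the `Paragraph` and `TableGuess` types, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

# Hypothetical style names a house style might reserve for table rows.
TABLE_STYLES = {"TableHead", "TableBody"}


@dataclass
class Paragraph:
    style: str
    text: str


@dataclass
class TableGuess:
    rows: list = field(default_factory=list)
    needs_review: bool = False


def build_guess(run):
    """Split a run of tabbed paragraphs into cells and judge its reliability."""
    rows = [p.text.split("\t") for p in run]
    widths = {len(r) for r in rows}
    styled = all(p.style in TABLE_STYLES for p in run)
    # Ragged rows or unrecognized styles mean a person should check the guess.
    return TableGuess(rows=rows, needs_review=len(widths) > 1 or not styled)


def find_tables(paragraphs, min_rows=2):
    """Group consecutive tabbed paragraphs into candidate tables."""
    tables, run = [], []
    # The empty sentinel paragraph flushes a run that ends the input.
    for para in paragraphs + [Paragraph("", "")]:
        if "\t" in para.text:
            run.append(para)
            continue
        if len(run) >= min_rows:
            tables.append(build_guess(run))
        run = []
    return tables


def to_html_table(guess):
    """Serialize a candidate table using the HTML table model."""
    table = ET.Element("table")
    for cells in guess.rows:
        tr = ET.SubElement(table, "tr")
        for cell in cells:
            ET.SubElement(tr, "td").text = cell.strip()
    return ET.tostring(table, encoding="unicode")


sample = [
    Paragraph("Body", "Prices are listed below."),
    Paragraph("TableHead", "Item\tPrice"),
    Paragraph("TableBody", "Widget\t$4.00"),
    Paragraph("TableBody", "Gadget\t$9.50"),
    Paragraph("Body", "Contact details follow."),
    Paragraph("Body", "Name:\tJ. Smith"),
    Paragraph("Body", "Phone:\t555-0100\t(work)"),
    Paragraph("Body", "All prices are in US dollars."),
]
tables = find_tables(sample)
for guess in tables:
    print(guess.needs_review, to_html_table(guess))
```

In this sample the first run is consistently styled and evenly tabbed, so it is accepted as a table. The second run is tabbed body text with uneven rows, so it is flagged rather than silently converted. That flag is where the manual cleanup mentioned above would come in.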

It should also be noted that there are plug-ins to Quark (such as Tableworks) that let you build tables within Quark as a true table structure.  The advantage is that you’ll end up with more structured Quark files.  Unfortunately, these plug-ins still don't allow you to export the table structure, so even in these cases, you’ll end up playing the 'guessing' game.

The Last Word?

Quark has recently announced a tool (called Avenue) that attempts to export from Quark to XML. I believe that this tool will be usable for very simple documents, or for ones that are particularly well styled in the input file. Is it the ultimate solution? Probably not. Based on our experience with conversion from Quark, it's still likely that tricky conversion features, such as cross-referencing, special characters, and tables, will be hard to handle with general-purpose tools and will still need a customized conversion.

Michael Gross
Director of Research & Development
Data Conversion Laboratory
Phone: 718-357-8700 x 236
Fax: 718-357-8776
mikegross@dclab.com

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.