 Part Two in a Series on SGML to SGML ...

 

ABSTRACT:  As the Internet evolves and information becomes increasingly valuable, competition for the revenue from that information grows as well.  As publishers and societies seek new ways to generate revenue (and maintain existing sources), repurposing data for licensing becomes a critical part of the overall strategy.  To that end, SGML/XML is the way to go, but which SGML?  And what about multiple SGMLs?

In our last article, we discussed some of the problems that typically occur in SGML to SGML conversions, making them more complex. In this article, we’d like to discuss some ways you can set up your initial DTD so as to make it easier to port your content to other DTDs later.

1)  Try not to discard text from the source material, because you may need to put it back - This is absolutely critical, but we’ve found it to be a recurring issue. As we discussed previously, the level at which a paragraph is decomposed into a series of content tags can vary greatly from DTD to DTD. Take the bibliographic entries at the end of a scholarly journal article. One DTD may require these entries to be completely decomposed into author, article title, journal title, volume, page range, etc., discarding the punctuation that exists between components, while another may require only the authors’ names and article title to be decomposed, leaving the rest of the entry as it appeared in hardcopy. From a structured markup perspective, discarding the punctuation makes sense (since it carries ‘how it looks’, not ‘what it is’ information). In reality, however, you may need to convert to a less structured output that requires these entries to be fully punctuated. If your initial DTD required that this punctuation be removed, you’ll have to write considerably more complicated code to convert from the first DTD to the second.

The solution is quite simple, although current publishing and markup practices don’t allow for it: preserve the punctuation inside special tagging. This way, it’s there if you need it, and it can be ignored if you don’t. Take another example: you’ve got a cross-reference to a figure. The source file might contain text like “see Fig. 5” or “see Figure 5”, but some DTDs require the actual ‘Fig.’ or ‘Figure’ text to be removed. Here again, if you hold this text inside its own tag, it will be there to put back when you convert to a DTD that requires it.
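As a minimal sketch of this idea, assume a hypothetical <punct> element (our own illustrative name, not taken from any particular DTD) that wraps the punctuation between the components of a bibliographic entry. The entry itself is made up, and it is written as well-formed XML so that Python's standard xml.etree.ElementTree can parse it. The same source then yields either the fully punctuated text or only the structured fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <punct> and the other element names are illustrative,
# not taken from any real DTD. <punct> holds the punctuation from the hardcopy.
ENTRY = (
    "<bib>"
    "<author>Smith J</author><punct>. </punct>"
    "<atitle>Tagging for reuse</atitle><punct>. </punct>"
    "<jtitle>J Markup</jtitle><punct> </punct>"
    "<volume>12</volume><punct>:</punct>"
    "<pages>34-56</pages><punct>.</punct>"
    "</bib>"
)


def punctuated_text(entry_xml: str) -> str:
    """Return the entry as it appeared in print, punctuation included."""
    return "".join(ET.fromstring(entry_xml).itertext())


def structured_fields(entry_xml: str) -> dict:
    """Return only the content fields, ignoring the preserved punctuation."""
    root = ET.fromstring(entry_xml)
    return {child.tag: child.text for child in root if child.tag != "punct"}


print(punctuated_text(ENTRY))
print(structured_fields(ENTRY))
```

A target DTD that wants fully punctuated entries can use the first function's output, while a more structured target simply ignores the <punct> elements.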

2)  Avoid nontraditional approaches - Because you may eventually need to move to another DTD that does not offer as much flexibility as the one you’re currently designing, try to avoid approaches that may be difficult to implement in other applications.

Here are two simple examples, both related to special characters. The first is an approach we’ve seen in which tagging is applied around a character to produce a diacritical above it, such as <acute>A</acute>, instead of using standard ISO characters where possible, such as &Aacute;. Although this tagging approach does allow you to produce more types of diacritical characters, it is likely that a target DTD will not allow an equivalent, making the conversion task much more difficult. The second is one in which attributes are used to store actual document text, for example <chapter title="Introduction">. In these situations, special emphasis or formatting is difficult to apply, and we’ve seen DTDs that overcome this by ‘inventing’ emphasis character entities, such as

<chapter title="Purifying H&sub;2&esub;O">

Here, &sub; and &esub; use character entities in a nonstandard way, as start and end markers for a subscript rather than as characters, again adding complexity to conversion to other DTDs.

In order to yield maximum utility from your data, these types of workaround approaches are best avoided.
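To illustrate why the tagged-diacritic approach complicates conversion, here is a rough Python sketch that maps tags like <acute>A</acute> to ISO entity names. The tag names (acute, grave, uml) and the function name are assumptions made for illustration, and the entity table is a deliberately small subset of the ISO Latin 1 set. Anything outside the table cannot be converted automatically and is reported for manual review.

```python
import re

# Small illustrative subset of ISO Latin 1 entity names. A real conversion
# would need the full entity sets allowed by the target DTD.
ISO_LATIN1 = {"Aacute", "aacute", "Eacute", "eacute", "Agrave", "egrave", "Ouml", "uuml"}

# Hypothetical diacritic tags wrapped around a single letter.
TAGGED = re.compile(r"<(acute|grave|uml)>([A-Za-z])</\1>")


def tags_to_entities(text: str):
    """Replace tagged diacritics with ISO entities where one is known.

    Returns the converted text plus a list of tagged characters that have no
    entity in the table and were left untouched.
    """
    unresolved = []

    def replace(match):
        name = match.group(2) + match.group(1)  # e.g. "A" + "acute"
        if name in ISO_LATIN1:
            return f"&{name};"
        unresolved.append(match.group(0))
        return match.group(0)

    return TAGGED.sub(replace, text), unresolved


converted, leftovers = tags_to_entities("<acute>A</acute>vila, <acute>q</acute>")
print(converted)
print(leftovers)
```

The leftover list is the point of the sketch: every tagged character without a standard equivalent becomes a case the conversion has to handle specially.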

3)  Try to anticipate how other DTDs may require tagging - If possible, examine the approaches taken by other DTDs before you finalize your own. This will make later conversions much easier.

For example, some DTDs use empty cross-reference tagging in SGML (id/idref usage), such as: see <xref id="S35">Rules and Regulations.

In this case, the tagging displays a point that the reader can click to follow the cross-reference. The text around the link is not itself highlighted (which may satisfy the requirements of the source DTD). However, if you examine other DTDs or consider other approaches, you may be able to anticipate cross-reference linking that requires tagging around the whole piece of text being cross-referenced (i.e., See <xref id="S35">Rules and Regulations</xref>).

In this scenario the source SGML uses the first approach (an empty <xref> tag), which leaves the conversion software ‘guessing’ how much text the <xref> should cover. Significant effort will have to go into coding and tuning that guess, and we know from experience that the heuristic will be wrong some percentage of the time, so considerable manual review of the converted tagging will probably be needed. Clearly, it’s better to plan for this up front, if possible. Doing the extra tagging in advance will probably require significantly less effort than fixing it manually later.
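The kind of guessing involved can be sketched in a few lines of Python. The heuristic here is our own assumption, chosen only for illustration: the cross-referenced text is taken to be the run of capitalized words, optionally joined by short connectors such as "and" or "of", that follows the empty <xref> tag. The function name wrap_xrefs is likewise hypothetical.

```python
import re

# Assumed heuristic (for illustration only): the referenced text is a run of
# capitalized words, optionally joined by a few lowercase connectors.
XREF = re.compile(
    r'(<xref id="[^"]+">)'
    r"([A-Z][\w-]*(?: (?:(?:and|of|the|for)\b|[A-Z][\w-]*))*)"
)


def wrap_xrefs(text: str) -> str:
    """Turn empty <xref> tags into wrapping tags by guessing the extent."""
    return XREF.sub(r"\1\2</xref>", text)


# The guess is right here...
print(wrap_xrefs('see <xref id="S35">Rules and Regulations.'))
# ...but overshoots here, swallowing "and the" into the link text.
print(wrap_xrefs('see <xref id="S35">Rules and Regulations and the appendix.'))
```

The second call shows the failure mode described above: the heuristic cannot tell where the title ends, so the result needs manual review.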

While each SGML to SGML conversion will have its own set of obstacles to overcome, following these guidelines will help smooth the conversion effort. Sometimes, however, you may not have that luxury: the source DTD may already have been built, and you may not have the flexibility to change it to accommodate potential future DTDs. In such a case, consider first converting to a ‘super-DTD’ that allows for the features you may need later. From there, it’s comparatively easy to ‘dumb’ the SGML down to each of the other DTDs. This approach can be extremely helpful in producing final SGML outputs that are less prone to the errors and complications typical of automated SGML to SGML conversions.
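The down-conversion direction can be sketched as follows, under the assumption that the ‘super’ markup uses wrapping <xref> tags and a hypothetical <punct> element for preserved punctuation (both illustrative names, not from any real DTD). Because the richer markup already contains everything the simpler targets need, each step is a plain deterministic substitution with no guessing.

```python
import re

# Hypothetical "super" markup carrying both the full xref extent and the
# original punctuation.
RICH = 'See <xref id="S35">Rules and Regulations</xref><punct>.</punct>'


def to_empty_xref(text: str) -> str:
    """Down-convert wrapping <xref>...</xref> tags to the empty-tag style."""
    return text.replace("</xref>", "")


def drop_punct_tags(text: str, keep_text: bool) -> str:
    """Remove <punct> tags, keeping or discarding the punctuation itself."""
    return re.sub(r"<punct>(.*?)</punct>", r"\1" if keep_text else "", text)


# Target A: empty xref tags, punctuation kept as plain text.
print(to_empty_xref(drop_punct_tags(RICH, keep_text=True)))
# Target B: wrapping xref tags, punctuation discarded.
print(drop_punct_tags(RICH, keep_text=False))
```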

Michael Gross
Director of Research & Development
Data Conversion Laboratory
Phone: 718-357-8700 x 236
Fax: 718-357-8776
mikegross@dclab.com

 


Click here to see the preceding "What's The Story?" article on Converting SGML to SGML

Click here to see Mike's "What's The Story?" article -
Converting Quark to XML.

 
Copyright © 1997-2010  Data Conversion Laboratory, Inc. All rights reserved.