Part Two in a Series on SGML to SGML ...
ABSTRACT: As the Internet evolves and information becomes increasingly valuable, so too does the competition for revenues from that information. As publishers and societies seek new ways to generate revenue (and maintain existing sources), repurposing data for licensing becomes a critical part of the overall strategy. To that end, SGML/XML is the way to go, but which SGML? And what about multiple SGMLs?

In our last article, we discussed some of the problems that typically occur in SGML to SGML conversions and make them more complex. In this article, we’d like to discuss some ways you can set up your initial DTD to make it easier to port your content to other DTDs later.

1) Try not to dispose of text in the source material, because you may need to put it back – This is absolutely critical, but we’ve found it to be a recurring issue. As we discussed previously, the level at which a paragraph is decomposed into a series of content tags can vary greatly from DTD to DTD. Take the example of bibliographic entries at the end of a scholarly journal article. One DTD may require these entries to be completely decomposed into author, article title, journal title, volume, page range, etc., disposing of the punctuation that exists between components. Another may require that just the authors’ names and article title be decomposed, leaving the rest of the entry as it appeared in hardcopy.

From a structured markup perspective, disposing of the punctuation makes sense (since it carries ‘how it looks’ information, not ‘what it is’ information). In reality, however, you may need to convert to a less structured output that requires these entries to be fully punctuated. If your initial DTD required that this punctuation be removed, you’ll have to write considerably more complicated code to convert from the first DTD to the second, because the converter must regenerate punctuation that is no longer in the data. The solution is quite simple, although current publishing and markup practices don’t usually allow for it.
Basically, it makes sense to preserve the punctuation inside of special tagging. This way, it’s there if you need it, and it can be ignored if you don’t. Take another example: you’ve got a cross-reference to a figure. The source file might contain text like “see Fig. 5” or “see Figure 5”. But some DTDs require the actual ‘Fig.’ or ‘Figure’ text to be removed. Here again, if you can hold this text inside its own tag, it will be there to put back if you’re converting to a DTD that requires it.

2) Avoid nontraditional approaches – Because you may eventually need to go to another DTD that does not offer as much flexibility as the one you’re currently designing, try to avoid approaches that may be difficult to implement in other applications. Here are two simple examples, related to special characters.

The first is an approach that we’ve seen used to apply tagging around a character in order to produce a diacritical above it, such as <acute>A</acute>, instead of using standard ISO character entities where possible, such as &Aacute; (Á). Although this tagging approach does allow you to produce more types of diacritical characters, it is likely that a target DTD will not allow an equivalent, making the conversion task much more difficult.

Another example is one in which attributes are used to store actual document text. For example: <chapter title="Introduction">. In these situations, special emphasis or formatting is difficult, and we’ve seen DTDs which overcome this by ‘inventing’ emphasis character entities, such as <chapter title="Purifying H&sub;2&esub;O">. Here, &sub; and &esub; make use of character entities in a nonstandard way, again adding complexity to conversion to other DTDs. In order to get maximum utility from your data, these types of workaround approaches are best avoided.
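To make the cost of the attribute workaround concrete, here is a minimal Python sketch of what a converter has to do when a target DTD wants the chapter title as element content with real subscript tagging. The <title> and <sub> element names and the function name are assumptions made for illustration; they are not taken from any particular DTD.

```python
import re

# The paired start/end pseudo-entities described in the text above.
SUB_SPAN = re.compile(r"&sub;(.*?)&esub;")


def title_attribute_to_element(title_value):
    """Rewrite a title stored in an attribute as element content.

    Each paired &sub; ... &esub; becomes <sub> ... </sub>. The <title>
    and <sub> element names are assumed, not from a real DTD.
    """
    # Anything left over after removing well-formed pairs is unbalanced.
    leftover = SUB_SPAN.sub("", title_value)
    if "&sub;" in leftover or "&esub;" in leftover:
        raise ValueError("unpaired &sub;/&esub; in: " + title_value)
    return "<title>" + SUB_SPAN.sub(r"<sub>\1</sub>", title_value) + "</title>"


print(title_attribute_to_element("Purifying H&sub;2&esub;O"))
# prints: <title>Purifying H<sub>2</sub>O</title>
```

Even this small case needs a pairing check that ordinary character entities never require, which is the extra conversion complexity the guideline warns about.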
3) Try to anticipate how other DTDs may require tagging – If possible, examine other approaches or try to anticipate how other DTDs may require tagging. This will make the later conversion much easier. For example, some DTDs use empty cross-reference tagging in SGML (id/idref usage), such as: see <xref id="S35">Rules and Regulations. In this case, the tagging displays a point that one can click on to follow the cross-reference link. The text around the hyperlink is not highlighted (which may satisfy the requirements of the source DTD).

However, if you examine other DTDs or consider other approaches, you may be able to anticipate cross-reference linking approaches that require tagging around the whole piece of text being cross-referenced (i.e., See <xref id="S35">Rules and Regulations</xref>). In that scenario, the source SGML tagging uses the first approach (an empty <xref> tag), leaving the conversion software ‘guessing’ how far the <xref> text should extend. This guessing means that significant effort will have to go into the coding to optimize the guessing process. We know from experience that some percentage of the time, the heuristic used for the guessing will be wrong. Therefore, we’ll probably have to do considerable manual review of the converted tagging. Clearly, it’s better to plan for this up front, if possible. Doing the extra tagging in advance will probably require significantly less effort than adding it manually later.

While each SGML to SGML conversion will have its unique set of obstacles to overcome, considering some of these guidelines will help smooth the conversion effort. Sometimes you may not have that luxury. Think of a situation where the source DTD has already been built, and you don’t have the flexibility to change it to help you deal with potential future DTDs.
In such a case, you may consider first converting to a ‘super-DTD’ that allows for the features you may need later when converting to other DTDs. Then, it’s fairly easy to ‘dumb’ the SGML down to each of the other DTDs. This approach can be extremely helpful in allowing you to produce final SGML outputs that are less prone to the errors or complications typical when trying to automate SGML to SGML conversions.

Michael Gross
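As a rough illustration of the super-DTD idea, the Python sketch below holds one bibliographic entry in a richer form that keeps the punctuation in its own tags, then derives two simpler outputs from it. Everything specific here is an assumption: the element names (bibentry, author, atitle, jtitle, volume, pages, punct) are invented, the entry is placeholder data, and XML syntax stands in for SGML because Python's standard library parses XML.

```python
import xml.etree.ElementTree as ET

# Placeholder entry in a hypothetical super-DTD: every component is
# tagged, and the punctuation between components is kept in <punct>.
SUPER_ENTRY = (
    "<bibentry>"
    "<author>Doe, J.</author><punct>, </punct>"
    "<atitle>An Example Article</atitle><punct>. </punct>"
    "<jtitle>Example Journal</jtitle><punct> </punct>"
    "<volume>12</volume><punct>: </punct>"
    "<pages>34-56</pages><punct>.</punct>"
    "</bibentry>"
)

# Components the lightly tagged target still wants as elements.
KEEP_TAGGED = {"author", "atitle"}


def to_fully_decomposed(entry):
    """Target A: every component tagged, punctuation dropped."""
    out = ET.Element(entry.tag)
    for child in entry:
        if child.tag != "punct":
            ET.SubElement(out, child.tag).text = child.text
    return ET.tostring(out, encoding="unicode")


def to_lightly_tagged(entry):
    """Target B: author and article title tagged, the rest plain text.

    Assumes a flat entry with no text between the child elements.
    """
    out = ET.Element(entry.tag)
    last = None  # most recent element kept in the output
    for child in entry:
        if child.tag in KEEP_TAGGED:
            last = ET.SubElement(out, child.tag)
            last.text = child.text
        elif last is None:
            out.text = (out.text or "") + (child.text or "")
        else:
            last.tail = (last.tail or "") + (child.text or "")
    return ET.tostring(out, encoding="unicode")


entry = ET.fromstring(SUPER_ENTRY)
print(to_fully_decomposed(entry))
print(to_lightly_tagged(entry))
```

Both outputs come from simple filtering of the richer source. Neither direction has to regenerate punctuation or guess at boundaries, which is why down-converting from a super-DTD is less error-prone than converting between two narrower DTDs directly.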
Data Conversion Laboratory, Inc. 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365 718-357-8700 convert@dclab.com Copyright © 1997-2008 Data Conversion Laboratory, Inc. All rights reserved.