The Perils of Converting a Lot of Data In-House

The Volume Problem

Most of us know how well Jack fared after he cut the beanstalk. After all, he
walked away with the goose that lays the golden egg. Every morning, another
golden egg would be waiting for him. Those eggs saved him and his mother from
poverty. Before long, they were contented suburban homeowners.
Forty days later, he had 1,048,576 geese to take care of and gold was so common
that nobody wanted it.

The lesson is simple: volume always complicates matters. Most recipes will
work if you double the ingredients. But try multiplying by 50 or 100 and all
you'll have is a mess in the kitchen and a big room full of hungry people.

The SGML Expert

High technology is no exception to the problem of volume. Consider Gus, for
example. He is Acme Corporation's resident SGML expert, hired as part of Acme's
initiative to have all of its product documentation stored as SGML. Gus is a
technical wizard. He designed a DTD for Acme in two weeks, and proudly shows off
chapter 1 of the Acme Dustscraper Repair Manual, which he tagged himself in just
one day. A commendable effort, but there are 10 chapters in the Acme Dustscraper
Repair Manual and Acme has 100 manuals. It would take Gus over 4 years to get
all that documentation into SGML. Even if Acme could wait 4 years, they need Gus
for other things. After all, he's crucial to ramping up the rest of the company
to the new SGML system.

Gus Days

So far we've determined that having Gus convert all the data is unacceptable.
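
The four-year figure can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope illustration using the numbers quoted for Acme. The function names and the 250-working-days-per-year figure are assumptions made for this example, not anything taken from Acme or DCL.

```python
# Back-of-the-envelope effort estimate using the figures quoted in the article.
CHAPTERS_PER_MANUAL = 10      # from the article
MANUALS = 100                 # from the article
DAYS_PER_CHAPTER = 1          # Gus tagged chapter 1 in one day
WORKING_DAYS_PER_YEAR = 250   # assumption: 50 weeks x 5 days


def total_effort_days(chapters_per_manual, manuals, days_per_chapter):
    """Total effort in person-days (here, Gus-days)."""
    return chapters_per_manual * manuals * days_per_chapter


def elapsed_working_days(effort_days, workers=1):
    """Working days needed if the effort splits evenly across workers.

    Assumes perfect parallelism with no training or coordination overhead.
    """
    return effort_days / workers


effort = total_effort_days(CHAPTERS_PER_MANUAL, MANUALS, DAYS_PER_CHAPTER)
years_alone = elapsed_working_days(effort) / WORKING_DAYS_PER_YEAR

print(f"Total effort: {effort} Gus-days")
print(f"Gus working alone: {years_alone:.1f} years")
```

With 250 working days a year this comes to 4.0 years, and allowing for holidays and vacation pushes it past four. Passing a larger `workers` value shows how the elapsed time shrinks when the work is divided, but only under the idealized assumption that splitting the work costs nothing.
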
But what are the other options? Well, the work can be divided up among Acme's
staff, or temporary employees can be hired specifically for this project. Before
we make any such decisions, however, it's important to determine just how much
effort is involved.

About 1,000 chapters need to be converted. It takes Gus one day to tag a
chapter. We can therefore assume an effort of 1,000 Gus-days (the four years
mentioned above). So, hire 100 Guses and you'll be done in two weeks. Easy!

Except for the volume problem. Where are you going to find 100 SGML experts
who are willing to work for only two weeks? And even if you could, can you
afford to pay 100 people what you're paying Gus? And when you do hire them, how
are you going to get all 100 to tag the data the same way? Everyone will have
his or her own interpretation. The only way to get usable SGML from these experts
is to have Gus train them in his DTD.

Aha! If you're going to need training anyway, hire unskilled or
semi-skilled workers at one-third the cost of Gus. That's fine, but it will take
them three times as long, so the savings disappear.

The point is, what works for low volume doesn't work for high volume. New
solutions are required.

Software

An automated solution is ideally suited for high volumes of data. The
computer is about 1,000 times faster than Gus. You've finally solved the volume
problem. All you have to do is find or develop software that will completely and
accurately convert your data to SGML.

Guess what? You'd have an easier time cloning Gus than getting such a
program. Why? Because this isn't just a conversion. You are adding structure to
your documents, which requires inference and subjective decision-making.

The Best of Both Worlds

Ah, but surely the computer can do most of the grunt work and then Gus can
fix it up afterwards. Yes, combining automation with expert review seems to be
the best approach. But only if it's done right.

If you do enough damage to your car, the insurance company will give you
money to buy another one rather than fix the one you have. Similarly, fixing
cookie-cutter SGML can actually take longer than tagging it by hand. It's clear
that one key to a successful conversion is to automate as much as you can as
cleanly as you can.

Here is where Acme makes a frightening discovery: an SGML expert is not a
conversion expert. Gus doesn't know how best to develop or configure a
conversion program. Why should he? That's like asking a race car driver to fix
your car: it's simply a different field of expertise.

What Does a Conversion Expert Do?

Conversion is not a standard field of knowledge. As far as I know, there are
no degrees available: the most reliable indicator of expertise is a track
record. So, even though there is no universally accepted methodology, I can
cover some guiding principles used at DCL for managing a large conversion.

Standardization

Large volumes require standardization to prevent chaos. Otherwise, different
interpretations will generate inconsistent results. DCL implements
"conversion specifications," which detail every element in a document
and how it should be coded in the new format. These specifications are used as a
standards document throughout the project. Also, DCL uses a project team
approach, with one data analyst per project. This analyst is solely responsible
for interpreting how data should be coded. All exceptions to the written rules
are brought to this analyst. Even details such as file naming conventions are
standardized, because the smallest discrepancy can snowball at large volumes.

Customized Software

One key to successfully using conversion software
is to customize it. DCL has developed its own suite of conversion filters that
it configures to the specifications of each project. It has even created its own
generic intermediate formats. These robust "hub" formats divide the
conversion in half so that changes in specs require only partial rework of data
that's already been converted.

Quality Control

As discussed earlier, it is crucial to minimize the amount of cleanup
necessary after the conversion is finished. While it is true that DCL's editors
know nothing about Acme Dustscrapers, they know plenty about SGML (and all the
other standard electronic formats). These editors parse the new SGML and then do
a "format review." This second review is necessary because parsed SGML
is not necessarily correct SGML.

The SGML is filtered into a viewing package. Tags, which require slow,
tedious checking, are converted to visual cues. It then becomes immediately
apparent to an editor if something is tagged right or not, simply by comparing
it to the original hard copy.

Customer Feedback

The most critical element of quality control is customer feedback. DCL keeps
the entire conversion process open to Acme, so that a misunderstanding doesn't
result in thousands of mistagged pages. Normally, two samples are provided to
the customer before the volume work begins. These samples, along with the
conversion specifications, must be approved by the client at the start.

Once the conversion is underway, partial deliveries are sent to the client as
they are completed. This is more than just checking DCL's work. "Live"
data gives Acme a better understanding of how it will best implement new data on
its new system.

Experience

For most companies, conversion is a rare occurrence. Therefore, no past
experience exists to provide guideposts and warning signs. DCL has converted
millions of pages to and from every major format. Which brings us to our
conclusion.

No Surprises

Perhaps the most pernicious problem of large volumes is that the work
involved is impossible to predict. In other words, even if you do budget for all
the Gus-days you think you need, you might very well need more. This could lead
to disgruntled workers and even more disgruntled executives.

DCL has learned, through experience, to make its process flexible enough to
stay on schedule. Problems are either avoided or prepared for in advance.
Potential concerns are brought to the customer before they multiply. To put it
simply, you can get away with a little sloppiness when you have one goose, but a
million geese demand serious attention.

Your company is not set up to be a conversion house. I recommend you hire
someone who is. Otherwise, you just might lay an egg.