Installing a new content management system is only half the story; the other half is loading the content you’re going to manage. DCL’s Don Bridges reports.

Congratulations! You’ve decided on a new documentation system. Single-source publishing, here we come. Whether you’re in the early implementation stages or already installed, there are still details to resolve, and a big one is how to upload the content you want to manage.

First, the hard decision: how much of your old material do you really need? As with any move, the more you can toss out, the easier the move. But if you’ve been in business for a while and have ongoing projects, it would be nice to get all your live materials under the control of that new CMS. That requires some hard decisions about which materials you really need and which ones you’ll never use again.

Once you’ve decided what you need, the next step is to examine your options for getting the materials from your old system into the new one. What’s the best approach? How long will the conversion take? What will it cost? Do you do it yourself, or outsource? You may want to take a leaf out of Benjamin Franklin’s book when deciding which option is best for you: he would list the issues and the pros and cons of each, and then make his decision, giving the most important issues precedence.
In our experience, the key issues to consider are quality, schedule, cost, security, risk, and scalability.
Quality

The quality of a conversion depends on how accurate it is and how it deals with ambiguous elements such as tables and footnotes. Quality can be divided into two areas: textual accuracy and tagging accuracy.

Textual accuracy is an indication of the fidelity of the words that are presented. This is often an issue when the source materials are hardcopy (paper or PDF page images). Textual accuracy suffers when data has to be rekeyed or OCR’d, because both humans and computers make mistakes. The typical standard for hardcopy conversion is 99.95% textual accuracy. While this is very good, it still means that one out of every 2,000 characters can be wrong, so additional QC and review is frequently a good idea. The story is better for electronic conversions, where the materials are already available in a digital format. For these conversions, 100% accuracy is often obtainable when the conversion process accounts for all fields and preserves all of the text properly.

Tagging accuracy is more subjective; it is an indication of fidelity to the defined tagging scheme. It is much harder to measure, both because of ambiguities in the schema or DTD and because the materials were unlikely to have been written to fit the schema. In the XML world there is a temptation to equate tagging accuracy with whether a file validates against the DTD or schema. But just because a file parses doesn’t mean that it’s correct.

How do you maximize tagging accuracy? The most conventional approach is to combine computer scripts that perform an automated conversion with a manual clean-up that fixes everything the scripts did not convert correctly. Most conversion scripts handle easy material, such as paragraph tags, without a problem. The difficulty comes with tables, multi-column layouts, equations, cross-references, graphics, headers and footers, footnotes, and other such complexities.
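As a rough illustration of both halves of the quality question, the Python sketch below computes the expected number of wrong characters at a given textual accuracy, and shows a deliberately naive script that tags plain paragraphs automatically while routing table-like lines to a manual clean-up list. The section/para element names and the tab-or-pipes table heuristic are hypothetical, chosen only for this example; a real conversion would target your own DTD or schema. Note that the generated XML is well-formed even though the table row was skipped entirely, which is exactly why a file that parses is not necessarily correctly tagged.

```python
import re
import xml.etree.ElementTree as ET


def expected_char_errors(num_chars: int, accuracy: float = 0.9995) -> float:
    """Expected number of incorrect characters at a given textual accuracy."""
    return num_chars * (1 - accuracy)


# Hypothetical heuristic: a tab, or two or more pipe characters,
# suggests a flattened table row that a simple script should not guess at.
TABLE_ROW = re.compile(r"\t|\|.*\|")


def auto_tag(lines: list[str]) -> tuple[ET.Element, list[int]]:
    """Wrap simple paragraphs in <para>; return line numbers needing manual review."""
    root = ET.Element("section")
    needs_review: list[int] = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines only separate paragraphs
        if TABLE_ROW.search(text):
            needs_review.append(line_no)  # left for manual clean-up
            continue
        ET.SubElement(root, "para").text = text
    return root, needs_review


if __name__ == "__main__":
    # At 99.95% accuracy, a 2,000-character passage has about one error.
    print(round(expected_char_errors(2000), 2))

    source = ["Intro paragraph.", "", "Part\tQty", "Closing paragraph."]
    root, review = auto_tag(source)
    xml_text = ET.tostring(root, encoding="unicode")
    ET.fromstring(xml_text)  # parses fine, even though the table row is missing
    print(xml_text)
    print("Lines needing manual clean-up:", review)
```

Everything that lands on the review list is manual clean-up; the less a script can handle confidently, the longer that list grows.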
Most scripts have a really tough time getting these complex structures right. Another issue comes with contextual tagging (often referred to as content tags or implicit tags). When I write “123 Main Street”, almost all of us would know that this is an address and that “123” is the street number. But it’s a whole other matter to teach a program to recognize these nuances 100% of the time. Remember, whatever the script isn’t able to tag correctly will have to be cleaned up manually. (Contextual tagging will be the topic of a future newsletter article; to be informed when it appears, email us at convert@dclab.com.)

Schedule

The length of time a conversion takes will depend on (a) how well the automated conversion tags the materials, and (b) how many resources are available for the manual clean-up.

When speaking about legacy data conversion, I invariably ask members of the audience who have tackled a legacy data conversion project internally: “On average, how long did it take to clean up a page of your materials?” The typical response is four to six minutes per page. And that’s just clean-up. It doesn’t include the time it takes to analyze the data, create the scripts, and process the data, which is likely to be at least a few weeks.

At first glance, that may not seem too bad. But if you have a repository of 5,000 pages, at five minutes per page you are looking at roughly 417 hours of clean-up, or approximately three months of work for a single person. And of course, the more you have, the longer it takes: for 10,000 pages you’ve got a six-month project on your hands.

So how does this compare with outsourcing your conversion? The process might seem to move slowly at first, because there’s a premium on spending extra time making the automated steps work as well as possible, but it then speeds up considerably because a large labor pool is available to clean up the data.
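The schedule arithmetic in this section can be written down as two small functions. This is a minimal sketch under stated assumptions: the in-house figure counts clean-up only (analysis and scripting time are excluded, as in the text) and assumes about 32 productive hours per person per week, a hypothetical value that roughly reproduces the three- and six-month figures above. The outsourced estimate uses the six weeks of setup plus one week per 3,000 to 5,000 pages that DCL quotes for outsourced work in the paragraph that follows. The function and parameter names are illustrative only.

```python
def inhouse_cleanup_weeks(pages: int, minutes_per_page: float = 5.0,
                          staff: int = 1,
                          productive_hours_per_week: float = 32.0) -> float:
    """Calendar weeks of manual clean-up only (no analysis or scripting time).

    productive_hours_per_week is an assumed value, not a figure from the text.
    """
    total_hours = pages * minutes_per_page / 60
    return total_hours / (staff * productive_hours_per_week)


def outsourced_weeks(pages: int, setup_weeks: float = 6.0,
                     pages_per_week: float = 5000.0) -> float:
    """Setup and testing period, then one week per `pages_per_week` pages."""
    return setup_weeks + pages / pages_per_week


if __name__ == "__main__":
    for pages in (5000, 10000):
        inhouse = inhouse_cleanup_weeks(pages)
        # Bracket the vendor estimate with the fast (5,000) and slow (3,000) rates.
        fast = outsourced_weeks(pages, pages_per_week=5000)
        slow = outsourced_weeks(pages, pages_per_week=3000)
        print(f"{pages:>6} pages: in-house ~{inhouse:.0f} weeks, "
              f"outsourced {fast:.0f} to {slow:.1f} weeks")
```

For 5,000 pages this gives about 13 weeks in-house versus 7 to 7.7 weeks outsourced; for 10,000 pages, about 26 weeks versus 8 to 9.3 weeks. Raising `staff` shortens the in-house estimate proportionally, but each added person brings hiring, training, and management costs that this arithmetic ignores.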
In our experience, the average outsourced project needs six weeks to work out and test the process, and then one week for each 3,000 to 5,000 pages. So for the same 5,000 pages you can typically expect a seven-week effort, and for the 10,000-page project about eight weeks. That can make all the difference if schedule is important and you’re trying to hit your implementation goals sooner rather than later.

Cost

The cost of a conversion project varies widely depending on the complexity and readability of the material, as well as on whether you are converting in-house or outsourcing. Most people expect an internal effort to be cheaper, but this is not usually the case. Although there are some startup costs, the software packages and hardware you use are a negligible part of the total. The real expense is the labor to clean up the materials and the cost of managing the process. If you are doing the conversion in-house, remember that you are taking staff away from their usual work; if you are hiring people for the project, there are real costs associated with that too. If you are hiring temps, realize that people already fluent in the intricacies of a given target format are hard to find, and temps who need to be trained often require significant management resources. In an upcoming issue of DCLnews we’ll introduce a calculator to help you estimate your own relative costs. (To be notified when it is available, email convert@dclab.com.)

Security

Many organizations have sensitive or classified materials that they wish to convert. If you outsource this material, it is therefore vital that the organization doing the work has strict safeguards in place for sensitive materials. It should also have previous experience with information from ultra-competitive markets such as pharmaceuticals, high-tech, or finance, or from ultra-sensitive markets such as the military, legal, or nuclear utilities.
If you can find a company that has dealt with these kinds of materials, it will understand your concerns and have the safeguards in place to keep your materials secure at all times.

Risk

Reducing your risk comes down to the competency and experience of those undertaking the conversion. If this is the first time a particular type of conversion has been attempted, your risk is likely to increase. If you are looking to minimize risk, it helps to use a proven approach. Most organizations that attempt a do-it-yourself approach are driven either by cost or by an innate desire to have more control of the process. While these are commendable reasons, what’s often missed is the risk factor: the process may take longer than predicted, and the amount of manual clean-up may be significantly more than expected. Experience goes a long way towards mitigating risk. If risk is a concern, take a candid view of your own experience, and your vendor’s, with this type of project.

Scalability

The final issue is scalability. Can you run a production conversion the same way you ran your pilot conversion? Sure, you can stay up all night and crunch through a hundred pages with stellar results. But is that the process you would want to use for thousands of pages? To gain any degree of scalability, you want to automate as much of the process as you can. This is where you will want to leverage computers, because once they are configured correctly they work very fast and very cheaply. For a scalable process, you need the automated tagging to be as accurate and complete as possible.

The above covers the key factors to consider when deciding how to get a conversion project done. Answering these questions honestly will go a long way towards ensuring that your migration to a new content management system stays on schedule and is free of hitches.
DCLnews Editorial
Data Conversion Laboratory, Inc., 61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365, 718-357-8700, convert@dclab.com
Copyright © 1997-2008 Data Conversion Laboratory, Inc. All rights reserved.