Industry Outlook is a regular Data Center Journal Q&A series that presents expert views on market trends, technologies and other issues relevant to data centers and IT.
This week, Industry Outlook talks with Mark Gross, President of Data Conversion Laboratory (DCL), about the importance of digitization and choosing a flexible format for storing content. Mark is a recognized authority on XML implementation and document conversion. Prior to DCL, he was with the consulting practice of Arthur Young & Co. Mark has a BS in engineering from Columbia University and an MBA from New York University. He has also taught at the New York University Graduate School of Business, the New School and Pace University. He is a frequent speaker and writer on the topic of automated conversions to XML.
Industry Outlook: How has digitization evolved, and what is its role today?
Mark Gross: I’ve been in the content-conversion and digitization business for over 35 years. In that time, I’ve seen quite a bit of progress and innovation, as well as a lot of change that directly affects the way I’ve had to approach my business. Spanning those decades, milestones such as the birth of desktop publishing, the emergence of formats such as SGML and XML, increasingly available and inexpensive data communication, and the arrival of big data (along with everything that term can mean) have all advanced, and drastically changed, how people work with content.
Talk of all this progress raises the question of how, in the midst of drastic change, one’s content can keep up while maintaining its integrity and accessibility. The simple answer is digitization. But inside that one-word answer are numerous details and lessons. Digitization is more than taking paper and turning it into text. For the present and most certainly into the future, content needs to be easily findable and accessible across most if not all devices and platforms, and also transformable to meet future needs.
IO: More and more mobile devices are hitting the market. Is it possible to convert content to a format that will suit them all?
MG: It’s difficult to foresee the details of what future technology holds, as new developments arise constantly. But you can “future-proof” your content by marking it up in a structured, standard format, retaining as much “smart” tagging as possible. That way, when mobile technology changes to require new formats, you already have your content in a format that’s easily adaptable.
For example, math is particularly difficult to display uniformly across multiple devices, as not all of them support mathematical rendering in the same way. Although MathML has become a standard on many computers, it remains unavailable on many mobile devices. The easy solution is to capture math as images, but in five years, doing so may be insufficient. That’s just one case, but I believe the safest approach is to have all the sources in an easily convertible markup, even if it means converting the same content in multiple ways: math as images and as MathML, for instance. Alternative tagging lets you tag the same content in multiple ways within the same document, so the reading application or device can choose which one to display. For example, you may have a video in your content, but when it’s opened on a device that doesn’t yet support video, a message will appear indicating exactly that.
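To make the idea of alternative tagging concrete, here is a minimal sketch in Python. The element names (`formula`, `alternatives`, `graphic`), the file name `eq1.png` and the `supports_mathml` flag are all hypothetical, chosen only for illustration; they are not taken from DCL's tooling or from any particular schema. The sketch stores one equation both as MathML and as an image reference, and lets the reading side choose.

```python
import xml.etree.ElementTree as ET

# Hypothetical source markup: the same formula captured two ways.
SOURCE = """
<formula id="eq1">
  <alternatives>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
    </math>
    <graphic href="eq1.png" alt="E = mc^2"/>
  </alternatives>
</formula>
"""

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"


def pick_rendering(formula, supports_mathml):
    """Return (kind, element) for the best alternative the device can show."""
    alts = formula.find("alternatives")
    math = alts.find(MATHML_NS + "math")
    graphic = alts.find("graphic")
    if supports_mathml and math is not None:
        return "mathml", math
    # Fall back to the image; the MathML stays in the source for later use.
    return "image", graphic


formula = ET.fromstring(SOURCE)
kind, node = pick_rendering(formula, supports_mathml=False)
print(kind, node.get("href"))  # prints: image eq1.png
```

The point of the design is that the fallback decision is made at delivery time, so the richer MathML is never discarded from the source.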
Depending on the type of content, HTML5 may be versatile enough to store the details you need for the future, but with technical data, the more robust XML is likely the way to go. Once content is in these markup languages, it can be transformed to the needed format. Today, that format is likely the current version of EPUB, which has become the open standard format for e-reader platforms. But not all readers use EPUB, and new versions will come along as the technology develops. Even within the open EPUB standard, different platforms may read and display content differently. By maintaining content in a more robust format, you enable automatic generation of multiple formats, each optimized for a specific platform. For example, Apple has a beautiful way to display footnotes in a pop-up fashion, and standard EPUB3 tagging can handle that approach. But this same tagging in another EPUB reader will display the footnotes in a different, less readable way; if you are targeting the Kindle market, for instance, you will more likely want to group those footnotes at the end of each chapter.
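The footnote case can be sketched as one source rendered two ways. This is an illustrative Python example only: the source elements (`chapter`, `para`, `fn`) and the `popup_notes` switch are hypothetical, and the output is simplified markup rather than a complete, valid EPUB file. The `epub:type="noteref"` and `epub:type="footnote"` values are the EPUB 3 structural semantics commonly used for pop-up notes.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source: footnotes are tagged inline, where they are cited.
CHAPTER = """
<chapter>
  <para>XML separates content from presentation<fn id="n1">First note.</fn>.</para>
  <para>EPUB is one delivery format<fn id="n2">Second note.</fn>.</para>
</chapter>
"""


def render(chapter, popup_notes):
    """Render paragraphs plus notes, either pop-up style or grouped at the end."""
    body, notes = [], []
    for para in chapter.findall("para"):
        parts = [escape(para.text or "")]
        for fn in para.findall("fn"):
            number = len(notes) + 1
            ref_attr = ' epub:type="noteref"' if popup_notes else ""
            parts.append(f'<a{ref_attr} href="#{fn.get("id")}">{number}</a>')
            parts.append(escape(fn.tail or ""))
            notes.append((fn.get("id"), number, escape(fn.text or "")))
        body.append("<p>" + "".join(parts) + "</p>")
    if popup_notes:
        # Readers that support it can show these asides as pop-ups.
        tail = [f'<aside epub:type="footnote" id="{i}">{t}</aside>'
                for i, _, t in notes]
    else:
        # Plain end-of-chapter list for platforms without pop-up support.
        tail = ["<h2>Notes</h2>"] + [f'<p id="{i}">{n}. {t}</p>'
                                     for i, n, t in notes]
    return "\n".join(body + tail)


chapter = ET.fromstring(CHAPTER)
popup_version = render(chapter, popup_notes=True)
grouped_version = render(chapter, popup_notes=False)
print(grouped_version)
```

Because both outputs come from the same tagged source, adding a third platform would mean adding another rendering branch, not re-converting the content.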
IO: Isn’t this the same thing as responsive design?
MG: Although the concept is the same, responsive web design (RWD) is specific to websites: sites designed with RWD are flexible and adapt sensibly to whatever platform or device is viewing them. XML and semantic tagging, by contrast, apply across all types of content. Once content is in XML, it can be used for websites, different reader platforms and even print.
IO: How does converting content to a mobile-friendly format translate to new revenue streams for a B-to-B company?
MG: Content often has a long lifespan, but the format doesn’t. If you had stored your content in some of the formats that proliferated a decade or two ago, you would have needed to make a major investment to take advantage of the new technology that gave you new opportunities. Moving your content to a robust, flexible format allows you to retain your content assets in a form that lets you quickly create new products and distribute them to new audiences (or to old audiences with new devices and new capabilities). For maximum flexibility, we suggest to most of our customers that they maintain their content in XML, the lingua franca of the content world. Maintaining content in XML allows you to convert your content more easily to almost any new technology standard and to reach new target audiences. Doing so in a modular fashion, which XML facilitates, allows you to create new products by enabling content reorganization, so you can pick and choose the materials that are of interest to new audiences.
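As a rough illustration of picking and choosing modular content, the Python sketch below assembles a new product from a tagged library. The `library`, `module` and `product` elements and the `audience` attribute are hypothetical names invented for this example; a real content model would define its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical content library: each module is tagged with its audiences.
LIBRARY = """
<library>
  <module id="m1" audience="engineer"><title>Wiring schematics</title></module>
  <module id="m2" audience="operator"><title>Daily checklist</title></module>
  <module id="m3" audience="engineer operator"><title>Safety notices</title></module>
</library>
"""


def build_product(library, audience, name):
    """Collect every module tagged for the given audience into a new product."""
    product = ET.Element("product", {"name": name})
    for module in library.findall("module"):
        if audience in module.get("audience", "").split():
            product.append(module)
    return product


library = ET.fromstring(LIBRARY)
guide = build_product(library, "operator", "Operator field guide")
print([m.findtext("title") for m in guide])
# prints: ['Daily checklist', 'Safety notices']
```

The selection works only because the audience information was captured as markup in the source; content stored as flat pages would need manual sorting.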
IO: What about companies that produce highly complex content—mathematical equations, schematics and so on? How can they ensure accuracy of their results?
MG: Mobile technology today is seldom up to the task of fully supporting complex content such as math and tables, which is why for complex content it’s particularly important to implement a more robust tagging scheme such as XML. You may need to temporarily “dumb down” your content to fit the specific mobile device (for example, instead of the actual math formulas, you may require images of the formulas), but XML will still retain the original materials with all their tagging and allow you to quickly take advantage of new technology as it becomes available.