Got XML? Thanks to the efforts of standards groups like the W3C, which have diligently created and supported guidelines for data formats used on the Web, XML (Extensible Markup Language) is now the most commonly accepted syntax for creating documents for online use, whether on the public web or internally. Essentially, XML gives programs and machines a portable way to share data across platforms. But now that you have an XML repository, what are you going to do with it? How might you deploy it? Simply having large masses of data converted to XML doesn't necessarily mean that the data is useful in that form.
Enter XQuery. XQuery makes it possible to extract data intelligently from both real and virtual documents, and thereby enables interaction between the web world and the database world. Its main benefit is the ability to search and access XML files like databases, and to extract and manipulate data from XML documents or from any data source that can be rendered as XML. Norm Walsh, Principal Technologist at Mark Logic Corporation, describes how the combination of XQuery and XML provides the flexibility and functionality required to manage and optimize an XML repository.
The first step in leveraging a company's information assets is to transform them into XML. XML provides the open, uniform platform on top of which sophisticated applications can be built to deliver dynamic content across multiple systems, enabling better identification and sharing of both external and internal content. XML allows the full richness of content (or "unstructured data", if you prefer) to be maintained while still providing a description of how the data is structured. Different data types use different structures. Books have titles, parts, chapters, and paragraphs; purchase orders contain dates, addresses, items, and prices; scientific journals have articles, titles, paragraphs, tables, figures, and images.
You get to decide what structure best reflects the information that your organization needs in its documents. Graphics, media, and other resources can be stored alongside the XML text. One important observation is that while most companies have a wealth of information that either is or could be in XML, analysts estimate that as much as 70% of total corporate data is still unstructured. Moreover, since not all content is the same, a solution that forces all content into a single structure is either impossible to manage or requires a structure so loose that it carries little useful information. Of course, even if you could fit all of today's information into a single structure, or even a small number of structures, new information would inevitably arrive tomorrow, so flexibility is also a key requirement for managing data and data structures. XML gives you that flexibility.
But no matter how much the virtues of XML are extolled, at the end of the day a big pile of XML is just that, a big pile of XML. XML isn't going to do the job all by itself. Once an organization's content has been made accessible, tools are required to take advantage of that content by enabling intelligent access, identification, and ultimately sharing and reuse. Enter XQuery, one of the best tools around to accomplish this.
So what is XQuery? "XML Query Language", or XQuery for short, is a World Wide Web Consortium (W3C) specification that provides flexible query facilities for XML content. XQuery extracts data from real and virtual documents and collections, both locally and on the Web, providing interaction between the Web world and the database world. It is a standardized way of searching through semi-structured data that is either physically stored as XML or virtualized as XML. The XQuery effort at the W3C is led by the XML Query Working Group, whose purpose is to develop open standards so that XML query evolves in a single direction.
The problem with other programming languages isn't that they can't process XML; it's that they can't process XML efficiently. Data has to be converted from XML to the language's native data structures. Once converted, it must be manipulated with functions that don't understand the underlying model and are, consequently, not always a good fit. This "impedance mismatch" causes confusion and can introduce errors. Finally, the programming language structures have to be converted back into XML. Each of these steps is tedious and time-consuming, and each introduces the possibility of errors. In a sophisticated application, this process may have to occur several times for each XML resource. XQuery's native data model, on the other hand, is XML, and XQuery's functions are designed to operate on XML. None of this conversion is necessary, and impedance mismatches don't occur. What's more, in the context of an XML Server, the fundamental efficiencies of XQuery are augmented by a powerful database and a sophisticated search engine that, like XQuery itself, have been designed from the beginning to operate uniquely on XML.
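To make the three-step round trip concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The purchase order, its element names, and the helper functions `xml_to_native`, `raise_prices`, and `native_to_xml` are all hypothetical, invented only for illustration; the point is simply how much conversion code surrounds one small change to the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; the element and attribute names are illustrative only.
ORDER_XML = """<order date="2010-01-15">
  <item sku="W-1"><name>Widget</name><price>2.50</price></item>
  <item sku="G-7"><name>Gadget</name><price>10.00</price></item>
</order>"""


def xml_to_native(xml_text):
    """Step 1: convert the XML into plain dicts and lists."""
    root = ET.fromstring(xml_text)
    return {
        "date": root.get("date"),
        "items": [
            {
                "sku": item.get("sku"),
                "name": item.findtext("name"),
                "price": float(item.findtext("price")),
            }
            for item in root.findall("item")
        ],
    }


def raise_prices(order, factor):
    """Step 2: manipulate native structures that know nothing about XML."""
    for item in order["items"]:
        item["price"] = round(item["price"] * factor, 2)
    return order


def native_to_xml(order):
    """Step 3: convert the native structures back into XML."""
    root = ET.Element("order", date=order["date"])
    for item in order["items"]:
        element = ET.SubElement(root, "item", sku=item["sku"])
        ET.SubElement(element, "name").text = item["name"]
        ET.SubElement(element, "price").text = f"{item['price']:.2f}"
    return ET.tostring(root, encoding="unicode")


# The whole round trip, just to apply a 10% price increase.
result = native_to_xml(raise_prices(xml_to_native(ORDER_XML), 1.1))
print(result)
```

Note that anything the conversion functions do not explicitly copy (extra attributes, comments, mixed content) is silently lost on the way back to XML, which is one way the mismatch described above can introduce errors.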
Consider a simple example. Suppose you have a large collection of documents that fall under some regulatory control process. You might have thousands of documents that apply to Alabama, thousands that apply to Alaska, and so on. However, you need to use this collection to create a new information product that covers regulations about widgets in Massachusetts. Simply selecting all of the documents that apply to Massachusetts isn't useful, because they won't all be about widgets. By the same token, a simple full-text search for widgets isn't useful, because it will find documents for every state. What you need is the ability to restrict the search to only those documents relevant to both widgets and Massachusetts. The powerful combination of XML and XQuery lets you do exactly this.
Additionally, an emerging category of products called XML Servers is now available. These products index all of the content put into the database, both the markup and the data. That means that selective queries like the one just described can be performed without any a priori knowledge of what sorts of queries you're going to need to perform. With this power in hand, organizations can build and deploy new applications that take information assets in directions never imagined before. Consider these three examples:
One prerequisite for all of these success stories is a large volume of XML content from which to draw material. But what if your company doesn't have a lot of XML? What if it isn't yet willing to commit to an XML-centric workflow right away? The good news is that most organizations today probably have more XML than they realize. As XML continues to grow in ubiquity, we see applications like Microsoft Word and Adobe's InDesign using XML as their native storage format. The best XML Servers, unlike other databases and search engines, can work with any schema. These products are ideally positioned to leverage this new content immediately. What's more, most XML Servers come with built-in conversion tools for other formats such as PDF, which allow content to be found and potentially repurposed without significant intervention. Organizations seeking an open, uniform platform on top of which to build sophisticated, content-centric applications should consider vendor solutions that have XQuery at their core.
DCLNews Editorial
Corporate office: 61-18 190th Street, 2nd Floor, Fresh Meadows, NY 11365 718-357-8700
Copyright © 1997-2010 Data Conversion Laboratory, Inc. All rights reserved.