Data Conversion Laboratory, Revolutionizing Publishing for the Digital Age 
GUEST ARTICLE

Why MathML Adds Value to STM Publishing
By Paul Topping,
President of Design Science, Inc.

Over the last decade, there have been several attempts to develop structured markup that adequately represents mathematical equations. MathML, an XML tagging vocabulary, has received significant attention and support over the last few years as a suitable markup language for mathematics.

In this article, Paul Topping, President of Design Science, Inc., highlights the benefits of using MathML to represent mathematical equations and shows how useful it can be when carried through to the web browser.

Introduction

STM (Scientific, Technical, Medical) publishers, like all for-profit businesses, are looking to increase profits by cutting costs, increasing sales, or both. To cut costs, they look to improved computer and software technology (e.g., XML-based workflows). Increasing sales is a tougher problem that requires a creative solution. Product differentiation is harder in STM publishing than in most other business areas because competitors have access to the same authors and subject knowledge, while the print medium leaves little scope for adding value.

The presence and success of the World Wide Web changes everything. With its ability to deliver information in many varied forms, it has the power to break the value-addition logjam. So far, STM publishers have taken advantage of the new medium mainly by using it to avoid the substantial cost of paper and printing. However, this approach does not add much value from the reader's perspective. Yes, a tree is saved, but surely the web has greater potential than that!

In this paper, we will show how value can be added by enriching online text with other kinds of information — mathematical meaning, in particular. While other media, such as video and sound, may be used to add value to web publishing, they are costly to produce and are not necessarily of interest to scientists and engineers. Authors normally do not include such media with their work and, therefore, STM publishers are solely responsible for adding them to the product. They usually do not have expertise in these technologies and must outsource such work, making it even more costly. On the other hand, since STM authors do supply mathematics with their work, incorporating mathematical meaning in web content mostly requires publishers to simply adjust their workflow so as not to discard it.

A New Medium: HTML+MathML

Math is missing from HTML ...

The most important document format for web publishing is, of course, HTML. While other media (e.g., graphics, video, sound) can be embedded in web pages and other kinds of documents (e.g., PDF, spreadsheets, word processing documents) can be delivered via the web, HTML is the glue that binds them all together. Although great strides have been made since HTML was invented in adding more powerful formatting facilities, HTML still has no facilities for formatting mathematical notation. This is a problem for STM publishers, as the presence of mathematical notation is one of the unique characteristics of STM content.

... until the invention of MathML!

The World Wide Web Consortium (W3C) [1] sets most of the standards for the web. In 1997, the W3C's Math Working Group finished the MathML 1.0 Specification (superseded in 2001 by MathML 2.0 [2]). MathML is one of several XML-based languages intended by the W3C to extend HTML. To learn more about MathML, visit the W3C's math home page [3] or see our articles, "MathML for Math and Science Communication" [4] and "A Gentle Introduction to MathML" [5].

Although MathML is useful in any XML-based exchange of mathematical information, the MathML community always hoped to see it displayed directly in web pages. As most observers of computer technology know, it is one thing to invent a standard and another to get software vendors to support it. Until recently, browsers offered no MathML support. But two events have occurred this year that combine to make HTML+MathML a viable platform for web publishing:

  • Netscape 7.0 [6], built on the open-source Mozilla browser [7], displays MathML natively.
  • Design Science released MathPlayer [8], which displays MathML in Microsoft Internet Explorer.

Adding Value to Content

While application of new software technology to the production process has great potential to cut costs, real savings are notoriously hard to realize. Adding value to the product, if it can be done, should be more attractive to the STM publisher.

PDF works but does not add much value

The first stage in publishing's transition to the web is to use some form of electronic paper. Adobe's PDF [9] as a content delivery medium exemplifies this approach. Its advantages include faithfulness to print and the ease with which it can be produced by an existing print workflow. It is perfect for delivering the electronic equivalent of print journals and books.

As good as PDF is, it has its disadvantages:

  • Its faithfulness to print makes it harder to read online. Its columns of text do not reflow to adjust for different browser environments.
  • The Acrobat Reader takes over the browser window, making PDF content less integrated with other web content.
  • The PDF format is limited in its ability to combine non-text media, such as video and sound.
  • Although PDF text can be manipulated by the reader, mathematical notation must be displayed as graphics, limiting its usefulness to the reader.

The bottom line on PDF is that, while it is cheap to produce and duplicates print media well, it does not take full advantage of the potential represented by the web.

STM readers (scientists, engineers, researchers, and educators) like to share

Scientists, engineers, researchers, and educators are the market for STM publications. Libraries may purchase them but they ultimately serve the same group. Unlike the readers of novels, whose sharing consists of the occasional book report, STM readers are driven by the desire to share information. Science and technology move forward by researchers making small steps and then reporting them to each other. Although words are by far the main medium for such information sharing, mathematics is also a key component. In many ways, the words are there to support the mathematics.

It is surprising, then, that publishers don't try to make the mathematical part of their content more useful to their readership. The reason, of course, is that today's STM readers are satisfied with just being able to read the math. However, just as scientists and engineers have adopted web searching as a primary tool, other web-enabled technologies and practices will soon become essential to their work, and the ability to work with the mathematics in publications will become important to STM readers.

Copying text is plagiarism, copying math just makes good sense

The ability to copy text is an important feature of most of the computer software we use, yet copying text from someone else's work is considered cheating, even in a community as devoted to sharing as the STM world. Scientists want their ideas disseminated as widely as possible, but they do not want their exact words duplicated (except in the context of a review, of course). Their attitude toward copying mathematics is different: like their ideas, the math is present in the publication as a base for others to work with and build upon. Math displayed as a graphic can be copied, but it can't be calculated, analyzed, or graphed. When MathML is used to display math in a web page, on the other hand, the meaning of the mathematics is available and all of these operations become possible. This is the essence of our claim that MathML adds value to STM publishing.

What is MathML?

MathML is tagged text, just like HTML

The fraction x/2 is represented in MathML as:

<math>
  <mfrac>
    <mi>x</mi>
    <mn>2</mn>
  </mfrac>
</math>

As you can see, MathML, like HTML, is somewhat verbose. Also like HTML, it can be typed in directly, but it is usually created with tools such as equation editors or converted from some other representation.
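Converters that start from another representation typically build the MathML element tree programmatically rather than pasting strings together. The sketch below illustrates the idea for the fraction above, assuming Python's standard xml.etree.ElementTree module; fraction_to_presentation_mathml and token_element are hypothetical helpers that handle a single fraction of two tokens only, whereas real equation editors and converters handle full expressions.

```python
import re
import xml.etree.ElementTree as ET


def token_element(text: str) -> ET.Element:
    """Wrap a single token in <mn> if it is a plain number, else in <mi>."""
    # Assumption: only unsigned integers and decimals count as numbers here.
    tag = "mn" if re.fullmatch(r"\d+(\.\d+)?", text) else "mi"
    elem = ET.Element(tag)
    elem.text = text
    return elem


def fraction_to_presentation_mathml(numerator: str, denominator: str) -> str:
    """Return Presentation MathML for numerator/denominator as a string.

    Hypothetical helper: handles one fraction of two tokens only.
    """
    math = ET.Element("math")
    mfrac = ET.SubElement(math, "mfrac")  # children: numerator, then denominator
    mfrac.append(token_element(numerator))
    mfrac.append(token_element(denominator))
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(fraction_to_presentation_mathml("x", "2"))
```

Run as a script, it prints the same markup as the fraction example above, on a single line and without indentation.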

MathML comes in two flavors, Presentation MathML and Content MathML

MathML consists of two sub-languages, Presentation MathML and Content MathML. Both kinds of MathML describe mathematical structure but with differing emphasis. The two sub-languages can be used separately or together.

Presentation MathML focuses on the formatting aspects of mathematical notation. The fraction example above uses Presentation MathML. The "mfrac" element specifies a particular notation, that of two sub-expressions separated by a horizontal or diagonal bar. Although this notation commonly means that the first sub-expression is divided by the second, only the notation to be used is specified by Presentation MathML.

Content MathML focuses on mathematical meaning. The following example uses Content MathML to express the mathematical operation commonly associated with the earlier example:

<math>
  <apply>
    <divide/>
    <ci>x</ci>
    <cn>2</cn>
  </apply>
</math>

Although this operation is commonly expressed in notation as a fraction, only the mathematical operation is being specified. Alternative notations exist for this operation (e.g., x ÷ 2).
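Because Content MathML records the operation rather than its layout, a program can compute with it directly, which is exactly what a graphic cannot offer. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the evaluate function and its OPERATORS table are hypothetical and cover only a handful of Content MathML operators (plus, minus, times, divide, power), not the full MathML 2.0 set.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Dict

# A few Content MathML operator elements mapped to Python functions.
# Illustrative subset only; MathML 2.0 defines many more operators.
OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divide": operator.truediv,
    "power": operator.pow,
}


def local_name(elem: ET.Element) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return elem.tag.rsplit("}", 1)[-1]


def evaluate(node: ET.Element, variables: Dict[str, float]) -> float:
    """Recursively evaluate a Content MathML element tree to a number."""
    name = local_name(node)
    if name == "math":
        return evaluate(node[0], variables)  # <math> wraps one expression
    if name == "cn":
        return float(node.text.strip())  # numeric constant
    if name == "ci":
        return variables[node.text.strip()]  # identifier: look up its value
    if name == "apply":
        op_elem, *args = list(node)  # first child names the operation
        op = local_name(op_elem)
        values = [evaluate(arg, variables) for arg in args]
        if op == "minus" and len(values) == 1:
            return -values[0]  # unary minus
        return reduce(OPERATORS[op], values)  # fold n-ary plus/times left to right
    raise ValueError(f"unsupported element: {name}")


CONTENT_MATHML = """
<math>
  <apply>
    <divide/>
    <ci>x</ci>
    <cn>2</cn>
  </apply>
</math>
"""

if __name__ == "__main__":
    tree = ET.fromstring(CONTENT_MATHML.strip())
    for x in (1.0, 2.0, 10.0):
        print(f"x = {x}: x/2 = {evaluate(tree, {'x': x})}")
```

Evaluating the expression over a range of x values, as the loop does, is also the first step toward graphing it.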

HTML+MathML as a Content Delivery Medium

What does MathML look like in a web browser?

The short answer is "like mathematics", of course. Here is a partial screenshot of MathML displayed in Internet Explorer using our MathPlayer software [8]:

[Screenshot: MathML rendered in Internet Explorer by MathPlayer]

It is important to note that the non-math text shown above is displayed using plain old HTML, allowing mathematical notation to be fully integrated into normal web pages. If that were all MathML could do, it would still be an improvement over PDF because it doesn't take over the entire page in the browser. Text and math can reflow to fit the width of the browser window, and if the user sets the browser to display text at a larger size, the math is displayed at that larger size as well.

But what else can my readers do with it?

Above, we claimed that MathML makes it possible for the reader to do more with the mathematics. Let's look at how this works:

When the reader right-clicks on a MathPlayer-rendered equation, a menu is displayed. The Copy MathML command copies the underlying MathML of the equation to the Windows clipboard, ready to be pasted into any program that accepts MathML. The latest versions of the two major computer algebra systems, Mathematica [10] and Maple [11], both accept MathML via the clipboard. MathML may also be pasted directly into the reader's favorite HTML editor for use in new content and into WebEQ [12], Design Science's popular MathML editor.
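Pasting works because the receiving program parses the MathML it is given. As a minimal sketch of such an import step, assuming the copied markup is Content MathML and that the clipboard text has already been read into a string (clipboard access is omitted), the hypothetical pasted_mathml_to_infix function below turns it into a fully parenthesized infix expression. This is not how Mathematica, Maple, or WebEQ actually import MathML; it only illustrates that the mathematical structure survives the trip through the clipboard.

```python
import xml.etree.ElementTree as ET

# Infix symbols for a few Content MathML operators (illustrative subset).
SYMBOLS = {"plus": "+", "minus": "-", "times": "*", "divide": "/", "power": "^"}


def local_name(elem: ET.Element) -> str:
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return elem.tag.rsplit("}", 1)[-1]


def to_infix(node: ET.Element) -> str:
    """Convert a Content MathML tree to a fully parenthesized infix string."""
    name = local_name(node)
    if name == "math":
        return to_infix(node[0])  # <math> wraps one expression
    if name in ("ci", "cn"):
        return node.text.strip()  # identifiers and numbers pass through
    if name == "apply":
        op_elem, *args = list(node)  # first child names the operation
        symbol = SYMBOLS[local_name(op_elem)]
        parts = [to_infix(arg) for arg in args]
        if symbol == "-" and len(parts) == 1:
            return f"(-{parts[0]})"  # unary minus
        # Parenthesize every application so precedence is never ambiguous.
        return "(" + f" {symbol} ".join(parts) + ")"
    raise ValueError(f"unsupported element: {name}")


def pasted_mathml_to_infix(clipboard_text: str) -> str:
    """Hypothetical import step: parse pasted MathML text, return infix."""
    return to_infix(ET.fromstring(clipboard_text.strip()))


if __name__ == "__main__":
    pasted = ("<math><apply><plus/><apply><power/><ci>x</ci><cn>2</cn></apply>"
              "<cn>1</cn></apply></math>")
    print(pasted_mathml_to_infix(pasted))
```

The fully parenthesized output trades readability for safety: the receiving program never has to guess operator precedence.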

MathML at your Command

MathPlayer 1.0 also has a Commands sub-menu on its right-click menu:

[Screenshot: MathPlayer's right-click menu with the Commands sub-menu]

In this first version of MathPlayer, the commands are limited to opening the equation in MathType [13] or WebEQ [12], Design Science's own products. These items will only appear if the reader has the corresponding software product installed on their computer.

In future versions of MathPlayer, we expect to add more commands to this menu that will allow the reader to calculate, graph, and analyze the math directly. Remember, this is already possible with version 1.0 by copying and pasting via the clipboard. Now that MathML can be published on the web, we expect more software vendors to add MathML support to their products in the near future. We are working with such vendors, as well as readers and publishers, to help define future items on MathPlayer's Commands menu.

Conclusions

We feel that STM publishers can increase the value of their products to readers by publishing online content in the HTML+MathML format. Scientists, engineers, educators, and students are driven by a need to share information with their colleagues. Mathematical notation is an important part of STM content, perhaps a defining characteristic. Recent advances in web browser technology have made HTML+MathML a viable delivery medium for STM content. All that remains is for publishers to take advantage of it.

11/6/2002
Paul Topping
Design Science, Inc.

Related Papers

In a related white paper, MathML Workflows in STM Publishing [14], we describe how MathML content can be created, edited, and published in the STM publishing context.

References

  1. World Wide Web Consortium (W3C), http://www.w3.org/
  2. MathML 2.0, http://www.w3.org/TR/2001/REC-MathML2-20010221
  3. W3C's math home page, http://www.w3.org/Math
  4. "MathML for Math and Science Communication", http://www.dessci.com/webmath/tech/mathml.stm
  5. "A Gentle Introduction to MathML", http://www.dessci.com/support/tutorials/mathml/default.stm
  6. Netscape 7.0 PR 1 (Preview Release 1), http://channels.netscape.com/ns/browsers/7/default.jsp
  7. Mozilla open-source browser project, http://www.mozilla.org/
  8. MathPlayer, http://www.dessci.com/webmath/mathplayer/
  9. Adobe's PDF (Portable Document Format), http://www.adobe.com/pdf
  10. Mathematica, http://www.wolfram.com/
  11. Maple, http://www.maplesoft.com/
  12. WebEQ, http://www.dessci.com/webmath/webeq/features.stm
  13. MathType, http://www.dessci.com/features/win/default.stm
  14. "MathML Workflows in STM Publishing", http://www.dessci.com/features/white_papers/mathml_workflows.htm

Data Conversion Laboratory, Inc.   61-18 190th St., 2nd Floor, Fresh Meadows, NY 11365   718-357-8700   convert@dclab.com

Copyright © 1997-2008  Data Conversion Laboratory, Inc. All rights reserved.