GUEST ARTICLE
Why MathML Adds Value To STM Publishing
By Paul Topping, President of Design Science, Inc.
Over the last decade, several attempts have been made to come up with a
structured markup to adequately represent mathematical equations. MathML, an
XML tagging vocabulary, has received a significant amount of attention and
support over the last few years as a suitable markup language for
mathematics.
In this article, Paul Topping, President of Design Science, Inc., highlights
the benefits of using MathML to represent mathematical equations and reveals
how useful it can be when carried through to the web browser.
Introduction
STM (Scientific, Technical, Medical) publishers, like all for-profit
businesses, are looking to increase profits by either cutting costs, increasing
sales, or both. To cut costs, they look to the application of improved computer
and software technology (eg, introducing XML-based workflows). Increasing sales
is a tougher problem requiring a creative solution. Product differentiation is
harder in STM publishing than in most other business areas because competitors
have access to the same authors and subject knowledge, while the print medium
does not allow much scope for adding value.
The presence and success of the World Wide Web changes everything. With its
ability to deliver information in many varied forms, it has the power to break
the value addition logjam. Initially, STM publishers have taken advantage of the
new medium by simply using it to avoid the substantial cost of paper and
printing. However, this approach does not add much value from the reader's
perspective. Yes, a tree is saved but surely the web has greater potential than
that!
In this paper, we will show how value can be added by enriching online text
with other kinds of information — mathematical meaning, in particular. While
other media, such as video and sound, may be used to add value to web
publishing, they are costly to produce and are not necessarily of interest to
scientists and engineers. Authors normally do not include such media with their
work and, therefore, STM publishers are solely responsible for adding them to
the product. They usually do not have expertise in these technologies and must
outsource such work, making it even more costly. On the other hand, since STM
authors do supply mathematics with their work, incorporating mathematical
meaning in web content mostly requires publishers to simply adjust their
workflow so as not to discard it.
A New Medium: HTML+MathML
Math is missing from HTML ...
The most important document format for web publishing is, of course, HTML.
While other media (eg, graphics, video, sound) can be embedded in web pages and
other kinds of documents (eg, PDF, spreadsheets, word processing documents) can
be delivered via the web, HTML is the glue that binds them all together.
Although great strides have been made since the invention of HTML to add more
powerful formatting facilities, HTML still has no facilities for formatting
mathematical notation. This is a problem for STM publishers as the presence of
mathematical notation is one of the unique characteristics of STM content.
... until the invention of MathML!
The World Wide Web Consortium (W3C) [1] sets
most of the standards for the web. In 1997, the W3C's Math Working Group
finished the MathML 1.0 Specification (superseded in 2001 by MathML 2.0 [2]).
MathML is one of several XML-based languages intended by the W3C to extend HTML.
To learn more about MathML, visit the W3C's
math home page [3] or see our articles, "MathML for Math and
Science Communication" [4] and "A Gentle
Introduction to MathML" [5].
Although MathML is useful in any XML-based exchange of mathematical
information, it was always the hope of the MathML community to see it displayed
directly in web pages. As most observers of computer technology know, it is one
thing to invent a standard but another to make software vendors support it. Up
until recently, MathML support within browsers has been absent. But two events
have occurred this year that combine to make HTML+MathML a viable platform for
web publishing:
Adding Value to Content
While application of new software technology to the production process has
great potential to cut costs, real savings are notoriously hard to realize.
Adding value to the product, if it can be done, should be more attractive to the
STM publisher.
PDF works but does not add much value
The first stage in publishing's transition to the web is to use some form of
electronic paper. Adobe's PDF [9] as a
content delivery medium exemplifies this approach. Its advantages include
faithfulness to print and the ease with which it can be produced by an existing
print workflow. It is perfect for delivering the electronic equivalent of print
journals and books.
As good as PDF is, it has its disadvantages:
- Its faithfulness to print makes it harder to read online. Its columns of
text do not reflow to adjust for different browser environments.
- The Acrobat Reader takes over the browser window making PDF content less
integrated with other web content.
- The PDF format is limited in its ability to combine non-text media, such
as video and sound.
- Although PDF text can be manipulated by the reader, mathematical notation
must be displayed as graphics, limiting its usefulness to the reader.
The bottom line on PDF is that, while it is cheap to produce and duplicates
print media well, it does not take full advantage of the potential represented
by the web.
STM readers (scientists, engineers, researchers, and educators) like to share
Scientists, engineers, researchers, and educators are the market for STM
publications. Libraries may purchase them but they ultimately serve the same
group. Unlike the readers of novels, whose sharing consists of the occasional
book report, STM readers are driven by the desire to share information. Science
and technology move forward by researchers making small steps and then reporting
them to each other. Although words are by far the main medium for such
information sharing, mathematics is also a key component. In many ways, the
words are there to support the mathematics.
It is surprising, then, that publishers don't try to make the mathematical
part of their content more useful to their readership. The reason, of course, is
that today's STM readers are satisfied with just being able to read the math.
However, just as scientists and engineers have incorporated web searching as a
primary tool, other web-enabled technologies and practices will soon become
essential to their work. The ability to work with mathematics in publications
will soon be important to STM readers.
Copying text is plagiarism, copying math just makes good sense
Although the ability to copy text is an important feature of most of the
computer software we use, copying text from someone else's work is considered
cheating even in the STM world. Although scientists want their ideas
disseminated as widely as possible, they do not want their exact words
duplicated (except in the context of a review, of course). Their attitude toward
copying mathematics is different. Like their ideas, the math is present in the
publication as a base for others to work with and build upon. Although math
displayed as a graphic can be copied, it can't be calculated, analyzed, or
graphed. When MathML is used to display math in the web page, on the other hand,
the meaning of the mathematics is available and all of these operations become
possible. This is the essence of our claim that MathML adds value to STM
publishing.
What is MathML?
MathML is tagged text, just like HTML
The fraction, x/2, is represented in MathML as:
<math>
<mfrac>
<mi>x</mi>
<mn>2</mn>
</mfrac>
</math>
As you can see, MathML is somewhat verbose, like HTML. Also like HTML, it can
be typed in directly, but it is usually created using tools such as equation
editors or converted from some other representation.
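As a rough illustration of tool-generated markup, the following Python sketch builds the same fraction with the standard library's XML module instead of typing the tags by hand. The function name `fraction` is an assumption made for this example; it is not part of any MathML product.

```python
import xml.etree.ElementTree as ET


def fraction(numerator, denominator):
    """Build Presentation MathML for an identifier over a number."""
    math = ET.Element("math")
    mfrac = ET.SubElement(math, "mfrac")
    ET.SubElement(mfrac, "mi").text = numerator    # identifier, e.g. a variable
    ET.SubElement(mfrac, "mn").text = denominator  # numeric literal
    return math


# Serialize the element tree to a MathML string.
markup = ET.tostring(fraction("x", "2"), encoding="unicode")
print(markup)
```

The printed markup has the same structure as the hand-written example, only without the line breaks.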
MathML comes in two flavors, Presentation MathML and Content MathML
MathML consists of two sub-languages, Presentation MathML and Content MathML.
Both kinds of MathML describe mathematical structure but with differing
emphasis. The two sub-languages can be used separately or together.
Presentation MathML focuses on the formatting aspects of mathematical
notation. The fraction example above uses Presentation MathML. The "mfrac"
element specifies a particular notation, that of two sub-expressions separated
by a horizontal or diagonal bar. Although this notation commonly means that the
first sub-expression is divided by the second, only the notation to be used is
specified by Presentation MathML.
Content MathML focuses on mathematical meaning. The following example uses
Content MathML to express the mathematical operation commonly associated with
the earlier example:
<math>
<apply>
<divide/>
<ci>x</ci>
<cn>2</cn>
</apply>
</math>
Although this operation is commonly expressed in notation as a fraction, only
the mathematical operation is being specified. Alternate notations exist for
this operation (eg, x ÷ 2).
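To show what "meaning" buys in practice, here is a minimal Python sketch that reads the Content MathML example with the standard library's XML parser and evaluates it for a chosen value of x. The `evaluate` function and the small `OPERATORS` table are assumptions made for illustration; they cover only four binary operators and are not a complete MathML processor.

```python
import operator
import xml.etree.ElementTree as ET

CONTENT_MATHML = """
<math>
  <apply>
    <divide/>
    <ci>x</ci>
    <cn>2</cn>
  </apply>
</math>
"""

# Hypothetical operator table: only a few binary Content MathML operators.
OPERATORS = {
    "divide": operator.truediv,
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
}


def evaluate(node, variables):
    """Recursively evaluate a small subset of Content MathML."""
    if node.tag == "math":
        return evaluate(node[0], variables)
    if node.tag == "cn":                      # numeric constant
        return float(node.text.strip())
    if node.tag == "ci":                      # identifier looked up by name
        return variables[node.text.strip()]
    if node.tag == "apply":                   # first child names the operation
        op = OPERATORS[node[0].tag]
        left, right = (evaluate(child, variables) for child in node[1:3])
        return op(left, right)
    raise ValueError(f"unsupported element: {node.tag}")


root = ET.fromstring(CONTENT_MATHML)
result = evaluate(root, {"x": 3.0})
print(result)
```

Given the Presentation MathML fraction instead, this sketch raises `ValueError` at `mfrac`, because that element names a layout rather than an operation. A program working from Presentation MathML has to infer the operation from the notation.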
HTML+MathML as a Content Delivery Medium
What does MathML look like in a web browser?
The short answer is "like mathematics", of course. Here is a partial screen
shot of MathML displayed in Internet Explorer using our MathPlayer software:

It is important to note that the non-math text shown above is displayed using
plain old HTML, allowing mathematical notation to be fully integrated into
normal web pages. If that were all MathML could do, it would still be an
improvement over PDF because it doesn't take over the entire page in the
browser. Text and math can reflow to fit the width of the browser window. If the
user sets the browser to display text at a larger size, the math will also be
displayed at that larger size.
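The integration of text and math can be sketched as a small XHTML page in which a MathML equation sits inside an ordinary paragraph. The Python below holds such a page as a string, checks that it is well-formed XML, and locates the equation by its namespace. The page title and sentence are invented for illustration, and how a particular browser or plug-in expects such a page to be served is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Illustrative page: an equation embedded in a normal paragraph of text.
PAGE = f"""<html xmlns="{XHTML_NS}">
  <head><title>Half of x</title></head>
  <body>
    <p>The quantity
      <math xmlns="{MATHML_NS}">
        <mfrac><mi>x</mi><mn>2</mn></mfrac>
      </math>
      appears inline with the surrounding text.</p>
  </body>
</html>"""

# Parsing succeeds only if the combined document is well-formed XML.
root = ET.fromstring(PAGE)

# Find every equation in the page by its namespace-qualified element name.
equations = root.findall(f".//{{{MATHML_NS}}}math")
print(len(equations))
```

Because the equation is ordinary markup inside the page rather than an image, any software that reads the page can find and extract it in the same way.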
But what else can my readers do with it?
Above, we claimed that MathML makes it possible for the reader to do more
with the mathematics. Let's look at how this works:

When the reader right-clicks on a MathPlayer-rendered equation, a menu is
displayed. The Copy MathML command copies the underlying MathML of the equation
to the Windows clipboard, ready to be pasted into any program that accepts
MathML. The latest versions of the two major computer algebra systems, Mathematica [10] and Maple [11], both accept MathML via the
clipboard. MathML may also be pasted directly into the reader's favorite HTML
editor for use in new content and into WebEQ [12], Design
Science's popular MathML editor.
MathML at your Command
MathPlayer 1.0 also has a Commands sub-menu on its right-click menu:

In this first version of MathPlayer, the commands are limited to opening the
equation in MathType [13] or WebEQ [12], Design Science's own products. These
items will only appear if the reader has the corresponding software product
installed on their computer.
In future versions of MathPlayer, we expect to add more commands to this menu
that will allow the reader to directly calculate, graph, and analyze with the
math. Remember, this is already possible with version 1.0 using copy and paste
via the clipboard. Now that MathML can be published on the web, we expect more
software vendors to add MathML support to their products in the near future. We
are working with such vendors, as well as readers and publishers, to help define
future items on MathPlayer's Commands menu.
Conclusions
We feel that STM publishers can increase the value of their products to
readers by publishing online content in the HTML+MathML format. Scientists,
engineers, educators, and students are driven by a need to share information
with their colleagues. Mathematical notation is an important part of STM
content, perhaps a defining characteristic. Recent advances in web browser
technology have made HTML+MathML a viable delivery medium for STM content. All
that remains is for publishers to take advantage of it.
11/6/2002 Paul Topping, Design Science, Inc.
Related Papers
In a related white paper, "MathML Workflows in STM Publishing" [14], we describe
how MathML content can be created, edited, and published in the STM publishing
context.
[1] World Wide Web Consortium (W3C), http://www.w3.org/
[2] MathML 2.0, http://www.w3.org/TR/2001/REC-MathML2-20010221
[3] W3C's math home page, http://www.w3.org/Math
[4] "MathML for Math and Science Communication", http://www.dessci.com/webmath/tech/mathml.stm
[5] "A Gentle Introduction to MathML", http://www.dessci.com/support/tutorials/mathml/default.stm
[6] Netscape 7.0 PR 1 (Preview Release 1), http://channels.netscape.com/ns/browsers/7/default.jsp
[7] Mozilla open-source browser project, http://www.mozilla.org/
[8] MathPlayer, http://www.dessci.com/webmath/mathplayer/
[9] Adobe's PDF (Portable Document Format), http://www.adobe.com/pdf
[10] Mathematica, http://www.wolfram.com/
[11] Maple, http://www.maplesoft.com/
[12] WebEQ, http://www.dessci.com/webmath/webeq/features.stm
[13] MathType, http://www.dessci.com/features/win/default.stm
[14] "MathML Workflows in STM Publishing", http://www.dessci.com/features/white_papers/mathml_workflows.htm