1.1 Mathematics and its notation

1.2 Origins and goals

1.2.1 History of MathML

1.2.2 Limitations of HTML

1.2.3 Requirements for mathematical markup

1.2.4 Goals of the MathML project

1.3 Role of MathML on the Web

1.3.1 Existing mathematical markup languages

1.3.2 HTML extension mechanism

1.3.3 Browser extension mechanism

1.4 Overview of MathML

1.4.1 Taxonomy of MathML elements

1.4.2 Presentation markup

1.4.3 Content markup

1.4.4 Combining presentation and content

1.5 MathML in docs

1.6 MathML examples

1.6.1 Presentation markup examples

1.6.2 Content markup examples

1.6.3 Examples of mixed markup

1.7 MathML syntax and grammar

1.7.2 XML Syntax Example

2.1 Mozilla & Firefox

2.2 Microsoft Internet Explorer

MathML (Mathematical Markup Language) is an XML-based markup language for mathematical applications. It was developed by the World Wide Web Consortium (W3C) and adopted as a Recommendation. The current version is Mathematical Markup Language (MathML) Version 2.0 (Second Edition), approved October 21, 2003.

MathML provides two kinds of mathematical markup. Presentation markup describes the visual form of a mathematical formula. Content markup expresses its semantic content.

MathML considers not only the presentation, but also the meaning of the formula elements. A mathematical semantics markup system is also being developed to complement MathML. It's called OpenMath.
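The two kinds of markup can be compared on a single expression. The following is a minimal sketch, assuming the expression x squared as the example and omitting the MathML namespace declaration for brevity; the helper name element_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Presentation markup: describes the layout "x with a superscript 2".
PRESENTATION = "<math><msup><mi>x</mi><mn>2</mn></msup></math>"

# Content markup: describes the meaning "x raised to the power 2".
CONTENT = "<math><apply><power/><ci>x</ci><cn>2</cn></apply></math>"


def element_names(markup: str) -> list[str]:
    """Return the element names of a MathML fragment in document order."""
    return [element.tag for element in ET.fromstring(markup).iter()]


presentation_names = element_names(PRESENTATION)
content_names = element_names(CONTENT)
print(presentation_names)
print(content_names)
```

The presentation version only says how the symbols are arranged, while the content version names the operation being applied.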

1.2.1 History of MathML

The task of presenting mathematical information for computer processing and electronic communication arose long before the advent of the Internet. It used to be common practice for scientists to write papers in some form based on ASCII characters and then send them to each other by e-mail. Several mathematical markup languages, notably TeX, were already in widespread use in 1992, before the Web had such a prominent position.

From the outset, the Web has proven to be a very effective way to make information available to a large number of people. However, even though the World Wide Web was originally conceived and implemented by scientists for scientists, the possibilities for including mathematical expressions in HTML were extremely limited. Currently, most of the mathematical information on the Web is presented as text with images of scientific expressions (in GIF or JPEG format) or as entire PDF documents.

The World Wide Web Consortium (W3C) understood that the lack of foundations for scientific communication was a major problem. As early as 1994, Dave Raggett made a proposal to include HTML Math in the HTML 3.0 prototype. At a conference in Darmstadt in April 1995, a round table on mathematical notation was held. In November of the same year, representatives of Wolfram Research put forward a proposal to the W3C team to implement support for mathematics within HTML. The May 1996 meeting of the Digital Library Initiative in Champaign-Urbana played an important role in bringing together many stakeholders. This meeting resulted in the formation of an editorial review board for HTML Math. Subsequently, this working group grew, and in March 1997 was formally re-formed as the first W3C Math Working Group. The second W3C Math Working Group was formed in July 1998.

The MathML project reflects the interests and opinions of various groups of specialists. Many contributions to the development of MathML deserve special mention. One is the work on accessibility, where the difficulties were particularly tangible; T. V. Raman did a great deal of work in this area. Neil Soiffer and Bruce Smith of Wolfram Research shared the experience with problems of mathematical representation that they gained while working on the Mathematica 3.0 project. Their ideas had an important influence on the structure of the presentation elements. Paul Topping of Design Science also contributed his experience in formatting and editing mathematics. MathML has benefited greatly from the participation of a number of working group members involved in other work on encoding mathematical information, in SGML and in the computer algebra communities. These include Stephen Buswell of Stilo Technologies, Nico Poppelier of Elsevier Science, Stéphane Dalmas of INRIA (Sophia Antipolis), Stan Devitt of Waterloo Maple, Angel Diaz and Robert S. Sutor of IBM, and Stephen M. Watt of the University of Western Ontario. MathML has also been influenced by the OpenMath project, the work of the ISO 12083 working group, and the work of Stilo Technologies on a DTD fragment for "semantic" mathematics. The American Mathematical Society played a key role in the development of MathML. Among other things, representatives of this organization chaired both W3C Math Working Groups. From May 1996 to March 1997 the group was led by Ron Whitney. Patrick Ion co-chaired the group from March 1997 to June 1998 with Robert Miner of The Geometry Center, and from July 1998 with Angel Diaz of IBM.

The goals of MathML development require a flexible and extensible mathematical notation system that allows interaction with external programs and high-quality display in various information environments. But any markup language that satisfies all these requirements is quite complicated.

At the same time, for many user groups, such as students, it is important to have an easy way to include mathematical expressions in web pages. Similarly, for other groups, such as users of the TeX system, the best solution would be a system that allows markup to be entered directly into web pages in a TeX-like language. In general, different groups of users require different forms of input and output that best suit their needs. Ideally, therefore, a system for placing mathematical documents on the Web should provide both specialized services for input and output and general services for information exchange and display in various information environments.

In practice, a review of what a mathematical standard for the Web should provide, for both specialized and general needs, leads to the idea of a layered architecture. The first layer consists of powerful, general tools for exchanging, processing, and displaying mathematical data. The second layer consists of specialized tools aimed at specific user groups, which make it easy to encode mathematical information for distribution to a limited audience.

MathML is designed to mark up mathematical information at the lower, more general level of a two-tier architecture. This involves marking up a complex notational and semantic structure in a strict, regular form that is easy to process by display, search and indexing tools, and other mathematical applications.

As a consequence, MathML markup is not intended to be used directly by authors. MathML is human-readable, which helps a lot with debugging, but in all but the simplest cases, it's too complex for manual coding. Instead, authors will have to use special formula editors, converters, and other specialized software for creating MathML documents. Alternatively, some rendering programs and math document support systems can convert other input formats to MathML on the fly.

In some ways, MathML is similar to other low-level communication formats, such as the PostScript language developed by Adobe. You can create PostScript files in different ways, depending on your needs: experts create and edit them manually, authors create them with word processors, designers with illustration programs, and so on. Once you have a PostScript file, you can distribute it to a very large audience, because PostScript display devices such as printers and screen viewers are widely available.

One reason for designing MathML as a markup language for the general communication layer is to stimulate the development of higher-level mathematical software for the Web. MathML gives software developers a common basis for coordinating the modules that create and display mathematical material. By simplifying the development of the functional parts of a larger system, MathML can stimulate the development of programs that will be very useful to potential users.

Authors can create MathML documents using the tools best suited to their needs. Students may prefer visual formula editors that can save blocks of MathML markup in an XHTML file. Researchers can use computer algebra packages that automatically encode mathematical information so that colleagues can take it from a web page and process it. Academic journal publishers can use a program that converts TeX markup to HTML and MathML. Regardless of how a web page containing MathML is created, all the benefits of a common communication layer become available. Different programs that work with MathML can be applied to the same document to render it as speech or print, to feed it into a computer algebra system, or to manage it as part of a large archive of web documents. For high-quality printing, MathML documents will often be converted back to standard typesetting systems, including TeX, which is well established for this purpose. Finally, it can be expected that MathML will eventually be integrated into other areas where mathematical formulas occur, such as spreadsheets, statistical packages, and engineering tools.

The W3C Math Working Group is collaborating with various software companies to ensure that a range of MathML tools, for both authoring and display, becomes available soon. A current list of programs that work with MathML is maintained on the W3C Math page.

The original concept of HTML Math was simply to extend the set of HTML tags and thus provide direct interpretation in the browser. Even before that, however, the explosive growth of the Web had made it clear that a general extension mechanism was required, and that mathematical information is only one of the types of structured data that can be integrated into the Web using such a mechanism.

Given that MathML is to be integrated into the Web as an extension, it is very important that MathML and the programs that use it interoperate well with the existing web environment. In particular, MathML must be designed with three types of interaction in mind. First, for creating mathematical content, it is important that existing mathematical markup languages can be converted to MathML and that existing editors can be extended to produce MathML documents. Second, it should be possible to embed MathML markup in HTML markup as an addition to it, so that in the future it will be available to browsers, search engines, and all the kinds of web applications that now work with HTML. Finally, it must be possible to display MathML embedded in HTML in modern browsers, even if the result is far from ideal. With the transition from HTML to XHTML, all of these requirements will become even more important.

The World Wide Web is fully international, and mathematics is a language used throughout the world. Mathematical notation in science and technology is embedded in texts written in national languages. The W3C aims to be a constructive force in worldwide communication, so the MathML developers had to address internationalization. This version of MathML is not known to be incompatible with any language written left to right. Left-to-right layout is the default in MathML 2, and the needs of mathematical formulas embedded in text in some national languages have not yet been addressed. So-called "bi-directional technology" is still under development, and better support for formulas in that context is a task for future versions.

1.7.1 MathML syntax and grammar

MathML is based on XML (Extensible Markup Language), which means that its syntax follows the rules of XML syntax and its grammar is defined by a DTD (Document Type Definition). In other words, the details of using tags, attributes, entities, and so on are defined in the XML language specification, while the details of MathML elements and attributes, element nesting, and so on are defined in the MathML DTD.

The W3C, in an effort to increase the ease and flexibility of using XML on the Web and to support the creation of modular XML applications, has found that the basic form of a DTD is not flexible enough. Therefore, a W3C working group was formed to develop XML Schemas, which are specification documents and should replace DTDs. MathML 2.0 is designed to enable mathematicians to take full advantage of emerging Web technologies. So there is a schema for MathML.

In addition to the general rules it inherits as an XML application, MathML defines some syntax and grammar rules of its own. These rules allow MathML to encode significantly more information than would otherwise be possible without introducing a large number of new elements or a considerably more complex DTD or schema. The disadvantage of MathML-specific rules is that they are invisible to generic XML processors and validators, which therefore cannot check them.

There are two main types of additional MathML grammar and syntax rules. The first type involves setting additional criteria for attribute values. For example, in pure XML, it is not possible to require an attribute's value to be a positive integer. The second type of rule defines more detailed restrictions on child elements (such as their order) than those given in DTDs or even schemas. For example, XML cannot specify that the first child element should be treated differently from the rest.

1.7.2 XML Syntax Example

Because MathML is based on XML, the MathML specification uses XML terminology. XML data consists of Unicode characters (which include ordinary ASCII characters), entity references (informally called entities) such as &lt;, which usually represent extended characters, and elements such as <mi fontstyle="normal">x</mi>.

Elements often contain other XML data, called their "content" or "body", between their "start" and "end" tags, just as in HTML. There are also "empty elements" such as <none/>, whose start tag ends with /> to indicate that the element has no content and no end tag. The start tag may contain named parameters, called attributes, such as fontstyle="normal" in the example above.

Since uppercase and lowercase letters are distinguished in XML, MathML element and attribute names are case sensitive. For readability, the MathML specification defines most of them in lowercase.

In a formal discussion of XML markup, a distinction is made between an element, such as mrow, and the tags that delimit it, <mrow> and </mrow>. What lies between the <mrow> and </mrow> tags is called the content or body of the mrow element. An "empty element", such as none, has no body and is written as a single tag of the form <none/>. This specification will not usually emphasize the distinction between tags and elements. For example, we will sometimes refer to the <mrow> and <none/> elements, meaning the elements that these tags delimit. This is done so that references to elements are distinguished from references to attributes. However, the terms "element" and "tag" themselves will be used in strict accordance with XML terminology.
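The XML terms used in this section can be observed with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the sample fragment and the helper is_well_formed are illustrative assumptions, and the fragment is chosen to show XML syntax only, not to be meaningful MathML.

```python
import xml.etree.ElementTree as ET

# One element with an attribute, one entity reference, one empty element.
SAMPLE = '<mrow><mi fontstyle="normal">x</mi><mo>&lt;</mo><none/></mrow>'

root = ET.fromstring(SAMPLE)
mi, mo, none = list(root)

print(root.tag)             # name of the outer element
print(mi.attrib, mi.text)   # attributes and body of the mi element
print(mo.text)              # the entity reference resolved to a character
print(none.text, len(none)) # an empty element has no body and no children


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Names are case sensitive, so these two tags do not match each other.
print(is_well_formed("<mrow></MROW>"))
```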

1.7.3 Child elements versus arguments

Many MathML elements require a certain number of child elements, or give additional meaning to child elements in a particular position. As noted above, this type of constraint is specific to MathML and cannot be expressed using XML syntax and grammar. When a child element of a given MathML element satisfies these additional conditions, we will call it an argument rather than a child element, to emphasize its specific role. Note that the term "argument" is used in this technical sense unless otherwise noted.

Some elements have different requirements for the number or type of arguments. These additional requirements are described for each specific element.
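Because a DTD cannot express these constraints, a MathML application has to check them itself. The following is a minimal sketch of such a check; the argument counts in the table come from the MathML 2.0 definitions of these presentation elements rather than from this text, and the names REQUIRED_ARGUMENTS and argument_errors are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required number of arguments for a few presentation elements
# (assumption: counts as defined for these elements in MathML 2.0).
REQUIRED_ARGUMENTS = {"mfrac": 2, "mroot": 2, "msub": 2, "msup": 2, "msubsup": 3}


def argument_errors(markup: str) -> list[str]:
    """List every element whose number of children differs from the required count."""
    errors = []
    for element in ET.fromstring(markup).iter():
        expected = REQUIRED_ARGUMENTS.get(element.tag)
        if expected is not None and len(element) != expected:
            errors.append(
                f"<{element.tag}> requires {expected} arguments, found {len(element)}"
            )
    return errors


good = argument_errors("<mfrac><mi>a</mi><mi>b</mi></mfrac>")
bad = argument_errors("<mfrac><mi>a</mi></mfrac>")
print(good)
print(bad)
```

Both fragments are well-formed XML; only the MathML-specific check tells them apart.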

1.7.4 MathML attribute values

According to the XML language specification, element attributes must be specified in one of the following forms:

attribute-name="value"

attribute-name='value'

where the spaces around the "=" sign are optional.

Attribute names appear in monospace font in the text of the specification, as do examples.

Attribute values, which in MathML can be a string of arbitrary characters, must be enclosed in double (") or single (') quotes. An attribute value may contain the kind of quotation mark that is not used to enclose the entire value.

MathML uses a more elaborate syntax for attribute values than the generic XML syntax required by the MathML DTD. These additional rules are intended for MathML applications; violating them is a MathML error, but XML processors cannot detect such violations. The attribute value syntax is given in the attribute table that accompanies the description of each element, using the notation described below. When a MathML application processes attribute values, all whitespace except that which separates individual words or numbers is ignored. Character data can be included in attribute values directly or by means of entity references.

In particular, the characters ", ', & and < can be included in MathML attribute values (when permitted by the attribute value syntax) using the entity references &quot;, &apos;, &amp; and &lt;, respectively.
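The effect of these entity references can be checked with an XML parser. The following is a minimal sketch, assuming the lquote and rquote attributes of the ms element as carriers for the special characters; the helper round_trip is hypothetical, and quoteattr is the standard-library function that picks the quoting and escaping for a raw value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Attribute values written with entity references, one in single quotes
# and one in double quotes.
ESCAPED = "<ms lquote='&quot;&amp;&lt;' rquote=\"&apos;\">s</ms>"

element = ET.fromstring(ESCAPED)
print(element.get("lquote"))
print(element.get("rquote"))


def round_trip(value: str) -> str:
    """Serialize a raw value as an attribute, parse it back, and return the result."""
    markup = f"<ms lquote={quoteattr(value)}>s</ms>"
    return ET.fromstring(markup).get("lquote")


# A value containing both kinds of quotation mark survives the round trip.
print(round_trip("a \"b\" 'c' & <d>"))
```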

The MathML DTD declares most attribute values as CDATA strings. This improves interoperability with existing SGML- and XML-based software and allows the lists of predefined values to be extended. The same reasoning applies to XML schemas.

1.7.4.1 Syntax notations used in the MathML specification

To describe the MathML-specific syntax for valid attribute values, this document uses the following conventions and notations.

Notation                  What it matches
number                    decimal integer or rational number (a string of digits with one decimal point), optionally starting with a "-" sign
unsigned-number           decimal integer or real number, with no sign
integer                   decimal integer, optionally starting with a "-" sign
positive-integer          decimal integer, with no sign, not 0
string                    arbitrary string (always the entire attribute value)
character                 single non-whitespace character, or a MathML entity reference; surrounding whitespace is optional
#rrggbb                   RGB color; the three pairs of hexadecimal digits in the example #5599dd give the proportions of red, green, and blue on a scale of x00 to xFF, producing a strong sky blue
h-unit                    horizontal length unit (allowed units are listed below)
v-unit                    vertical length unit (allowed units are listed below)
css-fontfamily            explained below, in the subsection on CSS
css-color-name            explained below, in the subsection on CSS
other italicized words    explained in the text, separately for each attribute
form+                     one or more instances of "form"
form*                     zero or more instances of "form"
f1 f2 ... fn              one instance of each form, in sequence, optionally separated by whitespace
f1 | f2 | ... | fn        any one of the listed forms
[ form ]                  an optional instance of "form"
( form )                  same as form
unmarked words            words that appear verbatim in attribute values (unless they are obviously part of an explanatory phrase)
quoted characters         characters that appear verbatim in the attribute value (for example, "+" or '+')

Precedence of the notation operators, from highest to lowest:

form+ or form*

f1 f2 ... fn (sequence of forms)

f1 | f2 | ... | fn (alternative forms)

A string may contain any characters that can be specified in XML CDATA attribute values. No MathML syntax rule uses string as only part of an attribute value; it is always the entire value.

Adjacent keywords and numbers in attribute values must be separated by whitespace, except for unit identifiers that follow numbers (as specified by the h-unit and v-unit syntax symbols). Whitespace is not otherwise required, but it is allowed between any of the tokens listed above, except (for CSS compatibility) immediately before unit identifiers, between a "-" sign and a number, and between # and rrggbb or rgb.
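Several of the notations above translate directly into regular expressions. The following is a minimal sketch, not a complete validator; it assumes that a number needs at least one digit and may begin or end with the decimal point, which the table does not state explicitly, and the names SYNTAX and matches are hypothetical.

```python
import re

# Digits with at most one decimal point (assumption: ".5" and "2." are both allowed).
DIGITS = r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)"

SYNTAX = {
    "number": re.compile(rf"-?{DIGITS}"),
    "unsigned-number": re.compile(DIGITS),
    "integer": re.compile(r"-?[0-9]+"),
    "positive-integer": re.compile(r"[0-9]*[1-9][0-9]*"),
    "#rrggbb": re.compile(r"#[0-9a-fA-F]{6}"),
}


def matches(notation: str, value: str) -> bool:
    """Check a whole attribute value against one notation, ignoring outer whitespace."""
    return SYNTAX[notation].fullmatch(value.strip()) is not None


for notation, value in [
    ("number", "-1.5"),
    ("number", "+1.5"),      # an explicit plus sign is not part of number
    ("number", "- 1"),       # no whitespace between the sign and the digits
    ("positive-integer", "0"),
    ("#rrggbb", "#5599dd"),
]:
    print(notation, repr(value), matches(notation, value))
```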

Values for numeric attributes that specify dimensions, and that should depend on the current font, may be given in font-relative units or in named absolute units (described below). Horizontal dimensions are usually specified in em and vertical dimensions in ex, with the em or ex identifier immediately following the number. For example, the horizontal spacing around an operator such as "+" is usually specified in ems, although other units may be used. Font-relative units are preferred over absolute units because they let the rendered element grow or shrink with the current font size.

For most numeric attributes, only some subset of the possible values is sensible; other values are not errors (unless otherwise noted), but are rounded up or down by the renderer to the nearest allowed value. The set of allowed values may depend on the renderer and is not defined by MathML.

If the attribute syntax allows a numeric value to carry a minus sign ("-"), as number and integer do, it is not an error to supply one where a negative value makes no sense. Instead, the application must handle the value as described in the previous paragraph. An explicit plus sign ("+") is not allowed as part of a numeric value unless the syntax specifically lists it (as a quoted "+" or '+'), and its presence may change the meaning of the attribute value (as described for each attribute that allows it).

The h-unit, v-unit, css-fontfamily, and css-color-name syntax symbols are discussed in the following subsections.

1.7.4.2 Attributes with units

Some attributes accept horizontal and vertical dimensions as numbers followed by a "unit identifier" (often referred to as "unit"). The syntax symbols h-unit and v-unit refer to horizontal and vertical dimensions, respectively. The possible units of measure and the sizes to which they apply are listed in the table below; they are the same for horizontal and vertical dimensions, but the syntax symbols are different (as a reminder of the direction they use).

Unit identifiers and their meanings are taken from CSS. However, the syntax for a number followed by a unit identifier in MathML is not identical to that in CSS, since numbers in CSS cannot end with a decimal point and can begin with a "+" sign.

Valid horizontal and vertical units in MathML:

The typographic units em and ex are discussed further in the "Additional Notes" section.

% is a "relative unit"; when an attribute value is given as n% (for any numeric value n), the value is the default value multiplied by n and divided by 100. The default value (or the way it is obtained, if it is not a constant) is given in the attribute table for each element, and its meaning is described in the attribute documentation that follows. (The mpadded element has its own syntax for % and does not allow it to be used as a unit identifier.)

For consistency with CSS, length units in MathML may be optional. When this is the case, the unit symbol in the attribute syntax is enclosed in square brackets, as in number [ h-unit ]. The meaning of an attribute value without units is described in the documentation for each attribute; usually the specified number is multiplied by the default value. (In this case, a number nnn without a unit is equivalent to nnn multiplied by 100 and followed by %. For example, <mo maxsize="2"> ( </mo> is equivalent to <mo maxsize="200%"> ( </mo>.)

As an exception (also for CSS compatibility), numeric values of zero do not require a unit identifier, even if the syntax requires it. In this case, the presence or absence of a unit identifier does not matter, since any number multiplied by 0 is 0.
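The rules for units, percentages, and unitless numbers can be sketched as a small parser. This is an illustration under stated assumptions, not a reference implementation: the unit list (em, ex, px, in, cm, mm, pt, pc, %) is taken from the MathML 2.0 specification rather than from this text, the function names are hypothetical, and a unitless number is treated as a multiplier of the default, which the text describes as the usual but not universal interpretation.

```python
import re
from typing import Optional, Tuple

# Number immediately followed by an optional unit identifier
# (assumption: unit list as given in MathML 2.0).
LENGTH = re.compile(
    r"(-?(?:[0-9]+\.?[0-9]*|\.[0-9]+))(em|ex|px|in|cm|mm|pt|pc|%)?"
)


def parse_length(value: str) -> Tuple[float, Optional[str]]:
    """Split a length value into (number, unit); unit is None when omitted."""
    match = LENGTH.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a valid length: {value!r}")
    return float(match.group(1)), match.group(2)


def default_multiplier(value: str) -> Optional[float]:
    """Return the factor applied to the default value, or None for other units."""
    number, unit = parse_length(value)
    if unit == "%":
        return number / 100
    if unit is None:
        return number
    return None


print(parse_length("1.5em"))
print(parse_length("0"))            # zero needs no unit identifier
print(default_multiplier("200%"))   # same factor as the unitless value below
print(default_multiplier("2"))
```

Whitespace before the unit identifier is rejected, in line with the CSS-compatibility rule stated earlier.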

For most attributes, the unit typically used for that quantity in typesetting is the one used in the attribute's default value in this specification; when no specific default value is given, the typical unit is usually mentioned in the attribute table or in the attribute's description. The most commonly used units are em and ex. However, any unit may be used unless otherwise specified in the description of a particular attribute.

Note that some attributes, such as framespacing on mtable, can contain more than one numeric value, each followed by its own unit of measure.

It is customary to use the units ex mainly for vertical dimensions, and em for horizontal dimensions, although this is not a requirement. These units of measure depend on the font used to display the element in whose attributes they are applied, and its size. So they must be interpreted after attributes such as fontfamily and fontsize if they occur on the same element, since changing the current font or its size may change the size of the units.

The definition of the length of each unit (but not the MathML syntax for length values) is the same as in CSS, except that if a font provides its own values for em and ex that differ from those defined by CSS (the font size and the "x"-height, respectively), the font's values should be used instead.

1.7.4.3 CSS compatible attributes

Some of the MathML attributes listed below correspond to the text display properties defined in CSS1. This is done so that renderers can query the CSS environment for the appropriate properties when defining default attribute values.

The ability to define style properties through both MathML attributes and CSS also has disadvantages. At the very least it is confusing, and at worst it causes equations to change meaning unintentionally when the CSS for the whole document is changed. Therefore, these attributes are deprecated. In their place, MathML 2.0 introduces four new mathematical style attributes. These attributes use logical values to better convey the abstract categories of symbols used in mathematics, and they provide a clear separation between MathML and CSS.

The following table maps the deprecated MathML 1.01 style attributes to their CSS counterparts:

The order in which attributes and style sheets are processed.

CSS or similar style sheets can specify changes to the display properties of MathML elements. Because display properties can also be changed by an element's attributes, or automatically by the renderer, it is necessary to define the order in which changes from different sources are applied. An example of an automatic change is the adjustment of fontsize according to scriptlevel. In the case of "absolute" changes, which set a new property value independently of the old value (as opposed to "relative" changes, such as incrementing or multiplying by a number), only the most recently applied absolute change takes effect, so the highest-priority source must be processed last.

In the case of CSS, changes to the display properties of a MathML element that come from different sources should be processed in the following order, from the change applied first (lowest priority) to the change applied last (highest priority):

Automatic changes to properties or attributes based on the type of the parent element and the position of the element within the parent (such as the fontsize changes according to scriptlevel mentioned above); such changes are usually applied by the parent element itself before it passes display properties to the current element

From the reader's style sheets: styles that are not declared "important"

Explicitly set attributes of the current MathML element

From the reader's style sheets: styles that are declared "important" (applied last; highest priority)

Note that the order of changes made by CSS style sheets is defined by CSS itself (this is the order defined by CSS2). The following rationale concerns only the question of where in that existing order the changes caused by explicit MathML attribute settings should be inserted.

Rationale: Display attributes in MathML are analogous to display attributes in HTML (such as align), which, according to the order defined in CSS, are processed with the same priority. Moreover, this choice of precedence lets readers decide, by declaring CSS styles "important", which of their settings should override explicit MathML settings. Since MathML expressions, whether built from content or presentation elements, are primarily intended to convey meaning, and the "graphical representation" (if any) should assist in this but is not important in itself, it is likely that readers will want their own style preferences to take precedence. The main exception is when display attributes change the meaning of an expression.
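The rule that the last absolute change wins can be sketched in a few lines. The sketch covers only the four sources listed above; the source labels and the function resolve_property are hypothetical names chosen for illustration.

```python
from typing import Dict, Optional

# Sources of absolute changes in processing order: the first entry is
# applied first (lowest priority), the last entry is applied last and wins.
PROCESSING_ORDER = (
    "automatic",         # changes made by the renderer, e.g. from scriptlevel
    "reader-normal",     # reader style sheet, not declared "important"
    "attribute",         # attribute set explicitly on the MathML element
    "reader-important",  # reader style sheet, declared "important"
)


def resolve_property(changes: Dict[str, str]) -> Optional[str]:
    """Apply absolute changes in processing order and return the final value."""
    value = None
    for source in PROCESSING_ORDER:
        if source in changes:
            value = changes[source]  # a later source overrides an earlier one
    return value


print(resolve_property({"automatic": "8pt", "attribute": "12pt"}))
print(resolve_property({"attribute": "12pt", "reader-important": "14pt"}))
```

An explicit attribute overrides both automatic changes and ordinary reader styles, but a reader style declared "important" overrides the attribute.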

1.7.4.4 Default attribute values

Default values for MathML attributes are usually given along with the detailed description of the corresponding element. Default values shown in plain text in the attribute tables are literal (unless they are obviously explanatory phrases); those shown in italics describe how the default value can be computed.

Default values declared as inherited are taken from the rendering environment, as described for mstyle, or, in some cases described separately, from the values of other attributes of surrounding elements, or from a specific part of those values. The value used is always one that could have been specified explicitly, had it been known; it never depends on the content or attributes of the element itself, only on its environment. (What the value means when it is used may, however, depend on those attributes or that content.)

Default values described as automatic must be computed by the renderer in a way that produces a high-quality rendering. How to achieve this is usually not specified in the MathML specification. The value used is always one that could have been specified explicitly, had it been known, but it usually depends on the content of the element and possibly on the display environment.

Other italicized descriptions of default values that occur in attribute tables are explained separately for each attribute.

The single or double quotes that must enclose an attribute value in an XML start tag are not shown in the value syntax in the attribute tables, but they are shown in the text of the examples.

Note that, in general, there is no value that can be given explicitly to a MathML attribute to mimic the effect of omitting it when its default is inherited or automatic. Specifying "inherited" or "automatic" will not work, and is not allowed. Furthermore, even for presentation attributes for which a specific default is given here, the mstyle element can be used to change that default for the elements it contains. Therefore, the MathML DTD declares most presentation attribute defaults as #IMPLIED, which prevents XML processors from supplying any particular default for these attributes. The MathML schema works the same way.

1.7.4.5 Attribute values in the MathML DTD

In an XML DTD, the allowed attribute values may be declared as generic strings, or may be restricted in various ways (by enumerating the possible values, or by specifying a particular data type). The choice of XML attribute type affects the extent to which validation can be performed using the DTD.

The MathML DTD defines formal XML attribute types for all MathML attributes, including enumerations of valid values in some cases. In general, though, the MathML DTD is relatively lax, often declaring attribute values as strings; this is done for interoperability with SGML parsers while still allowing several attributes of a single MathML element to accept the same values (such as true and false), and to allow the lists of predefined values to be extended.

At the same time, even though an attribute's value may be declared as a string in the DTD, only certain values are valid in MathML, as described above and in the remainder of this specification. For example, many attributes require numeric values. The following section describes the valid attribute values for each element. The laxness of the DTD does not imply that these requirements are not part of MathML, or that they cannot be enforced by a particular MathML renderer.

Moreover, the MathML DTD is provided as a convenience; although full compatibility with the text of the specification is intended, the text shall govern in case of conflict.

1.7.5 Attributes common to all MathML elements

To facilitate the use of styling mechanisms such as XSLT and CSS2, all MathML elements have class, style, and id attributes in addition to the attributes described for each element. MathML renderers that do not support CSS may ignore these attributes. MathML defines the values of these attributes as generic strings, even though the style mechanisms have a stricter syntax for them. Therefore, any value for them is valid in MathML.

To ensure compatibility with linking mechanisms, all MathML elements have an xlink:href attribute.

All MathML elements also have an xref attribute for use in parallel markup. id is also used in this context.

Each MathML element, as a legacy of MathML 1.0, also accepts the deprecated other attribute, which was intended for passing non-standard attributes without violating the MathML DTD. MathML renderers are only required to process this attribute if they respond to any attributes that are not standard in MathML. However, the use of the other attribute is strongly discouraged, as MathML offers other ways to convey such information.

1.7.6 Collapsing spaces in input

MathML ignores whitespace that occurs outside token elements. Non-whitespace characters are not allowed there. Whitespace in the content of token elements is trimmed at the ends, that is, all whitespace at the beginning and end of the content is removed. Whitespace inside the content of token elements is collapsed canonically, that is, each sequence of one or more whitespace characters is replaced by a single space (sometimes called a blank character).

In MathML, as in XML, whitespace means a simple space, a tab, a newline, or a carriage return, that is, the characters with Unicode codes U+0020, U+0009, U+000A, and U+000D, respectively.

For example,

<mo> ( </mo> is equivalent to <mo>(</mo>, and

<mtext> Theorem   1: </mtext> is equivalent to <mtext>Theorem 1:</mtext>.

Authors who want whitespace at the beginning or end of token content, or a run of more than one whitespace character, to be preserved must use a no-break space (U+00A0) or another non-marking character that is not trimmed. For example, compare

<mtext> Theorem 1: </mtext>

with

<mtext>&#xA0;Theorem &#xA0;1:</mtext>

When the first example is displayed, there are no spaces before the word "Theorem", one between "Theorem" and "1: ", and none after "1: ". In the second example, a single space will be displayed before the word "Theorem", two spaces before "1: ", and none after "1: ".
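The trimming and collapsing rules described above can be sketched in a few lines of Python. This is only an illustration of the rule, not code from any MathML renderer; the function name `normalize_token_content` is hypothetical.

```python
import re

# Whitespace as defined by XML and MathML: space, tab, newline, carriage return.
_WS = "\u0020\u0009\u000A\u000D"


def normalize_token_content(text: str) -> str:
    """Trim whitespace at both ends and collapse each inner run to one space."""
    trimmed = text.strip(_WS)
    return re.sub("[" + _WS + "]+", " ", trimmed)


plain = "\nTheorem\n1:\n"
with_nbsp = "\u00A0Theorem \u00A01:"  # U+00A0 is not whitespace under this rule

print(repr(normalize_token_content(plain)))
print(repr(normalize_token_content(with_nbsp)))
```

The no-break spaces in the second string survive normalization, which is why they can be used to force visible spacing inside a token.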

Note that the xml:space attribute does not apply in this situation, because XML processors pass the whitespace in tokens on to the MathML processor; trimming and collapsing are performed according to the MathML processing rules.

For whitespace occurring outside the content of the token elements mi, mn, mo, ms, mtext, ci, cn, and annotation, the mspace element should be used rather than an mtext element containing only whitespace characters.

2. MathML capabilities of modern browsers

To demonstrate browser capabilities, a simple XHTML test page containing examples of both kinds of markup was created. Its main requirements are as follows. First, it must be a valid XHTML document, that is:

be a valid XML document;

the root element must be an html element in the XHTML namespace, like this:

"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">

MathML fragments must belong to the MathML namespace, for example:

...

The test case used below is test.xhtml.
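The requirements above can be checked mechanically. The sketch below uses a hypothetical stand-in for test.xhtml (the original page is not reproduced here) and Python's standard XML parser to confirm that the document is well-formed, that its root is html in the XHTML namespace, and that it contains a math element in the MathML namespace. The helper name `check_page` is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical stand-in for test.xhtml; the real test page may differ.
PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
  "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>MathML test</title></head>
  <body>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mi>x</mi><mo>+</mo><mn>1</mn>
    </math>
  </body>
</html>"""


def check_page(source: str) -> bool:
    """Return True if the root is XHTML html and a MathML math element exists."""
    root = ET.fromstring(source)  # raises ParseError if not well-formed
    root_ok = root.tag == "{%s}html" % XHTML_NS
    math_ok = root.find(".//{%s}math" % MATHML_NS) is not None
    return root_ok and math_ok


print(check_page(PAGE))
```

This checks well-formedness and namespaces only; it does not validate the page against the DTD.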

2.1 Mozilla & Firefox

Version used: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0.

Mozilla and Firefox, which are built on the same engine, can render MathML natively. So far, however, this support is limited to presentation markup. In our test case, the presentation markup fragment was displayed correctly, which cannot be said of the content markup.

One solution to this problem is the set of "XSLT stylesheets for MathML". This approach works because the browser has built-in support for XSLT transformations. To use it, download the set of stylesheets and, in the first line of the page, link to the main stylesheet file mathml.xsl with an xml-stylesheet processing instruction.

For security reasons, Mozilla does not allow XSLT stylesheets located in a different domain from the page. Mozilla can also show the MathML source of a selection, with the selected part highlighted. True, when content markup and XSLT stylesheets are used, what you see is not the original source but the result of the transformation.

Other features include integration with search engines: when a formula fragment is selected, the context menu lets you send it as a query to a search engine.

For now this is a provision for the future, since such searches do not yet return useful results.

2.2 Microsoft Internet Explorer

Version used: 6.0.2800.1106 (SP1; Q867801; Q823353; Q833989)

Microsoft's browser has no built-in support for MathML. To display mathematical formulas correctly, you can use the freely distributed MathPlayer plugin.

In addition to displaying mathematical markup, the plugin lets you quickly copy the MathML notation. You can also enlarge a formula to see it more clearly.

Its shortcomings include the inability to select or copy a fragment of a mathematical expression. Nor is there a way (as there is in Mozilla) to correctly copy a formula together with the surrounding text.

2.3 Opera

Version used: 7.54u1 (Build 3918; Platform Win32; System Windows 2000; Java not installed).

At this stage of its development, this browser cannot display MathML markup correctly.


With the spread of global computer networks (the Internet in particular), it became necessary to publish mathematical texts on them as well.

MathML is an application of XML (eXtensible Markup Language), which is often used to create other languages. This use of XML is quite natural today and has worked well in other cases where using HTML to convey new types of data ran into the limitations of the format. To date, the W3C has published the second edition of version 2.0 of the MathML specification, which indicates the viability and stability of the project.

XML based markup languages:

  • Wireless Markup Language (WML): a data format for WAP (wireless) devices such as mobile phones;
  • Synchronized Multimedia Integration Language (SMIL):
      o specifies the temporal layout, appearance, and so on of multimedia presentations;
      o specifies the order in which multimedia files are played;
      o requires a SMIL-compatible player for viewing (AMBULANT, MS IE6);
      o guide and examples: http://www.multimedia4everyone.com/
  • Scalable Vector Graphics (SVG): for describing two-dimensional vector graphics;
  • Mathematical Markup Language (MathML): for describing mathematical notation (formulas);
  • Chemical Markup Language (CML): for representing chemical formulas;
  • others.

Among the goals set by the W3C Mathematics Working Group when creating MathML were:

  • encoding mathematical material suitable for teaching and scientific communication at all levels;
  • encoding both mathematical notation and its meaning;
  • support for templates and other mathematical editing techniques;
  • conversion to and from other mathematical formats, both purely presentational and semantic. Output formats should include graphical display, speech synthesis, text input for computer algebra systems, other mathematical typesetting languages such as TeX, plain text display (that is, without mathematical symbols and expressions), and print media, including Braille. Conversions between formats may lose information;
  • the ability to pass information intended for specific renderers and applications;
  • support for efficient browsing of long mathematical expressions;
  • extensibility (in ways that are not known in advance).

The general principle of using MathML is that mathematical constructs are embedded in a regular HTML document and (if the browser or special program supports this specification) are adequately reproduced when the document is downloaded from the network.

The first thing you have to deal with in MathML, and what distinguishes this markup language from its analogues, is the use of two ways to encode expressions. One of them directly conveys the syntax of the formula (presentation); the other, by contrast, reflects the semantics of the expression (content). Presentation markup describes mathematical notation with expressions built from layout schemata, which specify how subexpressions such as fractions, superscripts, and subscripts are placed. Content markup describes mathematical objects and functions: each node of an expression tree is built according to a specific scheme, and the branches of the tree correspond to subexpressions.

Currently, web pages created using MathML can be viewed in the following browsers (the "+" sign means that newer versions also work):

  • Windows:
      o IE 5.0 with Techexplorer plugin
  • Macintosh:
      o IE 5.0+ with Techexplorer plugin
      o Mozilla 0.9.9+
  • Linux/Unix:
      o Netscape 6.1 with Techexplorer plugin
      o Mozilla 0.9.9+
      o Amaya, all versions (Presentation MathML only)

All MathML elements fall into three groups: presentation elements, content elements, and interface elements.

Presentation elements describe the visually oriented two-dimensional structure of mathematical notation. For example, the mrow element usually denotes a horizontal row of parts of an expression, and the msup element marks a superscript. Typically, each presentation element corresponds to one kind of notational schema, such as a row, a superscript, or a subscript. Any formula consists of parts, which may in turn consist of the simplest elements, such as numbers, letters, or other symbols.

The most important presentation elements are mi, mn, and mo, used to represent identifiers, numbers, and operators, respectively. Typically, these elements are displayed in different styles: numbers in roman type, identifiers in italics, and operators with extra white space around them.

In markup terms, most MathML elements are written with opening and closing tags that delimit the content of the element. Some elements, such as operator signs in content markup (for example, <plus/>), are written as a single empty tag.

Let's consider in more detail some of the elements necessary for the layout of mathematical formulas, using the presentational markup as an example.

Tokens (token elements) represent individual characters, names, numbers, designations, etc. Basically, tokens can only have characters as content.


[Table: main elements]

[Table: index (script) elements]

Some mathematical operators that can be used with the mo tag: +, <, >, <=, >=, ++, .NOT. (not), and, and the invisible multiplication sign.

Let's look at some examples of formulas in MathML.

1) $\sin^2\alpha + \cos^2\alpha = 1$

[MathML listing for this formula]

The Greek letter α is obtained using the entity &alpha; or the numeric character reference &#x3B1; (recall that Unicode is used).
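As an illustration, one plausible presentation encoding of the identity sin² α + cos² α = 1 is given below and parsed with Python's standard XML parser. The exact markup of the original example is not preserved, so this is a sketch under that assumption. The numeric reference &#x3B1; is used because a generic XML parser only knows the named entity &alpha; when the MathML DTD is loaded.

```python
import xml.etree.ElementTree as ET

# Assumed encoding of sin^2 alpha + cos^2 alpha = 1 in presentation markup.
MATHML = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msup><mi>sin</mi><mn>2</mn></msup>
    <mi>&#x3B1;</mi>
    <mo>+</mo>
    <msup><mi>cos</mi><mn>2</mn></msup>
    <mi>&#x3B1;</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
</math>"""

root = ET.fromstring(MATHML)

# Collect the character content of every token element, in document order.
tokens = [el.text for el in root.iter() if el.text and el.text.strip()]
print(tokens)
```

The parser resolves the character reference, so the token list contains the actual Greek letter rather than the reference text.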


We also remind you that in order to work with MathML in Internet Explorer, you need to install MathPlayer.

Any file containing MathML markup must have the lines before the document header

In addition, any MathML code opens with the <math> tag and closes with the </math> tag.

[Further example formulas, built from a^2, b^2, sin, x + y, 2x, x^2, and y^2; their listings are not preserved.]

Consider the elements for the layout of tables and matrices.

$$
\begin{matrix}
1 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1
\end{matrix}
$$
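Tables and matrices in presentation markup are built from the mtable, mtr (row), and mtd (cell) elements. The sketch below generates such markup for a matrix of numbers; the helper name `matrix_to_mathml` is hypothetical, and the original listing may have been laid out differently.

```python
import xml.etree.ElementTree as ET


def matrix_to_mathml(rows):
    """Build an <mtable> element with one <mtr> per row and one <mtd> per cell."""
    table = ET.Element("mtable")
    for row in rows:
        mtr = ET.SubElement(table, "mtr")
        for value in row:
            mtd = ET.SubElement(mtr, "mtd")
            ET.SubElement(mtd, "mn").text = str(value)
    return table


# A 5 x 5 matrix with ones on the border and zeros inside.
border = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

table = matrix_to_mathml(border)
markup = ET.tostring(table, encoding="unicode")
print(markup[:45])
```

The resulting fragment still has to be placed inside a math element, and brackets around the matrix would need to be added separately.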

Example 2

$$
\begin{aligned}
ax + by &= c \\
a_1 x + b_1 y &= c_1
\end{aligned}
$$

Various mathematical symbols, if necessary, must be looked up in the Unicode encoding table.

Example. [A sum S written with Σ, taken over i starting from i = 0; the listing is not preserved.]

Many examples with integrals are available online, so we will not dwell on them here; the reader is encouraged to study them independently.

As noted above, MathML allows both presentational and semantic representations. Here we have focused on the presentational one as the easier to grasp and the more commonly used. However, to give at least some idea of the other option, consider a small illustrative example written in both representations.

Example. $x^2 - 6x + 9 = 0$
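The two representations of this equation might look as follows. Both listings are a sketch rather than the original ones: the content version uses apply with eq, which MathML 2.0 prefers over the older reln wrapper, and U+2062 is the invisible multiplication sign. The helper `tag_names` is hypothetical and only lists the element names that each encoding uses.

```python
import xml.etree.ElementTree as ET

PRESENTATION = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>-</mo>
    <mn>6</mn><mo>&#x2062;</mo><mi>x</mi>
    <mo>+</mo>
    <mn>9</mn>
    <mo>=</mo>
    <mn>0</mn>
  </mrow>
</math>"""

CONTENT = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <apply><plus/>
      <apply><minus/>
        <apply><power/><ci>x</ci><cn>2</cn></apply>
        <apply><times/><cn>6</cn><ci>x</ci></apply>
      </apply>
      <cn>9</cn>
    </apply>
    <cn>0</cn>
  </apply>
</math>"""


def tag_names(markup):
    """Return element names in document order, with the namespace removed."""
    return [el.tag.split("}")[-1] for el in ET.fromstring(markup).iter()]


print(sorted(set(tag_names(PRESENTATION))))
print(sorted(set(tag_names(CONTENT))))
```

The presentation version names layout (msup, mrow, mo), while the content version names operations (power, times, minus, plus, eq); the two vocabularies share only the math root.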

Of course, typing formulas in MathML is a rather lengthy task that requires some effort, although those who have worked long enough in LaTeX will not notice much difference. More often, however, users prefer dedicated tools. Let's name a few.

Firstly, mathematical packages, say, Mathematica or Maple, allow you to save the formulas typed in them in the MathML format.

This resource uses the JavaScript script ASCIIMathML.js (ver. 2.0; September 2007; http://www1.chapman.edu/~jipsen/mathml/asciimath.html , written by Peter Jipsen), which runs on the user's computer and is loaded together with the demo page http://www1.chapman.edu/~jipsen/mathml/asciimathdemo.html . In particular, this means the resource can be used locally: it is enough to save the demo HTML page, and simple formulas can then be converted without an Internet connection.

In conclusion, we note that MathML as a markup tool is also well suited to generating mathematical exercises (see the examples above). JavaScript, for example, can be used for the programming; the user can then generate any number of exercise variants. Answers to all the exercises can be generated as well, which is very simple.

We expect that the reader's acquaintance with the means of layout of mathematical texts does not end there, and then he will be able to independently choose the tool that interests him, and maybe even apply the described technologies in practice.


Prior to HTML5, using formulas was a real pain. Judge for yourself: in 2005 you needed either a special browser or a split of the text into HTML proper and inserts made of images or PDF. Searching, editing, and output to screen or paper were such murky tasks that entire monographs were devoted to them.

In 2012 things were already easier: you could install the necessary plugins (Firemath for Firefox and Daum Equation Editor for Chrome). But the ambiguity of the standards (and of their support) effectively forced authors to write the same article separately for each browser (and for each browser version), or to greet users with the magic words "Your browser needs to be updated or extended with a plugin."

Inconvenient? Yes! Time-consuming to find a universal solution? Yes! Did it make you wonder which type of notation is better (presentation or content) and which converter to use (there are about a dozen well-known ones alone)? YES! YES! YES!

As a result, publishing turned into developing two or three markup vocabularies and learning at least one converter program.

Now, with the advent of HTML5, things have become much easier. It has a new container, <math>.
Every valid instance of MathML must be inside this container.
It cannot be nested inside another <math> element, but it can contain any number of other child elements.

Tag attributes

In addition to the following attributes, the <math> tag accepts any attribute of the <mstyle> element.

class, id, style
Used in conjunction with style sheets.
dir
Specifies the direction of the formula: ltr (left to right) or rtl (right to left).
href
Sets a hyperlink to the specified URI.
mathbackground
Background color. You can use #rgb, #rrggbb, and HTML color names.
mathcolor
Text color. You can use #rgb, #rrggbb, and HTML color names.
display
Specifies how the formula is rendered. Possible values:

  • block: the element is displayed outside the current span of text, as a block that can be placed anywhere without changing the meaning of the text;
  • inline: the element is displayed inside the current span of text and cannot be moved out of it without changing the meaning of that text.

The default value is inline.

mode
Deprecated in favor of the display attribute.
Possible values are display (which has the same effect as display="block") and inline.
overflow
Determines how the expression behaves if it is too long to fit in the available width.
Possible values: linebreak (default), scroll, elide, truncate, scale.

Examples

Representation in HTML5

MathML in HTML5: $a^2 + b^2 = c^2$

Representation in XHTML

MathML in XHTML: $a^2 + b^2 = c^2$
Notes: XHTML documents with MathML must be served as application/xhtml+xml. For local files, adding the .xhtml extension is usually enough. On Apache servers, the .htaccess file can map this extension to the correct MIME type. Because the MathML is then processed as part of an XML document, the document must be well-formed.
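A minimal sketch of the HTML5 form is shown below: the formula a² + b² = c² is placed directly in the page inside a math container, with the display and mathcolor attributes described above. The page content and the color value are assumptions made for illustration. Python's standard HTML parser is used only to confirm that the math element and its attributes are picked up; the class name `MathFinder` is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical HTML5 page; no namespace declaration is needed in this syntax.
PAGE = """<!DOCTYPE html>
<html>
  <head><title>MathML in HTML5</title></head>
  <body>
    <math display="block" mathcolor="#003366">
      <mrow>
        <msup><mi>a</mi><mn>2</mn></msup>
        <mo>+</mo>
        <msup><mi>b</mi><mn>2</mn></msup>
        <mo>=</mo>
        <msup><mi>c</mi><mn>2</mn></msup>
      </mrow>
    </math>
  </body>
</html>"""


class MathFinder(HTMLParser):
    """Collect the attributes of every <math> start tag."""

    def __init__(self):
        super().__init__()
        self.math_attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            self.math_attrs.append(dict(attrs))


finder = MathFinder()
finder.feed(PAGE)
print(finder.math_attrs)
```

In the XHTML form the same formula would additionally need the MathML namespace declaration on the math element.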

Browser Support

Desktop versions

Element          Chrome            Firefox (Gecko)    Internet Explorer  Opera  Safari
XHTML notation   24 only           1.0 (1.7 and up)   -                  9.5    5.1
HTML5 notation   24 only           4.0 (2.0)          -                  -      5.1
dir              -                 12.0 (12.0)        -                  -      -
href             WebKit bug 85733  7.0 (7.0)          -                  -      WebKit bug 85733
mathbackground   24 only           4.0 (2.0)          -                  -      5.1
mathcolor        24 only           4.0 (2.0)          -                  -      5.1
overflow         -                 -                  -                  -      -

Mobile versions

Element          Android  Chrome for Android  Firefox Mobile (Gecko)  IE Mobile  Opera Mobile  Safari Mobile
XHTML notation   -        -                   1.0 (1.0)               -          -             -
HTML5 notation   -        -                   4.0 (2.0)               -          -             -
dir              -        -                   12.0 (12.0)             -          -             -
href             -        -                   7.0 (7.0)               -          -             -
mathbackground   -        -                   4.0 (2.0)               -          -             -
mathcolor        -        -                   4.0 (2.0)               -          -             -
overflow         -        -                   -                       -          -             -

(- : no value given)

MathML is used to represent mathematical symbols and formulas in web documents. It is recommended by the W3C Math Working Group.

Version 1.01 of the MathML specification was released in July 1999, and version 2.0 appeared in February 2001. In October 2003, the second edition of MathML version 2.0 was published, which is currently the latest specification released by the Math Working Group.

MathML considers not only the presentation but also the meaning of formula elements. A mathematical semantics markup system called OpenMath is also being developed to complement MathML.


Software support

Among the major browsers, recent versions of Mozilla and its derivatives support MathML directly. Many other browsers support the format once the appropriate plugin is installed. For example, Internet Explorer uses the MathPlayer plugin to support MathML.

In addition, MathML is supported by major office suites and by mathematical software products such as Mathematica and Maple.

As mathematical notation developed, so did the methods for storing and transmitting it. The mathematicians of ancient Babylon made their records on clay tablets; the late Middle Ages saw the first printed books; and the modern era is marked by an ever-growing flow of electronic publications. It was the search for adequate ways to typeset mathematical texts on the Internet that led to the creation of MathML.

The need for such a tool arises because HTML, for all its merits, has a rather limited ability to convey mathematical notation. Most often, formulas on HTML pages are presented as graphics (raster or vector), but this method has obvious drawbacks. For example, a formula stored as an image is almost impossible to edit, and its print quality usually leaves much to be desired. It is therefore clear that, for the Web, mathematical notation should be encoded in some way, and as "transparently" as possible for client programs (browsers). The development of these ideas led to a whole family of mathematical markup languages, which includes MathML.

It should be noted that the problem of representing mathematical symbols in electronic form is not limited to the need to develop a separate specification. This is a complex scientific and technical problem, which is still far from its final solution, which is confirmed by the presence of a large number of proposed approaches, often poorly coordinated with each other. One such approach is specialized markup languages, which include MathML. Of course, its developers were aware of the depth of this issue and set a goal to create a specification that satisfies the following limited, but still quite important requirements:

  • ease of learning and of typing basic mathematical notation by hand;
  • maximum compatibility with other mathematical formats, which should be provided by the appropriate converters;
  • the ability to output formulas to various terminal devices;
  • support for extensibility, i.e. the introduction of new symbols, schemes, etc.

To these goals, which concern the design of the specification, others were added that concern the use of MathML in applications. Formulas must be output to screens and printers with the highest quality, and means of information exchange must be provided (for example, clipboard operations for copying and pasting formula fragments). Clearly, all this is to be implemented by application developers, but the potential for it has to be built in from the start.

In a few words, let's outline MathML's place among related mathematical markup languages. Unlike some of them, TeX above all, MathML has semantic means for constructing mathematical expressions. A TeX document is essentially a detailed description of a text with the exact position of every element specified, whereas MathML, which is tied more closely to content, is much more flexible in this respect: the final appearance of the document can easily be changed to suit the user's requirements.

About MathML

MathML is an application of XML, the extensible markup language that is often used to create other languages. This use of XML is quite natural today and has worked well in other cases where using HTML to convey new types of data ran into the limitations of the format. To date, the W3C has published the second edition of version 2.0 of the MathML specification, which indicates the viability and stability of the project.


The first thing you have to deal with in MathML, and what distinguishes this markup language from its analogues, is the use of two ways to encode expressions. One of them directly conveys the syntax of the formula (presentation); the other, by contrast, reflects the semantics of the expression (content). Simply put, the first way conveys the notation of the formula without regard to its meaning, while the second reflects its mathematical content.

Syntax coding

Fig. 1

If you look at how various mathematical expressions are written, you will notice that, although there are many special characters, there are relatively few ways to arrange them. Expressions can be built using superscripts and subscripts, one part of a formula can be placed above or below another, expressions can appear as matrix elements, and so on. This principle is the basis of syntax encoding, in which mathematical expressions of any complexity are formed from a small set of templates (the so-called layout schemata) that correspond to the basic relations found in mathematical formulas.

To demonstrate this principle, let's look at how a common fraction is written in MathML. It has only two parts, the numerator and the denominator, which is reflected in the corresponding template: <mfrac> NUMERATOR DENOMINATOR </mfrac>. The fraction in Fig. 1 is written as <mfrac> <mrow> <mi>x</mi> <mo>+</mo> <mi>y</mi> </mrow> <mi>Z</mi> </mfrac>.

Here the <mfrac> tag, as already mentioned, creates the fraction itself. The linethickness attribute sets the thickness of the fraction bar; if it is omitted, the default value is used (both variants are shown in the figure).

The numerator is represented by the <mrow> tag, which in turn contains child elements. This tag can hold any number of nested expressions, which together form a formula laid out horizontally along the baseline. In our case, this is the sum of two variables x and y (the meaning of the <mi> and <mo> tags is explained below). Note that omitting <mrow> would lead to an error, since the first expression encountered (x) would be taken as the numerator. Finally, the denominator consists of a single variable Z, given by an <mi> tag.
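The role of the grouping tag can be checked with a short sketch. The helper `fraction_parts` is hypothetical: it accepts an mfrac fragment only if it has exactly two children, which is why a numerator made of several tokens has to be wrapped in mrow. The linethickness value used here is an arbitrary example.

```python
import xml.etree.ElementTree as ET

FRACTION = """<mfrac linethickness="2">
  <mrow><mi>x</mi><mo>+</mo><mi>y</mi></mrow>
  <mi>Z</mi>
</mfrac>"""

# The same tokens without the grouping mrow: four children instead of two.
UNGROUPED = "<mfrac><mi>x</mi><mo>+</mo><mi>y</mi><mi>Z</mi></mfrac>"


def fraction_parts(markup):
    """Return (numerator text, denominator text) of an mfrac fragment."""
    frac = ET.fromstring(markup)
    if frac.tag != "mfrac" or len(frac) != 2:
        raise ValueError("mfrac needs exactly two children")
    numerator, denominator = frac
    return ("".join(numerator.itertext()).strip(),
            "".join(denominator.itertext()).strip())


print(fraction_parts(FRACTION))

try:
    fraction_parts(UNGROUPED)
except ValueError as error:
    print("rejected:", error)
```

A real renderer may handle the ungrouped form differently; the check here only mirrors the two-argument rule described in the text.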

The main elements used in MathML representation are symbols (tokens) and templates (layout schemata mentioned above). The first are elements of the language that can contain only letters (for example, to denote variables) and proper mathematical symbols, but not other elements.

The <mi> element is used for identifiers. It follows this rule: if the content of the element is a single character, it is treated as a variable and displayed in italics; if the content is a string, it is displayed in roman type (this property is used to write functions such as sin and ln). The <mo> element displays mathematical operators, and <mn>, which we have not met yet, displays numbers. Each of these elements has a set of attributes that change the default appearance of the characters.

Now let's take a look at some MathML templates that convey basic mathematical expressions. We met two of them above: these are tags for specifying an ordinary fraction and an expression aligned along the baseline. Other most important patterns are passed with the following tags:

  • <msqrt> outputs the radical sign over a nested expression. The similar <mroot> tag displays a root of degree n;
  • <mfenced> specifies an expression enclosed in parentheses. Attributes can be used to specify a separator for nested expressions and some other characteristics;
  • <msup> and <msub> are the tags for superscripts and subscripts. For example, an expression with a superscript (a power) is given as <msup> EXPRESSION SUPERSCRIPT </msup>.

There are also templates for designing almost all the most important mathematical expressions and matrices (about 30 types in total).

Semantics Encoding

Now let's move on to encoding the semantics of expressions. As noted above, it reflects the mathematical content of the formula. The key element for conveying semantics is <apply>. Let's illustrate its use with a simple example. The following code creates a fraction of the same form as shown in Fig. 1: <apply> <divide/> <apply> <plus/> <ci>x</ci> <ci>y</ci> </apply> <ci>Z</ci> </apply>.

Fig. 2

In our example, the <divide/> element comes first inside <apply>, denoting division (a fraction). Note that in semantic encoding most operators are given by tags such as <divide/>, with a slash before the closing angle bracket (so-called "empty" elements). The arguments follow: another <apply> element, which applies the addition operator <plus/> to x and y, and <ci>Z</ci>. Accordingly, the sum (the first argument of the division operator) is displayed as the numerator of the fraction, and the variable Z as the denominator. MathML contains about 90 operators divided into several categories: arithmetic, algebraic, logical, and so on.
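Because content markup records operators and their arguments, it can be processed mechanically. The sketch below is a deliberately tiny evaluator, not part of MathML or of any library: it supports only plus, times, and divide (an assumed subset), looks identifiers up in a dictionary, and treats the first child of apply as the operator.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

CONTENT = """<apply>
  <divide/>
  <apply><plus/><ci>x</ci><ci>y</ci></apply>
  <ci>Z</ci>
</apply>"""

# Assumed subset of operators; real content markup defines many more.
OPS = {"plus": operator.add, "times": operator.mul, "divide": operator.truediv}


def evaluate(node, env):
    """Evaluate a content-markup tree using variable values from env."""
    if node.tag == "ci":
        return env[node.text.strip()]
    if node.tag == "cn":
        return float(node.text)
    if node.tag == "apply":
        op, *args = list(node)  # first child is the operator, the rest are arguments
        values = [evaluate(arg, env) for arg in args]
        return reduce(OPS[op.tag], values)
    raise ValueError("unsupported element: " + node.tag)


result = evaluate(ET.fromstring(CONTENT), {"x": 1, "y": 2, "Z": 4})
print(result)  # (1 + 2) / 4 = 0.75
```

The same fraction in presentation markup could not be evaluated this way, since mfrac only says that one expression is drawn above another.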

In this example, each operator was applied to a pair of arguments. But an <apply> element can take any number of arguments (if the operator itself allows it); the expression shown in Fig. 2 is of this kind.

One point in the last two examples remains unexplained: the tags for alphabetic identifiers and numbers. These are <ci> and <cn>, respectively, the exact analogues of <mi> and <mn> in syntax encoding. Note that <mo> has no analogue in semantic encoding, since all information about the operator is carried by a dedicated operator tag that comes right after <apply>.

The structure of a formula is not set by <apply> alone. For example, to express a relation (equality, inequality, inclusion, and so on), there is a special tag, <reln>. The following snippet creates the formula shown in Fig. 3:

<reln>
  <lt/>
  <ci>x</ci>
  <apply>
    <plus/>
    <ci>y</ci>
    <ci>z</ci>
  </apply>
</reln>

Here <reln> says that the mathematical expression contains a relation, and the "empty" tag <lt/> indicates its specific type, "less than". The identifier x is on the left side; on the right is the sum of two variables, built with the familiar <apply> element.

The information we have given about MathML, although far from complete, is quite enough to start using this language on your own. Moreover, there are special software tools designed to get rid of routine work.

MathML software

The related W3C page links to about thirty recommended products for working with MathML. Browser support matters most, since it best reflects how widely any web technology is accepted. To date, only two products interpret MathML correctly: Amaya (which can be downloaded from the W3C site) and Mozilla. Unfortunately, other developers are in no hurry to add support for this promising technology to their products. The most popular browsers, Microsoft Internet Explorer and Netscape Navigator, do not "understand" MathML out of the box; however, special plug-ins are available from IBM, Design Science, and Theorist Interactive.

The same three companies also supply fuller versions of their software, designed for creating MathML documents. In particular, the examples for this article were prepared in the IBM techexplorer Hypermedia Browser. Similar tasks can be solved with less specialized applications. For example, computer mathematics systems (Mathematica, Maple, Mathcad) usually export their documents to HTML with embedded MathML fragments.

MathML is also supported by some publishing systems for preparing scientific and technical documentation. The best-known products of this class include WebEQ, a package of Java applications covering the full cycle of typesetting and publishing documents in MathML and WebTeX; Wolfram Publicon, for preparing presentation-quality mathematical texts with export to MathML; and the word processors from MacKichan Software, which can save documents from their native TeX format as MathML.

There are also special converters for converting to/from MathML. TeX is the most common source format. MathML, in turn, turns into the same TeX or popular graphic formats.

However, even without such packages, which are not always available, you can create MathML documents by hand after some preparation. After all, these are ordinary text files, and a simple text editor is enough to work with them (just as with HTML).

In conclusion, we emphasize once again that MathML appeared relatively recently (the description of version 1.0 was published in 1998) and is in its infancy. It cannot be excluded that in a few years MathML will give way to a more powerful and advanced technology. However, it is already safe to say that the deep ideas embedded in this language will serve as a solid foundation for creating future methods for presenting complex scientific and technical documents.