We continue our study of XML. In this article we will get acquainted with XML constructs such as processing instructions, comments, attributes, and other basic XML building blocks. These constructs let you mark up documents of virtually any complexity flexibly while staying in strict accordance with the standard.

Some points, such as XML tags, were already partially covered in the previous article. Here we will touch on this topic once again and analyze it in more detail. This is done on purpose, to make it easier for you to see the whole picture of XML constructs.

XML elements. Empty and non-empty XML elements

As mentioned in the previous article, tags in XML do not just mark up text, as in HTML. Instead, they single out individual elements (objects). The elements, in turn, organize the information in the document hierarchically, which makes them the basic structural units of the XML language.

In XML, elements can be of two types: empty and non-empty. Empty elements contain no data at all, neither text nor other constructs. Non-empty elements, by contrast, can contain data such as text or other XML elements and constructs. To get the gist of this, let's look at examples of empty and non-empty XML elements.

Empty XML element:

    <myElement/>

Non-empty XML element:

    <myElement>Element content...</myElement>

As the examples show, the main difference is that an empty element consists of only one tag, while a non-empty element has a start tag and an end tag enclosing its content. An empty element may also be written as a start tag immediately followed by its end tag (<myElement></myElement>), which is equivalent. It is also worth noting that all names in XML are case sensitive. This means that myElement, MyElement, MYELEMENT, etc. are different names, so keep this in mind from the start to avoid mistakes later.
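To see these rules in action, here is a small sketch using Python's standard `xml.etree.ElementTree` parser. Python is our choice for illustration only; the article itself contains no code. The helper names `describe` and `is_well_formed` are hypothetical. The sketch treats an element as empty if it has neither child elements nor text, which is a simplification of our own. It also shows that a mismatch in letter case between start and end tag makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

def describe(xml_text: str) -> str:
    """Classify a single element as "empty" or "non-empty" (simplified check)."""
    elem = ET.fromstring(xml_text)
    # Treated as empty here: no child elements and no text content.
    if len(elem) == 0 and not elem.text:
        return "empty"
    return "non-empty"

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(describe("<myElement/>"))                              # empty
print(describe("<myElement></myElement>"))                   # empty
print(describe("<myElement>Element content...</myElement>"))  # non-empty

# Names are case sensitive: the end tag must match the start tag exactly.
print(is_well_formed("<myElement></myElement>"))  # True
print(is_well_formed("<myElement></MyElement>"))  # False
```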
So much for elements. Now let's move on to the next topic: the logical organization of XML documents.

The logical organization of XML documents. Tree structure of XML data

As you remember, the main construct of the XML language is the element. Elements can contain other nested constructs and so form a hierarchy in the shape of a tree. The single top-level element that contains all the others is the root, and the elements nested inside it are the branches and leaves of the XML tree.

To make it easier to understand the essence of the above, let's look at the following image with an example.

As we can see, organizing an XML document as a tree gives a structure that is fairly simple to process. At the same time, the tree is expressive enough to describe quite complex data. This is why the tree representation is such a natural way to describe objects in XML.
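The tree shape is easy to process with a simple recursive walk. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The document and its element names (`library`, `book`, `title`, `author`) are hypothetical and chosen only to produce a few levels of nesting. The `outline` function prints each element indented by its depth, starting from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are illustrative only.
DOC = """
<library>
  <book>
    <title>XML Basics</title>
    <author>Unknown</author>
  </book>
  <book>
    <title>More XML</title>
  </book>
</library>
"""

def outline(elem, depth=0):
    """Return one line per element, indented two spaces per tree level."""
    lines = ["  " * depth + elem.tag]
    for child in elem:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(DOC.strip())  # the root element of the tree
print("\n".join(outline(root)))
```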

XML attributes. Rules for writing attributes in XML

In XML, elements can also have attributes with assigned values, which are enclosed in single or double quotes. An attribute is written inside the element's start tag in the form name="value":

    <myElement attribute="value">Element content...</myElement>

Here an attribute named "attribute" with the value "value" is used. Note right away that in XML every attribute must be given a value in quotes. Unlike HTML, you cannot write a bare attribute name (such as checked) on its own, or the code will not be well-formed XML. The value itself may be an empty string (attribute="").
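A quick way to check this rule is to feed a few variants to Python's standard `xml.etree.ElementTree` parser. Python is used here only for illustration, and the `input` element and the `parses` helper are hypothetical. The sketch shows that a bare attribute name and an unquoted value are both rejected. A quoted empty value is accepted.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses('<input checked/>'))            # False: HTML-style attribute without a value
print(parses('<input type=text/>'))          # False: value not in quotes
print(parses('<input checked=""/>'))         # True: an empty quoted value is allowed
print(parses('<input checked="checked"/>'))  # True
```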

You should also pay attention to the quotes. Attribute values can be enclosed in either single or double quotes, and one kind of quote can be used inside a value delimited by the other kind. For example:

    <myElement attribute='He said "hello"'/>
    <myElement attribute="It's fine"/>

Before moving on to other XML constructs, note that special characters cannot be used as-is in attribute values. These include the ampersand "&" and the angle brackets "<" and ">". They are reserved as markup characters: "&" starts an entity reference, and "<" and ">" open and close a tag. To use them, replace them with entity references: &amp; for "&", &lt; for "<", and &gt; for ">". Strictly speaking, only "&" and "<" are forbidden in attribute values, but escaping ">" as well is common and harmless.
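Here is a minimal Python sketch of this replacement, using the standard `xml.sax.saxutils.quoteattr` function and `xml.etree.ElementTree` (Python is our choice for illustration). The `show` element and `title` attribute are hypothetical. The sketch shows three things. The raw string breaks the document. `quoteattr` produces a safe, quoted attribute value. A parser turns the entity references back into the original characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

raw = "Tom & Jerry <cartoon>"

# Inserted as-is, the bare "&" (and "<") make the document not well-formed.
try:
    ET.fromstring(f'<show title="{raw}"/>')
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False

# quoteattr() replaces &, < and > with entity references and adds the quotes.
attr = quoteattr(raw)
print(attr)  # "Tom &amp; Jerry &lt;cartoon&gt;"

# When parsing, the entity references become the original characters again.
doc = f"<show title={attr}/>"
print(ET.fromstring(doc).get("title") == raw)  # True
```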

XML processing instructions (processing instructions). XML declaration

In XML, a document can include instructions that carry specific information for the applications that will process it. Processing instructions in XML are written as follows:

    <?target instruction?>

As you can see, processing instructions in XML are enclosed between "<?" and "?>". This is a bit like the <?php ... ?> tag we looked at in our first PHP tutorials. The first part of a processing instruction (the target) names the application or system for which the rest of the instruction is intended. Processing instructions are meaningful only to the applications they address; other applications can simply ignore them. A widely used example is the instruction that attaches a stylesheet (here a hypothetical style.xsl) to a document:

    <?xml-stylesheet type="text/xsl" href="style.xsl"?>

Note that XML has a special construct that looks a lot like a processing instruction but is not one. This is the XML declaration. It passes information about the properties of the XML document to the processing software, such as its encoding and the version of the language the document is written in:

    <?xml version="1.0" encoding="UTF-8"?>

As you can see, the XML declaration contains so-called pseudo-attributes, which look very similar to the regular attributes discussed above. By definition, neither the XML declaration nor processing instructions can contain attributes, which is why these name-value pairs are called pseudo-attributes. This is worth remembering to avoid confusion later.

Since we have dealt with pseudo-attributes, let's look at what they mean.

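  • encoding: the character encoding of the XML document. Usually this is UTF-8. If the encoding pseudo-attribute is omitted, the parser assumes UTF-8 (or UTF-16 if the document starts with a byte-order mark).
  • version: the version of the XML language the document is written in. This is usually XML 1.0.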
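The difference between the declaration and a processing instruction is visible at the parser level. Below is a minimal sketch using Python's standard `xml.parsers.expat` module (Python chosen for illustration). The handler names `on_xml_decl` and `on_pi` and the `style.xsl` stylesheet are hypothetical. Expat reports the XML declaration through `XmlDeclHandler`, which receives the pseudo-attributes as arguments. Real processing instructions go to `ProcessingInstructionHandler` as a target plus the remaining data. A `standalone` value of -1 means that pseudo-attribute was omitted.

```python
from xml.parsers import expat

DOC = b'''<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<root/>'''

declaration = {}
instructions = []

def on_xml_decl(version, encoding, standalone):
    # Called once for the XML declaration; pseudo-attributes arrive as arguments.
    declaration.update(version=version, encoding=encoding, standalone=standalone)

def on_pi(target, data):
    # Called for every processing instruction: the target, then the rest.
    instructions.append((target, data))

parser = expat.ParserCreate()
parser.XmlDeclHandler = on_xml_decl
parser.ProcessingInstructionHandler = on_pi
parser.Parse(DOC, True)

print(declaration)
print(instructions)
```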

Well, now let's move on to the final part of the article and consider such XML constructs as comments and CDATA sections.

Is there an escape character for a double quote in XML? I want to write a tag like:

but if I put a " inside the value, it is taken as the end of the value. I need something like this (C++):

printf("Quote = \" ");

Is there a character to be written before the double quote to escape it?

A new, improved answer to an old, frequently asked question...

When to escape double quote in XML

A double quote (") may appear unescaped:

    In XML text content:

    He said, "Don't quote me."

    In XML attributes delimited by single quotes ('):

    <element text='He said "hello"'/>

    Note: single quotes (') inside attributes delimited by double quotes do not require escaping either:

    <element text="It's fine"/>

A double quote (") must be escaped:

    In XML attributes delimited by double quotes:

    <element text="He said &quot;hello&quot;"/>

Bottom line

The double quote (") only has to be escaped, as &quot;, in a very limited context: inside an attribute value that is itself delimited by double quotes.
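Both legal ways of putting double quotes into an attribute give the same value once parsed. The following minimal Python sketch uses the standard `xml.etree.ElementTree` module to confirm this (Python and the element name `q` are our own illustrative choices). It also confirms that an unescaped `"` inside a double-quoted value is rejected.

```python
import xml.etree.ElementTree as ET

samples = [
    """<q text='He said "hello"'/>""",        # double quotes inside a single-quoted value
    '<q text="He said &quot;hello&quot;"/>',  # &quot; inside a double-quoted value
]
values = [ET.fromstring(s).get("text") for s in samples]
print(values)  # ['He said "hello"', 'He said "hello"']

# An unescaped " inside a double-quoted value ends the value early.
try:
    ET.fromstring('<q text="He said "hello""/>')
    broken_rejected = False
except ET.ParseError:
    broken_rejected = True
print(broken_rejected)  # True
```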

If you just need to try something quickly, here is a quick and dirty solution. Use single quotes for the attribute value:

    <element text='Quote = " '/>

In C++ you can use the EscapeXML ATL API. This is the correct way to handle special characters...

Here are the common characters that need escaping in XML, with their predefined entities, starting with double quotes:

  1. double quote (") is escaped as &quot;
  2. ampersand (&) is escaped as &amp;
  3. single quote (') is escaped as &apos;
  4. less than (<) is escaped as &lt;
  5. greater than (>) is escaped as &gt;
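The table above maps directly onto a replacement routine. Below is a hand-rolled Python sketch for illustration only (the function name `escape_xml` is hypothetical; a tested library routine is usually preferable). The one subtle point is ordering: `&` must be replaced first. Otherwise the `&` characters introduced by the other entities would be escaped a second time.

```python
def escape_xml(text: str) -> str:
    """Replace the five predefined XML special characters with entity references."""
    # "&" goes first so that the "&" inside the other entities is not re-escaped.
    replacements = [
        ("&", "&amp;"),
        ('"', "&quot;"),
        ("'", "&apos;"),
        ("<", "&lt;"),
        (">", "&gt;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text

print(escape_xml('Quote = " & <tag> \'x\''))
# Quote = &quot; &amp; &lt;tag&gt; &apos;x&apos;
```

Note that the function is not idempotent: applying it to already-escaped text escapes the `&` again, which is one reason to escape exactly once, at the point where the text is written into XML.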

Others have answered how to handle escaping in this specific case.

The broader answer is: don't try to do it yourself. Use an XML API; one is available for almost every modern programming platform.

The XML API will handle things like this for you automatically, making it much harder to go wrong. Unless you're writing the XML API yourself, you rarely have to worry about such details.
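As an illustration of that advice, here is a minimal sketch with Python's standard `xml.etree.ElementTree` (Python and the element/attribute names `tag` and `quote` are our own illustrative choices). The values are set through the API, which escapes them during serialization. Parsing the output returns exactly the original strings, with no manual `&quot;` handling.

```python
import xml.etree.ElementTree as ET

# Build the element through the API instead of concatenating strings.
elem = ET.Element("tag", attrib={"quote": 'Quote = " & <ok>'})
elem.text = 'He said "hi" & left'
xml_bytes = ET.tostring(elem)  # serializer escapes what each context requires
print(xml_bytes.decode())

# Parsing the output gives back exactly the original strings.
parsed = ET.fromstring(xml_bytes)
print(parsed.get("quote") == 'Quote = " & <ok>')  # True
print(parsed.text == 'He said "hi" & left')        # True
```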

For a long time, the standard has prescribed using the &quot; construct to insert ordinary double quotes into HTML text, because inside tags double quotes ("") delimit attribute values.

However, I have not yet come across a browser that would fail to display a plain " character OUTSIDE of any tags as a quote. So tell me, dear colleagues, is using &quot; outside of tags simply unnecessary tedium? Can you just write " calmly and without further ado? This matters especially in texts with a lot of quotation marks, where strict typographic rules (about the correct use of national quotation marks) are irrelevant.

IMHO, many people do this... but the question is not entirely clear. If you understand that the standards require writing quotes as &quot;, but you are too lazy to, and a lot of sites work that way, then what do you expect to hear? I think nobody knows whether such quotes will keep displaying in new browser versions. So the most obvious recommendation is: if you don't want problems in the future, stick to the standards 100% :) But you already know this. Or are you hoping that someone (Microsoft, Mozilla, etc.) will guarantee that in 10 years everything will still be the same?

Lynn "Coffee Man"
Yes, by the way... it is useful to read this: nowhere does it state that quotes must be represented as &quot;
http://www2.stack.ru/~julia/HTML401/charset.html :

Some authors use the character entity reference "&quot;" to encode instances of the double quote ("), since this character can be used to delimit attribute values.

The requirement to use an entity is stated only for <, > and &:

If the author wants to put the character "<", they should use the reference "&lt;" (ASCII decimal code 60) to avoid possible confusion with the start of a tag (start-tag marker). Similarly, to avoid problems with older user agents that incorrectly treat ">" as the end of a tag (end-of-tag marker), you should use the reference "&gt;" (ASCII decimal code 62).

To avoid confusion with the start of a character reference (character reference start marker), the "&amp;" reference (ASCII decimal code 38) should be used instead of "&". In addition, "&amp;" should also be used in attribute values, since character references are allowed within CDATA attribute values.

But I was actually hoping for an answer like Lynn's: that there is no such requirement in the standard. It hadn't even occurred to me; my information comes from popular textbooks and from the "everyone does it" principle.

Or another possible answer: if you follow newer standards that I have not encountered in my practice, such as XHTML (I checked XHTML specifically), this trick will not work. In that case there is no need to create portability problems for the HTML code you write.

Or, finally, how do you do it yourself?

&, by the way, raises a similar question. The document above says "to avoid confusion". But confusion is possible only if the & is followed by one of the defined entity names. What if it's, say, a URL like "..../script?A=1&B=2"? Am I risking anything if I carelessly use such a URL in an href (which, of course, works correctly in testing)? The only risk I can see is extremely unlikely. In 10 years (when the site is outdated or has already been rewritten ten times), an entity with the extravagant name &B, written without a trailing ";", might appear. In other words, how carefully should all such cases be checked?

Daniel, if you are sure there are no conflicts with existing entity names, you can simply write &. If a new entity appears in the future, I think it will be declared explicitly in a newer specification rather than in HTML 4.01, so it should not affect a document that properly declares its version. Or do you expect to gain support for future standards simply by changing the document type declaration?
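For XML (and therefore XHTML processed as XML), the answer is stricter than for lenient HTML parsing in browsers. A bare `&` that does not start a valid reference is a fatal well-formedness error. Here is a minimal Python sketch with the standard `xml.etree.ElementTree` parser. The helper `href_of` is hypothetical, and the URL from the question is shortened to `script?A=1&B=2`.

```python
import xml.etree.ElementTree as ET

def href_of(markup):
    """Return the href of the parsed element, or None if it is not well-formed XML."""
    try:
        return ET.fromstring(markup).get("href")
    except ET.ParseError:
        return None

# A bare "&" in the URL: an XML parser must reject the document.
print(href_of('<a href="script?A=1&B=2">link</a>'))      # None
# Written as &amp;, it parses and yields the intended URL.
print(href_of('<a href="script?A=1&amp;B=2">link</a>'))  # script?A=1&B=2
```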

Daniel Alievsky
In XML (and, accordingly, in XHTML), a plain quote in text also poses no problem. IMHO, quotes are usually converted to &quot; for only one reason: you don't want to write two functions to make text safe for substitution into XML/HTML/XHTML.
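That "one function for every context" approach can be sketched in Python with the standard `xml.sax.saxutils.escape`, which handles `&`, `<` and `>` and accepts an extra mapping; here the double quote is added to it. The helper name `to_safe` is hypothetical. The last line confirms that a plain `"` in XML text content parses without any escaping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def to_safe(text):
    """Escape text for both element content and double-quoted attribute values."""
    # escape() replaces &, < and >; the extra mapping also turns " into &quot;.
    return escape(text, {'"': "&quot;"})

print(to_safe('He said "yes" & <left>'))
# He said &quot;yes&quot; &amp; &lt;left&gt;

# In text content a plain double quote needs no escaping at all.
print(ET.fromstring('<p>He said "yes"</p>').text)  # He said "yes"
```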

Hello, dear site visitors! Let's continue with the XML markup language and look at the use of attributes. XML elements can have attributes, just like in HTML. Attributes provide additional information about an element.

XML Attributes

In HTML, attributes provide additional information about elements:

XML Attributes Must Be Quoted

Attribute values in XML must always be enclosed in quotation marks. Both single and double quotes can be used. To specify the sex (gender) of a person element, you can write this:

    <person sex="female">

or this:

    <person sex='female'>

If the attribute value itself contains double quotes, you can use single quotes, as in this example:

    <person quote='She said "hello"'>

XML Elements vs. Attributes

Take a look at the following examples:

Victoria
Petrova

female
Victoria
Petrova

In the first example, gender (sex) is an attribute. In the second, sex is an element. Both examples provide the same information.
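To show that both forms carry the same information, here is a minimal Python sketch (standard `xml.etree.ElementTree`) that reads the value from either form. The element names `person`, `firstname`, `lastname` and the helper `read_sex` are hypothetical reconstructions for illustration; only the data (Victoria, Petrova, female, and the name sex) comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the data comes from the example above.
AS_ATTRIBUTE = ('<person sex="female">'
                '<firstname>Victoria</firstname><lastname>Petrova</lastname></person>')
AS_ELEMENT = ('<person><sex>female</sex>'
              '<firstname>Victoria</firstname><lastname>Petrova</lastname></person>')

def read_sex(xml_text):
    person = ET.fromstring(xml_text)
    # Try the attribute first; fall back to a child element with the same name.
    return person.get("sex") or person.findtext("sex")

print(read_sex(AS_ATTRIBUTE))  # female
print(read_sex(AS_ELEMENT))    # female
```

Code that consumes XML must know which form to expect, which is part of why a consistent choice between attributes and elements matters.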

There are no rules about when to use attributes and when to use elements. Attributes are handy in HTML, but in XML my advice is to avoid them and use elements instead.

My Favorite Method

The following three XML documents contain exactly the same information:

The first example uses a date attribute:

The third example uses an expanded date element (this is my favorite way):



10
01
2008

Petya
Sveta
Reminder
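A practical consequence of the expanded form is that a program can read the date parts without string parsing. The Python sketch below (standard `xml.etree.ElementTree` and `datetime`) normalizes all three representations to the same date. The element names (`note`, `date`, `day`, `month`, `year`, `to`, `from`) are hypothetical reconstructions. The date is taken to be 10/01/2008 in day/month/year order, which is an assumption based on the 10, 01, 2008 sequence above. The helper `note_date` is likewise illustrative.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical reconstructions of the three forms (day/month/year assumed).
FORMS = [
    '<note date="10/01/2008"><to>Petya</to><from>Sveta</from></note>',
    '<note><date>10/01/2008</date><to>Petya</to><from>Sveta</from></note>',
    '<note><date><day>10</day><month>01</month><year>2008</year></date>'
    '<to>Petya</to><from>Sveta</from></note>',
]

def note_date(xml_text):
    note = ET.fromstring(xml_text)
    date_elem = note.find("date")
    if date_elem is None:        # form 1: date stored as an attribute
        text = note.get("date")
    elif len(date_elem) == 0:    # form 2: plain date element with text
        text = date_elem.text
    else:                        # form 3: expanded element, no string splitting needed
        return datetime.date(int(date_elem.findtext("year")),
                             int(date_elem.findtext("month")),
                             int(date_elem.findtext("day")))
    day, month, year = (int(part) for part in text.split("/"))
    return datetime.date(year, month, day)

for form in FORMS:
    print(note_date(form))  # 2008-01-10 for each form
```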

Avoid XML Attributes?

Some of the problems with using XML attributes:

  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are harder to extend (for future changes)
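The first point is easy to verify with a parser. In this minimal Python sketch (standard `xml.etree.ElementTree`; the helper names are hypothetical), a repeated `to` element holds several values. By contrast, XML forbids the same attribute name twice on one element, so that document is rejected.

```python
import xml.etree.ElementTree as ET

def recipients(xml_text):
    """Collect the text of every <to> child element."""
    return [to.text for to in ET.fromstring(xml_text).findall("to")]

def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Elements can repeat, so several values are easy to express:
print(recipients("<note><to>Vasya</to><to>Petya</to></note>"))  # ['Vasya', 'Petya']

# An attribute name may appear only once per element:
print(is_well_formed('<note to="Vasya" to="Petya"/>'))  # False
```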

Don't do it like this:

<note to="Vasya" from="Sveta" heading="Reminder"
body="Don't forget to call me tomorrow!">

XML Attributes for Metadata



Vasya
Sveta
Reminder
Don't forget to call me tomorrow!


Sveta
Vasya
Re: Reminder
OK

The id attributes above are used to identify different notes. They are not part of the note itself.

What I'm trying to say here is that metadata (data about data) should be stored as XML attributes, and the data itself should be stored as elements.
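Here is a minimal Python sketch of this split (standard `xml.etree.ElementTree`). The `id` attribute serves only to locate a note, while the note's data lives in child elements. The id values `n1` and `n2`, the `messages` wrapper and the helper `heading_of` are hypothetical; the note contents come from the example above. ElementTree's path syntax supports the `[@id='...']` predicate used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical id values; the note contents come from the example above.
MESSAGES = ET.fromstring("""
<messages>
  <note id="n1"><to>Vasya</to><from>Sveta</from><heading>Reminder</heading></note>
  <note id="n2"><to>Sveta</to><from>Vasya</from><heading>Re: Reminder</heading></note>
</messages>
""".strip())

def heading_of(note_id):
    # The id (metadata) only finds the note; the data is read from child elements.
    note = MESSAGES.find(f"note[@id='{note_id}']")
    return None if note is None else note.findtext("heading")

print(heading_of("n2"))  # Re: Reminder
```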

Thank you for your attention!

As in HTML, XML elements may have attributes. The values of XML attributes and the rules for writing them are in many ways similar to those in HTML.

Attributes provide additional information about an element.

XML attributes

In HTML, attributes provide some additional information about an element:

Attributes often provide information that is not part of the data. In the example below, the file type has nothing to do with the data itself, but this information may be important to applications that manipulate the element:

    <file type="gif">computer.gif</file>

XML attributes must be enclosed in quotes

The attribute value must always be enclosed in quotation marks, either double or single. For example, a person's gender can be written like this:

    <person gender="female">

or like this:

    <person gender='female'>

If the attribute value itself contains double quotes, single quotes can be used. For example:

    <person quote='She said "hello"'>

or use entity references:

    <person quote="She said &quot;hello&quot;">

XML elements or attributes

Look at the following examples:

Example #1

Anna Smith

Example #2

female Anna Smith

In the first example, gender is specified in an attribute. In the second, gender is written as an element. Both examples provide the same information.

There are no rules governing when to use attributes and when to use elements. Attributes are widely used in HTML. In XML, I think it's best to avoid them and use elements instead.

What's better?

The following three XML documents contain exactly the same information:

The date is written as an attribute:

Tove Jani Reminder

The date is written as an element:

10/01/2008 Tove Jani Reminder Don't forget about me this weekend!

The date is written as an expanded element (in my opinion, the best option):

10 01 2008 Tove Jani Reminder Don't forget about me this weekend!

Avoid XML attributes?

There are some problems when using attributes:

  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are hard to extend (for future changes)

Never use the following constructs:

XML attributes for metadata

Sometimes elements are given identifiers. These identifiers are used to identify XML elements in exactly the same way as id attributes in HTML. The following example demonstrates this:

Tove Jani Reminder Don't forget about me this weekend! Jani Tove Re: Reminder I won't forget

In the example above, the id attribute is used to identify different notes. This information is not part of the note itself.

The main idea of all that has been said is that metadata (data about data) should be written as attributes, and the data itself as elements.