Entities are a mechanism for assigning names to chunks of content. As an XML parser processes your document, any entity references it finds are replaced by the content of the corresponding entity.
This is a good way to have re-usable, easily changeable chunks of content in your XML documents. It is also the only way to include one marked up file inside another using XML.
There are two types of entities, used in two different situations: general entities and parameter entities.
You cannot use general entities in an XML context (although you define them in one). They can only be used in your document. Contrast this with parameter entities.
Each general entity has a name. When you want to reference a general entity (and therefore include whatever text it represents in your document), you write &entity-name;. For example, suppose you had an entity called current.version which expanded to the current version number of your product. You could write:
<para>The current version of our product is &current.version;.</para>
When the version number changes you can simply change the definition of the general entity and reprocess your document.
You can also use general entities to enter characters that you could not otherwise include in an XML document. For example, < and & cannot normally appear in an XML document. When the XML parser sees the < symbol it assumes that a tag (either a start tag or an end tag) is about to appear, and when it sees the & symbol it assumes the next text will be the name of an entity.
Fortunately, you can use the two general entities &lt; and &amp; whenever you need to include one or the other of these characters.
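As a quick illustration of these escapes, the following minimal sketch parses a one-line fragment with Python's standard-library `xml.etree.ElementTree` parser. The choice of Python and the sample sentence are assumptions made for this illustration only; they are not part of the toolchain described here. The sketch also uses `&gt;`, which XML predefines in the same way.

```python
import xml.etree.ElementTree as ET

# The predefined entities need no DOCTYPE declaration.
# The sample sentence is hypothetical, chosen only to show the escapes.
source = "<para>Use &lt;para&gt; for paragraphs &amp; &lt;title&gt; for titles.</para>"

para = ET.fromstring(source)

# The parser hands back the literal characters, not the entity references.
print(para.text)
```

The printed text contains the literal < and & characters, even though the source never contains them unescaped.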
A general entity can only be defined within an XML context. Typically, this is done immediately after the DOCTYPE declaration.
Example 3-10. Defining General Entities
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY current.version "3.0-RELEASE">
<!ENTITY last.version "2.2.7-RELEASE">
]>
Notice how the DOCTYPE declaration has been extended by adding a square bracket at the end of the first line. The two entities are then defined over the next two lines, before the square bracket is closed, and then the DOCTYPE declaration is closed.
The square brackets are necessary to indicate that we are extending the DTD indicated by the DOCTYPE declaration.
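To see the replacement happen, here is a minimal sketch that defines the same two entities and lets a parser expand them. It uses Python's standard-library `xml.etree.ElementTree` as a stand-in parser, which is an assumption of this illustration. To keep the sketch self-contained, the DOCTYPE names a hypothetical `para` root element and omits the external XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Simplified document: the internal subset (between [ and ]) defines the
# two entities, and the body references them with &entity-name;.
# The <para> root element and its sentence are hypothetical.
source = """<!DOCTYPE para [
<!ENTITY current.version "3.0-RELEASE">
<!ENTITY last.version "2.2.7-RELEASE">
]>
<para>Upgrade from &last.version; to &current.version;.</para>"""

para = ET.fromstring(source)

# Each reference has been replaced by the text the entity represents.
print(para.text)
```

Changing either version string in the DOCTYPE and re-running the sketch changes the output without touching the body text.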
Like general entities, parameter entities are used to assign names to reusable chunks of text. However, whereas general entities can only be used within your document, parameter entities can only be used within an XML context.
Parameter entities are defined in a similar way to general entities. However, instead of using &entity-name; to refer to them, use %entity-name; [1]. The definition also includes the % between the ENTITY keyword and the name of the entity.
Example 3-11. Defining Parameter Entities
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY % param.some "some">
<!ENTITY % param.text "text">
<!ENTITY % param.new "%param.some; more %param.text;">

<!-- %param.new now contains "some more text" -->
]>
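This may not seem particularly useful. It will be. Be aware, though, that XML 1.0 permits parameter entity references inside an entity's value only in an external DTD, so a conforming parser may reject this example if it is placed in the internal subset exactly as shown.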
Add a general entity to example.xml.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" [
<!ENTITY version "1.1">
]>

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>An Example XHTML File</title>
  </head>

  <!-- You might well have some comments in here as well -->

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>

    <p>The current version of this document is: &version;</p>
  </body>
</html>
Validate the document using xmllint.
Load example.xml into your web browser (you may need to copy it to example.html before your browser recognizes it as an XHTML document).
Unless your browser is very advanced, you will not see the entity reference &version; replaced with the version number. Most web browsers have very simplistic parsers which do not handle XML DTD constructs. Furthermore, the closing ]> of the XML context is not recognized properly by browsers and will probably be rendered.
The solution is to normalize your document using an XML normalizer. The normalizer reads in valid XML and outputs equally valid XML which has been transformed in some way. One of the ways in which the normalizer transforms the XML is to expand all the entity references in the document, replacing the entities with the text that they represent.
You can use xmllint to do this. It also has an option to drop the initial DTD section so that the closing ]> does not confuse browsers:
% xmllint --noent --dropdtd example.xml > example.html
You should find a normalized (i.e., entity references expanded) copy of your document in example.html, ready to load into your web browser.
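The idea behind normalization can also be sketched in a few lines of Python. This is an illustration of the concept, not a replacement for xmllint: it assumes Python's standard-library `xml.etree.ElementTree`, and it uses a cut-down version of the example document with no external DTD and no XHTML namespace. Parsing expands the entity references, and serializing the resulting tree writes the document back out without the DOCTYPE section.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for example.xml (no external DTD, no namespace).
source = """<!DOCTYPE html [
<!ENTITY version "1.1">
]>
<html><body><p>The current version of this document is: &version;</p></body></html>"""

# Parsing replaces &version; with the text it represents.
root = ET.fromstring(source)

# Serializing the tree emits the elements only, so the DOCTYPE
# (and its closing ]>) is not written back out.
normalized = ET.tostring(root, encoding="unicode")
print(normalized)
```

The output contains neither the entity reference nor the DOCTYPE, which is the same effect the --noent and --dropdtd options have in the command above.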
[1] Parameter entities use the Percent symbol.