This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices described here allows both developers of XML applications and authors of XML content to create material in different languages.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index. This document was produced by the Internationalization Tag Set (ITS) Working Group, part of the W3C Internationalization Activity. The archives for the group's public comments list are publicly available.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is a complement to the W3C Recommendation Internationalization Tag Set (ITS) Version 1.0 [ITS]. However, not all internationalization-related issues can be resolved by the special markup described in ITS. The best practices in this document therefore go beyond application of ITS markup to address a number of problems that can be avoided by correctly designing the XML format, and by applying a few additional guidelines when developing content.
This document and Internationalization Tag Set (ITS) Version 1.0 [ITS] implement requirements formulated in Internationalization and Localization Markup Requirements [ITS REQ].
This set of best practices does not cover all topics about internationalization for XML. Other useful reference material includes: Character Model for the World Wide Web 1.0: Fundamentals [CharMod], and Unicode in XML and other Markup Languages [Unicode in XML].
This document is divided into two main sections:
The first one is intended for the designers and developers of XML applications (also referred to here as 'schemas' or 'formats').
The second is intended for the XML content authors. This includes users modifying the original content, such as translators.
Section 2: When Designing an XML Application provides a list of some of the important design choices you should make in order to ensure the internationalization of your format.
Section 4: Generic Techniques provides additional generic techniques such as writing ITS rules or adding an attribute to a schema. Such techniques apply to many of the best practices.
Section 5: ITS Applied to Existing Formats provides a set of concrete examples of how to apply ITS to existing XML-based formats. This section illustrates many of the guidelines in this document.
Section 3: When Authoring XML Content provides a number of guidelines on how to create content with internationalization in mind. Many of these best practices are relevant regardless of whether or not your XML format was developed especially for internationalization.
Section 4.1: Writing ITS Rules provides practical guidelines on how to write ITS rules. Such techniques may be useful when applying some of the more advanced authoring best practices.
Designers and developers of XML applications should take into account the following best practices:
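| Best Practice | Implementing as a new feature | Handling legacy markup |
|---|---|---|
| Defining markup for natural language labelling | Make sure the xml:lang attribute is defined for the root element of your document, and for any element where the language may change. | Provide an ITS Rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang. |
| Defining markup to specify text direction | Make sure the its:dir attribute is defined for the root element of your document, and for any element that has text content. | Provide an ITS Rules document where you use the its:dirRule element to associate the different directionality indicators with their equivalents in ITS. |
| Avoiding translatable attributes | Make sure you store all translatable text as element content, not as attribute values. | Provide an ITS Rules document where you use the its:translateRule element to specify these translatable attributes. |
| Indicating which elements and attributes should be translated | | Provide an ITS Rules document where you use the its:translateRule elements to indicate which elements have non-translatable content. |
| Defining markup to override translate information | Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content. | Provide an ITS Rules document where you use the its:translateRule element to associate your mechanism with the ITS Translate data category. |
| Providing information related to text segmentation | | Provide an ITS Rules document where you use the its:withinTextRule elements to indicate which elements should be treated as either part of their parents, or as a nested but independent run of text. |
| Defining markup for ruby text | Make sure the its:ruby element and its children are defined for all elements where there is text content. | Provide an ITS Rules document where you use the its:rubyRule element to associate your ruby markup with its equivalent in ITS. |
| Defining markup for notes to localizers | Make sure the its:locNote and its:locNoteRef attributes are defined in your schema. | Provide an ITS Rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS. |
| Identifying terminology-related elements | | Provide an ITS Rules document where you use the its:termRule elements to indicate which elements are terms and information related to them (e.g. definitions). |
| Defining markup for specifying or overriding terminology-related information | | |
| Working with multilingual documents | For documents that need to go through some localization tasks, always store the localized version of the text in a separate document. | |
| Naming elements and attributes | Make sure the names of the elements and attributes of your schema reflect their functions, rather than one possible way of rendering their content. | Not applicable |
| Defining a span-like element | Make sure you define a span-like element in your schema that will allow authors to associate arbitrary content with properties such as directionality, language information, etc. | If no span-like element already exists in your schema, you may be able to use its:span. |
| Documenting internationalization and localization features of your schema | Make sure you document the internationalization and localization aspects of your schema by providing a set of relevant ITS rules in a single standalone ITS Rules document. | |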
Where it says "How to implement this as a new feature", this section describes how to create new schemas or add new features to existing schemas. When doing this you may need to take into account the following:
Think twice before creating your own schema. Seriously consider using existing formats such as DITA, DocBook, Open Document Format, Office Open XML, XML User Interface Language, Universal Business Language, etc. Those formats have many useful insights already built in.
Check carefully whether an existing format comes with a built-in capability for modification. DocBook and DITA, for example, come with their own set of features for adapting their format to special needs.
The modification mechanisms available will depend on the schema language (DTD, XML Schema, RELAX NG, etc.). For example, namespace-based modularization of schemas is difficult to achieve with DTDs.
NVDL is an example of a meta-schema language that was designed especially to allow integration of several existing vocabularies into a single XML vocabulary without the need to know the details of source schemas. This means that with NVDL you can usually create a schema for compound documents more easily than with other schema technologies.
Each schema language provides different ways of extending or modifying existing schemas. One example is the redefine mechanism of XML Schema (see XML Schema Part 1). Not every schema processor supports every such mechanism. Therefore a schema which works in one environment may not work in a different one.
What is possible also depends on the features of the schema which the modification is targeting. For example:
An XML Schema redefine is only possible if the modified schema has been created with named types.
If you are working with XML Schema, you can only apply the technique of 'chameleon' or 'proxy' schemas if the schema being included has no target namespace. The XML Schema document for ITS has a target namespace and therefore cannot be a 'chameleon' schema.
Note: The considerations above are only a portion of what you need to take into account. You need to know a lot more when diving into schema modularization.
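As a rough sketch of the redefine mechanism mentioned above, the following XML Schema fragment adds xml:lang to a named complex type. The file name base.xsd and the type name sectionType are hypothetical. The sketch also assumes that base.xsd has no target namespace and that sectionType has element-only content.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Makes the xml:lang declaration available for reference -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <!-- base.xsd and sectionType are hypothetical names -->
  <xs:redefine schemaLocation="base.xsd">
    <xs:complexType name="sectionType">
      <xs:complexContent>
        <!-- A redefined type extends itself -->
        <xs:extension base="sectionType">
          <xs:attribute ref="xml:lang"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```

If sectionType were declared as an anonymous type in base.xsd, xs:redefine would have no name to refer to, which is why named types matter here.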
The XML namespace provides the xml:lang attribute for specifying the language of content. Make sure that the definition of xml:lang in your schema allows for empty values. That is:
In a DTD you must not use NMTOKEN as the data type, instead use CDATA.
In XML Schema the built-in data type language does not allow empty values. However, the declaration for xml:lang in the schema document at http://www.w3.org/2001/xml.xsd does allow for empty values and therefore can be used.
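The XML Schema case can be sketched as follows. This is a minimal illustration, not a complete schema: the element name para is hypothetical, and the fragment simply imports the schema document for the XML namespace and refers to its xml:lang declaration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Import the schema document for the XML namespace -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <!-- "para" is a hypothetical element with text content -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <!-- Refer to the imported declaration rather than using xs:language -->
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Referring to the imported declaration avoids re-declaring the attribute with the built-in language type, which would reject xml:lang="".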
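It is not recommended to use your own attribute or element to specify the language of the content. If you are working with an existing schema that specifies the language of content by some means other than xml:lang, you should provide an ITS Rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang. In the following document, the langcode element is used to specify the language.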
Note: This example is a multilingual document, which has its own set of issues (see Best Practice 12: Working with multilingual documents).
<myRes>
  <messages>
    <msg id="1">
      <langcode>en</langcode>
      <text>Cannot find file.</text>
    </msg>
    <msg id="2">
      <langcode>fr</langcode>
      <text>Fichier non trouvé.</text>
    </msg>
  </messages>
</myRes>
The corresponding ITS Rules document contains an its:langRule element specifying that the langcode element has the same function as the xml:lang attribute and applies to the text element.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:langRule selector="//text[../langcode]" langPointer="../langcode"/>
</its:rules>
Why do this
Information about the language of content can be very important for correctly rendering or styling text in some scripts, applying spell-checkers during content authoring, appropriate selection of voice for text-to-speech systems, script-based processing, and numerous other reasons. You must provide a standard way to specify the language for the document as a whole, but also for parts of the document where the language changes.
In scripts such as Arabic and Hebrew, characters may run both left to right and right to left when displayed. Directional markup allows you to manage the flow of characters. ITS provides the its:dir attribute and the its:dirRule element to address this requirement.
How to implement this as a new feature
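Make sure the its:dir attribute is defined for the root element of your document, and for any element that has text content. If you are working with an existing schema that has its own way to specify text direction, provide an ITS Rules document where you use the its:dirRule element to associate the different directionality indicators with their equivalents in ITS.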
In this document the textdir attribute is used to specify directionality of a text run.
<text xml:lang="en">
  <body>
    <par>In Hebrew, the title <quote xml:lang="he" textdir="r2l">פעילות הבינאום, W3C</quote>
      means <quote>Internationalization Activity, W3C</quote>.</par>
  </body>
</text>
Note: This example shows the directionality of the source text correctly. This is to ensure that you understand the concepts being described. For such display, you need a sophisticated editor that resolves directionality of the source text correctly. Many editors are not yet this sophisticated. See the related discussion about ITS Directionality data category.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:dirRule selector="//*[@textdir='l2r']" dir="ltr"/>
  <its:dirRule selector="//*[@textdir='r2l']" dir="rtl"/>
  <its:dirRule selector="//*[@textdir='lro']" dir="lro"/>
  <its:dirRule selector="//*[@textdir='rlo']" dir="rlo"/>
</its:rules>
Why do this
Generally the Unicode bidirectional algorithm will produce the correct ordering of mixed directionality text in scripts such as Arabic and Hebrew. Sometimes, however, additional help is needed. For instance, in the sentence of Example 4 the 'W3C' and the comma should appear to the left side of the quotation. This cannot be achieved using the bidirectional algorithm alone.
The following will display incorrectly, since no directional markup has been used:
The title says "פעילות הבינאום, W3C" in Hebrew.
The text 'W3C' and the comma should be to the left of the quoted Hebrew text. If your browser supports bidirectional display, the following should appear correctly, since directional markup has been added to the element surrounding the quote:
The title says "פעילות הבינאום, W3C" in Hebrew.
The desired effect can be achieved using Unicode control characters, but this is not recommended (See Unicode in XML and other Markup Languages [Unicode in XML]). Markup is needed to establish the default directionality of a document, and to change that where appropriate by creating nested embedding levels.
Markup is also occasionally needed to disable the effects of the bidirectional algorithm for a specified range of text.
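For comparison, here is a sketch of the same kind of content marked up directly with the ITS its:dir attribute instead of a format-specific attribute. The element names text, body, par and quote are illustrative only and assume a schema in which its:dir has been declared for them.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <body>
    <!-- its:dir="rtl" starts a right-to-left embedding for the Hebrew title -->
    <par>In Hebrew, the title <quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote>
      means <quote>Internationalization Activity, W3C</quote>.</par>
  </body>
</text>
```

The values lro and rlo can be used in the same position when the bidirectional algorithm has to be overridden for a range of text.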
How to implement this as a new feature
Make sure you store all translatable text as element content, not as attribute values.
It is bad design to use the desc attribute to store the alternate descriptive text for the image element, as in this example.
<image src="elephants.png" desc="Elephants bathing in the Zambezi River."/>
Instead, define the content of image itself to hold the text you need. This way there is no translatable text in an attribute.
<image src="elephants.png">Elephants bathing in the Zambezi River.</image>
Note: In many cases, using translatable element content instead of translatable attributes will result in one sentence being embedded within another one. For instance, in Example 5 the description of the image will be embedded inside the text of the paragraph that contains it. In such cases, do not forget to declare the relevant element (here image) as 'nested', as described in Best Practice 6: Providing information related to text segmentation.
Handling markup not in the ITS namespace
If you are working with an existing schema where there are attributes with translatable values, you should provide an ITS Rules document where you use the its:translateRule element to specify these translatable attributes. The relevant ITS data categories are Translate (http://www.w3.org/TR/2007/REC-its-20070403/#trans-datacat) and Elements Within Text (http://www.w3.org/TR/2007/REC-its-20070403/#elements-within-text).
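As a minimal sketch, a rule of this kind for the image element and its desc attribute from the earlier example might look as follows. It assumes that desc is the only translatable attribute in the format.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Attribute values are non-translatable by default; flag desc as an exception -->
  <its:translateRule selector="//image/@desc" translate="yes"/>
</its:rules>
```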
Note: Where the language of content is given as xml:lang="zxx", where zxx indicates content that is not in a language, the element in question is probably not to be translated. You should provide a rule for this.
In the following document, the content of the head element should not be translated, and the value of the alt attribute should be translated. In addition, the content of the del element should not be translated.
<myDoc xml:lang='en'>
  <head>
    <id xml:lang="zxx">H4-A3-F8-A1</id>
    <author>Robert Griphook</author>
    <rev>v13 2007-10-27</rev>
  </head>
  <par>To start click <ins>the <ui>Start</ui> button</ins><del>green icon</del>
    and fill the form labeled by the following icon:
    <ref file="vat.png" alt="Value Added Tax Form"/></par>
</myDoc>
The following rules specify exceptions from the default ITS behavior for documents like the one above.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/myDoc/head" translate="no"/>
  <its:translateRule selector="//*/@alt" translate="yes"/>
  <its:translateRule selector="//del" translate="no"/>
  <its:translateRule selector="//@*[ancestor::del]" translate="no"/>
  <its:translateRule selector="//*[lang('zxx')] | //@*[lang('zxx')]" translate="no"/>
</its:rules>
First translateRule: The content of head in myDoc is not translatable. By inheritance, the child elements of head are also assumed not translatable.
Second translateRule: All the alt attributes are translatable.
Third translateRule: The content of del is not translatable.
Fourth translateRule: The non-translatability of del applies also to any attribute that may have been set as translatable by a prior rule (i.e. the second rule).
Fifth translateRule: Any element or attribute with its language set to zxx is not translatable.
Why do this
By default, ITS assumes that the content of all elements is translatable and that all attributes have non-translatable values. If your XML document type does not correspond to this default assumption, it is important to indicate what the exceptions are. Doing so can significantly improve translation throughput.
Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content.
For examples of how to add attributes in your existing schema see Section 4.2: Example of adding an attribute to an existing schema.
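Once the attribute is declared, authors can override translatability locally. The following sketch assumes a hypothetical format with doc, p and code elements, all of which allow its:translate.

```xml
<doc xmlns:its="http://www.w3.org/2005/11/its" its:translate="yes">
  <!-- The key name must stay unchanged, so it is excluded from translation -->
  <p>Press the <code its:translate="no">F1</code> key to open the help.</p>
</doc>
```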
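If you are working with an existing schema that has its own way of overriding translate information, it is also recommended that you provide an ITS Rules document where you use the its:translateRule element to associate this mechanism with the ITS Translate data category. For example, the rules below associate the DITA translate attribute with the ITS Translate data category. The order in which the rules are listed is important: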
First translateRule: Indicates that the content of any element with a translate attribute set to no is not translatable.
Second translateRule: Indicates that any attribute value of any element with a translate attribute set to no is not translatable. This is needed because some attributes are translatable in DITA and we need to make sure they are not translated when translate="no" is used in the elements where they are.
Third translateRule: Indicates that the content of any element with a translate attribute set to yes is translatable. This takes care of the cases where translate="yes" is used to override a prior translate="no".
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//*[@translate='no']" translate="no"/>
  <its:translateRule selector="//*[@translate='no']/descendant-or-self::*/@*" translate="no"/>
  <its:translateRule selector="//*[@translate='yes']" translate="yes"/>
</its:rules>
You can find a more complete example of how DITA markup is associated with ITS in Section 5.4.2: Associating existing DITA markup with ITS.
Why do this
In some cases, the author of a document may need to change the translatability property on parts of the content, overriding ITS default behavior, or the general rules for the schema that you have specified when applying Best Practice 4: Indicating which elements and attributes should be translated.
Segmentation refers to how text is broken down, from a linguistic viewpoint, into units that can be handled by processes such as translation.
Provide an ITS Rules document where you use the its:withinTextRule elements to indicate which elements should be treated as either part of their parents, or as a nested but independent run of text. By default, element boundaries are assumed to correspond to segmentation boundaries.
In the following DITA document:
The elements term and b should be treated as part of their parent.
The element fn should be treated as an independent run of text.
<concept id="myConcept" xml:lang="en-us">
<title>Types of horse</title>
<conbody>
<ol>
<li>Palouse horse:<p><term>Palouse horses</term><fn>A palouse horse
is the same as an <b>Appaloosa</b>.</fn> have spotted coats.
The <term>Nez-Perce</term> Indians have been key in breeding this
type of horse.</p></li>
</ol>
</conbody>
</concept>
The its:withinTextRule element is used to specify the behavior of three elements; all other elements are assumed to have the value its:withinText="no":
First withinTextRule: The elements term and b are defined as part of the text flow.
Second withinTextRule: The element fn is defined as a separate bit of content nested inside its parent element.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:withinTextRule selector="//term | //b" withinText="yes"/>
  <its:withinTextRule selector="//fn" withinText="nested"/>
</its:rules>
These rules applied to the DITA document above will result in four distinct runs of text:
title: "Types of horse"
li: "Palouse horse:"
p: "{term}Palouse horses{/term}{fn/} have spotted coats. The {term}Nez-Perce{/term} Indians have been key in breeding this type of horse."
fn: "A palouse horse is the same as an {b}Appaloosa{/b}."
Why do this
Many applications that process content for linguistic-related tasks need to be able to perform a basic segmentation of the text content. They need to be able to do this without knowing the semantics of the elements.
While in many cases it is possible to detect mixed content automatically, there are some situations where the structure of an element makes it impossible for tools to know for sure where appropriate segmentation boundaries fall. For example, the boundaries of some inline elements, such as emphasis, do not typically correspond to segmentation boundaries; on the other hand, some inline elements embedded in a parent element, such as footnotes or quotations, may define segments that should be handled separately from the text in which they are embedded.
Intelligent segmentation is particularly important in translation to successfully match source text against translation-memory databases.
Ruby text is used to provide a short annotation of an associated base text. It is most often used to provide a reading (pronunciation) guide.
Make sure the its:ruby element and its children are defined for all elements where there is text content.
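As an illustration, simple ruby written directly with ITS markup might look like the following sketch. The para element is a hypothetical host element whose content model is assumed to allow its:ruby.

```xml
<para xmlns:its="http://www.w3.org/2005/11/its">この本は
  <its:ruby>
    <!-- Base text -->
    <its:rb>慶応義塾大学</its:rb>
    <!-- Fallback parentheses for renderers without ruby support -->
    <its:rp>(</its:rp>
    <!-- Ruby text (the reading) -->
    <its:rt>けいおうぎじゅくだいがく</its:rt>
    <its:rp>)</its:rp>
  </its:ruby>の歴史を説明するものです。</para>
```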
Handling markup not in the ITS namespace
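If you are working with an existing schema where there is a way to specify ruby text that has the same semantics as the ITS Ruby data category, you should provide an ITS Rules document where you use the its:rubyRule element to associate your ruby markup with its equivalent in ITS. In the following document, the element rubyBlock is equivalent to its:ruby, rBase to its:rb, rParen to its:rp, and rText to its:rt.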
<text> <para>この本は <rubyBlock> <rBase>慶応義塾大学</rBase> <rParen>(</rParen> <rText>けいおうぎじゅくだいがく</rText> <rParen>)</rParen> </rubyBlock>の歴史を説明するものです。</para> </text>
This its:rubyRule element specifies that the rBase element has the role of its:rb, and that its:ruby, its:rp and its:rt have equivalent elements as well.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:rubyRule selector="//rBase" rubyPointer=".." rpPointer="../rParen" rtPointer="../rText"/>
</its:rules>
Why do this
Ruby is a type of annotation for text. It can be used with any language, but is very commonly used with East Asian scripts to provide phonetic transcriptions of characters that are likely to be unfamiliar to a reader. For example it is widely used in educational materials and children’s texts. It is also occasionally used to convey information about meaning.
Because ruby annotation may be needed when localizing into Japanese or Chinese, it is a good idea to make provision for it in your schema, even if your original documents are to be developed into a language that does not use such markup.
Make sure the its:locNote attribute, as well as the its:locNoteRef attribute, are defined in your schema. This markup allows content authors to provide localization-related notes. In the following example, the its:locNoteRule element specifies that the message with the identifier NotFound has a corresponding explanation note in an external HTML file. The URI for the exact location of the note is stored in the locNoteRef attribute.
<myRes>
  <head>
    <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
      <its:locNoteRule locNoteType="description"
        selector="//msg[@id='NotFound']"
        locNoteRef="EX-devlocnotes-4.html#NotFound"/>
    </its:rules>
  </head>
  <body>
    <msg id="NotFound">Cannot find {0} on {1}.</msg>
  </body>
</myRes>
The HTML file with the localization notes is a simple document with the anchor elements corresponding to the identifiers in the referring XML document.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Language" content="en-us" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Localization Notes</title>
</head>
<body lang="en">
<p><a name="NotFound"></a>{0} is a filename<br />
{1} is a directory name</p>
</body>
</html>
It is also recommended that you define the its:locNoteRule and its:locNote elements in your schema.
In the following example, the its:locNoteRule element associates the content of the its:locNote element with the message that has the identifier 'DisableInfo', and flags it as important. This would also work if the rule were in an external file, allowing content authors to provide notes without modifying the source document.
<myDoc>
  <head>
    <its:rules xmlns:its="http://www.w3.org/2005/11/its"
      version="1.0" its:translate="no">
      <its:locNoteRule locNoteType="alert" selector="//msg[@id='DisableInfo']">
        <its:locNote>The variable {0} has three possible values: 'printer',
          'stacker' and 'stapler options'.</its:locNote>
      </its:locNoteRule>
    </its:rules>
  </head>
  <body>
    <msg id="DisableInfo">The {0} has been disabled.</msg>
  </body>
</myDoc>
Note: The example includes its:translate="no" on the its:rules element so that the text of the note is not treated as translatable content. ITS also provides the its:locNote attribute to store note text, offering the possibility of closely associating the note with the relevant content. However, using this approach makes it difficult to annotate the notes themselves for language, directionality, etc.
It can be argued that notes, being metadata, have different requirements to the content itself. Schema developers should carefully consider which approach to use. If all notes will always be written by English-speaking content developers, it may be acceptable to use attribute values, but if notes may be written by content developers in Arabic or Hebrew, they are almost certainly going to want to use directional markup and span elements in the notes themselves, so an element-based approach would almost certainly be better.
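For comparison, the attribute-based approach could be sketched as follows, reusing the 'DisableInfo' message. This assumes that the schema declares its:locNote and its:locNoteType on the msg element.

```xml
<myDoc xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <!-- The note sits on the element it describes, but being an attribute
         value it cannot carry inline markup such as directional spans -->
    <msg id="DisableInfo" its:locNoteType="alert"
      its:locNote="The variable {0} has three possible values: 'printer', 'stacker' and 'stapler options'."
      >The {0} has been disabled.</msg>
  </body>
</myDoc>
```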
Handling markup not in the ITS namespace
If you are working with an existing schema where there is a way to provide notes to the localizers that is not implemented using ITS, you should provide an ITS Rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS.
In this document the comment element is a note for its sibling text element.
<messages>
  <msg id="ERR_NOFILE">
    <text>The file '{0}' could not be found.</text>
    <comment>The variable {0} is the name of a file.</comment>
  </msg>
</messages>
The its:locNoteRule element specifies that the text elements have an associated localization description in their sibling comment elements.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:locNoteRule selector="//msg/text" locNoteType="description" locNotePointer="../comment"/>
</its:rules>
Why do this
To assist the translator to achieve a correct translation, authors may need to provide information about the text that they have written. For example, the author may want to do the following:
Tell the translator how to translate part of the content (e.g. "Leave text in uppercase").
Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used on the UI.
Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word 'enabled' in isolation without knowing the gender, number and case of the thing it refers to.)
Explain why text is not to be translated, point to text reuse, or describe the use of conditional text.
Indicate why a piece of text is emphasized (important, sarcastic, etc.)
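Provide an ITS Rules document where you use the its:termRule elements to indicate which elements are terms and information related to them (e.g. definitions).
Note: The information referenced by its:termInfoRef can be of any type (e.g. human-readable or machine-specific). It is up to the application processing the data to make the distinction.
In this document, the elements term and dt, as well as any element with a syn attribute, denote terms. In addition, they can all have associated information.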
<myDoc>
  <body>
    <p>A <term def="d001" syn="#alterego">doppelgänger</term> is basically
      <def xml:id="d001">the counterpart of a person</def>. It is almost the
      same as an <emph syn="#alterego">alter ego</emph>, but with a more
      sinister connotation. Sometimes the word <emph syn="#alterego">fetch</emph>
      is also used.</p>
  </body>
  <definitions>
    <entry xml:id="alterego">
      <dt>alter ego</dt>
      <dd>A second self. Figurative sense: trusted friend.</dd>
      <origin>Latin, literally: "second I"</origin>
    </entry>
  </definitions>
</myDoc>
The set of ITS rules below indicates:
First termRule: The term element is a term and its associated information can be accessed in the node that has the identifier corresponding to the value in its def attribute.
Second termRule: Any element with a syn attribute is considered a term and the syn attribute contains a URI location where some associated information can be found.
Third termRule: The dt element is a term and its associated information is in its sibling element dd.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:termRule selector="//term" term="yes" termInfoPointer="id(@def)"/>
  <its:termRule selector="//*[@syn]" term="yes" termInfoRefPointer="@syn"/>
  <its:termRule selector="//dt[../dd]" term="yes" termInfoPointer="../dd"/>
</its:rules>
Why do this
The capability of specifying terms within the source content is important for terminology management and beneficial to translation and localization quality. For example, term identification facilitates the creation of glossaries and allows the validation of terminology usage in the source and translated documents.
Term identification is also useful for change management and to ensure source language quality.
Terms may require various associated information, such as part of speech, gender, number, term types, definitions, notes on usage, etc. To avoid repeating associated information throughout a document, it should be possible for identified terms to link to externalized attribute data, such as glossary documents and terminology databases.
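ITS provides the its:termRule element to address this requirement.
How to do this
Make sure the its:rules element is defined in your schema. It provides access to the its:termRule element of the ITS Terminology data category (http://www.w3.org/TR/2007/REC-its-20070403/#terminology).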
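Authors may also need to mark a term locally, inside the content itself. The following sketch assumes that the schema declares the ITS attributes its:term and its:termInfoRef on the emph element. The document structure is illustrative only.

```xml
<myDoc xmlns:its="http://www.w3.org/2005/11/its">
  <body>
    <!-- its:term flags the content as a term; its:termInfoRef points to related information -->
    <p>A doppelgänger is almost the same as an
      <emph its:term="yes" its:termInfoRef="#alterego">alter ego</emph>.</p>
  </body>
  <definitions>
    <entry xml:id="alterego">
      <dt>alter ego</dt>
      <dd>A second self. Figurative sense: trusted friend.</dd>
    </entry>
  </definitions>
</myDoc>
```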
This best practice refers specifically to situations where copies of the same content are stored in multiple languages in a single document. It is perfectly acceptable to have multilingual text in a document otherwise.
How to do this
For documents that need to go through some localization tasks, always store the localized version of the text in a separate document.
This is an example of bad design. It shows a single document that contains multiple translations of the same content:
<messages>
  <msg xml:id='fileNotFound'>
    <text xml:lang="en">File not found.</text>
    <text xml:lang="fr">Fichier non trouvé.</text>
  </msg>
</messages>
Instead, use one document for each language. Here, one document is in English and the other is in French. Other languages would go in similar separate documents.
<messages xml:lang="en">
  <msg xml:id='fileNotFound'>
    <text>File not found.</text>
  </msg>
</messages>
<messages xml:lang="fr">
  <msg xml:id='fileNotFound'>
    <text>Fichier non trouvé.</text>
  </msg>
</messages>
Note: It is admissible to store multilingual copies of content in a single document before the document is sent to localization, or after all localization tasks are done. For example, a final resource file could be constructed by collating the different language entries.
Note: It is admissible to provide the localizer with multilingual documents in XML formats that are specifically designed for localization, and are industry standards, like the XML Localisation Interchange File Format [XLIFF 1.2].
Why do this
There are two main reasons to avoid sending documents for localization if the source material is located in parallel with the different translations in the same document:
It is difficult to manage concurrent translations in all languages. It is very likely that each translation will be done by a different translator, in a different location. To facilitate this, the document will have to be broken down into separate parts and reconstructed later on. This adds processing time, increases cost and provides more opportunities for the introduction of errors.
Depending on the point in the document's lifecycle, such a document may already contain translations, some up-to-date and some outdated (because the source material may have changed). In order to identify what parts need to be localized and what parts should be left alone, the document would then also need to contain custom information about localization state, which may or may not be supported by localization tools.
How to do this
Make sure the names of the elements and attributes of your schema reflect their functions, rather than one possible way of rendering their content.
This is an example of bad design. The element b is used for several purposes.
<doc>
  <p>To run the application, click the <b>Start</b> button.</p>
  <p><b>Make sure to enter your username</b>, and then press <b>OK</b>.</p>
</doc>
Instead, define different elements based on their functions rather than a pre-supposed rendering.
<doc>
  <p>To run the application, click the <ui>Start</ui> button.</p>
  <p><emph>Make sure to enter your username</emph>, and then press <ui>OK</ui>.</p>
</doc>
Also, if possible, avoid element names which do not follow a fixed naming scheme (for example, element names that serve also as identifiers).
This is an example of bad design. The names of the elements also serve as text identifiers.
<strings>
  <str1>Input path:</str1>
  <str2>Help</str2>
  <str3>OK</str3>
  <str4>Cancel</str4>
</strings>
Instead, use element names that follow a fixed naming scheme, and use xml:id for the identifiers.
<strings>
  <str xml:id="str1">Input path:</str>
  <str xml:id="str2">Help</str>
  <str xml:id="str3">OK</str>
  <str xml:id="str4">Cancel</str>
</strings>
Why do this
The name of an element should indicate what its function is, not how its content will be presented, because presentation may vary depending on different factors such as language, script, medium, or accessibility.
Using documents where elements or attributes do not follow a predictable naming pattern may cause problems when using XSLT-driven processes. It may also be an issue for translation tools. This is especially true if not all parts of the document are to be translated, since it would be more difficult to specify rules to distinguish the translatable nodes from the non-translatable ones.
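To illustrate the last point, here is a minimal sketch of the ITS rules that each of the two strings designs shown above would need, assuming the file also holds content that is not translatable. The selectors are illustrative assumptions, not part of any published rules file.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Start from "nothing is translatable", then list the exceptions. -->
  <its:translateRule selector="/strings" translate="no"/>

  <!-- Fixed naming scheme (str with xml:id): one simple selector is enough. -->
  <its:translateRule selector="//str" translate="yes"/>

  <!-- Names used as identifiers (str1, str2, ...): the selector has to
       approximate the pattern, and would also match unrelated elements
       such as a hypothetical "strict" or "structure" child. -->
  <its:translateRule selector="/strings/*[starts-with(local-name(), 'str')]"
                     translate="yes"/>
</its:rules>
```

Only one of the last two rules would be used in practice. The second is harder to read and more fragile, which is the maintenance cost the fixed naming scheme avoids.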
A span-like element is an element that can be used to mark up arbitrary content and associate it with various properties such as directionality or language information. Examples of such an element include the span element in XHTML, or the phrase element in DocBook.
How to do this
Make sure you define a span-like element in your schema that will allow authors to associate arbitrary content with properties such as directionality, language information, etc.
If your schema does not already provide such an element, you could provide the its:span element defined in ITS 1.0 (http://www.w3.org/TR/2007/REC-its-20070403/).
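As a rough sketch of what a span-like element makes possible, the fragment below uses a hypothetical span element (the names doc, p and span are assumptions, not taken from any particular schema) to attach a language and a translate flag to arbitrary runs of text.

```xml
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  <!-- Language change limited to a short run of text -->
  <p>The motto of Québec is <span xml:lang="fr">Je me souviens</span>.</p>
  <!-- A run of text that must stay as it is in every language -->
  <p>Type <span its:translate="no">quetzal --help</span> to list the options.</p>
</doc>
```

Without such an element, the author would have to apply these properties to the whole paragraph or leave them unmarked.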
How to do this
Make sure you document the internationalization and localization aspects of your schema by providing a set of relevant ITS rules in a single standalone ITS Rules document.
Your ITS Rules document should include the following information, when applicable:
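The correspondence between any proprietary mechanism you have to specify the language of content and xml:lang (see Best Practice 1: Defining markup for natural language labelling).
The correspondence between any proprietary mechanism you have to override translate information and its:translate (see Best Practice 5: Defining markup to override translate information).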
The list of elements that should be treated as "nested" or "within text" from a segmentation viewpoint (see Best Practice 6: Providing information related to text segmentation).
The correspondence between any proprietary mechanism you have to mark up ruby text and its:ruby (see Best Practice 7: Defining markup for ruby text).
What part of your markup holds notes for the localizers (see Best Practice 8: Defining markup for notes to localizers).
What part of your markup denotes terms and term-related information (see Best Practice 10: Identifying terminology-related elements).
You can find some examples of ITS Rules documents for existing XML formats in Section 5: ITS Applied to Existing Formats.
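The following is a minimal sketch of a standalone ITS Rules document covering several of the items listed above. The host vocabulary is hypothetical: the langcode and comment attributes and the b, emph, msg and term elements are assumptions chosen for illustration.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Language: the hypothetical langcode attribute plays the role of xml:lang -->
  <its:langRule selector="//*[@langcode]" langPointer="@langcode"/>

  <!-- Segmentation: inline elements that belong to the surrounding text flow -->
  <its:withinTextRule selector="//b | //emph" withinText="yes"/>

  <!-- Notes: the comment attribute of msg carries notes for the localizers -->
  <its:locNoteRule selector="//msg" locNoteType="description"
                   locNotePointer="@comment"/>

  <!-- Terminology: term elements are terms; ref identifies the definition -->
  <its:termRule selector="//term" term="yes" termInfoPointer="id(@ref)"/>
</its:rules>
```

Keeping all of these rules in one file gives tool developers a single place to look when they add support for the format.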
Why do this
Although some XML vocabularies are easy to understand or process, it is often helpful or necessary to provide explicit information about a given vocabulary. If such a vocabulary is to be used in a multilingual context, it is of high importance to provide information, such as which elements contain translatable content, because general information on purpose, general structure, and node types very often are not sufficient. In a way, this need for explicit information is related to the general good practice of documenting source code.
In XML it should come naturally to use a well-defined, structured format to capture such information. For information related to internationalization and translation, ITS Rules documents are a good choice for the following reasons:
They are designed to take into account many important aspects of internationalization and translation.
They capture information precisely (for example, selectors identify to which nodes a data category pertains).
They can be processed by ITS-aware applications.
They can be easily combined with additional structured information (e.g. related to version control, as shown in the example below).
An ITS processor should still be able to process a file as an external ITS rules file if the format of the file contains your own customized information in addition to the ITS rules. The following is an example of that.
<myFormatInfo xmlns:its="http://www.w3.org/2005/11/its">
  <desc>ITS rules used by the Open University</desc>
  <hostVoc>http://www.example.com/ns/myFormat</hostVoc>
  <rulesId>98ECED99DF63D511B1250008C784EFB1</rulesId>
  <rulesVersion>v 1.81 2006/03/28 07:43:21</rulesVersion>
  <its:rules version="1.0">
    <its:translateRule selector="//header" translate="no"/>
    <its:translateRule selector="//term" translate="no"/>
    <its:termRule selector="//term" term="yes"/>
    <its:withinTextRule withinText="yes" selector="//term | //b"/>
  </its:rules>
</myFormatInfo>
Authors of XML content should consider the following best practices:
| Best Practice | Summary |
|---|---|
| Specifying the language of content | Use its:locNote, its:termInfoRef (or their equivalent in your schema) to mark terms and supply term-related information. |
| Storing markup from another format | If possible, use the XML namespace mechanism to store different vocabularies inside a single XML document. |
A number of these practices can be followed only when the XML application has been internationalized properly using the design guidelines in Section 2: When Designing an XML Application.
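Your schema should provide the xml:lang attribute so that you can specify the language of content. In this example, the default language of the document is declared as en, and the quotation in the q element has xml:lang set to fr.
<document xml:lang="en">
  <para>The motto of Québec is the short phrase: <q xml:lang="fr">Je me souviens</q>. It is chiseled on the front of the Parliament Building.</para>
</document>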
If the schema you are using does not provide xml:lang, use the equivalent mechanism it defines. In this example, the code attribute plays the role of xml:lang when used with the lang element.
<stringList>
  <msg id="connected">
    <lang code="cs">Jste připojeni k Internetu.</lang>
    <lang code="de">Sie sind an das Netz angeschlossen.</lang>
    <lang code="fr">Vous êtes connecté à la Toile.</lang>
    <lang code="it">Sei connesso al Web.</lang>
    <lang code="ja">インターネットに接続しました。</lang>
    <lang code="ko">웹에 연결되었습니다.</lang>
    <lang code="ru">Вы подключены к Интернету.</lang>
  </msg>
</stringList>
Note: This example is a multilingual document, which has its own set of issues as described in Best Practice 12: Working with multilingual documents.
The developer of the stringList document type should provide an ITS Rules document in compliance with Best Practice 1: Defining markup for natural language labelling for existing schemas. Here the its:langRule element states that the code attribute of the lang element has the same function as xml:lang.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:langRule selector="//lang[@code]" langPointer="@code"/>
</its:rules>
Note: In some cases, a change in language has implications for translation. For example, content in a different language may have to remain untranslated, or require specific handling. Such information could be provided to the localizer using http://www.rfc-editor.org/rfc/bcp/bcp47.txt
http://www.w3.org/International/tests/sec-cjk-fonts.html
Your schema should provide its:dir (or an equivalent mechanism) to allow you to specify text directionality. In this example, its:dir is used to specify the directionality of a right-to-left text run in a document that is by default left-to-right.
<text xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:version="1.0"> <body> <par>In Hebrew, the title <quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote> means <quote>Internationalization Activity, W3C</quote>.</par> </body> </text>
Without the markup, the Hebrew title will display incorrectly. The text 'W3C' and the comma will be to the right of the quoted Hebrew text, rather than to its left. The markup provides the contextual information that tells the user agent that the comma and 'W3C' text are part of a right-to-left flow of text.
Note: This example shows the directionality of the source text correctly, so that the concepts being described are easy to follow. To get such a display, you need a sophisticated editor that resolves the directionality of the source text correctly. Many editors are not yet this sophisticated. See the related discussion in Unicode in XML and other Markup Languages (http://www.w3.org/TR/2007/NOTE-unicode-xml-20070516/).
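Your schema should provide its:translate (or an equivalent mechanism) to allow you to override translate information. In this example, by using its:translate the author can indicate that the last par should not be translated.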
Note that the author does not need to specify that the head element should not be translated, because this is defined for all documents of type myDoc by the ITS Rules document provided by the developer of the myDoc schema (see just below).
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <lastRev>2007-10-23 041254Z</lastRev>
    <docID>1A454AE4-7EB8-4ed2-A58E-1EC7F75BB0D5</docID>
  </head>
  <par>To apply these terms to your library, attach the following notice. It is safest to attach it to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.</par>
  <par>The notice should read (preferably in English):</par>
  <par its:translate="no">This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This software is distributed as open source under LGPL.</par>
</myDoc>
This is the ITS Rules document created by the developer of the myDoc document type (implementing Best Practice 4: Indicating which elements and attributes should be translated). These rules override the ITS default that all element content should be translated, but attribute values should not.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="/myDoc/head" translate="no"/>
  <its:translateRule selector="//img/@alt" translate="yes"/>
</its:rules>
This is what the rules mean:
First translateRule: The head element and its children should not be translated.
Second translateRule: The alt attribute of any img element should be translated.
To override translate information for attributes, you have to use an its:translateRule.
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <lastRev>2007-11-12 234503Z</lastRev>
    <docID>D1EA7453-DC53-488a-B950-137BE0EF5253</docID>
    <its:rules version="1.0">
      <its:translateRule selector="//img[@role='ui']/@alt" translate="no"/>
    </its:rules>
  </head>
  <par>Once you have selected your options, click the <img xml:lang="en-us" src="runBtn.png" role="ui" alt="Run"/> button to start the process.</par>
</myDoc>
This is an example of bad practice. Here its:translate is used to mark up a proper name and two loan words in an attempt to indicate that they should not be translated. You should not do this.
<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <body>
    <p>Everything started when <span its:translate="no">Zebulon</span> discovered that he had a <span its:translate="no">doppelgänger</span> who was a serious baseball <span its:translate="no">aficionado</span>.</p>
  </body>
</book>
It may, however, be useful to the translator to mark up loan-words or any special words in this example as terms, as described in the section Best Practice 23: Identifying terms.
Why do this
Although the set of ITS rules provided with the schema should specify any exceptions to the default ITS translation rules for a given schema (see Best Practice 4: Indicating which elements and attributes should be translated), there are cases where these general rules need to be overridden for specific elements, in specific documents. It is up to the author of the content to indicate these cases using markup.
Your schema should provide xml:id (or an equivalent mechanism) to allow you to assign unique identifiers to elements. See Best Practice 9: Defining markup for unique identifiers.
Segmentation refers to how text is broken down, from a linguistic viewpoint, into units that can be stored separately and handled by processes such as translation. The schema author ought to create a list of these elements where they differ from the ITS defaults (see Best Practice 6: Providing information related to text segmentation).
How to do this
Use unique identifiers in the way provided by your schema on each element that constitutes a segmentation boundary.
Note: Often, ids are automatically assigned by authoring or content management applications. Thus, authors may not have to worry about them in some cases.
If possible, use globally unique and persistent values as identifier values.
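A minimal sketch of this practice, assuming a hypothetical manual vocabulary in which section, title and para are the segmentation boundaries. The identifier values are invented for illustration.

```xml
<manual xml:lang="en">
  <section xml:id="sec-9f1c2e4a">
    <title xml:id="ttl-9f1c2e4a">Replacing the filter</title>
    <!-- Each paragraph keeps its identifier across revisions -->
    <para xml:id="p-3b7d0a16">Switch off the pump before opening the housing.</para>
    <para xml:id="p-c45e81f2">Dispose of the used filter according to local regulations.</para>
  </section>
</manual>
```

Note that an xml:id value has to be a valid XML name without colons, so it cannot start with a digit. Generated values such as UUIDs therefore usually need a letter prefix, as in the sketch.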
Why do this
Providing unique identifiers can be very useful for change analysis, text tracking, and various other tasks often utilized during the authoring and the localization of documents.
This is explained in more detail in Best Practice 9: Defining markup for unique identifiers.
CDATA sections are often used to place programming code or other special vocabularies in XML with minimal effort. There are often better ways of including such content.
How to do this
Do not put content that will be translated into CDATA sections.
This is an example of bad design. In this document, part of the content is in a CDATA section. It is no longer possible to mark up that content for language changes, terms, text direction, translate information, or any of the other things that may be needed to facilitate localization.
<myData>
  <item course="12" page="2">
    <title>Accessing the R&amp;D facilities</title>
    <body><![CDATA[The R&D facilities are located in the South wing of Building 12-W, in the East quarter of the section Q. IMPORTANT ==> These facilities are accessible only to personnel with Class Omega-45Q1 clearance.]]></body>
  </item>
</myData>
Instead, use normal XML for your content. This allows you to tag the content as needed. For instance, here the author has added some terminology markup.
<myData xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <item course="12" page="2">
    <title>Accessing the R&amp;D facilities</title>
    <body>The R&amp;D facilities are located in the South wing of Building 12-W, in the East quarter of the section Q. IMPORTANT ==&gt; These facilities are accessible only to personnel with <span its:term="yes">Class Omega-45-Q1</span> clearance.</body>
  </item>
</myData>
If the CDATA section encloses a large, self-contained block of data, such as a script or an XML example, you may be able to replace the section with an inclusion mechanism such as XInclude or XLink.
In SVG you can place a script directly into an SVG document, in which case you usually use CDATA sections to avoid having to escape characters in the script's code.
<?xml version="1.0" encoding="utf-8"?>
<svg width="6cm" height="5cm" viewBox="0 0 600 500"
xmlns="http://www.w3.org/2000/svg" version="1.1">
<!-- Script is inlined and enclosed in CDATA section -->
<script type="text/ecmascript"> <![CDATA[
function circle_click(evt) {
var circle = evt.target;
var currentRadius = circle.getAttribute("r");
if (currentRadius < 100)
circle.setAttribute("r", currentRadius*2);
else
circle.setAttribute("r", currentRadius*0.5);
}
]]> </script>
<rect x="1" y="1" width="598" height="498" fill="none" stroke="blue"/>
<circle onclick="circle_click(evt)" cx="300" cy="225" r="10"
fill="red"/>
<text x="300" y="480"
font-family="Verdana" font-size="35" text-anchor="middle">
Click on circle to change its size
</text>
</svg>
Instead, you could use XLink to store the script in a separate file and reference it from the SVG document.
<?xml version="1.0" encoding="utf-8"?>
<svg width="6cm" height="5cm" viewBox="0 0 600 500"
xmlns="http://www.w3.org/2000/svg" version="1.1"
xmlns:xlink="http://www.w3.org/1999/xlink">
<!-- Script is included from external file -->
<script type="text/ecmascript" xlink:href="animate.js"/>
<rect x="1" y="1" width="598" height="498" fill="none" stroke="blue"/>
<circle onclick="circle_click(evt)" cx="300" cy="225" r="10"
fill="red"/>
<text x="300" y="480"
font-family="Verdana" font-size="35" text-anchor="middle">
Click on circle to change its size
</text>
</svg>
It is quite common to use CDATA sections to put examples of source code into XML documents. The following example shows how to do this using DocBook.
<?xml version="1.0" encoding="utf-8"?>
<example xmlns="http://docbook.org/ns/docbook">
<title>Skeleton of XHTML page</title>
<programlisting><![CDATA[<html xmlns="http://www.w3.org/1999/xhtml"
xml:lang="en">
<head>
<title>… page title goes here …</title>
</head>
<body>
… page content goes here …
</body>
</html>]]></programlisting>
</example>
Instead, you could use XInclude to store the example code in a separate file and include it at processing time. Note that you have to use parse="text" to treat the included file as plain text rather than markup.
<?xml version="1.0" encoding="utf-8"?>
<example xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Skeleton of XHTML page</title>
  <programlisting><xi:include href="EX-xhtml-skeleton.xhtml" parse="text" encoding="utf-8"/></programlisting>
</example>
If you must use CDATA sections:
Document the type of content (for example with an attribute set to the appropriate MIME-type). This may help tools to use an appropriate parser to process the content.
Aim to produce well-formed content. This will allow parsers to process the content more easily.
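As a small sketch of these two points, the fragment below labels each CDATA section with its content type. The samples and sample elements and the mediaType attribute are hypothetical; your schema may provide a differently named mechanism.

```xml
<samples>
  <!-- mediaType tells a tool which parser to apply to the CDATA content -->
  <sample mediaType="text/html"><![CDATA[<p>Press <b>OK</b> to continue.</p>]]></sample>
  <sample mediaType="text/ecmascript"><![CDATA[if (count < 10) { retry(); }]]></sample>
</samples>
```

The HTML fragment in the first sample is itself well-formed, so a tool can hand it to an XML parser after extracting it.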
Note: CDATA is often used to store textual content containing HTML or XML tags. This is not recommended. See Best Practice 24: Storing markup from another format for more details.
Note: Using CDATA does not affect whether white-space is preserved or not by XML processors. To preserve white-space use the xml:space attribute.
Your schema should provide its:locNote and its:locNoteRef (or equivalent mechanisms) to allow you to communicate with those who will localize your content. See Best Practice 8: Defining markup for notes to localizers.
How to do this
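Use its:locNote or its:locNoteRef (or the equivalent in your schema) to provide notes for the localizers. In this example:
its:locNoteRef is used to point to an explanation of the acronym RFID.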
its:locNote is used to indicate what kind of value the element <xsl:value-of select="PNum"/> corresponds to.
Note: When working with XSLT, you need to decide whether the ITS markup should be in the output or not, and may have to use different markup accordingly. In this example, the ITS attributes do not appear in the output.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:its="http://www.w3.org/2005/11/its"
its:version="1.0">
<xsl:template match="/data">
<xsl:variable name="Lang" select="Lang"/>
<xsl:variable name="EMail" select="EMail"/>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="{$Lang}" lang="{$Lang}">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>Login</title>
</head>
<body>
<p>Login Into Queztal-Systems</p>
<form method="POST">
<table border="0" id="table2">
<tr><td>First, place your pass card in front of the reader to scan your
<xsl:text its:locNoteRef="http://en.wikipedia.org/wiki/RFID">RFID</xsl:text>.
When the light turns green, enter your password in the box below, and
click Submit.</td></tr>
<tr><td><input type="password" name="pword" size="25"/></td></tr>
</table>
<p><input type="submit" value="Submit" name="go"/></p>
</form>
<p>If you have difficulties logging in, please call
<xsl:value-of select="PNum" its:locNote="Toll-free phone number"/>,
or send an email to
<a href="mailto:{$EMail}"><xsl:value-of select="EMail"/></a>.</p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Why do this
There are many reasons to provide information to localizers. You may want to:
Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used in the user interface.
Clarify ambiguity and show relationships between items sufficiently to allow correct translation. For example, in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.
Explain why text is not translated, point to text reuse, or describe the use of conditional text.
Indicate why a piece of text is emphasized (important, sarcastic, etc.)
Using XML comments for doing this may not be enough as they may get stripped out or ignored during the localization process.
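The contrast can be sketched as follows, using a hypothetical strings vocabulary. Both messages carry the same explanation, but only the second one expresses it as markup that an ITS-aware tool can pass on to the translator.

```xml
<strings xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  <!-- Refers to the wireless adapter (a comment like this may be stripped) -->
  <msg id="wifiStateA">Enabled</msg>

  <!-- The same information as a localization note attached to the element -->
  <msg id="wifiStateB" its:locNote="Refers to the wireless adapter (singular)."
       its:locNoteType="description">Enabled</msg>
</strings>
```

The note gives the translator the grammatical context that the isolated word "Enabled" lacks.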
Inserted text refers to any text that is marked by a placeholder in the source XML document and automatically inserted within text content when the document is processed.
Types of inserted text include:
Boilerplate text reused in different contexts.
Various parts of a sentence composed by bringing together separate pieces of text.
Variable placeholders that are replaced by their values when the document is processed.
The implementation of such text can be done in different ways in XML. Some examples are:
Using entity references.
Using XSLT processing.
Using XInclude mechanisms.
Using XLink mechanisms.
Using a custom mechanism specific to a given format (e.g. the conref attribute in [DITA 1.0]).
How to do this
Use inserted text only when the text is self-contained and does not affect its surrounding context. For example, titles and quotations are inserted text that, usually, would not cause problems.
Avoid using inserted text that has any effect or dependence on the context where it is inserted.
If you cannot avoid inserted text that depends on its context, use its:locNote and its:termInfoRef (or their equivalents in your schema) to provide the localizers with some context. See Best Practice 21: Providing notes for localizers and Best Practice 23: Identifying terms.
In this example, in the first message, the element var is used to insert the name of a printer. In the second message, it is used to insert a filename. The its:locNote attribute is used to describe what each variable represents. This may help in deciding how to translate each message.
<strings xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:version="1.0">
  <msg id="pmAdded">The printer <var arg="0" its:locNote="Printer name"/> has been added to the list.</msg>
  <msg id="fmAdded">The file <var arg="0" its:locNote="Filename"/> has been added to the list.</msg>
</strings>
This is a French translation of the document shown above. The context provided made it possible to disambiguate the variable and to produce a more accurate translation.
<strings xmlns:its="http://www.w3.org/2005/11/its" xml:lang="fr" its:version="1.0">
  <msg id="pmAdded">L'imprimante <var arg="0" its:locNote="Printer name"/> a été ajoutée à la liste.</msg>
  <msg id="fmAdded"><var arg="0" its:locNote="Filename"/> a été ajouté à la liste.</msg>
</strings>
Why do this
If not used properly, inserted text can cause important (and sometimes unresolvable) problems during localization. Consider the following:
This is an example of bad design. In this example, the author, working with the DITA format [DITA 1.0], decided to reference a term in a termbase by using the conref mechanism. In this case, the term t123 in termbase.xml has the value 'hydraulic lift'.
<p>Using a <term conref="termbase.xml#t123"/>, raise the vehicle from the ground.</p>
At first glance the example above seems to work fine in English. However, such a construction has several problems:
You should not separate the article from the noun. If "hydraulic lift" is independently replaced in the future by some other term, you may need to change the article to 'an' or remove it.
The article/noun separation also causes trouble for the translators. Without any easy way to see the actual term when translating the paragraph, they may not be able to decide the gender or number of the article.
If it is used at the beginning of a different sentence, the term would need to be capitalized.
The term is singular in the termbase, but it may need to be plural somewhere else in the document.
In inflected languages the form required in the text may be different from the form stored in the termbase. For example, in Polish the term would be stored in its nominative form ("dźwignia hydrauliczna"), while it should be in its instrumental form once inserted in this context: "Używając dźwignię hydrauliczną podnieś pojazd z ziemi."
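One possible way around the problems listed above, shown here only as a sketch, is to keep the term spelled out in the sentence and link it to the termbase entry instead of inserting it. The fragment is generic XML rather than valid DITA, and the use of its:termInfoRef to point at termbase.xml#t123 is an assumption made for illustration.

```xml
<p xmlns:its="http://www.w3.org/2005/11/its">Using a
  <term its:term="yes" its:termInfoRef="termbase.xml#t123">hydraulic lift</term>,
  raise the vehicle from the ground.</p>
```

The article, number, capitalization and grammatical case now live in the sentence itself, so translators can adapt them, while the link still lets tools check the term against the termbase.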
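Your schema should provide its:term and its:termInfoRef (or equivalent mechanisms) to allow you to identify terms. In this example, the title uses its:term and its:termInfoRef to associate Vector Files with its corresponding term information.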
<myManual xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<head>
<its:rules version="1.0">
<its:termRule selector="//ui" term="yes"/>
</its:rules>
<title>Generating <span its:term="yes" its:termInfoRef="#vFile">Vector
Files</span></title>
</head>
<body>
<par>Select the command <ui>Build Output Files</ui> from the
<ui>Tasks</ui> menu to generate the final <term ref="vFile">vector
files</term>.</par>
</body>
<extra>
<terms>
<termDef xml:id="vFile">A <emph>vector file</emph> is a binary document
that contains the complete set of vectors needed to draw the background
layer of a map.</termDef>
</terms>
</extra>
</myManual>
This ITS Rules document is the one created by the developer of the myManual document type (in implementing Best Practice 10: Identifying terminology-related elements). It provides one termRule element indicating that any term element is a term and that its associated information is located in the element identified by the value stored in the ref attribute of term.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:termRule selector="//term" term="yes" termInfoPointer="id(@ref)"/>
</its:rules>
Why do this
If you do not indicate what words are terms of interest in the content, the translators will not know that these terms need to be translated consistently. Often, multiple translators are working on different files in a given project, and the way they choose to translate specific words can be inconsistent with the way that other translators have translated them. If important terms are marked in the content, they can be extracted before the content is translated and pre-translated in the form of a shared electronic dictionary. This ensures consistency of translation of important terms.
While markup denoting terms at the schema level should be specified in a set of ITS rules provided with the schema (see Best Practice 10: Identifying terminology-related elements), there are cases where these general rules need to be overridden or complemented for specific elements, in specific documents. It is up to the author of the content to provide such overriding markup.
How to do this
If possible, use the XML namespace mechanism to store different vocabularies inside a single XML document.
In this document, the elements top and body both contain HTML markup coded as text. There is no easy way to make the distinction between the HTML markup and the HTML text content.
<pages>
  <row>
    <key>ENConvClasses</key>
    <top>&lt;span class="h1"&gt;Elibur Library&lt;/span&gt; - Conversation Groups</top>
    <body><![CDATA[<p>These small discussion groups meet <b>weekly</b> and are for people learning English. Each group is led by a volunteer who is a native speaker of American English. Groups converse about books, articles, and other materials.</p> <p>Space is limited. Ask for availability to <a href="mailto:[email protected]"> [email protected]</a>.</p>]]></body>
  </row>
</pages>
Instead, use the XML namespace mechanism. Here the content of top and body is now a mix of text and XHTML elements. This avoids any confusion between text and HTML tags.
<pages xmlns:h="http://www.w3.org/1999/xhtml">
  <row>
    <key>ENConvClasses</key>
    <top><h:span class="h1">Elibur Library</h:span> - Conversation Groups</top>
    <body><h:p>These small discussion groups meet <h:b>weekly</h:b> and are for people learning English. Each group is led by a volunteer who is a native speaker of American English. Groups converse about books, articles, and other materials.</h:p> <h:p>Space is limited. Ask for availability to <h:a href="mailto:[email protected]">[email protected]</h:a>.</h:p></body>
  </row>
</pages>
Another alternative to using markup as text is to store it externally and include it into the document using a mechanism such as XInclude or XLink.
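A minimal sketch of the external-storage approach with XInclude follows. The file name ENConvClasses-body.xhtml is hypothetical, and the included file is assumed to be a well-formed XHTML document with a single root element.

```xml
<pages xmlns:xi="http://www.w3.org/2001/XInclude">
  <row>
    <key>ENConvClasses</key>
    <!-- parse="xml" (the default) merges the file as markup, not as text -->
    <body><xi:include href="ENConvClasses-body.xhtml" parse="xml"/></body>
  </row>
</pages>
```

The XHTML content can then be validated and localized as a normal XML file, and the container only records where it belongs.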
If you must include markup as text content:
Make sure to document the type of content, for example with an attribute set to the appropriate MIME-type. This may help tools to use a more appropriate parser to process the given content.
Aim to keep the content well-formed. This will allow parsers to process it more easily.
Why do this
Some XML documents are used to store different types of data for purposes such as exchange or export. In some cases such data is itself XML data. For example, some XHTML content stored in a database can be exported to an XML container file for localization and re-imported back into the database.
Note: The use of escaping for literal examples of markup is not a problem. The issue arises only with large volumes of XML/HTML data contained in another XML document.
Storing such XML data inside XML elements as text content (i.e. with its markup tags escaped), has several drawbacks:
Any handling of such content is made difficult because text cannot be separated from markup without extra processing.
Often, such content is put in CDATA sections, which has its own set of issues. See Best Practice 20: Avoiding CDATA sections.
The escaped markup cannot be validated.
If a process escapes markup automatically, there is a danger of double escaping.
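The double-escaping risk mentioned in the last item can be sketched as follows. The samples wrapper is hypothetical, and the sentence is taken from the example above.

```xml
<samples>
  <!-- Escaped once: unescaping once recovers the b element around "weekly" -->
  <body>These groups meet &lt;b&gt;weekly&lt;/b&gt;.</body>

  <!-- Escaped twice by mistake: unescaping once leaves escaped tags behind -->
  <body>These groups meet &amp;lt;b&amp;gt;weekly&amp;lt;/b&amp;gt;.</body>
</samples>
```

Once the second form reaches a browser, the reader sees the literal characters of the tag instead of bold text, and nothing in the container flags the error.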
This section provides a set of generic techniques that are applicable to various guidelines; for example, how to add ITS attributes to different types of schemas, or how to optimize XPath expressions for the ITS selector attribute.
Whether they are external or embedded, there are a few things you should take into consideration when writing ITS rules.
Try to keep the number of nodes to be overridden to a minimum. This improves performance. For example, if most of a document should not be translated, it is better to set the root element to be non-translatable than to set all elements. The inheritance mechanism will have the same effect for a much lower computing cost.
Because a rule takes precedence over the rules before it, start with the most general rules and progressively override them as needed. Some rules may be more complex if they need to take into account all the aspects of inheritance.
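These two recommendations can be sketched in a short rules file. The title and summary elements are hypothetical exceptions chosen for illustration.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- General rule first: one node selected, inherited by all descendants -->
  <its:translateRule selector="/*" translate="no"/>

  <!-- Exceptions afterwards: later rules override earlier ones -->
  <its:translateRule selector="//title | //summary" translate="yes"/>
</its:rules>
```

Selecting every element with a selector such as //* in the first rule would give the same result for element content, but the processor would have to match and annotate every element instead of relying on inheritance.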
ITS 1.0 defines an order of precedence between its selection mechanisms:
Local markup, such as the its:translate attribute, has the highest precedence.
Next are the global rules. The rules inside an its:rules element have an inherent precedence which depends on their position in the its:rules element.
In the example below, the first its:translateRule element sets the <text> element, and by inheritance its content, as not translatable.
The second its:translateRule element has higher precedence than the one before, so it can be used to describe an exception: all <p> elements are still to be translated. This shows the interplay between different rules and demonstrates that the last one always "wins".
Another exception to the first its:translateRule element is made with the local its:translate attribute on the <notes> element. Without it, the value set by the first its:translateRule element would be inherited, and this <notes> element would not be translatable.
Finally, the content of the <documentation> element within the <head> element is also translatable, but not the content of any attributes in the document. This demonstrates the role of defaults for the ITS Translate data category.
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <head>
    <documentation>Some translatable text.</documentation>
    <its:rules version="1.0">
      <its:translateRule selector="//text" translate="no"/>
      <its:translateRule selector="//p" translate="yes"/>
    </its:rules>
  </head>
  <text>
    <data>Some data with <code>coded parts</code> (<notes its:translate="yes"> and translatable text</notes>).</data>
    <p>Some text with <b>bolded words</b>.</p>
  </text>
</doc>
When writing rules for documents that use XML namespaces you must make sure that you declare the namespaces, and use the relevant prefixes in the different XPath expressions.
The first document uses several different XML vocabularies:
The host format is not associated with any namespace. Its elements have no prefix.
The "inventory-book" vocabulary is associated with the namespace http://www.example.com/inventory-book. The elements belonging to that namespace have a bk prefix.
The XHTML vocabulary is associated with the namespace http://www.w3.org/1999/xhtml. The elements belonging to that namespace have an h prefix.
The XLink vocabulary is associated with the namespace http://www.w3.org/1999/xlink. There is one attribute belonging to that namespace and it has an xlink prefix.
The ITS vocabulary is associated with the namespace http://www.w3.org/2005/11/its. There is one element belonging to that namespace and it has an its prefix.
<inventory xmlns:bk="http://www.example.com/inventory-book"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:its="http://www.w3.org/2005/11/its">
<header>
<identity>3E039D7D-B416-47e8-83B3-3F4DF9EDDB87</identity>
<lastUpdate>2007-11-12</lastUpdate>
<desc>Inventory made by Joan, for shelves H to K only.</desc>
<its:rules version="1.0" xlink:href="EX-namespaces-2.xml" xlink:type="simple"/>
</header>
<list>
<bk:book xml:id="item00A83">
<bk:isbn>0312875819</bk:isbn>
<bk:quantity>2</bk:quantity>
<bk:type>HIST</bk:type>
<bk:author>Bradshaw, Gillian</bk:author>
<bk:pub>Forge Books; New Ed edition (June 2, 2001)</bk:pub>
<bk:title>The Sand-Reckoner</bk:title>
<bk:desc>
<h:p>Building on a few antique facts, Bradshaw ably recreates the extraordinary
life of Archimedes, the great mathematician and engineer who lived in Syracuse from
287 to 212 B.C. After a few years studying in Alexandria, Archimedes returns home
where his father is dying and his city at war with the Romans.
<h:img src="0312875819large.png" alt="The Sand-Reckoner (by Gillian Bradshaw)"/>
</h:p>
</bk:desc>
</bk:book>
</list>
</inventory>
The XLink and ITS namespaces are used only to associate this document with the external ITS rules file shown below.
The ITS Rules document contains several rules that determine what parts of the inventory document should be translated. The rules use XPath expressions where the elements are prefixed. These prefixes are associated with the namespaces used in the inventory. Here is a description of each its:translateRule, from top to bottom:
The first indicates that the inventory element should not be translated. This is inherited by all the children of inventory. Most of the content of the inventory is not to be translated, so the easiest way to define the proper rules for this type of document is to say that the root element should not be translated, and then list all the exceptions.
The second indicates that the desc element of the host format should be translated.
The third indicates that the title element of the http://www.example.com/inventory-book namespace should be translated.
The fourth indicates that the desc element of the http://www.example.com/inventory-book namespace should be translated.
The last indicates that the alt attribute of the XHTML img element should be translated.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
 xmlns:book="http://www.example.com/inventory-book"
 xmlns:html="http://www.w3.org/1999/xhtml">
 <its:translateRule selector="/inventory" translate="no"/>
 <its:translateRule selector="//desc" translate="yes"/>
 <its:translateRule selector="//book:title" translate="yes"/>
 <its:translateRule selector="//book:desc" translate="yes"/>
 <its:translateRule selector="//html:img/@alt" translate="yes"/>
</its:rules>
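The prefixes used in a rules file do not have to match the prefixes used in the document, because XPath compares namespace URIs, not prefixes. The rules file binds book and html, while the inventory document binds bk and h. As a minimal sketch, the same rules could equally be written with the document's own prefixes. The selectors are written here with // so that the elements are matched at any depth, and the comments are added for illustration only:

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           xmlns:bk="http://www.example.com/inventory-book"
           xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- Default for the whole inventory: do not translate -->
  <its:translateRule selector="/inventory" translate="no"/>
  <!-- Unprefixed name: the desc element of the host format (no namespace) -->
  <its:translateRule selector="//desc" translate="yes"/>
  <!-- Prefixed names: matched by namespace URI, whatever the prefix is -->
  <its:translateRule selector="//bk:title" translate="yes"/>
  <its:translateRule selector="//bk:desc" translate="yes"/>
  <its:translateRule selector="//h:img/@alt" translate="yes"/>
</its:rules>
```

Note that an unprefixed name in an XPath 1.0 expression never matches an element that is in a namespace, even when the document declares a default namespace. For vocabularies such as XHTML, the rules therefore need to bind a prefix to the namespace and use it in every selector.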
ITS uses XPath expressions in several contexts to identify nodes. The most prominent contexts are selectors, and pointer attributes like those shown in the following rules:
<its:translateRule selector="//term" translate="no"/>
or
<its:locNoteRule locNoteType="description" selector="//msg/data" locNotePointer="../notes"/>
When writing ITS-related XPath expressions like the ones above, the following should be considered:
ITS XPath expressions pertain to XPath 1.0 or its successor.
The values of ITS selector attributes are XPath absolute location paths.
The values of ITS pointer attributes are XPath relative location paths. The ITS pointer attributes include its:termInfoRefPointer and its:rbcPointer (see Global Approach of the ITS Specification).
Using only XSLT patterns in ITS selector attributes helps to avoid issues which may arise with respect to the match attribute in XSLT template elements.
In addition to this general advice, you should take into account best practices related to writing XPath expressions (see for example the XPath tutorial).
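To illustrate the difference between a selector and a pointer, here is a minimal sketch. The msgList root element, the xml:id value and the message texts are assumptions made for illustration; only the its:locNoteRule element and its attributes come from ITS.

```xml
<msgList xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <!-- selector: absolute path, evaluated from the root of the document -->
    <!-- locNotePointer: relative path, evaluated from each selected data element -->
    <its:locNoteRule locNoteType="description"
                     selector="//msg/data"
                     locNotePointer="../notes"/>
  </its:rules>
  <msg xml:id="FileNotFound">
    <notes>Indicates that the resource file {0} could not be loaded.</notes>
    <data>Cannot find the file {0}.</data>
  </msg>
</msgList>
```

In this sketch the selector picks every data element that is a child of a msg element. The pointer ../notes is then resolved from each of those data elements, so the localization note for a given data element is the content of its sibling notes element.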
This example shows how to add an attribute (here xml:lang) to an existing document type. We will add the attribute to an element called para.
Note that this example only shows a few ways of adding attributes. There are many others, depending on the schema language and the modularization techniques used in the existing schema.
xml:lang declaration in XML Schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Import for xml:lang and xml:space -->
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd"/>
...
Once the xml.xsd schema is imported, you can use the reference to xml:lang in XML Schema.
...
<xs:element name="para">
<xs:complexType>
<xs:sequence maxOccurs="unbounded">
...
</xs:sequence>
<xs:attribute ref="xml:lang" use="optional"/>
</xs:complexType>
</xs:element>
...
xml:lang in RELAX NG.
<element name="para"
xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<attribute name="xml:lang">
<choice>
<data type="language"/>
<value></value>
</choice>
</attribute>
...
</element>
xml:lang in a DTD.
<!ELEMENT para (#PCDATA)>
<!ATTLIST para
xml:lang CDATA #IMPLIED>
This section presents several examples of how ITS can be used to enhance the internationalization readiness of some well-known XML document types. These examples are only illustrative and may have to be adapted to fit the needs of each specific user.
Two topics are covered for each format:
How should ITS be integrated in specific markup schemas? For example, for XHTML, specifying how ITS markup is integrated promotes the interoperability of ITS implementations.
How can existing markup be associated with ITS? This relies on the ITS selection mechanism for indicating what parts of an XML document the ITS Translate data category and its values should be applied to.
The following XML vocabularies are discussed:
XHTML [XHTML 1.0] is a reformulation of the three HTML 4 document types as applications of XML 1.0. HTML is an SGML (Standard Generalized Markup Language) application, widely regarded as the standard publishing language of the World Wide Web.
In XHTML 1.0, the XHTML namespace may be used with other XML namespaces as per Namespaces in XML [XML Names], but such documents are no longer strictly conformant XHTML 1.0.
Here is an example of a document that contains ITS rules and is therefore not a strictly conformant XHTML 1.0 document.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:its="http://www.w3.org/2005/11/its" lang="en" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="keywords" content="ITS example, XHTML translation" />
<its:rules version="1.0" xmlns:h="http://www.w3.org/1999/xhtml">
<its:translateRule selector="//h:meta[@name='keywords']/@content"
translate="yes" />
<its:termRule selector="//h:span[@class='term']" term="yes" />
</its:rules>
<title>ITS Working Group</title>
</head>
<body>
<h1>Test of ITS on <span class="term">XHTML</span></h1>
<p>Some text to translate.</p>
<p its:translate="no">Some text not to translate.</p>
</body>
</html>
There are three ways to use ITS with XHTML and keep the XHTML document conformant:
Use XHTML Modularization [XHTMLMod1.1]. See Section 5.1.2: Using XHTML Modularization 1.1 for the Definition of ITS for details.
Use external ITS global rules, as shown in the following example. Even local information within the document that would be handled by ITS attributes can be set indirectly.
These rules illustrate some of the ITS data categories you can associate with specific XHTML markup. The first its:termRule indicates that any span element with class="term" is a term.
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
 xmlns:h="http://www.w3.org/1999/xhtml">
 <its:translateRule selector="//h:meta[@name='keywords']/@content" translate="yes" />
 <its:translateRule selector="//h:p[@class='notrans']" translate="no" />
 <its:termRule selector="//h:span[@class='term']" term="yes" />
</its:rules>
The corresponding document:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta name="keywords" content="ITS example, XHTML translation" />
  <title>ITS Working Group</title>
 </head>
 <body>
  <h1>Test of ITS on <span class="term">XHTML</span></h1>
  <p>Some text to translate.</p>
  <p class="notrans">Some text not to translate.</p>
 </body>
</html>
Use NVDL. See Section 5.1.3: Using NVDL to integrate ITS into XHTML for details.
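For the second approach (external global rules), other ITS data categories could be mapped to existing XHTML markup in the same way. The following is only a sketch: the choice of elements and attributes in the selectors is an assumption made for illustration, and which rules are appropriate depends on how the XHTML content is actually authored.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- Language Information: read the language from the XHTML lang attribute -->
  <its:langRule selector="//h:*[@lang]" langPointer="@lang"/>
  <!-- Elements Within Text: inline elements are part of the surrounding text flow -->
  <its:withinTextRule selector="//h:span" withinText="yes"/>
  <its:withinTextRule selector="//h:em" withinText="yes"/>
  <!-- Directionality: map the XHTML dir attribute value rtl to the ITS value rtl -->
  <its:dirRule selector="//h:*[@dir='rtl']" dir="rtl"/>
</its:rules>
```

Because these rules live outside the XHTML file, the document contains no markup from the ITS namespace and remains conformant.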
This section describes how to use XHTML Modularization 1.1 [XHTMLMod1.1] for the definition of ITS. It first defines an ITS abstract module which is then implemented in the XML Schema format. The module is meant to be integrated in existing or new schemas which rely on XHTML Modularization 1.1.
The following is the abstract definition of the elements for the its:rules element. See Section 5.1.4: Associating existing XHTML markup with ITS.
| Elements | Attributes | Minimal Content Model |
|---|---|---|
| rules | version (CDATA), xlink:href (URI), xlink:type ("simple") | ( translateRule \| locNoteRule \| termRule \| dirRule \| rubyRule \| langRule \| withinTextRule )* |
| translateRule | Selector, translate ("yes" \| "no") | EMPTY |
| locNoteRule | Selector, locNotePointer (CDATA), locNoteType ("alert" \| "description"), locNoteRef (URI), locNoteRefPointer (CDATA) | locNote? |
| locNote | translate ("yes" \| "no"), locNote (CDATA), locNoteType ("alert" \| "description"), locNoteRef (URI), termInfoRef (URI), term ("yes" \| "no"), dir ("ltr" \| "rtl" \| "lro" \| "rlo") | (PCDATA \| ... |

ITS attributes to be used locally. Again, these definitions make use of XHTML Modularization 1.1.
5.1.2.2 ITS XML Schema Module Implementation
The following schema contains the implementation of the abstract markup module in XML Schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3.org/2005/11/its"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns:h="http://www.w3.org/1999/xhtml" elementFormDefault="qualified"
xmlns:xlink="http://www.w3.org/1999/xlink">
<xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>
<xs:import namespace="http://www.w3.org/1999/xhtml"
schemaLocation="xhtml-schemas/xhtml-ruby-1.xsd"/>
<xs:simpleType name="translate.type">
<xs:restriction base="xs:string">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="term.type">
<xs:restriction base="xs:string">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="locNoteType.type">
<xs:restriction base="xs:string">
<xs:enumeration value="alert"/>
<xs:enumeration value="description"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="dir.type">
<xs:restriction base="xs:string">
<xs:enumeration value="ltr"/>
<xs:enumeration value="rtl"/>
<xs:enumeration value="lro"/>
<xs:enumeration value="rlo"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="withinText.type">
<xs:restriction base="xs:string">
<xs:enumeration value="yes"/>
<xs:enumeration value="no"/>
<xs:enumeration value="nested"/>
</xs:restriction>
</xs:simpleType>
<xs:attributeGroup name="its.Selector.attlist">
<xs:attribute name="selector" type="xs:string" use="required"/>
</xs:attributeGroup>
<xs:attributeGroup name="its.ITSLocal.attlist">
<xs:attribute name="translate" form="qualified" use="optional"
type="its:translate.type"/>
<xs:attribute name="locNote" type="xs:string" form="qualified"
use="optional"/>
<xs:attribute name="locNoteType" form="qualified" use="optional"
type="its:locNoteType.type"/>
<xs:attribute name="locNoteRef" type="xs:anyURI" form="qualified"
use="optional"/>
<xs:attribute name="termInfoRef" type="xs:string" form="qualified"
use="optional"/>
<xs:attribute name="term" type="its:term.type" form="qualified"
use="optional"/>
</xs:attributeGroup>
<xs:element name="rules" type="its:rules.type"/>
<xs:complexType name="rules.type" mixed="false">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="its:translateRule"/>
<xs:element ref="its:locNoteRule"/>
<xs:element ref="its:termRule"/>
<xs:element ref="its:dirRule"/>
<xs:element ref="its:rubyRule"/>
<xs:element ref="its:langRule"/>
<xs:element ref="its:withinTextRule"/>
</xs:choice>
<xs:attributeGroup ref="its:rules.attlist"/>
</xs:complexType>
<xs:attributeGroup name="rules.attlist">
<xs:attribute name="version" use="required" type="xs:string"/>
<xs:attribute ref="xlink:href" use="optional"/>
<xs:attribute ref="xlink:type" use="optional"/>
</xs:attributeGroup>
<xs:element name="translateRule" type="its:translateRule.type"/>
<xs:complexType name="translateRule.type">
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="translate" use="required" type="its:translate.type"/>
</xs:complexType>
<xs:element name="locNoteRule" type="its:locNoteRule.type"/>
<xs:complexType name="locNoteRule.type">
<xs:sequence minOccurs="0" maxOccurs="1">
<xs:element ref="its:locNote"/>
</xs:sequence>
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="locNotePointer" type="xs:string" use="optional"/>
<xs:attribute name="locNoteType" use="required" type="its:locNoteType.type"/>
<xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/>
<xs:attribute name="locNoteRefPointer" type="xs:string" use="optional"/>
</xs:complexType>
<xs:element name="locNote" type="its:locNote.type"/>
<xs:complexType name="locNote.type" mixed="true">
<xs:attribute name="translate" use="optional" type="its:translate.type"/>
<xs:attribute name="locNote" type="xs:string" use="optional"/>
<xs:attribute name="locNoteType" use="optional" type="its:locNoteType.type"/>
<xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/>
<xs:attribute name="termInfoRef" type="xs:anyURI" use="optional"/>
<xs:attribute name="term" use="optional" type="its:term.type"/>
<xs:attribute name="dir" use="optional" type="its:dir.type"/>
</xs:complexType>
<xs:element name="termRule" type="its:termRule.type"/>
<xs:complexType name="termRule.type">
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="term" type="its:term.type" use="required"/>
<xs:attribute name="termInfoRef" type="xs:anyURI" use="optional"/>
<xs:attribute name="termInfoRefPointer" type="xs:string" use="optional"/>
<xs:attribute name="termInfoPointer" type="xs:string" use="optional"/>
</xs:complexType>
<xs:element name="dirRule" type="its:dirRule.type"/>
<xs:complexType name="dirRule.type">
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="dir" type="its:dir.type" use="required"/>
</xs:complexType>
<xs:element name="rubyRule" type="its:rubyRule.type"/>
<xs:complexType name="rubyRule.type">
<xs:sequence>
<xs:element ref="its:rubyText"/>
</xs:sequence>
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="rubyPointer" type="xs:string" use="optional"/>
<xs:attribute name="rtPointer" type="xs:string" use="optional"/>
<xs:attribute name="rpPointer" type="xs:string" use="optional"/>
<xs:attribute name="rbcPointer" type="xs:string" use="optional"/>
<xs:attribute name="rtcPointer" type="xs:string" use="optional"/>
<xs:attribute name="rbspanPointer" type="xs:string" use="optional"/>
</xs:complexType>
<xs:element name="rubyText" type="its:rubyText.type"/>
<xs:complexType name="rubyText.type" mixed="true">
<xs:attribute name="translate" type="its:translate.type" use="optional"/>
<xs:attribute name="locNote" type="xs:string" use="optional"/>
<xs:attribute name="locNoteType" type="its:locNoteType.type" use="optional"/>
<xs:attribute name="locNoteRef" type="xs:anyURI" use="optional"/>
<xs:attribute name="term" type="its:term.type" use="optional"/>
<xs:attribute name="termInfoRef" type="xs:string" use="optional"/>
<xs:attribute name="dir" type="its:dir.type" use="optional"/>
<xs:attribute name="rbspan" type="xs:string" use="optional"/>
</xs:complexType>
<xs:element name="langRule" type="its:langRule.type"/>
<xs:complexType name="langRule.type">
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="langPointer" type="xs:string" use="required"/>
</xs:complexType>
<xs:element name="withinTextRule" type="its:withinTextRule.type"/>
<xs:complexType name="withinTextRule.type">
<xs:attributeGroup ref="its:its.Selector.attlist"/>
<xs:attribute name="withinText" type="its:withinText.type"/>
</xs:complexType>
</xs:schema>
The following is a driver file which can be used to invoke the schema above.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
targetNamespace="http://www.w3.org/1999/xhtml"
xmlns:its="http://www.w3.org/2005/11/its"
xmlns="http://www.w3.org/1999/xhtml" blockDefault="#all">
<xs:annotation>
<xs:documentation> This is the XML Schema Driver for new Document Type
XHTML Basic 1.0 + ITS
$Id: Overview.html,v 1.12 2018/10/09 13:20:06 denis Exp $
</xs:documentation>
<xs:documentation
source="http://www.w3.org/TR/xml-i18n-bp/#integration-its-xhtmlmod"/>
</xs:annotation>
<xs:import namespace="http://www.w3.org/2005/11/its"
schemaLocation="its-module.xsd"/>
<xs:redefine schemaLocation="xhtml-schemas/xhtml-basic10.xsd">
<xs:group name="HeadOpts.mix">
<xs:choice>
<xs:group ref="HeadOpts.mix"/>
<xs:element ref="its:rules"/>
</xs:choice>
</xs:group>
<xs:attributeGroup name="Common.attrib">
<xs:attributeGroup ref="Common.attrib"/>
<xs:attributeGroup ref="its:its.ITSLocal.attlist"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>
The file below is an instance which can be validated against this schema.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:its="http://www.w3.org/2005/11/its">
<head>
<title> </title>
<its:rules version="1.0">
<its:locNoteRule locNoteType="alert" selector="..." locNoteRef="...">
</its:locNoteRule>
<its:locNoteRule locNoteType="alert" selector="...">
<its:locNote> </its:locNote>
</its:locNoteRule>
<its:termRule selector="..." term="yes"/>
</its:rules>
</head>
<body>
<h3> </h3>
<table>
<tr>
<td> </td>
</tr>
</table>
<ul>
<li its:locNote="..." its:translate="no"> </li>
</ul>
</body>
</html>