Dynamator Pure HTML for every page generation technology.
           

Advanced Internationalization with Dynamator

The Dynamator Internationalization Guide describes how to use Dynamator to internationalize the static content of a web application. To keep things simple, the examples in that guide were in Pig Latin. The nice thing about Pig Latin is that it can be written entirely in the ASCII character set, which allowed us to focus on basic principles without having to consider the complexities of character sets. But many languages can't be written in ASCII. In this guide, you'll learn how to use Dynamator with character sets other than ASCII.

We'll continue with the Hello World example, this time translating it into several real languages, each represented by a different character encoding.

We'll also continue with the assumption that what gets localized is text strings, not entire HTML pages. With Dynamator, the process for localizing entire pages is a subset of the process for localizing text strings. So either way, you'll know what to do.

While I will try to provide enough details for you to duplicate the examples, a complete treatment of character sets and character encodings is out of scope for this guide.

One disclaimer. I don't know any of the languages that are presented on this page. I would welcome corrections from anyone who knows better.

We'll start with an example that uses the ISO-8859-1 character encoding, demonstrated using a European language, then move to the UTF-8 character encoding, demonstrated using an Asian language. Together, these two encodings should cover the large majority of internationalization needs. We'll then briefly discuss the use of other encodings.

ISO-8859-1 in Spanish

The ISO-8859-1 character encoding supports all major Western European languages. In addition to all the ASCII characters, it also includes characters used by languages such as German, French, and Spanish. This example uses Spanish. Other than the character encoding issues, the internationalization process will look a lot like the process we used for the Pig Latin example.

Internationalizing Static Text

Here's how to use Dynamator to create a Spanish version of an HTML page. The item sequence matches the sequence in the Pig Latin example.

  1. Create the locale-specific directory tree. We're going to create a generic Spanish page, so we'll name the directory 'es'. However, be aware that Spanish differs by region, so you'll probably want to be more specific (e.g. 'es_mx' for Mexico). Under the locale directory, create the html, dyn, and htdocs directories.
  2. The original HTML files have already been updated with id attributes for the Pig Latin example, so they don't need to be changed.
  3. The reference translation (in this case English) Dynamator files don't need to be changed.
  4. Copy the reference translation Dynamator files into the locale-specific directory, translate the text, and add locale identifiers.

    Two kinds of locale identifiers need to be added.

    First, the Dynamator file needs to begin with an XML processing instruction that specifies the character encoding. The processing instruction informs Dynamator that the file contains characters in the specified encoding so that it can process them correctly. If the text around non-ASCII characters in the generated HTML file is garbled, chances are that the XML processing instruction was omitted.

    For Spanish (and most Western European languages), the processing instruction looks like this:

    <?xml version="1.0" encoding="iso-8859-1"?>
    

    Second, browsers need to be informed of the language and character set of the HTML page. This is done by adding two meta directives to the HTML head section: a content-type directive that specifies the character set, and a content-language directive that specifies the language. The meta directives should be placed as close to the beginning of the file as possible.

    For Spanish, the meta-directives look like this:

    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
    <meta http-equiv="content-language" content="es">
    

    The Spanish Dynamator file looks like this:

    html/es/dyn/HelloWorld.dyn
    <?xml version="1.0" encoding="iso-8859-1"?>
    <dynamator language="none" suffix="html">
      <tag tag="head">
        <before-content>
          <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
          <meta http-equiv="content-language" content="es">
        </before-content>
      </tag>
      <tag tag="title">
        <content>
          Una paginación simple
        </content>
      </tag>
      <id name="HelloText">
        <content>
          ¡Ola Mundo!
        </content>
      </id>
    </dynamator>
    

  5. Create localized HTML by running Dynamator. To correctly output special characters, the character encoding must be specified on the command line.
    prompt> cd html/es/dyn
    prompt> java dynamate -e iso-8859-1 -d ../html ../../HelloWorld.html

    You might think that specifying the encoding on the command line is redundant, since it was already specified in the Dynamator localization file. You might even be right, but that's the way it works today. The command line encoding determines the encoding Tidy uses when it processes the input HTML file, as well as the encoding used by Dynamator's output processor. The encoding in the Dynamator file determines how Dynamator processes that file.

  6. The output from the previous step will be a localized HTML file in the language-specific HTML directory.
    html/es/html/HelloWorld.html
    <!-- generated by Dynamator Mon Dec 31 00:13:13 CST 2001
                                     --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
      <head>
          <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
          <meta http-equiv="content-language" content="es">
                                         <title>
          Una paginación simple
        </title>
      </head>
      <body>
        <p id="HelloText">
          ¡Ola Mundo!
        </p>
      </body>
    </html>
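
To see why the encoding has to be stated consistently in the Dynamator file and on the command line, it can help to look at what happens when bytes written in one encoding are read as another. The following is a minimal sketch in plain Java, independent of Dynamator; the class and method names are made up for illustration. It writes the Spanish greeting in one encoding and reads it back in another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Dynamator.
public class EncodingMismatchDemo {
    // Encode text with one charset and decode the resulting bytes with another,
    // mimicking a file that is read under the wrong encoding declaration.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // The greeting from this guide; the first character is the inverted exclamation mark.
        String greeting = "\u00A1Ola Mundo!";

        // Matching encodings: the text survives the round trip.
        String ok = reinterpret(greeting, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        // ISO-8859-1 bytes read as UTF-8: the lone byte 0xA1 is invalid UTF-8,
        // so Java substitutes the replacement character U+FFFD.
        String latin1AsUtf8 = reinterpret(greeting, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // UTF-8 bytes read as ISO-8859-1: the two bytes 0xC2 0xA1 become two characters.
        String utf8AsLatin1 = reinterpret(greeting, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(ok.equals(greeting));                           // true
        System.out.println(latin1AsUtf8.equals("\uFFFDOla Mundo!"));       // true
        System.out.println(utf8AsLatin1.equals("\u00C2\u00A1Ola Mundo!")); // true
    }
}
```

Only the non-ASCII character is damaged in either direction, which matches the symptom described in step 4: the text around non-ASCII characters is what looks garbled.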
    

Consolidating Server Code

Let's move on to the Hello User example, so we can see how server code is handled.

Applying the process described above to the HelloUser example, we obtain the following HTML file, in Spanish:

html/es/html/HelloUser.html
<!-- generated by Dynamator Mon Dec 31 01:07:52 CST 2001
--><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
      <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
      <meta http-equiv="content-language" content="es">
    <title>
      Saludo del utilizador
    </title>
  </head>
  <body>
    <p id="HelloText">
      ¡Ola <span id="UserName">Utilizador</span>!
    </p>
  </body>
</html>

We can apply the Dynamator server code file to this file with the following command:

prompt> cd html/es/html
prompt> java dynamate -e iso-8859-1 -d ../htdocs -f ../../../dyn HelloUser.html

The result is:

html/es/htdocs/HelloUser.jsp
<%-- generated by Dynamator Mon Dec 31 09:39:51 CST 2001
                           --%><!--  generated by Dynamator Mon Dec 31 01:07:52 CST 2001
 --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
    <meta http-equiv="content-language" content="es">
    <title>Saludo del utilizador</title>
  </head>
  <body>
   <p id="HelloText">¡Ola <span id="UserName"><%= 
      session.getValue("username")
                                %></span>!</p>
  </body>
</html>

But this is not quite sufficient. For JSP, we need to add a page directive to inform the JSP engine of the character set; otherwise many JSP engines will choke on non-ASCII characters. This directive is locale-specific, and may be the only locale-specific code needed by any server page. We can use the Dynamator <include> facility to include a different page directive for each locale. The Dynamator locale-specific file looks like this:

html/es/locale.dyn
<dynamator>
  <prolog><%@ page contentType="text/html; charset=iso-8859-1" %>
  </prolog>
</dynamator>

As with other files, it has the same name and is placed in the same relative location for each locale.

Note that there is no whitespace after the <prolog> tag; this causes the following text to be inserted into the output file without preceding whitespace.
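
Because the locale file differs between locales only in the charset name, its text can be produced mechanically. The following is a small sketch, not a Dynamator feature: the class name, method name, and locale-to-charset table are assumptions made for illustration. It prints the file content for each locale in the layout shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Dynamator.
public class LocaleDynSketch {
    // Build the text of a locale.dyn file for the given charset. The page
    // directive follows <prolog> with no whitespace in between, so that it is
    // inserted into the output without preceding whitespace.
    static String localeDyn(String charset) {
        return "<dynamator>\n"
             + "  <prolog><%@ page contentType=\"text/html; charset=" + charset + "\" %>\n"
             + "  </prolog>\n"
             + "</dynamator>\n";
    }

    public static void main(String[] args) {
        // Assumed locale-to-charset table; extend it to match your own locales.
        Map<String, String> charsets = new LinkedHashMap<>();
        charsets.put("es", "iso-8859-1");
        charsets.put("ja", "utf-8");

        for (Map.Entry<String, String> entry : charsets.entrySet()) {
            // Same file name and same relative location for every locale.
            System.out.println("html/" + entry.getKey() + "/locale.dyn");
            System.out.println(localeDyn(entry.getValue()));
        }
    }
}
```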

Each Dynamator server code file needs to reference this file. The server code file for this example now looks like this (the only change to the original file is the added <include> line):

dyn/HelloUser.dyn
<dynamator language="jsp">
  <include file="locale.dyn"/>
  <id name="UserName">
    <content>
      session.getValue("username")
    </content>
  </id>
</dynamator>

The command line now contains an include path argument so that Dynamator can locate the included file in a locale-specific directory:

prompt> cd html/es/html
prompt> java dynamate -e iso-8859-1 -d ../htdocs -f ../../../dyn -I .. HelloUser.html

The resulting JSP file looks like this:

html/es/htdocs/HelloUser.jsp
<%-- generated by Dynamator Mon Dec 31 16:01:10 CST 2001
--%><%@ page contentType="text/html; charset=iso-8859-1" %>
                             <!--  generated by Dynamator Mon Dec 31 01:07:52 CST 2001
 --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
    <meta http-equiv="content-language" content="es">
    <title>Saludo del utilizador</title>
  </head>
  <body>
   <p id="HelloText">¡Ola <span id="UserName"><%= 
      session.getValue("username")
                                %></span>!</p>
  </body>
</html>

UTF-8 in Japanese

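UTF-8 is one of the most popular encodings for international languages. It is growing in popularity because it can represent the characters of virtually every written language, and because it is backward compatible with ASCII: every ASCII file is already valid UTF-8. (The first 256 Unicode code points match ISO-8859-1, but UTF-8 encodes characters above 127 as multiple bytes, so ISO-8859-1 files that contain such characters must be converted.)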
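
The relationship between these encodings is easiest to see at the byte level. The sketch below is plain Java with an illustrative class name. It prints the UTF-8 bytes of an ASCII letter, of the Spanish inverted exclamation mark, and of the first character of the Japanese greeting used later in this section.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Dynamator.
public class Utf8WidthDemo {
    // Render the UTF-8 encoding of a string as space-separated hex bytes.
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("A"));      // prints 41 (same single byte as in ASCII)
        System.out.println(utf8Hex("\u00A1")); // prints C2 A1 (one byte, A1, in ISO-8859-1)
        System.out.println(utf8Hex("\u3053")); // prints E3 81 93 (hiragana "ko")
    }
}
```

ASCII text is byte-for-byte identical in all three encodings, which is why the Pig Latin examples never ran into encoding problems.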

We will now repeat the above exercise using Japanese. Chances are you were able to read the Spanish upside-down exclamation point character (¡). If so, it's because your browser supports ISO-8859-1 encoding and uses fonts that have the upside-down exclamation point glyph. Unless you already read Japanese web pages, chances are that you won't be able to read the following examples. You'll probably see weird-looking characters, or characters that look like boxes. That's because your computer probably doesn't have Japanese fonts. To read the examples in Japanese, you'll have to install Japanese fonts and configure your browser to support Japanese. (Instructions are out of scope for this guide, but if you have a recent version of Internet Explorer it's painless: just go to http://www.microsoft.com/japan.)

We'll make one other change with this example: we'll get rid of the Dynamator generation notices.

The Japanese HelloUser Dynamator file looks like this:

html/ja/dyn/HelloUser.dyn
<?xml version="1.0" encoding="utf-8"?>
<dynamator language="none" suffix="html">
  <tag tag="head">
    <before-content>
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <meta http-equiv="content-language" content="ja">
    </before-content>
  </tag>
  <tag tag="title">
    <content>
      ユーザーの挨拶
    </content>
  </tag>
  <id name="HelloText">
    <content>
      こんにちは<span id="UserName">ユーザー</span>!
    </content>
  </id>
</dynamator>

The Japanese HTML file is created using the following command:

prompt> cd html/ja/dyn
prompt> java dynamate -e utf-8 -d ../html -G ../../HelloUser.html

The -G option removes the Dynamator generation notice.

The Japanese HTML demo file looks like this:

html/ja/html/HelloUser.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
      <meta http-equiv="content-type" content="text/html; charset=utf-8">
      <meta http-equiv="content-language" content="ja">
      <title>
        ユーザーの挨拶
      </title>
  </head>
  <body>
    <p id="HelloText">
      こんにちは<span id="UserName">ユーザー</span>!
    </p>
  </body>
</html>

We'll create a Japanese-specific JSP page declaration:

html/ja/locale.dyn
<dynamator>
  <prolog><%@ page contentType="text/html; charset=utf-8" %>
  </prolog>
</dynamator>

We can use the same Dynamator server code file we used with all the other locales. The only thing that changes is the character encoding specified on the command line:

prompt> cd html/ja/html
prompt> java dynamate -e utf-8 -d ../htdocs -f ../../../dyn -G -I .. HelloUser.html

The result is:

html/ja/htdocs/HelloUser.jsp
<%@ page contentType="text/html; charset=utf-8" %>
                             <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <meta http-equiv="content-language" content="ja">
    <title>ユーザーの挨拶</title>
  </head>
  <body>
   <p id="HelloText">こんにちは<span id="UserName"><%= 
      session.getValue("username")
                                %></span>!</p>
  </body>
</html>

Other encodings

Although ISO-8859-1 and UTF-8 together should satisfy many projects' needs, they won't work in every situation. Some browsers don't support UTF-8, and many international websites use other encodings. Fortunately, Dynamator should support any encoding that is supported both by Java and by Xerces.

The process used for these character encodings is no different from the process already described. The only thing that changes is the name of the encoding.
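
Before committing to an encoding, it may be worth checking that your Java runtime knows it. The sketch below covers only the Java half of that requirement, using the standard java.nio.charset.Charset API; it says nothing about Xerces, which keeps its own list of supported encodings. The class name and the list of candidate names are illustrative.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Dynamator.
public class CharsetCheck {
    // True if this Java runtime supports the named charset.
    // Syntactically illegal names are reported as unsupported rather than thrown.
    static boolean javaSupports(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException (malformed name) and a null name.
            return false;
        }
    }

    public static void main(String[] args) {
        // Candidate encodings; which of the non-standard ones are available
        // depends on the Java runtime in use.
        String[] candidates = { "iso-8859-1", "utf-8", "Big5", "Shift_JIS" };
        for (String name : candidates) {
            System.out.println(name + ": " + javaSupports(name));
        }
    }
}
```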

If you're curious, the Dynamator i18n example includes a locale that uses the Big-5 character encoding.

Conclusion

Working with different character sets is a fact of life for many internationalization projects. Dynamator makes dealing with static content in various character encodings relatively painless.

We have presented the localization process sequentially, as if a single individual were performing it. When you consider each role separately, the advantages of the Dynamator approach become clear.

With Dynamator, localizers can work directly with files in their native encoding, rather than translating text to ASCII escape sequences for Java property files. And they can ensure that pages containing international characters display correctly without delivering localized text to programmers. This autonomy makes them much more productive.
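
For comparison, here is roughly what the escape-sequence approach demands of a translator's text. This is a minimal sketch with an illustrative class name, similar in spirit to the JDK's native2ascii tool but not a replacement for it. It rewrites every non-ASCII character as a backslash-u escape of the kind used in Java property files.

```java
// Hypothetical helper, not part of Dynamator.
public class PropertyEscapeSketch {
    // Replace each char outside 7-bit ASCII with a six-character escape:
    // a backslash, the letter u, and four uppercase hex digits.
    static String toAsciiEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Spanish greeting: only its first character needs escaping.
        System.out.println(toAsciiEscapes("\u00A1Ola Mundo!"));
        // The Japanese greeting: every character becomes an escape.
        System.out.println(toAsciiEscapes("\u3053\u3093\u306B\u3061\u306F"));
    }
}
```

The Spanish greeting remains mostly readable after escaping, but the Japanese greeting turns into a run of five escapes with no readable text left, which is the form localizers avoid by working in their native encoding.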

With Dynamator, programmers don't have to touch files containing encoded characters. All they have to do is to create a single file for each locale specifying the character encoding.

For programmers, the biggest benefit of using Dynamator for internationalized applications remains the consolidation of server code into a single file. This code works the same regardless of encoding or locale. It exists in only one place rather than being copied into every server page for every locale, so the inevitable code changes are easy to apply.

There are many facets to internationalization, and Dynamator helps with just one of them. But because that facet involves the coordination of so many different roles, getting it right is an important key to project success. By improving the workflow for static content, Dynamator targets the critical path of most internationalization projects, resulting in a faster, simpler, more effective process.