Implementing a New Locale

From Apache OpenOffice Wiki

Caution: The procedures, directory layout, and file contents described here reflect the structure of the i18npool module as of version 1.1.0, not that of the i18n module in version 1.0.2.


One of the most important tasks in implementing a new locale is to define all the locale data to be used. The following table lists the data as types returned by the XLocaleData interface methods:

Type                                                  Count
LanguageCountryInfo                                   exactly 1
LocaleDataItem                                        exactly 1
sequence<Calendar> calendars                          1 or more
sequence<Currency> currencies                         1 or more
sequence<FormatElement> number format codes           at least all required format codes (see below)
sequence<Implementation> collator implementations     0 or more; if none is specified, the ICU collator
                                                      is called for the language given in
                                                      <LanguageCountryInfo>
sequence<string> search options
  (transliteration modules)                           0 or more
sequence<string> collation options
  (transliteration modules)                           0 or more
sequence<string> names of supported transliterations
  (transliteration modules)                           0 or more
ForbiddenCharacters                                   exactly 1, though it may have empty elements
sequence<string> reserved words                       all words of <ReservedWords>
sequence<NumberingLevel> numbering levels
  (no public XLocaleData API method available; used
  by and accessible through the method
  getDefaultContinuousNumberingLevels() implemented
  in i18npool)                                        exactly 8 <NumberingLevel> entities
sequence<OutlineStyle> outline styles
  (no public XLocaleData API method available; used
  by and accessible through the method
  getDefaultOutlineNumberings() implemented in
  i18npool)                                           exactly 8 <OutlineStyle> entities consisting of
                                                      5 <OutlineNumberingLevel> entities each

Locale data is defined in an XML file. It is translated into a C++ source file during the build process, which is compiled and linked together with other compiled locale data files into shared libraries. The contents of the XML file, their elements, and how they are to be defined are described in i18npool/source/localedata/data/locale.dtd. The latest revision available for a specific CVS branch of that file provides up-to-date information about the definitions, as well as additional information.
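For orientation, a minimal skeleton of such a file might look as follows. This is an illustrative sketch: the element names are modeled on en_US.xml, attributes are omitted, and locale.dtd remains the authoritative reference for the required elements and their order.

```xml
<Locale>
  <LC_INFO>
    <Language>
      <LangID>en</LangID>
      <DefaultName>English</DefaultName>
    </Language>
    <Country>
      <CountryID>US</CountryID>
      <DefaultName>United States</DefaultName>
    </Country>
  </LC_INFO>
  <LC_CTYPE>
    <Separators>
      <DateSeparator>/</DateSeparator>
      <ThousandSeparator>,</ThousandSeparator>
      <DecimalSeparator>.</DecimalSeparator>
      <TimeSeparator>:</TimeSeparator>
      <!-- further separators as required by locale.dtd -->
    </Separators>
    <!-- markers, quotation marks, measurement system, ... -->
  </LC_CTYPE>
  <!-- further sections such as LC_FORMAT, LC_COLLATION, LC_SEARCH,
       LC_INDEX, LC_CALENDAR, LC_CURRENCY, LC_TRANSLITERATION,
       LC_MISC, LC_NumberingLevel, LC_OutLineNumberingLevel -->
</Locale>
```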

If the language-country combination is not already listed in tools/inc/lang.hxx, tools/source/intntl/isolang.cxx, and svx/source/dialog/langtab.src, the office is probably not prepared to deal with your specific locale. For assistance, consult the section Add the New Language to the Resource System and join the localization mailing list.

In order to conform with the available build infrastructure, the name of your locale data file should follow the conventions used in the i18npool/source/localedata/data directory: <language>_<country>.xml, where language is a lowercase two-letter ISO 639 code and country is an uppercase two-letter ISO 3166 code. Start by copying the en_US.xml file to your <language>_<country>.xml file and adapt the entries to suit your needs. Add the corresponding *.cxx and *.obj target file names to the makefile in i18npool/source/localedata/data/. Note that there is an explicit rule defined, so you do not need to add the *.xml file name anywhere. You must also add the locale to the aDllsTable structure located in i18npool/source/localedata/data/localedata.cxx. Make sure to specify the correct library name, since it must correspond to the library name used in the makefile. Finally, the public symbols to be exported must be added to the linker map file corresponding to the library. You can use the i18npool/source/localedata/data/linkermapfile-check.awk script to assist you. Instructions for using the script are located in the header comments of the file.


To be able to load documents of versions up to and including StarOffice 5.2 (old binary file format), each locale must define all number formats listed in offapi/com/sun/star/i18n/NumberFormatIndex.idl and assign the proper formatindex="..." attribute.
Failing to do so may result in data being displayed improperly, or not at all, if a built-in "System" or "Default" format code was used (as the average user generally does) and the document is loaded under a locale that does not have those formats defined. Since old versions merged some format information from the [Windows] Regional Settings, it might be necessary to define some duplicated codes to fill all positions. To verify that all necessary elements are defined, use a non-product build of the office, open the number formatting dialog, and select your locale from the Language list box. An assertion message box appears if there are any missing elements. The errors are only shown the very first time the locale is selected in a given document.


In general, the definition of number format codes follows the user-visible rules, except that any non-ASCII character must be entered using UTF-8 encoding. For a detailed description of codes and a list of possible keywords, please consult the English online help section "number format codes".
Be sure to use the separators you declared in the <LC_CTYPE> section in the number format codes, for example <DecimalSeparator>, <ThousandSeparator>, otherwise the number formatter generates incorrect formats.
Verify the defined codes again by using the number formatter dialog of a non-product build. If anything is incorrect, an assertion message box appears containing information about the error. The format indices 1..49 are reserved and, for backward compatibility, must be used as stated in offapi/com/sun/star/i18n/NumberFormatIndex.idl. Note that indices 48 and 49 are used internally and must not be used in locale data XML files. All other formats must be present.
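As an illustration, a format definition in the locale data file pairs a <FormatElement> with its <FormatCode>; the msgid values and codes below are modeled on en_US.xml and are illustrative only. Note that the second code uses the thousand and decimal separators declared in <LC_CTYPE>:

```xml
<FormatElement msgid="FixedFormatskey2" default="true" type="short"
               usage="FIXED_NUMBER" formatindex="1">
  <FormatCode>0</FormatCode>
</FormatElement>
<FormatElement msgid="FixedFormatskey8" default="false" type="short"
               usage="FIXED_NUMBER" formatindex="7">
  <FormatCode>#,##0.00</FormatCode>
</FormatElement>
```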

<FormatCode usage="DATE"> and <FormatCode usage="DATE_TIME">

Characters of date and time keywords, such as YYYY for year, had previously been localized for a few locales (for example, JJJJ in German). The new I18N framework no longer follows that approach, because it may lead to ambiguous and case insensitive character combinations that cannot be resolved at runtime. Localized keyword support is only given for some old locales, other locales must define their codes using English notation.
The table below shows the localized keyword codes:
Locale                                     Localized keywords
English (and all locales not mentioned)    DayOfWeek=A, Era=G, Year=Y, Month=M, Day=D, Hour=H
de_AT, de_CH, de_DE, de_LI, de_LU          Year=J, Day=T
nl_BE, nl_NL                               Year=J, Hour=U
fr_BE, fr_CA, fr_CH, fr_FR, fr_LU, fr_MC   DayOfWeek=O, Year=A, Day=J
it_CH, it_IT                               DayOfWeek=O, Era=X, Year=A, Day=G
pt_BR, pt_PT                               DayOfWeek=O, Year=A
es_AR, es_BO, es_CL, es_CO, es_CR, es_DO, es_EC, es_ES, es_GT, es_HN, es_MX,
es_NI, es_PA, es_PE, es_PR, es_PY, es_SV, es_UY, es_VE
                                           DayOfWeek=O, Year=A
da_DK                                      Hour=T
nb_NO, nn_NO, no_NO                        Hour=T
sv_FI, sv_SE                               Hour=T
fi_FI                                      Year=V, Month=K, Day=P, Hour=T

Keywords not listed for a locale use the English letters.

<FormatCode usage="DATE" formatindex="21"> and <FormatCode usage="DATE_TIME" formatindex="47">

The formatindex="21" DATE_SYS_DDMMYYYY format code is used to edit date formatted data. It represents a date using the most detailed information available, for example, a 4-digit year instead of a 2-digit year. The YMD default order (how a date is assembled) is determined from the order encountered in this format.
Similarly, the formatindex="47" DATETIME_SYS_DDMMYYYY_HHMMSS format code is used to edit date-time data. Both format codes must display data in a way that is parsable by the application, in order to be able to reassemble edited data. This generally means using only YYYY, MM, DD, HH, MM, SS keywords together with <DateSeparator> and <TimeSeparator>.
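For an en_US-style locale, the two edit formats could be defined as follows. This is an illustrative sketch: the msgid values are assumptions, and the actual codes depend on the locale's declared separators and date order.

```xml
<FormatElement msgid="DateFormatskey9" default="true" type="medium"
               usage="DATE" formatindex="21">
  <FormatCode>MM/DD/YYYY</FormatCode>
</FormatElement>
<FormatElement msgid="DateTimeFormatskey2" default="false" type="medium"
               usage="DATE_TIME" formatindex="47">
  <FormatCode>MM/DD/YYYY HH:MM:SS</FormatCode>
</FormatElement>
```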

<FormatCode usage="CURRENCY">

The [$xxx-yyy] notation is needed for compatibility reasons. The xxx part denotes the currency symbol, and the yyy part specifies the locale identifier in Microsoft Language ID hexadecimal notation. For example, having "409" as the locale identifier (English-US) and "$" as the currency symbol results in [$$-409]. A list of the Language IDs known to the application can be found in the tools module, in the file tools/inc/lang.hxx. Format indices 12, 13, 14, 15, and 17 with [$xxx-yyy] notation must use the xxx currency symbol that has the attribute usedInCompatibleFormatCodes="true" (see element <LC_CURRENCY> in the locale.dtd file).
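Continuing the example from the text ("$" as currency symbol, 409 as the Language ID), a currency format element might look like this; the msgid value and the exact code are illustrative:

```xml
<FormatElement msgid="CurrencyFormatskey1" default="true" type="medium"
               usage="CURRENCY" formatindex="12">
  <FormatCode>[$$-409]#,##0.00;-[$$-409]#,##0.00</FormatCode>
</FormatElement>
```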


The XCalendar interface provides a general calendar service. All calendar implementations are managed by the class CalendarImpl, the front end, which dynamically calls a language-specific implementation.

Calendar_gregorian is a wrapper to ICU's Calendar class.

If you need to implement a locale-specific calendar, you can choose to either derive your class from Calendar_gregorian or to write your own class.

There are three steps needed to create a locale-specific calendar:

  1. Name your calendar <name> (for example, 'gengou' for Japanese Calendar) and add it to the locale data XML file with proper day/month/era names.
  2. Derive a class either from Calendar_gregorian or XCalendar and name it Calendar_<name>; it will be loaded by CalendarImpl when the calendar is specified.
  3. Add your new calendar as a service in i18npool/source/registerservices/registerservices.cxx.

If you plan to derive from the Gregorian calendar, you need to know the mapping between your new calendar and the Gregorian calendar. For example, each era of the Japanese Emperor Era calendar has a starting year offset relative to the Gregorian calendar. You need to override the methods Calendar_gregorian::mapToGregorian() and Calendar_gregorian::mapFromGregorian() to map the Era/Year/Month/Day values between the Gregorian calendar and the calendar for your language.


The XCharacterClassification interface provides the toUpper(), toLower(), and toTitle() methods, as well as methods to query the various character attributes defined by Unicode. These functions are implemented by the cclass_unicode class. If a language has specific requirements for these functions, you can derive a language-specific class cclass_<locale_name> from cclass_unicode and override the methods. In most cases, the attributes are well defined by Unicode, so you do not need to create your own class.

The class also provides a generic parser. If a particular language needs special number parsing, detected non-ASCII numbers are fed to the NativeNumberSupplier service to obtain their ASCII representation, which in turn is interpreted and converted to a double precision floating point value.

A manager class CharacterClassificationImpl will handle the loading of language specific implementations of CharacterClassification on the fly. If no implementation is provided, the implementation defaults to class cclass_unicode.


The XBreakIterator interface provides support for character (cell), word, sentence, and line-break services. For example, BreakIterator provides the APIs to iterate over a string by character, word, line, and sentence. The interface is used by the output layer for the following operations:

  • Cursor positioning and selection: Since a character or cell can take more than one code point, cursor movement cannot be done by simply incrementing or decrementing the index.
  • Complex Text Layout Languages (CTL): In CTL languages (such as Thai, Hebrew, Arabic and Indian), multiple characters can combine to form a display cell. Cursor movement must traverse a display cell instead of a single character.

Line breaking must be highly configurable in desktop publishing applications. The line breaking algorithm should be able to find a line break with or without a hyphenator. Additionally, it should handle special characters that are illegal at the end or at the beginning of a line.

Both requirements are locale-sensitive.

The BreakIterator components are managed by the class BreakIteratorImpl, which will load the language-specific component in service name BreakIterator_<language> dynamically.

The base break iterator class BreakIterator_Unicode is a wrapper for the ICU BreakIterator class. While this class meets the requirements for Western languages, it does not meet the requirements of East Asian (CJK) languages and complex text layout languages (Indic, Thai, Arabic), where the enhanced functionality described previously is required.

Thus the current BreakIterator base class has two derived classes, BreakIterator_CJK and BreakIterator_CTL. BreakIterator_CJK provides a dictionary based word break for Chinese and Japanese, and a forbidden rule driven line break for Chinese, Japanese and Korean. BreakIterator_CTL provides a more specific definition of character/cell/cluster grouping for languages like Thai and Arabic.

Use the following steps to create a language-specific BreakIterator service:

  1. Derive a class either from BreakIterator_CJK or BreakIterator_CTL and name it BreakIterator_<language>.
  2. Add the new service in registerservices.cxx.

There are three methods for word breaking: nextWord(), previousWord(), and getWordBoundary(). You can override them with your own language rules.

BreakIterator_CJK provides input string caching and dictionary searching for longest matching. You can provide a sorted dictionary (the encoding must be UTF-8) by creating the following file: i18npool/source/breakiterator/data/<language>.dict.

The utility gendict will convert the file to C code, which will be compiled into a shared library for dynamic loading.

All dictionary searching and loading is performed in the xdictionary class. The only thing you need to do is derive your class from BreakIterator_CJK, create an instance of xdictionary with the language name, and pass it to the parent class.


The XCollator interface must be used to provide text collation for the new locale. There are two types of collation: single-level and multiple-level collation.

Most European and English locales need multiple-level collation; the office uses the ICU collator to cover these needs.

Most CJK languages only require single level collation. There is a two step lookup table that performs the collation for these languages. If you have a new language or algorithm in this category, you can derive a new service from Collator_CJK and provide index and weight tables. Here is a sample implementation:

  #include <collator_CJK.hxx>

  static sal_uInt16 index[] = { /* ... index table data ... */ };
  static sal_uInt16 weight[] = { /* ... weight table data ... */ };

  sal_Int32 SAL_CALL Collator_zh_CN_pinyin::compareSubstring(
         const ::rtl::OUString& str1, sal_Int32 off1, sal_Int32 len1,
         const ::rtl::OUString& str2, sal_Int32 off2, sal_Int32 len2)
         throw (::com::sun::star::uno::RuntimeException)
  {
      return compare(str1, off1, len1, str2, off2, len2, index, weight);
  }

  sal_Int32 SAL_CALL Collator_zh_CN_pinyin::compareString(
         const ::rtl::OUString& str1,
         const ::rtl::OUString& str2)
         throw (::com::sun::star::uno::RuntimeException)
  {
      return compare(str1, 0, str1.getLength(), str2, 0, str2.getLength(),
         index, weight);
  }

The front-end implementation will dynamically load and cache the language-specific service by the name Collator_<locale>.

The steps to add new services:

  1. Derive the new service from the above class
  2. Provide the index and weight tables
  3. Register the new service in registerservices.cxx
  4. Add the new service in the collation section in the locale data file.


The XTransliteration interface can be used for string conversion. The front-end implementation TransliterationImpl will load and cache specific transliteration services, either by one of the predefined TransliterationModules enum values or dynamically by implementation name.

Transliterations have been defined in three categories: Ignore, OneToOne and Numeric. All of them are derived from transliteration_commonclass.

Ignore services are for ignoring case, half/full width forms, and the Katakana/Hiragana distinction. You can derive your new service from one of them and override the folding/transliteration methods.

OneToOne services are for one-to-one mappings, such as converting lowercase to uppercase. The class provides two further services that take a mapping table or a mapping function to do folding and transliteration. You can derive a class from it and provide a table or function for the parent class to do the transliteration.

Numeric services are used to convert a number to a number string in a specific language. They can be used to format date strings and other types of strings.

To add a new transliteration:

  1. Derive a new class from one of the three classes previously mentioned.
  2. Overwrite folding/transliteration methods or provide a table for the parent to perform the transliteration.
  3. Register the new service in registerservices.cxx
  4. Add the new service in the transliteration section in the locale data file


The XTextConversion interface can be used for text conversion. The service implementing the interface provides a function to determine whether the text conversion should be interactive or not, along with functions that can be used for automatic and interactive conversion.

It is possible to create conversion-dictionaries, which are searched for entries to be used by the text conversion service, thus allowing the user to customize the text conversion.

The following is an example:

  // Step 1: get the Desktop object from the office
  // Step 2: open an empty text document
  // Step 3: insert a sample text
  // Step 4: convert the sample text
  // Step 5: insert the converted text
  import com.sun.star.uno.UnoRuntime;

  public class TextConversion {
      public static void main(String args[]) {
          // You need the desktop to create a document.
          // The getDesktop method does the UNO bootstrapping, gets the
          // remote service manager and the desktop object.
          com.sun.star.frame.XDesktop xDesktop = getDesktop();
          com.sun.star.text.XTextDocument xTextDocument =
              createTextdocument( xDesktop );
          com.sun.star.i18n.XTextConversion xTextConversion =
              getTextConversion();
          try {
              // Korean sample text
              String aHeader = "\u7b2c\u0020\u0031\u0020\u7ae0\u0020\ud55c\ubb38\uc758\u0020\uad6c\uc870\u0028\u69cb\u9020\u0029";
              String aText = "\uc6b0\uc120\u0020\ud55c\uc790\ub294\u0020\uc11c\uc220\uc5b4\u0020\u0028\u654d\u8ff0\u8a9e\u0029\uc758\u0020\uc704\uce58\uac00\u0020\uc6b0\ub9ac\ub9d0\uacfc\u0020\ub2e4\ub974\ub2e4\u002e";
              // access the interfaces and cursor to be used
              com.sun.star.text.XText xText = (com.sun.star.text.XText)
                  UnoRuntime.queryInterface(
                      com.sun.star.text.XText.class, xTextDocument.getText());
              com.sun.star.text.XSimpleText xSimpleText =
                  (com.sun.star.text.XSimpleText) UnoRuntime.queryInterface(
                      com.sun.star.text.XSimpleText.class, xText);
              com.sun.star.text.XTextCursor xCursor = xText.createTextCursor();
              // set text properties (font, language) to be used for the sample
              com.sun.star.beans.XPropertySet xPS =
                  (com.sun.star.beans.XPropertySet) UnoRuntime.queryInterface(
                      com.sun.star.beans.XPropertySet.class, xCursor);
              com.sun.star.lang.Locale aKorean =
                  new com.sun.star.lang.Locale( "ko", "KR", "" );
              xPS.setPropertyValue( "CharFontNameAsian", "Gulim" );
              xPS.setPropertyValue( "CharLocaleAsian", aKorean );
              xPS.setPropertyValue( "CharHeightAsian", new Integer(24) );
              xPS.setPropertyValue( "CharHeight", new Integer(24) );
              // insert the original text
              xSimpleText.insertString( xCursor, "Original text:", false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              xSimpleText.insertString( xCursor, aHeader, false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              xSimpleText.insertString( xCursor, aText, false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              // apply Hangul->Hanja conversion
              short nConversionType =
                  com.sun.star.i18n.TextConversionType.TO_HANJA;
              int nConversionOptions = 0;   // no special conversion options
              // call to the function for non-interactive text conversion
              // (usually not used for Hangul/Hanja conversion, but used here
              // anyway for the example's sake)
              aHeader = xTextConversion.getConversion( aHeader, 0, aHeader.length(),
                            aKorean, nConversionType, nConversionOptions );
              // sample for the function calls in an interactive conversion
              StringBuffer aBuf = new StringBuffer( aText );
              int i = 0;
              boolean bFound = true;
              int nLen = aText.length();
              while (i < nLen - 1 && bFound) {
                  com.sun.star.i18n.TextConversionResult aResult =
                      xTextConversion.getConversions( aText, i, nLen - i,
                          aKorean, nConversionType, nConversionOptions );
                  // check if a convertible text portion was found
                  bFound = !(aResult.Boundary.startPos == 0 &&
                             aResult.Boundary.endPos == 0);
                  if ( bFound ) {
                      String[] aCandidates = aResult.Candidates;
                      // let the user choose one candidate from the list
                      // (in this non-interactive example we always choose
                      // the first one)
                      String aChoosen = aCandidates[0];
                      aBuf.replace( aResult.Boundary.startPos,
                                    aResult.Boundary.endPos, aChoosen );
                      // continue with the text after the currently converted
                      // text portion
                      if ( aResult.Boundary.endPos > i )
                          i = aResult.Boundary.endPos;
                      else {
                          // or advance at least one position
                          System.out.println("unexpected forced advance...");
                          ++i;
                      }
                  }
              }
              aText = aBuf.toString();
              // insert the converted text
              xSimpleText.insertString( xCursor, "Converted text:", false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              xSimpleText.insertString( xCursor, aHeader, false );
              xSimpleText.insertControlCharacter( xCursor,
                  com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, false );
              xSimpleText.insertString( xCursor, aText, false );
          }
          catch( Exception e ) {
              e.printStackTrace();
          }
      }

      public static com.sun.star.frame.XDesktop getDesktop() {
          com.sun.star.frame.XDesktop xDesktop = null;
          com.sun.star.lang.XMultiComponentFactory xMCF = null;
          try {
              com.sun.star.uno.XComponentContext xContext = null;
              // get the remote office component context
              xContext = com.sun.star.comp.helper.Bootstrap.bootstrap();
              // get the remote office service manager
              xMCF = xContext.getServiceManager();
              if( xMCF != null ) {
                  System.out.println("Connected to a running office ...");
                  Object oDesktop = xMCF.createInstanceWithContext(
                      "com.sun.star.frame.Desktop", xContext);
                  xDesktop = (com.sun.star.frame.XDesktop)
                      UnoRuntime.queryInterface(
                          com.sun.star.frame.XDesktop.class, oDesktop);
              }
              else
                  System.out.println( "Can't create a desktop. No connection, no remote office servicemanager available!" );
          }
          catch( Exception e ) {
              e.printStackTrace();
              System.exit(1);
          }
          return xDesktop;
      }

      public static com.sun.star.i18n.XTextConversion getTextConversion() {
          com.sun.star.i18n.XTextConversion xTextConv = null;
          com.sun.star.lang.XMultiComponentFactory xMCF = null;
          try {
              com.sun.star.uno.XComponentContext xContext = null;
              // get the remote office component context
              xContext = com.sun.star.comp.helper.Bootstrap.bootstrap();
              // get the remote office service manager
              xMCF = xContext.getServiceManager();
              if( xMCF != null ) {
                  Object oObject = xMCF.createInstanceWithContext(
                      "com.sun.star.i18n.TextConversion", xContext);
                  xTextConv = (com.sun.star.i18n.XTextConversion)
                      UnoRuntime.queryInterface(
                          com.sun.star.i18n.XTextConversion.class, oObject);
              }
              else
                  System.out.println( "Can't create a text conversion service. No office servicemanager available!" );
              if( xTextConv != null )
                  System.out.println( "Successfully instantiated text conversion service." );
          }
          catch( Exception e ) {
              e.printStackTrace();
          }
          return xTextConv;
      }

      public static com.sun.star.text.XTextDocument createTextdocument(
              com.sun.star.frame.XDesktop xDesktop ) {
          com.sun.star.text.XTextDocument aTextDocument = null;
          try {
              com.sun.star.lang.XComponent xComponent =
                  CreateNewDocument( xDesktop, "swriter" );
              aTextDocument = (com.sun.star.text.XTextDocument)
                  UnoRuntime.queryInterface(
                      com.sun.star.text.XTextDocument.class, xComponent);
          }
          catch( Exception e ) {
              e.printStackTrace();
          }
          return aTextDocument;
      }

      protected static com.sun.star.lang.XComponent CreateNewDocument(
              com.sun.star.frame.XDesktop xDesktop,
              String sDocumentType ) {
          String sURL = "private:factory/" + sDocumentType;
          com.sun.star.lang.XComponent xComponent = null;
          com.sun.star.frame.XComponentLoader xComponentLoader = null;
          com.sun.star.beans.PropertyValue xEmptyArgs[] =
              new com.sun.star.beans.PropertyValue[0];
          try {
              xComponentLoader = (com.sun.star.frame.XComponentLoader)
                  UnoRuntime.queryInterface(
                      com.sun.star.frame.XComponentLoader.class, xDesktop);
              xComponent = xComponentLoader.loadComponentFromURL(
                  sURL, "_blank", 0, xEmptyArgs);
          }
          catch( Exception e ) {
              e.printStackTrace();
          }
          return xComponent;
      }
  }


The XNativeNumberSupplier interface provides the functionality to convert between ASCII Arabic digits and locale-dependent numeral representations. It performs the conversion by implementing special transliteration services. The interface also provides a mechanism to generate attributes to be stored in the XML file format (see the XML file format documentation, section "Common Data Style Attributes", "number:transliteration-..."), as well as a conversion of those XML attributes needed to map back to a specific representation style. If you add a number transliteration for a specific locale and reuse one of the constants, please also add a corresponding description if your changes are to be added back to the code repository.


The XIndexEntrySupplier interface can be used to provide the functionality for generating index pages. The main method of this interface is getIndexCharacter(). The front-end implementation IndexEntrySupplier will dynamically load and cache the language-specific service by the name IndexEntrySupplier_<locale>.

Languages to be indexed have been divided into two sets. The first set contains Latin1 languages, which can be covered by 256 Unicode code points. A one-step lookup table is used to generate index characters. An alphanumeric table has been generated that covers most Latin1 languages, but if you need another algorithm or have a conflict with the table, you can create your own table and derive a new class from IndexEntrySupplier_Euro. Here is a sample implementation:

  #include <sal/types.h>
  #include <indexentrysupplier_euro.hxx>
  #include <indexdata_alphanumeric.h>

  OUString SAL_CALL i18n::IndexEntrySupplier_alphanumeric::getIndexCharacter(
         const OUString& rIndexEntry,
         const lang::Locale& rLocale, const OUString& rSortAlgorithm )
         throw (uno::RuntimeException)
  {
      return getIndexString(rIndexEntry, idxStr);
  }

where idxStr is the table.

For the languages that cannot be covered by the first set, such as CJK, a two-step lookup table is used. Here is a sample implementation:

  #include <indexentrysupplier_cjk.hxx>
  #include <indexdata_zh_pinyin.h>

  OUString SAL_CALL i18n::IndexEntrySupplier_zh_pinyin::getIndexCharacter(
         const OUString& rIndexEntry,
         const lang::Locale& rLocale, const OUString& rSortAlgorithm )
         throw (uno::RuntimeException)
  {
      return getIndexString(rIndexEntry, idxStr, idx1, idx2);
  }

where idx1 and idx2 are two step tables and idxStr contains all the index keys that will be returned. If you have a new language or algorithm, you can derive a new service from IndexEntrySupplier_CJK and provide tables for the parent class to generate the index.

Note that the index depends on collation, therefore, each index algorithm should have a collation algorithm to support it.

To add a new service:

  1. Derive the new service from IndexEntrySupplier_Euro.
  2. Provide a table for the lookup
  3. Register new service in registerservices.cxx

A Comment on Search and Replace

Search and replace is also locale-dependent, because there may be special search options that are only available for a particular locale. For instance, if Asian language support is enabled, you will see an additional option "Sounds like (Japanese)" in the Edit - Find & Replace dialog box. With this option, you can turn certain Japanese-specific options on or off in the search and replace process.

Search and replace relies on the transliteration modules for various search options. The transliteration modules are loaded and the search string is converted before the search process.

Content on this page is licensed under the Public Documentation License (PDL).