Import of Hindi numbers from Microsoft Word documents
Specification Status | |
Author | Henning Brinkmann |
Last Change | 17.09.2007 |
Status | Preliminary |
Abstract
Microsoft Word attaches a script hint to numbers that indicates which script to use for them. In addition, Word has an option to display numbers as Hindi, as Arabic, by context, or as determined by the system. This specification defines how the script hint and the display option shall be handled when Microsoft Word documents are imported.
References
Reference Document | Check | Location (URL) |
Specification Process Entry Check | passed | n/a |
Product Requirement, RFE, Issue ID (required) | available | [1] |
Product Concept Document | not available | |
Test case specification (required) | not available | <PLEASE ENTER LOCATION HERE> |
IDL Specification | not available | |
Software Specification Rules | n/a | n/a |
Other, e.g. references to related specs |
Contacts
Role | Name | E-Mail Address |
Developer | Henning Brinkmann | Henning.Brinkmann@sun.com |
Quality Assurance | Michael Rüß | Michael.Ruess@sun.com |
Documentation | Uwe Fischer | Uwe.Fischer@sun.com |
User Experience | <First Name, Last Name> | <User@openoffice.org> |
Acronyms and Abbreviations
Acronym / Abbreviation | Definition |
<WYSIWYG> | <What You See Is What You Get> |
Detailed Specification
When a digit is marked as CTL script in the imported Word document, it shall be imported as a Hindi digit iff the bidi language is one of the languages listed below.
Language | Language Code |
Arabic(Algeria) | 0x1401 |
Arabic(Bahrain) | 0x3c01 |
Arabic(Egypt) | 0xc01 |
Arabic(Iraq) | 0x801 |
Arabic(Jordan) | 0x2c01 |
Arabic(Kuwait) | 0x3401 |
Arabic(Lebanon) | 0x3001 |
Arabic(Libya) | 0x1001 |
Arabic(Morocco) | 0x1801 |
Arabic(Oman) | 0x2001 |
Arabic(Qatar) | 0x4001 |
Arabic(Saudi Arabia) | 0x401 |
Arabic(Syria) | 0x2801 |
Arabic(Tunisia) | 0x1c01 |
Arabic(U.A.E.) | 0x3801 |
Arabic(Yemen) | 0x2401 |
This feature shall be activated only if the configuration item RegardHindiDigits (see below) is true.
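The import condition described above can be sketched as a single predicate. This is a minimal illustration, not the importer's actual code: the function name `import_as_hindi_digit` and its parameters are hypothetical, and the language codes are copied from the table above.

```python
# Language codes (LCIDs) taken from the language table in this specification.
HINDI_DIGIT_LANGUAGES = frozenset({
    0x1401,  # Arabic(Algeria)
    0x3C01,  # Arabic(Bahrain)
    0x0C01,  # Arabic(Egypt)
    0x0801,  # Arabic(Iraq)
    0x2C01,  # Arabic(Jordan)
    0x3401,  # Arabic(Kuwait)
    0x3001,  # Arabic(Lebanon)
    0x1001,  # Arabic(Libya)
    0x1801,  # Arabic(Morocco)
    0x2001,  # Arabic(Oman)
    0x4001,  # Arabic(Qatar)
    0x0401,  # Arabic(Saudi Arabia)
    0x2801,  # Arabic(Syria)
    0x1C01,  # Arabic(Tunisia)
    0x3801,  # Arabic(U.A.E)
    0x2401,  # Arabic(Yemen)
})


def import_as_hindi_digit(is_ctl_script: bool,
                          bidi_language: int,
                          regard_hindi_digits: bool) -> bool:
    """Hypothetical helper: True iff a digit should be imported as a Hindi digit.

    All three conditions from the specification must hold:
    the configuration item is true, the digit is marked as CTL script,
    and the bidi language is one of the listed Arabic languages.
    """
    return (regard_hindi_digits
            and is_ctl_script
            and bidi_language in HINDI_DIGIT_LANGUAGES)
```

For example, a CTL-marked digit with bidi language 0x401 (Saudi Arabia) qualifies when RegardHindiDigits is true, while the same digit with a language code that is not in the table does not.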
If the configuration item RegardHindiDigits is set, the following mapping between Arabic and Hindi digits applies:
Arabic (Unicode) | Hindi (Unicode) |
0 (U+0030) | ٠ (U+0660) |
1 (U+0031) | ١ (U+0661) |
2 (U+0032) | ٢ (U+0662) |
3 (U+0033) | ٣ (U+0663) |
4 (U+0034) | ٤ (U+0664) |
5 (U+0035) | ٥ (U+0665) |
6 (U+0036) | ٦ (U+0666) |
7 (U+0037) | ٧ (U+0667) |
8 (U+0038) | ٨ (U+0668) |
9 (U+0039) | ٩ (U+0669) |
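The mapping table above is a constant offset from U+0030..U+0039 to U+0660..U+0669, so it can be expressed as a translation table. The sketch below assumes the text passed in is a run that has already been identified as eligible for conversion; the names `ARABIC_TO_HINDI` and `to_hindi_digits` are hypothetical.

```python
# Translation table built from the mapping above:
# '0'..'9' (U+0030..U+0039) -> U+0660..U+0669.
ARABIC_TO_HINDI = {ord("0") + i: 0x0660 + i for i in range(10)}


def to_hindi_digits(text: str) -> str:
    """Replace every ASCII digit in text with its Hindi counterpart.

    Characters that are not ASCII digits are left unchanged.
    """
    return text.translate(ARABIC_TO_HINDI)


if __name__ == "__main__":
    # Show the code points of the converted digits.
    print([f"U+{ord(c):04X}" for c in to_hindi_digits("0123456789")])
```

Deciding which runs to convert is a separate step; this helper only performs the character substitution.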
Migration
The specified feature improves interoperability with Microsoft Word.
Configuration
Configuration File | Group | Setting | Type | Default | Comment |
Writer.xcs | FilterFlags/WinWord | RegardHindiDigits | xs:long | false | If true, digits marked as CTL script are imported as Hindi digits. |
File Format
This specification covers import only and thus has no consequences regarding the file format.
Open Issues
- It seems that this specification should also cater to the needs of Persian and Urdu users.
- Farsi and Urdu use a slightly different form of Hindi numerals; see e.g. http://behdad.org/download/Publications/persiancomputing/a007.pdf and http://www.microsoft.com/middleeast/arabicdev/windows/winxp/DigitsSupport.aspx. According to the Microsoft source, Farsi uses different forms for 4, 5 and 6, and Urdu uses different forms for 4, 5, 6 and 7. The Persian digits are Unicode U+06F0..U+06F9 (whereas the "regular" Arabic Hindi numerals are U+0660..U+0669). I could not find different code points mentioned for Urdu numerals, and in fact I found references that treat Urdu digits as the same as the Persian ones. -- Shai2platonix 01:27, 22 April 2008 (CEST)
- Urdu (and Sindhi) use the same Unicode code points as the Extended Arabic-Indic (aka Persian) digits but have some glyph variation that is selected at the font level rather than the encoding level (using OpenType language features and the like); see the Unicode book, Ch. 8.2 Arabic. --Khaled Hosny 17:59, 23 June 2008 (CEST)
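To make the open issue concrete, the following sketch parameterizes the digit conversion by the code point of the target zero digit, so the same routine could serve both the Arabic-Indic range (U+0660..U+0669) and the Extended Arabic-Indic range (U+06F0..U+06F9) mentioned in the comments above. It is only an illustration of one possible approach and is not part of this specification; all names are hypothetical.

```python
ARABIC_INDIC_ZERO = 0x0660           # digits used for the Arabic languages
EXTENDED_ARABIC_INDIC_ZERO = 0x06F0  # digits used for Persian (and Urdu, Sindhi)


def convert_digits(text: str, target_zero: int) -> str:
    """Map ASCII digits to the ten-digit block that starts at target_zero."""
    table = {ord("0") + i: target_zero + i for i in range(10)}
    return text.translate(table)


if __name__ == "__main__":
    for zero in (ARABIC_INDIC_ZERO, EXTENDED_ARABIC_INDIC_ZERO):
        # Print the code points produced for the digits 4, 5 and 6.
        print([f"U+{ord(c):04X}" for c in convert_digits("456", zero)])
```

Since Urdu shares the Persian code points, any glyph differences for Urdu would be left to the font, as noted in the last comment.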