Import of Hindi numbers from Microsoft Word documents

From Apache OpenOffice Wiki
Revision as of 15:59, 23 June 2008 by Khaled Hosny

Specification Status
Author Henning Brinkmann
Last Change 17.09.2007
Status Preliminary

Abstract

Microsoft Word marks each number with a hint that indicates which script to use. In addition, Word has an option to display numbers as Hindi, as Arabic, by context, or as determined by the system. This specification defines how the script hint and the display option shall be handled when Microsoft Word documents are imported.

References

Reference Document Check Location (URL)
Specification Process Entry Check passed n/a
Product Requirement, RFE, Issue ID (required) available [1]
Product Concept Document not available
Test case specification (required) not available <PLEASE ENTER LOCATION HERE>
IDL Specification not available
Software Specification Rules n/a n/a
Other, e.g. references to related specs

Contacts

Role Name E-Mail Address
Developer Henning Brinkmann Henning.Brinkmann@sun.com
Quality Assurance Michael Rüß Michael.Ruess@sun.com
Documentation Uwe Fischer Uwe.Fischer@sun.com
User Experience <First Name, Last Name> <User@openoffice.org>

Acronyms and Abbreviations

Acronym / Abbreviation Definition
<WYSIWYG> <What You See Is What You Get>

Detailed Specification

When a digit in an imported Word document is marked as CTL (complex text layout) script, it shall be imported as a Hindi digit if and only if its bidi language is one of the languages listed below.


Language Language Code
Arabic (Algeria) 0x1401
Arabic (Bahrain) 0x3C01
Arabic (Egypt) 0x0C01
Arabic (Iraq) 0x0801
Arabic (Jordan) 0x2C01
Arabic (Kuwait) 0x3401
Arabic (Lebanon) 0x3001
Arabic (Libya) 0x1001
Arabic (Morocco) 0x1801
Arabic (Oman) 0x2001
Arabic (Qatar) 0x4001
Arabic (Saudi Arabia) 0x0401
Arabic (Syria) 0x2801
Arabic (Tunisia) 0x1C01
Arabic (U.A.E.) 0x3801
Arabic (Yemen) 0x2401
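The language test amounts to set membership on the digit's bidi language ID. The following is a minimal Python sketch, assuming the importer exposes the digit's CTL flag and its bidi language ID (LCID) as plain values; `HINDI_DIGIT_LANGUAGES` and `uses_hindi_digits` are hypothetical names, not part of the OpenOffice code base.

```python
# Hypothetical sketch of the language check; names are illustrative only.
# The LCIDs below are copied from the language table of this specification.
HINDI_DIGIT_LANGUAGES = frozenset({
    0x1401,  # Arabic (Algeria)
    0x3C01,  # Arabic (Bahrain)
    0x0C01,  # Arabic (Egypt)
    0x0801,  # Arabic (Iraq)
    0x2C01,  # Arabic (Jordan)
    0x3401,  # Arabic (Kuwait)
    0x3001,  # Arabic (Lebanon)
    0x1001,  # Arabic (Libya)
    0x1801,  # Arabic (Morocco)
    0x2001,  # Arabic (Oman)
    0x4001,  # Arabic (Qatar)
    0x0401,  # Arabic (Saudi Arabia)
    0x2801,  # Arabic (Syria)
    0x1C01,  # Arabic (Tunisia)
    0x3801,  # Arabic (U.A.E.)
    0x2401,  # Arabic (Yemen)
})


def uses_hindi_digits(is_ctl: bool, bidi_lcid: int) -> bool:
    """Return True if a digit marked with these attributes should become a
    Hindi digit: it must be marked as CTL script AND its bidi language must
    be one of the listed Arabic locales."""
    return is_ctl and bidi_lcid in HINDI_DIGIT_LANGUAGES
```

For example, `uses_hindi_digits(True, 0x0401)` holds for Arabic (Saudi Arabia), while a digit with bidi language 0x0409 (English, United States) or one not marked as CTL script is left unchanged.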


This feature shall be active only if the configuration item RegardHindiDigits (see Configuration below) is true.

If RegardHindiDigits is set, the following mapping between Arabic and Hindi digits applies:

Arabic (Unicode) Hindi (Unicode)
0 (U+0030) ٠ (U+0660)
1 (U+0031) ١ (U+0661)
2 (U+0032) ٢ (U+0662)
3 (U+0033) ٣ (U+0663)
4 (U+0034) ٤ (U+0664)
5 (U+0035) ٥ (U+0665)
6 (U+0036) ٦ (U+0666)
7 (U+0037) ٧ (U+0667)
8 (U+0038) ٨ (U+0668)
9 (U+0039) ٩ (U+0669)
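The mapping in the table is a fixed offset: the digit d at U+0030 + d becomes U+0660 + d. A minimal Python sketch, assuming the text of a digit run is available as a `str`; `to_hindi_digits` is a hypothetical name used only for illustration.

```python
# Translation table for the mapping above: ASCII digits 0-9
# (U+0030..U+0039) become Arabic-Indic ("Hindi") digits U+0660..U+0669.
ASCII_TO_HINDI = str.maketrans(
    "0123456789",
    "".join(chr(0x0660 + d) for d in range(10)),
)


def to_hindi_digits(text: str) -> str:
    """Replace each ASCII digit with its Hindi counterpart; every other
    character is passed through unchanged."""
    return text.translate(ASCII_TO_HINDI)
```

Using `str.translate` keeps the conversion a single table lookup per character, so non-digit characters in the same run are not affected.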

Migration

The specified feature improves interoperability with Microsoft Word.

Configuration

Configuration Group Setting Type Default Comment
Writer.xcs FilterFlags/WinWord RegardHindiDigits xs:boolean false If true, digits marked as CTL script are imported as Hindi digits.
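Taken together, a digit run is converted only when RegardHindiDigits is true, the run is marked as CTL script, and its bidi language is in the language table. The Python sketch below assumes the filter sees each digit run with its CTL flag and bidi LCID and reads RegardHindiDigits (FilterFlags/WinWord) as a boolean; `import_digits` and its parameters are hypothetical, not the actual Word import filter API.

```python
# Hypothetical sketch of the complete import rule; names are illustrative.
# The 16 LCIDs in the language table are exactly the Arabic primary
# language ID (0x01) combined with sublanguages 1..16: (sub << 10) | 0x01.
HINDI_DIGIT_LANGUAGES = frozenset((sub << 10) | 0x01 for sub in range(1, 17))

# ASCII digits U+0030..U+0039 -> Hindi digits U+0660..U+0669.
ASCII_TO_HINDI = str.maketrans(
    "0123456789",
    "".join(chr(0x0660 + d) for d in range(10)),
)


def import_digits(text: str, is_ctl: bool, bidi_lcid: int,
                  regard_hindi_digits: bool) -> str:
    """Return the digit run as it should be inserted into the document."""
    if regard_hindi_digits and is_ctl and bidi_lcid in HINDI_DIGIT_LANGUAGES:
        return text.translate(ASCII_TO_HINDI)
    # Feature disabled or conditions not met: keep the ASCII digits.
    return text
```

With the default value false, `import_digits("42", True, 0x0401, False)` returns `"42"` unchanged; only with the setting enabled does the same run become Hindi digits.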

File Format

This specification covers import only and thus has no consequences regarding the file format.


Open Issues

Urdu (and Sindhi) use the same Unicode code points as the Extended Arabic-Indic (aka Persian) digits, but with some glyph variation that is selected at the font level rather than at the encoding level (using OpenType language features and the like); see The Unicode Standard, Ch. 8.2, Arabic. --Khaled Hosny 17:59, 23 June 2008 (CEST)