LoadICUBreakIterator
Breaking encapsulation of ICU BreakIterator
Because of Issue 84467 (a duplicate of Issue 81519), we create the iterator with the RuleBasedBreakIterator() constructor and then need to call setBreakType() on it.
A fix that removes the need to patch ICU is to create a subclass of RuleBasedBreakIterator, which can access the protected setBreakType() member. This is tracked in Issue 88411.
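A minimal sketch of the subclass workaround, written without ICU so that it compiles on its own. `FakeRuleBasedBreakIterator` is a hypothetical stand-in for a base class whose `setBreakType()` is protected, as described above; `PublicBreakTypeIterator`, `publicSetBreakType()` and the `DemoBreakType` constants are likewise illustrative names, not OpenOffice.org or ICU identifiers. With real ICU, the subclass would derive from `RuleBasedBreakIterator` (assuming the ICU version in use still declares `setBreakType()` as protected) and would also forward the constructors it needs, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ICU's RuleBasedBreakIterator: setBreakType() is
// protected, so ordinary callers cannot change the break type directly.
class FakeRuleBasedBreakIterator {
public:
    virtual ~FakeRuleBasedBreakIterator() = default;
    std::int32_t getBreakType() const { return fBreakType; }  // demo accessor only

protected:
    void setBreakType(std::int32_t type) { fBreakType = type; }

private:
    std::int32_t fBreakType = -1;  // -1 means "not set" in this sketch
};

// The workaround: a thin subclass that forwards to the protected member
// through a public wrapper, so the base class itself needs no patch.
class PublicBreakTypeIterator : public FakeRuleBasedBreakIterator {
public:
    void publicSetBreakType(std::int32_t type) { setBreakType(type); }
};

// Illustrative break-type codes for the demo (not taken from ICU headers).
enum DemoBreakType : std::int32_t { kCharacter = 0, kWord = 1, kLine = 2, kSentence = 3 };
```

The wrapper adds nothing except an access path; a `using FakeRuleBasedBreakIterator::setBreakType;` declaration in the subclass's public section would have the same effect.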
ICU code:
- BreakIterator reference
- RuleBasedBreakIterator reference
OpenOffice.org code:
Mailing list discussions:
- Discussion about ICULanguageBreakFactory
- ports/121787 FreeBSD problem report
- Debian bug 448745
- icu-support
Example reasons to use custom rules:
- Issue 72868 Writer/Impress: line does not break after Chinese punctuation and before Latin letters
- Issue 80891 character in the forbidden list sometimes appears at the start of a line
- Issue 83229 wrong hyphenation when a word contains a hyphen
- Issue 83649 Line break should be between typographical quote and left bracket
- Issue 83464 line break between letter and $
- Issue 81448 slash and backslash make non-breaking spaces of preceding spaces
Use cases of loadICUBreakIterator
Questions:
- Why does wordRule need to be static and preserved across calls?
- Is the rule string "word" used at all? What about the other WordTypes?
public method | loadICU call | resulting rule text |
---|---|---|
nextCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone); previousCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone) | loadICUBreakIterator(rLocale, LOAD_CHARACTER_BREAKITERATOR, 0, "char", Text) | char |
nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES); previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES); getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, ANYWORD_IGNOREWHITESPACES, sal_Bool bDirection) | loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, ANYWORD_IGNOREWHITESPACES, NULL, Text) | edit_word |
nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD); previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD); getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, DICTIONARY_WORD, sal_Bool bDirection) | loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, DICTIONARY_WORD, NULL, Text) | dict_word |
nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT); previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT); getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, WORD_COUNT, sal_Bool bDirection) | loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, WORD_COUNT, NULL, Text) | count_word |
nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type); previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type); getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, another_word_type, sal_Bool bDirection) | loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, another_word_type, NULL, Text) | word (???) |
beginOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale); endOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale) | loadICUBreakIterator(rLocale, LOAD_SENTENCE_BREAKITERATOR, 0, NULL, Text) | NULL |
getLineBreak(const OUString& Text, sal_Int32 nStartPos, const lang::Locale& rLocale, sal_Int32 nMinBreakPos, const LineBreakHyphenationOptions& hOptions, const LineBreakUserOptions& /*rOptions*/) | loadICUBreakIterator(rLocale, LOAD_LINE_BREAKITERATOR, 0, "line", Text) | line |
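The table can be read as a lookup from the iterator kind and word type to a rule name. The sketch below encodes only that mapping. The enum names mirror the constants in the table but are redeclared locally with placeholder values so the code compiles on its own, and `ruleNameFor()` is a hypothetical helper, not a function from the OpenOffice.org sources. A null result corresponds to the sentence row, where no custom rule is passed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local stand-ins for the constants named in the table (placeholder values).
enum LoadKind { LOAD_CHARACTER_BREAKITERATOR, LOAD_WORD_BREAKITERATOR,
                LOAD_SENTENCE_BREAKITERATOR, LOAD_LINE_BREAKITERATOR };
enum WordType : std::int16_t { ANYWORD_IGNOREWHITESPACES = 1, DICTIONARY_WORD = 2,
                               WORD_COUNT = 3, OTHER_WORD_TYPE = 99 };

// Returns the rule name the table lists for a given call.
const char* ruleNameFor(LoadKind kind, std::int16_t wordType, const char* explicitRule) {
    // Character, sentence and line iterators use whatever rule the caller
    // passes: "char", NULL (sentence) or "line".
    if (kind != LOAD_WORD_BREAKITERATOR)
        return explicitRule;
    // Word iterators are called with rule == NULL; the word type picks the rule.
    switch (wordType) {
    case ANYWORD_IGNOREWHITESPACES: return "edit_word";
    case DICTIONARY_WORD:           return "dict_word";
    case WORD_COUNT:                return "count_word";
    default:                        return "word";  // the "word (???)" row
    }
}
```

For example, `ruleNameFor(LOAD_WORD_BREAKITERATOR, DICTIONARY_WORD, nullptr)` yields `"dict_word"`, matching the third row.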
- Check whether the locale's BreakIteratorRules ({edit_word, dict_word, count_word, char, line}) provide a rule for the requested locale.
- If not, try to load rule + "_" + language string anyway.
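A minimal sketch of the lookup order in the two steps above. Instead of calling ICU's data loader, it models the locale's BreakIteratorRules and the set of loadable rule files as plain containers; `RuleSources` and `pickRuleFile()` are hypothetical names, and the rule-file names in the usage are made up. The final fallback to the plain rule name is an assumption about what should happen when neither lookup succeeds.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical inputs: what the locale data names for each rule slot
// (edit_word, dict_word, count_word, char, line) and which rule files exist.
struct RuleSources {
    std::map<std::string, std::string> localeRules;  // slot -> rule file; empty = none
    std::set<std::string> availableFiles;            // rule files that could be loaded
};

// Lookup order:
// 1. the rule named by the locale's BreakIteratorRules, if it exists;
// 2. otherwise rule + "_" + language, if such a file exists;
// 3. otherwise the plain rule (assumed fallback, see lead-in).
std::string pickRuleFile(const RuleSources& src, const std::string& rule,
                         const std::string& language) {
    auto it = src.localeRules.find(rule);
    if (it != src.localeRules.end() && !it->second.empty() &&
        src.availableFiles.count(it->second) != 0)
        return it->second;

    const std::string perLanguage = rule + "_" + language;
    if (src.availableFiles.count(perLanguage) != 0)
        return perLanguage;

    return rule;
}
```

Keeping the per-language attempt after the locale-data check means explicit locale configuration always wins, and a `rule_language` file only matters for locales whose data names nothing.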