Breaking encapsulation of ICU BreakIterator

Because of Issue 84467 (a duplicate of Issue 81519), we create the iterator with the RuleBasedBreakIterator() constructor and then need to call setBreakType() on it.

A fix that removes the need to patch ICU creates a subclass of RuleBasedBreakIterator, which can access the protected setBreakType() member. It is tracked in Issue 88411.
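
The subclass approach can be sketched as follows. This is a minimal illustration, not the actual patch: `RuleBasedBreakIteratorStandIn` is a stand-in for the ICU class so that the example compiles without ICU headers, and the names `OOoRuleBasedBreakIterator`, `publicSetBreakType` and `makeIterator` are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for icu::RuleBasedBreakIterator (NOT the real ICU class).
// It only models the one detail that matters here: a protected setter.
class RuleBasedBreakIteratorStandIn {
public:
    explicit RuleBasedBreakIteratorStandIn(std::string rules)
        : rules_(std::move(rules)) {}
    virtual ~RuleBasedBreakIteratorStandIn() = default;

    const std::string& rules() const { return rules_; }
    int breakType() const { return breakType_; }

protected:
    // Protected, like the member described in the text.
    void setBreakType(int type) { breakType_ = type; }

private:
    std::string rules_;
    int breakType_ = -1;  // -1 means "not set" in this sketch
};

// Hypothetical subclass: it may call the protected member and
// re-exports it through a public wrapper, so the library itself
// does not have to be patched.
class OOoRuleBasedBreakIterator : public RuleBasedBreakIteratorStandIn {
public:
    using RuleBasedBreakIteratorStandIn::RuleBasedBreakIteratorStandIn;

    void publicSetBreakType(int type) { setBreakType(type); }
};

// Hypothetical helper: construct from rules, then set the break type.
inline OOoRuleBasedBreakIterator makeIterator(const std::string& rules, int type) {
    assert(type >= 0 && "break type is expected to be non-negative");
    OOoRuleBasedBreakIterator it(rules);
    it.publicSetBreakType(type);
    return it;
}
```

The design choice is that only the subclass touches the protected member; callers go through the public wrapper.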

ICU code:

BreakIterator reference
RuleBasedBreakIterator reference

OpenOffice.org code:

BreakIterator_Unicode::loadICUBreakIterator function

Mailing list discussions:

Discussion about ICULanguageBreakFactory
ports/121787 FreeBSD problem report
Debian bug 448745
icu-support

Example reasons to use custom rules:

Use cases of `loadICUBreakIterator`

Questions:

Why does wordRule need to be static and preserved across calls?
Is the rule string `word` used at all? Is it used for other WordTypes?

| public method | `loadICUBreakIterator` call | resulting rule text |
| --- | --- | --- |
| `nextCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone)`; `prevCharacters(Text, nStartPos, rLocale, SKIPCELL, sal_Int32 nCount, nDone)` | `loadICUBreakIterator(rLocale, LOAD_CHARACTER_BREAKITERATOR, 0, "char", Text)` | `char` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, ANYWORD_IGNOREWHITESPACES)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, ANYWORD_IGNOREWHITESPACES, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, ANYWORD_IGNOREWHITESPACES, NULL, Text)` | `edit_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, DICTIONARY_WORD)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, DICTIONARY_WORD, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, DICTIONARY_WORD, NULL, Text)` | `dict_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, WORD_COUNT)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, WORD_COUNT, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, WORD_COUNT, NULL, Text)` | `count_word` |
| `nextWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type)`; `previousWord(const OUString& Text, sal_Int32 nStartPos, rLocale, another_word_type)`; `getWordBoundary(const OUString& Text, sal_Int32 nPos, rLocale, another_word_type, sal_Bool bDirection)` | `loadICUBreakIterator(rLocale, LOAD_WORD_BREAKITERATOR, another_word_type, NULL, Text)` | `word` (???) |
| `beginOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale)`; `endOfSentence(const OUString& Text, sal_Int32 nStartPos, rLocale)` | `loadICUBreakIterator(rLocale, LOAD_SENTENCE_BREAKITERATOR, 0, NULL, Text)` | NULL |
| `getLineBreak(const OUString& Text, sal_Int32 nStartPos, const lang::Locale& rLocale, sal_Int32 nMinBreakPos, const LineBreakHyphenationOptions& hOptions, const LineBreakUserOptions& /*rOptions*/)` | `loadICUBreakIterator(rLocale, LOAD_LINE_BREAKITERATOR, 0, "line", Text)` | `line` |
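
The word-type rows of the table can be summarized as a small lookup. This is only a sketch of the mapping as the table shows it, not the code in `BreakIterator_Unicode`: the enum `WordTypeStandIn`, its numeric values, and the function name `ruleNameForWordType` are assumptions, and the `word` fallback is the entry the table itself marks with "(???)".

```cpp
#include <cassert>
#include <string>

// Stand-in constants. The names follow the table; the numeric
// values are assumptions made for this sketch.
enum WordTypeStandIn {
    ANYWORD_IGNOREWHITESPACES = 1,
    DICTIONARY_WORD = 2,
    WORD_COUNT = 3
};

// Hypothetical helper: maps a word type to the rule text listed
// in the last column of the table.
inline std::string ruleNameForWordType(int wordType) {
    switch (wordType) {
        case ANYWORD_IGNOREWHITESPACES: return "edit_word";
        case DICTIONARY_WORD:           return "dict_word";
        case WORD_COUNT:                return "count_word";
        default:                        return "word";  // uncertain, "(???)" in the table
    }
}

// Usage check mirroring the table rows.
inline void checkWordTypeRows() {
    assert(ruleNameForWordType(ANYWORD_IGNOREWHITESPACES) == "edit_word");
    assert(ruleNameForWordType(DICTIONARY_WORD) == "dict_word");
    assert(ruleNameForWordType(WORD_COUNT) == "count_word");
}
```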

Check whether the locale's BreakIteratorRules ({edit_word, dict_word, count_word, char, line}) provide a rule for the requested locale.
If not, try to load rule + "_" + language string anyway.
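
The two lookup steps above can be sketched as a name-resolution function. This is a simplified model under stated assumptions: `LocaleRules` stands in for the locale's BreakIteratorRules data, `pickRuleName` is a hypothetical helper, and the rule names used in the checks (such as `dict_word_hu`) are made up for illustration. Actually loading the rule data through ICU is left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the locale data: rule kind (e.g. "dict_word") mapped
// to the rule name the locale provides for it. Hypothetical type.
using LocaleRules = std::map<std::string, std::string>;

// Step 1: use the rule name from the locale data if there is one.
// Step 2: otherwise fall back to rule + "_" + language.
inline std::string pickRuleName(const LocaleRules& localeRules,
                                const std::string& rule,
                                const std::string& lang) {
    const auto it = localeRules.find(rule);
    if (it != localeRules.end() && !it->second.empty()) {
        return it->second;
    }
    return rule + "_" + lang;
}

// Usage check with made-up rule names.
inline void checkPickRuleName() {
    const LocaleRules hu{{"dict_word", "dict_word_hu"}};
    assert(pickRuleName(hu, "dict_word", "hu") == "dict_word_hu");
    assert(pickRuleName(LocaleRules{}, "line", "th") == "line_th");
}
```

The caller would still have to handle the case where the fallback name does not exist either; the text above does not say what happens then.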
