Grammar Checking API
From Apache OpenOffice Wiki
Overview of the basic interfaces required
- XFlatParagraph implemented by a FlatParagraph object (FP)
FP objects should be small wrappers, each created individually for a single iteration. If the same “real” paragraph takes part in two parallel text markup processes, there can be two different FP objects for it. The interface gives access to the "flat" text of a paragraph (meaning, e.g., that the content of fields is included) by providing it as a simple string. All operations that need to specify substrings use position and length parameters. Besides giving access to the string and allowing some simple manipulations of the text or its language attributes, this object has two specific methods:
- isChecked(css.text.TextMarkupType), which returns true if the FP object has been marked as checked for that markup type (for GRAMMAR, marked by the grammar checker);
- isModified(), which indicates that the content has been changed or deleted since the object was created. If an FP is modified, the grammar checking results for this paragraph have to be discarded and the paragraph needs to be processed again. If an FP is marked as “checked”, it shall be skipped in further checks.
Finally, this interface allows placing (and removing) visual markings. In grammar checking these mark incorrect text parts; other kinds of markup can of course have different meanings. This text markup is based on indexing the string belonging to the FP.
Please note that, unless otherwise stated, "paragraph" in the following text always means an FP. This is not necessarily a paragraph in the document's sense: it can be a collection of them (e.g. a list), and FPs cover not only the flow text but also other text content such as text frames, headers and footers. As only the document core can handle such FP objects efficiently, this is a document-specific implementation.
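To make the FP contract more concrete, here is a minimal Python model of it. It is not the UNO API: the class FlatParagraph, its snake_case methods (get_text, is_checked, set_checked, is_modified, commit_markup, remove_markups, notify_document_changed) and the TextMarkupType stand-in are hypothetical names chosen for illustration. The sketch only shows the ideas stated above: a flat string snapshot, markups addressed by position and length, a checked flag per markup type, and a modified flag that the document would set when the real paragraph changes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class TextMarkupType(Enum):
    # Hypothetical stand-in for css.text.TextMarkupType. GRAMMAR is the
    # value named in the text; SPELLCHECK is assumed for illustration.
    SPELLCHECK = 1
    GRAMMAR = 2


@dataclass
class Markup:
    markup_type: TextMarkupType
    start: int   # position in the flat string
    length: int  # number of characters covered


class FlatParagraph:
    """Toy FP: a snapshot of one paragraph's flat text (fields expanded)."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._modified = False
        self._checked: Set[TextMarkupType] = set()
        self.markups: List[Markup] = []

    def get_text(self) -> str:
        return self._text

    def is_checked(self, markup_type: TextMarkupType) -> bool:
        return markup_type in self._checked

    def set_checked(self, markup_type: TextMarkupType) -> None:
        self._checked.add(markup_type)

    def is_modified(self) -> bool:
        return self._modified

    def notify_document_changed(self) -> None:
        # Would be triggered by the document core when the real paragraph
        # is edited or deleted; any results for this FP are then stale.
        self._modified = True

    def commit_markup(self, markup_type: TextMarkupType,
                      start: int, length: int) -> None:
        # Ranges are position/length pairs indexing the flat string.
        if start < 0 or length < 0 or start + length > len(self._text):
            raise ValueError("markup range lies outside the flat text")
        self.markups.append(Markup(markup_type, start, length))

    def remove_markups(self, markup_type: TextMarkupType) -> None:
        self.markups = [m for m in self.markups
                        if m.markup_type != markup_type]
```

For example, on FlatParagraph("This are a test.") a GRAMMAR markup at position 0 with length 8 covers "This are". Nothing in the model resets the modified flag: following the text, a modified FP is simply discarded and the paragraph is processed again through a fresh FP.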
- XFlatParagraphIterator implemented by the model of the document to check
As this is a document-type-specific implementation (only the document core knows how to create and access FP objects most efficiently), objects implementing this interface have to be retrieved from a provider interface of the document model. The implementation will usually also be bound to a specific implementation of the FP object. The FPIterator knows where to start the iteration: interactive checking starts at the current cursor position, while background processing starts at the beginning of the document (with special consideration of the visible area, see below). So the iterator needs to know whether it is used for interactive or background checking. It also needs to know whether it is used for grammar checking or for another kind of text markup, because it needs that information to find the next paragraph that has not been checked yet (see below). The most important method is getNextPara, which returns an XFlatParagraph interface to the next paragraph to be checked. Returning an empty reference means that there is nothing left to check for now. An FPIterator object keeps track of the “current” FP object internally so that it knows how to create the “next paragraph”. The iteration should probably follow reading order, but the order is entirely left to the implementation. In particular, the following is allowed:
- The iteration shall skip paragraphs that have been already checked.
- The iteration may end prematurely, for example if automatic grammar checking has been disabled in the meantime. The client of the FPIterator may also terminate the iteration by releasing the FPIterator object.
- A full iteration automatically wraps around at the end of the document and continues from the beginning until no more invalid FP objects are found.
- Theoretically, for automatic grammar checking, it would also be OK to iterate more than once over the same paragraph, e.g. if it was modified while being checked.
- As users generally expect the visible paragraphs to be checked first, a possible implementation of getNextPara is as follows:
  - get the first visible FP; if it is not already checked, return it
  - proceed likewise with all other visible FPs
  - look at the FP following the current one and return it if it is not checked
  - proceed likewise with all other FPs until the starting FP is reached
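The visible-first strategy for getNextPara can be sketched as a small Python class. The names (Para, FlatParagraphIterator, get_next_para, and the visible and start parameters) are hypothetical, and paragraphs are reduced to an index plus a checked flag. In this sketch each paragraph is offered at most once per pass, and the checked flag is read only when a paragraph's turn comes, so a paragraph that was checked in the meantime is skipped.

```python
from typing import Iterable, List, Optional


class Para:
    """Minimal stand-in for an FP: an index plus a checked flag."""

    def __init__(self, index: int, checked: bool = False) -> None:
        self.index = index
        self.checked = checked


class FlatParagraphIterator:
    """Hypothetical visible-first getNextPara() with wrap-around."""

    def __init__(self, paras: List[Para], visible: Iterable[int],
                 start: int) -> None:
        n = len(paras)
        self._paras = paras
        # Candidate order: the visible paragraphs first, then every paragraph
        # after `start`, wrapping around the end of the document and ending
        # with `start` itself. dict.fromkeys drops repeats but keeps order,
        # so each paragraph is offered at most once per pass.
        after_start = [(start + k) % n for k in range(1, n + 1)]
        self._order = list(dict.fromkeys(list(visible) + after_start))
        self._pos = 0

    def get_next_para(self) -> Optional[Para]:
        while self._pos < len(self._order):
            para = self._paras[self._order[self._pos]]
            self._pos += 1
            # Read the flag now, not at construction time, so paragraphs
            # checked in the meantime are skipped.
            if not para.checked:
                return para
        return None  # nothing left to check for now (empty reference)
```

Walking once over a fixed candidate list is a simplification: as noted above, an implementation may also revisit a paragraph, for instance one that was modified while it was being checked.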
- XGrammarChecker implemented by all grammar checker components
The grammar checker is always provided with the text of the whole paragraph. If it needs to, it may examine all the (preceding) text in the paragraph, but it must only report errors within the bounds of the current sentence. Whether it returns all errors in that sentence at once (which is considered best for the user) or only the first one is left to the implementation for the time being. Keeping this interface as simple as possible makes it possible to wrap a C interface behind this API, which allows grammar checkers to be integrated more easily without using UNO.
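The checker contract (whole paragraph in, only errors inside the current sentence out) can be illustrated with a toy Python function. Everything here is an assumption for illustration: GrammarError, check_sentence, its sent_start/sent_end parameters (end exclusive) and the first_only flag are hypothetical names, and the single "doubled word" rule stands in for a real rule set. Using only plain strings and integer offsets mirrors the goal of keeping the interface simple enough to wrap a C checker.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class GrammarError:
    start: int        # offset into the whole paragraph's flat text
    length: int
    message: str
    suggestion: str


# Toy rule: the same word twice in a row, e.g. "the the".
_DOUBLED = re.compile(r"\b(\w+)\s+\1\b")


def check_sentence(paragraph: str, sent_start: int, sent_end: int,
                   first_only: bool = False) -> List[GrammarError]:
    """Scan the whole paragraph (earlier context may be used) but report
    only errors that lie completely inside [sent_start, sent_end)."""
    errors: List[GrammarError] = []
    for m in _DOUBLED.finditer(paragraph):
        if m.start() < sent_start or m.end() > sent_end:
            continue  # outside the current sentence: must not be reported
        errors.append(GrammarError(m.start(), m.end() - m.start(),
                                   "Doubled word", m.group(1)))
        if first_only:
            break  # report only the first error in the sentence
    return errors
```

For the paragraph "It is is fine. Then the the end is is near.", checking the second sentence (offsets 15 to 43) reports "the the" and "is is" but not the "is is" of the first sentence; with first_only=True only "the the" is returned.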
- XGrammarCheckingIterator implemented by the service css.linguistic2.GrammarCheckingIterator
The object implementing this interface is the mediator between the grammar checkers and the document, neither of which should know about the other. In particular, it provides the grammar checking dialog and the context menu with the required data and with interfaces to change the text. Decoupling text block access from checking means the grammar checker never has to access the document directly, which makes a multi-threaded implementation much easier.
This interface provides a callback function that the GrammarCheckingIterator uses to pass the results of the grammar checking to the specific client and have it act accordingly (mark wrong parts, fill the context menu, or have the dialog show the new sentence with its errors and corrections).
This interface allows an interested client to register as a listener and thus be informed about the grammar checking results.
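The mediator role of the GrammarCheckingIterator and the listener callbacks can be sketched together in Python. All names here are hypothetical: FP, DocumentIterator, find_double_spaces, Mediator and its add_listener/remove_listener/run methods are illustrative, not the css.linguistic2 API, and listeners are plain callables rather than UNO listener interfaces. The sketch shows the decoupling described above: the checker only receives a string, only the mediator touches FPs, results for an FP modified in the meantime are dropped, and each registered client receives every accepted result through its callback.

```python
import re
from typing import Callable, List, Optional, Tuple

Error = Tuple[int, int]  # (start, length) into an FP's flat text


class FP:
    """Stand-in for an FP object handed out by the document."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.modified = False
        self.checked = False


class DocumentIterator:
    """Stand-in for an FPIterator: hands out each unchecked FP once."""

    def __init__(self, paras: List[FP]) -> None:
        self._paras = paras
        self._pos = 0

    def get_next_para(self) -> Optional[FP]:
        while self._pos < len(self._paras):
            fp = self._paras[self._pos]
            self._pos += 1
            if not fp.checked:
                return fp
        return None  # "empty reference": nothing left for now


def find_double_spaces(text: str) -> List[Error]:
    """Toy checker: flags every run of two spaces."""
    return [(m.start(), 2) for m in re.finditer("  ", text)]


class Mediator:
    """Only this object talks to both the document side and the checker."""

    def __init__(self, checker: Callable[[str], List[Error]]) -> None:
        self._checker = checker
        self._listeners: List[Callable[[FP, List[Error]], None]] = []

    def add_listener(self, callback: Callable[[FP, List[Error]], None]) -> None:
        if callback not in self._listeners:
            self._listeners.append(callback)

    def remove_listener(self, callback: Callable[[FP, List[Error]], None]) -> None:
        if callback in self._listeners:
            self._listeners.remove(callback)

    def run(self, doc_iter: DocumentIterator) -> None:
        fp = doc_iter.get_next_para()
        while fp is not None:
            # The checker sees a plain string, never the document itself.
            errors = self._checker(fp.text)
            if not fp.modified:
                fp.checked = True
                # Copy the list so a callback may unregister itself safely.
                for callback in list(self._listeners):
                    callback(fp, errors)
            # Otherwise the paragraph changed meanwhile: drop the results and
            # leave it unchecked so that a later iteration processes it again.
            fp = doc_iter.get_next_para()
```

Because the checker depends on nothing but a string, a real design could run it on a worker thread; this sketch stays single-threaded to keep the control flow visible.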