Filtering Process

From Apache OpenOffice Wiki
Revision as of 10:16, 30 September 2008 by Mba

In OpenOffice.org, the whole process of loading or saving content is a modular system based on UNO services. Some of them are abstract (for example, com.sun.star.document.ExtendedTypeDetection and the filter services), so that an extensible set of implementations can be bound to them. Others (for example, the com.sun.star.document.TypeDetection service) define the workflow. Because they are exchangeable like any UNO service, the whole process of finding and using filters can be changed without changing the other components involved.

Loading content

The most general way to load content into OpenOffice.org is to call the com.sun.star.frame.XComponentLoader:loadComponentFromURL() method of a suitable object. Such an object may be the com.sun.star.frame.Desktop object or any instance of the com.sun.star.frame.Frame service. Content always ends up in a frame object. If the method is called at the desktop, it finds or creates this frame using some of the passed arguments, as described in the API documentation of the method, and then forwards the call to this frame.

The content is passed to the loadComponentFromURL() call as a com.sun.star.document.MediaDescriptor service, which is implemented here as a sequence of com.sun.star.beans.PropertyValue structs. In most cases it contains several properties from which an object implementing com.sun.star.io.XStream can be created to read the content. It may also contain properties that the code of other objects (filter, model, controller, frame, and so on) can use to steer the loading process. If no properties need to be handed over and the content is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor left empty.
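The idea of a descriptor as a sequence of name/value pairs can be sketched in plain Python. This is only a model and does not use the UNO API: the PropertyValue class, make_descriptor() and get_property() are hypothetical helpers, and the property values shown are illustrative.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PropertyValue:
    """Plain-Python stand-in for a name/value pair (not the UNO struct)."""
    name: str
    value: Any


def make_descriptor(**props: Any) -> List[PropertyValue]:
    """Build a descriptor as an ordered sequence of name/value pairs."""
    return [PropertyValue(name, value) for name, value in props.items()]


def get_property(descriptor: List[PropertyValue], name: str, default: Any = None) -> Any:
    """Return the value of the first property with the given name."""
    for prop in descriptor:
        if prop.name == name:
            return prop.value
    return default


# The URL is passed as an explicit argument, so the descriptor may stay empty.
empty_descriptor = make_descriptor()

# Extra properties steer the loading process (values are illustrative).
descriptor = make_descriptor(ReadOnly=True, FilterName="writer8")
```

A consumer such as a filter would look up only the properties it understands, for example get_property(descriptor, "ReadOnly", False), and ignore the rest.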

Before loading can start, two objects must be found that work together: a suitable filter and a document model it can load into. Because basically any content type may be loaded into any available document model, and even the same type could be loaded into the same model in different ways, there can be many filters for a particular content type. Finding the right one by evaluating what is passed in the MediaDescriptor is the job of the com.sun.star.document.TypeDetection service. The result of this detection is the name of the content type, the name of the wanted filter and the service name of the document model that is the target of the loading process. The filter is created by the com.sun.star.document.FilterFactory service.

The TypeDetection also employs the com.sun.star.document.ExtendedTypeDetection that examines the given resource and confirms the unique type name determined by TypeDetection. The MediaDescriptor is updated, if necessary, and a unique type name is returned.

Finally, the component loader ensures there is a frame, or creates a new one, if necessary, and asks a frame loader service (com.sun.star.frame.FrameLoader or com.sun.star.frame.SynchronousFrameLoader) to load the resource into the frame. Its interface com.sun.star.frame.XFrameLoader has a method load() that takes a frame, the MediaDescriptor and an event listener, and creates a com.sun.star.document.ImportFilter instance at the FilterFactory to load the resource into the given frame. For this purpose, it calls createInstance() with the filter implementation name (such as com.sun.star.comp.Writer.GenericXMLFilter) or createInstanceWithArguments() with the implementation name and additional arguments used to initialize the filter.

Then, the loader calls setTargetDocument() and filter() on the ImportFilter service. The ImportFilter creates its results in the given target document.

Storing content

A URL or a stream is passed to storeToURL() or storeAsURL() in the interface com.sun.star.frame.XStorable, which is implemented by office documents. A media descriptor is created and filled with the URL or stream and the store properties. The TypeDetection provides a unique type name that is used with the FilterFactory to create a com.sun.star.document.ExportFilter.

The XStorable implementation calls setSourceDocument() and filter() at the filter, which writes the results to the storage specified in the MediaDescriptor passed to filter().
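The call order on import and export filters described above (set the document first, then call filter() with the descriptor) can be modelled in a few lines of plain Python. This sketch does not use the UNO API: both classes, the dictionary-based document and the "data" and "sink" descriptor keys are hypothetical.

```python
from typing import Any, Dict, List, Optional


class PlainTextImportFilter:
    """Hypothetical import filter: copies descriptor data into a document."""

    def __init__(self) -> None:
        self._target: Optional[Dict[str, str]] = None

    def set_target_document(self, document: Dict[str, str]) -> None:
        self._target = document

    def filter(self, descriptor: Dict[str, Any]) -> bool:
        if self._target is None:
            return False  # no target document was set beforehand
        self._target["body"] = descriptor["data"]
        return True


class PlainTextExportFilter:
    """Hypothetical export filter: writes the document into the descriptor's sink."""

    def __init__(self) -> None:
        self._source: Optional[Dict[str, str]] = None

    def set_source_document(self, document: Dict[str, str]) -> None:
        self._source = document

    def filter(self, descriptor: Dict[str, Any]) -> bool:
        if self._source is None:
            return False  # no source document was set beforehand
        descriptor["sink"].append(self._source["body"])
        return True


# Import: the filter creates its result in the given target document.
document: Dict[str, str] = {}
importer = PlainTextImportFilter()
importer.set_target_document(document)
imported = importer.filter({"data": "hello"})

# Export: the filter writes to the storage named in the descriptor.
sink: List[str] = []
exporter = PlainTextExportFilter()
exporter.set_source_document(document)
exported = exporter.filter({"sink": sink})
```

In both directions the filter receives the document through a separate call and the storage location through the descriptor passed to filter(), which is why calling filter() without a document fails in this model.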

If a URL or an already open stream takes part in a load or save process in OpenOffice.org, the following services and operations are involved:

[Figure: General Filtering Process]

In the following, the modules that participate in the loading process are discussed in detail.

MediaDescriptor

The media descriptor is an abstract description of a content that specifies where the content comes from and how it is to be handled. A content is also called a medium. Refer to MediaDescriptor for further information. Inside OpenOffice.org, it is realized as a sequence of com.sun.star.beans.PropertyValue structs that is passed as a parameter.

A descriptor is passed to various methods which are involved in the load and save process.

Every participant in the process can use this descriptor and change it to update the information about the document. The descriptor is an [inout] parameter of com.sun.star.document.XTypeDetection:queryTypeByDescriptor() and com.sun.star.document.XExtendedFilterDetection:detect(). It is an [in] parameter only in com.sun.star.frame.XComponentLoader:loadComponentFromURL(), com.sun.star.frame.XFrameLoader:load() and com.sun.star.document.XFilter:filter(). Where a method takes the MediaDescriptor as an [in] parameter only, the calling code must synchronize manually: the caller merges the results, for example, return values, into the original descriptor itself. The model is not available at loading time. It is the result of the load request.
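The difference between the two parameter directions can be illustrated with a small plain-Python model. It does not call the UNO API: detect_inout(), load_in_only() and the "TypeName" and "Title" keys are hypothetical names chosen for the sketch.

```python
from typing import Any, Dict


def detect_inout(descriptor: Dict[str, Any]) -> str:
    """Models an [inout] parameter: the callee updates the caller's descriptor."""
    descriptor["TypeName"] = "example_type"
    return descriptor["TypeName"]


def load_in_only(descriptor: Dict[str, Any]) -> Dict[str, Any]:
    """Models an [in] parameter: the callee works on a copy and returns results."""
    local = dict(descriptor)  # changes to the copy never reach the caller
    local["Title"] = "Loaded document"
    return {"Title": local["Title"]}


original: Dict[str, Any] = {"URL": "file:///tmp/report.odt"}

# [inout]: the update is visible to the caller immediately.
detect_inout(original)

# [in]: the caller has to merge the returned results manually.
results = load_in_only(original)
title_before_merge = original.get("Title")
original.update(results)
```

After the [in] call the original descriptor is unchanged until the caller runs the explicit update() step, which corresponds to the manual synchronization described above.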

Caution: A member of this descriptor must not be held by reference longer than it is used, especially a possible stream item. For example, it would not be possible to close a stream that is still referenced by others. Members may only be used directly or as a copy.


TypeDetection

Every content to be loaded must be specified, that is, the type of the content must be well known to OpenOffice.org. The type is usually a document type. However, the results of active contents, for example, macros, or database contents are also described here.

A special service com.sun.star.document.TypeDetection is used to accomplish this. It provides an API to associate, for example, a URL or a stream with the extensions well known to OpenOffice.org, MIME types or clipboard formats. The resulting value is an internal unique type name used for further operations by using other services, for example, com.sun.star.frame.FrameLoaderFactory. This type name can be a part of the already mentioned MediaDescriptor.

It is not necessary or useful to replace this service with a custom implementation. It works generically on top of a special configuration. Extending the type detection is done by changing the configuration and is described later. These changes are required if new content formats are provided for OpenOffice.org, because this is the reason to integrate custom filters into the product.

ExtendedTypeDetection

Based on the registered types, flat detection is already possible, that is, the assignment of types, for example, to a URL, on the basis of configuration data only. Flat detection cannot always produce a correct result: imagine someone changing the file extension of a text document from .odt to .txt. To ensure correct results, deep detection is needed, that is, the content has to be examined. The com.sun.star.document.ExtendedTypeDetection service performs this task. Such a service is called a detector. It gets all the information collected on a document and decides which type to assign to it. In the new modular type detection, the detector is a UNO service that registers itself in OpenOffice.org and is requested by the generic TypeDetection mechanism, if necessary.
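The renamed-file case can be sketched in plain Python. This is a simplified model, not the OpenOffice.org detection code: the extension table, the type names and both functions are hypothetical, and treating every ZIP package as an ODF text document is a deliberate simplification. The only fact relied on is that ODF documents such as .odt files are ZIP packages and therefore start with the ZIP signature.

```python
import os
from typing import Optional

# Hypothetical extension table standing in for the type configuration.
EXTENSION_TYPES = {".odt": "example_odf_text", ".txt": "example_plain_text"}

# ZIP local file header signature; ODF documents are ZIP packages.
ZIP_MAGIC = b"PK\x03\x04"


def flat_detect(url: str) -> Optional[str]:
    """Flat detection: decide from the file extension alone."""
    extension = os.path.splitext(url)[1].lower()
    return EXTENSION_TYPES.get(extension)


def deep_detect(flat_type: Optional[str], head: bytes) -> Optional[str]:
    """Deep detection: confirm or correct the flat result from the content."""
    if head.startswith(ZIP_MAGIC):
        return "example_odf_text"
    if flat_type == "example_odf_text":
        return None  # extension claims ODF, content disagrees: unconfirmed
    return flat_type


# An .odt file renamed to .txt still starts with the ZIP signature.
renamed_url = "file:///tmp/report.txt"
head = ZIP_MAGIC + b"\x14\x00"
flat = flat_detect(renamed_url)
deep = deep_detect(flat, head)
```

Here the flat result follows the misleading extension, while the deep result follows the content, which is the correction a detector is meant to provide.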

To extend the list of the known content types of OpenOffice.org, we suggest implementing a detector component in addition to a filter. It improves the generic detection of OpenOffice.org and makes the results more secure.

Inside OpenOffice.org, a detector service is called with an already opened stream that is used to find out the content type. If no stream is given, this indicates that someone else is using the service (for example, code outside OpenOffice.org). The detector is then allowed to open its own stream by using the URL part of the MediaDescriptor. If the resulting stream is seekable, it should be set inside the descriptor after its position is reset to 0. If the stream is not seekable, it must not be set. Follow the rules for handling streams mentioned above.
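The seekable-stream rule can be modelled with Python's standard io module. This is a sketch of the rule only, not of the UNO stream interfaces: attach_stream() and ForwardOnlyStream are hypothetical, and a plain dictionary stands in for the descriptor.

```python
import io
from typing import Any, BinaryIO, Dict


def attach_stream(descriptor: Dict[str, Any], stream: BinaryIO) -> bool:
    """Store the stream in the descriptor only if it can be rewound."""
    if not stream.seekable():
        return False  # a non-seekable stream must not be set
    stream.seek(0)  # reset the position before handing the stream on
    descriptor["InputStream"] = stream
    return True


class ForwardOnlyStream(io.RawIOBase):
    """Hypothetical stream that can be read but not repositioned."""

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False


# A detector has read a few bytes to inspect the content.
seekable_stream = io.BytesIO(b"PK\x03\x04 rest of the content")
seekable_stream.read(4)

seekable_descriptor: Dict[str, Any] = {"URL": "file:///tmp/report.odt"}
attached = attach_stream(seekable_descriptor, seekable_stream)

forward_descriptor: Dict[str, Any] = {"URL": "file:///tmp/report.odt"}
rejected = attach_stream(forward_descriptor, ForwardOnlyStream())
```

Resetting the position matters because the next participant in the process expects to read the content from the beginning.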

FrameLoader

Frame loaders load a detected type. A visual component is expected as the result. Such visual components are:

Further details are found in section Framework API.

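A frame loader service exists in two versions: the asynchronous com.sun.star.frame.FrameLoader and the synchronous com.sun.star.frame.SynchronousFrameLoader.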

A frame loader can be searched for or created by another service, com.sun.star.frame.FrameLoaderFactory, which is described below. The synchronous version is optional. Both services can be implemented by the same component, but the synchronous version is preferred if it is supported.
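The preference for the synchronous version can be expressed as a simple dispatch. The sketch below is a plain-Python model, not the UNO interfaces: the loader classes, the load_sync() and load_async() method names and the dictionary-based frame are hypothetical, and the asynchronous loader calls its listener immediately to keep the example short.

```python
from typing import Any, Callable, Dict, List, Tuple


class AsyncOnlyLoader:
    """Hypothetical loader that supports only the asynchronous variant."""

    def load_async(self, frame: Dict[str, Any], descriptor: Dict[str, Any],
                   on_finished: Callable[[bool], None]) -> None:
        frame["component"] = "view of " + descriptor["URL"]
        on_finished(True)  # a real loader would notify its listener later


class DualLoader(AsyncOnlyLoader):
    """Hypothetical loader that implements both variants in one component."""

    def load_sync(self, descriptor: Dict[str, Any], frame: Dict[str, Any]) -> bool:
        frame["component"] = "view of " + descriptor["URL"]
        return True


def load_into_frame(loader: Any, frame: Dict[str, Any],
                    descriptor: Dict[str, Any]) -> Tuple[str, bool]:
    """Use the synchronous variant if supported, else the asynchronous one."""
    sync = getattr(loader, "load_sync", None)
    if callable(sync):
        return "sync", sync(descriptor, frame)
    outcome: List[bool] = []
    loader.load_async(frame, descriptor, outcome.append)
    return "async", bool(outcome and outcome[0])


descriptor = {"URL": "file:///tmp/report.odt"}
dual_frame: Dict[str, Any] = {}
dual_result = load_into_frame(DualLoader(), dual_frame, descriptor)
async_frame: Dict[str, Any] = {}
async_result = load_into_frame(AsyncOnlyLoader(), async_frame, descriptor)
```

In both cases the frame ends up holding a visual component. The dispatch only decides which of the two entry points is used.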

There are two ways to extend OpenOffice.org to load a new content format:

  • implementing a frame loader that uses its own internal mechanism to create the expected visual component, for example, local file access.
  • implementing a filter that does the same, but is used by a generic frame loader implementation.

Note that the first method does not work for exporting, because a loader service cannot be used at save time. The way to enable a content format for both import and export is to provide a filter service. A generic frame loader implementation that uses all registered filters in a uniform way already exists in OpenOffice.org, so the second method is preferred.

Content on this page is licensed under the Public Documentation License (PDL).