Filtering Process

From Apache OpenOffice Wiki
< Documentation‎ | DevGuide
Revision as of 20:40, 30 September 2008 by Mba (Talk | contribs)

Jump to: navigation, search




In OpenOffice.org the whole process of loading or saving content is a modular system based on UNO services. Some of them are abstract (like e.g. the com.sun.star.document.ExtendedTypeDetection and the filter services) and so allow to bind extendable sets of instances implementing them, others (like e.g. the com.sun.star.document.TypeDetection service) are those that define the work flow. As they are exchangeable like any UNO service the whole process of finding and using filters can be changed without any need to change other involved components.

Loading content

The most general way to load content into OpenOffice.org is calling the com.sun.star.frame.XComponentLoader:loadComponentFromURL() method of a suitable object. Such object may be the com.sun.star.frame.Desktop object or any instance of the com.sun.star.frame.Frame service. Content loaded this way will end up in a frame object always, if called at the desktop the method will find or create this frame using some of the passed arguments as described in the API documentation linked above. Then it will forward the call to this frame. Here's a diagram showing the workflow that will be explained in the following parargraphs.

General Filtering Process

The content will be passed to the loadComponentFromURL() call as a com.sun.star.document.MediaDescriptor service that here is implemented as a Sequence of com.sun.star.beans.PropertyValue. In most cases it will contain several properties that allow to create an object implementing com.sun.star.io.XStream or com.sun.star.io.XInputStream that can be used to read the content. It also may contain some properties that the code of other objects (filter, model, controller, frame etc.) can use to steer the loading process. If no properties shall be handed over and the file content is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor can stay empty. To understand how to work with the MediaDescriptor in the implementation of a filter or elsewhere see the documentation of it in the chapter about loading documents.

The component loader uses instances of the com.sun.star.frame.FrameLoader or com.sun.star.frame.SynchronousFrameLoader services. Which frame loader instance will be used depends on the type of the content. This type must be detected first (see below) based on the TypeDetection configuration that allows to register filters or frame loaders for a particular type. OpenOffice.org has a generic Frame Loader service that is used when the detected type has no own frame loader registered but filters. If a custom frame loader is registered for a particular type, it's up to that implementation how the content loading process is carried out and if it uses filters or not. As the current topic is "filters", we will concentrate on the generic frame loader here.

Template:Documentation/Note

To load content based on a filter first it must be detected which filter is the right one to use and which document type must be used for this filter to work properly. As basically any content type may be loaded into any available document type and even the same type could be loaded into the same document type in different ways, we could find many registered filters for a particular content type. Finding the right one by evaluating what is passed in the MediaDescriptor is the job of the com.sun.star.document.TypeDetection service. The result of this detection will be the name of the content type, the name of the wanted filter and the service name of the document model that shall be the target of the loading process. These results will be placed into the MediaDescriptor so that any code in other objects called later can use that information. By providing either the type name of the content or the document service name in the MediaDescriptor handed over to the component loader the search for a filter can be narrowed down to a subset of filters that match these criteria. By providing a filter name in the MediaDescriptor the detection can even be bypassed completely (the component loader will add the matching type and document service names to the MediaDescriptor though). The whole process of the Type Detection is explained below.

The next steps will be managed by the the generic com.sun.star.frame.SynchronousFrameLoader service and hands the target frame over to it. The Frame Loader will create the document of the wanted type using the document service name found in the MediaDescriptor. It will also take the detected filter name and ask the com.sun.star.document.FilterFactory service to create the filter and perhaps initialize it with some necessary parameters and ask it for importing the content into the new document (this is described in the chapter about filters). If all of this went fine, it will attach the document to the target frame by creating a Controller object for the document model.

Storing content

A URL or a stream is passed to storeToURL() or storeAsURL() in the interface com.sun.star.frame.XStorable, implemented by office documents. The store properties create a media descriptor that is filled with the URL or stream, and the store properties. If the MediaDescriptor contains a type name or a filter name, the suitable export filter will be created using the FilterFactory. If neither of them is provided, the document will be stored with the latest ODF filter.

TypeDetection

Every content to be loaded must be specified, that is, the type of content represented in the OpenOffice.org must be well known in OpenOffice.org. The type is usually document type,.however, the results of active contents, for example, macros, or database contents are also described here.

A special service com.sun.star.document.TypeDetection is used to accomplish this. It provides an API to associate, for example, a URL or a stream with the extensions well known to OpenOffice.org, MIME types or clipboard formats. The resulting value is an internal unique type name used for further operations by using other services, for example, com.sun.star.frame.FrameLoaderFactory. This type name can be a part of the already mentioned MediaDescriptor.

It is not necessary or useful to replace this service by custom implementations.,It works in a generic method on top of a special configuration. Extending the type detection is done by changing the configuration and is described later. It is required to make these changes if new content formats are provided for [OpenOffice.org, because this is the reason to integrate custom filters into the product.

The TypeDetection also employs the com.sun.star.document.ExtendedTypeDetection that examines the given resource and confirms the unique type name determined by TypeDetection. The MediaDescriptor is updated, if necessary, and a unique type name is returned.

ExtendedTypeDetection

Based on the registered types, flat detection is already possible, that is,. the assignment of types, for example, to a URL, on the basis of configuration data only. Tlat detection cannot always get a correct result if you imagine someone modifying the file extension of a text document from .odt to .txt.. To ensure correct results, we need deep detection, that is, the content has to be examined. The com.sun.star.document.ExtendedTypeDetection service performs this task. It is called detector. It gets all the information collected on a document and decides the type to assign it to. In the new modular type detection, the detector is meant as a UNO service that registers itself in the OpenOffice.org and is requested by the generic TypeDetection mechanism, if necessary.

To extend the list of the known content types of OpenOffice.org, we suggest implementing a detector component in addition to a filter. It improves the generic detection of OpenOffice.org and makes the results more secure.

Inside OpenOffice.org, a detector service is called with an already opened stream that is used to find out the content type. In case no stream is given, it indicates that someone else uses this service, for example, outside OpenOffice.org). It is then allowed to open your own stream by using the URL part of the MediaDescriptor. If the resulting stream is seekable, it should be set inside the descriptor after its position is reset to 0. If the stream is not seekable, it is not allowed to set it. Please follow the already mentioned rules for handling streams.

Content on this page is licensed under the Public Documentation License (PDL).
Personal tools