Difference between revisions of "Documentation/DevGuide/OfficeDev/Filtering Process"

From Apache OpenOffice Wiki
m (Storing content)
 
(33 intermediate revisions by 5 users not shown)
Line 7: Line 7:
 
|NextPage=Documentation/DevGuide/OfficeDev/Filter
 
|NextPage=Documentation/DevGuide/OfficeDev/Filter
 
}}
 
}}
{{DISPLAYTITLE:Filtering Process}}
+
{{Documentation/DevGuideLanguages|Documentation/DevGuide/OfficeDev/{{SUBPAGENAME}}}}
 +
{{DISPLAYTITLE:Filtering Process}}
 
__NOTOC__
 
__NOTOC__
In {{PRODUCTNAME}} the whole process of loading or saving content is a modular system based on UNO services. Some of them are abstract (like e.g. the <idl>com.sun.star.document.ExtendedTypeDetection</idl> and the filter services) and so allow to bind extendable sets of instances implementing them, others (like e.g. the <idl>com.sun.star.document.TypeDetection</idl> service) are those that define the work flow. As they are exchangeable like any UNO service the whole process of finding and using filters can be changed without any need to change other involved components.
+
In {{AOo}}, the whole process of loading or saving content is a modular system based on UNO services. Some of these services are abstract (for example <idl>com.sun.star.document.ExtendedTypeDetection</idl> and the filter services) and so allow an extensible set of implementations to be plugged in; others (for example the <idl>com.sun.star.document.TypeDetection</idl> service) define the workflow. Because they are exchangeable like any UNO service, the whole process of finding and using filters can be changed without changing the other components involved.
  
 
===Loading content===
 
===Loading content===
  
The most general way to load content into {{PRODUCTNAME}} is calling the [http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XComponentLoader.html#loadComponentFromURL com.sun.star.frame.XComponentLoader:loadComponentFromURL]() method of a suitable object. Such object may be the <idl>com.sun.star.frame.Desktop</idl> object or any instance of the <idl>com.sun.star.frame.Frame</idl> service. Content will end up in a frame object always, if called at the desktop the method will find or create this frame using some of the passed arguments as described in the API documentation linked above. Then it will forward the call to this frame. Here's a diagram showing the workflow.
+
The most general way to load content into {{AOo}} is to call the <idlm>com.sun.star.frame.XComponentLoader:loadComponentFromURL</idlm>() method of a suitable object. Such an object may be the <idl>com.sun.star.frame.Desktop</idl> object or any instance of the <idl>com.sun.star.frame.Frame</idl> service. Content loaded this way always ends up in a frame object. If the method is called at the desktop, it finds or creates this frame using some of the passed arguments, as described in the API documentation linked above, and then forwards the call to this frame. The following diagram shows the workflow that is explained in the paragraphs below.
  
[[Image:sequence_diagram_load_url.png|none|thumb|500px|General Filtering Process]]
+
[[Image:sequence_diagram_load_url.png|thumb|center|500px|General Filtering Process]]
  
The content will be passed to the loadComponentFromURL() call as a <idl>com.sun.star.document.MediaDescriptor</idl> service that here is implemented as a Sequence of <idl>com.sun.star.beans.PropertyValue</idl>. In most cases it will contain several properties that allow to create an object implementing <idl>com.sun.star.io.XStream</idl> that can be used to read the content. It also may contain some properties that the code of other objects (filter, model, controller, frame etc.) can use to steer the loading process. If no properties shall be handed over and the file content is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor is empty.
+
The content is passed to the <code>loadComponentFromURL()</code> call as a <idl>com.sun.star.document.MediaDescriptor</idl> service, which is implemented here as a sequence of <idl>com.sun.star.beans.PropertyValue</idl>. In most cases it contains several properties that allow the creation of an object implementing <idl>com.sun.star.io.XStream</idl> or <idl>com.sun.star.io.XInputStream</idl> from which the content can be read. It may also contain properties that the code of other objects (filter, model, controller, frame etc.) can use to steer the loading process. If no properties need to be handed over and the content is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor can stay empty. To understand how to work with the MediaDescriptor in the implementation of a filter or elsewhere, especially how to retrieve a stream from it, see [[Documentation/DevGuide/OfficeDev/Handling_Documents#MediaDescriptor| its documentation]] in the chapter about loading documents.
  
Before loading can start, two objects must be found that will work together: the suitable filter and a document model it can load into. As basically any content type may be loaded into any available document model and even the same type could be loaded into said model in different ways, we could have many filters for a particular content type. To find the right one by evaluating what is passed in the MediaDescriptor is the job of the <idl>com.sun.star.document.TypeDetection</idl> service. The result of this detection will be the name of the content type, the name of the wanted filter and the service name of the document model that shall be the target of the loading process. The filter will be created by the <idl>com.sun.star.document.FilterFactory</idl> service.
+
The component loader uses instances of the <idl>com.sun.star.frame.FrameLoader</idl> or <idl>com.sun.star.frame.SynchronousFrameLoader</idl> services. Which frame loader instance is used depends on the type of the content. This type must be detected first (see below) based on the TypeDetection configuration, which allows filters or frame loaders to be registered for a particular type. {{AOo}} has a generic frame loader service that is used when the detected type has filters registered but no frame loader of its own. If a custom frame loader is registered for a particular type, it is up to that implementation how the content is loaded and whether it uses filters. As the current topic is filters, we concentrate on the generic frame loader here.
  
The <code>TypeDetection</code> also employs the <idl>com.sun.star.document.ExtendedTypeDetection</idl> that examines the given resource and confirms the unique type name determined by <code>TypeDetection</code>. The <code>MediaDescriptor</code> is updated, if necessary, and a unique type name is returned.
+
{{Note|The <idl>com.sun.star.frame.FrameLoader</idl> service is deprecated. If a custom frame loader is registered, it should be a <idl>com.sun.star.frame.SynchronousFrameLoader</idl> service.}}
  
Finally, the component loader ensures there is a frame, or creates a new one, if necessary, and asks a frame loader service (<idl>com.sun.star.frame.FrameLoader</idl> or <idl>com.sun.star.frame.SynchronousFrameLoader</idl>) to load the resource into the frame. Its interface <idl>com.sun.star.frame.XFrameLoader</idl> has a method <code>load()</code> that takes a frame, the <code>MediaDescriptor</code> and an event listener, and creates a <idl>com.sun.star.document.ImportFilter</idl> instance at the <code>FilterFactory</code> to load the resource into the given frame. For this purpose, it calls <code>createInstance()</code> with the filter implementation name (such as <code>com.sun.star.comp.Writer.GenericXMLFilter</code>) or <code>createInstanceWithArguments()</code> with the implementation name and additional arguments used to initialize the filter.
+
To load content with a filter, it must first be detected which filter is the right one to use and which document type this filter needs in order to work properly. As basically any content type may be loaded into any available document type, and even the same content type could be loaded into the same document type in different ways, many filters may be registered for a particular content type. Finding the right one by evaluating what is passed in the MediaDescriptor is the job of the <idl>com.sun.star.document.TypeDetection</idl> service. The result of this detection is the name of the content type, the name of the wanted filter and the service name of the document model that is the target of the loading process. These results are placed into the MediaDescriptor so that code in other objects called later can use that information. By providing either the type name of the content or the document service name in the MediaDescriptor handed over to the component loader, the search for a filter can be narrowed down to the subset of filters that match these criteria. By providing a filter name in the MediaDescriptor, the detection can even be bypassed completely (the component loader still adds the matching type and document service names to the MediaDescriptor). As the whole type detection process is based entirely on the configuration, it is described in the [[Documentation/DevGuide/OfficeDev/Configuring_a_Filter_in_OpenOffice.org|chapter about the TypeDetection configuration]].
  
Then, the loader calls <code>setTargetDocument()</code> and <code>filter()</code> on the <code>ImportFilter</code> service. The <code>ImportFilter</code> creates its results in the given target document.
+
The next steps are managed by the generic <idl>com.sun.star.frame.SynchronousFrameLoader</idl> service, to which the component loader hands over the target frame. The frame loader creates a document of the wanted type using the document service name found in the MediaDescriptor. It also takes the detected filter name, asks the <idl>com.sun.star.document.FilterFactory</idl> service to create the filter, initializes it with any necessary parameters and asks it to import the content into the new document (this is described in the chapter [[Documentation/DevGuide/OfficeDev/Filter|about filters]]). If all of this succeeds, it attaches the document to the target frame by creating a Controller object for the document model.
  
 
===Storing content===
 
===Storing content===
  
A URL or a stream is passed to <code>storeToURL()</code> or <code>storeAsURL()</code> in the interface <idl>com.sun.star.frame.XStorable</idl>, implemented by office documents. The store properties create a media descriptor that is filled with the URL or stream, and the store properties. The <code>TypeDetection</code> provides a unique type name that is used with the <code>FilterFactory</code> to create a <idl>com.sun.star.document.ExportFilter</idl>.  
+
A MediaDescriptor is passed to <code>storeToURL()</code> or <code>storeAsURL()</code> of the interface <idl>com.sun.star.frame.XStorable</idl>, which is implemented by office documents. It contains several properties that allow the creation of an object implementing <idl>com.sun.star.io.XStream</idl> or <idl>com.sun.star.io.XOutputStream</idl> to which the content can be written. It may also contain properties that give more information about how the storing process should be carried out. If no properties need to be handed over and the target file is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor can stay empty. To understand how to work with the MediaDescriptor in the implementation of a filter or elsewhere, especially how to retrieve a stream from it, see the [[Documentation/DevGuide/OfficeDev/Handling_Documents#MediaDescriptor| documentation about it]] in the chapter about loading documents.
  
The <code>XStorable</code> implementation calls <code>setSourceDocument()</code> and <code>filter()</code> at the filter, which writes the results to the storage specified in the <code>MediaDescriptor</code> passed to <code>filter()</code>.
+
If the MediaDescriptor contains a type name or a filter name, the suitable export filter will be created using the <code>FilterFactory</code>. If neither of them is provided, the document will be stored with the latest ODF filter.
 
+
{{Documentation/Note|Many existing filters are legacy filters. The <code>XStorable</code> implementation does not use the <code>FilterFactory</code> to create them, but triggers filtering by internal calls.}}
+
 
+
In the following, the modules that participate in the loading process are discussed in detail.
+
 
+
=== MediaDescriptor ===
+
 
+
The media descriptor is an abstract description of a content: it specifies where the content comes from and how it is to be handled. A content is also called a medium. Refer to [[Documentation/DevGuide/OfficeDev/Handling Documents#MediaDescriptor|MediaDescriptor]] for further information. Inside {{PRODUCTNAME}}, it is realized as a sequence of <idl>com.sun.star.beans.PropertyValue</idl> structs passed as a parameter.
+
 
+
A descriptor is passed to various methods which are involved in the load and save process.
+
 
+
Every member of the process can use this descriptor and change it to update the information about the document. This descriptor is used as an <code>[inout]</code> parameter by [http://api.openoffice.org/docs/common/ref/com/sun/star/document/XTypeDetection.html#queryTypeByDescriptor com.sun.star.document.XTypeDetection:queryTypeByDescriptor]() and [http://api.openoffice.org/docs/common/ref/com/sun/star/document/XExtendedFilterDetection.html#detect com.sun.star.document.XExtendedFilterDetection:detect](). The <code>MediaDescriptor</code> is <code>[in]</code> only in [http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XComponentLoader.html#loadComponentFromURL com.sun.star.frame.XComponentLoader:loadComponentFromURL](), [http://api.openoffice.org/docs/common/ref/com/sun/star/frame/XFrameLoader.html#load com.sun.star.frame.XFrameLoader:load]() and [http://api.openoffice.org/docs/common/ref/com/sun/star/document/XFilter.html#filter com.sun.star.document.XFilter:filter](). With methods that take the <code>MediaDescriptor</code> as <code>[in]</code> parameter only, a manual synchronization must be done by the outside code. The caller of a method that accepts the <code>MediaDescriptor</code> as <code>[in]</code> parameter only merges the results, for example, return values, manually into the original descriptor. The model is not available at loading time. It is the result of the load request.
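The difference between the two parameter modes can be illustrated with a small model. The following is only a sketch in plain Python, not UNO code: the descriptor is modelled as a dictionary, both functions are hypothetical stand-ins for the methods named above, and the property values are made up for illustration.

```python
def detect_inout(descriptor):
    """Models an [inout] call: the callee updates the caller's descriptor directly."""
    descriptor["TypeName"] = "writer8"
    return descriptor["TypeName"]


def load_in_only(descriptor):
    """Models an [in] call: the callee only sees a copy, so its changes are lost."""
    private = dict(descriptor)
    private["Title"] = "Untitled 1"
    # Anything the caller should know must come back as a return value.
    return {"Title": private["Title"]}


descriptor = {"URL": "file:///tmp/letter.odt"}  # illustrative URL
detect_inout(descriptor)            # descriptor now carries "TypeName"
results = load_in_only(descriptor)  # descriptor itself is unchanged by this call
descriptor.update(results)          # manual synchronization done by the caller
```

After the last line, the original descriptor contains both the detected type name and the value returned by the <code>[in]</code>-only call.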
+
 
+
{{Documentation/Caution|It is not allowed to hold a member of this descriptor by reference longer than it is used, especially a possible stream item. For example, it would not be possible to close a stream that is still referenced by others. It is only allowed to use it directly or as a copy.}}
+
 
+
 
+
{{Documentation/Tip|The stream part of the <code>MediaDescriptor</code> is a special item. If a stream exists, it must be used. Only if no stream exists is it allowed to open a new one using the URL. The stream should then be set in the <code>MediaDescriptor</code> so that it is available to later users of the descriptor. One rule applies to all: the stream inside the descriptor should be seekable. If it is not, it makes no sense to provide it to the other members of the process, especially to sub-modules. On the other hand, a module can be called from outside with a non-seekable stream to perform its operation; for detection or loading, for example, this should be no problem. If a non-seekable stream comes in but seeking is important, the stream must be buffered. Another central question is who controls the lifetime of the stream and the stream position. The lifetime of a non-seekable stream is always controlled by its creator, who has to delete it after use. Seekable streams should be added to the <code>MediaDescriptor</code> and are released by the creator of the <code>MediaDescriptor</code>. Every (sub-)module must be called with a stream positioned at 0; non-seekable streams must of course be newly created and unused. Internally the module can do anything with this stream, and it is not necessary (or even possible) for it to restore any position. The user of the module has to take care of that.}}
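The stream rules can be summarized in a few lines. This is a minimal sketch in plain Python under stated assumptions: the descriptor is modelled as a dictionary, Python file objects stand in for UNO streams, and <code>stream_for_module()</code> and <code>open_url</code> are hypothetical names that do not exist in the office API.

```python
import io


def stream_for_module(descriptor, open_url):
    """Return a stream positioned at 0 for the next module, following the rules above."""
    stream = descriptor.get("InputStream")
    if stream is None:
        # Only open a new stream from the URL if the descriptor has none.
        stream = open_url(descriptor["URL"])
    if not stream.seekable():
        # Seeking matters to the callee, so buffer the non-seekable stream.
        # In this sketch the buffered copy is not stored in the descriptor.
        return io.BytesIO(stream.read())
    # Every module is called with the stream positioned at 0.
    stream.seek(0)
    # Seekable streams are put into the descriptor for later users.
    descriptor["InputStream"] = stream
    return stream
```

A caller would pass its own function for <code>open_url</code>; releasing the stream stays the job of whoever created the descriptor.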
+
 
+
=== TypeDetection ===
+
 
+
Every content to be loaded must be specified, that is, the type of the content must be well known in {{PRODUCTNAME}}. The type is usually a document type; however, the results of active contents, for example, macros, or database contents are also described here.
+
 
+
A special service <idl>com.sun.star.document.TypeDetection</idl> is used to accomplish this. It provides an API to associate, for example, a URL or a stream with the extensions well known to {{PRODUCTNAME}}, MIME types or clipboard formats. The resulting value is an internal unique type name used for further operations by using other services, for example, <idl>com.sun.star.frame.FrameLoaderFactory</idl>. This type name can be a part of the already mentioned <code>MediaDescriptor</code>.
+
 
+
It is not necessary or useful to replace this service by custom implementations. It works generically on top of a special configuration. Extending the type detection is done by changing the configuration and is described later. These changes are required if new content formats are provided for {{PRODUCTNAME}}, because this is the reason to integrate custom filters into the product.
+
 
+
=== ExtendedTypeDetection ===
+
 
+
Based on the registered types, flat detection is already possible, that is, the assignment of types, for example, to a URL, on the basis of configuration data only. Flat detection cannot always produce a correct result: imagine someone changing the file extension of a text document from .odt to .txt. To ensure correct results, we need deep detection, that is, the content has to be examined. The <idl>com.sun.star.document.ExtendedTypeDetection</idl> service performs this task. It is called a detector. It gets all the information collected on a document and decides which type to assign to it. In the new modular type detection, the detector is a UNO service that registers itself in {{PRODUCTNAME}} and is requested by the generic <code>TypeDetection</code> mechanism, if necessary.
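The renamed-file case can be reproduced with a small model. The sketch below is plain Python and does not use the office API: the extension table and the type names in it are illustrative assumptions, and the deep check relies only on the fact that an ODF text document is a ZIP package with a <code>mimetype</code> entry.

```python
import io
import zipfile

# Hypothetical extension table standing in for the configured types.
FLAT_TYPES = {".odt": "writer8", ".txt": "generic_Text"}
ODF_TEXT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def flat_detect(url):
    """Flat detection: look at the URL only."""
    for extension, type_name in FLAT_TYPES.items():
        if url.lower().endswith(extension):
            return type_name
    return None


def deep_detect(data):
    """Deep detection: examine the content itself."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            mimetype = package.read("mimetype").decode("ascii")
    except (zipfile.BadZipFile, KeyError, UnicodeDecodeError):
        return None
    return "writer8" if mimetype == ODF_TEXT_MIMETYPE else None


def detect(url, data):
    """Prefer the deep result; fall back to the flat one."""
    deep = deep_detect(data)
    return deep if deep is not None else flat_detect(url)
```

For a text document renamed to <code>letter.txt</code>, <code>flat_detect()</code> reports the plain text type, while <code>detect()</code> recognizes the package content.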
+
 
+
To extend the list of the known content types of {{PRODUCTNAME}}, we suggest implementing a detector component in addition to a filter. It improves the generic detection of {{PRODUCTNAME}} and makes the results more secure.
+
 
+
Inside {{PRODUCTNAME}}, a detector service is called with an already opened stream that is used to find out the content type. If no stream is given, this indicates that someone else is using the service (for example, outside {{PRODUCTNAME}}). The detector is then allowed to open its own stream by using the URL part of the <code>MediaDescriptor</code>. If the resulting stream is seekable, it should be set inside the descriptor after its position is reset to 0. If the stream is not seekable, it is not allowed to set it. Please follow the already mentioned rules for handling streams.
+
 
+
=== FrameLoader ===
+
 
+
Frame loaders load a detected type. A visual component is expected as the result. Such visual components are:
+
 
+
* trivial components only implementing <idl>com.sun.star.awt.XWindow</idl>
+
* simple office components implementing the <idl>com.sun.star.frame.Controller</idl> service
+
* full featured office components implementing the <idl>com.sun.star.document.OfficeDocument</idl> service.
+
::Further details are found in section [[Documentation/DevGuide/OfficeDev/Framework API|Framework API]].
+
 
+
A frame loader service exists in two versions:
+
 
+
* <idl>com.sun.star.frame.FrameLoader</idl> for asynchronous
+
* <idl>com.sun.star.frame.SynchronousFrameLoader</idl> for synchronous load processes.
+
 
+
It can be searched or created by another service, <idl>com.sun.star.frame.FrameLoaderFactory</idl>, that is described below. The synchronous version is optional. Both services can be implemented at the same component, but the synchronous version is preferred, if it is supported.
+
 
+
There are two ways to extend {{PRODUCTNAME}} to load a new content format:
+
 
+
* implementing a frame loader that uses its own internal mechanism to create the expected visual component, for example, local file access.
+
* implementing a filter that does the same, but is used by a generic frame loader implementation.
+
 
+
Note that the first method does not work for exporting, because a loader service cannot be used at save time. The way to enable a content format for both import and export is to provide a filter service. A generic frame loader implementation already exists in {{PRODUCTNAME}} that uses all registered filters in a uniform way, so the second method is preferred.
+
  
 
{{PDL1}}
 
{{PDL1}}
  
 
[[Category:Documentation/Developer's Guide/Office Development]]
 
[[Category:Documentation/Developer's Guide/Office Development]]

Latest revision as of 14:34, 9 August 2021



In Apache OpenOffice, the whole process of loading or saving content is a modular system based on UNO services. Some of these services are abstract (for example com.sun.star.document.ExtendedTypeDetection and the filter services) and so allow an extensible set of implementations to be plugged in; others (for example the com.sun.star.document.TypeDetection service) define the workflow. Because they are exchangeable like any UNO service, the whole process of finding and using filters can be changed without changing the other components involved.

Loading content

The most general way to load content into Apache OpenOffice is to call the loadComponentFromURL() method of a suitable object. Such an object may be the com.sun.star.frame.Desktop object or any instance of the com.sun.star.frame.Frame service. Content loaded this way always ends up in a frame object. If the method is called at the desktop, it finds or creates this frame using some of the passed arguments, as described in the API documentation of loadComponentFromURL(), and then forwards the call to this frame. The following diagram shows the workflow that is explained in the paragraphs below.

General Filtering Process

The content is passed to the loadComponentFromURL() call as a com.sun.star.document.MediaDescriptor service, which is implemented here as a sequence of com.sun.star.beans.PropertyValue. In most cases it contains several properties that allow the creation of an object implementing com.sun.star.io.XStream or com.sun.star.io.XInputStream from which the content can be read. It may also contain properties that the code of other objects (filter, model, controller, frame etc.) can use to steer the loading process. If no properties need to be handed over and the content is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor can stay empty. To understand how to work with the MediaDescriptor in the implementation of a filter or elsewhere, especially how to retrieve a stream from it, see its documentation in the chapter about loading documents.
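As a rough illustration of what "a sequence of PropertyValue" means, the following sketch builds such a sequence in plain Python. It does not use the office API: the PropertyValue class here is a simplified stand-in with only a name and a value, and make_media_descriptor() is a hypothetical helper. ReadOnly and Hidden are property names of the MediaDescriptor service.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class PropertyValue:
    """Simplified stand-in for com.sun.star.beans.PropertyValue (name and value only)."""
    Name: str
    Value: Any


def make_media_descriptor(**properties):
    """Build a descriptor as an immutable sequence of name/value pairs."""
    return tuple(PropertyValue(name, value) for name, value in properties.items())


# A descriptor that steers the loading process with two properties.
steering = make_media_descriptor(ReadOnly=True, Hidden=True)

# If the content is given by a URL only, the descriptor can stay empty.
empty = make_media_descriptor()
```

With a running office, a sequence of this shape would be passed as the last argument of loadComponentFromURL(), next to the URL.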

The component loader uses instances of the com.sun.star.frame.FrameLoader or com.sun.star.frame.SynchronousFrameLoader services. Which frame loader instance is used depends on the type of the content. This type must be detected first (see below) based on the TypeDetection configuration, which allows filters or frame loaders to be registered for a particular type. Apache OpenOffice has a generic frame loader service that is used when the detected type has filters registered but no frame loader of its own. If a custom frame loader is registered for a particular type, it is up to that implementation how the content is loaded and whether it uses filters. As the current topic is filters, we concentrate on the generic frame loader here.

Note: The com.sun.star.frame.FrameLoader service is deprecated. If a custom frame loader is registered, it should be a com.sun.star.frame.SynchronousFrameLoader service.

To load content with a filter, it must first be detected which filter is the right one to use and which document type this filter needs in order to work properly. As basically any content type may be loaded into any available document type, and even the same content type could be loaded into the same document type in different ways, many filters may be registered for a particular content type. Finding the right one by evaluating what is passed in the MediaDescriptor is the job of the com.sun.star.document.TypeDetection service. The result of this detection is the name of the content type, the name of the wanted filter and the service name of the document model that is the target of the loading process. These results are placed into the MediaDescriptor so that code in other objects called later can use that information. By providing either the type name of the content or the document service name in the MediaDescriptor handed over to the component loader, the search for a filter can be narrowed down to the subset of filters that match these criteria. By providing a filter name in the MediaDescriptor, the detection can even be bypassed completely (the component loader still adds the matching type and document service names to the MediaDescriptor). As the whole type detection process is based entirely on the configuration, it is described in the chapter about the TypeDetection configuration.
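The narrowing and bypassing described above can be modelled as follows. This is a sketch in plain Python, not the actual detection code: the registry is a hypothetical dictionary whose entries are illustrative and not copied from the real configuration, the descriptor is modelled as a dictionary, and complete_descriptor() is an invented name. FilterName, TypeName and DocumentService are property names of the MediaDescriptor service.

```python
# Hypothetical registry: filter name -> (type name, document service name).
FILTERS = {
    "writer8": ("writer8", "com.sun.star.text.TextDocument"),
    "Text": ("generic_Text", "com.sun.star.text.TextDocument"),
    "calc8": ("calc8", "com.sun.star.sheet.SpreadsheetDocument"),
    "Text - txt - csv (StarCalc)": (
        "generic_Text",
        "com.sun.star.sheet.SpreadsheetDocument",
    ),
}


def complete_descriptor(descriptor, detected_type):
    """Return a copy of the descriptor with filter, type and document service set."""
    result = dict(descriptor)
    filter_name = result.get("FilterName")
    if filter_name is not None:
        # A given filter name bypasses detection; type and service are filled in.
        type_name, service = FILTERS[filter_name]
        result.update(TypeName=type_name, DocumentService=service)
        return result
    # A given type name or document service narrows the search.
    wanted_type = result.get("TypeName", detected_type)
    wanted_service = result.get("DocumentService")
    for name, (type_name, service) in FILTERS.items():
        if type_name != wanted_type:
            continue
        if wanted_service is not None and service != wanted_service:
            continue
        result.update(FilterName=name, TypeName=type_name, DocumentService=service)
        return result
    raise LookupError("no registered filter matches the descriptor")
```

In this model, the same plain text type ends up in a text document by default, but in a spreadsheet if the caller asks for the spreadsheet document service.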

The next steps are managed by the generic com.sun.star.frame.SynchronousFrameLoader service, to which the component loader hands over the target frame. The frame loader creates a document of the wanted type using the document service name found in the MediaDescriptor. It also takes the detected filter name, asks the com.sun.star.document.FilterFactory service to create the filter, initializes it with any necessary parameters and asks it to import the content into the new document (this is described in the chapter about filters). If all of this succeeds, it attaches the document to the target frame by creating a Controller object for the document model.
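The order of these steps can be written down as a short model. The sketch below is plain Python and only imitates the sequence: generic_frame_loader(), the two factory callables and the EchoFilter class are hypothetical, documents and frames are modelled as dictionaries, and the method names set_target_document() and filter() merely mirror XImporter.setTargetDocument() and XFilter.filter().

```python
class EchoFilter:
    """Hypothetical import filter that records what it was asked to import."""

    def set_target_document(self, document):
        self.document = document

    def filter(self, descriptor):
        self.document["content"] = "imported from " + descriptor["URL"]
        return True


def generic_frame_loader(descriptor, frame, create_document, create_filter):
    """Create the document, run the import filter, then attach a controller."""
    document = create_document(descriptor["DocumentService"])
    import_filter = create_filter(descriptor["FilterName"])
    import_filter.set_target_document(document)
    if not import_filter.filter(descriptor):
        return False  # nothing is attached to the frame if the import fails
    frame["controller"] = {"model": document}
    return True


frame = {}
loaded = generic_frame_loader(
    {
        "URL": "file:///tmp/letter.odt",  # illustrative URL
        "DocumentService": "com.sun.star.text.TextDocument",
        "FilterName": "writer8",
    },
    frame,
    create_document=lambda service: {"service": service},
    create_filter=lambda name: EchoFilter(),
)
```

The controller is created only after the import has succeeded, which matches the order given in the paragraph above.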

Storing content

A MediaDescriptor is passed to storeToURL() or storeAsURL() of the interface com.sun.star.frame.XStorable, which is implemented by office documents. It contains several properties that allow the creation of an object implementing com.sun.star.io.XStream or com.sun.star.io.XOutputStream to which the content can be written. It may also contain properties that give more information about how the storing process should be carried out. If no properties need to be handed over and the target file is specified by a URL only, the URL can be passed as an explicit argument and the MediaDescriptor can stay empty. To understand how to work with the MediaDescriptor in the implementation of a filter or elsewhere, especially how to retrieve a stream from it, see the documentation about it in the chapter about loading documents.

If the MediaDescriptor contains a type name or a filter name, the suitable export filter will be created using the FilterFactory. If neither of them is provided, the document will be stored with the latest ODF filter.
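The selection rule for the export filter can be sketched as follows. This is plain Python and not the office implementation: the lookup table and its entries are illustrative assumptions, the descriptor is modelled as a dictionary, and the constant stands for "the latest ODF filter" of a text document only. The order in which filter name and type name are checked is also an assumption of this sketch.

```python
# Assumption: stands in for the latest ODF filter of a text document.
DEFAULT_ODF_FILTER = "writer8"

# Hypothetical mapping from type names to export filter names.
EXPORT_FILTER_BY_TYPE = {
    "writer8": "writer8",
    "writer_MS_Word_97": "MS Word 97",
}


def choose_export_filter(descriptor):
    """Pick the export filter name from the descriptor, else fall back to ODF."""
    if "FilterName" in descriptor:
        return descriptor["FilterName"]
    if "TypeName" in descriptor:
        return EXPORT_FILTER_BY_TYPE[descriptor["TypeName"]]
    return DEFAULT_ODF_FILTER
```

An empty descriptor therefore leads to the ODF filter, while a descriptor that names a type or a filter selects that export filter instead.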

Content on this page is licensed under the Public Documentation License (PDL).