Pdf Import Extension/Current Architecture

From Apache OpenOffice Wiki

Currently, the PDF import extension uses xpdf to parse the PDF file and to generate a stream of low-level output operations, from which an ODF document is synthesized.

This is a bit cumbersome: xpdf is GPL-licensed, so it has to run completely out-of-process from OOo (which is LGPL-licensed). A dedicated replacement parser is in the making (filter/source/pdfimport/pdfparse), but it will take some time to be on par with xpdf.

The way PDF files get imported looks like this:

[Image: Pdf architecture.png]

That is, once triggered from the framework filter configuration, the importer component passes the filename of the PDF file to the xpdf executable, which loads and parses it and writes a stream of fairly low-level drawing commands (like "put a glyph at position (x,y)") to stdout. The office process reads this stream back and puts it into a page-wise tree structure, which is afterwards processed to combine glyphs, polygons etc. into pieces that are more meaningful to the user (draw shapes and actual paragraphs of text).
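As a rough illustration of the read-back step, the following sketch parses a line-based command stream and groups the results page-wise. The command names (startPage, drawGlyph, endPage), their argument order, and the Glyph and Page structs are all hypothetical; the actual wire format between xpdf and the office process is not described on this page.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins: the real wire format and tree classes are not shown here.
struct Glyph { double x = 0, y = 0; std::string text; };
struct Page  { double width = 0, height = 0; std::vector<Glyph> glyphs; };

// Reads one command per line and groups the drawing commands page-wise.
std::vector<Page> readPages(std::istream& in)
{
    std::vector<Page> pages;
    bool inPage = false;
    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream cmd(line);
        std::string op;
        if (!(cmd >> op))
            continue; // skip blank lines

        if (op == "startPage")
        {
            Page p;
            cmd >> p.width >> p.height;
            pages.push_back(p);
            inPage = true;
        }
        else if (op == "drawGlyph" && inPage)
        {
            assert(!pages.empty()); // inPage implies a page was opened
            Glyph g;
            cmd >> g.x >> g.y >> g.text;
            pages.back().glyphs.push_back(g);
        }
        else if (op == "endPage")
        {
            inPage = false;
        }
        // any other command is ignored in this sketch
    }
    return pages;
}
```

Feeding the function a std::istringstream with a few such lines yields one Page per startPage command, each holding the glyphs seen before the matching endPage.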

Tree classes

This is the inheritance graph of the classes representing the graphical document tree:

[Image: Pdfimport-tree-nodes.png]

Text gets merged into paragraph elements (according to locality and general text direction), which in turn reside in frame elements.
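To give an idea of what merging by locality can mean, here is a deliberately simplified sketch. It is not the algorithm used by the extension: it only joins neighbouring text runs that share a baseline and are separated by a small gap, and it assumes left-to-right text that is already in reading order. TextRun, mergeRuns and both tolerance parameters are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical stand-in for a text node in the document tree.
struct TextRun { double x, y, width; std::string text; };

// Merges neighbouring runs that share a baseline and sit close together.
// Assumes left-to-right text and input that is already in reading order.
std::vector<TextRun> mergeRuns(const std::vector<TextRun>& runs,
                               double maxGap, double baselineTolerance)
{
    assert(maxGap >= 0 && baselineTolerance >= 0);
    std::vector<TextRun> merged;
    for (const TextRun& r : runs)
    {
        if (!merged.empty())
        {
            TextRun& last = merged.back();
            const double gap = r.x - (last.x + last.width);
            const bool sameLine = std::fabs(r.y - last.y) <= baselineTolerance;
            if (sameLine && gap >= 0 && gap <= maxGap)
            {
                // Extend the previous run to cover the new one.
                last.width = (r.x + r.width) - last.x;
                last.text += r.text;
                continue;
            }
        }
        merged.push_back(r); // starts a new run
    }
    return merged;
}
```

A real implementation would also have to take the general text direction into account, as mentioned above, and group the resulting lines into paragraphs and frames.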

Output generation classes

This is the interface and the two existing classes generating actual document output:

[Image: Pdfimport-odfgenerator.png]

Specifically, the XmlEmitter interface is defined like this:

   /** Output interface to ODF
   
       Should be easy to implement using either SAX events or plain ODF
    */
   class XmlEmitter
   {
   public:
       virtual ~XmlEmitter() {}
       
       /** Open up a tag with the given properties
        */
       virtual void beginTag( const char* pTag, const PropertyMap& rProperties ) = 0;
       /** Write PCTEXT as-is to output
        */
       virtual void write( const rtl::OUString& rString ) = 0;
       /** Close previously opened tag
        */
       virtual void endTag( const char* pTag ) = 0;
   };
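A minimal sketch of how this interface could be implemented for plain-text XML output follows. To keep it compilable without the office headers, it redeclares the interface with std::string in place of rtl::OUString and a std::map in place of PropertyMap; both stand-ins, as well as the StringEmitter class itself, are assumptions made for illustration and not part of the extension.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-ins (assumptions) so the sketch compiles without the office headers:
// the real code uses rtl::OUString and its own PropertyMap type.
typedef std::string OUStringStandIn;
typedef std::map<std::string, std::string> PropertyMapStandIn;

class XmlEmitter
{
public:
    virtual ~XmlEmitter() {}
    virtual void beginTag( const char* pTag, const PropertyMapStandIn& rProperties ) = 0;
    virtual void write( const OUStringStandIn& rString ) = 0;
    virtual void endTag( const char* pTag ) = 0;
};

// Hypothetical emitter that collects plain XML text in memory.
class StringEmitter : public XmlEmitter
{
    std::string              maOutput;
    std::vector<std::string> maOpenTags; // used to check tag nesting

public:
    virtual void beginTag( const char* pTag, const PropertyMapStandIn& rProperties )
    {
        maOutput += "<";
        maOutput += pTag;
        // Attribute values are written unescaped in this sketch.
        for (PropertyMapStandIn::const_iterator it = rProperties.begin();
             it != rProperties.end(); ++it)
            maOutput += " " + it->first + "=\"" + it->second + "\"";
        maOutput += ">";
        maOpenTags.push_back(pTag);
    }
    virtual void write( const OUStringStandIn& rString )
    {
        maOutput += rString; // written as-is, no escaping
    }
    virtual void endTag( const char* pTag )
    {
        // The tag being closed must be the innermost open one.
        assert(!maOpenTags.empty() && maOpenTags.back() == pTag);
        maOpenTags.pop_back();
        maOutput += "</";
        maOutput += pTag;
        maOutput += ">";
    }
    const std::string& str() const { return maOutput; }
};
```

Calling beginTag("text:p", props), write("Hello") and endTag("text:p") in that order leaves a single text:p element in str(). A SAX-based emitter would forward the same three calls as start-element, characters and end-element events instead of appending to a string.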

Low-level event input

This is the interface, together with its existing implementation, that receives the low-level output commands generated from the PDF file (the "draw glyph at (x,y)" type of input):

[Image: Pdfimport-contentsink.png]

There's one more class of this type in the unit test directory filter/source/pdfimport/test, implementing a stub device that just checks basic event generation sanity.

Specifically, the ContentSink interface is defined like this:

   struct ContentSink
   {
       virtual ~ContentSink() {}
       
       /// Total number of pages for upcoming document
       virtual void setPageNum( sal_Int32 nNumPages ) = 0;
       virtual void startPage( const ::com::sun::star::geometry::RealSize2D& rSize ) = 0;
       virtual void endPage() = 0;
       
       virtual void hyperLink( const ::com::sun::star::geometry::RealRectangle2D& rBounds,
                               const ::rtl::OUString&                             rURI ) = 0;
       
       virtual void pushState() = 0;
       virtual void popState() = 0;
       
       virtual void setFlatness( double ) = 0;
       virtual void setTransformation( const ::com::sun::star::geometry::AffineMatrix2D& rMatrix ) = 0;
       virtual void setLineDash( const ::com::sun::star::uno::Sequence<double>& dashes,
                                 double                                         start ) = 0;
       virtual void setLineJoin( sal_Int8 lineJoin ) = 0;
       virtual void setLineCap( sal_Int8 lineCap ) = 0;
       virtual void setMiterLimit(double) = 0;
       virtual void setLineWidth(double) = 0;
       virtual void setFillColor( const ::com::sun::star::rendering::ARGBColor& rColor ) = 0;
       virtual void setStrokeColor( const ::com::sun::star::rendering::ARGBColor& rColor ) = 0;
       virtual void setBlendMode( sal_Int8 blendMode ) = 0;
       virtual void setFont( const FontAttributes& rFont ) = 0;
     
       virtual void strokePath( const ::com::sun::star::uno::Reference< 
                                      ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;
       virtual void fillPath( const ::com::sun::star::uno::Reference< 
                                    ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;
       virtual void eoFillPath( const ::com::sun::star::uno::Reference< 
                                      ::com::sun::star::rendering::XPolyPolygon2D >& rPath ) = 0;
       
       virtual void intersectClip(const ::com::sun::star::uno::Reference< 
                                        ::com::sun::star::rendering::XPolyPolygon2D >& rPath) = 0;
       virtual void intersectEoClip(const ::com::sun::star::uno::Reference< 
                                          ::com::sun::star::rendering::XPolyPolygon2D >& rPath) = 0;
       
       virtual void drawGlyphs( const rtl::OUString&                               rGlyphs,
                                const ::com::sun::star::geometry::RealRectangle2D& rRect,
                                const ::com::sun::star::geometry::Matrix2D&        rFontMatrix ) = 0;
    
       /// issued when a sequence of associated glyphs is drawn
       virtual void endText() = 0;
       
       /// draws given bitmap as a mask (using current fill color)
       virtual void drawMask(const ::com::sun::star::uno::Sequence<
                                   ::com::sun::star::beans::PropertyValue>& xBitmap,
                             bool                                           bInvert ) = 0;
       /// Given image must already be color-mapped and normalized to sRGB.
       virtual void drawImage(const ::com::sun::star::uno::Sequence<
                                    ::com::sun::star::beans::PropertyValue>& xBitmap ) = 0;
       /** Given image must already be color-mapped and normalized to sRGB. 
    
           maskColors must contain two sequences of color components
        */
       virtual void drawColorMaskedImage(const ::com::sun::star::uno::Sequence<
                                               ::com::sun::star::beans::PropertyValue>& xBitmap,
                                         const ::com::sun::star::uno::Sequence< 
                                               ::com::sun::star::uno::Any>&             xMaskColors ) = 0;
       virtual void drawMaskedImage(const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::beans::PropertyValue>& xBitmap,
                                    const ::com::sun::star::uno::Sequence<
                                          ::com::sun::star::beans::PropertyValue>& xMask,
                                    bool                                             bInvertMask) = 0;
       virtual void drawAlphaMaskedImage(const ::com::sun::star::uno::Sequence<
                                               ::com::sun::star::beans::PropertyValue>& xImage,
                                         const ::com::sun::star::uno::Sequence<
                                               ::com::sun::star::beans::PropertyValue>& xMask) = 0;
   };
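
The following sketch shows what a sanity-checking stub sink might look like. It is not the class from the unit test directory: MiniSink is a reduced, hypothetical version of ContentSink that keeps only the structural events and uses plain types instead of the UNO ones, and the rule that pushState/popState must be balanced within a page is an assumption of this example.

```cpp
#include <cassert>

// Reduced stand-in for ContentSink: only the structural events, with plain
// types instead of the UNO ones (an assumption to keep this self-contained).
struct MiniSink
{
    virtual ~MiniSink() {}
    virtual void setPageNum( int nNumPages ) = 0;
    virtual void startPage( double fWidth, double fHeight ) = 0;
    virtual void endPage() = 0;
    virtual void pushState() = 0;
    virtual void popState() = 0;
};

// Hypothetical checking sink: records whether the event stream is well-formed.
class CheckingSink : public MiniSink
{
    int  mnExpectedPages;
    int  mnSeenPages;
    int  mnStateDepth;
    bool mbInPage;
    bool mbValid;

public:
    CheckingSink()
        : mnExpectedPages(-1), mnSeenPages(0), mnStateDepth(0),
          mbInPage(false), mbValid(true) {}

    virtual void setPageNum( int nNumPages )
    {
        assert(nNumPages >= 0);
        mnExpectedPages = nNumPages;
    }
    virtual void startPage( double fWidth, double fHeight )
    {
        // Pages must not nest and must have a positive size.
        if (mbInPage || fWidth <= 0 || fHeight <= 0)
            mbValid = false;
        mbInPage = true;
    }
    virtual void endPage()
    {
        // Assumed rule: all pushed states are popped before the page ends.
        if (!mbInPage || mnStateDepth != 0)
            mbValid = false;
        mbInPage = false;
        ++mnSeenPages;
    }
    virtual void pushState() { ++mnStateDepth; }
    virtual void popState()
    {
        if (mnStateDepth == 0)
            mbValid = false; // pop without a matching push
        else
            --mnStateDepth;
    }

    // True if every event seen so far was consistent and the announced
    // number of pages was delivered.
    bool isSane() const
    {
        return mbValid && !mbInPage && mnStateDepth == 0
            && mnSeenPages == mnExpectedPages;
    }
};
```

A test would drive such a sink with the events produced for a known PDF file and then check isSane(); a stream with an unmatched popState, for instance, is reported as not sane.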