Difference between revisions of "Office Open XML/Legacy Implementation"

From Apache OpenOffice Wiki
m (Andre moved the page Office Open XML to Office Open XML/Legacy Implementation without leaving a redirect: Keep legacy page for now, create new page for new design.)
 
(30 intermediate revisions by 7 users not shown)
Line 1: Line 1:
"Office Open XML" is an XML based file format that has been published as [http://www.ecma-international.org/publications/standards/Ecma-376.htm ECMA-376]. It is used as default file format by Microsoft Office 2007.
+
= OOXML Basics =
  
There are plans to support this file format in OpenOffice.org for interoperate with Microsoft Office 2007.
+
"Office Open XML" is an XML based file format that has been published as [http://www.ecma-international.org/publications/standards/Ecma-376.htm ECMA-376]. It is used as default file format by Microsoft Office 2007/2010.
  
There are 3 major types of formats
+
There are plans to support this file format in OpenOffice.org for interoperation with Microsoft Office 2007/2010.
 +
 
 +
There are 3 major types of formats with 2 minor types as important supplement.
  
 
* [[WordprocessingML]] - For word processor documents (file extensions may be docx, docm)
 
* [[WordprocessingML]] - For word processor documents (file extensions may be docx, docm)
Line 9: Line 11:
 
* [[PresentationML]] - For presentation documents (file extensions may be pptx, pptm)
 
* [[PresentationML]] - For presentation documents (file extensions may be pptx, pptm)
 
* [[DrawingML]] - Used by other markup language to represent graphics data.
 
* [[DrawingML]] - Used by other markup language to represent graphics data.
 +
* [[VML]] - A legacy vector markup.
 +
 +
== Packaging Conventions ==
 +
OpenXML document is a package that consists of a flat collection of '''"parts"'''.  Each "part" has a case-''insensitive'' part name that consists of a slash (/) delimited sequence of segment names such as "/pres/slides/slide1.xml".
 +
 +
For the most part, the ZIP compression is used to package the parts, in which case the term '''"package"''' refers to the ZIP archive, and the parts refer to the individual files archived within.  The part name in this case is the file path within the archive.
 +
 +
Each part also has a '''content type''', and '''/[Content_Types].xml''' provides the content type of each part within the archive.
 +
 +
== Relationships ==
 +
Packages and parts can contains '''explicit relationships''' to other parts as well as to external resources.  Every explicit relationship has an ID and a type, and relationship types are named using URIs.
 +
 +
The set of explicit relationships for each package or part is stored in a relationship part whose name (or path) follows a specific convention e.g. the relationship part for a part called '''"/a/b/c.xml"''' is called '''"/a/b/_rels/c.xml.rels"'''.  As a special case, the relationship part for the package as a whole is called '''"/_rels/.rels"'''.
 +
 +
= AOO.o Implementation =
 +
 +
Spreadsheet and Presentation import are both in the [[Oox]] module.
 +
Word document import is in the [[WriterFilter]] module.
 +
 +
Some legacy notes:
 +
 +
The initial version of [[Oox]] from the CWS xmlfilter02 has been integrated into SRC680_m243. The continuing [[CWS]] is xmlfilter03 in SRC680. ([http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=SRC680%2Fxmlfilter03 view the workspace on EIS])
 +
 +
To fetch the oox code from CVS (using CVSROOT is set properly):
 +
 +
cvs co -r cws_src680_xmlfilter03 -d oox xml/oox
 +
 +
Bonsai is also convenient to follow the changes:
 +
 +
[http://bonsai.go-oo.org/cvsquery.cgi?treeid=default&module=all&branch=cws_src680_xmlfilter03&branchtype=match&dir=xml%2Foox&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=week&mindate=&maxdate=&cvsroot=%2Fhome%2Fooweb%2Fcvsup  the changes done in the last 2 weeks]
 +
 +
== Oox ==
 +
 +
There is some code in [[Oox]] module  from the [[Xml]] project. To fetch the oox code from SVN:
 +
 +
svn co https://svn.apache.org/repos/asf/openoffice/trunk/main/oox aoo/oox
 +
 +
One important note: we use the term '''fragment''' in the name of our source files to correspond with what the standard calls '''part'''.  For instance, the source file that contains class definition that handles the workbook part in [[SpreadsheetML]] is called workbookfragment.cxx.  This convention is prevalent across all application types within [[Oox]] module.
 +
 +
== Writerfilter ==
 +
 +
To fetch code from [[Writerfilter]] module from SVN:
  
There is some code in the oox module ([[OOX]]) from the [[Xml]] project. The [[CWS]] is xmlfilter02 in SRC680. ([http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=SRC680%2Fxmlfilter02 view the workspace on EIS])
+
svn co https://svn.apache.org/repos/asf/openoffice/trunk/main/writerfilter aoo/writerfilter
  
To fetch the oox code from CVS (using CVS_ROOT is set properly):
+
== Implementation Generalities ==
  
cvs co -r cws_src680_xmlfilter02 -d oox xml/oox
+
The whole [[Oox]] filter makes use of [[FastParser]] service to implement an event driven [[SAX]] parser.
  
<h2>Implementation Generalities</h2>
+
= How to open in OpenOffice.org 3.4 =
  
The whole [[OOX]] filter makes use of the new [[FastParser]] service to implement an event driven [[SAX]] parser.
+
* [[Documentation/FAQ/General/OpeningMSO2007Files|Opening Microsoft Office 2007 files]]
  
<h2>Various Resources</h2>
+
= Various Resources =
  
 +
* [[OpenOffice_filters_using_the_XML_based_file_format|OpenOffice filters using the XML based file format]]
 +
* [http://books.evc-cit.info/ OASIS OpenDocument Essentials]
 
* http://blogs.sun.com/GullFOSS/entry/office_open_xml_import_filter
 
* http://blogs.sun.com/GullFOSS/entry/office_open_xml_import_filter
* [http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=SRC680%2Fxmlfilter02 CWS on EIS]
+
* [http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=SRC680%2Fxmlfilter03 CWS on EIS]
 
* [http://www.ecma-international.org/publications/standards/Ecma-376.htm ECMA published standard]
 
* [http://www.ecma-international.org/publications/standards/Ecma-376.htm ECMA published standard]
 
* [http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm Ecma Office Open XML File Formats Standard]
 
* [http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm Ecma Office Open XML File Formats Standard]
 +
* [http://blogs.msdn.com/brian_jones/archive/2005/06/20/430892.aspx Brian Jones introduction to the Office 12 File Format]
  
 +
[[Category:Filter]]
 
[[Category:Office Open XML]]
 
[[Category:Office Open XML]]

Latest revision as of 07:05, 21 May 2014

OOXML Basics

"Office Open XML" is an XML based file format that has been published as ECMA-376. It is used as default file format by Microsoft Office 2007/2010.

There are plans to support this file format in OpenOffice.org for interoperation with Microsoft Office 2007/2010.

There are three major formats, supplemented by two minor but important ones.

  • WordprocessingML - For word processor documents (file extensions may be docx, docm)
  • SpreadsheetML - For spreadsheet documents (file extensions may be xlsx, xlsm)
  • PresentationML - For presentation documents (file extensions may be pptx, pptm)
  • DrawingML - Used by the other markup languages to represent graphics data.
  • VML - A legacy vector markup.

Packaging Conventions

An OpenXML document is a package that consists of a flat collection of "parts". Each part has a case-insensitive part name consisting of a slash (/) delimited sequence of segment names, such as "/pres/slides/slide1.xml".

In most cases, ZIP is used to package the parts. The term "package" then refers to the ZIP archive, and the parts are the individual files archived within it. The part name in this case is the file path within the archive.

Each part also has a content type, and /[Content_Types].xml provides the content type of each part within the archive.
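
The packaging conventions above can be sketched with Python's standard library. This is only an illustration, not code from the Oox module: the sample package is a hypothetical in-memory archive (not a valid Office document), and the helper names build_sample_package and content_types_by_part are invented for this example. It assumes the usual layout of /[Content_Types].xml, where Default elements map file extensions and Override elements map individual part names to content types.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by /[Content_Types].xml in an OOXML package.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def build_sample_package() -> bytes:
    """Build a tiny, hypothetical package in memory (not a valid Office file)."""
    content_types = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/pres/slides/slide1.xml" ContentType='
        '"application/vnd.openxmlformats-officedocument.presentationml.slide+xml"/>'
        '</Types>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("pres/slides/slide1.xml", "<slide/>")
        zf.writestr("pres/notes.xml", "<notes/>")
    return buf.getvalue()


def content_types_by_part(package: bytes) -> dict:
    """Map each part name (the archive path with a leading slash) to its content type."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        # Part names are case-insensitive, so compare lower-cased keys.
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in root.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName").lower(): o.get("ContentType")
            for o in root.findall(f"{{{CT_NS}}}Override")
        }
        result = {}
        for name in zf.namelist():
            if name == "[Content_Types].xml":
                continue  # the content type stream itself is skipped here
            part_name = "/" + name
            extension = name.rsplit(".", 1)[-1].lower()
            # An Override for the exact part wins over the extension Default.
            result[part_name] = overrides.get(part_name.lower(), defaults.get(extension))
        return result


if __name__ == "__main__":
    for part, ctype in content_types_by_part(build_sample_package()).items():
        print(part, "->", ctype)
```

The lookup is done on lower-cased names because part names are case-insensitive; a real implementation would need to follow the full matching rules of the standard.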

Relationships

Packages and parts can contain explicit relationships to other parts as well as to external resources. Every explicit relationship has an ID and a type, and relationship types are named using URIs.

The set of explicit relationships for each package or part is stored in a relationship part whose name (or path) follows a specific convention. For example, the relationship part for a part called "/a/b/c.xml" is called "/a/b/_rels/c.xml.rels". As a special case, the relationship part for the package as a whole is called "/_rels/.rels".
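
The naming convention and the ID/type pairs described above can be illustrated with a short Python sketch. The function names rels_part_name and parse_relationships, the sample XML, and the example.org relationship type are all hypothetical and chosen only for illustration. The sketch assumes the usual shape of a relationship part: a Relationships root element containing Relationship elements with Id, Type, Target and an optional TargetMode attribute.

```python
import posixpath
import xml.etree.ElementTree as ET

# Namespace used by relationship parts (*.rels) in an OOXML package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def rels_part_name(part_name: str = "/") -> str:
    """Return the name of the relationship part for a part, or for the package ("/")."""
    if part_name in ("", "/"):
        return "/_rels/.rels"  # special case: the package as a whole
    directory, base = posixpath.split(part_name)
    return posixpath.join(directory, "_rels", base + ".rels")


# Hypothetical sample; the second relationship type is made up for the example.
SAMPLE_RELS = f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
      Target="word/document.xml"/>
  <Relationship Id="rId2"
      Type="http://example.org/hypothetical/link"
      Target="http://example.org/" TargetMode="External"/>
</Relationships>"""


def parse_relationships(xml_text: str) -> list:
    """Return (id, type, target, mode) tuples for every explicit relationship."""
    root = ET.fromstring(xml_text)
    return [
        (r.get("Id"), r.get("Type"), r.get("Target"), r.get("TargetMode", "Internal"))
        for r in root.findall(f"{{{REL_NS}}}Relationship")
    ]


if __name__ == "__main__":
    print(rels_part_name("/a/b/c.xml"))
    print(rels_part_name())
    for rel in parse_relationships(SAMPLE_RELS):
        print(rel)
```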

AOO.o Implementation

Spreadsheet and Presentation import are both in the Oox module. Word document import is in the WriterFilter module.

Some legacy notes:

The initial version of Oox from the CWS xmlfilter02 has been integrated into SRC680_m243. The continuing CWS is xmlfilter03 in SRC680. (view the workspace on EIS)

To fetch the oox code from CVS (assuming CVSROOT is set properly):

cvs co -r cws_src680_xmlfilter03 -d oox xml/oox

Bonsai is also a convenient way to follow the changes:

the changes done in the last 2 weeks

Oox

The Oox module contains some code from the Xml project. To fetch the oox code from SVN:

svn co https://svn.apache.org/repos/asf/openoffice/trunk/main/oox aoo/oox

One important note: we use the term fragment in the names of our source files for what the standard calls a part. For instance, the source file that contains the class definition handling the workbook part in SpreadsheetML is called workbookfragment.cxx. This convention is used across all application types within the Oox module.

Writerfilter

To fetch the Writerfilter module code from SVN:

svn co https://svn.apache.org/repos/asf/openoffice/trunk/main/writerfilter aoo/writerfilter

Implementation Generalities

The whole Oox filter makes use of the FastParser service to implement an event-driven SAX parser.
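
To show what event-driven SAX parsing looks like, here is a minimal sketch using Python's standard xml.sax module. It is not the FastParser service and does not reflect its API; it only illustrates the general pattern in which the parser calls back into a handler for each element it encounters. The handler class, the read_sheet_names helper and the simplified workbook snippet are assumptions made for this example.

```python
import xml.sax


class SheetNameHandler(xml.sax.ContentHandler):
    """Collects sheet names as the parser reports start-of-element events."""

    def __init__(self):
        super().__init__()
        self.sheet_names = []

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag; no document tree is built.
        if name == "sheet":
            self.sheet_names.append(attrs.get("name"))


# Simplified, hypothetical workbook part (real ones carry more attributes).
SAMPLE_WORKBOOK = (
    b'<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    b'<sheets>'
    b'<sheet name="Sheet1" sheetId="1"/>'
    b'<sheet name="Data" sheetId="2"/>'
    b'</sheets>'
    b'</workbook>'
)


def read_sheet_names(data: bytes) -> list:
    """Run the SAX parser over the data and return the collected sheet names."""
    handler = SheetNameHandler()
    xml.sax.parseString(data, handler)
    return handler.sheet_names


if __name__ == "__main__":
    print(read_sheet_names(SAMPLE_WORKBOOK))
```

Because the handler reacts to events as they arrive, memory use stays roughly constant regardless of document size, which is the usual reason for preferring this style in import filters.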

How to open in OpenOffice.org 3.4

  • Opening Microsoft Office 2007 files

Various Resources

  • OpenOffice filters using the XML based file format
  • OASIS OpenDocument Essentials
  • http://blogs.sun.com/GullFOSS/entry/office_open_xml_import_filter
  • CWS on EIS
  • ECMA published standard
  • Ecma Office Open XML File Formats Standard
  • Brian Jones introduction to the Office 12 File Format
