Writer2xhtml

From Apache OpenOffice Wiki

Writer2xhtml is part of Henrik Just's Writer2LaTeX package. There is a wiki page here concerning its use with LaTeX. But (somewhat confusingly for a package called Writer2LaTeX) its Writer2xhtml function is equally good. (There is also a slightly less-developed Calc2xhtml function included.)

Writer2xhtml can be run either within OpenOffice.org (as an extension), or externally (via command line invocation, or embedded into a Java program). However it is run, it exports substantially more complete and better-quality (X)HTML output than OpenOffice.org's built-in XHTML or HTML options.

OpenOffice.org's built-in HTML export was probably fine in 1998, but by today's standards its output is ugly as sin, badly styled, and laden with clunky FONT tags. Its built-in XHTML export is fine if you prefer ODT's "automatic" P and T styles to those that you defined, and if you don't mind content like footnotes being silently omitted.

Writer2xhtml suffers from none of these problems. Its XHTML output is modern, clean, and easily parsed--an important thing if you intend to post-process it with XML-based tools. Its XHTML uses the styles that users defined, rather than those OpenOffice.org uses internally. It does not drop footnotes or screw up image links. It does have some weaknesses. For example, it is arguably excessive about stipulating lang, xml:lang, and dir attributes, though others might say it is only being "precise." Its table output is correct (even for tables with row and column spans, supported in OpenOffice.org only since 2.3 in late 2007), though its tables heavily use (overuse?) the style attribute. These are nits in the grand scheme of things.
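To illustrate why well-formed output matters for post-processing, here is a minimal sketch using only Python's standard library. The SAMPLE document is a hand-written stand-in, not actual Writer2xhtml output, and strip_inline_styles is a hypothetical helper name. It parses the XHTML as ordinary XML and removes every style attribute, the kind of cleanup you might want if you consider the table styling excessive.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hand-written stand-in for exported XHTML (assumption: not real Writer2xhtml output).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" dir="ltr">
  <head><title>Sample</title></head>
  <body>
    <p class="Textbody">Some <em>emphasized</em> text.</p>
    <table>
      <tr><td style="width:50%">A</td><td style="width:50%">B</td></tr>
    </table>
  </body>
</html>"""


def strip_inline_styles(xhtml_text):
    """Parse XHTML as XML, delete every style attribute, return (root, count removed)."""
    root = ET.fromstring(xhtml_text)
    removed = 0
    for element in root.iter():
        if "style" in element.attrib:
            del element.attrib["style"]
            removed += 1
    return root, removed


root, removed = strip_inline_styles(SAMPLE)
# Element names are namespace-qualified, so searches must include the XHTML namespace.
paragraphs = root.findall(f".//{{{XHTML_NS}}}p")
print(removed, len(paragraphs))
```

This only works because the input is well-formed XML; tag soup of the kind the built-in HTML export produces would make ET.fromstring raise a ParseError.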

Beyond clean, correct, complete output, Writer2xhtml has the virtue of a configurable mechanism for mapping between ODT elements/styles and XHTML elements/styles. So italics in your ODT can, for example, be mapped to i or em elements, or whatever you like.
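The idea of such a mapping can be sketched conceptually in Python. This is not Writer2xhtml's configuration format or its implementation; STYLE_MAP and render_span are hypothetical names, and the style names are used purely for illustration. The sketch only shows how a lookup table lets the same styled text come out as em or as i.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping table: character style name -> XHTML element name.
STYLE_MAP = {"Emphasis": "em", "Strong Emphasis": "strong"}


def render_span(style_name, text, style_map=STYLE_MAP):
    """Wrap text in the element mapped to style_name; unmapped styles yield plain text."""
    safe = escape(text)  # escape &, < and > so the result stays well-formed
    element = style_map.get(style_name)
    if element is None:
        return safe
    return f"<{element}>{safe}</{element}>"


# Default mapping: emphasis becomes a semantic em element.
print(render_span("Emphasis", "really"))
# Swapping the table changes the output without touching the rendering code.
print(render_span("Emphasis", "really", {"Emphasis": "i"}))
```

Keeping the mapping in data rather than in code is what makes the choice between i and em a matter of configuration rather than of editing the converter.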
