From Apache OpenOffice Wiki

This page collects various architectural deficiencies of OOo (aka the pet peeves of various people), and lists the areas where work is in progress to improve the architecture.

Depending on the counting method, OOo consists of approximately 7E6 lines of code, the overwhelming majority being C++; each of the other languages (Java, Perl, Basic, Python) accounts for an order of magnitude less. This sheer size is a problem in and of itself: the code base is notorious for crashing various software engineering tools, or slowing them down to a crawl, from debuggers to dependency analysis to reverse design extraction.

The code itself varies greatly in quality, style, and age (differences in age invariably leading to differences in quality and style, if you recall the history and evolution of C++), with some parts virtually unmodified for 10+ years and others only recently written from scratch.

Taken together, this leads to a lot of complexity and redundancy, both of which are very hard to remove. What follows are some concrete instances of the aforementioned symptoms.

Infrastructure Improvements

  • Speeding up the build system, and maybe even making it consider global dependencies. Currently, OOo has the notion of modules, which approximately map to top-level directories in the build tree, and automatic build-time dependency calculation is only available at the intra-module level.
  • Making the actual design more accessible, improving upon existing solutions like LXR or Bonsai. Ultimately, this should make refactoring the source code both much easier and much safer than today, by providing information about where and how specific functionality is used. A prerequisite for that would be a parser that really knows about C++; gccxml might be a starting point.
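
To make the idea of global dependencies concrete, here is a minimal sketch of the question a global dependency calculation would answer: given one changed file, which files anywhere in the tree need rebuilding? It is not how the OOo build system works; the `DependentsGraph` type, the `affectedBy` function, and any file paths fed to it are hypothetical and chosen only for illustration.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Maps a file to the files that directly depend on it (reverse dependency
// edges). Hypothetical type: edges are free to cross module boundaries.
using DependentsGraph = std::map<std::string, std::vector<std::string>>;

// Returns every file that has to be rebuilt when `changed` is modified,
// following dependency edges transitively, regardless of module.
std::set<std::string> affectedBy(const DependentsGraph& dependents,
                                 const std::string& changed)
{
    std::set<std::string> affected;
    std::queue<std::string> pending;
    pending.push(changed);
    while (!pending.empty()) {
        const std::string current = pending.front();
        pending.pop();
        const auto it = dependents.find(current);
        if (it == dependents.end())
            continue; // nothing depends on this file
        for (const std::string& dependent : it->second) {
            // insert().second is true only for files not seen before,
            // so each file is queued at most once, even with cycles.
            if (affected.insert(dependent).second)
                pending.push(dependent);
        }
    }
    return affected;
}
```

With intra-module dependency tracking only, the traversal would effectively stop at the module boundary; a global graph lets a change to a low-level header reach its dependents in every other module.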

Runtime System Improvements

This is about making the implementation languages safer and easier to use. What follows could also be subsumed under "transparency on the implementation level". When something can be used transparently, or appears transparent to a user, it is an implementation aspect she need not care about. Being able to program in an environment that is transparent with regard to many such aspects empowers the developer to focus on the problem at hand, without having to litter her code with mundane tasks such as memory management or locking.

  • Make threading transparent. Currently, fulfilling the thread-safety contract of a UNO component is
  1. tedious work, because normally each involved object has to acquire a mutex on method entry and release it on method exit
  2. almost impossible to get right, let alone to verify (no races, no deadlocks), because of the sheer number of objects and mutexes involved: the number of distinct states that a proper verification would have to check is intractable for anything but the most trivial examples.
    The upcoming UNO Threading framework makes thread safety transparent by automatically locking and unlocking when a component is entered or exited, at a much coarser level than single methods.
  • Make other mundane concerns transparent, such as memory management (via garbage collection, or via reference counting with smart pointers, e.g. UNO references) or transactionality.
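
The difference between per-method locking and locking at the component boundary can be sketched in plain C++ (C++14 or later). This is not the UNO Threading framework and does not use the UNO API; `Counter`, `Guarded`, and `enter` are hypothetical names made up for this illustration.

```cpp
#include <mutex>
#include <utility>

// Hypothetical component with two pieces of state that must stay consistent.
// It contains no locking code of its own.
class Counter {
public:
    void add(int n) { total_ += n; ++calls_; }
    int total() const { return total_; }
    int calls() const { return calls_; }
private:
    int total_ = 0;
    int calls_ = 0;
};

// Hypothetical boundary wrapper: a single mutex guards the whole component,
// and it is taken once per entry into the component, not once per method.
template <typename Component>
class Guarded {
public:
    // Runs `work` with exclusive access to the wrapped component and
    // returns whatever `work` returns.
    template <typename Work>
    auto enter(Work&& work) {
        // The lock is released when this scope ends, also on exceptions.
        std::lock_guard<std::mutex> lock(mutex_);
        return std::forward<Work>(work)(component_);
    }
private:
    std::mutex mutex_;
    Component component_;
};
```

A caller would write something like `guarded.enter([](Counter& c) { c.add(2); c.add(3); });`, so that several method calls run under one lock acquisition. The component author writes no locking code, and there is only one mutex per component to reason about, which is the kind of coarsening the bullet above describes.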