Blogging an OpenOffice.org bugfix
Found a trivial, but longstanding bug to fix:
It's a typo, a misspelling in a pop-up menu in the formula editor. Obscure? Yes! I've used the formula editor seriously in the past, and did not know (or had forgotten) that a pop-up menu even existed. It's obscure to others too: this particular bug has been on the books for almost five years. Fixing it just means finding the string, correcting it, then managing to propagate the fix into the main stream of OOo. An expert who knows the code and the processes would find it trivial, but it is the code and the process which are intimidating to the newcomer. Maybe I should try fixing this one, and blogging the experience, as a help to others. There have been longstanding concerns in the community as to how difficult it is to contribute, and there's a fork (ooo-build) whose sole purpose seems to be that it's easier to add to.
A couple of years ago I tried getting started with OOo development and, after some dozens of hours of work, managed to end up having done a build. (By comparison, even other large projects that I've worked on required no more than "cvs co source; make dep; make; make install".) But I never localized the problem I was working on, and just working with the build was so difficult (let alone trying to get my patch accepted) that I let the project slide. Maybe it's time to try again.
First step: source code. I recall that starting from a "developer snapshot" seems like the right way to go. There's no guarantee that the source tree is compilable, being simultaneously worked on by 100+ professionals making changes, hence the whole Child WorkSpace (CWS) integration system for making and integrating discrete changes. Some of these are integrated into snapshots, which provide slightly tested milestones for developers to work from. So I need to figure out which snapshot to get, and how to get it. SVN?
Time spent: ~1 hr.
Some web surfing and reading of mailing lists later, I eventually start googling guesses at what the snapshot would be named, and find the page http://development.openoffice.org/releases/3.1.m39_snapshot.html. Goodness knows what the right way to do this is. And following http://tools.openoffice.org/svn_checkout.html, I should do the checkout it describes.
I'll try that, and see what happens. Looks like I'm getting lots of code...
Time: ~30 min
While my disk is filling with source, I'll do some reading in preparation for attempting a build. I'll follow http://wiki.services.openoffice.org/wiki/AquaBuild, since I'm on OS X 10.5.6, without much else on it, and have Xcode installed (Xcode.app reports version 3.0). Next, Java: I'm supposed to use J2SDK 1.4.2 or 1.5.0, not 1.6, but the Applications/Utilities/Java I'm supposed to use to fix this isn't an app, but a directory. There is an app called "Java Preferences", which reports I'm using J2SE 5.0, or could use J2SE 1.4.2. Uh, SDK != SE, but maybe I can just hope. I also need "gperf", whatever that is, but I have something with that name in /usr/bin. "ccache" is recommended but optional; I'll ignore it until later. I'll also need to do some other futzing.
Time: ~30 min
Eventually (>1 hr?) the code is all here. I create an OO subdirectory under ~/src, so that my tree is ~/src/OO/DEV300_m39, and a Patches directory under OO. I get the seamonkey patch AquaBuild tells me I need, and eventually apply it with "patch -p0 < ../Patches/moz2seamonkey_connectivity.diff". Got the prebuilt moz stuff, still have no idea what an office suite is doing with a web browser. The AquaBuild page implies I need to rename the zips before copying them, but I'm not sure what I'm supposed to rename them to, so I left their names as is, and put them in moz/zipped. Copied the build script, deleted the languages I don't need. Linked the build script. Read through it, looks as okay as it can given how little I know. Tried to run it, didn't work (permissions), fixed and tried again...
Didn't work, there was a stray space in build.sh on the wiki. Fixed it in my code and on the wiki. It completed this time, with lots of output, and a "WARNINGS ISSUED", but completed. Hopefully that's not important, let's go on.
Time: ~30 min
./bootstrap just worked! Amazing.
Now I'm supposed to start the build, either w/ or w/o a "-dlv-switch". Huh? So which command am I supposed to try? Probably with it, since I'm supposed to look at some issue if I "have problems", so with it must be preferred. I try the first command, but build complains that --dontgraboutput is used only with --html... So let's just delete that, and try:
build --all -P4 --dlv_switch -link
This starts spinning, and making lots of output. Good, good. AquaBuild says this will take many hours, last time I tried this a build took ~24 hrs.
Time: ~15 min
Localizing the bug. This is normally a demanding phase, esp. for someone who has little or no experience with a project, and even more so for OO.o, which has on the order of a million lines of code to work through. One would normally need to get some sense of the structure of the code, how things work together, learn enough to guess where to start looking, read lots of code, start recompiling with printf's inserted, to test various hypotheses... Here, though, it's easy, because the problem is a typo, so I can just search for it. Rather than grep on my own machine, I'll use OpenGrok http://svn.services.openoffice.org/opengrok/, which finds the right file in ~30 seconds. It's starmath/source/commands.src:1190.
Time: ~1 min, plus 10x longer to write down how easy it was.
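For anyone without OpenGrok at hand, the same search can be done on the checked-out tree. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep both do); find_string is a made-up helper name, and the search text in the usage comment is a placeholder, not the actual misspelling:

```shell
# find_string DIR TEXT
# Print file:line:content for every literal occurrence of TEXT under DIR.
# (Hypothetical helper; -F makes grep treat TEXT as a fixed string, not a regex.)
find_string() {
    grep -rnF -- "$2" "$1"
}

# Usage, from the top of the source tree (placeholder text, substitute the typo):
#   find_string starmath/source 'MISSPELLED MENU TEXT'
```

Narrowing DIR to a likely module keeps the search quick; pointing it at the whole tree works too, just more slowly.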
The build halted with:
1 module(s): apple_remote need(s) to be rebuilt
ERROR: error 65280 occurred while making /Users/ridgway/src/OO/DEV300_m39/apple_remote
Hm, looks like this milestone isn't tested to compile on OS X. Now we need to do lots of figuring out what happened. There are some binaries in a subdirectory (unxmacxi.pro/slo) of apple_remote, so there was an attempt at compiling. I don't know how to recompile from the directory, though: "make" isn't the answer. Let's just try issuing the "build" command again from the top. I think P4 meant 4 processors, which is too many; I'll change that to 2. Took a long time, and produced the same result. No wait: examining the output closely, it does try to compile something:
Building module apple_remote Running processes: 2 /Users/ridgway/src/OO/DEV300_m39/apple_remote ------------- Running processes: 1 [...]
Making: ./unxmacxi.pro/slo/GlobalKeyboardDevice.obj ------------- gcc -fsigned-char -fmessage-length=0 -malign-natural -c -O2 -fno-strict-aliasing -I. -I./unxmacxi.pro/inc/AppleRemote -I../inc -I./inc/pch -I./inc -I./aqua/inc -I./unx/inc -I./unxmacxi.pro/inc -I. -I/Users/ridgway/src/OO/DEV300_m39/solver/300/unxmacxi.pro/incdont_use_stl -I/Users/ridgway/src/OO/DEV300_m39/solver/300/unxmacxi.pro/inc/external -I/Users/ridgway/src/OO/DEV300_m39/solver/300/unxmacxi.pro/inc -I/Users/ridgway/src/OO/DEV300_m39/solenv/unxmacxi/inc -I/Users/ridgway/src/OO/DEV300_m39/solenv/inc -I/Users/ridgway/src/OO/DEV300_m39/res -I/Users/ridgway/src/OO/DEV300_m39/solver/300/unxmacxi.pro/incdont_use_stl -I/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers -I/System/Library/Frameworks/JavaVM.framework/Headers -I/Users/ridgway/src/OO/DEV300_m39/solver/300/unxmacxi.pro/inc/offuh -I. -I./res -I. -pipe -fsigned-char -malign-natural -Wall -Wendif-labels -Werror -fobjc-exceptions -fPIC -fno-common -DMACOSX -DUNX -DVCL -DGCC -DC341 -DINTEL -DCVER=C341 -DGLIBC=2 -D_PTHREADS -D_REENTRANT -DNO_PTHREAD_PRIORITY -DX86 -DSTLPORT_VERSION=400 -D_USE_NAMESPACE=1 -DQUARTZ -DMAC_OS_X_VERSION_MIN_REQUIRED=1040 -DHAVE_GCC_VISIBILITY_FEATURE -D__DMAKE -DUNIX -DCPPU_ENV=gcc3 -DGXX_INCLUDE_PATH=/usr/include/c++/4.0.0 -DSUPD=300 -DPRODUCT -DNDEBUG -DPRODUCT_FULL -DOSL_DEBUG_LEVEL=0 -DOPTIMIZE -DCUI -DSOLAR_JAVA -DSHAREDLIB -D_DLL_ -o ./unxmacxi.pro/slo/GlobalKeyboardDevice.o GlobalKeyboardDevice.m
But there are never lines like these:
deliver -- version: 266154 Module 'jpeg' delivered successfully. 0 files copied, 7 files unchanged
for module apple_remote. So there's a silent compilation error. Ahh! It's not silent!
cc1obj: warnings being treated as errors
GlobalKeyboardDevice.m: In function '-[GlobalKeyboardDevice registerHotKeyCode:modifiers:remoteEventIdentifier:]':
GlobalKeyboardDevice.m:182: warning: passing argument 5 of 'RegisterEventHotKey' makes integer from pointer without a cast
dmake: Error code 1, while making './unxmacxi.pro/slo/GlobalKeyboardDevice.obj'
The error was just out of order, because of the multiple processes writing simultaneously. And the problem is known: http://qa.openoffice.org/issues/show_bug.cgi?id=96554, with a patch. Download it, apply it with patch -p0 < ../Patches/..., and try to figure out how to follow the instruction: "Attention: if you build and deliver the above module(s) you may prolongue your the build issuing command "build --from apple_remote"". Apparently, says http://tools.openoffice.org/dev_docs/build_linux.html, to do this we "build; deliver" in the module directory. So I try it:
Looks good, even though I had different options on build. Wonder if this will cause a problem. Still, press on! This time, I use the options from earlier, just in case it matters.
:instsetoo_native ridgway$ build --all --from apple_remote --dlv_switch -link -P2
After warning me against doing incompatible builds (as if I would know what was incompatible), the spinning begins.
Time: 1.5 hr
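With -P2 or -P4 the output of several processes is interleaved, which is how the compiler error above ended up far away from the "need(s) to be rebuilt" summary. A sketch of one way to dig failures out, assuming the build output was saved to a file; show_build_errors and build.log are made-up names, and the patterns are only the message shapes quoted in this log, not a complete list:

```shell
# show_build_errors LOGFILE
# Print, with line numbers, the lines of a saved build log that look like failures.
# (Hypothetical helper; patterns taken from the messages seen in this build.)
show_build_errors() {
    grep -nE 'warnings being treated as errors|dmake: +Error code|^ERROR:' "$1"
}

# Usage (saving the log with tee is an assumption; the build options are the ones used above):
#   build --all --dlv_switch -link -P2 2>&1 | tee build.log
#   show_build_errors build.log
```

The line numbers give a place to jump to in the log, where the surrounding compiler output usually says which file and which warning were involved.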
... analyzing files ...
ERROR: The following files could not be found:
ERROR: File not found: libjpipe.jnilib
... cleaning the output tree ...
... removing directory /tmp/ooopackaging/i_885101232338149 ...
**************************************************
ERROR: ERROR: Missing files in function: remove_Files_Without_Sourcedirectory
**************************************************
**************************************************
ERROR: Saved logfile: /Users/ridgway/src/OO/DEV300_m39/instsetoo_native/unxmacxi.pro/OpenOffice/dmg/logging/en-US/log_DEV300_en-US.log
**************************************************
Sun Jan 18 21:09:26 2009 (00:17 min.)
dmake: Error code 255, while making 'openoffice_en-US.dmg'
Running processes: 0
1 module(s): instsetoo_native need(s) to be rebuilt
ERROR: error 65280 occurred while making /Users/ridgway/src/OO/DEV300_m39/instsetoo_native/util
Attention: if you build and deliver the above module(s) you may prolongue your the build issuing command "build --from instsetoo_native"
Is this bad, though? How far did we get? Is there a bundle? No, instsetoo_native/unxmacxi.pro/OpenOffice/dmg just has a logging directory, containing a log which complains about libjpipe.jnilib. Huh, sounds like some Java thing.
Time: 10 min
This one is known, too: http://qa.openoffice.org/issues/show_bug.cgi?id=93516.
build --all --from instsetoo_native --dlv_switch -link -P2
Except that didn't work (same problem). I'll try redoing build; deliver inside jurt/ first, then we'll see.
Time: 30 min
Woot! It says: "Successful packaging process!" All right, now let's see if it runs. It does, and the About... box even mentions my name. Still has the bug I'm fixing, plus the other usual ones besides.
Time: 10 min
Fixing the bug.
Permuting two letters in a string, quite simple.
Time: 10 sec.
Recompiling. Does this funny "build" thing know how to find just the files which need recompiling? We'll find out.
build --all --dlv_switch -link -P2
And off it spins. While I'm waiting...
Making the patch.
DEV300_m39 ridgway$ svn diff starmath/source/commands.src > ../Patches/i30642.diff
And yeah, tested new version, and it looks okay. Now the hard part: getting it submitted, and through Sun's process.
Time: 30 min
Day total: 5.75 hrs
Attached the patch to the issue. Not sure if the paths in the patch file are the usual ones.
Time: ~5 min, once I got the nerve up to actually send in such a trivial change.
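On the path question: svn diff writes an "Index:" header line for each changed file, using the path as it was given on the command line. A minimal sketch for checking what a patch touches before attaching it; patch_paths is a hypothetical helper name, and the idea that paths relative to the top of the tree are what reviewers expect is only an assumption, based on the downloaded patches above applying with patch -p0 from there:

```shell
# patch_paths PATCHFILE
# List the file paths named on the "Index:" header lines of an svn-style diff.
# (Hypothetical helper; diffs without "Index:" lines produce no output.)
patch_paths() {
    sed -n 's/^Index: //p' "$1"
}

# Usage:
#   patch_paths ../Patches/i30642.diff
```

If the listed path starts at a top-level module directory such as starmath/, the patch should apply with patch -p0 from the root of the checkout.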
Now beginning the process. Reading http://contributing.openoffice.org/programming.html#jca, it doesn't look so bad: I need to 1) sign the JCA and 2) get "the committer" to approve the code. The JCA isn't so bad: apparently some people have had a problem with it, but me, I'll sign anything. Downloaded it, filled it out by hand (why is it not a fillable form?), and signed. I even reviewed it, just out of curiosity, which added some time.
Time: 30 min
More work on the JCA (or is it SCA?). Trying the scanner+email route, as I wasn't making progress on faxing. Takes some work to get my scanner to work with my machine; my wife has used it, but I haven't. Apparently it's pretty rare for me to actually need to fax/scan physical paperwork. Sent off an email to email@example.com, we'll see what happens.
Time: ~1 hr
Stuffed this log into my wiki page, too. Now I'm outed.
Time: 10 min
Checked out the issue in IZ again. Looks like the patch got noticed rapidly, amazing. So I guess I got noticed before. There was some discussion as to when to include/fix it, very reasonable, since it's so close to a release. Why didn't I get emailed? Not on the cc list, I guess, but in the past I've gotten emails on issues I've touched without being on the cc list. Interesting.
Time: 15 min
Hm: considering that the patch has been accepted for integration, and given a specific target release for integration, I'm going to accept that my work is done. There remain various bits of hard, unfun work (the whole CWS stuff, etc.), but I can let the leads do this. So all that's left for me is to write up the experience. My total amount of work is a little under 8 hrs. This is almost entirely overhead (the fix was trivial, remember), and a good deal of the overhead was unnecessary in retrospect. In calendar time, I managed to get a working compile on the first day, and built a patch the same day. After the patch was submitted (incorrectly!), I got a response (in fact, responses from three separate individuals!) and a commitment to a specific release target within 15 hrs. Not bad; in fact, far better than I'd expected. Presumably, this part could have been harder if there had been concerns of any kind (copyright, suitability, ...), but it is nevertheless a demonstration of the best-case responsiveness. So when would a user get to see the benefit? The 3.2 release is scheduled for Sep 2009, a full 8 months from now, but one has to look at risk vs. reward. The benefit is very close to zero, and there is always risk, especially so close to code freeze for 3.1. So I'm not arguing.
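The totals can be checked against the "Time:" entries in this log. A small sketch; the minute values are one reading of those entries (the "~1 min, plus 10x longer" entry counted as 11 minutes, the 10-second fix as 0), and sum_minutes is just a helper name:

```shell
# Minutes per "Time:" entry, in log order (approximate, as read from the entries above).
day_one="60 30 30 30 15 11 90 10 30 10 0 30"
follow_up="5 30 60 10 15"

# sum_minutes N...
# Add up the integers given as arguments and print the total.
sum_minutes() {
    total=0
    for m in "$@"; do
        total=$((total + m))
    done
    echo "$total"
}

# The lists are left unquoted on purpose so the shell splits them into arguments.
d1=$(sum_minutes $day_one)
d2=$(sum_minutes $follow_up)
echo "day one:   $d1 min"
echo "follow-up: $d2 min"
echo "total:     $((d1 + d2)) min"
```

That gives 346 minutes (about 5.8 hrs) for the first day and 466 minutes (about 7.8 hrs) overall, in line with the 5.75 hrs day total and the "a little under 8 hrs" figure.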
Overall experience? Substantially better than I'd expected. Perhaps I'll try something more interesting next time.