Bouncer

From Apache OpenOffice Wiki
Revision as of 17:20, 7 October 2008 by Fma (Talk | contribs)

What is bouncer?

OpenOffice.org bouncer is the main distribution platform for OpenOffice.org downloads. It is an open-source project of Oregon State University. The current version of OpenOffice.org bouncer is 2. Some other major open-source projects also use bouncer; for example, Mozilla.org uses bouncer version 1.

The main differences between versions 1 and 2 are:

  • Version 1 supports GeoIP (explained below) - written in Perl
  • Version 2 supports template mechanisms to get download-sets (required for OpenOffice.org) - written in Python

Main reasons to adapt OpenOffice.org bouncer

Some of the reasons named here are required for using bouncer in the future; others are optional. We will try to address some of these topics in the near future. Help is very welcome!

Geographical preselection of mirrors depending on IP, aka. GeoIP

Bouncer should dispatch download requests, both from users via the web page and from the automatic download functionality implemented in OpenOffice.org (also known as the product-update service), to one of the mirror servers, depending on the type of file requested and the location of the requester.

This is extremely important for countries with low IT resources and/or very unreliable/slow connections (e.g. Vietnam). It can take several days to download an OpenOffice.org build in Vietnam, and many people give up on the process. In any case, being assigned a download from a mirror on a different continent is noticeably inefficient. Users don't appreciate inefficiency.
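As a rough illustration of the preselection described above, the following Python sketch orders mirrors by closeness to the requester. It assumes the requester's country and continent have already been looked up from the IP address (the lookup itself is not shown), and the mirror hostnames and region codes are made up for the example.

```python
# Hypothetical mirror list; hostnames and region codes are invented for illustration.
MIRRORS = [
    {"host": "mirror-eu.example.org", "country": "DE", "continent": "EU"},
    {"host": "mirror-as.example.org", "country": "VN", "continent": "AS"},
    {"host": "mirror-na.example.org", "country": "US", "continent": "NA"},
]


def pick_mirrors(country, continent, mirrors=MIRRORS):
    """Return mirrors ordered by closeness: same country, same continent, then the rest."""
    def rank(mirror):
        if mirror["country"] == country:
            return 0
        if mirror["continent"] == continent:
            return 1
        return 2
    # sorted() is stable, so mirrors with equal rank keep their original order.
    return sorted(mirrors, key=rank)
```

A requester located in Vietnam would be offered the Asian mirror first, and the remaining mirrors only as a fallback.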

Supporting extended mirrors for large volumes

OpenOffice.org is available in many languages on many platforms and needs space for its download-sets. The current mirroring has reached its limits and needs to be extended. The current version 2 doesn't support extended mirror-sets with large volumes.

This would also allow us to provide full builds for the newer localizations reaching full-release mass (80% interface and Help). We continue to add localizations, which is a huge advantage for OpenOffice.org, but we need extra capacity to host these additional localized builds.

Failed requests should offer alternative downloads

Not all languages provide a complete set of platforms and operating systems for a version. To avoid failed download requests, we should be able to offer alternative downloads. For example, a customer requests OpenOffice.org Danish 2.4.1 for Windows with JRE, but we only have a version without JRE. In that case we should offer Danish for Windows without JRE. In harder cases, we fall back to a standard default, en-US, for all other languages.
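The fallback order in the Danish example could be sketched as follows. This is only an illustration: the tuple layout (language, version, platform, JRE flag) and the function names are assumptions, not part of bouncer.

```python
def fallback_candidates(lang, version, platform, jre):
    """Yield download-sets to try, from the exact request down to the en-US default."""
    yield (lang, version, platform, jre)
    if jre:
        # Same language, but without the bundled JRE.
        yield (lang, version, platform, False)
    if lang != "en-US":
        # Standard default language as a last resort.
        yield ("en-US", version, platform, jre)


def resolve(request, available):
    """Return the first candidate that actually exists, or None if nothing fits."""
    for candidate in fallback_candidates(*request):
        if candidate in available:
            return candidate
    return None
```

With only a Danish build without JRE available, a request for Danish 2.4.1 for Windows with JRE resolves to the build without JRE instead of failing.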

SOAP interface to get distributions-status, useful for automated one-click-download

When we try to automate the distribution-process it would be helpful to get the download-status of an OpenOffice.org productivity suite.

We need this both for the Bouncer downloads, and for the native-language project download pages. These native-language download pages are used predominantly for testing pre-release builds, but many people prefer to download these updated builds (usually with an enhanced localization), rather than the last stable release. So both sources of statistics are significant.

Same user-account for OOo-Bouncer as OOo-IssueZilla

It would be nice to use the same account for OpenOffice.org bouncer as for IssueZilla.

A single-account login (unified sign-on) is one aim of the current ESC Dashboard effort. It makes sense for us all to work towards this efficiency measure, rather than creating further obstacles for it. We can already use our main OpenOffice.org login for tools like the Issue Tracker, EIS, QATrack and Pootle. Let's add Bouncer to our single-login toolbox.

SSL for log-in

Using plain HTTP for log-in is a security issue.

Anyone with source-control access is already using SSH/SSL, so it's a sensible security-measure which uses common FLOSS tools.

Better logging and statistics

Logstats only allows counting requests by version and platform. Failed requests are not excluded, so the reported number of downloads is too high.

Failed downloads are bad PR for OpenOffice.org. The builds are already discouragingly large, and we can't count on users knowing how to use curl, or necessarily using a download manager or browser with an effective resume function.

Many users will give up after the first failed download. We need to know when this is happening. It would also be useful to know how many download attempts, on the average, are necessary for different geographical locations and/or systems. We would need to combine these stats with the CD/DVD uptake stats (especially for regions where the low quality of Net connections makes install via CD/DVD preferable).

The more we know, the better use we can make of our resources. :)
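To make the wished-for statistic concrete, here is a small Python sketch that computes the average number of attempts per completed download for each region. The record format, one (region, completed) pair per request, is an assumption; the current logs do not provide it.

```python
from collections import defaultdict


def attempts_per_success(records):
    """records: iterable of (region, completed) pairs, one per download request."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for region, completed in records:
        attempts[region] += 1
        if completed:
            successes[region] += 1
    # None marks regions where no download completed at all.
    return {
        region: (attempts[region] / successes[region]) if successes[region] else None
        for region in attempts
    }
```

Counting only completed requests as downloads would also correct the inflated download numbers.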

User-friendlier download URLs

Using Apache's mod_rewrite or similar functionality, it should be possible to generate user-friendlier download URLs, like

http://download.services.openoffice.org/ooo_300_linux_x64_deb_nonjre_de/
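How such a URL might map back to a download request can be sketched in Python. This is not a mod_rewrite configuration; it only illustrates the mapping. The field order (product, version, OS, architecture, package format, JRE flag, language) is guessed from the single example above.

```python
# Assumed field order, inferred from the example URL only.
FIELDS = ("product", "version", "os", "arch", "package", "jre", "lang")


def parse_slug(path):
    """Split a friendly download path into its assumed components."""
    parts = path.strip("/").split("_")
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected download slug: %r" % path)
    info = dict(zip(FIELDS, parts))
    # "nonjre" means the build ships without a bundled JRE.
    info["jre"] = info["jre"] != "nonjre"
    return info
```

For the example path, this yields a German, 64-bit Linux deb package of version 300 without JRE.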

Avoid using &lang in URIs

All bouncer links contain "&lang" as a substring. Some broken clients, as you can see by analyzing the logs, "interpret" this as the HTML entity "&lang;" (note the semicolon), which renders as "〈", and the download is impossible in those cases. Of course, this is not a bouncer problem in itself, but using "&lg" in the URI would do no harm ("&lang" should still be accepted for backwards compatibility).
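Accepting both parameter names is straightforward. The sketch below uses Python's standard urllib.parse; the en-US default and the "product" parameter in the usage example are assumptions made for illustration.

```python
from urllib.parse import parse_qs


def requested_language(query, default="en-US"):
    """Read the language from 'lg', falling back to the legacy 'lang' parameter."""
    params = parse_qs(query)
    for key in ("lg", "lang"):
        if key in params:
            return params[key][0]
    return default
```

Both requested_language("product=OpenOffice.org&lg=de") and the same query with the older "lang=de" return "de", so existing links keep working.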

Involved community members

Please add your email if you want to help us. Thank you very much.

Links

Some other thoughts

  • only check "real" releases, other files won't need all the checks
  • or: only download random chunks of files to verify mirror
  • do load balancing between several Bouncer machines
  • rely on mirrors providing md5sum lists, maybe PGP signed, and verify these first, then check for files afterwards
  • we need better log files
  • the Fedora Mirror Manager might be an option; a contact there would be Matt_Domsch at Dell
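The idea of checking md5sum lists before checking files could look roughly like this in Python. The sketch assumes mirrors publish lists in the usual md5sum output format; fetching the lists and verifying a PGP signature are not shown.

```python
import hashlib


def md5_of(data):
    """Hex MD5 digest of a bytes object, as md5sum would print it."""
    return hashlib.md5(data).hexdigest()


def parse_md5sums(text):
    """Parse md5sum-style lines ('<hex digest>  <filename>') into a dict."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        # md5sum marks binary-mode entries with a leading '*'.
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def mismatches(reference, mirror_list):
    """Return sorted file names whose digest on the mirror is missing or differs."""
    return sorted(n for n, d in reference.items() if mirror_list.get(n) != d)
```

Only the files reported by mismatches() would then need a closer look on that mirror.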

Technical aspects

Bouncer web front end

The bouncer web front end is written in PHP. Every account has access to all functions; there is no separate super-user.

Database

The database system is MySQL. Core tables for download-requests are

  • downloadables -> large table including all mirrors and the links to the files
  • mirrors -> include all active and inactive mirrors
  • files -> all downloadable files
  • product_versions -> descriptive table for product names and versions
  • oss -> os names and the related extension
  • langs -> table with languages

Checking mirror system with sentry

Sentry is the checking system used for all mirrors. The script is written in Python and needs the MySQL and Crypto packages.

# sentry.py is the main script
# python sentry.py -help shows you all command-line options
# the following command starts the full check of the mirror system
python sentry.py -C 4

The Python script uses methods like .executemany(...), which sometimes requires increasing some MySQL values in the database. 'max_allowed_packet' is one value that has to be increased. See the MySQL documentation on dynamic system variables.
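Besides raising 'max_allowed_packet', batches can be kept small on the client side. The following sketch is not taken from sentry. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, and the table name check_log is hypothetical. With a MySQL driver the placeholder style would differ (typically %s instead of ?).

```python
import sqlite3


def insert_in_batches(conn, sql, rows, batch_size=500):
    """Run executemany in slices so that no single call carries too many rows."""
    cur = conn.cursor()
    for start in range(0, len(rows), batch_size):
        cur.executemany(sql, rows[start:start + batch_size])
    conn.commit()


# Stand-in database and hypothetical table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE check_log (mirror_id INTEGER, file_id INTEGER, ok INTEGER)")
rows = [(m, f, 1) for m in range(10) for f in range(200)]
insert_in_batches(conn, "INSERT INTO check_log VALUES (?, ?, ?)", rows)
```

A smaller batch_size trades a few more round trips for smaller individual statements.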

Known bugs

  • Activating or deactivating mirrors and then sorting negates the mirror-status setting. The root cause is that the process command is included inside the order command.
  • Deleting Product-Version is not possible when related file-combination exists.

Test instance

Oregon State University set up a test instance. Big thanks to Lance Albertson and team!

Current status

The current instance is for testing the 'support for extended mirrors' feature (see above). In detail:

  • Mirrors have a new flag called 'extended'. Should be set if a mirror has extended space available.
  • Templates have a new flag called 'extended'. It must be set if a template is for generating download-URLs on extended mirrors. Please keep in mind that if the template is not flagged as extended, the generated URLs will lead to HTTP error 404 on each sentry run!
  • Sentry supports extended mirrors if a mirror flagged as extended exists and an extended template was found to generate download-URLs and validate them.
  • All inactive product-versions plus the related files are deleted from the database. This can't be done via the UI! It reduces overhead inside the database.
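The interplay of the two 'extended' flags can be pictured with a short Python sketch. It is not taken from sentry. In particular, it assumes that templates without the flag are still checked on every mirror, which the description above does not state explicitly. The template name 'ooo300' is made up.

```python
def checkable_pairs(mirrors, templates):
    """Pair each template only with mirrors that are expected to hold its files."""
    pairs = []
    for template in templates:
        for mirror in mirrors:
            if template["extended"] and not mirror["extended"]:
                # An extended download-set on a simple mirror would give HTTP 404.
                continue
            pairs.append((mirror["name"], template["name"]))
    return pairs
```

Under these assumptions, an extended template is only validated against mirrors flagged as extended, which avoids the 404 errors mentioned above.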

Test scenario

General update for:

  • Updated all mirrors which have extended space available
  • Created a template 'ooo300rc4_extended' for download-URL processing

Test case for a Danish OpenOffice.org rc4 Windows without JRE:

Installing the patch in productive system

The patch is integrated into the productive system. All changes are verified and working. The example named above uses extended mirrors by means of a template marked as extended.

Some thoughts about the usage of extended vs. simple mirrors

If you have the choice of putting a download set on an extended mirror or on a simple (that is, space-limited) mirror, keep this in mind:

  • Regions or countries with limited IT infrastructure normally have only simple mirrors and little bandwidth, so it makes sense to put the related localized download sets on a simple mirror.
  • Countries with strong IT infrastructure have high bandwidth and extended mirrors available. Put the related download sets on extended mirrors.
  • Commonly used languages like English (en-US) or French (fr) should be handled separately, because they should be available in all regions (strong AND limited). So the best choice here is a simple mirror.
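The rule of thumb above can be written down as a small Python function. This is only a restatement of the three points. The set of common languages contains just the two examples named in the text, and the boolean infrastructure flag is an assumption.

```python
# Languages that should be available in every region; examples from the text.
COMMON_LANGS = {"en-US", "fr"}


def mirror_type(lang, strong_infrastructure):
    """Suggest 'simple' or 'extended' placement for a localized download set."""
    if lang in COMMON_LANGS:
        # Must be reachable in strong AND limited regions.
        return "simple"
    return "extended" if strong_infrastructure else "simple"
```

For instance, mirror_type("da", True) suggests an extended mirror, while mirror_type("en-US", True) still suggests a simple one.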