Calc/Proposal DataPilot byIBM

From Apache OpenOffice Wiki
Jump to: navigation, search

DataPilot Performance Enhancement I

Specification Status
Author Wang Xu Ming, Helen Yue
Last Change See wiki history

Background

DataPilot is a critical function to Spreadsheet users. In Symphony 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed the new UI and some new features for DataPilot based on OpenOffice 1.1 code base and also with some code merged from OpenOffice 2.4.

With our new UI, users can drag and drop to generate the DataPilot table based on the page/column/row/data fields, and change the structure easily. Furthermore, the function like DataPilot usually will process thousands of data records. Performance is one of our major focus.

During the development, test team found that there was serious performance problem when user create or update a DataPilot table. This specification will describe what performance issues we have found, and the enhancement we did on the algorithm that created and output a DataPilot table. Although the code base (2.4 and 3.1) is different, the solution should also apply to the new code base.

We are currently merging the code to OpenOffice.org 3.1 code base and will continue the enhancement work.

Test Result

Low performance when update a datapilot table

We defined 6 scenario, and test it to a sample DataPilot table which have 5000 rows as data source.

Test environment: Hardware: IBM T30 CPU: 2.4 GHz Memory:1.0 GB Operation System: Window XP SP2

Below table is the test result to OpenOffice 3.0.0. Scenario 3 and 4 showed big delays to generate the table, while another office product can complete all the scenarios within 3 seconds.

Test Scenario Open Office 3.0.0
Page: 1 Column: 2 Row: 1 Data: 1

Action: - Add Product to Row

3.15s
Page: 1 Column: 2 Row: 1 Data: 1

Action:- Product ID to Data

3.06s
Page: 1 Column: 3 Row: 3 Data: 1

Action:- Add Product to Row

25.28s
Page: 1 Column: 3 Row: 3 Data: 1

Action:- Add Product ID to Data

44.21s
Page: 1 Column: 2 Row: 2 Data: 3

Action:- Add SalesRep to Data

6.28s
Page: 1 Column: 2 Row: 1 Data: 1

Action:-Change the function of Revenue from Sum to Max

6.03s

Crash

Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash.

Code Analysis

First, create a complex layout DataPilot table, and use rational quantify create a report for its performance.

Quantify report.jpg

From above table, we can see the top three "F time (% of Focus)" functions:

  • new
  • OutputDevice::DrawLine
  • SfxItemSet::==

Then we scan and debug the code, get three root causes for the performance issue:

Allocate a lot of abundant data

For a simple datapilot table:

Simple dptable.jpg

Member A1 in L1 field will create an array for all members {B1,B2,B3}. But only B1 is visible and valid.

Allocate too much memories

Every member's data is stored in a big structure.

Set too many times of border styles for output area

Some borders are set twice or more.

Solution Description

Data Source buffer

A document stored a source buffer array. Every table have a buffer id. The datapilot table can use the same id if they have same data source.

In the buffer, the members of a field can be identified by an id( the sorted index ).

Then in the output table's algorithm the ScDPItemData structure is replaced by an id.

Only allocate visible member

Enhance the algorithm of setting border style

Personal tools