Difference between revisions of "Calc/Proposal DataPilot byIBM"

From Apache OpenOffice Wiki
(Analyzing result)
 
(10 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
= DataPilot Performance Enhancement I =
 +
 
  {| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0;  border: 1px #cccccc solid; border-collapse: collapse; width: 50%"
 
  {| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0;  border: 1px #cccccc solid; border-collapse: collapse; width: 50%"
  
Line 4: Line 6:
 
| colspan="2" bgcolor="#cccccc"  | '''Specification Status'''  
 
| colspan="2" bgcolor="#cccccc"  | '''Specification Status'''  
 
|-
 
|-
| width="150" | '''Author''' || width="150" |[[User:wangxum|Wang Xu Ming]]
+
| width="150" | '''Author''' || width="150" |[[User:wangxum|Wang Xu Ming]], [[User:helenyue|Helen Yue]]
 
|-
 
|-
 
| width="150" | '''Last Change''' || width ="150"|See wiki history
 
| width="150" | '''Last Change''' || width ="150"|See wiki history
Line 10: Line 12:
 
|}
 
|}
 
== Background==
 
== Background==
DataPilot is a critical function to Spreadsheet users.  
+
DataPilot is a critical function to Spreadsheet users. In Symphony 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed the new UI and some new features for DataPilot based on OpenOffice 1.1 code base and also with some code merged from OpenOffice 2.4.
In 1.2 and 1.3 release, IBM Lotus Symphony Spreadsheet team developed several new features for DataPilot base on OpenOffice 1.1 code base and merged DataPilot related code in OpenOffice 2.4.
+
  
During the development, test team found that there is serious performance problem when user create or update a DataPilot table.
+
With our new UI, users can drag and drop to generate the DataPilot table based on the page/column/row/data fields, and change the structure easily. Furthermore, the function like DataPilot usually will process thousands of data records. Performance is one of our major focus.  
  
Then develop team enhanced the performance of the algorithm that created and output a DataPilot table, and will continue working on it in 2.0 version which use OpenOffice 3.1 code base.
+
During the development, test team found that there was serious performance problem when user create or update a DataPilot table. This specification will describe what performance issues we have found, and the enhancement we did on the algorithm that created and output a DataPilot table. Although the code base (2.4 and 3.1) is different, the solution should also apply to the new code base.
  
== Problem Description ==
+
We are currently merging the code to OpenOffice.org 3.1 code base and will continue the enhancement work.
 +
 
 +
== Test Result ==
  
 
'''Low performance when update a datapilot table'''
 
'''Low performance when update a datapilot table'''
  
Test team tested several operations to a sample DataPilot table which have 5000 rows data source.  
+
We defined 6 scenario, and test it to a sample DataPilot table which have 5000 rows as data source.  
  
 
Test environment: Hardware: IBM T30  CPU: 2.4 GHz  Memory:1.0 GB Operation System: Window XP SP2
 
Test environment: Hardware: IBM T30  CPU: 2.4 GHz  Memory:1.0 GB Operation System: Window XP SP2
  
Below table is the test result to OpenOffice 3.0.0:
+
Below table is the test result to OpenOffice 3.0.0. Scenario 3 and 4 showed big delays to generate the table, while another office product can complete all the scenarios within 3 seconds.
  
{| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0;  border: 1px #cccccc solid; border-collapse: collapse; width: 50%"
+
{| border="2" cellpadding="4" cellspacing="0" style="margin: 1em 1em 1em 0;  border: 1px #cccccc solid; border-collapse: collapse; width: 70%"
 
|-
 
|-
 
| width="150" bgcolor="#dddddd" | '''Test Scenario'''|| width="150" bgcolor="#dddddd"|'''Open Office 3.0.0'''
 
| width="150" bgcolor="#dddddd" | '''Test Scenario'''|| width="150" bgcolor="#dddddd"|'''Open Office 3.0.0'''
Line 60: Line 63:
 
Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash.
 
Insert two field into row area ( Each field have about 1000 members ),causes freezing and crash.
  
 
+
== Code Analysis ==
== Analyzing result ==
+
 
First, create a complex layout DataPilot table, and use rational quantify create a report for its performance.
 
First, create a complex layout DataPilot table, and use rational quantify create a report for its performance.
  
 
[[Image:quantify report.jpg]]
 
[[Image:quantify report.jpg]]
  
From above table, we can see the top three "F time (% of Focus)" is three functions:
+
From above table, we can see the top three "F time (% of Focus)" functions:
  
 
* new  
 
* new  
Line 72: Line 74:
 
* SfxItemSet::==
 
* SfxItemSet::==
  
Then scan and debug the code, get three cause:
+
Then we scan and debug the code, get three root causes for the performance issue:
  
 
'''Allocate a lot of abundant data'''
 
'''Allocate a lot of abundant data'''
Line 90: Line 92:
 
Some borders are set twice or more.
 
Some borders are set twice or more.
  
== Solution ==
+
== Solution Description ==
  
 
'''Data Source buffer'''
 
'''Data Source buffer'''
Line 99: Line 101:
  
 
Then in the output table's algorithm the ScDPItemData structure is replaced by an id.
 
Then in the output table's algorithm the ScDPItemData structure is replaced by an id.
 +
 
'''Only allocate visible member'''
 
'''Only allocate visible member'''
 +
 
'''Enhance the algorithm of setting border style'''
 
'''Enhance the algorithm of setting border style'''
 +
 +
 +
[[Category:Calc|Proposal_DataPilot_byIBM]]

Latest revision as of 10:52, 24 June 2009

DataPilot Performance Enhancement I

Specification Status
Author Wang Xu Ming, Helen Yue
Last Change See wiki history

Background

DataPilot is a critical function for spreadsheet users. In the Symphony 1.2 and 1.3 releases, the IBM Lotus Symphony Spreadsheet team developed a new UI and several new features for DataPilot on the OpenOffice 1.1 code base, with some code merged from OpenOffice 2.4.

With the new UI, users can drag and drop fields into the page, column, row, and data areas to generate a DataPilot table, and can easily change its structure. Because a function like DataPilot typically processes thousands of data records, performance is one of our major focuses.

During development, the test team found serious performance problems when a user creates or updates a DataPilot table. This specification describes the performance issues we found and the enhancements we made to the algorithm that creates and outputs a DataPilot table. Although the code bases (2.4 and 3.1) differ, the solution should also apply to the new code base.

We are currently merging the code into the OpenOffice.org 3.1 code base and will continue the enhancement work.

Test Result

Low performance when updating a DataPilot table

We defined 6 scenarios and ran each of them against a sample DataPilot table with a 5000-row data source.

Test environment: Hardware: IBM T30; CPU: 2.4 GHz; Memory: 1.0 GB; Operating system: Windows XP SP2

The table below shows the test results for OpenOffice 3.0.0. Scenarios 3 and 4 showed long delays in generating the table, while another office product can complete all the scenarios within 3 seconds.

#   Test Scenario                                                                               Open Office 3.0.0
1   Page: 1 Column: 2 Row: 1 Data: 1; Action: Add Product to Row                                3.15s
2   Page: 1 Column: 2 Row: 1 Data: 1; Action: Product ID to Data                                3.06s
3   Page: 1 Column: 3 Row: 3 Data: 1; Action: Add Product to Row                                25.28s
4   Page: 1 Column: 3 Row: 3 Data: 1; Action: Add Product ID to Data                            44.21s
5   Page: 1 Column: 2 Row: 2 Data: 3; Action: Add SalesRep to Data                              6.28s
6   Page: 1 Column: 2 Row: 1 Data: 1; Action: Change the function of Revenue from Sum to Max    6.03s

Crash

Inserting two fields into the row area (each field has about 1000 members) causes freezing and a crash.

Code Analysis

First, we created a DataPilot table with a complex layout and used Rational Quantify to generate a performance report for it.

Quantify report.jpg

In the report above, the three functions with the highest "F time (% of Focus)" are:

  • new
  • OutputDevice::DrawLine
  • SfxItemSet::==

We then scanned and debugged the code and found three root causes of the performance issue:

Allocating a lot of redundant data

Consider a simple DataPilot table:

Simple dptable.jpg

Member A1 in the L1 field creates an array for all members {B1, B2, B3}, but only B1 is visible and valid.

Allocating too much memory

Each member's data is stored in a large structure.

Setting border styles for the output area too many times

Some borders are set two or more times.

Solution Description

Data Source buffer

Each document stores an array of source buffers, and every DataPilot table has a buffer id. DataPilot tables that have the same data source can share the same buffer id.

Within a buffer, each member of a field is identified by an id (its sorted index).

In the algorithm that outputs the table, the ScDPItemData structure is then replaced by this id.
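The idea of a shared source buffer with member ids can be illustrated with a minimal C++ sketch. It is not the Symphony or OpenOffice implementation: the names FieldBuffer and SourceBufferRegistry are hypothetical, members are plain strings instead of ScDPItemData, and keying the registry by a source-range string is an assumption made only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one field (column) of a buffered data source.
// Each distinct member is stored once; a member's id is its sorted index.
class FieldBuffer {
public:
    explicit FieldBuffer(const std::vector<std::string>& columnValues)
        : members_(columnValues) {
        // Sort and de-duplicate so that the position in members_ is the id.
        std::sort(members_.begin(), members_.end());
        members_.erase(std::unique(members_.begin(), members_.end()),
                       members_.end());
        // Store only a small id per source row instead of a full item.
        rowIds_.reserve(columnValues.size());
        for (const std::string& value : columnValues)
            rowIds_.push_back(idOf(value));
    }

    // Id of a member that is known to exist in this field.
    std::size_t idOf(const std::string& value) const {
        auto it = std::lower_bound(members_.begin(), members_.end(), value);
        assert(it != members_.end() && *it == value);
        return static_cast<std::size_t>(it - members_.begin());
    }

    const std::string& memberOf(std::size_t id) const { return members_.at(id); }
    std::size_t memberCount() const { return members_.size(); }
    const std::vector<std::size_t>& rowIds() const { return rowIds_; }

private:
    std::vector<std::string> members_;  // sorted, unique members
    std::vector<std::size_t> rowIds_;   // member id for every source row
};

// Hypothetical per-document registry: tables that name the same data
// source receive the same buffer id, so the buffer is built only once.
class SourceBufferRegistry {
public:
    std::size_t bufferIdFor(const std::string& sourceRange) {
        auto it = ids_.find(sourceRange);
        if (it != ids_.end())
            return it->second;
        const std::size_t id = ids_.size();
        ids_.emplace(sourceRange, id);
        return id;
    }

private:
    std::map<std::string, std::size_t> ids_;
};
```

With this layout, output code can compare and sort members by comparing integers, and look up the full member through memberOf only when the text is actually needed.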

Only allocate visible members
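As a rough illustration of this point, the following C++ sketch compares allocating a child slot for every member of the inner field with allocating only the members that actually occur under each outer member (as in the earlier example, where A1 needs only B1 out of {B1, B2, B3}). The Row type and the function names are hypothetical and are not taken from the DataPilot code; member ids are assumed to be small integers.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical source row: (member id in outer field L1, member id in inner field L2).
using Row = std::pair<std::size_t, std::size_t>;
using ChildMap = std::map<std::size_t, std::set<std::size_t>>;

// For each outer member, record only the inner members that occur with it
// in the source rows. Nothing is allocated for combinations that never appear.
ChildMap visibleChildren(const std::vector<Row>& rows) {
    ChildMap children;
    for (const Row& row : rows)
        children[row.first].insert(row.second);
    return children;
}

// Slots allocated when every outer member gets an array of all inner members.
std::size_t naiveSlotCount(std::size_t outerMembers, std::size_t innerMembers) {
    return outerMembers * innerMembers;
}

// Slots allocated when only visible inner members are kept.
std::size_t visibleSlotCount(const ChildMap& children) {
    std::size_t total = 0;
    for (const auto& entry : children)
        total += entry.second.size();
    return total;
}
```

For three outer members that each occur with exactly one of three inner members, the naive count is 9 slots while the visible count is 3. The gap grows with the product of the member counts, so it matters most for fields with many members.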

Enhance the border-style setting algorithm
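The specification does not describe the enhanced algorithm itself. One possible way to avoid setting the same border more than once, shown here only as a sketch under that assumption, is to collect the outline edges of all output ranges in a set first and apply each edge a single time afterwards. CellRange, Edge, and the function names are hypothetical and do not correspond to OpenOffice classes.

```cpp
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical rectangular block of output cells (all bounds inclusive).
struct CellRange {
    int top;
    int left;
    int bottom;
    int right;
};

enum class Orientation { Horizontal, Vertical };

// An edge is identified by its orientation and its grid position, so the
// bottom edge of one cell and the top edge of the cell below are the same edge.
using Edge = std::tuple<Orientation, int, int>;

// Collect the outline edges of every range; the set keeps each edge once.
std::set<Edge> collectBorderEdges(const std::vector<CellRange>& ranges) {
    std::set<Edge> edges;
    for (const CellRange& r : ranges) {
        for (int col = r.left; col <= r.right; ++col) {
            edges.emplace(Orientation::Horizontal, r.top, col);         // top outline
            edges.emplace(Orientation::Horizontal, r.bottom + 1, col);  // bottom outline
        }
        for (int row = r.top; row <= r.bottom; ++row) {
            edges.emplace(Orientation::Vertical, row, r.left);          // left outline
            edges.emplace(Orientation::Vertical, row, r.right + 1);     // right outline
        }
    }
    return edges;
}

// Number of edge updates made when every range sets its own outline directly.
std::size_t naiveEdgeCalls(const std::vector<CellRange>& ranges) {
    std::size_t calls = 0;
    for (const CellRange& r : ranges) {
        const std::size_t width = static_cast<std::size_t>(r.right - r.left + 1);
        const std::size_t height = static_cast<std::size_t>(r.bottom - r.top + 1);
        calls += 2 * (width + height);
    }
    return calls;
}
```

For two vertically adjacent ranges that are each one row high and two columns wide, setting the outlines directly touches 12 edges, while the collected set contains 10 because the two shared edges appear only once.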
