Difference between revisions of "Calc/Proposal DataPilot byIBM"

From Apache OpenOffice Wiki
Revision as of 10:27, 23 June 2009

Specification Status
Author: Wang Xu Ming
Last Change: See wiki history

Background

DataPilot is a critical function for spreadsheet users. In the 1.2 and 1.3 releases, the IBM Lotus Symphony Spreadsheet team developed several new DataPilot features based on the OpenOffice 1.1 code base and merged the DataPilot-related code in OpenOffice 2.4.

During development, the test team found a serious performance problem when a user creates or updates a DataPilot table.

The development team then improved the performance of the algorithm that creates and outputs a DataPilot table, and will continue this work in version 2.0, which uses the OpenOffice 3.1 code base.

Problem Description

Low performance when updating a DataPilot table

The test team timed several operations on a sample DataPilot table whose data source has 5000 rows.

Test environment: Hardware: IBM T30; CPU: 2.4 GHz; Memory: 1.0 GB; Operating system: Windows XP SP2

The table below shows the test results for OpenOffice 3.0.0:

Test Scenario (field layout)        | Action                                          | OpenOffice 3.0.0
Page: 1  Column: 2  Row: 1  Data: 1 | Add Product to Row                              | 3.15 s
Page: 1  Column: 2  Row: 1  Data: 1 | Add Product ID to Data                          | 3.06 s
Page: 1  Column: 3  Row: 3  Data: 1 | Add Product to Row                              | 25.28 s
Page: 1  Column: 3  Row: 3  Data: 1 | Add Product ID to Data                          | 44.21 s
Page: 1  Column: 2  Row: 2  Data: 3 | Add SalesRep to Data                            | 6.28 s
Page: 1  Column: 2  Row: 1  Data: 1 | Change the function of Revenue from Sum to Max  | 6.03 s

Crash

Inserting two fields into the row area (each field having about 1000 members) causes the application to freeze and then crash.


Analysis Results

Allocates a lot of redundant data

For a simple DataPilot table:

[Image: Simple dptable.jpg]

Member A1 in the L1 field creates an array for all members {B1, B2, B3}, but only B1 is visible and valid.
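
The cost of this pattern can be illustrated with a minimal Python sketch. It is not the OpenOffice implementation: the function names, the second L1 member A2, and the sample source rows are all hypothetical, and the sketch only contrasts allocating every member combination with allocating the combinations that actually occur in the data source.

```python
from itertools import product


def allocate_all(l1_members, l2_members):
    """Eager strategy: every L1 member gets a slot for every L2 member."""
    return {(a, b): None for a, b in product(l1_members, l2_members)}


def allocate_visible(source_rows):
    """Lazy strategy: only combinations present in the source get a slot."""
    return {(a, b): None for a, b in source_rows}


# Hypothetical sample data: A1 only ever appears together with B1.
l1 = ["A1", "A2"]
l2 = ["B1", "B2", "B3"]
rows = [("A1", "B1"), ("A2", "B2"), ("A2", "B3")]

eager = allocate_all(l1, l2)
lazy = allocate_visible(rows)
print(len(eager), len(lazy))  # 6 3
```

Under the eager strategy the number of slots is the product of the member counts, so two row fields with about 1000 members each would need about 1,000,000 slots, regardless of how many combinations occur in the data.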

Allocates too much memory

Every member's data is stored in a large structure.

Sets border styles for the output area too many times

Some borders are set twice or more.
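
One way to avoid repeated border calls is to collect the requests first and apply each cell edge only once. The following Python sketch is an illustration under assumptions: the `(row, col, edge, style)` request format and the `collapse_border_requests` helper are hypothetical and are not taken from the OpenOffice code.

```python
def collapse_border_requests(requests):
    """Collapse repeated border requests so each cell edge is set once.

    `requests` is an iterable of (row, col, edge, style) tuples. A later
    request for the same edge overrides an earlier one.
    """
    final = {}
    for row, col, edge, style in requests:
        final[(row, col, edge)] = style
    return final


requests = [
    (0, 0, "top", "thin"),
    (0, 0, "left", "thin"),
    (0, 0, "top", "thin"),  # duplicate of the first request
    (0, 1, "top", "thin"),
]
to_set = collapse_border_requests(requests)
print(len(requests), len(to_set))  # 4 3
```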

Solution

Data Source buffer

A document stores an array of source buffers. Every table has a buffer id. DataPilot tables can share the same id if they have the same data source.

In the buffer, the members of a field can be identified by an id (the sorted index).

Then, in the output table's algorithm, the ScDPItemData structure is replaced by an id.

Only allocate visible members

Enhance the algorithm for setting border styles
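
The data source buffer idea can be sketched in Python as follows. This is a simplified illustration, not the proposed implementation: the `SourceBuffer` and `Document` classes, the `buffer_id_for` method, and the source key string are hypothetical names. The sketch shows two things from the description above: tables with the same data source receive the same buffer id, and each member of a field is represented by its sorted index instead of a full item structure.

```python
class SourceBuffer:
    """Holds one data source; members are identified by their sorted index."""

    def __init__(self, rows, field_names):
        self.field_names = list(field_names)
        field_count = len(self.field_names)
        # Sorted unique members per field; the list position is the member id.
        self.members = [sorted({row[i] for row in rows}) for i in range(field_count)]
        lookup = [{m: idx for idx, m in enumerate(ms)} for ms in self.members]
        # Rows are stored as tuples of small integer ids, not full item data.
        self.rows = [tuple(lookup[i][v] for i, v in enumerate(row)) for row in rows]

    def member(self, field, member_id):
        """Resolve a member id back to its value."""
        return self.members[field][member_id]


class Document:
    """Keeps the array of source buffers; the buffer id is the array index."""

    def __init__(self):
        self.buffers = []
        self._id_by_source = {}

    def buffer_id_for(self, source_key, rows, field_names):
        """Return the existing buffer id for a data source, or create one."""
        if source_key not in self._id_by_source:
            self.buffers.append(SourceBuffer(rows, field_names))
            self._id_by_source[source_key] = len(self.buffers) - 1
        return self._id_by_source[source_key]


doc = Document()
rows = [("Pear", "East"), ("Apple", "West"), ("Pear", "West")]
id_a = doc.buffer_id_for("Sheet1.A1:B4", rows, ["Product", "Region"])
id_b = doc.buffer_id_for("Sheet1.A1:B4", rows, ["Product", "Region"])
buf = doc.buffers[id_a]
print(id_a == id_b)      # True
print(buf.rows)          # [(1, 0), (0, 1), (1, 1)]
print(buf.member(0, 1))  # Pear
```

Because the ids are sorted indexes, comparing two ids of the same field gives the same order as comparing the member values, so sorting in the output algorithm can work on integers.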
