== Calc Optimization Opportunities ==

Miscellaneous performance optimization opportunities that don't have an own entry under [[Calc/To-Dos/Performance]]/... yet. There are many opportunities in Calc, for various reasons; this list needs extending.

=== Cell size ===

Basic problem: the most basic cell consumes about 50 bytes all told; more complex cells consume far more memory. There are a number of simple and obvious things to re-factor here.

It is trivial to calculate the average cost: create a sheet with several thousand cells in it, measure the heap allocation change on load - with [[memprof]] or some other tool - and divide by the number of cells. Wins are easy to measure the same way.
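
As a back-of-the-envelope sketch (the function and names here are illustrative, not Calc APIs), the measurement reduces to a simple division of the heap delta:

```cpp
#include <cassert>

// Illustrative only: average per-cell cost from two heap snapshots taken
// around loading a sheet with a known number of cells.
long averageCellCost(long nHeapBefore, long nHeapAfter, long nCellCount)
{
    return (nHeapAfter - nHeapBefore) / nCellCount;
}
```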

=== ScBaseCell ===

sc/inc/cell.hxx; code in sc/source/core/data/cell.cxx & cell2.cxx

<code>[cpp,N]
class ScBaseCell
{
protected:
    ScPostIt*       pNote;
    SvtBroadcaster* pBroadcaster;
    USHORT          nTextWidth;
    BYTE            eCellType;      // enum CellType - BYTE saves memory
    BYTE            nScriptType;
};
</code>

Every cell carries this overhead; note that a chunk of it is not necessary for many cells:

* ScPostIt pointer - very, very infrequently used - we have almost no post-it notes per cell.
* SvtBroadcaster - used by cells that are referenced (by a single-cell, i.e. non-range, reference) from another cell - again, a sub-set of all cells.

Solution: a little re-factoring is required, but stealing a bit from eCellType lets us denote a 'special' cell:

<code>[cpp,N]
class ScBaseCell
{
protected:
    USHORT nTextWidth;
    BYTE   eCellType : 7;   // enum CellType - BYTE saves memory
    bool   bSpecial  : 1;   // other information to be looked up elsewhere
    BYTE   nScriptType;
};
</code>

The 'bSpecial' flag could be used to denote that there is a note for this cell (stored in a separate hash), or that this cell has a single-cell dependent. So we can save two thirds of the base size with fairly little effort.
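
A minimal sketch of the "separate hash" idea - all names here are invented for illustration, not the actual Calc data structures. Once pNote moves out of ScBaseCell, notes for the rare cells that have one can live in a per-document side table keyed by a packed cell address; bSpecial merely records that a lookup is worthwhile:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical packed cell address; the bit layout is an assumption.
using CellKey = std::uint64_t;

inline CellKey makeCellKey(unsigned nTab, unsigned nCol, unsigned nRow)
{
    // Assumed packing: 16 bits of tab, 16 of column, 32 of row.
    return (static_cast<CellKey>(nTab) << 48)
         | (static_cast<CellKey>(nCol) << 32)
         |  static_cast<CellKey>(nRow);
}

// Per-document side table: only cells that actually have a note pay for one.
class NoteSideTable
{
    std::unordered_map<CellKey, std::string> maNotes;
public:
    void setNote(CellKey nKey, const std::string& rText) { maNotes[nKey] = rText; }
    bool hasNote(CellKey nKey) const { return maNotes.count(nKey) != 0; }
};
```

The same pattern would serve for the single-cell broadcaster case: the cell itself shrinks, and the cost is paid only by the sub-set of cells that need it.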

=== ScStringCell ===

In Excel, similar strings across the sheet are shared. Our string class includes a sharable, immutable, reference-counted string - but it is not clear that we use this as intelligently as we should, particularly wrt. XML import / export. Possibly we should be trying to detect common strings there with a small back-hash.

All strings should be shared at the workbook level, with a single hash; doing this at the workbook level has nice properties wrt. interoperability with Excel's shared string table.

Mikeleib has made a quick hack for this. Try it out: [http://www.openoffice.org/issues/show_bug.cgi?id=63500 IZ#63500]
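
A hedged sketch of the workbook-level pool (names are invented, not from the Calc code base): intern() hands out a shared, immutable string, so identical cell texts across all sheets share one allocation - which also maps naturally onto Excel's shared string table on import/export:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Workbook-level shared string pool: one hash, strings deduplicated on entry.
class SharedStringPool
{
    std::unordered_map<std::string, std::weak_ptr<const std::string>> maPool;
public:
    std::shared_ptr<const std::string> intern(const std::string& rText)
    {
        std::weak_ptr<const std::string>& rSlot = maPool[rText];
        if (std::shared_ptr<const std::string> pHit = rSlot.lock())
            return pHit;                          // already pooled: share it
        std::shared_ptr<const std::string> pNew =
            std::make_shared<const std::string>(rText);
        rSlot = pNew;
        return pNew;
    }
};
```

Using weak_ptr entries means the pool never keeps otherwise-dead strings alive; a real implementation would also want to prune expired slots.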

=== ScFormulaCell ===

There are a number of problems here:

* Redundant data: apparently we store redundant data for values. Certainly we store the 'rendered' or computed value of the cell regardless of whether it is likely to be rendered. [I don't get that, what are you referring to? --[[User:ErAck|erAck]] 02:06, 19 May 2006 (CEST)]
* ScFormulaCell inherits from svt/inc/listener.h - which has a virtual destructor, hence we have a vtable pointer per instance too (most likely unnecessary), as well as the listener list.
* Document pointer - as in the ScEditCell structure, we lug around a document pointer we should 'know' as implicit context.
* Shared formulae - Excel will 'share' formulae, i.e. very little state is duplicated if you fill a column D with =((A1+B1)/C1)*SQRT(A1) or whatever. Calc, by contrast, duplicates this formula innumerable times. We need to extract immutable, position-independent formula objects, then reference-count and share these - and, of course, elide duplicates on import. This would give a massive memory saving for large sheets; sharing formulae is very common.
* Splitting matrix pieces - it is perhaps possible to split nMatCols, nMatRows (6 bytes), pMatrix (4-8 bytes), and perhaps cMatrixFlag (1 byte) into a derived 'ScMatrixFormula' sub-class specific to matrix formulae: a sub-set of all formulae.

=== In-sheet objects ===

With a relatively modest number of in-sheet objects (which are favorite tools of complex spreadsheet creators) things become horribly slow: 30 seconds to load a small sample [http://www.openoffice.org/issues/show_bug.cgi?id=41164 document] with almost no data or macros and only 240 list boxes.

The sheet objects need creating lazily (at idle) in the svx layer; also there is a floating patch to improve VCL's control management performance - wherein some of the problems lie.

=== Large / complex pivot sheets ===

The existing Data Pilot implementation doesn't have a shared, normalized form of the data (i.e. with each field reduced to an ordinal, for O(1) lookup). We should implement just such a Data Pilot cache, using a representation compatible with Excel's PivotTable cache and populatable from it on import.

=== Threaded calculation ===

Ideally, to scale to hyper-threaded machines, we need to crunch a workbook's dependency graph and then thread the calculation.

Similarly, the process of constructing a Data Pilot cache and (subsequently) collating that data is susceptible to threading.
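
One way to "crunch" the graph, sketched under the stated assumption that it is acyclic (names and representation are illustrative): assign each cell a level equal to the longest dependency chain below it. All cells on the same level are mutually independent, so each level could be farmed out to a pool of worker threads before moving to the next:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// rDeps[i] lists the cells that cell i reads. Returns a level per cell;
// level 0 cells depend on nothing, level n cells only on levels < n.
std::vector<int> calcLevels(const std::vector<std::vector<int>>& rDeps)
{
    std::vector<int> aLevel(rDeps.size(), -1);
    // Memoised depth-first computation of each cell's level.
    std::function<int(int)> levelOf = [&](int i) -> int
    {
        if (aLevel[i] >= 0)
            return aLevel[i];
        int nMax = 0;
        for (int nDep : rDeps[i])
            nMax = std::max(nMax, levelOf(nDep) + 1);
        return aLevel[i] = nMax;
    };
    for (std::size_t i = 0; i < rDeps.size(); ++i)
        levelOf(static_cast<int>(i));
    return aLevel;
}
```

A real scheduler would also have to cope with cycles (iterative references) and with cells whose dependencies are only known after interpretation.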

== Chart Optimization ==

The existing chart component performs extremely poorly; however, it is likely that optimizing chart2 is a better bet. Having said that, there often seem to be a large number of re-calculations / re-renderings of charts on load, perhaps related to a calc/chart2 mis-interaction [jody - substantiate?].

In addition, the chart likes to do (expensive) label / string size calculations for (potentially) tens of thousands of labels it will never use for scatter plots, multiple times each.

[[Category:Calc|Performance/{{SUBPAGENAME}}]]
[[Category:To-Do]]
[[Category:Performance]]