Calc/Performance/Specific Bottlenecks
Specific bottlenecks to be worked on, identified using tools such as valgrind --tool=callgrind.
The Zaske case
Comparison with Excel 2003/2007, which needs 1.2s where Calc needs 24s after changing a cell's value.
References:
- Zaske's blog entry
- The test case file (.zip)
- Video on YouTube (same as on the blog)
Findings: lots of formulas directly or indirectly refer to the input cell, with many listening to identical ranges.
Fix: Use the existing bulk broadcaster, which was already used for mass changes, for single cell changes as well, to prevent repeated broadcasts of identical ranges.
Fixed as i95967 in CWS calcperf03. Now Calc does it in 1.2s too.
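The idea behind the fix can be illustrated with a minimal sketch. This is not Calc's actual bulk broadcaster: Range, BulkBroadcaster and all of their members are hypothetical names made up for this example. It only shows how collecting ranges while a bulk is open collapses identical ranges into a single notification.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a cell range; not Calc's ScRange.
struct Range {
    int col1, row1, col2, row2;
    bool operator<(const Range& o) const {
        return std::tie(col1, row1, col2, row2) <
               std::tie(o.col1, o.row1, o.col2, o.row2);
    }
};

// Hypothetical bulk broadcaster: queues ranges while a bulk is open and
// notifies each distinct range once when the outermost bulk is closed.
class BulkBroadcaster {
public:
    void enterBulk() { ++depth_; }

    // Queue a range; duplicates collapse because std::set keeps unique keys.
    void broadcast(const Range& r) {
        assert(depth_ > 0 && "broadcast() called outside of a bulk");
        pending_.insert(r);
    }

    // Close one nesting level. Returns the number of notifications sent,
    // which is 0 unless this call closed the outermost bulk.
    std::size_t leaveBulk() {
        assert(depth_ > 0);
        if (--depth_ > 0) return 0;
        const std::size_t sent = pending_.size();
        notified_.insert(notified_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        return sent;
    }

    // Ranges that were actually notified, in notification order.
    const std::vector<Range>& notified() const { return notified_; }

private:
    int depth_ = 0;
    std::set<Range> pending_;
    std::vector<Range> notified_;
};
```

With this sketch, broadcasting the same range many times between enterBulk() and leaveBulk() results in one notification for that range instead of one per formula cell.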
The Ou case
Loading a large plain data file takes very long.
References:
Findings:
- source/filter/xml/xmlsubti.cxx
- 38% of time spent in ScMyTables::NewColumn() because of repeated use of aTableVec[nTableCount - 1] (vector::operator[])
Note: the percentage may be off because the build was compiled without optimization to obtain exact line numbers, which may cause STLport's vector methods to be compiled differently.
- Proposed fix: obtain the pointer once instead.
- Similar for other places where aTableVec[xxx] is used.
- TODO: Check all ScMyTables::.*() and ScMyTableData::.*()
- Especially for 63342857 calls to AddColumn() and NewColumn() that result in 1168654944 calls to operator[] ...
- 63081776 calls to AddColumn() originate from ScXMLTableRowCellContext::EndElement()
- Those are highly suspicious and seem to indicate that too many temporary elements are created for empty columns/cells (needs verification).
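The proposed fix of obtaining the element once can be sketched as follows. TableData and both functions are hypothetical stand-ins, not the real ScMyTableData or ScMyTables::NewColumn(); the fields are made up to give the functions something to update.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ScMyTableData; the fields are illustrative only.
struct TableData {
    int columnCount = 0;
    int repeatSum = 0;
};

// Before: every statement repeats the vector lookup.
void newColumnRepeated(std::vector<TableData>& tables,
                       std::size_t tableCount, int repeat) {
    assert(tableCount >= 1 && tableCount <= tables.size());
    tables[tableCount - 1].columnCount += 1;
    tables[tableCount - 1].repeatSum += repeat;
}

// After: look the element up once and reuse the reference.
void newColumnHoisted(std::vector<TableData>& tables,
                      std::size_t tableCount, int repeat) {
    assert(tableCount >= 1 && tableCount <= tables.size());
    TableData& current = tables[tableCount - 1];
    current.columnCount += 1;
    current.repeatSum += repeat;
}
```

Both functions have the same effect. In an optimized build the compiler may already remove the repeated lookups, so the gain should be measured rather than assumed.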
Sorting values within functions
i89976 has a document attached:
test-huge_calculations-Median-detailed.ods
NOTE: The assumptions made by the submitter as documented in the test case are plain wrong.
Findings when testing with filling C4:C3003
- 52% overall in interpr3.cxx lcl_QuickSort() and below, of which
- 32% in vector<double>::operator[] and below,
- 25% originating from the loops
while (ni <= nHi && rSortArray[ni] < rSortArray[nLo]) ni++;
while (nj >= nLo && rSortArray[nLo] < rSortArray[nj]) nj--;
where rSortArray[nLo] should be a temporary variable instead.
Alternatively, all of this could be implemented using a plain double[].
- 21% overall in ScValueIterator::GetThis() and below.
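A minimal sketch of the two suggestions for the sort loops, the pivot held in a temporary and a plain double array instead of vector<double>, might look like this. It is not the code of lcl_QuickSort(); quickSort and sortValues are hypothetical names, and the partition scheme is a textbook one chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Sorts a[lo..hi] in ascending order. The pivot is read once into a local
// instead of being fetched as a[lo] on every comparison.
static void quickSort(double* a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    const double pivot = a[lo];
    std::ptrdiff_t i = lo + 1;
    std::ptrdiff_t j = hi;
    for (;;) {
        while (i <= hi && a[i] < pivot) ++i;  // skip elements below the pivot
        while (j > lo && pivot < a[j]) --j;   // skip elements above the pivot
        if (i >= j) break;
        std::swap(a[i], a[j]);
        ++i;
        --j;
    }
    std::swap(a[lo], a[j]);  // move the pivot to its final position
    quickSort(a, lo, j - 1);
    quickSort(a, j + 1, hi);
}

// Entry point working on a plain array of doubles.
void sortValues(double* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    if (count < 2) return;
    quickSort(values, 0, static_cast<std::ptrdiff_t>(count) - 1);
}
```

The sketch always picks the first element as pivot, so already sorted input degrades to quadratic time and deep recursion. It is meant to show the hoisted pivot only, not a tuned sort.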
Querying data within functions
An internal customer's document (sorry, can't publish) doing lookup queries that don't fit into the current caching strategy.
Findings:
- 8% in 51613353 calls to com::sun::star::i18n::casefolding::getNextChar() via
- 39696595 calls to utl::TransliterationWrapper::isEqual() via
- ScTable::ValidQuery() via
- 8888 calls to ScQueryCellIterator::GetThis() via
- lcl_LookupQuery()
- 5% in ScTable::ValidQuery(), mostly in String() and ~String() of aCellStr
- 200873636 calls to com::sun::star::i18n::casefolding::getNextChar() via
- 33173401 calls to com::sun::star::i18n::Transliteration_caseignore::compare()
- 5% in com::sun::star::i18n::oneToOneMappingWithFlag::find()
- Repeated mpIndex[high] access; using a temporary pointer might be better.
- 5% in com::sun::star::i18n::casefolding::getValue()
- 58% overall in ScTable::ValidQuery() and below
- TODO: Cache results of ValidQuery()? Similar to ScLookupCache?
- 11% overall in 27341713 calls to ScBroadcastAreaSlot::StartListeningArea() and below, of which 10% are in ::std::set::insert() and below.
- TODO: refactor implementation of broadcast slots.
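As a rough illustration of the ValidQuery() caching TODO, the sketch below folds each cell string once and remembers the result per query string, so repeated lookups avoid per-character case folding. It is an assumption about what such a cache could look like, not ScLookupCache or any existing Calc class. QueryCache and foldCase are hypothetical names, and the fold is ASCII-only, whereas real code would go through the i18n transliteration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// ASCII-only case fold, used here purely for illustration.
std::string foldCase(const std::string& s) {
    std::string out(s);
    for (char& c : out)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Hypothetical cache for case-insensitive equality lookups in one column.
class QueryCache {
public:
    explicit QueryCache(const std::vector<std::string>& column) {
        folded_.reserve(column.size());
        for (const auto& cell : column)
            folded_.push_back(foldCase(cell));  // fold each cell only once
    }

    // Returns the first row equal to query ignoring case, or -1 if none.
    long lookup(const std::string& query) {
        const std::string key = foldCase(query);
        const auto hit = results_.find(key);
        if (hit != results_.end()) {
            ++hits_;
            return hit->second;
        }
        long row = -1;
        for (std::size_t i = 0; i < folded_.size(); ++i) {
            if (folded_[i] == key) {
                row = static_cast<long>(i);
                break;
            }
        }
        // Sanity check: a found row must really hold the folded key.
        assert(row == -1 || folded_[static_cast<std::size_t>(row)] == key);
        results_.emplace(key, row);
        return row;
    }

    // Number of lookups answered from the cache.
    std::size_t hits() const { return hits_; }

private:
    std::vector<std::string> folded_;
    std::unordered_map<std::string, long> results_;
    std::size_t hits_ = 0;
};
```

Any such cache would have to be discarded whenever a cell in the queried range changes, which this sketch does not handle.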