Sorting Values within Spreadsheet Functions

From Apache OpenOffice Wiki
Jump to: navigation, search


i89976 has a document attached:

test-huge_calculations-Median-detailed.ods

NOTE: Some assumptions made by the submitter as documented in the test case are plain wrong.

Findings when testing with filling C4:C3003

  • 52% overall in interpr3.cxx lcl_QuickSort() and below, of which
    • 32% in vector<double>::operator[] and below,
      • 25% originating from the loops
       while (ni <= nHi && rSortArray[ni]  < rSortArray[nLo]) ni++;
       while (nj >= nLo && rSortArray[nLo] < rSortArray[nj])  nj--;

where rSortArray[nLo] should be a temporary variable instead.

Or all that be realized using simple double[].

  • 21% overall in ScValueIterator::GetThis() and below.


Solution is in CWS DEV300 calcperf03  , besides the loops above being changed to use a temporary value, which benefits other functions, the use of a fully sorted array for functions MEDIAN, PERCENTILE, QUARTILE, LARGE and SMALL was eliminated by using ::std::nth_element() instead. Filling entire column C as described in the document now needs ~12 minutes instead of an hour or so. Excel needed ~15 minutes, but that can't be exactly compared because it was on different machines.

Personal tools