Import Performance of XLSX
From Apache OpenOffice Wiki
< Calc | Performance
OOX import issue 96758
A document in xlsx format found somewhere on the internet (Issue 96758 CLOSED FIXED).
Findings:
- Takes more than 20 minutes to load in a debug session
- about 95% of total load time in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (more than 1 second per call)
- ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
- ~100% in ::oox::xls::Xf::writeToPropertySet()
- root cause: multiple XPropertySet accesses while writing font properties
- ~100% in ::oox::xls::Xf::writeToPropertySet()
- ~100% in ::oox::xls::StylesBuffer::writeCellXfToPropertySet()
2008-12-10: First step. Reduce run time of ::oox::xls::Xf::writeToPropertySet().
- Consolidated property set usage to one API call per XF (cell format object). Load time reduced from >20 minutes to 3:45 minutes. Woohoo.
- Still spends 68% of total load time (2:33 minutes) in 1160 calls to ::oox::xls::WorksheetData::convertRowFormat() (132 ms per call)
2008-12-12: Second step. Optimize overall property set usage.
- Changed interfaces of ::oox::PropertyMap and ::oox::PropertySet from property name (string) to property identifiers (integer). Identifiers will be generated on compile time from a text file with all used property names. A process singleton (created on demand) will contain a big vector of property name strings. Saves a few seconds of the total load time.
2008-12-18: Third step. Reduce call count of ::oox::xls::WorksheetData::convertRowFormat() by caching row formats and applying them at row ranges if formatting is equal across the rows.
- Turns out that the 1160 formatted rows could be merged into 13 ranges. This means that the 1160 API calls have been reduced to 13! Load time reduced from 3:40 minutes to 1:07 minutes. Needed time to format the 1160 rows reduced from 2:33 minutes to 2 seconds!