Performance/Buffered File IO

From Apache OpenOffice Wiki


Buffered File I/O


  • Reducing the number of write() and/or read() system calls.
  • Eliminating unnecessary complexity in the implementation.


  • Use the higher-level input and output functions of the C standard library that comes with modern operating systems.
  • Replace OpenOffice.org's somewhat legacy, in-house implementations with the C standard library.



On modern operating systems such as Windows, Linux, and Solaris (and probably Mac OS), access to a local disk device is well tuned for the case where many read() and/or write() system calls are issued within a short period. The technique is called deferred write-back caching, or lazy write.

In contrast, access to a remote file server via a network protocol such as Network File System (NFS) or Common Internet File System (CIFS), or access to a removable device such as a USB memory stick, cannot by its nature be tuned to the same degree.

The Microsoft article "Windows XP and Surprise Removal of Hardware" says "for consumer-oriented removable storage (USB, Flash, Zip, and so on), write caching is disabled by default." This implies that each write() system call might cause a physical access to the device, or might issue a remote procedure call to send data to a file server.

Input and Output Capability in the C Standard Libraries

The C standard library provides programmers with a set of fundamental C functions, including input and output capabilities.

The library normally ships with the operating system, since the operating system itself relies on it. Programmers can use it at no extra cost.

The input and output functions are declared in the header file 'stdio.h'. It provides buffered file I/O functions such as fopen(), fread(), fwrite(), fprintf(), ...

The reason why buffered file I/O is needed is described well in "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie. In short, buffered file I/O reduces the number of read() and/or write() system calls by buffering data in user space.

In general, when writing data to a file, if the buffer has enough room, fwrite() simply copies the outgoing data into a local buffer and returns to its caller immediately. It does not call the write() system call. Only when the buffer runs out of space does fwrite() call write() to flush the buffer and make room.

When reading data from a file, if the buffer holds enough data, fread() simply copies the requested data from the local buffer to the caller's area. If the buffer is empty, fread() calls the read() system call to fill the buffer and then copies the data from the buffer.

In this way, the number of write() and/or read() system calls is reduced.
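
The write-side mechanism described above can be sketched with a toy buffer. This is only an illustration, not the implementation used by any real C library: the names toy_buf, toy_buf_write() and toy_buf_flush(), the 4096-byte buffer size, and the syscalls counter are all assumptions made for this example, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define TOY_BUF_SIZE 4096   /* assumed size; real libraries choose their own */

/* Hypothetical user-space write buffer, for illustration only. */
struct toy_buf {
    int fd;                  /* destination file descriptor */
    size_t used;             /* bytes currently held in data[] */
    unsigned long syscalls;  /* number of write() calls issued so far */
    char data[TOY_BUF_SIZE];
};

static void toy_buf_init(struct toy_buf *b, int fd)
{
    assert(b != NULL);
    b->fd = fd;
    b->used = 0;
    b->syscalls = 0;
}

/* Push everything held in the buffer to the descriptor.
 * Returns 0 on success and -1 on error (minimal handling: no EINTR retry). */
static int toy_buf_flush(struct toy_buf *b)
{
    size_t off = 0;

    while (off < b->used) {
        ssize_t n = write(b->fd, b->data + off, b->used - off);
        b->syscalls++;
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    b->used = 0;
    return 0;
}

/* Copy len bytes into the buffer; write() is called only when it is full. */
static int toy_buf_write(struct toy_buf *b, const void *src, size_t len)
{
    const char *p = src;

    while (len > 0) {
        size_t room = TOY_BUF_SIZE - b->used;
        if (room == 0) {
            if (toy_buf_flush(b) != 0)
                return -1;
            room = TOY_BUF_SIZE;
        }
        size_t chunk = len < room ? len : room;
        memcpy(b->data + b->used, p, chunk);
        b->used += chunk;
        p += chunk;
        len -= chunk;
    }
    return 0;
}
```

With this sketch, 1000 calls of toy_buf_write(&b, "Q\n", 2) only copy 2000 bytes into the buffer; no write() is issued until toy_buf_flush() is called, which then typically needs a single write().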


In the current implementation of OpenOffice.org, there is some room to improve the performance of sequential file I/O.

Writing data to a file

vcl/source/gdi/pdfwriter_impl.cxx has PDFWriterImpl::writeBuffer() for internal use.

writeBuffer() is called frequently from many places in the same file. One such call is writeBuffer( "Q\n", 2 ).

writeBuffer() simply calls osl_writeFile() in the System Abstraction Layer (SAL).

osl_writeFile(), implemented in sal/osl/unx/file.cxx for UNIX and in sal/osl/w32/file.cxx for Windows, directly calls write() and WriteFile(), respectively.

Calling write() or WriteFile() this frequently with such small pieces of data is expensive and consequently lowers file I/O performance.
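
For comparison, the same kind of small writes can go through a stdio stream. The following is a hedged sketch, not OpenOffice.org code: emit_tokens() is a hypothetical helper that stands in for repeated writeBuffer( "Q\n", 2 ) calls, and how many write() system calls actually result depends on the buffer size the C library chooses for the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for repeated writeBuffer( "Q\n", 2 ) calls.
 * Each fwrite() normally just copies two bytes into the stream's buffer;
 * the C library calls write() only when that buffer fills up or when the
 * stream is flushed or closed.
 * Returns 0 on success and -1 if any fwrite() comes up short. */
static int emit_tokens(FILE *out, int count)
{
    assert(out != NULL);
    for (int i = 0; i < count; i++) {
        if (fwrite("Q\n", 1, 2, out) != 2)
            return -1;
    }
    return 0;
}
```

A caller would open the output with fopen(), call emit_tokens(), and finish with fclose(), which flushes whatever is still buffered. If a specific buffer size is wanted, setvbuf() can be called right after fopen().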

Reading data from a file

Some inefficient file access patterns can be observed by tracing system calls with the 'strace' command on Linux.


strace -o truss.log -e trace=all /opt/openoffice.org3/program/soffice.bin
grep -v ENOENT truss.log | less


open("/opt/openoffice.org3/program/bootstraprc", O_RDONLY|O_LARGEFILE) = 7
read(7, "[Bootstrap]\nBaseInstallation=${O"..., 79) = 79
_llseek(7, -67, [12], SEEK_CUR)         = 0
read(7, "BaseInstallation=${OOO_BASE_DIR}"..., 79) = 79
_llseek(7, -46, [45], SEEK_CUR)         = 0
read(7, "InstallMode=<installmode>\nProduc"..., 79) = 79
_llseek(7, -53, [71], SEEK_CUR)         = 0
read(7, " 3.0\nUs"..., 79) = 79
_llseek(7, -49, [101], SEEK_CUR)        = 0
read(7, "UserInstallation=$SYSUSERCONFIG/"..., 79) = 79
_llseek(7, -29, [151], SEEK_CUR)        = 0
read(7, "[ErrorReport]\nErrorReportPort=80"..., 79) = 79
_llseek(7, -65, [165], SEEK_CUR)        = 0
read(7, "ErrorReportPort=80\nErrorReportSe"..., 79) = 68
_llseek(7, -49, [184], SEEK_CUR)        = 0
read(7, "ErrorReportServer=report.service"..., 79) = 49
_llseek(7, 0, [233], SEEK_CUR)          = 0
read(7, "", 79)                         = 0
close(7)                                = 0

  1. The first read() reads 79 bytes from the file.
  2. One line, "[Bootstrap]\n" (12 bytes), is retrieved from an internal buffer.
  3. The file offset is moved back by 67 bytes (79 - 12) with _llseek(, -67, ).
  4. The second read() reads 79 bytes again.
  5. The same steps are then repeated for each remaining line.

The source code responsible for this pattern appears to be osl_readLine() in sal/osl/all/readline.c.
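
The access pattern seen in the trace can be reproduced with a small sketch. This is not the code of osl_readLine(); it is only an assumption about what the trace suggests: read a fixed 79-byte chunk, take the first line from it, and seek back over the unused bytes. The names naive_read_lines() and io_counts are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 79   /* chunk size taken from the read() calls in the trace */

/* Hypothetical counters for the calls issued while reading one file. */
struct io_counts {
    long reads;   /* read() calls, including the final one returning 0 */
    long seeks;   /* lseek() calls */
    long lines;   /* lines extracted */
};

/* Read a file line by line the way the trace suggests.
 * A line longer than CHUNK bytes is counted once per chunk, which is a
 * simplification of this sketch. Returns 0 on success and -1 on error. */
static int naive_read_lines(const char *path, struct io_counts *c)
{
    char buf[CHUNK];
    int fd;
    int rc = -1;

    assert(c != NULL);
    c->reads = c->seeks = c->lines = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            break;
        c->reads++;
        if (n == 0) {          /* end of file */
            rc = 0;
            break;
        }

        /* Use only the first line of the chunk. */
        const char *nl = memchr(buf, '\n', (size_t)n);
        size_t used = nl ? (size_t)(nl - buf) + 1 : (size_t)n;
        c->lines++;

        /* Seek back over the unused bytes, e.g. 12 - 79 = -67. */
        if (lseek(fd, (off_t)used - (off_t)n, SEEK_CUR) < 0)
            break;
        c->seeks++;
    }

    close(fd);
    return rc;
}
```

For a file of N lines this sketch issues N + 1 read() calls and N lseek() calls, which matches the shape of the trace: nine read() calls and eight _llseek() calls.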

Contents of /opt/openoffice.org3/program/bootstraprc

Its size is 233 bytes.

InstallMode=<installmode> 3.0


If higher-level I/O (fopen() and fgets()) were used instead of calling the low-level open() and read() directly, the trace would become much simpler, like this:

open("/opt/openoffice.org3/program/bootstraprc", O_RDONLY|O_LARGEFILE) = 7
read(7, "[Bootstrap]\nBaseInstallation=${O"..., 4096) = 233
close(7)                                = 0
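
A line reader built on fopen() and fgets() could look like the sketch below. It is an illustration only, not a proposed patch: count_lines() and count_lines_in_file() are hypothetical names, the 256-byte line buffer is an arbitrary choice, and the size of the underlying read() request is decided by the C library rather than by this code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of an open stream using fgets().
 * fgets() takes its data from the stream's buffer, so the C library
 * issues read() only when that buffer is empty; no seeking is needed. */
static long count_lines(FILE *in)
{
    char line[256];
    long lines = 0;
    int mid_line = 0;   /* set while inside a line longer than the buffer */

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        if (len > 0 && line[len - 1] == '\n') {
            lines++;
            mid_line = 0;
        } else {
            mid_line = 1;
        }
    }
    if (mid_line)
        lines++;        /* last line without a trailing newline */
    return lines;
}

/* Open a file by name, count its lines, and close it again.
 * Returns -1 if the file cannot be opened. */
static long count_lines_in_file(const char *path)
{
    FILE *in = fopen(path, "r");
    long lines;

    if (in == NULL)
        return -1;
    lines = count_lines(in);
    fclose(in);
    return lines;
}
```

A real replacement would process each line instead of counting it, but the I/O pattern would be the same: one fopen(), a loop over fgets(), and one fclose().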


The same pattern can be seen with the following files at startup



Operating System's Implementations

OpenOffice.org's Implementations
