Enabling I/O buffering can substantially reduce elapsed time: in the examples that follow, the buffered runs finished in well under half the time of the unbuffered runs. I/O buffering is disabled by default. As of Build v15.6.4, it can be enabled by setting the runtime parameter "IO-BUFFER-SIZE". The format is "-DIO-BUFFER-SIZE=<size of buffer in bytes>".
Examples of setting buffer sizes:
-DIO-BUFFER-SIZE=4096 --- creates buffers with 4096 bytes
-DIO-BUFFER-SIZE=0 --- allows EC runtime to choose default buffer size based on the file type
Normally to disable buffering you will just omit the parameter altogether. The following will also disable it:
-DIO-BUFFER-SIZE=non-numeric | negative number --- disables I/O buffering
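The rules above can be summarized in code. The following Java sketch is only an illustration of how a value might be classified; the class name IoBufferSetting and its Mode enum are hypothetical and are not part of the EC runtime, and the handling of surrounding whitespace and of values too large for an int are assumptions of this sketch.

```java
/** Hypothetical helper that mirrors the documented IO-BUFFER-SIZE rules. */
final class IoBufferSetting {
    enum Mode { DISABLED, RUNTIME_DEFAULT, FIXED }

    final Mode mode;
    final int bytes; // meaningful only when mode == FIXED

    private IoBufferSetting(Mode mode, int bytes) {
        this.mode = mode;
        this.bytes = bytes;
    }

    /** Classifies a raw parameter value; null stands for "parameter omitted". */
    static IoBufferSetting parse(String raw) {
        if (raw == null) {
            return new IoBufferSetting(Mode.DISABLED, 0); // omitted: buffering stays off
        }
        try {
            int n = Integer.parseInt(raw.trim()); // assumption: whitespace is ignored
            if (n < 0) {
                return new IoBufferSetting(Mode.DISABLED, 0); // negative: off
            }
            if (n == 0) {
                return new IoBufferSetting(Mode.RUNTIME_DEFAULT, 0); // 0: runtime picks a size
            }
            return new IoBufferSetting(Mode.FIXED, n); // positive: explicit size in bytes
        } catch (NumberFormatException e) {
            // Non-numeric input: off. Assumption: values beyond the int range also land here.
            return new IoBufferSetting(Mode.DISABLED, 0);
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"4096", "0", "-1", "abc", null}) {
            System.out.println(raw + " -> " + parse(raw).mode);
        }
    }
}
```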
Choosing the buffer size is a key decision. It is tempting to assume that bigger is always better, and in many situations the largest possible buffer does produce the best result. However, there are many cases where small to moderate buffer sizes work better. For example, a short-running process that uses small files will likely run faster with smaller buffers because it avoids the startup cost of allocating large buffers. Most often there is a “sweet spot” that can only be found through experimentation and knowledge of how the application does its I/O.
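One way to look for that sweet spot is to time the same job at several buffer sizes. The sketch below does this with plain Java stream buffering rather than the EC runtime, so it only illustrates the method of experimenting; the class name BufferSweep, the 200-byte record length, and the list of sizes are assumptions chosen for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical harness: times a record-by-record file copy at several buffer sizes. */
final class BufferSweep {
    static final int RECORD_LENGTH = 200; // assumed fixed record length

    /** Copies src to dst one record at a time; returns elapsed nanoseconds. */
    static long timedCopy(Path src, Path dst, int bufferBytes) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(src), bufferBytes);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(dst), bufferBytes)) {
            byte[] record = new byte[RECORD_LENGTH];
            int n;
            while ((n = in.readNBytes(record, 0, RECORD_LENGTH)) > 0) {
                out.write(record, 0, n);
            }
        }
        return System.nanoTime() - start;
    }

    /** Builds a temporary input file, copies it once per buffer size, returns the timings. */
    static long[] sweep(int recordCount, int... bufferSizes) {
        try {
            Path src = Files.createTempFile("sweep-in", ".dat");
            Path dst = Files.createTempFile("sweep-out", ".dat");
            try {
                byte[] data = new byte[recordCount * RECORD_LENGTH];
                Arrays.fill(data, (byte) 'A');
                Files.write(src, data);
                long[] nanos = new long[bufferSizes.length];
                for (int i = 0; i < bufferSizes.length; i++) {
                    nanos[i] = timedCopy(src, dst, bufferSizes[i]);
                    if (Files.size(dst) != data.length) {
                        throw new IllegalStateException("copy is incomplete");
                    }
                }
                return nanos;
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = {512, 4096, 65_536, 1_048_576};
        long[] nanos = sweep(50_000, sizes); // 50,000 records = 10 MB of test data
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("buffer %,9d bytes: %d ms%n", sizes[i], nanos[i] / 1_000_000);
        }
    }
}
```

Timings from a harness like this vary with the operating system's file cache and with disk speed, so it is worth repeating each run several times and testing against files that resemble production data.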
Below are some examples that contrast buffered and non-buffered I/O. The first example is a simple batch process that reads in a record sequential file containing 1 million records with fixed record lengths of 200 bytes and writes out a file in the same format to create a copy. The absence of the runtime parameter “IO-BUFFER-SIZE” means that no I/O buffering is in effect. This is the default. Although omitting the parameter is the most obvious, and preferred, way to ensure no I/O buffering is in effect, setting the “IO-BUFFER-SIZE” parameter to a non-numeric value or a value less than zero will also turn I/O buffering off.
The elapsed time for the non-buffered run is 23 seconds.
The second example is the same batch process: it reads in a record sequential file containing 1 million records with fixed record lengths of 200 bytes and writes out a file in the same format to create a copy. This time we've enabled I/O buffering by setting “-DIO-BUFFER-SIZE=10000000”. The format is “-DIO-BUFFER-SIZE=<size of buffer in bytes>”. For this test, we created a buffer of 10 million bytes.
The elapsed time for the buffered run is 8 seconds.
The third example is another batch process that reads in a record sequential file containing 5 million records with fixed record lengths of 200 bytes and writes out a file in the same format to create a copy. The absence of the runtime parameter “IO-BUFFER-SIZE” means that no I/O buffering is in effect.
The elapsed time for the non-buffered run is 1 minute and 19 seconds.
The fourth example is the same batch process: it reads in a record sequential file containing 5 million records with fixed record lengths of 200 bytes and writes out a file in the same format to create a copy. This time we've enabled I/O buffering by setting “-DIO-BUFFER-SIZE=10000000”.
The elapsed time for the buffered run is 30 seconds.
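To put the four timings side by side, the short Java sketch below works out the speedup and the input throughput for each pair of runs. It uses only the record counts, record length, and elapsed times reported above; the class and method names are made up for illustration, 1 MB is taken as 1,000,000 bytes, and only the bytes read are counted (the same volume is written again to produce the copy).

```java
/** Worked arithmetic for the four timed runs reported above. */
final class BufferingResults {
    /** Input throughput in MB/s, with 1 MB taken as 1,000,000 bytes. */
    static double megabytesPerSecond(long records, int recordLength, double seconds) {
        return records * (double) recordLength / 1_000_000.0 / seconds;
    }

    /** How many times faster the buffered run was. */
    static double speedup(double unbufferedSeconds, double bufferedSeconds) {
        return unbufferedSeconds / bufferedSeconds;
    }

    static void report(String label, long records, double unbuffered, double buffered) {
        System.out.printf("%s: %.1f MB/s unbuffered, %.1f MB/s buffered, %.2fx faster%n",
                label,
                megabytesPerSecond(records, 200, unbuffered),
                megabytesPerSecond(records, 200, buffered),
                speedup(unbuffered, buffered));
    }

    public static void main(String[] args) {
        report("1 million records", 1_000_000L, 23, 8);
        report("5 million records", 5_000_000L, 79, 30); // 1 min 19 s = 79 s
    }
}
```

In both cases the buffered run is between 2.5 and 3 times faster, which suggests the gain held up as the file grew from 200 MB to 1 GB in these tests.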
Generally, any buffering will be better than no buffering. However, there are situations where the nature of the I/O is such that buffering will not help and could actually hurt. The most obvious is pure random access against a large file. In this case, the overhead of shifting the buffer start and end positions outweighs any gain. Because I/O buffering is enabled as a run time parameter, it's easy to turn it on to see if your application will benefit and remove the parameter if it will not. The key is knowing what kind of I/O your application is doing and experimenting to achieve optimal performance.
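The random-access case can be made concrete with a small simulation. The Java sketch below assumes a simplified model in which a single buffer window is repositioned whenever a requested record falls outside it; this is a hypothetical model for illustration and not a description of how the EC runtime manages its buffers. It counts how often the window has to move for sequential versus random access to the same file.

```java
import java.util.Random;

/** Hypothetical single-window buffer model that counts buffer repositionings. */
final class BufferRefillModel {
    /** Returns how many times the buffer window must move to serve the given record offsets. */
    static long countRefills(long[] offsets, int recordLength, long bufferBytes) {
        long bufStart = -1; // -1 means the buffer is still empty
        long refills = 0;
        for (long off : offsets) {
            boolean hit = bufStart >= 0
                    && off >= bufStart
                    && off + recordLength <= bufStart + bufferBytes;
            if (!hit) {
                bufStart = off; // shift the window to start at the requested record
                refills++;
            }
        }
        return refills;
    }

    static long[] sequentialOffsets(int records, int recordLength) {
        long[] offsets = new long[records];
        for (int i = 0; i < records; i++) {
            offsets[i] = (long) i * recordLength;
        }
        return offsets;
    }

    static long[] randomOffsets(int records, int recordLength, long seed) {
        Random rnd = new Random(seed);
        long[] offsets = new long[records];
        for (int i = 0; i < records; i++) {
            offsets[i] = (long) rnd.nextInt(records) * recordLength;
        }
        return offsets;
    }

    public static void main(String[] args) {
        int records = 100_000;
        int recordLength = 200;
        long bufferBytes = 4_000; // holds 20 records
        System.out.println("sequential refills: "
                + countRefills(sequentialOffsets(records, recordLength), recordLength, bufferBytes));
        System.out.println("random refills:     "
                + countRefills(randomOffsets(records, recordLength, 42L), recordLength, bufferBytes));
    }
}
```

In this model sequential access moves the window once per 20 records, while random access moves it on nearly every read, and each move fills a whole buffer to deliver a single 200-byte record.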