HSORT Description & Purpose
HSORT is a program that processes records by filtering, modifying, summing, ordering, or skipping them, or by selecting a subset of them. Its main purpose is to provide an efficient way to manipulate data so that customers can extract business-relevant information.
An elastic_cobol license is needed in order to use the HSORT product.
To execute "EXEC SORT" statements with HSORT instead of the default Linux sort, add the following property to ebp.properties:
HSORT requires DCB entries to get needed I/O information. There is support for both File DCB and Table DCB.
HSORT Error Handling
- Syntax error check:
- All statements that are used should be allowed.
- Check for correctness of SORT FIELDS, INCLUDE/OMIT, INREC/OUTREC, and SUM statements.
- Record length and record type should be specified.
- Record validity check for numeric fields in PD and ZD formats.
- When SUM statement is used there is a check for summation overflow.
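The record validity check for PD and ZD fields can be pictured with a minimal Python sketch. It assumes the conventional mainframe encodings: packed decimal stores two digits per byte with the sign in the last nibble, and zoned decimal stores one digit per byte with the sign in the high nibble of the last byte. The function names are hypothetical, and the exact rules HSORT applies (for example, which sign nibbles it accepts, or whether it checks the zone of leading bytes) are not stated here.

```python
def is_valid_pd(field: bytes) -> bool:
    """Return True if `field` looks like a well-formed packed-decimal (PD) value."""
    if not field:
        return False
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    # Every nibble except the last must be a decimal digit (0-9).
    # Assumption: any of A-F is accepted as the sign nibble.
    return all(d <= 9 for d in digits) and sign >= 0x0A


def is_valid_zd(field: bytes) -> bool:
    """Return True if `field` looks like a well-formed zoned-decimal (ZD) value."""
    if not field:
        return False
    # The low nibble of every byte must be a decimal digit.
    if any((b & 0x0F) > 9 for b in field):
        return False
    # Assumption: leading bytes must carry the zone nibble F.
    if any((b >> 4) != 0x0F for b in field[:-1]):
        return False
    # The high nibble of the last byte is the sign (A-F).
    return (field[-1] >> 4) >= 0x0A
```

For example, `is_valid_pd(bytes.fromhex("12345C"))` accepts the value +12345, while a field containing a non-digit nibble in a digit position is rejected.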
HSORT input and output
The input/output for HSORT could be one of the following:
- physical file on disk
- file in VSAM format (a DB table)
- file in Record-descriptor-word (RDW) format
- input from SYSIPT
- output to PUNCH
The choice depends on what is specified in the JCL and the DCB. The two are expected to match; if the record length or format differs, the JCL overrides the DCB.
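To illustrate the RDW format mentioned in the list above, here is a minimal Python sketch that splits a buffer into variable-length records. It assumes the standard unspanned RDW layout: a 2-byte big-endian length that includes the 4-byte RDW itself, followed by 2 zero bytes. The function name is hypothetical, and whether HSORT's RDW files follow exactly this layout is an assumption.

```python
import struct
from typing import Iterator


def iter_rdw_records(data: bytes) -> Iterator[bytes]:
    """Yield the payload of each record in an RDW-framed buffer."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        # RDW: 2-byte big-endian length (including the RDW itself),
        # then 2 bytes that are zero for unspanned records.
        length, _reserved = struct.unpack_from(">HH", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError(f"invalid record length {length} at offset {offset}")
        yield data[offset + 4 : offset + length]
        offset += length
```

A buffer holding a 3-byte record followed by a 1-byte record would therefore start with the RDW `00 07 00 00` and continue with `00 05 00 00` after the first payload.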
We differentiate between the file formats based on the PROTOTYPE property that is set in the DCB:
"VDB:" -> file from database (DB table), "SYNC:" -> physical file in RDW format
At the moment, HSORT has support for the following JCL statements:
- SORT FIELDS=(<fields>)
- INREC FIELDS=(<fields>)
- OUTREC FIELDS=(<fields>)
OUTREC FIELDS=(3,3,1C'A',10,4, 4X, 6,4,6,4,X'40404040')
- SUM FIELDS=(<fields>)
- INCLUDE COND=(<conditions>)
- OMIT COND=(<conditions>)
- OUTFIL FILES=<number>, <statements>
OUTFIL FILES=1, INCLUDE=(106,15,CH,EQ,C'OSLO')
- OPTION SKIPREC=<number>, NRECS=<number>
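The two examples in the list above can be read as follows. This Python sketch assumes DFSORT-style semantics: positions are 1-based, `nX` inserts n blanks, `nC'...'` repeats a character constant n times, `X'...'` is a hexadecimal constant, and a CH comparison pads a shorter constant with blanks. It also assumes records encoded in codepage 037. The helper names are hypothetical, and HSORT's actual behavior may differ in details.

```python
BLANK = b"\x40"  # EBCDIC blank in codepage 037


def field(record: bytes, position: int, length: int) -> bytes:
    """Return `length` bytes starting at the 1-based `position`."""
    start = position - 1
    if start < 0 or start + length > len(record):
        raise ValueError("field lies outside the record")
    return record[start : start + length]


def outrec_example(record: bytes) -> bytes:
    """Mimic OUTREC FIELDS=(3,3,1C'A',10,4,4X,6,4,6,4,X'40404040')."""
    return b"".join([
        field(record, 3, 3),          # 3,3   -> bytes 3-5 of the input
        "A".encode("cp037") * 1,      # 1C'A' -> one character constant 'A'
        field(record, 10, 4),         # 10,4  -> bytes 10-13 of the input
        BLANK * 4,                    # 4X    -> four blanks
        field(record, 6, 4),          # 6,4   -> bytes 6-9 of the input
        field(record, 6, 4),          # 6,4   -> the same bytes again
        bytes.fromhex("40404040"),    # X'40404040' -> four hex-coded bytes
    ])


def include_ch_eq(record: bytes, position: int, length: int, constant: str) -> bool:
    """Mimic a condition such as INCLUDE=(106,15,CH,EQ,C'OSLO')."""
    # Assumption: the constant is padded with blanks to the field length.
    expected = constant.encode("cp037").ljust(length, BLANK)
    return field(record, position, length) == expected
```

With this reading, a record is kept by `INCLUDE=(106,15,CH,EQ,C'OSLO')` when bytes 106 to 120 contain `OSLO` followed by eleven blanks.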
HSORT prints statistics for the following:
- Number of records read and bytes read
- Number of records written and bytes written
- Input and output record length
Logging can be enabled through the following property in ebp.properties:
hsortlog=(true | false)
HSORT currently supports two EBCDIC codepages: 037 and 277.
The collation sequences used match exactly the collation sequences from MSSQL:
To specify the codepage, add the following property to ebp.properties:
<codepage-number> should be either 037 or 277.
If any other value is provided, HSORT defaults to codepage 037 (SQL_EBCDIC037_CP1_CS_AS).
Note that there is a slight difference between Codepage037 and SQL_EBCDIC037_CP1_CS_AS.
This is true for codepage 277 as well.
There are six parameters defined in ebp.properties which can be used to fine-tune HSORT performance.
- ebp.hsort.bufferSize parameter is used for setting the buffer size for the sort operation. If this parameter is not set, 8192 is used as the default value.
- ebp.hsort.maxItemsPerFile parameter is used for setting the maximum number of records in each temporary file to be sorted. If this parameter is not set, 100000 is used as the default value.
- ebp.hsort.initialSortInParallel parameter is used for enabling parallel sorting in files. Increasing the value of the parameter ebp.hsort.maxItemsPerFile might increase the effectiveness of this feature. Unless this parameter is set to yes or true, the feature is disabled by default.
- ebp.hsort.maxFilesPerMerge parameter is used for setting the maximum number of temporary files to be created for splitting the records to be sorted. If this parameter is not set, 100 is used as the default value.
- ebp.hsort.tempdir parameter is used for setting the directory in which the temporary files are created. If this parameter is not set, the default temporary folder of the underlying operating system is used.
- ebp.hsort.defaultMaxMemory parameter is used for setting the default amount of memory used by HSORT for all jobs. This value is overridden if the STEPMEM step parameter is set in JCS type jobs. K, M or G can be used as unit symbols (1024K, 1024M, 1G, etc.).
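HSORT's internal algorithm is not described here, but the parameter names suggest an external merge sort. The following Python sketch shows that generic technique, not HSORT's implementation: records are sorted in chunks of at most `max_items_per_file`, each chunk is written to a temporary file, and at most `max_files_per_merge` files are merged at a time. The function and argument names are hypothetical and only loosely mirror ebp.hsort.maxItemsPerFile, ebp.hsort.maxFilesPerMerge and ebp.hsort.tempdir; records are compared in plain byte order.

```python
import heapq
import os
import pickle
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def _write_run(items: Iterable[bytes], tempdir: Optional[str]) -> str:
    """Write already-sorted items to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=tempdir)
    with os.fdopen(fd, "wb") as handle:
        for item in items:
            pickle.dump(item, handle)
    return path


def _read_run(path: str) -> Iterator[bytes]:
    """Lazily read back the items of one temporary file."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def external_sort(
    records: Iterable[bytes],
    max_items_per_file: int = 100000,
    max_files_per_merge: int = 100,
    tempdir: Optional[str] = None,
) -> Iterator[bytes]:
    """Yield `records` in ascending byte order using temporary files."""
    if max_items_per_file < 1 or max_files_per_merge < 2:
        raise ValueError("need max_items_per_file >= 1 and max_files_per_merge >= 2")
    source = iter(records)
    runs: List[str] = []
    try:
        # Phase 1: sort fixed-size chunks in memory, one temporary file per chunk.
        while True:
            chunk = sorted(islice(source, max_items_per_file))
            if not chunk:
                break
            runs.append(_write_run(chunk, tempdir))
        # Phase 2: while there are too many files, merge a group into one file.
        while len(runs) > max_files_per_merge:
            group, runs = runs[:max_files_per_merge], runs[max_files_per_merge:]
            runs.append(_write_run(heapq.merge(*map(_read_run, group)), tempdir))
            for path in group:
                os.remove(path)
        # Phase 3: final merge, streamed to the caller.
        yield from heapq.merge(*map(_read_run, runs))
    finally:
        for path in runs:
            if os.path.exists(path):
                os.remove(path)
```

In this model, a larger chunk size means fewer temporary files and fewer merge passes at the cost of more memory per chunk, which is the kind of trade-off the parameters above expose.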
Links to useful resources about commands usage, syntax, etc.: