This is part of the Heirloom Computing EBP Elastic Batch Platform forum on paas.heirloomcomputing.com.
This article discusses setting up EBP in a clustered environment on multiple virtual machine instances so that they work together, sharing common data and spool directories while appropriately managing shared and exclusive access to datasets, unique job numbering, and shared job names. The effect is similar to an IBM JESPLEX spanning multiple LPARs in an IBM System z mainframe environment. When coupled with the Elastic Scheduling Platform (ESP), jobs can be scheduled through a single node, which can be made to start and stop EBP nodes as demand requires and which makes jobs available to those nodes for execution.
There are three primary configuration options (plus one read-only status entry) for setting up an EBP-Plex, as described in EBP Configuration:
- ebpplexnode - the Web service location of this node as seen by other nodes
- ebpplexpeer - the Web service location of one other node in the EBP-Plex
- ebpplexauth - the basic HTTP authentication string (before encryption) used between secure servers
- ebpplexstatus (read-only) - indicates the status of this node in the plex
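As a sketch, a node's plex options might be set as follows. The key=value syntax, hostnames, and credential shown here are illustrative assumptions, not values from this example; consult EBP Configuration for the exact file format:

```
# Hypothetical EBP configuration fragment -- syntax and values are illustrative.
ebpplexnode=http://node1.example.com:8081/ebp   # this node, as other nodes see it
ebpplexpeer=http://node2.example.com:8081/ebp   # any one existing member of the plex
ebpplexauth=ebpuser:s3cret                      # basic HTTP auth string, before encryption
# ebpplexstatus is read-only and is reported by the node; it is not set here.
```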
If these configurations are not set when EBP starts, the node will operate independently of the others, even if the data or spool directories of multiple nodes point to the same location. In a typical EBP-Plex configuration, however, a number of other configuration entries reference the same common spool or data directories so that a job submitted to any one node behaves exactly the same. This is normally accomplished by establishing directories shared among all EBP nodes, for example via NFS, the Hadoop File System, or other techniques that keep data in sync. The synchronization need not use lock semantics because the EBP nodes communicate among themselves to determine whether resources are available. These configurations are:
- classlib, datalib, jcllib, systemlib - indicating the locations of shared Java class libraries (CLASSPATH), shared datasets, JCL cataloged procedures, and executable programs.
- jobspool, outputspool, tempspool - indicating the locations of submitted jobs and output datasets as well as temporary execution directories. These need to be assigned to shared locations only if 3rd party products or scripts will be examining their contents. ESP, in particular, will coalesce all job output after jobs terminate, so that instances may be terminated even though that makes their private spools inaccessible.
- newdsndirectory, newdsnlowercase, etc. - other configurations that affect the way datasets are created, parameters are processed, and symbolics within intermediate data are handled; these should be common among all of the EBP nodes.
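For example, with a shared NFS mount every node might point its libraries at the same locations. The /mnt/ebpshare mount point and the key=value syntax are assumptions for illustration; the entry names come from the list above:

```
# Hypothetical fragment assuming an NFS mount at /mnt/ebpshare on every node.
classlib=/mnt/ebpshare/classes
datalib=/mnt/ebpshare/data
jcllib=/mnt/ebpshare/jcl
systemlib=/mnt/ebpshare/system
jobspool=/mnt/ebpshare/jobspool      # shared only if other tools read the spools
outputspool=/mnt/ebpshare/outspool
tempspool=/tmp/ebp                   # temporary space can stay node-local
```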
In addition to these configurations, the job classes, JEC execution groups, and perhaps the number of initiators are usually common among multiple EBP nodes, but that needn't be the case. An EBP node that lacks a job class, or a started initiator to run a job of a certain class, will not request jobs of those classes from a coordinating ESP.
Virtual Machine Considerations
EBP can be started on small or large virtual machines. On those with a smaller footprint, fewer job initiators should be started, limiting job concurrency. Generally, starting about twice as many initiators as there are cores is a good ratio. If large virtual machines are available, an EBP-Plex may not be necessary - simply running more initiators on a single node may be sufficient to keep up with overnight batch workloads.
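The 2:1 sizing guideline above can be sketched as a small helper. This is a rough rule of thumb only; the function name is ours and not part of any EBP API:

```python
import os

def suggested_initiators(cores: int) -> int:
    """Suggest an initiator count of roughly twice the core count,
    per the rule of thumb above (illustrative, not an EBP API)."""
    return 2 * cores

# Size for the current machine:
print(suggested_initiators(os.cpu_count() or 1))
```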
However, for a properly scalable and highly available EBP environment, multiple EBP nodes combined into an EBP-Plex are recommended. In this way nodes can be configured to start and "join" a cluster (preconfigured with the appropriate options) so that they begin processing work almost immediately. Quiescing a node and waiting for its running jobs to finish is all that is necessary to prepare an EBP virtual machine to be suspended.
When batch jobs run job steps that reference either built-in or customer-written programs that connect to a database, those databases should also be scalable. Some databases scale up, requiring that they be started on nodes that grow in size (CPU, memory) as load increases. Other databases are specifically designed to scale out across a cloud (Oracle RAC, SpliceMachine, Gemstar, NuoDB), so that EBP nodes can be started and stopped in an EBP-Plex and the database can scale in step if necessary.
In the figures below two nodes have been defined: one on localhost, port 8081, and the other on localhost, port 8082. Both have been configured with logging level Debug, allowing EBP-Plex communication messages to be echoed to the console. The first also had its ebpplexnode configured as http://localhost:8081/ebp and its ebpplexpeer as http://localhost:8082/ebp. The second node learned of the EBP-Plex from the first node's communication, so it didn't need any additional manual configuration.
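Concretely, node 1's plex settings from this example might look like the following. The URLs are those used in the figures; the key=value file syntax is an assumption:

```
# Node 1 (http://localhost:8081/ebp) -- hypothetical file syntax
ebpplexnode=http://localhost:8081/ebp
ebpplexpeer=http://localhost:8082/ebp
# Node 2 needs no ebpplexnode/ebpplexpeer entries here: it learns of the
# plex from node 1's first communication.
```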
Fig. 1. EBP-Plex node 1: http://localhost:8081/ebp
Fig. 2. EBP-Plex node 2: http://localhost:8082/ebp
The same job was submitted to both after job class A was defined and an initiator started.
//TEST01 JOB (HCIACCT),'EBP Hello 8082',CLASS=A,MSGCLASS=A
//STEPID01 EXEC PGM=IEFBR14,PARM='SLEEP=20 SYSOUT=HI+FROM+8082'
//SYSOUT DD SYSOUT=*
The job didn't share a resource (such as an exclusive dataset), but it did have the same job name on both nodes, TEST01. IBM JESPLEX compatibility demands that only one job of a given name execute at a time within the JESPLEX. In Figures 1 and 2 we see that the first node sent a LOCK directive to the second node, causing it to queue TEST01 instead of executing it immediately after submission. Only when the second node received an UNLK directive did it begin executing the job of the same name.
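The LOCK/UNLK exchange described above can be sketched as follows. This is a minimal model of the serialization logic, not EBP's actual implementation; the class and method names are ours:

```python
class JobNameSerializer:
    """Toy model of EBP-Plex job-name serialization: a LOCK directive
    received from a peer queues any same-named local job until UNLK arrives."""

    def __init__(self):
        self.locked = set()   # job names currently held by a peer node
        self.queued = []      # (name, job_id) pairs held back locally

    def submit(self, name, job_id):
        """Run a submitted job immediately unless a peer holds its name."""
        if name in self.locked:
            self.queued.append((name, job_id))
            return "QUEUED"
        return "RUNNING"

    def on_lock(self, name):
        """Peer started a job with this name: hold same-named jobs back."""
        self.locked.add(name)

    def on_unlk(self, name):
        """Peer finished: release the name and return jobs now free to run."""
        self.locked.discard(name)
        released = [j for n, j in self.queued if n == name]
        self.queued = [(n, j) for n, j in self.queued if n != name]
        return released

# Mirroring the figures: node 1 locks TEST01, so node 2 queues its copy.
node2 = JobNameSerializer()
node2.on_lock("TEST01")
print(node2.submit("TEST01", 101))   # held back while node 1 runs TEST01
print(node2.on_unlk("TEST01"))       # job 101 is now eligible to run
```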