Introduction
The Heirloom Elastic Batch Platform (EBP) provides several methods to stop job execution, ranging from graceful cancellation to forceful termination. This guide explains when and how to use each cancellation method to effectively manage your batch jobs.
Quick Reference
| Method | When to Use | Impact | Recovery |
|---|---|---|---|
| Cancel | Normal job termination | Graceful shutdown with cleanup | Full |
| Kill | Unresponsive or hung jobs | Immediate termination | Partial |
| Purge | Remove completed/canceled jobs | Deletes job records and files | None (irreversible) |
Understanding Job Cancellation
Cancel - The Preferred Method
Cancel is the recommended way to stop a running job. It provides a graceful shutdown that:
- Allows the current processing step to complete
- Properly releases allocated resources
- Closes open files cleanly
- Preserves job output for review
- Updates job status appropriately
How to Cancel a Job
Using the Web Interface:
GET /CANCEL?job={jobId or jobName}
Using the REST API:
curl -X POST http://your-server/ebp/api/job/12345/cancel \
-H "Authorization: Bearer {your-token}"
Examples:
- Cancel by Job ID: /CANCEL?job=12345
- Cancel by Job Name: /CANCEL?job=PAYROLL (cancels all jobs named PAYROLL)
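If you script cancellations, you can build both cancel forms with Python's standard library. This is a minimal sketch, and several parts are assumptions. The base URLs and token are placeholders. The helper names are hypothetical. The legacy /CANCEL path is assumed to sit directly under the server root, because this guide only shows it as a relative path.

```python
import urllib.parse
import urllib.request

# Placeholders (assumptions): substitute your own server, API prefix, and token.
API_BASE = "http://your-server/ebp/api"
SERVER = "http://your-server"
TOKEN = "your-token"


def rest_cancel_request(api_base: str, token: str, job_id: int) -> urllib.request.Request:
    """Build POST {api_base}/job/{jobId}/cancel with a bearer token, like the curl call."""
    return urllib.request.Request(
        f"{api_base}/job/{job_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def legacy_cancel_url(server: str, job: str) -> str:
    """Build GET /CANCEL?job=...; `job` may be a job ID or a job name.

    A job name cancels every job with that name, so double-check it first.
    """
    query = urllib.parse.urlencode({"job": job})  # escapes any unsafe characters
    return f"{server}/CANCEL?{query}"


if __name__ == "__main__":
    req = rest_cancel_request(API_BASE, TOKEN, 12345)
    print(req.get_method(), req.full_url)
    print(legacy_cancel_url(SERVER, "PAYROLL"))
    # To actually send the REST call: urllib.request.urlopen(req, timeout=30)
```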
What Happens When You Cancel
The cancel process varies based on the job's current state:
Running Jobs:
- The system signals the job to stop after completing its current step
- Resources are released in an orderly manner
- Output files are finalized and preserved
- Job status changes to CANCELED
Queued Jobs:
- The job is immediately removed from the queue
- No processing occurs
- A cancellation message is recorded in the job output
- Status changes to CANCELED
Suspended Jobs:
- Checkpoint data is preserved initially
- Locked resources and files are released
- Temporary spool files are cleaned up
- Status changes to CANCELED
Kill - Forceful Termination
Kill forcefully terminates a job at the operating system level. Use this method only when:
- A job is completely unresponsive to cancel requests
- The job is stuck in an infinite loop
- Emergency termination is required
- A cancel attempt has failed or timed out
How to Kill a Job
Using the REST API:
curl -X POST http://your-server/ebp/api/job/12345/kill \
-H "Authorization: Bearer {your-token}"
Note: Kill functionality requires administrator privileges in most configurations.
Important Considerations for Kill
- Data Loss Risk: Forceful termination may result in incomplete output or corrupted files
- Resource Cleanup: Some resources may not be properly released
- No Graceful Shutdown: The job stops immediately without completing current operations
- Platform Differences:
  - Linux/Unix: Uses the SIGKILL signal, which cannot be caught or ignored
  - Windows: Uses the taskkill command
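The sketch below shows what these two OS mechanisms do, using forced termination in Python's standard library. It is not EBP's internal code, and it does not replace the Kill endpoint. It only illustrates SIGKILL on Linux/Unix and `taskkill` on Windows. The helper `force_kill` is hypothetical, and the choice of the `/F` (force) flag is an assumption, since the guide does not say which taskkill options EBP passes.

```python
import os
import signal
import subprocess
import sys


def force_kill(pid: int) -> None:
    """Forcefully terminate a process (hypothetical helper, for illustration only).

    POSIX: send SIGKILL, which the process cannot catch or ignore.
    Windows: run `taskkill /F /PID <pid>` to force termination.
    """
    if sys.platform == "win32":
        subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=True)
    else:
        os.kill(pid, signal.SIGKILL)


if __name__ == "__main__":
    # Start a throwaway child that would otherwise sleep for 60 seconds.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    force_kill(child.pid)
    child.wait()
    # On POSIX, a process ended by signal N reports returncode -N.
    print("exit code:", child.returncode)
```

The child gets no chance to run cleanup code, which is why Kill carries the data-loss and resource-cleanup risks listed above.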
Purge - Removing Job Records
Purge removes completed or canceled jobs from the system, freeing up queue space and deleting associated files.
How to Purge Jobs
Single Job:
curl -X POST http://your-server/ebp/api/job/12345/purge \
-H "Authorization: Bearer {your-token}"
All Jobs with Same Name:
curl -X DELETE http://your-server/ebp/api/job/PAYROLL/purgeall \
-H "Authorization: Bearer {your-token}"
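The two purge calls differ in both the HTTP method (POST vs DELETE) and the final path segment. The sketch below builds each request with urllib. The base URL, token, and helper names are placeholders. Percent-encoding the job name is an assumption about how names containing special characters should be passed. Purge is irreversible, so the sketch only builds the requests. Send one with `urllib.request.urlopen(req, timeout=30)` once you have confirmed the target.

```python
import urllib.parse
import urllib.request


def purge_request(api_base: str, token: str, job_id: int) -> urllib.request.Request:
    """Single job: POST {api_base}/job/{jobId}/purge."""
    return urllib.request.Request(
        f"{api_base}/job/{job_id}/purge",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def purge_all_request(api_base: str, token: str, job_name: str) -> urllib.request.Request:
    """All jobs with the same name: DELETE {api_base}/job/{name}/purgeall.

    The name is percent-encoded (an assumption) so characters such as spaces
    or '/' cannot alter the path.
    """
    name = urllib.parse.quote(job_name, safe="")
    return urllib.request.Request(
        f"{api_base}/job/{name}/purgeall",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```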
What Gets Removed During Purge
- Job input and output records
- Spool files (MSGOUT, SYSOUT, SYSERR)
- Temporary and checkpoint files
- Database records
- File allocations and assignments
Warning: Purged jobs cannot be recovered. Ensure you've saved any needed output before purging.
Security and Permissions
Who Can Cancel Jobs?
The ability to cancel jobs depends on your user permissions:
| User Type | Own Jobs | Other Users' Jobs | Kill Jobs |
|---|---|---|---|
| Regular User | Yes* | No | No |
| Administrator | Yes | Yes | Yes |
| System User | Yes | Yes | Yes |
*Subject to Resource Access Control (RAC) policies if enabled
Authorization Requirements
- Job Ownership: You can only cancel jobs you submitted unless you are an administrator or system user
- RAC Policies: If Resource Access Control is enabled, you must have CANCEL rights for the job's class
- Security Violations: Attempts to cancel unauthorized jobs are logged and rejected
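The permission table and the rules above can be expressed as two small predicates. This is a sketch for client-side pre-checks only, because the server enforces the real policy. `UserType`, `can_cancel`, and `can_kill` are hypothetical names. The code assumes RAC checks apply only to regular users, which is how the table's footnote reads.

```python
from enum import Enum


class UserType(Enum):
    REGULAR = "regular"
    ADMINISTRATOR = "administrator"
    SYSTEM = "system"


PRIVILEGED = (UserType.ADMINISTRATOR, UserType.SYSTEM)


def can_cancel(user: UserType, owns_job: bool,
               rac_enabled: bool = False, has_rac_cancel: bool = False) -> bool:
    """Administrators and system users may cancel any job. Regular users may
    cancel only their own jobs, and need CANCEL rights for the job's class
    when RAC is enabled (assumed to apply to regular users only)."""
    if user in PRIVILEGED:
        return True
    if not owns_job:
        return False
    return not rac_enabled or has_rac_cancel


def can_kill(user: UserType) -> bool:
    """Only administrators and system users may kill jobs."""
    return user in PRIVILEGED
```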
Best Practices
Choosing the Right Cancellation Method
Use Cancel When:
- Performing routine job termination
- Testing or debugging jobs
- The job is responding normally
- You want to preserve output for analysis
- Resources need proper cleanup
Use Kill When:
- Cancel has not stopped the job after 30 or more seconds
- The job is consuming excessive resources
- System stability is at risk
- The job is completely unresponsive
- Memory leaks are suspected
Use Purge When:
- Jobs have completed successfully
- Canceled jobs no longer need review
- Queue space needs to be freed
- Performing routine maintenance
Recommended Workflow
1. First Attempt: Always try Cancel first.
   POST /job/12345/cancel
2. Wait Period: Allow 30-60 seconds for a graceful shutdown.
3. Check Status: Verify whether the job has stopped.
   GET /job/12345/status
4. Escalate if Needed: If the job is still running, use Kill.
   POST /job/12345/kill
5. Clean Up: Once the job has terminated, purge it if it is no longer needed.
   POST /job/12345/purge
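You can script this workflow as a single escalation routine. The sketch below is minimal, and `stop_job` and its parameters are hypothetical. You would wire `cancel`, `kill`, `get_status`, and `purge` to the endpoints above. The routine assumes that the status call returns a state name from the Job State Reference, such as `RUNNING`, and that the status reflects Kill promptly. It injects `sleep` and `clock` so that you can test it without waiting.

```python
import time
from typing import Callable, Optional

# States in which the Job State Reference allows Purge.
FINISHED_STATES = {"CANCELED", "TERMINATED", "ABORTED"}


def stop_job(
    job_id: int,
    cancel: Callable[[int], None],
    kill: Callable[[int], None],
    get_status: Callable[[int], str],
    purge: Optional[Callable[[int], None]] = None,
    wait_seconds: float = 60.0,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Cancel, wait, escalate to Kill if still RUNNING, then optionally purge.

    Returns the last status observed.
    """
    cancel(job_id)                      # 1. always try Cancel first
    deadline = clock() + wait_seconds   # 2. allow time for a graceful shutdown
    status = get_status(job_id)         # 3. poll the status until the deadline
    while status == "RUNNING" and clock() < deadline:
        sleep(poll_seconds)
        status = get_status(job_id)
    if status == "RUNNING":             # 4. escalate only if still running
        kill(job_id)
        status = get_status(job_id)     # assumes status updates promptly after Kill
    if purge is not None and status in FINISHED_STATES:
        purge(job_id)                   # 5. clean up once terminated
    return status
```

Leaving `purge=None` skips the clean-up step. That is the safer default when you may still need the job's output.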
Multi-Node Deployments (EBPPlex)
In distributed EBP environments, jobs may run on different nodes. The system automatically handles cross-node cancellation:
- Automatic Routing: Cancel requests are automatically routed to the correct node
- Database Coordination: Actions are queued in a shared database table
- Polling Mechanism: Each node polls for and executes pending actions
- Transparent Operation: Users don't need to know which node runs their job
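The guide does not document the shared action table, so the schema below is hypothetical. The sketch uses an in-memory SQLite database to show the general queue-and-poll pattern. Any node records an action for the node that owns the job, and each node periodically runs the actions addressed to it. It also assumes the owning node is already known when the action is enqueued. In a real deployment, the table lives in a shared database, and concurrent pollers need row locking or an atomic claim step, which this sketch omits.

```python
import sqlite3
from typing import Callable

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pending_actions (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id  INTEGER NOT NULL,
    node    TEXT    NOT NULL,
    action  TEXT    NOT NULL,
    done    INTEGER NOT NULL DEFAULT 0
)
"""


def enqueue_action(conn: sqlite3.Connection, job_id: int, node: str, action: str) -> None:
    """Record an action for the node that runs the job (routing is assumed known)."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO pending_actions (job_id, node, action) VALUES (?, ?, ?)",
            (job_id, node, action),
        )


def poll_once(conn: sqlite3.Connection, node: str,
              execute: Callable[[int, str], None]) -> int:
    """Run every pending action addressed to `node`; return how many ran."""
    rows = conn.execute(
        "SELECT id, job_id, action FROM pending_actions "
        "WHERE node = ? AND done = 0 ORDER BY id",
        (node,),
    ).fetchall()
    for action_id, job_id, action in rows:
        execute(job_id, action)
        with conn:
            conn.execute("UPDATE pending_actions SET done = 1 WHERE id = ?", (action_id,))
    return len(rows)
```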
Troubleshooting Common Issues
Problem: Cancel Doesn't Stop the Job
Possible Causes:
- Job is already terminating
- Job is unresponsive
- Insufficient permissions
Solutions:
- Check the job status; it may already be stopping
- Wait 30-60 seconds for completion
- Verify your permissions for the job
- If still running, escalate to Kill
- Check system logs for errors
Problem: Kill Command Fails
Possible Causes:
- Incorrect process ID
- Insufficient OS-level permissions
- Process is already terminated
Solutions:
- Verify the job is actually running
- Ensure you have administrator privileges
- Check system logs for OS-level errors
- Contact system administrator if persistent
Problem: Job Restarts After Cancellation
Possible Causes:
- Automatic restart policies enabled
- Job control scripts triggering restart
- Scheduler configuration issues
Solutions:
- Check job class restart settings
- Review any associated JEC scripts
- Purge the job after cancellation
- Verify scheduler configuration
Problem: Cannot Cancel Another User's Job
Cause: Security restrictions
Solutions:
- Confirm you have administrator privileges
- Check RAC policies for the job class
- Contact the job owner or administrator
- Request appropriate permissions if needed
Job State Reference
Understanding job states helps determine which cancellation method to use:
| State | Description | Cancel | Kill | Purge |
|---|---|---|---|---|
| QUEUED | Waiting to run | ✓ | ✗ | ✗ |
| HELD | Manually held | ✓ | ✗ | ✗ |
| RUNNING | Currently executing | ✓ | ✓ | ✗ |
| SUSPENDED | Paused/checkpointed | ✓ | ✗ | ✗ |
| CANCELED | Already canceled | ✗ | ✗ | ✓ |
| TERMINATED | Completed normally | ✗ | ✗ | ✓ |
| ABORTED | Failed with error | ✗ | ✗ | ✓ |
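A script can check the table above on the client side to catch requests that will not apply, such as trying to kill a suspended job, before anything is sent. In the sketch below, the mapping is transcribed directly from the table. The function names are hypothetical, and the server remains the authority.

```python
# Actions permitted in each job state, transcribed from the Job State Reference.
ALLOWED_ACTIONS = {
    "QUEUED": frozenset({"cancel"}),
    "HELD": frozenset({"cancel"}),
    "RUNNING": frozenset({"cancel", "kill"}),
    "SUSPENDED": frozenset({"cancel"}),
    "CANCELED": frozenset({"purge"}),
    "TERMINATED": frozenset({"purge"}),
    "ABORTED": frozenset({"purge"}),
}


def allowed_actions(state: str) -> frozenset:
    """Return the actions the table allows for `state` (empty if the state is unknown)."""
    return ALLOWED_ACTIONS.get(state.upper(), frozenset())


def require_action(state: str, action: str) -> None:
    """Raise ValueError if the table says `action` does not apply in `state`."""
    allowed = allowed_actions(state)
    if action not in allowed:
        raise ValueError(
            f"{action!r} is not valid for a {state.upper()} job; "
            f"allowed: {sorted(allowed) or 'none'}"
        )
```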
API Endpoint Summary
Primary Endpoints
| Action | HTTP Method | Endpoint | Description |
|---|---|---|---|
| Cancel | GET | /CANCEL?job={id/name} | Legacy interface |
| Cancel | POST | /job/{jobId}/cancel | REST API cancel |
| Kill | POST | /job/{jobId}/kill | Force termination |
| Purge | POST | /job/{jobId}/purge | Remove single job |
| Purge All | DELETE | /job/{name}/purgeall | Remove all by name |
Generic Action Endpoint
POST /job/{jobId}/{action}
Where {action} can be: cancel, kill, purge, hold, start, or resubmit
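A client for the generic endpoint should reject any action name outside the documented list before it sends anything. In the sketch below, `action_request` is hypothetical, and the base URL and token are placeholders. The request is built with urllib. Send it with `urllib.request.urlopen(req, timeout=30)`.

```python
import urllib.request

# The actions listed for POST /job/{jobId}/{action}.
VALID_ACTIONS = frozenset({"cancel", "kill", "purge", "hold", "start", "resubmit"})


def action_request(api_base: str, token: str, job_id: int, action: str) -> urllib.request.Request:
    """Build POST {api_base}/job/{jobId}/{action} after validating the action name."""
    if action not in VALID_ACTIONS:
        raise ValueError(
            f"unsupported action {action!r}; expected one of {sorted(VALID_ACTIONS)}"
        )
    return urllib.request.Request(
        f"{api_base}/job/{job_id}/{action}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```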