EBP Job Cancellation Guide

Introduction

The Heirloom Elastic Batch Platform (EBP) provides several methods to stop job execution, ranging from graceful cancellation to forceful termination. This guide explains when and how to use each cancellation method to effectively manage your batch jobs.

Quick Reference

| Method | When to Use | Impact | Recovery |
|--------|-------------|--------|----------|
| Cancel | Normal job termination | Graceful shutdown with cleanup | Full |
| Kill | Unresponsive or hung jobs | Immediate termination | Partial |
| Purge | Remove completed/canceled jobs | Clean up job list | N/A |

Understanding Job Cancellation

Cancel - The Preferred Method

Cancel is the recommended way to stop a running job. It provides a graceful shutdown that:

  • Allows the current processing step to complete
  • Properly releases allocated resources
  • Closes open files cleanly
  • Preserves job output for review
  • Updates job status appropriately

How to Cancel a Job

Using the Web Interface:

GET /CANCEL?job={jobId or jobName}

Using the REST API:

curl -X POST http://your-server/ebp/api/job/12345/cancel \
  -H "Authorization: Bearer {your-token}"

Examples:

  • Cancel by Job ID: /CANCEL?job=12345
  • Cancel by Job Name: /CANCEL?job=PAYROLL (cancels all jobs named PAYROLL)
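
The REST call shown in the curl example can also be issued from a script. The following is a minimal sketch using only the Python standard library. The base URL and token are the same placeholders used in the curl example, and `build_cancel_request` is a hypothetical helper name, not part of EBP.

```python
import urllib.request

# Placeholder base URL copied from the curl example; replace with your server.
EBP_API = "http://your-server/ebp/api"


def build_cancel_request(job_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that cancels one job by ID."""
    return urllib.request.Request(
        url=f"{EBP_API}/job/{job_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    req = build_cancel_request("12345", "your-token")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the method, URL, and headers before anything reaches the server.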

What Happens When You Cancel

The cancel process varies based on the job's current state:

Running Jobs:

  • The system signals the job to stop after completing its current step
  • Resources are released in an orderly manner
  • Output files are finalized and preserved
  • Job status changes to CANCELED

Queued Jobs:

  • The job is immediately removed from the queue
  • No processing occurs
  • A cancellation message is recorded in the job output
  • Status changes to CANCELED

Suspended Jobs:

  • Checkpoint data is preserved initially
  • Locked resources and files are released
  • Temporary spool files are cleaned up
  • Status changes to CANCELED

Kill - Forceful Termination

Kill forcefully terminates a job at the operating system level. Use this method only when:

  • A job is completely unresponsive to cancel requests
  • The job is stuck in an infinite loop
  • Emergency termination is required
  • A cancel attempt has failed or timed out

How to Kill a Job

Using the REST API:

curl -X POST http://your-server/ebp/api/job/12345/kill \
  -H "Authorization: Bearer {your-token}"

Note: Kill functionality requires administrator privileges in most configurations.

Important Considerations for Kill

  • Data Loss Risk: Forceful termination may result in incomplete output or corrupted files
  • Resource Cleanup: Some resources may not be properly released
  • No Graceful Shutdown: The job stops immediately without completing current operations
  • Platform Differences:
    • Linux/Unix: Uses SIGKILL signal (cannot be caught or ignored)
    • Windows: Uses taskkill command
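
For orientation, the platform difference above corresponds roughly to the OS-level operations below. This is a sketch of the general technique, not EBP's actual code. The guide only names the `taskkill` command, so the `/F` (force) and `/T` (include child processes) flags are assumptions, as are the function names.

```python
import os
import signal
import subprocess
import sys


def windows_kill_command(pid: int) -> list[str]:
    """taskkill invocation that forcefully ends a process (/F) and its children (/T).

    The flags are an assumption; the guide only says that taskkill is used.
    """
    return ["taskkill", "/PID", str(pid), "/T", "/F"]


def force_kill(pid: int) -> None:
    """Terminate a process immediately, with no chance for it to clean up."""
    if sys.platform == "win32":
        subprocess.run(windows_kill_command(pid), check=True)
    else:
        # SIGKILL cannot be caught, blocked, or ignored by the target process.
        os.kill(pid, signal.SIGKILL)
```

Because the target process never gets to run its own shutdown code, open files and locks are left in whatever state they were in, which is why the considerations above warn about incomplete output.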

Purge - Removing Job Records

Purge removes completed or canceled jobs from the system, freeing up queue space and deleting associated files.

How to Purge Jobs

Single Job:

curl -X POST http://your-server/ebp/api/job/12345/purge \
  -H "Authorization: Bearer {your-token}"

All Jobs with Same Name:

curl -X DELETE http://your-server/ebp/api/job/PAYROLL/purgeall \
  -H "Authorization: Bearer {your-token}"

What Gets Removed During Purge

  • Job input and output records
  • Spool files (MSGOUT, SYSOUT, SYSERR)
  • Temporary and checkpoint files
  • Database records
  • File allocations and assignments

Warning: Purged jobs cannot be recovered. Ensure you've saved any needed output before purging.

Security and Permissions

Who Can Cancel Jobs?

The ability to cancel jobs depends on your user permissions:

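| User Type | Own Jobs | Other Users' Jobs | Kill Jobs |
|-----------|----------|-------------------|-----------|
| Regular User | Yes* | No | No |
| Administrator | Yes | Yes | Yes |
| System User | Yes | Yes | Yes |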

*Subject to Resource Access Control (RAC) policies if enabled

Authorization Requirements

  1. Job Ownership: You can only cancel jobs you submitted unless you have administrator privileges
  2. RAC Policies: If Resource Access Control is enabled, you must have CANCEL rights for the job's class
  3. Security Violations: Attempts to cancel unauthorized jobs are logged and rejected
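
The ownership and RAC rules can be read as a small decision function. The sketch below is illustrative only: the `User` and `Job` classes, the role names, and the `cancel_classes` field are hypothetical, and it assumes that administrator and system users are not subject to RAC checks, which the guide does not state explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    role: str = "regular"  # "regular", "administrator", or "system"
    # Job classes for which RAC grants this user CANCEL rights (hypothetical model).
    cancel_classes: set[str] = field(default_factory=set)


@dataclass
class Job:
    owner: str
    job_class: str


def may_cancel(user: User, job: Job, rac_enabled: bool = False) -> bool:
    """Apply the ownership and RAC rules described in this section."""
    if user.role in ("administrator", "system"):
        return True  # assumption: privileged users bypass RAC
    if job.owner != user.name:
        return False  # regular users cannot cancel other users' jobs
    if rac_enabled:
        return job.job_class in user.cancel_classes
    return True


def may_kill(user: User) -> bool:
    """Kill is limited to administrator and system users."""
    return user.role in ("administrator", "system")
```

For example, a regular user who owns a job of class `B` but holds CANCEL rights only for class `A` would be refused once RAC is enabled.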

Best Practices

Choosing the Right Cancellation Method

Use Cancel When:

  • Performing routine job termination
  • Testing or debugging jobs
  • The job is responding normally
  • You want to preserve output for analysis
  • Resources need proper cleanup

Use Kill When:

  • Cancel has not stopped the job after the 30-60 second wait period
  • The job is consuming excessive resources
  • System stability is at risk
  • The job is completely unresponsive
  • Memory leaks are suspected

Use Purge When:

  • Jobs have completed successfully
  • Canceled jobs no longer need review
  • Queue space needs to be freed
  • Performing routine maintenance

Recommended Workflow

  1. First Attempt: Always try Cancel first

    POST /job/12345/cancel
    
  2. Wait Period: Allow 30-60 seconds for graceful shutdown
  3. Check Status: Verify if the job has stopped

    GET /job/12345/status
    
  4. Escalate if Needed: If still running, use Kill

    POST /job/12345/kill
    
  5. Clean Up: Once terminated, purge if no longer needed

    POST /job/12345/purge
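
The escalation steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: `call` is a hypothetical transport hook (method and path in, response text out) so the logic can run without a live server, and the status endpoint is assumed to return the bare state name such as `RUNNING`, which the guide does not specify.

```python
import time
from typing import Callable

# call(method, path) -> response body as text. Hypothetical transport hook.
Call = Callable[[str, str], str]

# States treated as "no longer running"; names come from the Job State Reference.
STOPPED = {"CANCELED", "TERMINATED", "ABORTED"}


def stop_job(call: Call, job_id: str, wait_s: float = 60.0, poll_s: float = 5.0,
             purge: bool = False,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Cancel, wait, check status, escalate to kill if needed, optionally purge.

    Returns "cancel" if the graceful path worked, "kill" if escalation was needed.
    """
    call("POST", f"/job/{job_id}/cancel")
    used = "cancel"
    waited = 0.0
    # Assumes the status response is just the state name, e.g. "RUNNING".
    while call("GET", f"/job/{job_id}/status").strip().upper() not in STOPPED:
        if waited >= wait_s:
            call("POST", f"/job/{job_id}/kill")
            used = "kill"
            break
        sleep(poll_s)
        waited += poll_s
    if purge:
        # Purge is irreversible; save any needed output before enabling this.
        call("POST", f"/job/{job_id}/purge")
    return used
```

Passing `sleep` and `call` as parameters keeps the wait period and the HTTP layer replaceable, so the escalation order can be checked in a test before it is pointed at a real server.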
    

Multi-Node Deployments (EBPPlex)

In distributed EBP environments, jobs may run on different nodes. The system automatically handles cross-node cancellation:

  1. Automatic Routing: Cancel requests are automatically routed to the correct node
  2. Database Coordination: Actions are queued in a shared database table
  3. Polling Mechanism: Each node polls for and executes pending actions
  4. Transparent Operation: Users don't need to know which node runs their job
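
The queue-and-poll coordination described above can be illustrated with a small in-memory model. This is a sketch of the general pattern, not EBP's schema: the `pending_actions` table, its columns, and the function names are hypothetical, and an in-memory SQLite database stands in for the shared database.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the shared database
db.execute(
    "CREATE TABLE pending_actions ("
    " id INTEGER PRIMARY KEY, node TEXT, job_id TEXT, action TEXT,"
    " done INTEGER DEFAULT 0)"
)


def request_action(node: str, job_id: str, action: str) -> None:
    """Any node can queue an action for the node that runs the job."""
    db.execute(
        "INSERT INTO pending_actions (node, job_id, action) VALUES (?, ?, ?)",
        (node, job_id, action),
    )
    db.commit()


def poll(node: str) -> list[tuple[str, str]]:
    """Called periodically by each node: claim and return its pending actions."""
    rows = db.execute(
        "SELECT id, job_id, action FROM pending_actions"
        " WHERE node = ? AND done = 0 ORDER BY id",
        (node,),
    ).fetchall()
    # Mark the claimed rows so the next poll does not return them again.
    db.executemany(
        "UPDATE pending_actions SET done = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    db.commit()
    return [(job_id, action) for _, job_id, action in rows]
```

One consequence of a polling design like this is that a cancel aimed at a job on another node takes effect only at that node's next poll, so a short delay before the status changes is expected.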

Troubleshooting Common Issues

Problem: Cancel Doesn't Stop the Job

Possible Causes:

  • Job is already terminating
  • Job is unresponsive
  • Insufficient permissions

Solutions:

  1. Check job status - it may already be stopping
  2. Wait 30-60 seconds for completion
  3. Verify your permissions for the job
  4. If still running, escalate to Kill
  5. Check system logs for errors

Problem: Kill Command Fails

Possible Causes:

  • Incorrect process ID
  • Insufficient OS-level permissions
  • Process is already terminated

Solutions:

  1. Verify the job is actually running
  2. Ensure you have administrator privileges
  3. Check system logs for OS-level errors
  4. Contact system administrator if persistent

Problem: Job Restarts After Cancellation

Possible Causes:

  • Automatic restart policies enabled
  • Job control scripts triggering restart
  • Scheduler configuration issues

Solutions:

  1. Check job class restart settings
  2. Review any associated JEC scripts
  3. Purge the job after cancellation
  4. Verify scheduler configuration

Problem: Cannot Cancel Another User's Job

Cause: Security restrictions

Solutions:

  1. Confirm you have administrator privileges
  2. Check RAC policies for the job class
  3. Contact the job owner or administrator
  4. Request appropriate permissions if needed

Job State Reference

Understanding job states helps determine which cancellation method to use:

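| State | Description | Cancel | Kill | Purge |
|-------|-------------|--------|------|-------|
| QUEUED | Waiting to run | | | |
| HELD | Manually held | | | |
| RUNNING | Currently executing | | | |
| SUSPENDED | Paused/checkpointed | | | |
| CANCELED | Already canceled | | | |
| TERMINATED | Completed normally | | | |
| ABORTED | Failed with error | | | |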

API Endpoint Summary

Primary Endpoints

| Action | HTTP Method | Endpoint | Description |
|--------|-------------|----------|-------------|
| Cancel | GET | /CANCEL?job={id/name} | Legacy interface |
| Cancel | POST | /job/{jobId}/cancel | REST API cancel |
| Kill | POST | /job/{jobId}/kill | Force termination |
| Purge | POST | /job/{jobId}/purge | Remove single job |
| Purge All | DELETE | /job/{name}/purgeall | Remove all by name |

Generic Action Endpoint

POST /job/{jobId}/{action}

Where {action} can be: cancel, kill, purge, hold, start, or resubmit
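
Since the generic endpoint accepts a fixed set of actions, a client can reject typos before sending anything. A minimal sketch, assuming the six action names listed above are the complete set; `action_request` is a hypothetical helper, not part of EBP.

```python
from urllib.parse import quote

# Action names as listed for the generic endpoint.
ACTIONS = {"cancel", "kill", "purge", "hold", "start", "resubmit"}


def action_request(job_id: str, action: str) -> tuple[str, str]:
    """Return (HTTP method, path) for the generic action endpoint."""
    action = action.lower()
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Percent-encode the job ID so unexpected characters cannot alter the path.
    return "POST", f"/job/{quote(job_id, safe='')}/{action}"
```

The returned path is relative to the API base URL, for example `http://your-server/ebp/api` in the curl examples.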
