Follow

EBP Node Decommissioning Guide

Overview

This document provides the procedure for safely decommissioning and removing an EBP application node from the EBP cluster. This process ensures clean removal of the node entry, its associated actions, job classes, job initiators, and proper handling of queued jobs from the database.

Prerequisites

  • Database administrative access
  • The node_id (UUID) of the node to be decommissioned
  • Confirmation that the node has been stopped and is no longer processing requests
  • Decision on how to handle queued jobs (reassignment vs cancellation)

Decommissioning Procedure

Step 1: Verify Node Status

Before deletion, confirm the node is no longer active:

SELECT node_id, status, last_heartbeat, ebp_version
FROM ebpnode
WHERE node_id = '<NODE_UUID>';

Ensure the last_heartbeat timestamp indicates the node is no longer sending heartbeats.

Step 2: Assess Impact on Queued Jobs

Check for any queued jobs associated with this node:

SELECT id, jobqid, jobqname, jobqclass, status, jobqsubmittimestamp
FROM jobinput
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED'
ORDER BY jobqsubmittimestamp;

Review the results and decide on the appropriate handling strategy (see Step 3).

Step 3: Handle Queued Jobs

Choose ONE of the following options based on your operational requirements:

Option 1: Reassign Queued Jobs (Recommended when starting replacement node)

Use this option when you're replacing the node or want queued jobs to be picked up by other active nodes.

-- Nullify the nodeid for queued jobs to allow reassignment
UPDATE jobinput
SET nodeid = NULL
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';

Use Case: Node replacement, rolling updates, or when jobs should continue processing on other nodes.

Option 2: Cancel Queued Jobs

Use this option when you're permanently removing capacity or when the queued jobs are no longer needed.

-- Cancel all queued jobs for this node
UPDATE jobinput
SET status = 'CANCELED',
    jobqcompletedtimestamp = NOW(),
    jobqlastmessage = 'Job canceled due to node decommissioning'
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';

Use Case: Permanent cluster downsizing, emergency decommissioning, or when jobs are time-sensitive and no longer valid.

 

Step 4: Delete Job Initiators

Remove job initiators associated with the node:

DELETE FROM jobinitiator
WHERE nodeid = '<NODE_UUID>';
Note:Review initiator states before deletion. Active initiators should be stopped before node decommissioning.

Step 5: Delete Job Classes

Remove job classes associated with the node:

DELETE FROM jobclass
WHERE nodeid = '<NODE_UUID>';
Warning:Ensure no other nodes are referencing these job classes before deletion.

Step 6: Delete Associated Actions

Remove all actions associated with the node:

-- Delete actions where the node is the executing node
DELETE FROM ebpplexaction
WHERE ebpNodeId = '<NODE_UUID>';

Step 7: Delete Node Entry

Remove the node entry from the ebpnode table:

DELETE FROM ebpnode
WHERE node_id = '<NODE_UUID>';

Step 8: Verify Deletion

Confirm the node and its associated data have been properly handled:

-- Verify node deletion
SELECT COUNT(*) FROM ebpnode WHERE node_id = '<NODE_UUID>';

-- Verify job initiator deletion
SELECT COUNT(*) FROM jobinitiator WHERE nodeid = '<NODE_UUID>';

-- Verify job class deletion
SELECT COUNT(*) FROM jobclass WHERE nodeid = '<NODE_UUID>';

-- Verify action deletion
SELECT COUNT(*) FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>';

-- Verify no queued jobs remain with this nodeid
SELECT COUNT(*) FROM jobinput 
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';

All queries should return 0.

 

Complete Decommissioning Scripts

Script A: With Job Reassignment (Option 1)

BEGIN;

-- Step 1: Reassign queued jobs
UPDATE jobinput
SET nodeid = NULL
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';

-- Step 2: Delete job initiators
DELETE FROM jobinitiator
WHERE nodeid = '<NODE_UUID>';

-- Step 3: Delete job classes
DELETE FROM jobclass
WHERE nodeid = '<NODE_UUID>';

-- Step 4: Delete associated actions
DELETE FROM ebpplexaction
WHERE ebpNodeId = '<NODE_UUID>';

-- Step 5: Delete node entry
DELETE FROM ebpnode
WHERE node_id = '<NODE_UUID>';

COMMIT;

Script B: With Job Cancellation (Option 2)

BEGIN;

-- Step 1: Cancel queued jobs
UPDATE jobinput
SET status = 'CANCELED',
    jobqcompletedtimestamp = NOW(),
    jobqlastmessage = 'Job canceled due to node decommissioning'
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';

-- Step 2: Delete job initiators
DELETE FROM jobinitiator
WHERE nodeid = '<NODE_UUID>';

-- Step 3: Delete job classes
DELETE FROM jobclass
WHERE nodeid = '<NODE_UUID>';

-- Step 4: Delete associated actions
DELETE FROM ebpplexaction
WHERE ebpNodeId = '<NODE_UUID>';

-- Step 5: Delete node entry
DELETE FROM ebpnode
WHERE node_id = '<NODE_UUID>';

COMMIT;
 

Decision Matrix: Queued Job Handling

ScenarioRecommended OptionRationale
Replacing node with new instanceOption 1 (Reassign)Jobs will be picked up by new node
Rolling update/restartOption 1 (Reassign)Maintain job continuity
Permanent capacity reductionOption 2 (Cancel)Jobs won't be processed anyway
Emergency decommissionOption 2 (Cancel)Quick cleanup, clear state
Time-sensitive jobs on failed nodeOption 2 (Cancel)Jobs may no longer be valid
Scale-out cluster with spare capacityOption 1 (Reassign)Other nodes can handle workload

Post-Decommissioning Tasks

After successful deletion:

  1. Verify cluster health:

    SELECT node_id, status, last_heartbeat 
    FROM ebpnode 
    WHERE status = 'ACTIVE'
    ORDER BY last_heartbeat DESC;
  2. If using Option 1 (Reassignment): Monitor that queued jobs are being picked up:

    SELECT COUNT(*), nodeid 
    FROM jobinput 
    WHERE status = 'QUEUED' AND nodeid IS NULL
    GROUP BY nodeid;
  3. Check for orphaned jobs (jobs from deleted node in non-terminal states):

    SELECT id, jobqid, status, jobqname
    FROM jobinput
    WHERE nodeid = '<NODE_UUID>' 
    AND status NOT IN ('COMPLETED', 'CANCELED', 'FAILED', 'QUEUED');
  4. Update external monitoring and inventory systems
  5. Document the decommissioning in your change management system
 

Rollback Procedure

If the deletion needs to be reversed, restore from database backup. There is no automated rollback for this operation.

For Option 1 (Reassignment):Jobs with nullified nodeid may have already been picked up by other nodes, making exact rollback impossible.

Safety Considerations

  • Always use transactions (BEGIN/COMMIT) to ensure atomicity
  • Backup the database before performing node deletions in production
  • Verify the node_id carefully to avoid deleting the wrong node
  • Stop the application node before database decommissioning
  • Choose the appropriate job handling strategy based on your operational context
  • Check for foreign key constraints that might prevent deletion or cascade unintentionally
  • Coordinate with the team to ensure no active jobs are in critical stages
  • Consider job priority when canceling jobs (review high-priority jobs individually)

Troubleshooting

IssueResolution
Foreign key constraint violationsCheck for additional tables referencing node_id or job classes
Jobs not being reassigned after nullificationVerify job dispatcher is running and checking for NULL nodeid jobs
Initiators still appearing in queriesCheck for additional columns or tables tracking initiator state
Orphaned jobs in processing stateManually update to FAILED or CANCELED with appropriate message
Job classes referenced by multiple nodesConsider if job classes should be node-specific or shared
 

Additional Queries for Operations

Find all resources associated with a node before deletion:

-- Summary of node resources
SELECT 
    'Node Entry' as resource_type, COUNT(*) as count
FROM ebpnode WHERE node_id = '<NODE_UUID>'
UNION ALL
SELECT 'Job Initiators', COUNT(*) 
FROM jobinitiator WHERE nodeid = '<NODE_UUID>'
UNION ALL
SELECT 'Job Classes', COUNT(*) 
FROM jobclass WHERE nodeid = '<NODE_UUID>'
UNION ALL
SELECT 'Actions (ebpNodeId)', COUNT(*) 
FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>'
UNION ALL
SELECT 'Queued Jobs', COUNT(*) 
FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED'
UNION ALL
SELECT 'Non-Queued Jobs', COUNT(*) 
FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status != 'QUEUED';
Was this article helpful?
1 out of 1 found this helpful
Have more questions? Submit a request

0 Comments

Please sign in to leave a comment.
Powered by Zendesk