Overview
This document provides the procedure for safely decommissioning and removing an EBP application node from the EBP cluster. This process ensures clean removal of the node entry, its associated actions, job classes, job initiators, and proper handling of queued jobs from the database.
Prerequisites
- Database administrative access
- The
node_id(UUID) of the node to be decommissioned - Confirmation that the node has been stopped and is no longer processing requests
- Decision on how to handle queued jobs (reassignment vs cancellation)
Decommissioning Procedure
Step 1: Verify Node Status
Before deletion, confirm the node is no longer active:
SELECT node_id, status, last_heartbeat, ebp_version FROM ebpnode WHERE node_id = '<NODE_UUID>';
Ensure the last_heartbeat timestamp indicates the node is no longer sending heartbeats.
Step 2: Assess Impact on Queued Jobs
Check for any queued jobs associated with this node:
SELECT id, jobqid, jobqname, jobqclass, status, jobqsubmittimestamp FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED' ORDER BY jobqsubmittimestamp;
Review the results and decide on the appropriate handling strategy (see Step 3).
Step 3: Handle Queued Jobs
Choose ONE of the following options based on your operational requirements:
Option 1: Reassign Queued Jobs (Recommended when starting replacement node)
Use this option when you're replacing the node or want queued jobs to be picked up by other active nodes.
-- Nullify the nodeid for queued jobs to allow reassignment UPDATE jobinput SET nodeid = NULL WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';
Use Case: Node replacement, rolling updates, or when jobs should continue processing on other nodes.
Option 2: Cancel Queued Jobs
Use this option when you're permanently removing capacity or when the queued jobs are no longer needed.
-- Cancel all queued jobs for this node
UPDATE jobinput
SET status = 'CANCELED',
jobqcompletedtimestamp = NOW(),
jobqlastmessage = 'Job canceled due to node decommissioning'
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';Use Case: Permanent cluster downsizing, emergency decommissioning, or when jobs are time-sensitive and no longer valid.
Step 4: Delete Job Initiators
Remove job initiators associated with the node:
DELETE FROM jobinitiator WHERE nodeid = '<NODE_UUID>';
Step 5: Delete Job Classes
Remove job classes associated with the node:
DELETE FROM jobclass WHERE nodeid = '<NODE_UUID>';
Step 6: Delete Associated Actions
Remove all actions associated with the node:
-- Delete actions where the node is the executing node DELETE FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>';
Step 7: Delete Node Entry
Remove the node entry from the ebpnode table:
DELETE FROM ebpnode WHERE node_id = '<NODE_UUID>';
Step 8: Verify Deletion
Confirm the node and its associated data have been properly handled:
-- Verify node deletion SELECT COUNT(*) FROM ebpnode WHERE node_id = '<NODE_UUID>'; -- Verify job initiator deletion SELECT COUNT(*) FROM jobinitiator WHERE nodeid = '<NODE_UUID>'; -- Verify job class deletion SELECT COUNT(*) FROM jobclass WHERE nodeid = '<NODE_UUID>'; -- Verify action deletion SELECT COUNT(*) FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>'; -- Verify no queued jobs remain with this nodeid SELECT COUNT(*) FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';
All queries should return 0.
Complete Decommissioning Scripts
Script A: With Job Reassignment (Option 1)
BEGIN; -- Step 1: Reassign queued jobs UPDATE jobinput SET nodeid = NULL WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED'; -- Step 2: Delete job initiators DELETE FROM jobinitiator WHERE nodeid = '<NODE_UUID>'; -- Step 3: Delete job classes DELETE FROM jobclass WHERE nodeid = '<NODE_UUID>'; -- Step 4: Delete associated actions DELETE FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>'; -- Step 5: Delete node entry DELETE FROM ebpnode WHERE node_id = '<NODE_UUID>'; COMMIT;
Script B: With Job Cancellation (Option 2)
BEGIN;
-- Step 1: Cancel queued jobs
UPDATE jobinput
SET status = 'CANCELED',
jobqcompletedtimestamp = NOW(),
jobqlastmessage = 'Job canceled due to node decommissioning'
WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED';
-- Step 2: Delete job initiators
DELETE FROM jobinitiator
WHERE nodeid = '<NODE_UUID>';
-- Step 3: Delete job classes
DELETE FROM jobclass
WHERE nodeid = '<NODE_UUID>';
-- Step 4: Delete associated actions
DELETE FROM ebpplexaction
WHERE ebpNodeId = '<NODE_UUID>';
-- Step 5: Delete node entry
DELETE FROM ebpnode
WHERE node_id = '<NODE_UUID>';
COMMIT;Decision Matrix: Queued Job Handling
| Scenario | Recommended Option | Rationale |
|---|---|---|
| Replacing node with new instance | Option 1 (Reassign) | Jobs will be picked up by new node |
| Rolling update/restart | Option 1 (Reassign) | Maintain job continuity |
| Permanent capacity reduction | Option 2 (Cancel) | Jobs won't be processed anyway |
| Emergency decommission | Option 2 (Cancel) | Quick cleanup, clear state |
| Time-sensitive jobs on failed node | Option 2 (Cancel) | Jobs may no longer be valid |
| Scale-out cluster with spare capacity | Option 1 (Reassign) | Other nodes can handle workload |
Post-Decommissioning Tasks
After successful deletion:
Verify cluster health:
SELECT node_id, status, last_heartbeat FROM ebpnode WHERE status = 'ACTIVE' ORDER BY last_heartbeat DESC;
If using Option 1 (Reassignment): Monitor that queued jobs are being picked up:
SELECT COUNT(*), nodeid FROM jobinput WHERE status = 'QUEUED' AND nodeid IS NULL GROUP BY nodeid;
Check for orphaned jobs (jobs from deleted node in non-terminal states):
SELECT id, jobqid, status, jobqname FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status NOT IN ('COMPLETED', 'CANCELED', 'FAILED', 'QUEUED');- Update external monitoring and inventory systems
- Document the decommissioning in your change management system
Rollback Procedure
If the deletion needs to be reversed, restore from database backup. There is no automated rollback for this operation.
nodeid may have already been picked up by other nodes, making exact rollback impossible.Safety Considerations
- Always use transactions (
BEGIN/COMMIT) to ensure atomicity - Backup the database before performing node deletions in production
- Verify the node_id carefully to avoid deleting the wrong node
- Stop the application node before database decommissioning
- Choose the appropriate job handling strategy based on your operational context
- Check for foreign key constraints that might prevent deletion or cascade unintentionally
- Coordinate with the team to ensure no active jobs are in critical stages
- Consider job priority when canceling jobs (review high-priority jobs individually)
Troubleshooting
| Issue | Resolution |
|---|---|
| Foreign key constraint violations | Check for additional tables referencing node_id or job classes |
| Jobs not being reassigned after nullification | Verify job dispatcher is running and checking for NULL nodeid jobs |
| Initiators still appearing in queries | Check for additional columns or tables tracking initiator state |
| Orphaned jobs in processing state | Manually update to FAILED or CANCELED with appropriate message |
| Job classes referenced by multiple nodes | Consider if job classes should be node-specific or shared |
Additional Queries for Operations
Find all resources associated with a node before deletion:
-- Summary of node resources
SELECT
'Node Entry' as resource_type, COUNT(*) as count
FROM ebpnode WHERE node_id = '<NODE_UUID>'
UNION ALL
SELECT 'Job Initiators', COUNT(*)
FROM jobinitiator WHERE nodeid = '<NODE_UUID>'
UNION ALL
SELECT 'Job Classes', COUNT(*)
FROM jobclass WHERE nodeid = '<NODE_UUID>'
UNION ALL
SELECT 'Actions (ebpNodeId)', COUNT(*)
FROM ebpplexaction WHERE ebpNodeId = '<NODE_UUID>'
UNION ALL
SELECT 'Queued Jobs', COUNT(*)
FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status = 'QUEUED'
UNION ALL
SELECT 'Non-Queued Jobs', COUNT(*)
FROM jobinput WHERE nodeid = '<NODE_UUID>' AND status != 'QUEUED';
0 Comments