Vonds | Lab 039

Troubleshooting core services

Scenario

A script named rogue.sh is consuming nearly all CPU and causing performance issues for other users. You need to locate the offending process, attempt a graceful shutdown, confirm it exited, then repeat the test and force-stop it with SIGKILL if it refuses to terminate.

Operator context

Default to graceful termination. SIGTERM allows cleanup and state preservation. Use SIGKILL only when the process is non-responsive and you have accepted the blast radius.

Objective

Identify the PID for the process running rogue.sh.
Send a graceful termination request using the default kill behavior (SIGTERM).
Verify the PID is gone using PID and name-based checks.
Restart the workload in the background and capture the new PID.
Force-stop the workload with SIGKILL when required and verify again.

Concepts

Signals are a control interface: “request” termination versus “enforce” termination.
PID targeting: kill the exact process you intend to stop, not a look-alike match.
Verification is mandatory: confirm the PID is gone and confirm the name is gone.
Escalation discipline: use SIGKILL only when a graceful stop fails.
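
The difference between requesting and enforcing termination can be observed with a small experiment. The sketch below is illustrative only: the child shell with a TERM trap is a hypothetical stand-in for a workload that cleans up on exit, and the exit statuses noted in the comments assume bash or dash, where death by signal N is reported as 128 + N.

```shell
# Hypothetical workload: traps SIGTERM, runs a cleanup step, exits 0.
workload='trap "echo cleanup ran; exit 0" TERM; while :; do sleep 1; done'

sh -c "$workload" &
pid=$!
sleep 1                 # give the child time to install its trap
kill "$pid"             # request: SIGTERM is delivered to the trap handler
wait "$pid"
term_status=$?          # 0, because the handler chose the exit status

sh -c "$workload" &
pid=$!
sleep 1
kill -KILL "$pid"       # enforce: the handler never runs
wait "$pid"
kill_status=$?          # 137 in bash and dash (128 + 9)

echo "SIGTERM -> $term_status, SIGKILL -> $kill_status"
```

Only the first run prints "cleanup ran", which is the practical reason to try SIGTERM before SIGKILL.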

Walkthrough

Step 1 : Locate the rogue process and capture its PID.

Commands

ps aux | grep rogue.sh
# OR
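pgrep -af rogue.sh

Use ps when you need full context (CPU, memory, user, TTY). Use pgrep -af when you want only the PID and command line. The -f flag is required here: without it, pgrep matches the process name, which is bash for a script started as bash rogue.sh.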

user     4012  99.0  0.1  50000  3000 pts/0    R    13:24   0:45 bash rogue.sh
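
When the PID feeds later commands, capturing it in a variable avoids retyping it. This is a minimal sketch, assuming procps pgrep is installed; the temporary script is a stand-in created only so the example runs on its own, and the variable names are hypothetical.

```shell
# Stand-in for the lab's rogue.sh (assumption) so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'while :; do sleep 1; done\n' > "$workdir/rogue.sh"
sh "$workdir/rogue.sh" &
started_pid=$!
sleep 1                                   # let the child finish exec

# -f matches the full command line; -o keeps only the oldest matching process.
found_pid=$(pgrep -o -f "$workdir/rogue.sh")
echo "rogue PID: $found_pid"

kill "$found_pid"                         # tidy up the stand-in
rm -rf "$workdir"
```

Matching on the full path rather than the bare name rogue.sh reduces the chance of capturing a look-alike process.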

Step 2 : Send a graceful termination request (SIGTERM).

Command

kill 4012

By default, kill sends SIGTERM. This asks the process to exit cleanly and is the correct first action for most runaway workloads.

# Expected result:
# No output is common. The process should exit shortly after.
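
"Shortly after" can be made concrete with a bounded wait. The helper below is a hypothetical sketch, not part of the lab: it polls with kill -0, which sends no signal and only reports whether the PID can be signaled. One assumption to keep in mind is that kill -0 also fails for a process owned by another user, so run it as the owner or as root.

```shell
# wait_for_exit PID SECONDS: hypothetical helper; returns 0 once PID is gone,
# 1 if it still exists after roughly SECONDS seconds.
wait_for_exit() {
    watched=$1
    remaining=$2
    while kill -0 "$watched" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

sleep 300 &              # stand-in workload (assumption), not the lab's rogue.sh
target=$!
kill "$target"           # default signal: SIGTERM
wait_for_exit "$target" 5 && echo "PID $target exited after SIGTERM"
```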

Step 3 : Verify the process is gone.

Commands

ps -p 4012
pgrep -f rogue.sh

Confirm absence by both PID and name. ps -p checks the exact PID. pgrep -f checks that no process with rogue.sh in its command line remains.

# Expected result:
# ps: header only, no row for the PID, and a non-zero exit status
# pgrep: no matches and a non-zero exit status
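
Because both commands report absence through their exit status, the two checks can be combined into one pass/fail result. A minimal sketch, assuming procps ps and pgrep; confirm_gone and the temporary stand-in script are hypothetical.

```shell
# confirm_gone PID PATTERN: hypothetical helper; returns 0 only when neither
# the PID nor any command line matching PATTERN is present.
confirm_gone() {
    if ps -p "$1" >/dev/null 2>&1; then
        echo "PID $1 still exists"
        return 1
    fi
    if pgrep -f "$2" >/dev/null 2>&1; then
        echo "a process matching $2 still exists"
        return 1
    fi
    echo "gone: PID $1 and pattern $2"
}

# Stand-in workload (assumption) so the sketch runs on its own.
workdir=$(mktemp -d)
printf 'while :; do sleep 1; done\n' > "$workdir/rogue.sh"
sh "$workdir/rogue.sh" &
target=$!
sleep 1
kill "$target"
wait "$target" 2>/dev/null
confirm_gone "$target" "$workdir/rogue.sh"
gone_status=$?
rm -rf "$workdir"
```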

Step 4 : Restart the workload in the background and capture the new PID.

Commands

bash rogue.sh &
# OR
./rogue.sh &

Background execution returns control to the shell, which prints the job number and PID and stores that PID in $!. Capture the PID so you can target it precisely during escalation.

[1] 4021
user     4021  95.0  0.1  50000  3000 pts/0    R    13:29   0:02 bash rogue.sh
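
The PID can be captured without reading it off the screen. In this sketch, sleep 300 is a stand-in for bash rogue.sh and rogue_pid is a hypothetical variable name.

```shell
sleep 300 &              # stand-in workload (assumption); the lab runs: bash rogue.sh &
rogue_pid=$!             # $! expands to the PID of the most recent background command
echo "started PID $rogue_pid"

kill -0 "$rogue_pid" && echo "PID $rogue_pid is running"
kill "$rogue_pid"        # stop the stand-in
```

Read $! immediately after the & command, because the next background command overwrites it.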

Step 5 : Escalate to a forced stop (SIGKILL) and verify again.

Safety note

SIGKILL cannot be handled or ignored. The process gets no cleanup time, which can risk partial writes or locked resources depending on workload. Use it only when graceful termination fails.

Command

kill -9 4021

-9 sends SIGKILL immediately. Follow up with verification to confirm the process is gone and does not respawn under a supervisor.

ps -p 4021
pgrep -f rogue.sh
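
The escalation sequence from Steps 2 through 5 can be sketched as a single helper. This is one possible arrangement under stated assumptions, not a prescribed procedure: stop_process, the 5-second default grace period, and the SIGTERM-ignoring stand-in workload are all hypothetical.

```shell
# stop_process PID [GRACE]: hypothetical helper. Sends SIGTERM, polls for up
# to GRACE seconds (default 5), then sends SIGKILL if the PID still exists.
stop_process() {
    target=$1
    grace=${2:-5}
    kill -TERM "$target" 2>/dev/null || return 0   # nothing to signal
    while [ "$grace" -gt 0 ] && kill -0 "$target" 2>/dev/null; do
        sleep 1
        grace=$((grace - 1))
    done
    if kill -0 "$target" 2>/dev/null; then
        echo "PID $target ignored SIGTERM, sending SIGKILL"
        kill -KILL "$target"
    fi
}

# Stand-in workload (assumption) that ignores SIGTERM, forcing the escalation.
sh -c 'trap "" TERM; while :; do sleep 1; done' &
stubborn=$!
sleep 1
stop_process "$stubborn" 2
wait "$stubborn" 2>/dev/null
echo "exit status: $?"
```

Keeping the grace period explicit makes the decision to escalate auditable rather than a matter of impatience.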

Common breakpoints

grep matches grep

Your filter may include the grep command itself. Treat that match as noise and identify the long-lived process row that contains the real workload.
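
A common way to suppress the self-match is to wrap one character of the pattern in brackets. The regular expression [r]ogue\.sh still matches the text rogue.sh, but grep's own command line now contains the literal brackets and no longer matches. The sketch below demonstrates this on captured text; the second sample row, including PID 4030, is hypothetical.

```shell
# Two sample rows: the real workload and the grep process itself.
ps_sample='user 4012 99.0 0.1 50000 3000 pts/0 R  13:24 0:45 bash rogue.sh
user 4030  0.0 0.0  6000  700 pts/0 S+ 13:25 0:00 grep [r]ogue\.sh'

# Only the workload row matches; the grep row is filtered out.
matches=$(printf '%s\n' "$ps_sample" | grep -c '[r]ogue\.sh')
echo "matching rows: $matches"

# Live form of the same filter:
#   ps aux | grep '[r]ogue\.sh'
```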

SIGTERM does not stop the process

Give it a moment and verify again. If the PID persists and CPU remains high, escalate to SIGKILL only after confirming you are targeting the correct PID.

Process disappears then returns

A supervisor may be restarting it (systemd, cron, a wrapper script). Identify the parent process and ownership, then stop the controlling service or job rather than repeatedly killing children.
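
Identifying the parent starts with the PPID column. A minimal sketch, assuming a ps that accepts POSIX -o field lists; sleep 300 stands in for the respawning workload, and the variable names are hypothetical.

```shell
sleep 300 &                       # stand-in child (assumption)
child=$!

# A trailing = after the field name suppresses the header line.
parent=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "PID $child was started by PID $parent:"
ps -o pid,user,args -p "$parent"

kill "$child"                     # stop the stand-in
```

If the parent turns out to be PID 1 or a service manager, stop the owning service or job instead of signaling the child again.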

kill -9 still does not work

The process may be stuck in uninterruptible sleep (state D in ps output, usually waiting on I/O). SIGKILL stays pending until the kernel operation the task is blocked in completes. Collect evidence and check system I/O pressure.
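
The process state can be read from the STAT column before deciding what to do next. This is a sketch under assumptions: sleep 300 is a stand-in target, and the Z branch is an addition not covered by the lab (a zombie has already exited and is waiting for its parent to reap it, so no signal changes it).

```shell
sleep 300 &                       # stand-in target (assumption)
target=$!
sleep 1

# stat= prints the state code without a header, for example S, R+, or D.
state=$(ps -o stat= -p "$target" | tr -d ' ')
case $state in
    D*) echo "PID $target: uninterruptible sleep, SIGKILL is pending until the kernel call returns" ;;
    Z*) echo "PID $target: zombie, already exited and waiting to be reaped by its parent" ;;
    *)  echo "PID $target: state $state, signals should be delivered normally" ;;
esac

kill "$target"                    # stop the stand-in
```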

Cleanup checklist

This lab changes state only by starting and stopping a test workload. Cleanup is confirming the workload is not running and you have a record of the PID(s) you targeted and the signals you used.

Commands

pgrep -af rogue.sh
ps aux | grep rogue.sh

Success signal

No matching processes remain, and you can explain why you used SIGTERM first and why escalation to SIGKILL was or was not justified.

Reference

ps aux : BSD-style process listing with CPU and memory columns for quick triage.
- a : Shows processes for all users.
- u : User-oriented format (adds USER, %CPU, %MEM, and related columns).
- x : Includes processes without a controlling TTY.
grep <pattern> : Filters lines matching a pattern (commonly used to narrow process output).
ps -p <pid> : Checks whether a specific PID still exists.
- -p <pid> : Selects the target PID for display.
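pgrep <pattern> : Prints PIDs of processes whose name matches the pattern.
pgrep -af <pattern> : Prints matching PIDs and their full command lines, matching against the full command line.
- -a : Shows the full command line for each match.
- -f : Matches the pattern against the full command line instead of only the process name.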
kill <pid> : Sends a signal to a PID (default is SIGTERM).
- Default signal: SIGTERM (15) requests a clean shutdown.
kill -9 <pid> : Sends SIGKILL (forced stop, no cleanup).
- -9 : Uses signal 9 (SIGKILL) to terminate immediately.
bash <script> & : Runs a script in the background and returns control to the shell.
- <script> : Script name (for example rogue.sh).
- & : Runs the command in the background.
./<script> & : Runs an executable script from the current directory in the background.
- ./ : Executes a file from the current directory.
- <script> : Executable script name (for example rogue.sh).
- & : Runs the command in the background.
| : Pipes output from the left command into the right command.

Lab 39: Managing Processes with kill