Vonds | Lab 016

core troubleshooting services

Scenario

A Linux host is running slowly and user sessions feel laggy. You need to identify a process that is consuming excessive CPU, confirm system state using both snapshot and live views, terminate the runaway workload safely, and ensure you can manage shell jobs and scheduling priority without making the system worse.

Operator context

This is the workflow you use before escalating “the server is slow” tickets or guessing. You validate what is running, prove which process is responsible, and take the smallest safe corrective action.

Objective

Capture a quick process snapshot with CPU and memory visibility.
Inspect live process behavior to confirm the top offender.
Terminate a runaway process using a controlled signal.
List, stop, resume, and foreground shell jobs safely.
Start a workload at a lower scheduling priority using nice.

Concepts

ps is your snapshot. Good for quick evidence and grep-able output.
top (or htop) is your live view. It confirms whether a PID is consistently misbehaving.
kill defaults to SIGTERM (graceful request). Escalate only if needed.
Job control (Ctrl+Z, bg, fg, jobs) is per-shell session, not system-wide.
nice sets priority at start. (Adjusting an existing process is renice.)

Walkthrough

Step 1: Take a quick snapshot of top CPU consumers.

Command

ps aux --sort=-%cpu | head -n 10

Sorting by CPU gives you a fast shortlist. This is better than staring at an unsorted wall of processes. Treat this as evidence gathering before you take action.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
lab       1357 98.3  0.5 248112 42280 ?        Rl   12:45   1:37 /usr/bin/python3 script.py
root       711  0.2  0.2  72960  2140 ?        Ss   08:01   0:00 /usr/sbin/cron -f
root       402  0.1  0.3  98720  3128 ?        Ss   08:01   0:00 /lib/systemd/systemd-journald
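
For a ticket, it can help to keep the snapshot rather than just read it. The following is a minimal sketch, assuming the procps version of ps (the same one that accepts --sort above). The column selection and the /tmp file name are illustrative choices, not part of the lab.

```shell
# Save a timestamped snapshot of the top CPU consumers as evidence.
# The output path is an assumption; use whatever location your team expects.
snapshot="/tmp/cpu-snapshot-$(date +%Y%m%dT%H%M%S).txt"

# -e selects every process; -o picks explicit columns so the output is stable to parse.
# etime (elapsed time since start) helps separate a long-running hog from a fresh spike.
ps -eo pid,user,pcpu,pmem,stat,etime,args --sort=-pcpu | head -n 10 > "$snapshot"

cat "$snapshot"
```

ps reports %CPU as CPU time divided by the time the process has been running, so it is a lifetime average rather than an instantaneous reading. That is one reason a live view is still needed before acting.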

Step 2: Confirm the offender in a live view.

Command

top

The point is not “open top.” The point is confirming the same PID is consistently responsible, not a transient spike. Exit with q.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1357 lab       20   0  248.1m  41.3m   5.1m R  98.3  0.5   1:39.21 python3
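
When you want the live view captured as text, for example to paste into a ticket, top has a batch mode. The following is a minimal sketch, assuming procps top. watch_pid is a hypothetical helper name, and the one-second delay and three samples are arbitrary defaults.

```shell
# watch_pid PID [SAMPLES]: print the top row for one PID on each refresh.
# -b = batch (non-interactive) mode, -d = delay between refreshes in seconds,
# -n = number of refreshes, -p = monitor only the given PID.
watch_pid() {
    pid="$1"
    samples="${2:-3}"
    # Keep only the rows whose first column is the PID; drop the summary header.
    top -b -d 1 -n "$samples" -p "$pid" | awk -v pid="$pid" '$1 == pid'
}

# Example: three samples of the PID from the snapshot.
# watch_pid 1357 3
```

If %CPU stays high on every row, the PID is a consistent offender rather than a transient spike.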

Step 3: Send a controlled termination request (SIGTERM).

Command

kill 1357
# Verify:
ps -p 1357 || echo "PID 1357 is gone"

Default kill is SIGTERM, which is the safe first move. If the process is still present after a moment, you investigate or escalate.

  PID TTY          TIME CMD
PID 1357 is gone
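
The "wait a moment, then check" part can be made repeatable. The following is a minimal sketch: term_and_wait and is_running are hypothetical helper names, and the 10-second grace period is an assumed default, not a value from the lab. It deliberately stops short of SIGKILL so that escalation stays a human decision.

```shell
# is_running PID: succeed only if the PID exists and is not a zombie.
# A zombie (STAT starting with Z) has already exited and is only waiting to be reaped.
is_running() {
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case "$state" in
        ""|Z*) return 1 ;;
    esac
    return 0
}

# term_and_wait PID [GRACE_SECONDS]: send SIGTERM, then poll until the process exits.
term_and_wait() {
    pid="$1"
    grace="${2:-10}"
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "cannot signal PID $pid (already gone, or not permitted)"
        return 1
    fi
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        if ! is_running "$pid"; then
            echo "PID $pid is gone"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "PID $pid still present after ${grace}s; investigate before escalating"
    return 2
}

# Example: term_and_wait 1357
```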

Step 4: Practice job control in your current shell session.

Commands

sleep 300
# Press Ctrl+Z to stop it
jobs
bg %1
jobs

Ctrl+Z stops (suspends) the foreground job. bg resumes it in the background. Use jobs to verify state before you bring anything to the foreground.

[1]+  Stopped                 sleep 300
[1]+  Running                 sleep 300 &
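
Job numbers and PIDs describe the same processes from two angles. The short sketch below ties them together so you can cross-check a job against ps or top. The sleep workload is a stand-in, and job_pid is an illustrative variable name.

```shell
# Start a stand-in workload directly in the background with &.
sleep 300 &
job_pid=$!          # $! holds the PID of the most recent background job

# jobs -l adds the PID to the usual job number and state.
jobs -l

# The same process as the system-wide tools see it.
ps -o pid,stat,cmd -p "$job_pid"

# Clean up the stand-in job.
kill "$job_pid"
```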

Step 5: Bring a job back to the foreground.

Command

fg %1

Foregrounding restores interactive control. This is how you recover a “lost” long-running command. End it with Ctrl+C if needed.

sleep 300

Step 6: Start a workload with lower scheduling priority.

Command

nice -n 10 ./backup.sh

Raising niceness reduces scheduling priority. This is an operator-friendly way to run heavy work without clobbering interactive users.

(nice prints nothing of its own; any output comes from ./backup.sh)
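
One way to confirm the niceness took effect is to read the NI column from ps. The following is a minimal sketch using sleep as a stand-in for ./backup.sh; the variable names are illustrative. It also shows renice on a process that is already running, under the assumption that renice comes from util-linux, which treats the value as absolute by default. Other implementations may treat it as an increment.

```shell
# Start a stand-in workload 10 niceness steps above the current shell.
nice -n 10 sleep 60 &
pid=$!
sleep 1   # give nice a moment to adjust itself and exec the command

# NI is the niceness column; it should read 10 higher than the shell's own value.
ps -o pid,ni,cmd -p "$pid"
ni_start=$(ps -o ni= -p "$pid" | tr -d ' ')

# Lower the priority of the running process further.
# Unprivileged users can typically raise niceness but not lower it again.
renice -n 15 -p "$pid"
ni_after=$(ps -o ni= -p "$pid" | tr -d ' ')
echo "niceness: $ni_start -> $ni_after"

# Clean up the stand-in workload.
kill "$pid"
```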

Common breakpoints

I killed the wrong process

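Use a repeatable identification step first: confirm the command line with ps -p <PID> -o pid,user,cmd. Do not rely on the PID alone: if a supervisor respawns the process, it comes back under a new PID, and the old number can later be reused by something unrelated.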
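
The identification step can be folded into the kill itself. The following is a minimal sketch: kill_if_matches is a hypothetical helper, and the substring match on the command line is an assumed policy that you may want to tighten.

```shell
# kill_if_matches PID EXPECTED: send SIGTERM only if the PID's command line
# still contains the EXPECTED substring.
kill_if_matches() {
    pid="$1"
    expected="$2"
    actual=$(ps -p "$pid" -o args= 2>/dev/null)
    if [ -z "$actual" ]; then
        echo "no such PID: $pid"
        return 1
    fi
    case "$actual" in
        *"$expected"*)
            kill -TERM "$pid"
            ;;
        *)
            echo "refusing: PID $pid is running '$actual', expected '$expected'"
            return 2
            ;;
    esac
}

# Example: kill_if_matches 1357 "script.py"
```

This narrows the window for a mistake but does not close it: the process can still exit between the check and the signal.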

SIGTERM did nothing

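Some processes trap SIGTERM or are stuck in uninterruptible I/O. Verify state with ps -o pid,stat,cmd -p <PID>. A STAT of D means uninterruptible sleep, where the process generally does not act on any signal, SIGKILL included, until the I/O completes. If you must escalate, use SIGKILL (kill -KILL <PID>) as a last resort and document it.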
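
Reading the STAT column is easier with a small decoder. The following is a minimal sketch: explain_state is a hypothetical helper, and the hints are general rules of thumb rather than guarantees for every kernel and workload.

```shell
# explain_state PID: print the STAT code with a hint about how signals will behave.
explain_state() {
    stat=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case "$stat" in
        "") echo "gone: no process with PID $1" ;;
        D*) echo "$stat: uninterruptible sleep; signals generally wait until the I/O returns" ;;
        T*) echo "$stat: stopped; SIGTERM stays pending until the process is continued" ;;
        Z*) echo "$stat: zombie; already dead, the parent has not reaped it yet" ;;
        *)  echo "$stat: runnable or sleeping; it may be trapping or ignoring SIGTERM" ;;
    esac
}

# Example: explain_state 1357
```

The stopped case matters in this lab: a job suspended with Ctrl+Z will not handle a plain SIGTERM until it is resumed with bg, fg, or SIGCONT.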

jobs shows nothing

Job control is per-shell session. If the process was started in a different terminal or via a service, jobs will not show it. Use system-wide tools (ps, top) for those.
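
To find a process that was started from another terminal, list everything owned by your user along with the terminal it is attached to. The following is a minimal sketch, assuming procps ps; the column selection and the variable name me are illustrative.

```shell
# List your own processes across every terminal, with the TTY each one is attached to.
# A TTY of "?" means no controlling terminal, which is typical for services.
me=$(id -un)
ps -u "$me" -o pid,tty,stat,etime,cmd
```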

Cleanup checklist

End any test jobs you started during the lab.

Commands

jobs
# If needed:
kill %1
jobs

Reference

ps aux --sort=-%cpu: Snapshot of all processes, sorted by CPU usage (highest first).
- a: Show processes for all users with a controlling TTY.
- u: User-oriented format (includes %CPU, %MEM, USER, etc.).
- x: Include processes without a controlling TTY (common for daemons/services).
- --sort=-%cpu: Sort descending by CPU percentage.
top: Interactive, live view of system load and per-process CPU/memory usage.
- q: Quit top.
- P: Sort by CPU usage.
- M: Sort by memory usage.
- k: Kill a process from within top (prompts for PID and signal).
kill <PID>: Sends a signal to a process (defaults to SIGTERM).
- <PID>: Process ID to signal.
- kill -TERM <PID>: Explicit SIGTERM (graceful request).
- kill -KILL <PID>: SIGKILL (force stop, last resort).
- kill -0 <PID>: Existence/permission check (does not terminate the process).
jobs: Lists jobs in the current shell session.
- Shows job IDs like [1], state (Running/Stopped), and the command line.
- Only includes jobs started from the current shell (not system-wide processes).
bg %<job>: Resumes a stopped job in the background.
fg %<job>: Brings a job to the foreground.
nice -n <value> <command>: Starts a command with adjusted niceness (a higher niceness means a lower scheduling priority).
- -n <value>: Niceness adjustment, added to the current niceness (range: -20 highest priority to 19 lowest; negative adjustments normally require root).

Lab 16: Process Management and Job Control
