Vonds | Lab 035

core troubleshooting services

Scenario

A developer reports the dev server “freezes” during a data run. You suspect a runaway process is consuming resources. You need to identify the offender, take controlled action (stop or terminate), then put basic guardrails in place using schedulers.

Operator context

This is evidence-first triage: observe, isolate, intervene safely, then automate follow-ups to reduce repeat incidents.

Objective

List processes and identify memory-heavy offenders.
Start a long-running job in the background.
Inspect jobs, foreground them, and suspend them safely.
Terminate a stuck job using SIGTERM.
Schedule a one-time cleanup job with at.
Create a daily cron job for monitoring.

Concepts

Evidence before action: identify the real offender with process listings before you kill anything.
Process vs job: every running program is a process, visible system-wide by PID; a job is a process (or pipeline of processes) that your current shell started and tracks by job number.
Job control: background, foreground, and stop states let you regain control without immediately terminating.
Signal discipline: start with SIGTERM to allow cleanup; escalate only when a process will not exit.
Scheduling guardrails: at for one-time actions, cron for recurring checks.

Walkthrough

Step 1 : Display processes sorted by memory usage.

Command

ps -eo pid,user,%mem,comm --sort=-%mem

This is a fast evidence view. Sort by memory to spot the likely offender before you touch anything. If you are debugging performance, capture this output for the ticket.

PID   USER     %MEM COMMAND
2387  alice    13.2 code
2194  alice     9.8 firefox
2381  dev       6.9 python3
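To capture this view for the ticket, a minimal sketch could write the top entries to a timestamped file. The lab035-evidence directory and the ten-row cutoff are assumptions chosen for illustration, not part of the lab.

```shell
# Hypothetical evidence location; point this at wherever your team keeps ticket artifacts.
evidence_dir="${TMPDIR:-/tmp}/lab035-evidence"
mkdir -p "$evidence_dir"

# Timestamped file name so repeated snapshots do not overwrite each other.
snapshot="$evidence_dir/ps-mem-$(date +%Y%m%dT%H%M%S).txt"

# Header row plus the ten most memory-heavy processes.
ps -eo pid,user,%mem,comm --sort=-%mem | head -n 11 > "$snapshot"

echo "Saved snapshot to $snapshot"
```

Taking one snapshot before you intervene and one after gives the ticket a simple before/after comparison.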

Step 2 : Start the job in the background.

Command

python3 data_job.py &

Appending & runs the command in the background and returns control to your shell. This is useful when you need to keep working while a task runs.

[1] 2381
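The number after [1] is the PID of the background process. In a script, the same value is available as $!. The sketch below uses sleep 30 as a stand-in for data_job.py (an assumption, so it runs without the lab files).

```shell
# sleep stands in for python3 data_job.py so the sketch runs anywhere.
sleep 30 &

# $! expands to the PID of the most recent background command.
job_pid=$!
echo "Background job PID: $job_pid"

# kill -0 sends no signal; it only checks that the PID exists and can be signaled.
if kill -0 "$job_pid" 2>/dev/null; then
  job_state="running"
else
  job_state="gone"
fi
echo "State: $job_state"

# Clean up the stand-in job.
kill "$job_pid"
wait "$job_pid" 2>/dev/null || true
```

Saving the PID at launch means you can target the exact process later instead of matching by name.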

Step 3 : List jobs managed by the current shell.

Command

jobs

jobs lists background and stopped jobs in this shell session. This is not a system-wide process list. Use it to control tasks you started from this terminal.

[1]+  Running                 python3 data_job.py &

Step 4 : Foreground the job, then stop it.

Command

fg %1

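Foregrounding reattaches the job to your terminal so it receives keyboard input and keyboard-generated signals. Once it is in the foreground, press Ctrl+Z to suspend it: the terminal sends SIGTSTP, the job stops running without being killed, and it stays in the job table so you can resume it later with fg or bg.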

python3 data_job.py
[1]+  Stopped                 python3 data_job.py
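Ctrl+Z needs an interactive terminal. In a script, a roughly equivalent suspend and resume can be sketched with SIGSTOP and SIGCONT. Note the assumptions: SIGSTOP is the uncatchable counterpart of the SIGTSTP that Ctrl+Z sends, sleep 30 stands in for the real job, and the ps stat column is expected to start with T while the process is stopped.

```shell
# sleep stands in for the real job.
sleep 30 &
pid=$!

# Suspend the process, then read its state from ps.
kill -STOP "$pid"
sleep 1
state_stopped=$(ps -o stat= -p "$pid")
echo "After SIGSTOP: $state_stopped"

# Resume it and read the state again.
kill -CONT "$pid"
sleep 1
state_resumed=$(ps -o stat= -p "$pid")
echo "After SIGCONT: $state_resumed"

# Clean up the stand-in job.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A suspended process keeps its memory, so suspending buys time to investigate but does not free resources on its own.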

Step 5 : Terminate the job using SIGTERM.

Command

kill %1

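Default kill sends SIGTERM, which is the correct first move in most admin workflows. It gives the process a chance to exit cleanly and flush state. Because the job was stopped in Step 4, it can only act on SIGTERM once it resumes. Bash typically continues a stopped job when you kill it by job spec; if jobs still shows it as Stopped, resume it with bg %1 or kill -CONT %1.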

# No output on success
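To see why SIGTERM allows cleanup, consider a minimal sketch of a worker that traps the signal. The worker loop and the marker file are hypothetical stand-ins for data_job.py and its state; the lab does not say whether the real job installs a handler.

```shell
marker=$(mktemp)

# Hypothetical worker: traps SIGTERM, records that it cleaned up, then exits 0.
(
  trap 'echo "flushed state" > "$marker"; exit 0' TERM
  while :; do sleep 1; done
) &
worker=$!

sleep 1            # give the worker time to install its trap
kill "$worker"     # no signal named, so SIGTERM is sent

# Collect the worker's exit status without aborting on a non-zero value.
status=0
wait "$worker" || status=$?

echo "Worker exit status: $status"
echo "Marker contents: $(cat "$marker")"
```

The handler runs only because SIGTERM can be caught. SIGKILL would end the worker before the marker is written.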

Step 6 : Schedule a one-time cleanup in two minutes.

Command

echo '/usr/local/bin/cleanup.sh' | at now + 2 minutes

at is for one-off scheduling. Use it for deferred cleanup, delayed restarts, or running a script once after a short buffer.

job 4 at Sun Jul 20 21:59:00 2025

Note

If at is not available, confirm that the at package is installed and that its daemon (commonly atd) is enabled and running on your distro.

Step 7 : Create a daily cron job to run monitor.sh at midnight.

Command

(crontab -l; echo '0 0 * * * /usr/local/bin/monitor.sh') | crontab -

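This preserves existing entries and appends the new schedule. The five schedule fields are minute, hour, day of month, month, and day of week, so 0 0 * * * means 00:00 every day. Midnight daily is a common baseline for checks, reports, and housekeeping tasks. Running the command a second time adds a duplicate line, so check crontab -l before re-running it.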

# crontab installs silently on success
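Re-running the append command installs the same line twice. One way to guard against that is a small filter that appends the entry only when it is missing. The helper name append_once and the backup.sh entry are hypothetical, and the sketch runs against sample text rather than your real crontab.

```shell
entry='0 0 * * * /usr/local/bin/monitor.sh'

# Copy stdin to stdout, then append "$1" only if no identical line was seen.
append_once() {
  awk -v line="$1" '{ print } $0 == line { found = 1 } END { if (!found) print line }'
}

# Hypothetical existing crontab content used for a dry run.
current='30 2 * * 1 /usr/local/bin/backup.sh'

once=$(printf '%s\n' "$current" | append_once "$entry")
twice=$(printf '%s\n' "$once" | append_once "$entry")
printf '%s\n' "$twice"

# Against the real crontab (not executed in this sketch):
# crontab -l 2>/dev/null | append_once "$entry" | crontab -
```

Because the second pass leaves the text unchanged, the real pipeline could be run repeatedly without stacking duplicate entries.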

Common breakpoints

jobs shows nothing

jobs only reports tasks started from the current shell. If you started the process elsewhere or it is system-managed, use process tooling such as ps to find it by PID instead.

fg reports no such job

Job IDs are per shell session. Re-run jobs to confirm the job number and state, then retry fg %N.

kill does not stop the process

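Default kill sends SIGTERM, which a process can catch, block, or ignore. Confirm the target is correct and give it a few seconds to exit. If it is still running, SIGKILL (kill -KILL) cannot be caught, but it also skips all cleanup, so decide whether that escalation is justified in your environment.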
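If escalation is justified, a common pattern is to send SIGTERM, wait a bounded grace period, and only then send SIGKILL, which cannot be caught and skips cleanup. The sketch below is one possible shape for that; the function name terminate, the grace period, and the stand-in process that ignores SIGTERM are all assumptions for illustration.

```shell
# terminate PID [GRACE]: send SIGTERM, wait up to GRACE seconds, then send SIGKILL.
terminate() {
  pid=$1
  grace=${2:-5}
  kill -TERM "$pid" 2>/dev/null || return 0   # no such process, or not ours to signal
  while [ "$grace" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0     # exited within the grace period
    sleep 1
    grace=$((grace - 1))
  done
  echo "PID $pid still present after SIGTERM; sending SIGKILL" >&2
  kill -KILL "$pid" 2>/dev/null || true
}

# Stand-in for a stuck process: it ignores SIGTERM.
sh -c 'trap "" TERM; exec sleep 30' &
stuck=$!
sleep 1   # let the stand-in set its signal disposition

terminate "$stuck" 2

# Collect the exit status; 128 + 9 indicates the process was ended by SIGKILL.
status=0
wait "$stuck" || status=$?
echo "Exit status: $status"
```

Keeping the grace period explicit makes the escalation a deliberate decision rather than a reflex.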

at command not found or jobs do not run

One-time scheduling often requires a service (such as atd) to be installed and running. Validate availability before relying on scheduled actions during an incident.
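A quick pre-check before relying on at during an incident might look like the sketch below. It assumes the daemon process is named atd, which is common on Linux but not universal; the status labels are made up for this example.

```shell
# Classify the state of the at scheduler on this host.
if ! command -v at >/dev/null 2>&1; then
  at_status="missing"               # the at client is not installed
elif pgrep -x atd >/dev/null 2>&1; then
  at_status="ready"                 # client present and a process named atd is running
else
  at_status="daemon-not-running"    # client present, but no atd process was found
fi

echo "at scheduler: $at_status"
```

If the result is anything other than ready, queued jobs may never run, so fix the scheduler before depending on it.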

crontab -l prints an error

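If the user has no existing crontab, crontab -l prints a message such as “no crontab for <user>” on stderr and writes nothing to stdout, so the append pipeline installs only the new line. Run crontab -l afterwards to confirm that the installed crontab is what you expect.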

Cleanup checklist

This lab can leave scheduled tasks behind if you created them. Clean up by removing the one-time job (if still queued) and removing the cron entry you added.

Commands

jobs
ps -eo pid,user,%mem,comm --sort=-%mem

atq
# Remove the queued at job by number (example: 4)
atrm 4

# Remove the monitor.sh cron line (edit interactively)
crontab -e
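If you prefer a non-interactive removal over crontab -e, one option is to filter the entry out and reinstall the rest. The helper name strip_monitor and the backup.sh entry are hypothetical, and the sketch runs against sample text rather than your real crontab.

```shell
# Drop every line that mentions the monitor script; keep everything else.
# grep -v exits non-zero when nothing is left to print, so tolerate that case.
strip_monitor() {
  grep -vF '/usr/local/bin/monitor.sh' || true
}

# Hypothetical crontab content used for a dry run.
sample=$(printf '%s\n' \
  '30 2 * * 1 /usr/local/bin/backup.sh' \
  '0 0 * * * /usr/local/bin/monitor.sh')

remaining=$(printf '%s\n' "$sample" | strip_monitor)
printf '%s\n' "$remaining"

# Against the real crontab (not executed in this sketch):
# crontab -l 2>/dev/null | strip_monitor | crontab -
```

Review the filtered output before piping it back into crontab, since the match is on the script path and would remove any other line that references it.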

Success signal

No unintended scheduled work remains, and the offending job is no longer running.

Reference

ps -eo pid,user,%mem,comm --sort=-%mem : Lists processes with selected fields, sorted by memory.
- -e : Selects all processes.
- -o : Chooses output columns.
- --sort=-%mem : Sorts descending by memory usage.
python3 data_job.py & : Starts a command in the background.
- & : Runs the command asynchronously and returns the shell prompt.
jobs : Lists jobs managed by the current shell session.
fg %N : Brings job %N to the foreground.
Ctrl+Z : Stops the current foreground job (suspends it).
kill %N : Sends a signal to job %N (default SIGTERM).
echo '/path/script.sh' | at now + 2 minutes : Schedules a one-time job.
- | : Pipes the command string into at.
- now + 2 minutes : Relative schedule time.
atq : Lists pending at jobs.
atrm <jobid> : Removes a queued at job by ID.
(crontab -l; echo '0 0 * * * /usr/local/bin/monitor.sh') | crontab - : Appends a cron entry while keeping existing entries.
- crontab -l : Prints current crontab entries.
- crontab - : Installs a new crontab from stdin.
crontab -e : Edits the current user’s crontab.

Lab 35: Processes, Jobs, and Scheduling
