Lab 9: Diagnose and Manage a Faulty Service

Diagnose why a systemd service fails using systemctl status and journal evidence. Resolve the root cause, restart cleanly, verify runtime health, and ensure the service is enabled for future boots.

services troubleshooting core

Scenario

The nginx.service is failing to start on boot. You must investigate the cause, fix it, and ensure it runs now and on future boots.

Operator context

This is a standard production workflow: confirm failure, pull the exact error from the journal, resolve the root cause, then prove it stays healthy after restart and across reboots.

Objective

  • Confirm the service failure using systemctl status.
  • Collect failure evidence using journalctl filtered to the unit.
  • Identify the cause (port conflict) from the log error text.
  • Confirm what is listening on the conflicting port.
  • Restart nginx and verify it transitions to active (running).
  • Ensure the service starts on boot using systemctl enable.

Concepts

  • A unit can be loaded but not active, or active but unhealthy (restarting/crashing).
  • Treat systemctl status as triage and the journal as the source for exact failure text.
  • “Address already in use” means another process is already bound to the same port/interface.
  • Restart affects runtime; enable controls whether the service starts automatically on boot.
  • Prove the service is running and listening where expected after the restart.
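The split between runtime state and boot-time enablement can be checked directly. The following is a minimal sketch, assuming systemctl is on the PATH; unit_state is a hypothetical helper name, not a systemd command.

```shell
#!/bin/sh
# Sketch: report runtime state and boot-time enablement as two separate facts.
# unit_state is a hypothetical helper name used only for illustration.
unit_state() {
    unit="$1"
    # is-active exits non-zero for any state other than "active",
    # so the exit status is ignored and only the printed state is kept.
    active=$(systemctl is-active "$unit" 2>/dev/null) || true
    # is-enabled exits non-zero for states such as "disabled" or "masked".
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
    printf 'runtime=%s boot=%s\n' "${active:-unknown}" "${enabled:-unknown}"
}

# Usage on a live system:
#   unit_state nginx
```

A unit that reports runtime=failed with boot=enabled matches this lab's scenario: systemd tries to start it at boot, but the start itself fails.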

Walkthrough

Step 1: Check the service status.
Command
systemctl status nginx

This is your initial triage view: loaded state, enablement, active state, and the failing ExecStart context. If it is failing, pull the journal for the exact error text.

● nginx.service - A high performance web server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled)
   Active: failed (Result: exit-code) since ...
   Process: 1356 ExecStart=/usr/sbin/nginx -g 'daemon on;' (code=exited, status=1/FAILURE)

Step 2: Pull detailed logs for the unit.
Command
journalctl -xeu nginx

This filters the journal to the nginx unit while showing recent errors. Extract the root-cause line and treat it as the starting point for the fix.

Jul 18 12:43:01 nginx[1356]: nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
Jul 18 12:43:01 systemd[1]: nginx.service: Failed with result 'exit-code'.
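Pulling the root-cause line out of a longer journal can be scripted. This is a sketch only, assuming nginx reports fatal startup errors with the [emerg] tag as in the output above; root_cause and bind_target are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: isolate the root-cause line from unit log text read on stdin.
# Both function names are hypothetical helpers for this lab.

# Print the first line that nginx tagged as [emerg].
root_cause() {
    grep -m1 -F '[emerg]'
}

# From a bind() failure line, print the address:port nginx tried to bind.
bind_target() {
    sed -n 's/.*bind() to \([^ ]*\) failed.*/\1/p'
}

# Usage on a live system (current boot, no pager):
#   journalctl -u nginx -b --no-pager | root_cause | bind_target
```

For the sample log lines above, the pipeline yields 0.0.0.0:80, which is the address and port to investigate in the next step.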

Diagnosis

“Address already in use” means another process is already bound to port 80. Identify what owns the port, resolve the conflict, then restart nginx.

Step 3: Identify what is listening on port 80.
Command
sudo ss -ltnp | grep ':80 '

Confirm the process name and PID holding the port. This prevents “fixing” nginx when the real issue is an unexpected listener.

LISTEN 0      511          0.0.0.0:80        0.0.0.0:*    users:(("apache2",pid=912,fd=4))
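Reading the owner out of the ss line by eye works for one listener; a small filter helps when there are many. The sketch below assumes the column layout shown above (local address in the fourth field); port_owner is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "process pid" for the listener on a given TCP port.
# Reads `ss -ltnp` output on stdin; port_owner is a hypothetical helper.
port_owner() {
    port="$1"
    # Keep LISTEN rows whose local address (field 4) ends in ":<port>",
    # so that port 80 does not also match 8080.
    awk -v p=":$port" '$1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p' |
        # Extract the first process name and PID from the users:((...)) column.
        sed -n 's/.*users:(("\([^"]*\)",pid=\([0-9]*\).*/\1 \2/p'
}

# Usage on a live system:
#   sudo ss -ltnp | port_owner 80
```

With the sample row above, the function prints apache2 912. Without sudo the users:((...)) column may be missing, in which case nothing is printed.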

Step 4: Stop the conflicting service.
Command
sudo systemctl stop apache2

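Replace apache2 with the unit that owns the process you found in Step 3. The goal is to free port 80 for nginx. Stopping affects only the current boot: if the conflicting service is enabled, it will bind port 80 again at the next boot unless you also disable it or move one of the two services to a different port.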

Step 5: Restart nginx and verify it is running.
Command
sudo systemctl restart nginx
Verification
systemctl status nginx

A successful restart should transition nginx to active (running). If it still fails, return to the journal and read the new error.

● nginx.service - A high performance web server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled)
   Active: active (running) since ...

Step 6: Ensure nginx starts on boot.
Command
sudo systemctl enable nginx

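Enabling a unit configures systemd to start it during boot by creating a symlink under the wants directory of the target named in the unit's [Install] section. If the unit is already enabled, as the Loaded: line in Step 1 shows here, no new symlink is created; the output below is what a first-time enable looks like.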

Created symlink /etc/systemd/system/multi-user.target.wants/nginx.service → /lib/systemd/system/nginx.service

Common breakpoints

Port 80 still in use after “fix”

Re-run ss -ltnp and confirm the listener actually stopped. If the conflicting service is socket-activated, it may come back unless you stop/disable the socket unit.
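One way to look for a socket unit holding the port is to filter the output of systemctl list-sockets. This is a sketch under the assumption that the first column is the listen address and the second is the socket unit; sockets_on_port is a hypothetical helper name, and the unit names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: list socket units whose listen address ends in ":<port>".
# Reads `systemctl list-sockets --no-legend` output on stdin.
# Assumes column 1 is the listen address and column 2 is the socket unit.
sockets_on_port() {
    port="$1"
    awk -v p=":$port" 'substr($1, length($1) - length(p) + 1) == p { print $2 }'
}

# Usage on a live system:
#   systemctl list-sockets --no-legend | sockets_on_port 80
# If a unit is printed, stop it as well, for example:
#   sudo systemctl stop <name>.socket
```

If the function prints nothing, the listener is probably not socket-activated and was restarted some other way, for example by a Restart= policy on its own service unit.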

Service shows “active” but the site is not reachable

Confirm it is listening where you expect (port/interface). If it is bound to 127.0.0.1 only, remote clients will fail even though the service is running.
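The bind address can be classified from the same ss output used earlier. The sketch below assumes the ss -ltn column layout (local address in the fourth field); bind_scope is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: for each listener on a port, print its address and how widely it is reachable.
# Reads `ss -ltn` output on stdin; bind_scope is a hypothetical helper.
bind_scope() {
    port="$1"
    awk -v p=":$port" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p {
            # Strip the trailing ":<port>" to leave only the address part.
            addr = substr($4, 1, length($4) - length(p))
            if (addr == "127.0.0.1" || addr == "[::1]")
                scope = "loopback only"
            else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]")
                scope = "all interfaces"
            else
                scope = "specific address"
            print addr, scope
        }'
}

# Usage on a live system:
#   ss -ltn | bind_scope 80
```

A result of "loopback only" explains a service that is active yet unreachable from other hosts; the fix then belongs in the nginx listen directive, not in systemd.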

Restart loops or repeated failures

The journal will contain the most recent failure cause. Read the latest unit logs again with journalctl and focus on the first error line nginx reports.

Enable succeeds but service still does not start after reboot

Confirm enablement with systemctl is-enabled nginx (or the Loaded: line of systemctl status) and check whether the unit is wanted by the expected target. Boot-time failures usually appear in the journal for that boot, for example via journalctl -b -u nginx.
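Whether a target wants the unit can be checked by looking for the symlink that enable creates. This is a minimal sketch, assuming the default multi-user.target and the /etc/systemd/system directory seen in Step 6; wants_link is a hypothetical helper name, and units pulled in by other means will not show up this way.

```shell
#!/bin/sh
# Sketch: check for the wants symlink that `systemctl enable` creates.
# wants_link is a hypothetical helper; target and root directory are
# assumptions that default to the values seen in Step 6.
wants_link() {
    unit="$1"
    target="${2:-multi-user.target}"
    root="${3:-/etc/systemd/system}"
    link="$root/$target.wants/$unit"
    if [ -L "$link" ]; then
        echo "$unit is wanted by $target"
    else
        echo "$unit has no wants symlink for $target"
        return 1
    fi
}

# Usage on a live system:
#   wants_link nginx.service
#   journalctl -b -u nginx --no-pager    # this boot's log lines for the unit
```

If the symlink exists but the service still did not start, the journal for that boot is the next place to look, since the start was attempted and failed.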

Cleanup checklist

This lab only changes state if you stop a conflicting service or enable nginx. Ensure nginx is running, and decide whether the conflicting service should remain stopped in your environment.

Commands
systemctl status nginx
sudo ss -ltnp | grep ':80 '
Success signal

systemctl status nginx shows active (running) and ss confirms nginx is the process listening on port 80.

Reference

  • systemctl status <unit>: Shows current unit state and recent unit log lines.
  • journalctl -xeu <unit>: Shows recent journal entries for a specific unit with context.
    • -u <unit>: Filter output to a single systemd unit.
    • -e: Jump to the end of the journal.
    • -x: Show explanatory text when available.
  • ss -ltnp: Shows listening TCP sockets with process details.
    • -l: Show listening sockets.
    • -t: TCP sockets.
    • -n: Do not resolve names (show numeric ports/addresses).
    • -p: Show process using the socket (requires privileges for full detail).
  • systemctl stop <unit>: Stops a running unit now; it does not change whether the unit starts at boot.
  • systemctl restart <unit>: Stops and then starts the unit.
  • systemctl enable <unit>: Configures the unit to start automatically at boot.