Server Triage Checklist

First Commands

date -Is
hostnamectl
uptime
who

Capture time, host identity, load, and active users before changing anything.

Resource Snapshot

free -h
df -h
ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head

Service Snapshot

systemctl --failed
systemctl status nginx --no-pager
journalctl -p warning -n 100 --no-pager

Network Snapshot

ss -tulpn
ip addr
ip route

Safe Rule

Observe first, then change one thing at a time. Keep a terminal log when handling incidents.

What's Next

Bash Command Cheatsheet

Server Environment Context

This lesson matters in server operations because Server Triage Checklist supports structured diagnosis, debug traces, and server triage without destructive guesses. On a workstation, a mistake may affect one project. On a server, the same mistake can interrupt users, hide evidence, weaken access control, or make recovery harder.

Use the commands in this lesson with three questions in mind:

What system state am I about to inspect or change?
What evidence should I capture before changing it?
How will I prove the server is healthier after the command runs?

Operational Runbook Pattern

Use this repeatable pattern when applying the lesson on a real host:

Phase	Goal	Bash Habit
Identify	Confirm host, user, and scope	`hostname`, `id`, `pwd`
Inspect	Read state before modifying it	`systemctl status`, `ls -la`, `ss -tulpn`
Change	Make the smallest safe change	Quote paths and prefer explicit options
Verify	Confirm the intended result	Check exit status, logs, and service health
Record	Leave a useful audit trail	Save command output or ticket notes

Example session header:

printf 'time=%s host=%s user=%s cwd=%s
'   "$(date -Is)" "$(hostname)" "$(id -un)" "$(pwd)"

Pre-Flight Checklist

Before running commands from this lesson on a production server, check:

You are connected to the intended host.
You know whether the command is read-only or state-changing.
You have a rollback or recovery path for state-changing work.
You understand whether sudo is required and why.
You have captured current service, disk, or network state if the work is risky.

Useful pre-flight commands:

hostnamectl 2>/dev/null || hostname
id
uptime
systemctl --failed 2>/dev/null || true

Production Safety Notes

Risk	Safer Practice
Running on the wrong host	Print `hostname` and environment name first
Accidentally expanding paths	Quote variables: `"$path"`
Losing evidence	Copy logs or capture `journalctl` output before cleanup
Silent failure	Use `set -euo pipefail` in scripts and check exit codes interactively
Over-broad `sudo` usage	Run the smallest command possible with elevated permissions

When a command can delete, overwrite, restart, reload, or reconfigure something, do a dry run or read-only inspection first.

Validation Commands

After applying the technique from this lesson, validate with commands appropriate to the changed area:

printf 'exit_status=%s
' "$?"
systemctl --failed 2>/dev/null || true
journalctl -p warning -n 50 --no-pager 2>/dev/null || true
df -h
ss -tulpn 2>/dev/null || true

For application-facing changes, add an endpoint or process check:

curl -fsS http://127.0.0.1:8080/health >/dev/null || true
ps -eo pid,cmd,%cpu,%mem --sort=-%cpu | head

Automation Example

The following template shows how to turn this lesson into a repeatable server check. Adapt names and commands before using it.

#!/usr/bin/env bash
set -euo pipefail

log() {
  printf '%s INFO %s
' "$(date -Is)" "$*" >&2
}

die() {
  printf '%s ERROR %s
' "$(date -Is)" "$*" >&2
  exit 1
}

run_03_server_triage_checklist_check() {
  log 'running Server Triage Checklist validation'
  hostname >/dev/null
  uptime >/dev/null
}

run_03_server_triage_checklist_check "$@"

Troubleshooting Flow

If the expected result does not appear, diagnose in this order:

Confirm the command ran on the correct host and shell.
Check whether the command failed with a non-zero exit status.
Re-run the read-only inspection command with more explicit paths or options.
Check recent logs for permission, path, DNS, disk, or service errors.
Undo only the specific change you made, not unrelated user or system changes.

Useful debug commands:

set -x
# repeat the smallest failing command here
set +x
printf 'PATH=%s
' "$PATH"
type command 2>/dev/null || true

Practice Lab

Use a non-production VM, container, or temporary directory for practice:

Capture a baseline using date -Is, hostname, uptime, and df -h.
Apply the main command pattern from Server Triage Checklist to a safe test target.
Intentionally trigger one harmless failure, such as a missing file or inactive service.
Capture the error message and explain what Bash exit status it produced.
Convert the manual check into a small script with logging and validation.

Review Questions

Which commands in Server Triage Checklist are read-only, and which can change server state?
What is the safest way to test the command before using it on production data?
What log, service, or health check proves the operation succeeded?
What rollback step would you use if the result is wrong?
Which parts of the process should be automated, and which should remain manual?

Field Notes

Server work rewards boring, explicit commands. Prefer commands that can be pasted into a runbook, reviewed by another operator, and repeated during an incident without relying on memory.

Keep lesson examples as starting points, not blind copy-paste snippets. Adjust paths, service names, package names, ports, and users to match the actual server environment.

First Commands​

Resource Snapshot​

Service Snapshot​

Network Snapshot​

Safe Rule​

What's Next​

Server Environment Context​

Operational Runbook Pattern​

Pre-Flight Checklist​

Production Safety Notes​

Validation Commands​

Automation Example​

Troubleshooting Flow​

Practice Lab​

Review Questions​

Field Notes​