diff --git a/apps/agent/doc/README.md b/apps/agent/doc/README.md new file mode 100644 index 0000000..af59131 --- /dev/null +++ b/apps/agent/doc/README.md @@ -0,0 +1,19 @@ +# yanpm-agent Documentation + +This directory contains in-depth documentation for the yanpm agent daemon (the binary built from `apps/agent`). The agent exposes a unix-socket HTTP API for writing nginx configuration fragments, validating them, and reloading nginx safely. + +Docs included: + +- `architecture.md` — Detailed explanation of the program flow and components. +- `configuration.md` — CLI flags, environment variables, defaults, and permission handling. +- `usage.md` — How to run the agent, curl examples, and systemd/docker hints. +- `api.md` — HTTP API endpoints, request and response schemas, examples. +- `deployment.md` — Deployment considerations, permissions, and systemd socket/unit examples. +- `troubleshooting.md` — Common errors and solutions. + +For implementation details, see the source in `apps/agent/src` (notably `main.rs`, `routes.rs`, and the `commands/` submodule). + +Integration notes + +- The agent is intended to run as a companion agent for the API service in `apps/api`. The API service calls the agent over the unix-domain socket to write nginx fragments, validate them, and trigger reloads. +- A production Docker image is provided by `apps/agent/Dockerfile`. That Dockerfile packages nginx + the `yanpm-agent` binary and s6-overlay service scripts so a single container can run nginx and the agent alongside each other. diff --git a/apps/agent/doc/api.md b/apps/agent/doc/api.md new file mode 100644 index 0000000..b06270e --- /dev/null +++ b/apps/agent/doc/api.md @@ -0,0 +1,68 @@ +# HTTP API Reference + +Base: HTTP over a unix-domain socket. 
Example using curl: `curl --unix-socket /path/to/socket -X POST http://localhost/` + +1) GET /status + + - Response: 200 OK + - Body: JSON `{ "ok": true }` + +2) POST /validate + + - Request JSON: + + ```json + { + "config_name": "example", + "timestamp": 1234567890 + } + ``` + + - Behavior: validates the fragment file named by `config_name` and `timestamp` under the agent's internal subdirectory inside the configured nginx config directory. Delegates to `ValidateCommand::validate`. + - Success: 200 OK, body is a JSON array `[rc, output]` where `rc` is the integer return code and `output` is the combined stdout/stderr string from the validation command (the command returns an `(i32, String)` tuple). + - Error cases: + - 400 Bad Request: invalid or malformed JSON + - 500 Internal Server Error: validation error or missing fragment file + +3) POST /validate_and_reload + + - Request JSON: same as `/validate`. + - Behavior: runs validation and, on success, attempts to reload nginx. Returns an object with `rc` and `ro` (return code and combined stdout/stderr output). + - Success: 200 OK with body: `{ "rc": , "ro": "" }` + - Errors: 400 for malformed JSON, 500 if the validate-and-reload command fails (the body contains the error text). 
+ +4) POST /write_config + + - Request JSON: + + ```json + { + "config_name": "example", + "timestamp": 1234567890, + "content": "server { ... }" + } + ``` + + - Behavior: writes the provided `content` into an agent-managed fragment file named from `config_name` and `timestamp` in the internal subdirectory under `nginx_config_dir`. + - Success: 200 OK with empty body + - Error: 400 for malformed JSON, 500 if writing the file fails + +Notes + +- The agent expects callers to choose a `config_name` and `timestamp` that together form a unique filename. The concrete filename encoding is performed by `commands::run::to_file_name` in source. +- On validation failures the returned output often contains the full `nginx -t` output; inspect `ro` or the returned JSON error messages. diff --git a/apps/agent/doc/architecture.md b/apps/agent/doc/architecture.md new file mode 100644 index 0000000..5b5b2bd --- /dev/null +++ b/apps/agent/doc/architecture.md @@ -0,0 +1,34 @@ +# Architecture and Runtime Flow + +Overview + +- The agent is an async HTTP server (axum) listening on a Unix domain socket and exposes a small JSON API to manage nginx configuration fragments. +- Core lifecycle is implemented in `apps/agent/src/main.rs`: + - parse CLI args and environment variables + - ensure the socket path and directory exist and have permissive but secure defaults + - bind a `tokio::net::UnixListener` to the socket + - create an `NginxService` (shared state) and an in-process cron `JobScheduler` + - mount axum routes (`/status`, `/validate`, `/validate_and_reload`, `/write_config`) and serve HTTP over the Unix socket + +Key components + +- `main.rs` — Bootstrapping, argument handling, socket setup and permission handling, scheduler start, and axum server startup. +- `routes.rs` — axum handlers for the HTTP API. It deserializes JSON payloads and delegates to `NginxService` methods. Handlers return appropriate HTTP status codes and JSON on error or success. 
+- `commands/` — Implementation of lower-level actions (writing fragment files, running `nginx -t`, validating, reloads). The `validate.rs` command contains sophisticated behavior to handle permission-limited environments by: + - creating wrapper nginx configs that include a single fragment + - trying `nginx -t` directly, attempting a privileged wrapper via `sudo` if available, and finally passing a writable PID override via `-g pid ...;` to avoid permission failures + +Concurrency and state + +- A single shared `NginxService` instance is stored in axum `State` and cloned into handlers; it holds the scheduler and the configured nginx config directory path. +- The JobScheduler is created with `tokio_cron_scheduler::JobScheduler` and started before serving requests. + +Error handling and best-effort behavior + +- Socket permission changes, GID changes, and directory creations are best-effort and log warnings on failure rather than failing hard. +- Most command failures are converted into JSON errors with appropriate HTTP status codes so callers can inspect command output. + +Integration and packaging + +- The agent is intended to run as a companion to the API server in `apps/api`. The API calls the agent over the unix socket to write fragments, validate them, and trigger reloads. +- `apps/agent/Dockerfile` builds a runtime image that includes `nginx` and the `yanpm-agent` binary (the Dockerfile uses s6-overlay to run multiple services). This image is suitable for deployments that prefer nginx and the agent colocated in a single container. diff --git a/apps/agent/doc/configuration.md b/apps/agent/doc/configuration.md new file mode 100644 index 0000000..d19be57 --- /dev/null +++ b/apps/agent/doc/configuration.md @@ -0,0 +1,27 @@ +# Configuration and Environment + +CLI flags and environment variables + +- `--sock` / `YANPM_AGENT_SOCK` (default: `./yanpm-agent.sock`) + - Path to the Unix socket file the agent will bind to. 
+ - If the socket directory does not exist, the agent attempts to create it and set its mode to `0770`. + +- `--nginx-config-dir` / `YANPM_NGINX_CONFIG_DIR` (default: `/etc/nginx/conf.d`) + - Directory where nginx fragments are written. The agent writes fragments into a subdirectory whose name is chosen by the agent (internal use). + +- `--sock-perm` / `YANPM_AGENT_SOCK_PERM` (default: `660`) + - A 3-digit octal permission string applied to the socket file (best-effort). The agent validates the format at startup. + - If the final digit is greater than `0`, a warning is logged because that grants access to "others". + +- `--sock-gid` / `YANPM_AGENT_SOCK_GID` (default: current user's primary group) + - GID to set on the socket file (best-effort). + +Validation rules and behavior + +- `sock_perm` must be exactly 3 octal digits (characters 0-7). The agent rejects invalid values at startup. +- If a path already exists at the socket location, the agent checks whether it is a unix socket: if so, it removes it before binding; if not, startup fails. +- Setting permissions (`set_permissions`) and changing the GID (`chown`) are attempted but non-fatal: failures are logged as warnings and the agent continues. + +Notes about nginx config directory + +- The agent writes fragments into a subdirectory (internal) of the configured `nginx_config_dir`. Ensure nginx is configured to include that subdirectory so fragments are picked up, or use `write_config` then trigger a reload. diff --git a/apps/agent/doc/deployment.md b/apps/agent/doc/deployment.md new file mode 100644 index 0000000..9c707db --- /dev/null +++ b/apps/agent/doc/deployment.md @@ -0,0 +1,62 @@ +# Deployment and Permissions + +Socket location and permissions + +- The agent binds a unix socket at the path given by `--sock` or `YANPM_AGENT_SOCK`. 
The agent will: + - create the parent directory (best-effort) and attempt to set its permissions to `0770` + - remove an existing socket file if it is a socket, or fail if the path exists and is not a socket + - apply the `sock_perm` (3-digit octal) to the socket file and optionally change its GID to `sock_gid` + +Systemd socket/unit example + +Create a `yanpm-agent.socket` unit that creates and owns the unix socket, and a `yanpm-agent.service` that runs the agent. Ensure the socket path used by systemd matches `--sock`. + +Docker / container notes + +- If running the agent inside a container and writing to host nginx config, bind-mount the host nginx config directory into the container at the path provided to `--nginx-config-dir`. +- Consider running the agent as a user with permission to write the nginx config directory or use a shared group and `sock_gid` so clients can access the socket. +- The repository provides a runtime image built by `apps/agent/Dockerfile` which packages `nginx` together with the `yanpm-agent` binary and s6-overlay service scripts. This image runs nginx and the agent in one container which is useful when the agent is acting as the runtime companion for the API (`apps/api`). + +Privilege escalation for validation + +- In many systems `nginx -t` may fail due to inability to access `/run/nginx.pid` or other privileged files. The agent attempts a best-effort sequence: + + 1. Run `nginx -t` directly. + 2. If that fails with permission errors, try a privileged wrapper (e.g. `/usr/local/sbin/yanpm-nginx-validate` or `yanpm-nginx-validate-file`) via `sudo -n`. + 3. If wrapper is unavailable or fails, retry `nginx -t` with a writable PID override via `-g 'pid /tmp/yanpm-validate-.pid;'`. + +Security considerations + +- Avoid setting `sock_perm` to allow world access unless explicitly intended. +- Prefer controlling socket group membership via `sock_gid` rather than making the socket world-writable. 
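As an illustration of the `sock_perm` rule (exactly three octal digits, with a warning when the final digit grants access to "others"), here is a minimal Rust sketch. The function names `parse_sock_perm` and `allows_others` are hypothetical and are not taken from the agent source; the agent's own validation may be structured differently.

```rust
/// Parse a 3-digit octal permission string such as "660" into mode bits.
/// Returns an error message for anything that is not exactly three octal digits.
/// (Hypothetical helper for illustration only.)
fn parse_sock_perm(s: &str) -> Result<u32, String> {
    if s.len() != 3 || !s.chars().all(|c| ('0'..='7').contains(&c)) {
        return Err(format!("sock_perm must be exactly 3 octal digits, got {s:?}"));
    }
    u32::from_str_radix(s, 8).map_err(|e| e.to_string())
}

/// True when the "others" digit grants any access (final digit greater than 0).
/// (Hypothetical helper for illustration only.)
fn allows_others(mode: u32) -> bool {
    mode & 0o007 != 0
}
```

With these helpers, `parse_sock_perm("660")` yields `Ok(0o660)` and `allows_others(0o660)` is false, while a value such as `666` parses successfully but would trigger the "others" warning.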
+ +s6 init scripts, wrappers and sudoers (runtime) + +- Purpose: The image built by `apps/agent/Dockerfile` uses `s6-overlay` as PID 1 (the Dockerfile sets `ENTRYPOINT ["/init"]`). The repository includes `docker/s6/cont-init.d` scripts that run at container startup (one-shot) and `docker/s6/services.d` entries to run long-lived services (nginx and the agent). The cont-init scripts prepare runtime users, permissions, and helper wrappers the agent uses for privileged operations. + +- Key cont-init scripts (in the repo): + - `docker/s6/cont-init.d/10-create-app-user` — ensures the `yanpm-agent` user and group exist (honoring `YANPM_AGENT_UID`, `YANPM_AGENT_GID`, and `YANPM_AGENT_SOCK_GID`), adds the user to the `nginx` group, and attempts to chown runtime directories like `/var/run/yanpm` and `/app/yanpm-agent` (logs warnings if chown fails for bind mounts or rootless containers). + - `docker/s6/cont-init.d/20-install-reload-wrapper` — installs three helper wrappers and a sudoers entry so the `yanpm-agent` user can perform narrowly-scoped privileged operations without a password. + +- Wrapper scripts installed by `20-install-reload-wrapper`: + - `/usr/local/sbin/yanpm-nginx-reload` — runs `nginx -c /etc/nginx/nginx.conf -s reload` (used for reloading the running nginx master process). + - `/usr/local/sbin/yanpm-nginx-validate` — runs `nginx -c /etc/nginx/nginx.conf -t` (validates the main nginx config). + - `/usr/local/sbin/yanpm-nginx-validate-file` — securely validates a single nginx config file: it resolves the absolute path, ensures the target is a regular file (not a symlink), checks the file is owned by the `yanpm-agent` user, enforces it's not world-writable, then runs `nginx -c -t`. This defends against symlink and race attacks when an unprivileged agent requests privileged validation. 
+ +- Sudoers entry: + - The init script writes `/etc/sudoers.d/yanpm-agent` with a rule allowing the configured agent user (default `yanpm-agent`) to run only the three wrappers with `NOPASSWD`. This gives the agent a limited, auditable privilege escalation surface; the agent code attempts to use these wrappers via `sudo -n` before falling back to less privileged strategies. + +- Relevant environment variables (settable in the Dockerfile or at runtime): + - `YANPM_AGENT_SOCK` — unix socket path (default set in Dockerfile: `/var/run/yanpm/yanpm-agent.sock`). + - `YANPM_NGINX_CONFIG_DIR` — nginx config dir (default `/etc/nginx/conf.d`). + - `YANPM_AGENT_SOCK_PERM` — socket permissions (octal string, default `660`). + - `YANPM_AGENT_SOCK_GID` — desired GID for the socket (optional). + - `YANPM_AGENT_UID`, `YANPM_AGENT_GID` — runtime UID/GID used to create the `yanpm-agent` user in the container. + +- How the agent uses these runtime helpers: + - `ValidateCommand` and `ReloadCommand` in the agent code try `nginx` operations directly; when permission problems occur they attempt the privileged wrappers via `sudo -n /usr/local/sbin/yanpm-nginx-validate` or `...-validate-file` and `...-reload`. The cont-init script's wrappers plus the sudoers entry implement that intended secure upgrade path. + +- Notes and recommendations: + - The `validate-file` wrapper performs ownership and permission checks; ensure written fragments are created by the `yanpm-agent` user (the agent writes files as that user when running inside the container due to `10-create-app-user`). + - The cont-init scripts attempt to install `sudo` if missing; in minimal images you may prefer providing `sudo` at build time to avoid runtime installation attempts. + - If you bind-mount host directories (e.g., `/etc/nginx/conf.d`) into the container, ensure ownership and permissions are compatible with the agent user and `YANPM_AGENT_SOCK_GID` so the socket and files are accessible as intended. 
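The ownership and permission checks attributed to the `validate-file` wrapper (regular file, not a symlink, owned by the agent user, not world-writable) can be sketched in Rust as follows. This is not the wrapper's actual code: the name `check_fragment` and the error messages are hypothetical, absolute-path resolution is left out, and the sketch does not by itself close the window between checking a file and using it.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns Ok(()) when `path` passes the checks described for the
/// validate-file wrapper; `expected_uid` is the agent user's UID.
/// (Hypothetical helper for illustration only.)
fn check_fragment(path: &Path, expected_uid: u32) -> io::Result<()> {
    // symlink_metadata does not follow symlinks, so a symlink is reported
    // as a symlink rather than as the file it points to.
    let meta = fs::symlink_metadata(path)?;
    if !meta.file_type().is_file() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a regular file"));
    }
    if meta.uid() != expected_uid {
        return Err(io::Error::new(io::ErrorKind::PermissionDenied, "unexpected owner"));
    }
    // 0o002 is the write bit for "others".
    if meta.mode() & 0o002 != 0 {
        return Err(io::Error::new(io::ErrorKind::PermissionDenied, "world-writable"));
    }
    Ok(())
}
```

A fragment written by the agent user with mode `0644` passes; the same file with mode `0666`, a file owned by another user, or a symlink pointing at a valid fragment is rejected.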
diff --git a/apps/agent/doc/troubleshooting.md b/apps/agent/doc/troubleshooting.md new file mode 100644 index 0000000..444b4d2 --- /dev/null +++ b/apps/agent/doc/troubleshooting.md @@ -0,0 +1,27 @@ +# Troubleshooting + +Common issues and how to resolve them + +- Socket path exists but is not a socket + - Symptom: startup fails with an error that the socket path exists and is not a socket. + - Fix: remove the file at the socket path or choose a different `--sock` path. + +- Permission denied on socket directory or socket + - Symptom: socket creation or permission setting logs warnings; clients cannot connect. + - Fix: ensure the socket directory exists and has correct ownership/group and that `sock_perm` and `sock_gid` are configured appropriately. Consider using `chown`/`chmod` from a privileged context. + +- `nginx -t` fails with `/run/nginx.pid: Permission denied` + - Symptom: validation fails; output contains permission denied for `/run/nginx.pid`. + - Fixes (each enables one of the fallbacks the agent attempts): + 1. Provide a privileged validation wrapper (e.g. `/usr/local/sbin/yanpm-nginx-validate`) that runs `nginx -t` with appropriate privileges. + 2. Ensure the user running the agent has permission to read the main nginx configuration and `/run/nginx.pid`, or allow the agent to use a writable PID override. + +- Fragment file not found during validation + - Symptom: validate returns 500 with message `Config file not found`. + - Fix: make sure the fragment has been written via `/write_config` to the agent's internal subdirectory under the configured nginx config directory (`--nginx-config-dir` / `YANPM_NGINX_CONFIG_DIR`), using the same `config_name` and `timestamp` as the validate call. + +- Wrapper or sudo not available + - Symptom: attempts to run `sudo -n /usr/local/sbin/yanpm-nginx-validate` fail. + - Fix: install `sudo` and a wrapper script with a matching `NOPASSWD` sudoers rule so that `sudo -n` validation works for the unprivileged agent user, or configure proper permissions on nginx state files. 
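When diagnosing the last two issues it can help to picture the agent's validation fallback as a short decision chain: direct `nginx -t`, then a `sudo -n` wrapper if the failure looks permission-related, then a retry with a writable PID override. The Rust sketch below models that chain under stated assumptions: `validate_with_fallback`, `is_permission_error`, and the injected `run` closure are hypothetical names, the exact arguments the agent passes (including `-c`) are assumed, and the "Permission denied" substring test is only a stand-in for whatever detection the agent really uses.

```rust
/// Outcome of one validation attempt: exit code plus combined output.
type Attempt = (i32, String);

/// Wrapper path as documented for the runtime image.
const WRAPPER: &str = "/usr/local/sbin/yanpm-nginx-validate-file";

/// Heuristic used in this sketch to decide whether a failure is
/// permission-related (an assumption, not the agent's actual test).
fn is_permission_error(output: &str) -> bool {
    output.contains("Permission denied")
}

/// Run the three strategies in order. `run` executes one argv and returns
/// its exit code and output; injecting it keeps the sketch free of side effects.
fn validate_with_fallback<F>(config: &str, pid_file: &str, mut run: F) -> Attempt
where
    F: FnMut(&[&str]) -> Attempt,
{
    // 1. Direct validation. Stop on success or on a non-permission failure.
    let direct = run(&["nginx", "-t", "-c", config]);
    if direct.0 == 0 || !is_permission_error(&direct.1) {
        return direct;
    }
    // 2. Privileged wrapper via non-interactive sudo.
    let wrapped = run(&["sudo", "-n", WRAPPER, config]);
    if wrapped.0 == 0 {
        return wrapped;
    }
    // 3. Wrapper unavailable or failed: retry with a writable PID file.
    let pid_directive = format!("pid {pid_file};");
    run(&["nginx", "-t", "-c", config, "-g", pid_directive.as_str()])
}
```

In a real caller, `run` could wrap `std::process::Command`; in a test it can be a closure that returns canned results, which makes it easy to reproduce a "sudo not available" scenario without touching nginx.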
+ +If none of the above solves the problem, collect the logs produced by the agent (it uses `tracing`/`tracing_subscriber`) and include the exact command outputs from the validation steps when asking for help. diff --git a/apps/agent/doc/usage.md b/apps/agent/doc/usage.md new file mode 100644 index 0000000..d2308d4 --- /dev/null +++ b/apps/agent/doc/usage.md @@ -0,0 +1,61 @@ +# Usage and Examples + +Running locally (development) + +1. Build the agent (from repository root): + + ```sh + cargo build -p agent + ``` + +2. Run the agent with defaults (socket in current directory): + + ```sh + ./target/debug/yanpm-agent + ``` + +3. Run with explicit socket and nginx config directory: + + ```sh + ./target/debug/yanpm-agent --sock /run/yanpm/yanpm-agent.sock --nginx-config-dir /etc/nginx/conf.d + ``` + +HTTP over unix-socket examples (using `socat` / `curl` helper) + +If you want to call the API from the shell, you can use `socat` to convert the unix socket to an HTTP stream, or use tools that support unix sockets directly (e.g. `curl --unix-socket`). Examples below use `curl --unix-socket`. 
+ +Validate a fragment by name and timestamp: + +```sh +curl --unix-socket ./yanpm-agent.sock -X POST http://localhost/validate \ + -H 'Content-Type: application/json' \ + -d '{"config_name":"example","timestamp":1234567890}' +``` + +Validate and reload (returns `rc` and `ro`): + +```sh +curl --unix-socket ./yanpm-agent.sock -X POST http://localhost/validate_and_reload \ + -H 'Content-Type: application/json' \ + -d '{"config_name":"example","timestamp":1234567890}' +``` + +Write a fragment (create or update): + +```sh +curl --unix-socket ./yanpm-agent.sock -X POST http://localhost/write_config \ + -H 'Content-Type: application/json' \ + -d '{"config_name":"example","timestamp":1234567890,"content":"server { listen 80; server_name example.local; }"}' +``` + +Status endpoint (health) + +```sh +curl --unix-socket ./yanpm-agent.sock http://localhost/status +``` + +Notes + +- Use the `config_name` and `timestamp` fields consistently: `timestamp` is typically a monotonic update ID from the caller ensuring unique file names. +- When running in containers, mount the host nginx config dir if you want the agent to write directly to host nginx configuration. +- The repository includes a runtime Docker image built by `apps/agent/Dockerfile` which bundles `nginx` and the `yanpm-agent` binary (via s6-overlay). Use that image when you want nginx and the agent colocated (the agent is intended as a runtime companion to `apps/api`).
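For callers that are not shell scripts, the curl calls above amount to plain HTTP/1.1 over a unix stream socket. The Rust sketch below uses only the standard library to show the idea; `build_post` and `post_json` are hypothetical names, and the sketch assumes the server closes the connection after responding to a request that carries `Connection: close`. A production client would more likely use an HTTP library and parse the status line and body.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build a minimal HTTP/1.1 POST request with a JSON body.
/// (Hypothetical helper for illustration only.)
fn build_post(path: &str, json_body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: localhost\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{json_body}",
        json_body.len()
    )
}

/// Send the request over the unix socket and return the raw response text
/// (status line, headers, and body), read until the server closes the stream.
fn post_json(socket: &Path, path: &str, json_body: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(build_post(path, json_body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

For example, `post_json(Path::new("./yanpm-agent.sock"), "/validate", r#"{"config_name":"example","timestamp":1234567890}"#)` would mirror the first curl example, returning the raw response for the caller to inspect.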