Add project structure and roadmap documentation
- Created `project-structure.md` to outline the directory layout, crate dependencies, design principles, module guidelines, and naming conventions for the NxMesh codebase.
- Introduced `roadmap.md` detailing the development phases, milestones, tasks, deliverables, and resource requirements for the NxMesh project, spanning from foundational setup to enterprise features.
527
docs/architecture.md
Normal file
@@ -0,0 +1,527 @@
# NxMesh Architecture

## Table of Contents
1. [Overview](#overview)
2. [System Components](#system-components)
3. [Data Flow](#data-flow)
4. [Communication Protocols](#communication-protocols)
5. [Security Model](#security-model)
6. [Deployment Patterns](#deployment-patterns)
7. [Failure Handling](#failure-handling)
8. [Technology Stack](#technology-stack)

---

## Overview

NxMesh follows a **Control Plane / Data Plane** architecture pattern, similar to service meshes like Istio or Linkerd, but specifically optimized for nginx management.

### Design Principles

1. **Separation of Concerns**: Master handles policy and state; Agent handles execution
2. **Eventual Consistency**: Configuration changes propagate asynchronously
3. **Local Autonomy**: Agents can operate independently during master outages
4. **Zero-Downtime Updates**: Nginx reloads without dropping connections
5. **Observability First**: Every action is observable and traceable

---

## System Components

### 1. Master (Control Plane)

The Master is the brain of the system. It maintains the desired state and coordinates all agents.

```
┌──────────────────────────────────────────────────────────────────┐
│                              MASTER                              │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│  │     API      │  │    Config    │  │      Event & Agent      │ │
│  │    Layer     │  │    Engine    │  │      Coordination       │ │
│  │              │  │              │  │                         │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  ┌───────────────────┐  │ │
│  │ │   REST   │ │  │ │ Template │ │  │  │  Agent Registry   │  │ │
│  │ │ Handler  │ │  │ │  Engine  │ │  │  │   (Connections)   │  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  └───────────────────┘  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  ┌───────────────────┐  │ │
│  │ │   gRPC   │ │  │ │ Version  │ │  │  │     Event Bus     │  │ │
│  │ │  Server  │ │  │ │ Control  │ │  │  │  (Config Dist.)   │  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  └───────────────────┘  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  ┌───────────────────┐  │ │
│  │ │WebSocket │ │  │ │Validator │ │  │  │     Broadcast     │  │ │
│  │ │ Handler  │ │  │ │          │ │  │  │  (Agent Updates)  │  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  └───────────────────┘  │ │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘ │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│  │     Auth     │  │   Storage    │  │      Observability      │ │
│  │   Service    │  │    Layer     │  │                         │ │
│  │              │  │              │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │      Metrics      │  │ │
│  │ │   JWT    │ │  │ │ Postgres │ │  │  │   (Prometheus)    │  │ │
│  │ │  OAuth2  │ │  │ │ (SeaORM) │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │      Tracing      │  │ │
│  │ │ Password │ │  │ │  Cache   │ │  │  │  (OpenTelemetry)  │  │ │
│  │ │  Login   │ │  │ │ (Redis)  │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │                         │ │
│  │ ┌──────────┐ │  │              │  │                         │ │
│  │ │   RBAC   │ │  │              │  │                         │ │
│  │ │  Engine  │ │  │              │  │                         │ │
│  │ └──────────┘ │  │              │  │                         │ │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```

#### Master Responsibilities

| Module | Responsibility |
|--------|----------------|
| API Layer | REST, gRPC, and WebSocket endpoints for external clients (CLI, Web UI, external systems) and agents |
| Config Engine | Template rendering, validation, versioning |
| Event & Agent Coordination | Agent connection management, config event broadcasting |
| Auth Service | Authentication (JWT/OAuth2, Password) and authorization (RBAC) |
| Storage Layer | PostgreSQL for persistent state, Redis for caching |
| Observability | Metrics collection, distributed tracing, structured logging |

#### Future: High Availability Mode

For large-scale deployments, the master can be extended with:
- **Raft Consensus** for leader election and state replication
- **Cluster Manager** for coordinating multiple master instances

This is **not required** for single-organization, self-hosted deployments.

### 2. Agent (Data Plane)

The Agent is a lightweight sidecar that runs alongside each nginx instance.

```
┌──────────────────────────────────────────────────────────────────┐
│                              AGENT                               │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│  │    Master    │  │    Nginx     │  │     Health Monitor      │ │
│  │    Client    │  │  Controller  │  │                         │ │
│  │              │  │              │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │   Nginx Health    │  │ │
│  │ │   gRPC   │ │  │ │  Config  │ │  │  │   (HTTP checks)   │  │ │
│  │ │  Client  │ │  │ │ Renderer │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │  System Metrics   │  │ │
│  │ │WebSocket │ │  │ │  Reload  │ │  │  │   (CPU/Mem/IO)    │  │ │
│  │ │  Client  │ │  │ │ Manager  │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │                         │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  ┌───────────────────┐  │ │
│  │ │Reconnect │ │  │ │ Process  │ │  │  │    Self-Health    │  │ │
│  │ │ Handler  │ │  │ │  Signal  │ │  │  │    (Heartbeat)    │  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  └───────────────────┘  │ │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘ │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│  │   Metrics    │  │    Local     │  │        Watchdog         │ │
│  │   Exporter   │  │    Cache     │  │                         │ │
│  │              │  │              │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │   Config Drift    │  │ │
│  │ │Prometheus│ │  │ │  Config  │ │  │  │     Detection     │  │ │
│  │ │ Endpoint │ │  │ │  State   │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │  ┌───────────────────┐  │ │
│  │ ┌──────────┐ │  │ ┌──────────┐ │  │  │   Auto-Recovery   │  │ │
│  │ │  Statsd  │ │  │ │  Backup  │ │  │  │  (Nginx restart)  │  │ │
│  │ │  Client  │ │  │ │  Files   │ │  │  └───────────────────┘  │ │
│  │ └──────────┘ │  │ └──────────┘ │  │                         │ │
│  └──────────────┘  └──────────────┘  └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```

#### Agent Responsibilities

| Module | Responsibility |
|--------|----------------|
| Master Client | Maintains persistent connection to master (gRPC + WebSocket fallback) |
| Nginx Controller | Generates configs, manages reloads, handles lifecycle |
| Health Monitor | Monitors nginx health, system resources, reports status |
| Metrics Exporter | Prometheus endpoint, StatsD client for metrics |
| Local Cache | Caches configs for offline operation, backup/restore |
| Watchdog | Detects config drift, auto-recovery from failures |

---

## Data Flow

### 1. Configuration Push Flow

```
┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐
│  User  │────▶│  API   │────▶│ Config │────▶│ Event  │────▶│ Agents │
│ Action │     │ Server │     │ Engine │     │  Bus   │     │        │
└────────┘     └────────┘     └────────┘     └────────┘     └────────┘
                                                                │
                                                                ▼
┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐
│ Nginx  │◀────│ Config │◀────│Template│◀────│  gRPC  │◀────│ Agent  │
│Reloaded│     │Applied │     │ Render │     │ Stream │     │Receive │
└────────┘     └────────┘     └────────┘     └────────┘     └────────┘
```

**Flow Description:**
1. User creates/updates configuration via API or Web UI
2. Master validates and stores configuration in database
3. Config Engine determines affected agents
4. Event Bus broadcasts configuration change event
5. Agents receive event via gRPC streaming
6. Agent renders local nginx configuration from templates
7. Agent validates new configuration (`nginx -t`)
8. Agent applies configuration via graceful reload
9. Agent reports status back to master
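
Steps 6 to 9 on the agent side can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not NxMesh's actual agent code: the `NginxControl` trait, the `ApplyStatus` enum, the `apply_config` function, and the `.candidate` file extension are hypothetical names introduced here. A real implementation would presumably shell out to `nginx -t` and `nginx -s reload`; the trait only keeps the sketch testable without nginx installed.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical abstraction over the nginx binary, so the apply logic can be
/// exercised without nginx installed.
pub trait NginxControl {
    /// Returns true if the config file at `path` passes validation (step 7).
    fn test_config(&self, path: &Path) -> bool;
    /// Triggers a graceful reload of the running nginx (step 8).
    fn reload(&self) -> io::Result<()>;
}

/// Outcome the agent would report back to the master (step 9).
#[derive(Debug, PartialEq)]
pub enum ApplyStatus {
    Applied,
    /// Validation failed; the previous config file was left untouched.
    Rejected,
}

/// Writes `rendered` next to `live`, validates it, then swaps it into place.
pub fn apply_config<N: NginxControl>(
    nginx: &N,
    live: &Path,
    rendered: &str,
) -> io::Result<ApplyStatus> {
    // Stage the candidate beside the live file so the rename stays on one filesystem.
    let mut candidate: PathBuf = live.to_path_buf();
    candidate.set_extension("candidate");
    fs::write(&candidate, rendered)?;

    // Validate before touching the live config.
    if !nginx.test_config(&candidate) {
        fs::remove_file(&candidate)?;
        return Ok(ApplyStatus::Rejected);
    }

    // Rename replaces the live file in a single step on POSIX filesystems.
    fs::rename(&candidate, live)?;
    nginx.reload()?;
    Ok(ApplyStatus::Applied)
}
```

Validating a staged copy first means a rejected update never overwrites the config nginx is currently running with, which is the behavior the "Config validation fails" row under Failure Handling relies on.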

### 2. Health Reporting Flow

```
┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐
│ Nginx  │────▶│ Agent  │────▶│ Master │────▶│   DB   │
│ Health │     │ Health │     │  API   │     │ Store  │
└────────┘     └────────┘     └────────┘     └────────┘
                   │
                   ▼
              ┌──────────┐
              │Prometheus│
              │  Server  │
              └──────────┘
```

**Flow Description:**
1. Agent periodically checks nginx health (HTTP health endpoint)
2. Agent collects system metrics (CPU, memory, connections)
3. Agent sends health report to master via gRPC
4. Master aggregates the reports and stores them in the database
5. Prometheus scrapes agent metrics endpoint
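
Steps 3 and 4 imply that the master tracks when each agent last reported. A minimal Rust sketch of that bookkeeping is shown here. The `AgentRegistry` and `HealthReport` types, their fields, and the "stale after three missed heartbeats" threshold are assumptions made for illustration; only the 30s heartbeat interval comes from this document (see Connection Management).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Heartbeat interval taken from the connection parameters in this document.
pub const HEARTBEAT: Duration = Duration::from_secs(30);
/// Assumption: an agent counts as stale after three missed heartbeats.
pub const STALE_AFTER: Duration = Duration::from_secs(3 * 30);

/// Hypothetical subset of the fields a health report might carry.
#[derive(Debug, Clone)]
pub struct HealthReport {
    pub nginx_healthy: bool,
    pub cpu_percent: f32,
    pub active_connections: u64,
}

/// Master-side view of the latest report per agent.
#[derive(Default)]
pub struct AgentRegistry {
    latest: HashMap<String, (Instant, HealthReport)>,
}

impl AgentRegistry {
    /// Records a report received at `now`, replacing any earlier one.
    pub fn record(&mut self, agent_id: &str, now: Instant, report: HealthReport) {
        self.latest.insert(agent_id.to_string(), (now, report));
    }

    /// Ids of agents whose last report is older than `STALE_AFTER`, sorted.
    pub fn stale_agents(&self, now: Instant) -> Vec<String> {
        let mut ids: Vec<String> = self
            .latest
            .iter()
            .filter(|(_, (seen, _))| now.saturating_duration_since(*seen) > STALE_AFTER)
            .map(|(id, _)| id.clone())
            .collect();
        ids.sort();
        ids
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the registry, keeps the staleness rule deterministic and easy to test.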

### 3. Certificate Management Flow

```
┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐
│ Let's  │◀────│ Master │────▶│ Agent  │────▶│ Nginx  │◀────│ Client │
│Encrypt │     │  ACME  │     │ Deploy │     │ Serve  │     │Request │
└────────┘     └────────┘     └────────┘     └────────┘     └────────┘
```

**Flow Description:**
1. Master requests certificate from Let's Encrypt (ACME protocol)
2. Master distributes certificate to relevant agents
3. Agent stores certificate locally (encrypted at rest)
4. Agent updates nginx configuration with new certificate
5. Nginx serves HTTPS traffic with new certificate

---

## Communication Protocols

### Master-Agent Protocol

NxMesh uses a **bidirectional gRPC stream** as the primary communication channel between master and agents.

```protobuf
// agent.proto
syntax = "proto3";
package nxmesh.agent;

service AgentService {
  // Bidirectional streaming for real-time communication
  rpc Stream(stream AgentMessage) returns (stream MasterMessage);

  // Unary calls for specific operations
  rpc ReportHealth(HealthReport) returns (Ack);
  rpc ReportMetrics(MetricsBatch) returns (Ack);
}

message AgentMessage {
  string agent_id = 1;
  uint64 timestamp = 2;
  oneof payload {
    RegistrationRequest register = 3;
    HealthReport health = 4;
    ConfigStatus config_status = 5;
    MetricsBatch metrics = 6;
    LogBatch logs = 7;
  }
}

message MasterMessage {
  uint64 timestamp = 1;
  oneof payload {
    RegistrationResponse register_response = 2;
    ConfigUpdate config_update = 3;
    Command command = 4;
    Ack ack = 5;
  }
}

message ConfigUpdate {
  string config_id = 1;
  uint64 version = 2;
  repeated VirtualHost virtual_hosts = 3;
  repeated Upstream upstreams = 4;
  map<string, string> ssl_certificates = 5;
}
```

### Connection Management

```
┌─────────────────────────────────────────────────────────────────────┐
│                        CONNECTION LIFECYCLE                         │
│                                                                     │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐           │
│  │  INIT   │───▶│ CONNECT │───▶│ STREAM  │───▶│  READY  │           │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘           │
│       │              │              │                               │
│       ▼              ▼              ▼                               │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐                          │
│  │  RETRY  │    │RECONNECT│    │  ERROR  │                          │
│  └─────────┘    └─────────┘    └─────────┘                          │
│                                                                     │
│  Connection Parameters:                                             │
│  - Heartbeat interval: 30s                                          │
│  - Reconnect backoff: 1s, 2s, 4s, 8s... (max 60s)                   │
│  - gRPC keepalive: 10s ping, 20s timeout                            │
│  - TLS: Server-side TLS (auto-generated or custom)                  │
│  - Agent auth: Bootstrap token → Shared secret (HMAC)               │
└─────────────────────────────────────────────────────────────────────┘
```
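
The reconnect backoff listed above (1s, 2s, 4s, 8s, capped at 60s) can be expressed as a small pure function. This is a sketch only: the name `reconnect_delay` and the 0-based `attempt` counter are assumptions, and the document does not say whether jitter is applied, so none is added here.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based):
/// 1s, 2s, 4s, 8s, ... capped at 60s.
pub fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const MAX_SECS: u64 = 60;
    // checked_shl returns None when the shift amount is 64 or more,
    // in which case the delay falls back to the cap.
    let secs = BASE_SECS
        .checked_shl(attempt)
        .unwrap_or(MAX_SECS)
        .min(MAX_SECS);
    Duration::from_secs(secs)
}
```

For example, `reconnect_delay(3)` yields 8 seconds, and every attempt from the seventh onward (`attempt >= 6`) yields the 60-second cap.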

---

## Security Model

### Authentication

| Component | Method | Details |
|-----------|--------|---------|
| Master API | JWT (RS256) | Short-lived access tokens, refresh tokens |
| Master WebSocket | JWT | Same tokens as API |
| Master-Agent gRPC | **TLS + Shared Secret** | Server TLS + bootstrap token → session HMAC |
| Agent Registration | One-time Bootstrap Token | Generated in Master UI, single-use, short expiry |

### Agent Authentication Flow (TLS + Shared Secret)

```
┌─────────────┐                                   ┌──────────────┐
│    Agent    │                                   │    Master    │
└──────┬──────┘                                   └──────┬───────┘
       │                                                 │
       │ 1. TLS Handshake (verify server certificate)    │
       │◄───────────────────────────────────────────────►│
       │                                                 │
       │ 2. Register with bootstrap_token                │
       │ ── gRPC: RegisterAgent { token } ──────────────▶│
       │                                                 │
       │ 3. Receive agent_id + session_key (+ key_id)    │
       │◄────────────────────────────────────────────────│
       │              [Encrypted over TLS]               │
       │                                                 │
       │ 4. Subsequent requests: HMAC-signed             │
       │ ── gRPC + Headers:                              │
       │    X-Agent-ID: <agent_id>                       │
       │    X-Key-ID: <session_key_id>                   │
       │    X-Signature: HMAC(request_body, session_key) │
       │────────────────────────────────────────────────▶│
       │                                                 │
       │ 5. Key Rotation (primary/secondary)             │
       │◄═══════════════════════════════════════════════►│
```

**Security Properties:**
- **TLS**: Encrypts channel, verifies master identity (server cert)
- **Bootstrap Token**: One-time use, time-limited, proves initial identity
- **Session Key**: Per-agent secret, used for HMAC request signing
- **Key Rotation**: Primary/secondary key design for seamless rotation
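
The primary/secondary rotation can be illustrated with a small key ring that accepts signatures made with either the current key or the one it replaced. This is a sketch under assumptions: `KeyRing`, `rotate`, and `verify` are hypothetical names, and the MAC itself is passed in as a function because the document does not name the hash used for the HMAC. In practice `mac` would come from a vetted HMAC implementation.

```rust
/// Hypothetical per-agent key ring supporting primary/secondary rotation.
pub struct KeyRing {
    /// (key_id, key) currently issued to the agent.
    primary: (String, Vec<u8>),
    /// Previous key, still accepted until the next rotation.
    secondary: Option<(String, Vec<u8>)>,
}

impl KeyRing {
    pub fn new(key_id: &str, key: &[u8]) -> Self {
        KeyRing { primary: (key_id.to_string(), key.to_vec()), secondary: None }
    }

    /// Installs a new primary; the old primary becomes the secondary.
    pub fn rotate(&mut self, key_id: &str, key: &[u8]) {
        let old = std::mem::replace(&mut self.primary, (key_id.to_string(), key.to_vec()));
        self.secondary = Some(old);
    }

    /// Looks up the key named by the `X-Key-ID` header.
    fn key_for(&self, key_id: &str) -> Option<&[u8]> {
        if self.primary.0 == key_id {
            return Some(&self.primary.1);
        }
        match &self.secondary {
            Some((id, key)) if id == key_id => Some(key),
            _ => None,
        }
    }

    /// Recomputes the MAC over `body` with the named key and compares it
    /// to `signature`. `mac(key, body)` is supplied by the caller.
    pub fn verify<F>(&self, key_id: &str, body: &[u8], signature: &[u8], mac: F) -> bool
    where
        F: Fn(&[u8], &[u8]) -> Vec<u8>,
    {
        match self.key_for(key_id) {
            Some(key) => constant_time_eq(&mac(key, body), signature),
            None => false,
        }
    }
}

/// Compares two byte strings without short-circuiting on the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Keeping exactly one previous key means requests signed just before a rotation still verify, while a key that is two rotations old is rejected.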

### Authorization (RBAC)

```yaml
# Example RBAC Configuration
roles:
  admin:
    permissions:
      - "*:*"

  operator:
    permissions:
      - "config:read"
      - "config:write"
      - "agent:read"
      - "agent:reload"

  viewer:
    permissions:
      - "config:read"
      - "agent:read"
      - "metrics:read"

# Resource hierarchy
resources:
  - organization
  - workspace
  - agent
  - certificate
  - config (virtual_host, upstream)
```
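
A permission check against the `resource:action` strings above could look like the following Rust sketch. The function names are hypothetical, and treating `*` as a wildcard on each side independently (so that `config:*` would also be valid) is an assumption; the example configuration only shows the full `*:*` form.

```rust
/// Returns true if a granted permission such as "config:read" or "*:*"
/// covers the required "resource:action" pair.
pub fn permission_matches(granted: &str, required: &str) -> bool {
    match (granted.split_once(':'), required.split_once(':')) {
        (Some((g_res, g_act)), Some((r_res, r_act))) => {
            (g_res == "*" || g_res == r_res) && (g_act == "*" || g_act == r_act)
        }
        // Strings without a ':' separator never match.
        _ => false,
    }
}

/// Checks a role's permission list against a required permission.
pub fn role_allows(permissions: &[&str], required: &str) -> bool {
    permissions.iter().any(|p| permission_matches(p, required))
}
```

With the roles above, `role_allows(&["config:read", "agent:read", "metrics:read"], "config:write")` is false for the viewer, while the admin's `"*:*"` grants any well-formed permission.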

## Deployment Patterns

### Pattern 1: Docker Sidecar (Development/Single Host)

```yaml
# docker-compose.yml
version: '3.8'

services:
  nxmesh-master:
    image: nxmesh/master:latest
    ports:
      - "8080:8080"  # API
      - "8443:8443"  # gRPC
    environment:
      - DATABASE_URL=postgres://...

  nginx-site-a:
    image: nginx:alpine
    volumes:
      - site-a-html:/usr/share/nginx/html

  nxmesh-agent-a:
    image: nxmesh/agent:latest
    network_mode: service:nginx-site-a  # Share network namespace with nginx
    pid: service:nginx-site-a           # Share PID namespace (for nginx reload)
    environment:
      - NXMESH_MASTER_URL=wss://nxmesh-master:8443
      - NXMESH_AGENT_TOKEN=${AGENT_TOKEN_A}
      - NXMESH_DEPLOYMENT_MODE=docker_sidecar
      - NXMESH_NGINX_PID_FILE=/var/run/nginx.pid

# Named volumes referenced by services must be declared at the top level
volumes:
  site-a-html:
```

**Pros:** Simple, isolated, good for development
**Cons:** Docker-only, single host limitation

### Pattern 2: Kubernetes Sidecar

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service
spec:
  replicas: 3
  # apps/v1 requires a selector matching the pod template labels;
  # the `app: web-service` label is an assumed placeholder.
  selector:
    matchLabels:
      app: web-service
  template:
    metadata:
      labels:
        app: web-service
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            - name: nxmesh-config
              mountPath: /etc/nginx/conf.d

        - name: nxmesh-agent
          image: nxmesh/agent:latest
          env:
            - name: NXMESH_MASTER_URL
              value: "wss://nxmesh-master.default.svc:8443"
            - name: NXMESH_AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: nxmesh-agent-token
                  key: token
          volumeMounts:
            - name: nxmesh-config
              mountPath: /etc/nginx/conf.d
      volumes:
        - name: nxmesh-config
          emptyDir: {}
```

**Pros:** Native K8s integration, auto-scaling, health checks
**Cons:** K8s-only, more complex setup

### Pattern 3: Standalone (VM/Bare Metal)

```
┌─────────────────────────────────────────────────────────────────┐
│                         VM / Bare Metal                         │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                          Systemd                          │  │
│  │  ┌─────────────────────────────────────────────────────┐  │  │
│  │  │                nxmesh-agent.service                 │  │  │
│  │  │  ┌──────────────┐  ┌──────────────┐  ┌───────────┐  │  │  │
│  │  │  │    Agent     │  │    Nginx     │  │  Config   │  │  │  │
│  │  │  │   Process    │──│   Process    │──│   Files   │  │  │  │
│  │  │  └──────────────┘  └──────────────┘  └───────────┘  │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

**Pros:** Works anywhere, minimal dependencies
**Cons:** Manual setup, no container isolation

---

## Failure Handling

### Master Failure Scenarios

| Scenario | Impact | Mitigation |
|----------|--------|------------|
| Master unreachable | Agents continue with cached config | Agents retry with exponential backoff |
| Master crashes | New connections fail, existing continue | External load balancer + health checks (HA: future) |
| Database down | Read-only mode for existing configs | Database replication, failover |

### Agent Failure Scenarios

| Scenario | Impact | Mitigation |
|----------|--------|------------|
| Agent crashes | Nginx continues running | Systemd restart, watchdog |
| Config validation fails | Previous config kept | Atomic config swap, rollback |
| Nginx crashes | Agent restarts nginx | Health checks, auto-restart |
| Network partition | Agent operates in "island mode" | Local cache, reconciliation on reconnect |

### Recovery Procedures

```
┌─────────────────────────────────────────────────────────────────────┐
│                        FAILURE RECOVERY FLOW                        │
│                                                                     │
│  Agent Disconnect                                                   │
│       │                                                             │
│       ▼                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐           │
│  │  Retry  │───▶│  Cache  │───▶│  Alert  │───▶│  Watch  │           │
│  │ Connect │    │ Config  │    │ Master  │    │   Dog   │           │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘           │
│       │                                            │                │
│       ▼                                            ▼                │
│  ┌───────────┐                               ┌─────────┐            │
│  │Reconnected│                               │ Restart │            │
│  │   Sync    │                               │  Nginx  │            │
│  └───────────┘                               └─────────┘            │
│                                                                     │
│  Recovery Strategies:                                               │
│  1. Exponential backoff for reconnection                            │
│  2. Circuit breaker for failed operations                           │
│  3. Config checksum verification after reconnect                    │
│  4. Automatic nginx restart on health check failure                 │
└─────────────────────────────────────────────────────────────────────┘
```
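
Recovery strategy 2 can be sketched as a count-based circuit breaker. The document does not specify thresholds or a cooldown, so `CircuitBreaker`, its fields, and the values a caller passes in are illustrative assumptions rather than NxMesh settings.

```rust
use std::time::{Duration, Instant};

/// Minimal count-based circuit breaker (illustrative only).
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    /// Set while the breaker is open; cleared on the next success.
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        CircuitBreaker { failure_threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    /// Whether an operation may be attempted at `now`.
    pub fn allow(&self, now: Instant) -> bool {
        match self.opened_at {
            // Open: block until the cooldown has elapsed, then permit a trial call.
            Some(opened) => now.saturating_duration_since(opened) >= self.cooldown,
            None => true,
        }
    }

    /// A successful operation closes the breaker and resets the counter.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    /// A failure at `now` opens the breaker once the threshold is reached.
    pub fn record_failure(&mut self, now: Instant) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

An agent could wrap nginx restarts or master calls in `allow` / `record_*` so that a persistently failing operation is not retried in a tight loop.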

---

## Technology Stack

| Layer | Technology | Rationale |
|-------|------------|-----------|
| **Master Backend** | Rust (Axum) | Performance, safety, async ecosystem |
| **Agent** | Rust (Tokio) | Small binary, low memory, fast startup |
| **Database** | PostgreSQL | ACID, JSON support, reliability |
| **Cache** | Redis | Fast key-value, pub/sub for events |
| **Frontend** | React + Vite (embedded) | Static build served by master, fast HMR in dev |
| **gRPC** | Tonic | Native Rust implementation |
| **ORM** | SeaORM | Async, type-safe, migration support |
| **Config Template** | Handlebars | Logic-less, secure templating |
| **Metrics** | Prometheus | Industry standard, rich ecosystem |
| **Tracing** | OpenTelemetry | Vendor-neutral, future-proof |