# NxMesh Feature Specification

## Table of Contents

1. [Core Features](#core-features)
2. [Master Features](#master-features)
3. [Agent Features](#agent-features)
4. [Configuration Management](#configuration-management)
5. [Observability](#observability)
6. [Security Features](#security-features)

---

## Core Features

### CF-001: Multi-tenancy with Organizations and Workspaces

**Description**: Support for multiple organizations with isolated workspaces within each organization.

**Requirements**:
- Organizations are top-level resource containers
- Each organization can have multiple workspaces
- Resources (agents, configs, certificates) are scoped to a workspace
- Cross-workspace visibility is configurable

**Data Model**:
```rust
struct Organization {
    id: Uuid,
    name: String,
    slug: String,           // URL-friendly identifier
    created_at: DateTime,
    settings: OrganizationSettings,
}

struct Workspace {
    id: Uuid,
    organization_id: Uuid,
    name: String,
    slug: String,
    created_at: DateTime,
}
```

**API Endpoints**:
- `GET /api/v1/organizations` - List organizations
- `POST /api/v1/organizations` - Create organization
- `GET /api/v1/organizations/{id}/workspaces` - List workspaces
- `POST /api/v1/organizations/{id}/workspaces` - Create workspace

---

### CF-002: Agent Registration and Lifecycle Management

**Description**: Agents must register with the master before receiving configurations.

**Registration Flow**:
1. Administrator generates bootstrap token in Master UI
2. Token is provided to agent via environment variable or config file
3. Agent establishes TLS connection to master (verifies server certificate)
4. Agent sends bootstrap token for registration
5. Master validates token and establishes shared secret:
   - Master generates session_key (per-agent) + key_id
   - Session key used for HMAC request signing
   - Primary/secondary key design for rotation

**Agent States**:
```rust
enum AgentState {
    Pending,        // Registered but never connected
    Online,         // Connected and healthy
    Offline,        // Disconnected
    Degraded,       // Connected but health checks failing
    Maintenance,    // Manually placed in maintenance mode
}
```

**Agent Metadata**:
```rust
struct Agent {
    id: Uuid,
    workspace_id: Uuid,
    name: String,
    hostname: String,
    ip_address: String,
    version: String,
    state: AgentState,
    deployment_mode: DeploymentMode,  // DockerSidecar, K8sSidecar, Standalone
    last_seen_at: DateTime,
    capabilities: Vec<String>,        // e.g., ["http3", "websocket", "rate_limiting"]
    labels: HashMap<String, String>,  // e.g., {"env": "prod", "region": "us-east"}
}
```

**API Endpoints**:
- `POST /api/v1/agents/register` - Register new agent
- `GET /api/v1/agents` - List agents
- `GET /api/v1/agents/{id}` - Get agent details
- `POST /api/v1/agents/{id}/tokens` - Generate registration token
- `DELETE /api/v1/agents/{id}` - Deregister agent

---

### CF-003: Real-time Configuration Distribution

**Description**: Push configuration changes to agents in real-time with delivery guarantees.

**Requirements**:
- Config changes propagate to all affected agents within 5 seconds
- Support for targeted updates (specific agents or groups)
- Config versioning with rollback capability
- Delivery confirmation from agents

**Configuration Scope**:
```rust
enum ConfigScope {
    Global,              // All agents
    Workspace,           // All agents in workspace
    AgentGroup(String),  // Agents with specific label selector
    Agent(Uuid),         // Single agent
}
```

**Delivery Guarantees**:
- At-least-once delivery
- Automatic retry with exponential backoff
- Config checksum verification
- Offline agents receive updates on reconnection

---

## Master Features

### MF-001: RESTful API

**Description**: Comprehensive REST API for all operations.
**Base URL**: `/api/v1`

**Resource Endpoints**:

| Resource | Endpoints |
|----------|-----------|
| Organizations | GET, POST, PATCH, DELETE `/organizations` |
| Workspaces | GET, POST, PATCH, DELETE `/workspaces` |
| Agents | GET, POST, PATCH, DELETE `/agents` |
| VirtualHosts | GET, POST, PATCH, DELETE `/virtual-hosts` |
| Upstreams | GET, POST, PATCH, DELETE `/upstreams` |
| Certificates | GET, POST, DELETE `/certificates` |
| AccessLogs | GET `/access-logs` |
| Metrics | GET `/metrics` |

**Response Format**:
```json
{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 100
  },
  "links": {
    "self": "/api/v1/agents?page=1",
    "next": "/api/v1/agents?page=2",
    "prev": null
  }
}
```

**Error Format**:
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid configuration",
    "details": [
      {"field": "server_name", "message": "Invalid domain format"}
    ]
  }
}
```

---

### MF-002: Web-based Admin Console (Embedded)

**Description**: Modern web UI for managing the entire system. Built with React + Vite and served as static files embedded directly in the master binary.

**Pages**:

| Page | Features |
|------|----------|
| Dashboard | Agent status, recent events, traffic overview |
| Agents | List, detail view, logs, metrics graphs |
| Configurations | Virtual host editor, upstream management |
| Certificates | SSL certificate list, expiration alerts |
| Access Control | Users, roles, permissions management |
| Settings | Organization settings, integrations |

**Key UI Features**:
- Real-time updates via WebSocket
- Monaco editor for nginx configuration
- Visual topology view (agent connections)
- Dark/light mode support
- Responsive design

---

### MF-003: Configuration Template Engine

**Description**: Templating system for generating nginx configurations.
**Template Variables**:
```handlebars
# Example virtual host template
server {
    listen {{port}} {{#if ssl}}ssl{{/if}} {{#if http2}}http2{{/if}};
    server_name {{server_name}};

    {{#if ssl}}
    ssl_certificate {{ssl_certificate_path}};
    ssl_certificate_key {{ssl_certificate_key_path}};
    {{/if}}

    location {{location_path}} {
        proxy_pass http://{{upstream_name}};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        {{#each custom_headers}}
        add_header {{name}} "{{value}}";
        {{/each}}

        {{#if rate_limiting}}
        limit_req zone={{rate_limit_zone}} burst={{rate_limit_burst}};
        {{/if}}
    }
}
```

**Built-in Templates**:
- `default` - Standard reverse proxy
- `spa` - Single Page Application (with fallback to index.html)
- `api` - API gateway with rate limiting
- `static` - Static file serving with caching
- `websocket` - WebSocket proxy with connection upgrades

---

### MF-004: Certificate Management (ACME)

**Description**: Automatic SSL/TLS certificate provisioning via Let's Encrypt.

**Features**:
- ACME v2 protocol support
- HTTP-01 and DNS-01 challenges
- Automatic renewal (30 days before expiry)
- Wildcard certificate support (DNS-01)
- Certificate monitoring and alerts

**Certificate Entity**:
```rust
struct Certificate {
    id: Uuid,
    workspace_id: Uuid,
    domain: String,
    is_wildcard: bool,
    provider: CertificateProvider,    // LetsEncrypt, Custom
    status: CertificateStatus,        // Pending, Active, Expired, Error
    issued_at: DateTime,
    expires_at: DateTime,
    auto_renew: bool,
    certificate_pem: Option<String>,  // Encrypted at rest
    private_key_pem: Option<String>,  // Encrypted at rest
}
```

---

## Agent Features

### AF-001: Nginx Lifecycle Management

**Description**: Agent manages nginx process lifecycle based on deployment mode.
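
The lifecycle actions this feature defines (`start`, `stop`, `reload`, `restart`, `test`) map fairly directly onto nginx's command-line interface. The sketch below shows one possible mapping; the `LifecycleAction` enum and the `nginx_invocations` helper are hypothetical names introduced for illustration and are not part of the specification. In sidecar mode an agent would more likely send the equivalent signals (SIGHUP for reload, SIGQUIT for graceful shutdown) than spawn the binary.

```rust
/// Lifecycle actions listed for AF-001.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LifecycleAction {
    Start,
    Stop,
    Reload,
    Restart,
    Test,
}

/// Returns the nginx invocations (argv lists) that would carry out an action.
/// The mapping is an assumption for illustration, not taken from the spec.
fn nginx_invocations(action: LifecycleAction) -> Vec<Vec<&'static str>> {
    match action {
        LifecycleAction::Start => vec![vec!["nginx"]],
        // `-s quit` is nginx's graceful shutdown (same as sending SIGQUIT).
        LifecycleAction::Stop => vec![vec!["nginx", "-s", "quit"]],
        // `-s reload` re-reads the configuration (same as sending SIGHUP).
        LifecycleAction::Reload => vec![vec!["nginx", "-s", "reload"]],
        // A real agent would wait for the old master process to exit
        // before running the second command.
        LifecycleAction::Restart => vec![vec!["nginx", "-s", "quit"], vec!["nginx"]],
        // `-t` checks the configuration and exits without serving traffic.
        LifecycleAction::Test => vec![vec!["nginx", "-t"]],
    }
}
```

Returning argv lists instead of running the commands keeps the mapping easy to unit-test; the caller could pass each list to `std::process::Command`.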
**Docker Sidecar Mode**:
- Shares PID namespace with nginx container (via `pid: service:nginx`)
- Directly signals nginx process for reload/restart
- Monitors nginx via health checks

**Standalone Mode**:
- Direct process management (signals to PID from file)
- systemd integration (optional, for service management)
- PID file monitoring

**Lifecycle Actions**:
- `start` - Start nginx
- `stop` - Graceful shutdown
- `reload` - Hot reload configuration
- `restart` - Full restart
- `test` - Validate configuration

---

### AF-002: Configuration Rendering and Application

**Description**: Agent renders nginx configs from master templates and applies them using atomic symlink swaps for zero-downtime updates.

**Config Directory Structure**:
```
/etc/nginx/
├── nginx.conf                         # Contains: include /etc/nginx/conf.d/current/*.conf
├── conf.d/
│   ├── current -> ./20260302143000/   # Symlink to active deployment
│   ├── 20260302143000/                # Active config (timestamped)
│   │   ├── default.conf
│   │   └── upstream.conf
│   ├── 20260302141500/                # Previous deployment (for rollback)
│   │   ├── default.conf
│   │   └── upstream.conf
│   └── 20260302140000/                # Older deployment (cleanup candidate)
```

**Config Rendering Flow**:
1. Receive ConfigUpdate from master
2. Create new deployment folder: `./conf.d/<timestamp>/`
3. Render nginx config files into timestamped folder
4. **Validate** new config: `nginx -t -c /etc/nginx/conf.d/<timestamp>/nginx.conf`
5. If validation passes, **atomically update symlink**: `current` → `<timestamp>/`
6. Execute graceful nginx reload
7. Verify reload success (health check)
8. Report status to master
9. Cleanup old deployments (keep N recent versions)

**Atomic Config Swap**:
```rust
async fn apply_config(&self, config: ConfigUpdate) -> Result<()> {
    let timestamp = generate_timestamp();
    let deploy_dir = self.conf_d_path.join(&timestamp);
    let symlink_path = self.conf_d_path.join("current");

    // 1. Render config to new timestamped directory
    self.render_config(&config, &deploy_dir).await?;

    // 2. Validate BEFORE switching symlink (point to new folder directly)
    self.validate_config(&deploy_dir).await?;

    // 3. Atomic symlink swap (Unix: symlink + rename)
    let temp_link = self.conf_d_path.join("current.tmp");
    tokio::fs::symlink(&deploy_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;  // Atomic operation

    // 4. Reload nginx (picks up new symlink target)
    self.reload_nginx().await?;

    // 5. Verify and cleanup
    self.verify_health().await?;
    self.cleanup_old_deployments(5).await?;  // Keep last 5 versions

    self.report_success(config.id, timestamp).await;
    Ok(())
}
```

**Rollback Strategy**:
```rust
async fn rollback(&self, target_timestamp: &str) -> Result<()> {
    let target_dir = self.conf_d_path.join(target_timestamp);
    let symlink_path = self.conf_d_path.join("current");

    // Verify target exists
    if !target_dir.exists() {
        return Err(Error::RollbackTargetNotFound);
    }

    // Atomic symlink swap back to previous deployment
    let temp_link = self.conf_d_path.join("current.tmp");
    tokio::fs::symlink(&target_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;

    // Reload nginx
    self.reload_nginx().await?;
    Ok(())
}
```

---

### AF-003: Health Monitoring and Reporting

**Description**: Continuous health monitoring of nginx and the host system.
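
One way an agent could obtain the connection counts it reports for nginx is to poll nginx's `stub_status` endpoint and parse the plain-text body. The following is a minimal sketch under that assumption; `StubStatus` and `parse_stub_status` are hypothetical names, and fetching the body over HTTP is left out.

```rust
/// Counters exposed by nginx's stub_status module.
#[derive(Debug, PartialEq)]
struct StubStatus {
    active: u64,
    accepts: u64,
    handled: u64,
    requests: u64,
    reading: u64,
    writing: u64,
    waiting: u64,
}

/// Parses the plain-text stub_status body; returns None on any unexpected shape.
fn parse_stub_status(body: &str) -> Option<StubStatus> {
    let mut lines = body.lines().map(str::trim).filter(|l| !l.is_empty());

    // Line 1: "Active connections: 291"
    let active = lines
        .next()?
        .strip_prefix("Active connections:")?
        .trim()
        .parse::<u64>()
        .ok()?;

    // Line 2 is the fixed header "server accepts handled requests".
    lines.next()?;

    // Line 3: three cumulative counters in header order.
    let mut counters = lines
        .next()?
        .split_whitespace()
        .map(|v| v.parse::<u64>().ok());
    let accepts = counters.next()??;
    let handled = counters.next()??;
    let requests = counters.next()??;

    // Line 4: "Reading: 6 Writing: 179 Waiting: 106"
    let tokens: Vec<&str> = lines.next()?.split_whitespace().collect();
    if tokens.len() != 6
        || tokens[0] != "Reading:"
        || tokens[2] != "Writing:"
        || tokens[4] != "Waiting:"
    {
        return None;
    }

    Some(StubStatus {
        active,
        accepts,
        handled,
        requests,
        reading: tokens[1].parse().ok()?,
        writing: tokens[3].parse().ok()?,
        waiting: tokens[5].parse().ok()?,
    })
}
```

A requests-per-second figure would then come from the difference between two `requests` samples divided by the polling interval, since the counter is cumulative.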
**Health Checks**:
- **Nginx Health**: HTTP request to nginx health endpoint
- **Configuration Health**: Verify current config matches expected
- **Resource Health**: CPU, memory, disk usage
- **Connection Health**: Active connections, request rate

**Health Report Structure**:
```rust
struct HealthReport {
    agent_id: Uuid,
    timestamp: DateTime,
    nginx_status: NginxStatus,
    system_metrics: SystemMetrics,
    config_checksum: String,
    alerts: Vec,
}

struct NginxStatus {
    is_running: bool,
    pid: Option,
    uptime_seconds: u64,
    active_connections: u32,
    requests_per_second: f64,
}

struct SystemMetrics {
    cpu_percent: f64,
    memory_used_mb: u64,
    memory_total_mb: u64,
    disk_used_gb: u64,
    disk_total_gb: u64,
}
```

**Reporting Interval**: Configurable (default: 30 seconds)

---

### AF-004: Metrics Collection and Export

**Description**: Collect and expose metrics in Prometheus format.

**Metrics Endpoint**: `GET /metrics` (on agent)

**Built-in Metrics**:
```
# Nginx metrics (parsed from stub_status)
nxmesh_nginx_connections_active{agent_id="..."} 42
nxmesh_nginx_connections_reading{agent_id="..."} 5
nxmesh_nginx_connections_writing{agent_id="..."} 30
nxmesh_nginx_connections_waiting{agent_id="..."} 7
nxmesh_nginx_requests_total{agent_id="..."} 1234567

# Agent metrics
nxmesh_agent_uptime_seconds{agent_id="..."} 86400
nxmesh_agent_master_connection_status{agent_id="..."} 1
nxmesh_agent_config_version{agent_id="...",version="123"} 1

# System metrics
nxmesh_system_cpu_percent{agent_id="..."} 25.5
nxmesh_system_memory_used_bytes{agent_id="..."} 1073741824
nxmesh_system_disk_used_bytes{agent_id="..."} 53687091200
```

**Custom Metrics**: Agents can collect custom metrics from nginx access logs

---

### AF-005: Offline Operation and Recovery

**Description**: Agent can operate independently when master is unreachable.
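
Reconnection while the master is unreachable relies on exponential backoff. The spec does not fix a base delay or a cap, so the sketch below takes both as parameters; `backoff_delay` is a hypothetical helper name, and a production agent would usually add random jitter on top so that many agents do not reconnect at the same instant.

```rust
use std::time::Duration;

/// Delay before reconnection attempt `attempt` (0-based):
/// `base * 2^attempt`, never exceeding `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing on large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a base of one second and a cap of 60 seconds, successive attempts wait 1, 2, 4, 8, 16, and 32 seconds, and every later attempt waits 60 seconds.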
**Offline Capabilities**:
- Continue serving traffic with cached configuration
- Local health monitoring continues
- Metrics are buffered for later transmission
- Automatic reconnection attempts

**Recovery Flow**:
1. Detect disconnection from master
2. Enter "offline mode"
3. Continue operating with cached config
4. Buffer metrics and logs
5. Attempt reconnection with exponential backoff
6. On reconnection:
   - Sync configuration (compare checksums)
   - Transmit buffered metrics
   - Resume normal operation

---

## Configuration Management

### CM-001: Virtual Host Configuration

**Description**: Define nginx server blocks (virtual hosts) via API/UI.

**VirtualHost Entity**:
```rust
struct VirtualHost {
    id: Uuid,
    workspace_id: Uuid,
    name: String,                      // Human-readable name
    server_name: String,               // Domain name(s), comma-separated
    listen_port: u16,                  // Usually 80 or 443
    ssl_enabled: bool,
    ssl_certificate_id: Option<Uuid>,

    // Routing configuration
    locations: Vec<Location>,

    // Advanced settings
    http2_enabled: bool,
    http3_enabled: bool,
    gzip_enabled: bool,
    rate_limiting: Option,

    // Target agents
    target_agents: AgentSelector,
}

struct Location {
    path: String,                 // e.g., "/api" or "~ \.php$"
    proxy_pass: Option<String>,   // e.g., "http://backend"
    upstream_id: Option<Uuid>,
    root: Option<String>,         // For static files
    index: Option<String>,        // e.g., "index.html"
    custom_headers: Vec,
    rewrite_rules: Vec,
}
```

**Validation Rules**:
- `server_name` must be valid domain(s)
- `listen_port` must be 1-65535
- SSL certificate must exist if `ssl_enabled` is true
- At least one location must be defined

---

### CM-002: Upstream Configuration

**Description**: Define backend server pools for load balancing.

**Upstream Entity**:
```rust
struct Upstream {
    id: Uuid,
    workspace_id: Uuid,
    name: String,                     // Used as upstream identifier

    // Load balancing algorithm
    algorithm: LoadBalanceAlgorithm,  // RoundRobin, LeastConn, IPHash, etc.

    // Backend servers
    servers: Vec<UpstreamServer>,

    // Health check configuration
    health_check: Option,

    // Connection settings
    keepalive_connections: Option,
    keepalive_timeout: Option,
}

struct UpstreamServer {
    address: String,    // IP:port or hostname:port
    weight: u32,        // Default: 1
    backup: bool,       // Backup server
    down: bool,         // Temporarily down
    max_fails: u32,     // Default: 1
    fail_timeout: u32,  // Seconds, default: 10
}

enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}
```

---

### CM-003: Configuration Versioning

**Description**: Track all configuration changes with full history.

**Versioning Features**:
- Every change creates a new version
- Versions are immutable
- Rollback to any previous version
- Diff between versions
- Audit log of who changed what

**Version Entity**:
```rust
struct ConfigVersion {
    id: Uuid,
    resource_type: String,   // "virtual_host", "upstream", etc.
    resource_id: Uuid,
    version_number: u64,     // Auto-incrementing
    data: Json,              // Full configuration snapshot
    checksum: String,        // SHA-256 of data
    created_by: Uuid,        // User ID
    created_at: DateTime,
    change_summary: String,  // Human-readable description
}
```

**API Endpoints**:
- `GET /api/v1/virtual-hosts/{id}/versions` - List versions
- `GET /api/v1/virtual-hosts/{id}/versions/{version}` - Get specific version
- `POST /api/v1/virtual-hosts/{id}/rollback` - Rollback to version
- `GET /api/v1/virtual-hosts/{id}/diff?from=v1&to=v2` - Compare versions

---

## Observability

### OB-001: Structured Logging

**Description**: Comprehensive logging with structured format.

**Log Levels**: ERROR, WARN, INFO, DEBUG, TRACE

**Log Fields**:
```json
{
  "timestamp": "2026-03-02T10:30:00Z",
  "level": "INFO",
  "component": "agent",
  "agent_id": "550e8400-e29b-41d4-a716-446655440000",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "Configuration applied successfully",
  "fields": {
    "config_id": "config-123",
    "version": 42,
    "duration_ms": 150
  }
}
```

**Log Targets**:
- Master: systemd journal, file, or centralized (ELK/Loki)
- Agent: stdout (Docker), file (standalone), or remote

---

### OB-002: Distributed Tracing

**Description**: OpenTelemetry tracing for request flow visualization.

**Traced Operations**:
- Configuration push (master → agent → nginx)
- Health check cycles
- Certificate issuance
- API requests

**Span Attributes**:
- `nxmesh.agent_id`
- `nxmesh.config_id`
- `nxmesh.workspace_id`
- `nxmesh.organization_id`

---

### OB-003: Access Log Aggregation

**Description**: Collect and query nginx access logs from all agents.
**Features**:
- Centralized access log storage
- Real-time log streaming
- SQL-like query interface
- Log retention policies

**Access Log Schema**:
```rust
struct AccessLogEntry {
    id: Uuid,
    agent_id: Uuid,
    timestamp: DateTime,

    // Request details
    remote_addr: String,
    method: String,
    uri: String,
    protocol: String,
    host: String,

    // Response details
    status: u16,
    body_bytes_sent: u64,
    response_time_ms: f64,

    // Additional fields
    user_agent: Option<String>,
    referer: Option<String>,
    request_id: Option<String>,
}
```

**Query API**:
```graphql
# Example query
query {
  accessLogs(
    filter: {
      agentId: "...",
      timeRange: { from: "2026-03-01", to: "2026-03-02" },
      statusCode: { gte: 500 }
    },
    limit: 100
  ) {
    timestamp
    method
    uri
    status
    responseTimeMs
  }
}
```

---

## Security Features

### SF-001: Authentication and Authorization

**Description**: Multi-method authentication with fine-grained RBAC.

**Authentication Methods**:
- JWT (for API/Web UI)
- Password-based login (local user accounts)
- OAuth2/OIDC (Google, GitHub, enterprise SSO)
- API Keys (for service accounts)
- **TLS + Shared Secret** (for agent communication)
  - Server-side TLS (auto-generated self-signed or custom certificates)
  - Bootstrap token for initial registration
  - Session key with HMAC signing for ongoing requests
  - Primary/secondary key rotation

**RBAC Model**:
```rust
struct Role {
    id: Uuid,
    name: String,
    permissions: Vec<Permission>,
}

enum Permission {
    // Organization scope
    OrganizationRead,
    OrganizationWrite,
    OrganizationDelete,

    // Workspace scope
    WorkspaceRead,
    WorkspaceWrite,
    WorkspaceDelete,

    // Agent scope
    AgentRead,
    AgentWrite,
    AgentReload,
    AgentDelete,

    // Config scope
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
    ConfigDelete,

    // Certificate scope
    CertificateRead,
    CertificateWrite,
    CertificateDelete,

    // User management
    UserRead,
    UserWrite,
    UserDelete,
}
```

---

### SF-002: Secret Management

**Description**: Secure storage and distribution of sensitive data.
**Secrets**:
- SSL private keys
- API tokens
- Database passwords
- External service credentials

**Security Measures**:
- Encryption at rest (AES-256-GCM)
- Encryption in transit (TLS 1.3)
- Automatic secret rotation
- Audit logging for secret access

---

### SF-003: Network Security

**Description**: Network-level security controls.

**Features**:
- IP allowlisting for agent connections
- Rate limiting on API endpoints
- DDoS protection recommendations
- Security headers enforcement (HSTS, CSP, etc.)

**Agent Connection Security**:
- **TLS Encryption**: Server-side TLS (auto-generated or custom certificates)
  - Development: Self-signed certificates auto-generated on first start
  - Production: Valid certificates (Let's Encrypt or corporate CA)
- **Bootstrap Authentication**: One-time token for initial registration
- **Session Authentication**: HMAC-signed requests with shared session key
- **Key Rotation**: Primary/secondary key design for seamless rotation
- **Certificate Pinning**: Optional fingerprint verification for additional security
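
The primary/secondary key design can be sketched as a two-slot key ring: rotation promotes a new key to primary and keeps the previous primary as secondary, so requests signed just before the rotation still verify. This is a minimal sketch under that reading of the spec; `SessionKey`, `KeyRing`, `rotate`, and `lookup` are hypothetical names, and the HMAC computation itself is omitted because it would need a cryptography crate.

```rust
/// A per-agent session key together with the identifier sent alongside each signature.
#[derive(Debug, Clone, PartialEq)]
struct SessionKey {
    key_id: String,
    secret: Vec<u8>,
}

/// Holds the key used for new signatures plus the one it replaced.
struct KeyRing {
    primary: SessionKey,
    secondary: Option<SessionKey>,
}

impl KeyRing {
    fn new(primary: SessionKey) -> Self {
        KeyRing { primary, secondary: None }
    }

    /// Promotes `new_primary`; the old primary becomes secondary and
    /// any older secondary is dropped.
    fn rotate(&mut self, new_primary: SessionKey) {
        let old_primary = std::mem::replace(&mut self.primary, new_primary);
        self.secondary = Some(old_primary);
    }

    /// Finds the key a request claims to be signed with, if it is still accepted.
    fn lookup(&self, key_id: &str) -> Option<&SessionKey> {
        if self.primary.key_id == key_id {
            Some(&self.primary)
        } else {
            self.secondary.as_ref().filter(|key| key.key_id == key_id)
        }
    }
}
```

A verifier would call `lookup` with the `key_id` from the request and check the HMAC against the returned secret; a key stops being accepted only after two rotations.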