NxMesh Feature Specification

Table of Contents

  1. Core Features
  2. Master Features
  3. Agent Features
  4. Configuration Management
  5. Observability
  6. Security Features

Core Features

CF-001: Multi-tenancy with Organizations and Workspaces

Description: Support for multiple organizations with isolated workspaces within each organization.

Requirements:

  • Organizations are top-level resource containers
  • Each organization can have multiple workspaces
  • Resources (agents, configs, certificates) are scoped to a workspace
  • Cross-workspace visibility is configurable

Data Model:

struct Organization {
    id: Uuid,
    name: String,
    slug: String,  // URL-friendly identifier
    created_at: DateTime,
    settings: OrganizationSettings,
}

struct Workspace {
    id: Uuid,
    organization_id: Uuid,
    name: String,
    slug: String,
    created_at: DateTime,
}

API Endpoints:

  • GET /api/v1/organizations - List organizations
  • POST /api/v1/organizations - Create organization
  • GET /api/v1/organizations/{id}/workspaces - List workspaces
  • POST /api/v1/organizations/{id}/workspaces - Create workspace

CF-002: Agent Registration and Lifecycle Management

Description: Agents must register with the master before receiving configurations.

Registration Flow:

  1. Administrator generates bootstrap token in Master UI
  2. Token is provided to agent via environment variable or config file
  3. Agent establishes TLS connection to master (verifies server certificate)
  4. Agent sends bootstrap token for registration
  5. Master validates token and establishes shared secret:
    • Master generates session_key (per-agent) + key_id
    • Session key used for HMAC request signing
    • Primary/secondary key design for rotation
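The primary/secondary key design in step 5 could be sketched as follows. This is a minimal illustration, not the project's implementation: the `SessionKey` and `KeyRing` types, the `u32` key IDs, and the injected `mac` function are all assumptions made for the example. A real agent would pass HMAC-SHA256 from a vetted crypto crate as `mac`.

```rust
/// One session key as issued by the master (layout is an assumption).
#[derive(Debug, Clone)]
pub struct SessionKey {
    pub key_id: u32,
    pub secret: Vec<u8>,
}

/// Holds the current (primary) key and the previous (secondary) key.
#[derive(Debug, Clone)]
pub struct KeyRing {
    pub primary: SessionKey,
    pub secondary: Option<SessionKey>,
}

impl KeyRing {
    pub fn new(initial: SessionKey) -> Self {
        KeyRing { primary: initial, secondary: None }
    }

    /// Install a new primary key. The old primary stays valid as the
    /// secondary, so requests signed just before rotation still verify.
    pub fn rotate(&mut self, new_primary: SessionKey) {
        let old = std::mem::replace(&mut self.primary, new_primary);
        self.secondary = Some(old);
    }

    /// Look up the key that a request claims to be signed with.
    pub fn find(&self, key_id: u32) -> Option<&SessionKey> {
        if self.primary.key_id == key_id {
            Some(&self.primary)
        } else {
            self.secondary.as_ref().filter(|k| k.key_id == key_id)
        }
    }

    /// Verify a request signature with a caller-supplied MAC function.
    pub fn verify(
        &self,
        key_id: u32,
        body: &[u8],
        signature: &[u8],
        mac: fn(&[u8], &[u8]) -> Vec<u8>,
    ) -> bool {
        match self.find(key_id) {
            Some(key) => constant_time_eq(&mac(&key.secret, body), signature),
            None => false,
        }
    }
}

/// Compare two byte strings without returning early on the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

With this shape, a second rotation drops the oldest key, so an agent has one rotation period to pick up the new primary.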

Agent States:

enum AgentState {
    Pending,      // Registered but never connected
    Online,       // Connected and healthy
    Offline,      // Disconnected
    Degraded,     // Connected but health checks failing
    Maintenance,  // Manually placed in maintenance mode
}
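One way to read these states is as a small state machine. The sketch below is an assumption about how transitions might work: the `AgentEvent` names are hypothetical and the specification does not define the transition rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Pending,
    Online,
    Offline,
    Degraded,
    Maintenance,
}

/// Hypothetical events that drive state changes (not part of the spec).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentEvent {
    Connected,
    Disconnected,
    HealthCheckFailed,
    HealthCheckPassed,
    MaintenanceEntered,
    MaintenanceExited,
}

pub fn next_state(current: AgentState, event: AgentEvent) -> AgentState {
    use AgentEvent::*;
    use AgentState::*;
    match (current, event) {
        // Maintenance is sticky: only an explicit exit leaves it. We assume
        // a Connected event follows if the agent is still reachable.
        (Maintenance, MaintenanceExited) => Offline,
        (Maintenance, _) => Maintenance,
        (_, MaintenanceEntered) => Maintenance,
        // Any successful connection brings the agent online.
        (_, Connected) => Online,
        // An agent that has never connected stays pending.
        (Pending, _) => Pending,
        (_, Disconnected) => Offline,
        (Online, HealthCheckFailed) => Degraded,
        (Degraded, HealthCheckPassed) => Online,
        // Everything else leaves the state unchanged.
        (state, _) => state,
    }
}
```

Keeping the transition logic in one pure function makes it easy to test and to audit against the state descriptions above.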

Agent Metadata:

struct Agent {
    id: Uuid,
    workspace_id: Uuid,
    name: String,
    hostname: String,
    ip_address: String,
    version: String,
    state: AgentState,
    deployment_mode: DeploymentMode,  // DockerSidecar, K8sSidecar, Standalone
    last_seen_at: DateTime,
    capabilities: Vec<String>,  // e.g., ["http3", "websocket", "rate_limiting"]
    labels: HashMap<String, String>,  // e.g., {"env": "prod", "region": "us-east"}
}

API Endpoints:

  • POST /api/v1/agents/register - Register new agent
  • GET /api/v1/agents - List agents
  • GET /api/v1/agents/{id} - Get agent details
  • POST /api/v1/agents/{id}/tokens - Generate registration token
  • DELETE /api/v1/agents/{id} - Deregister agent

CF-003: Real-time Configuration Distribution

Description: Push configuration changes to agents in real-time with delivery guarantees.

Requirements:

  • Config changes propagate to all affected agents within 5 seconds
  • Support for targeted updates (specific agents or groups)
  • Config versioning with rollback capability
  • Delivery confirmation from agents

Configuration Scope:

enum ConfigScope {
    Global,           // All agents
    Workspace,        // All agents in workspace
    AgentGroup(String), // Agents with specific label selector
    Agent(Uuid),      // Single agent
}
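Deciding whether a config applies to a given agent might look like the sketch below. Several details are assumptions made for illustration: `u128` stands in for `Uuid`, `AgentInfo` is a trimmed-down agent record, the label selector uses a `key=value,key=value` syntax, and group selectors are evaluated within the config's workspace.

```rust
use std::collections::HashMap;

pub enum ConfigScope {
    Global,
    Workspace,
    AgentGroup(String), // label selector, assumed "key=value,key=value"
    Agent(u128),        // u128 stands in for Uuid
}

/// Minimal view of an agent, just enough for scope matching.
pub struct AgentInfo {
    pub id: u128,
    pub workspace_id: u128,
    pub labels: HashMap<String, String>,
}

/// Returns true if a config owned by `config_workspace` targets `agent`.
pub fn scope_matches(scope: &ConfigScope, config_workspace: u128, agent: &AgentInfo) -> bool {
    match scope {
        ConfigScope::Global => true,
        ConfigScope::Workspace => agent.workspace_id == config_workspace,
        ConfigScope::AgentGroup(selector) => {
            agent.workspace_id == config_workspace && selector_matches(selector, &agent.labels)
        }
        ConfigScope::Agent(id) => agent.id == *id,
    }
}

/// Every `key=value` term must match a label; an empty selector matches all.
fn selector_matches(selector: &str, labels: &HashMap<String, String>) -> bool {
    selector
        .split(',')
        .map(str::trim)
        .filter(|term| !term.is_empty())
        .all(|term| match term.split_once('=') {
            Some((key, value)) => {
                labels.get(key.trim()).map(String::as_str) == Some(value.trim())
            }
            None => false, // malformed term never matches
        })
}
```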

Delivery Guarantees:

  • At-least-once delivery
  • Automatic retry with exponential backoff
  • Config checksum verification
  • Offline agents receive updates on reconnection
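The retry schedule could be computed along these lines. This is a sketch under assumptions: the function name, the base delay, and the cap are illustrative, and production code would usually add random jitter so that many agents do not retry in lockstep.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 1-second base and a 60-second cap, attempts 0, 1, 2, 3 wait 1, 2, 4, and 8 seconds, and every attempt from the sixth retry onward waits the full 60 seconds.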

Master Features

MF-001: RESTful API

Description: Comprehensive REST API for all operations.

Base URL: /api/v1

Resource Endpoints:

| Resource | Endpoints |
|---|---|
| Organizations | GET, POST, PATCH, DELETE /organizations |
| Workspaces | GET, POST, PATCH, DELETE /workspaces |
| Agents | GET, POST, PATCH, DELETE /agents |
| VirtualHosts | GET, POST, PATCH, DELETE /virtual-hosts |
| Upstreams | GET, POST, PATCH, DELETE /upstreams |
| Certificates | GET, POST, DELETE /certificates |
| AccessLogs | GET /access-logs |
| Metrics | GET /metrics |

Response Format:

{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 100
  },
  "links": {
    "self": "/api/v1/agents?page=1",
    "next": "/api/v1/agents?page=2",
    "prev": null
  }
}
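The `links` object can be derived from the `meta` values. The helper below is a hypothetical sketch (the `PageLinks` type and `page_links` function are not part of the specification) that assumes 1-based page numbers, as in the example above.

```rust
/// Pagination links for one page of results (hypothetical helper type).
pub struct PageLinks {
    pub self_link: String,
    pub next: Option<String>,
    pub prev: Option<String>,
}

pub fn page_links(base: &str, page: u64, per_page: u64, total: u64) -> PageLinks {
    // Number of the last page, rounding up; an empty result still has page 1.
    let last_page = if per_page == 0 {
        1
    } else {
        ((total + per_page - 1) / per_page).max(1)
    };
    let link = |p: u64| format!("{}?page={}", base, p);
    PageLinks {
        self_link: link(page),
        next: if page < last_page { Some(link(page + 1)) } else { None },
        prev: if page > 1 { Some(link(page - 1)) } else { None },
    }
}
```

For `total = 100` and `per_page = 20` there are five pages, so page 1 has a `next` link but no `prev`, matching the sample response.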

Error Format:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid configuration",
    "details": [
      {"field": "server_name", "message": "Invalid domain format"}
    ]
  }
}

MF-002: Web-based Admin Console (Embedded)

Description: Modern web UI for managing the entire system. Built with React + Vite and served as static files embedded directly in the master binary.

Pages:

| Page | Features |
|---|---|
| Dashboard | Agent status, recent events, traffic overview |
| Agents | List, detail view, logs, metrics graphs |
| Configurations | Virtual host editor, upstream management |
| Certificates | SSL certificate list, expiration alerts |
| Access Control | Users, roles, permissions management |
| Settings | Organization settings, integrations |

Key UI Features:

  • Real-time updates via WebSocket
  • Monaco editor for nginx configuration
  • Visual topology view (agent connections)
  • Dark/light mode support
  • Responsive design

MF-003: Configuration Template Engine

Description: Templating system for generating nginx configurations.

Template Variables:

# Example virtual host template
server {
    listen {{port}} {{#if ssl}}ssl{{/if}} {{#if http2}}http2{{/if}};
    server_name {{server_name}};
    
    {{#if ssl}}
    ssl_certificate {{ssl_certificate_path}};
    ssl_certificate_key {{ssl_certificate_key_path}};
    {{/if}}
    
    location {{location_path}} {
        proxy_pass http://{{upstream_name}};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        
        {{#each custom_headers}}
        add_header {{name}} "{{value}}";
        {{/each}}
        
        {{#if rate_limiting}}
        limit_req zone={{rate_limit_zone}} burst={{rate_limit_burst}};
        {{/if}}
    }
}

Built-in Templates:

  • default - Standard reverse proxy
  • spa - Single Page Application (with fallback to index.html)
  • api - API gateway with rate limiting
  • static - Static file serving with caching
  • websocket - WebSocket proxy with connection upgrades

MF-004: Certificate Management (ACME)

Description: Automatic SSL/TLS certificate provisioning via Let's Encrypt.

Features:

  • ACME v2 protocol support
  • HTTP-01 and DNS-01 challenges
  • Automatic renewal (30 days before expiry)
  • Wildcard certificate support (DNS-01)
  • Certificate monitoring and alerts
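The 30-day renewal rule reduces to a simple time comparison. The sketch below assumes a hypothetical `needs_renewal` helper and uses `std::time::SystemTime` in place of the `DateTime` type used by the entities in this document.

```rust
use std::time::{Duration, SystemTime};

/// Renew when 30 days or less remain before expiry.
const RENEWAL_WINDOW: Duration = Duration::from_secs(30 * 24 * 60 * 60);

pub fn needs_renewal(expires_at: SystemTime, now: SystemTime, auto_renew: bool) -> bool {
    if !auto_renew {
        return false;
    }
    match expires_at.duration_since(now) {
        Ok(remaining) => remaining <= RENEWAL_WINDOW,
        // `expires_at` is already in the past: renew immediately.
        Err(_) => true,
    }
}
```

Passing `now` as a parameter rather than calling `SystemTime::now()` inside keeps the check deterministic in tests.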

Certificate Entity:

struct Certificate {
    id: Uuid,
    workspace_id: Uuid,
    domain: String,
    is_wildcard: bool,
    provider: CertificateProvider,  // LetsEncrypt, Custom
    status: CertificateStatus,      // Pending, Active, Expired, Error
    issued_at: DateTime,
    expires_at: DateTime,
    auto_renew: bool,
    certificate_pem: Option<String>,  // Encrypted at rest
    private_key_pem: Option<String>,  // Encrypted at rest
}

Agent Features

AF-001: Nginx Lifecycle Management

Description: Agent manages nginx process lifecycle based on deployment mode.

Docker Sidecar Mode:

  • Shares PID namespace with nginx container (via pid: service:nginx)
  • Directly signals nginx process for reload/restart
  • Monitors nginx via health checks

Standalone Mode:

  • Direct process management (signals to PID from file)
  • systemd integration (optional, for service management)
  • PID file monitoring

Lifecycle Actions:

  • start - Start nginx
  • stop - Graceful shutdown
  • reload - Hot reload configuration
  • restart - Full restart
  • test - Validate configuration
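In standalone mode, these actions could map onto the standard nginx command-line flags (`-s quit`, `-s reload`, `-t`). The mapping below is a sketch: the `LifecycleAction` enum and `nginx_invocations` function are hypothetical, and a sidecar deployment would send the equivalent signals to the nginx process instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleAction {
    Start,
    Stop,
    Reload,
    Restart,
    Test,
}

/// Argument lists for the `nginx` binary, one inner Vec per invocation,
/// to be run in order.
pub fn nginx_invocations(action: LifecycleAction) -> Vec<Vec<&'static str>> {
    match action {
        // Plain `nginx` starts the master process.
        LifecycleAction::Start => vec![vec![]],
        // `-s quit` asks for a graceful shutdown (`-s stop` would be immediate).
        LifecycleAction::Stop => vec![vec!["-s", "quit"]],
        // `-s reload` re-reads the configuration without dropping connections.
        LifecycleAction::Reload => vec![vec!["-s", "reload"]],
        // Graceful quit followed by a fresh start; the caller must wait for
        // the old master process to exit before running the second command.
        LifecycleAction::Restart => vec![vec!["-s", "quit"], vec![]],
        // `-t` validates the configuration and exits.
        LifecycleAction::Test => vec![vec!["-t"]],
    }
}
```

Each argument list could then be passed to `std::process::Command::new("nginx").args(...)`, checking the exit status before reporting success.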

AF-002: Configuration Rendering and Application

Description: Agent renders nginx configs from master templates and applies them using atomic symlink swaps for zero-downtime updates.

Config Directory Structure:

/etc/nginx/
├── nginx.conf              # Contains: include /etc/nginx/conf.d/current/*.conf
├── conf.d/
│   ├── current -> ./20260302143000/    # Symlink to active deployment
│   ├── 20260302143000/                 # Active config (timestamped)
│   │   ├── default.conf
│   │   └── upstream.conf
│   ├── 20260302141500/                 # Previous deployment (for rollback)
│   │   ├── default.conf
│   │   └── upstream.conf
│   └── 20260302140000/                 # Older deployment (cleanup candidate)

Config Rendering Flow:

  1. Receive ConfigUpdate from master
  2. Create new deployment folder: ./conf.d/<timestamp>/
  3. Render nginx config files into timestamped folder
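  4. Validate the new config before switching: run nginx -t -c against a temporary top-level config whose include points at /etc/nginx/conf.d/<timestamp>/*.conf (the timestamped folder holds only the included fragments, not nginx.conf itself)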
  5. If validation passes, atomically update symlink: current -> <timestamp>/
  6. Execute graceful nginx reload
  7. Verify reload success (health check)
  8. Report status to master
  9. Cleanup old deployments (keep N recent versions)

Atomic Config Swap:

async fn apply_config(&self, config: ConfigUpdate) -> Result<()> {
    let timestamp = generate_timestamp();
    let deploy_dir = self.conf_d_path.join(&timestamp);
    let symlink_path = self.conf_d_path.join("current");
    
    // 1. Render config to new timestamped directory
    self.render_config(&config, &deploy_dir).await?;
    
    // 2. Validate BEFORE switching symlink (point to new folder directly)
    self.validate_config(&deploy_dir).await?;
    
    // 3. Atomic symlink swap (Unix: create a temp symlink, then rename it over "current")
    let temp_link = self.conf_d_path.join("current.tmp");
    // Clear a stale temp link left by an interrupted run; a missing file is fine
    let _ = tokio::fs::remove_file(&temp_link).await;
    tokio::fs::symlink(&deploy_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;  // rename replaces the old link atomically
    
    // 4. Reload nginx (picks up new symlink target)
    self.reload_nginx().await?;
    
    // 5. Verify and cleanup
    self.verify_health().await?;
    self.cleanup_old_deployments(5).await?;  // Keep last 5 versions
    
    self.report_success(config.id, timestamp).await;
    Ok(())
}
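The cleanup step can be expressed as a pure selection over directory names. The sketch below is an assumption about how it might work: `deployments_to_remove` is a hypothetical helper, deployment folders are taken to be 14-digit timestamps as in the directory listing, and the active deployment is never removed, even when a rollback has made it older than the retention window.

```rust
/// Given the entries of conf.d, return the deployment folders that can be
/// deleted while keeping the `keep` newest ones and the active one.
pub fn deployments_to_remove(mut names: Vec<String>, active: &str, keep: usize) -> Vec<String> {
    // Ignore anything that is not a 14-digit timestamp (e.g. the "current" link).
    names.retain(|n| n.len() == 14 && n.bytes().all(|b| b.is_ascii_digit()));
    // Fixed-width timestamps sort chronologically as strings; newest first.
    names.sort_unstable_by(|a, b| b.cmp(a));
    names
        .into_iter()
        .skip(keep)
        .filter(|n| n.as_str() != active)
        .collect()
}
```

The caller would then delete each returned folder, for example with `tokio::fs::remove_dir_all`.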

Rollback Strategy:

async fn rollback(&self, target_timestamp: &str) -> Result<()> {
    let target_dir = self.conf_d_path.join(target_timestamp);
    let symlink_path = self.conf_d_path.join("current");
    
    // Verify target exists and is a deployment directory
    if !target_dir.is_dir() {
        return Err(Error::RollbackTargetNotFound);
    }
    
    // Atomic symlink swap back to previous deployment
    let temp_link = self.conf_d_path.join("current.tmp");
    // Clear a stale temp link left by an interrupted run; a missing file is fine
    let _ = tokio::fs::remove_file(&temp_link).await;
    tokio::fs::symlink(&target_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;
    
    // Reload nginx
    self.reload_nginx().await?;
    Ok(())
}

AF-003: Health Monitoring and Reporting

Description: Continuous health monitoring of nginx and the host system.

Health Checks:

  • Nginx Health: HTTP request to nginx health endpoint
  • Configuration Health: Verify current config matches expected
  • Resource Health: CPU, memory, disk usage
  • Connection Health: Active connections, request rate

Health Report Structure:

struct HealthReport {
    agent_id: Uuid,
    timestamp: DateTime,
    nginx_status: NginxStatus,
    system_metrics: SystemMetrics,
    config_checksum: String,
    alerts: Vec<Alert>,
}

struct NginxStatus {
    is_running: bool,
    pid: Option<u32>,
    uptime_seconds: u64,
    active_connections: u32,
    requests_per_second: f64,
}

struct SystemMetrics {
    cpu_percent: f64,
    memory_used_mb: u64,
    memory_total_mb: u64,
    disk_used_gb: u64,
    disk_total_gb: u64,
}

Reporting Interval: Configurable (default: 30 seconds)


AF-004: Metrics Collection and Export

Description: Collect and expose metrics in Prometheus format.

Metrics Endpoint: GET /metrics (on agent)

Built-in Metrics:

# Nginx metrics (parsed from stub_status)
nxmesh_nginx_connections_active{agent_id="..."} 42
nxmesh_nginx_connections_reading{agent_id="..."} 5
nxmesh_nginx_connections_writing{agent_id="..."} 30
nxmesh_nginx_connections_waiting{agent_id="..."} 7
nxmesh_nginx_requests_total{agent_id="..."} 1234567

# Agent metrics
nxmesh_agent_uptime_seconds{agent_id="..."} 86400
nxmesh_agent_master_connection_status{agent_id="..."} 1
nxmesh_agent_config_version{agent_id="...",version="123"} 1

# System metrics
nxmesh_system_cpu_percent{agent_id="..."} 25.5
nxmesh_system_memory_used_bytes{agent_id="..."} 1073741824
nxmesh_system_disk_used_bytes{agent_id="..."} 53687091200
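The nginx metrics above come from the `stub_status` page, whose plain-text layout is fixed (an "Active connections" line, a header line, three counters, then a Reading/Writing/Waiting line). A parser and exporter might look like the sketch below. The `StubStatus` type and both function names are illustrative assumptions.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct StubStatus {
    pub active: u64,
    pub requests: u64,
    pub reading: u64,
    pub writing: u64,
    pub waiting: u64,
}

/// Parse the body returned by nginx's stub_status endpoint.
pub fn parse_stub_status(body: &str) -> Option<StubStatus> {
    let mut lines = body.lines().map(str::trim).filter(|l| !l.is_empty());

    // Line 1: "Active connections: N"
    let active = lines
        .next()?
        .strip_prefix("Active connections:")?
        .trim()
        .parse::<u64>()
        .ok()?;
    // Line 2: header "server accepts handled requests" (skipped)
    lines.next()?;
    // Line 3: "<accepts> <handled> <requests>"; only requests is exported.
    let requests = lines.next()?.split_whitespace().nth(2)?.parse::<u64>().ok()?;
    // Line 4: "Reading: R Writing: W Waiting: Q"
    let rw: Vec<&str> = lines.next()?.split_whitespace().collect();
    if rw.len() != 6 || rw[0] != "Reading:" || rw[2] != "Writing:" || rw[4] != "Waiting:" {
        return None;
    }
    Some(StubStatus {
        active,
        requests,
        reading: rw[1].parse::<u64>().ok()?,
        writing: rw[3].parse::<u64>().ok()?,
        waiting: rw[5].parse::<u64>().ok()?,
    })
}

/// Render the parsed values in Prometheus text exposition format.
pub fn to_prometheus(s: &StubStatus, agent_id: &str) -> String {
    let metrics = [
        ("nxmesh_nginx_connections_active", s.active),
        ("nxmesh_nginx_connections_reading", s.reading),
        ("nxmesh_nginx_connections_writing", s.writing),
        ("nxmesh_nginx_connections_waiting", s.waiting),
        ("nxmesh_nginx_requests_total", s.requests),
    ];
    metrics
        .iter()
        .map(|(name, value)| format!("{}{{agent_id=\"{}\"}} {}\n", name, agent_id, value))
        .collect()
}
```

Returning `None` on any unexpected layout lets the agent report a scrape failure instead of exporting partial numbers.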

Custom Metrics: Agents can collect custom metrics from nginx access logs


AF-005: Offline Operation and Recovery

Description: Agent can operate independently when master is unreachable.

Offline Capabilities:

  • Continue serving traffic with cached configuration
  • Local health monitoring continues
  • Metrics are buffered for later transmission
  • Automatic reconnection attempts

Recovery Flow:

  1. Detect disconnection from master
  2. Enter "offline mode"
  3. Continue operating with cached config
  4. Buffer metrics and logs
  5. Attempt reconnection with exponential backoff
  6. On reconnection:
    • Sync configuration (compare checksums)
    • Transmit buffered metrics
    • Resume normal operation
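Buffering during an outage needs an upper bound so that a long disconnection cannot exhaust memory. One possible shape is a fixed-capacity queue that drops the oldest entries, sketched below. The `MetricsBuffer` type and its drop-oldest policy are assumptions, since the specification does not say how overflow is handled.

```rust
use std::collections::VecDeque;

/// Fixed-capacity buffer that discards the oldest item when full.
pub struct MetricsBuffer<T> {
    items: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> MetricsBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        MetricsBuffer { items: VecDeque::with_capacity(capacity), capacity, dropped: 0 }
    }

    /// Store an item, evicting the oldest one if the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.capacity == 0 {
            self.dropped += 1;
            return;
        }
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    /// Take everything out in arrival order, e.g. after reconnecting.
    pub fn drain_all(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }

    /// How many items were discarded because the buffer was full.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}
```

Reporting the `dropped` count to the master on reconnection makes gaps in the metric history visible rather than silent.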

Configuration Management

CM-001: Virtual Host Configuration

Description: Define nginx server blocks (virtual hosts) via API/UI.

VirtualHost Entity:

struct VirtualHost {
    id: Uuid,
    workspace_id: Uuid,
    name: String,              // Human-readable name
    server_name: String,       // Domain name(s), comma-separated
    listen_port: u16,          // Usually 80 or 443
    ssl_enabled: bool,
    ssl_certificate_id: Option<Uuid>,
    
    // Routing configuration
    locations: Vec<Location>,
    
    // Advanced settings
    http2_enabled: bool,
    http3_enabled: bool,
    gzip_enabled: bool,
    rate_limiting: Option<RateLimitConfig>,
    
    // Target agents
    target_agents: AgentSelector,
}

struct Location {
    path: String,              // e.g., "/api" or "~ \.php$"
    proxy_pass: Option<String>, // e.g., "http://backend"
    upstream_id: Option<Uuid>,
    root: Option<String>,      // For static files
    index: Option<String>,     // e.g., "index.html"
    custom_headers: Vec<Header>,
    rewrite_rules: Vec<RewriteRule>,
}

Validation Rules:

  • server_name must be valid domain(s)
  • listen_port must be 1-65535
  • SSL certificate must exist if ssl_enabled is true
  • At least one location must be defined
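These rules could be checked as in the sketch below, which collects every violation so that the API can return them together. The `VirtualHostInput` view, the `ValidationError` variants, and the simplified hostname check (ASCII labels, optional leading `*.`) are assumptions for illustration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    InvalidServerName(String),
    InvalidPort,
    MissingCertificate,
    NoLocations,
}

/// The subset of VirtualHost fields needed for validation (hypothetical view).
pub struct VirtualHostInput<'a> {
    pub server_name: &'a str, // comma-separated domain names
    pub listen_port: u16,
    pub ssl_enabled: bool,
    pub ssl_certificate_exists: bool,
    pub location_count: usize,
}

/// Simplified hostname check: dot-separated ASCII labels, optional "*." prefix.
fn is_valid_domain(name: &str) -> bool {
    let name = name.strip_prefix("*.").unwrap_or(name);
    !name.is_empty()
        && name.len() <= 253
        && name.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}

pub fn validate_virtual_host(vh: &VirtualHostInput) -> Vec<ValidationError> {
    let mut errors = Vec::new();
    for name in vh.server_name.split(',').map(str::trim) {
        if !is_valid_domain(name) {
            errors.push(ValidationError::InvalidServerName(name.to_string()));
        }
    }
    // A u16 cannot exceed 65535, so only port 0 is out of range.
    if vh.listen_port == 0 {
        errors.push(ValidationError::InvalidPort);
    }
    if vh.ssl_enabled && !vh.ssl_certificate_exists {
        errors.push(ValidationError::MissingCertificate);
    }
    if vh.location_count == 0 {
        errors.push(ValidationError::NoLocations);
    }
    errors
}
```

Each returned error maps naturally onto one entry of the `details` array in the API error format.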

CM-002: Upstream Configuration

Description: Define backend server pools for load balancing.

Upstream Entity:

struct Upstream {
    id: Uuid,
    workspace_id: Uuid,
    name: String,              // Used as upstream identifier
    
    // Load balancing algorithm
    algorithm: LoadBalanceAlgorithm,  // RoundRobin, LeastConnections, IPHash, etc.
    
    // Backend servers
    servers: Vec<UpstreamServer>,
    
    // Health check configuration
    health_check: Option<HealthCheckConfig>,
    
    // Connection settings
    keepalive_connections: Option<u32>,
    keepalive_timeout: Option<u32>,
}

struct UpstreamServer {
    address: String,           // IP:port or hostname:port
    weight: u32,               // Default: 1
    backup: bool,              // Backup server
    down: bool,                // Temporarily down
    max_fails: u32,            // Default: 1
    fail_timeout: u32,         // Seconds, default: 10
}

enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}
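Rendering an upstream into an nginx `upstream` block might look like the sketch below. It uses the standard nginx directives (`least_conn`, `ip_hash`, `server ... weight= max_fails= fail_timeout=`, `keepalive`), but the `render_upstream` function and its parameter list are assumptions. Round robin is nginx's default method, so both round-robin variants emit no method directive and rely on per-server weights.

```rust
pub enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}

pub struct UpstreamServer {
    pub address: String,
    pub weight: u32,
    pub backup: bool,
    pub down: bool,
    pub max_fails: u32,
    pub fail_timeout: u32, // seconds
}

pub fn render_upstream(
    name: &str,
    algorithm: &LoadBalanceAlgorithm,
    servers: &[UpstreamServer],
    keepalive_connections: Option<u32>,
) -> String {
    let mut out = format!("upstream {} {{\n", name);
    match algorithm {
        LoadBalanceAlgorithm::LeastConnections => out.push_str("    least_conn;\n"),
        // Note: nginx does not allow `backup` servers together with ip_hash.
        LoadBalanceAlgorithm::IPHash => out.push_str("    ip_hash;\n"),
        // Round robin is the default; weights are set on each server line.
        LoadBalanceAlgorithm::RoundRobin | LoadBalanceAlgorithm::WeightedRoundRobin => {}
    }
    for s in servers {
        let mut line = format!(
            "    server {} weight={} max_fails={} fail_timeout={}s",
            s.address, s.weight, s.max_fails, s.fail_timeout
        );
        if s.backup {
            line.push_str(" backup");
        }
        if s.down {
            line.push_str(" down");
        }
        line.push_str(";\n");
        out.push_str(&line);
    }
    if let Some(n) = keepalive_connections {
        out.push_str(&format!("    keepalive {};\n", n));
    }
    out.push_str("}\n");
    out
}
```

The `name` passed here is the same identifier a location's `proxy_pass http://<name>` refers to, so it should be validated before rendering.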

CM-003: Configuration Versioning

Description: Track all configuration changes with full history.

Versioning Features:

  • Every change creates a new version
  • Versions are immutable
  • Rollback to any previous version
  • Diff between versions
  • Audit log of who changed what

Version Entity:

struct ConfigVersion {
    id: Uuid,
    resource_type: String,     // "virtual_host", "upstream", etc.
    resource_id: Uuid,
    version_number: u64,       // Auto-incrementing
    data: Json,                // Full configuration snapshot
    checksum: String,          // SHA-256 of data
    created_by: Uuid,          // User ID
    created_at: DateTime,
    change_summary: String,    // Human-readable description
}

API Endpoints:

  • GET /api/v1/virtual-hosts/{id}/versions - List versions
  • GET /api/v1/virtual-hosts/{id}/versions/{version} - Get specific version
  • POST /api/v1/virtual-hosts/{id}/rollback - Rollback to version
  • GET /api/v1/virtual-hosts/{id}/diff?from=v1&to=v2 - Compare versions

Observability

OB-001: Structured Logging

Description: Comprehensive logging with structured format.

Log Levels: ERROR, WARN, INFO, DEBUG, TRACE

Log Fields:

{
  "timestamp": "2026-03-02T10:30:00Z",
  "level": "INFO",
  "component": "agent",
  "agent_id": "550e8400-e29b-41d4-a716-446655440000",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "Configuration applied successfully",
  "fields": {
    "config_id": "config-123",
    "version": 42,
    "duration_ms": 150
  }
}

Log Targets:

  • Master: systemd journal, file, or centralized (ELK/Loki)
  • Agent: stdout (Docker), file (standalone), or remote

OB-002: Distributed Tracing

Description: OpenTelemetry tracing for request flow visualization.

Traced Operations:

  • Configuration push (master → agent → nginx)
  • Health check cycles
  • Certificate issuance
  • API requests

Span Attributes:

  • nxmesh.agent_id
  • nxmesh.config_id
  • nxmesh.workspace_id
  • nxmesh.organization_id

OB-003: Access Log Aggregation

Description: Collect and query nginx access logs from all agents.

Features:

  • Centralized access log storage
  • Real-time log streaming
  • SQL-like query interface
  • Log retention policies

Access Log Schema:

struct AccessLogEntry {
    id: Uuid,
    agent_id: Uuid,
    timestamp: DateTime,
    
    // Request details
    remote_addr: String,
    method: String,
    uri: String,
    protocol: String,
    host: String,
    
    // Response details
    status: u16,
    body_bytes_sent: u64,
    response_time_ms: f64,
    
    // Additional fields
    user_agent: Option<String>,
    referer: Option<String>,
    request_id: Option<String>,
}

Query API:

# Example query
query {
  accessLogs(
    filter: {
      agentId: "...",
      timeRange: { from: "2026-03-01", to: "2026-03-02" },
      statusCode: { gte: 500 }
    },
    limit: 100
  ) {
    timestamp
    method
    uri
    status
    responseTimeMs
  }
}

Security Features

SF-001: Authentication and Authorization

Description: Multi-method authentication with fine-grained RBAC.

Authentication Methods:

  • JWT (for API/Web UI)
  • Password-based login (local user accounts)
  • OAuth2/OIDC (Google, GitHub, enterprise SSO)
  • API Keys (for service accounts)
  • TLS + Shared Secret (for agent communication)
    • Server-side TLS (auto-generated self-signed or custom certificates)
    • Bootstrap token for initial registration
    • Session key with HMAC signing for ongoing requests
    • Primary/secondary key rotation

RBAC Model:

struct Role {
    id: Uuid,
    name: String,
    permissions: Vec<Permission>,
}

enum Permission {
    // Organization scope
    OrganizationRead,
    OrganizationWrite,
    OrganizationDelete,
    
    // Workspace scope
    WorkspaceRead,
    WorkspaceWrite,
    WorkspaceDelete,
    
    // Agent scope
    AgentRead,
    AgentWrite,
    AgentReload,
    AgentDelete,
    
    // Config scope
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
    ConfigDelete,
    
    // Certificate scope
    CertificateRead,
    CertificateWrite,
    CertificateDelete,
    
    // User management
    UserRead,
    UserWrite,
    UserDelete,
}
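An authorization check against this model could be sketched as follows. For brevity the example uses only a subset of the permissions, omits the role `id`, and assumes that a user's effective permissions are the union of all assigned roles. The helper names are hypothetical.

```rust
use std::collections::HashSet;

// A subset of the Permission enum, enough to illustrate the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Permission {
    AgentRead,
    AgentReload,
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
}

pub struct Role {
    pub name: String,
    pub permissions: Vec<Permission>,
}

/// Union of the permissions granted by all of a user's roles.
pub fn effective_permissions(roles: &[Role]) -> HashSet<Permission> {
    roles
        .iter()
        .flat_map(|role| role.permissions.iter().copied())
        .collect()
}

/// True only if every required permission is granted by some role.
pub fn is_allowed(roles: &[Role], required: &[Permission]) -> bool {
    let granted = effective_permissions(roles);
    required.iter().all(|p| granted.contains(p))
}
```

For example, deploying a config and reloading the agent would require both `ConfigDeploy` and `AgentReload`, which may come from two different roles.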

SF-002: Secret Management

Description: Secure storage and distribution of sensitive data.

Secrets:

  • SSL private keys
  • API tokens
  • Database passwords
  • External service credentials

Security Measures:

  • Encryption at rest (AES-256-GCM)
  • Encryption in transit (TLS 1.3)
  • Automatic secret rotation
  • Audit logging for secret access

SF-003: Network Security

Description: Network-level security controls.

Features:

  • IP allowlisting for agent connections
  • Rate limiting on API endpoints
  • DDoS protection recommendations
  • Security headers enforcement (HSTS, CSP, etc.)

Agent Connection Security:

  • TLS Encryption: Server-side TLS (auto-generated or custom certificates)
    • Development: Self-signed certificates auto-generated on first start
    • Production: Valid certificates (Let's Encrypt or corporate CA)
  • Bootstrap Authentication: One-time token for initial registration
  • Session Authentication: HMAC-signed requests with shared session key
  • Key Rotation: Primary/secondary key design for seamless rotation
  • Certificate Pinning: Optional fingerprint verification for additional security