NxMesh Feature Specification

Core Features
Master Features
Agent Features
Configuration Management
Observability
Security Features

Core Features

CF-001: Multi-tenancy with Organizations and Workspaces

Description: Support for multiple organizations with isolated workspaces within each organization.

Requirements:

Organizations are top-level resource containers
Each organization can have multiple workspaces
Resources (agents, configs, certificates) are scoped to a workspace
Cross-workspace visibility is configurable

Data Model:

struct Organization {
    id: Uuid,
    name: String,
    slug: String,  // URL-friendly identifier
    created_at: DateTime,
    settings: OrganizationSettings,
}

struct Workspace {
    id: Uuid,
    organization_id: Uuid,
    name: String,
    slug: String,
    created_at: DateTime,
}
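
The slug fields above are described only as URL-friendly identifiers; the spec does not define how they are derived. A minimal sketch, assuming slugs are lowercase ASCII with single dashes between words (`slugify` is a hypothetical helper, not part of the spec):

```rust
/// Turn a display name into a URL-friendly slug.
/// Hypothetical helper: the exact slug rules are an assumption.
pub fn slugify(name: &str) -> String {
    let mut slug = String::with_capacity(name.len());
    let mut pending_dash = false;
    for ch in name.chars() {
        if ch.is_ascii_alphanumeric() {
            // Emit one dash for any run of separators, but never a leading one.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.push(ch.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}
```

Under these rules, "Acme Corp" becomes "acme-corp". Uniqueness within an organization would still need to be enforced by the storage layer.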

API Endpoints:

GET /api/v1/organizations - List organizations
POST /api/v1/organizations - Create organization
GET /api/v1/organizations/{id}/workspaces - List workspaces
POST /api/v1/organizations/{id}/workspaces - Create workspace

CF-002: Agent Registration and Lifecycle Management

Description: Agents must register with the master before receiving configurations.

Registration Flow:

Administrator generates bootstrap token in Master UI
Token is provided to agent via environment variable or config file
Agent establishes TLS connection to master (verifies server certificate)
Agent sends bootstrap token for registration
Master validates token and establishes shared secret:
- Master generates session_key (per-agent) + key_id
- Session key used for HMAC request signing
- Primary/secondary key design for rotation
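
The primary/secondary key design can be sketched as a small key ring held by the master for each agent. This is an illustration under assumptions: the type and method names are hypothetical, and the MAC function is left abstract because a real deployment would plug in HMAC from a vetted crypto crate.

```rust
/// Keyed MAC function. A real deployment would use HMAC (e.g. HMAC-SHA256)
/// from a vetted crypto crate; this sketch leaves it abstract.
pub type MacFn = fn(key: &[u8], message: &[u8]) -> Vec<u8>;

#[derive(Clone, Debug, PartialEq)]
pub struct SessionKey {
    pub key_id: u32,
    pub secret: Vec<u8>,
}

/// Primary/secondary session keys for one agent (hypothetical type).
pub struct KeyRing {
    pub primary: SessionKey,
    pub secondary: Option<SessionKey>,
}

impl KeyRing {
    pub fn new(primary: SessionKey) -> Self {
        KeyRing { primary, secondary: None }
    }

    /// Promote `next` to primary. The old primary stays valid as secondary,
    /// so requests signed just before the rotation are still accepted.
    pub fn rotate(&mut self, next: SessionKey) {
        let old = std::mem::replace(&mut self.primary, next);
        self.secondary = Some(old);
    }

    /// Accept a signature made with either the primary or the secondary key.
    pub fn verify(&self, mac: MacFn, key_id: u32, message: &[u8], signature: &[u8]) -> bool {
        let key = if self.primary.key_id == key_id {
            Some(&self.primary)
        } else {
            self.secondary.as_ref().filter(|k| k.key_id == key_id)
        };
        match key {
            Some(k) => bytes_equal(&mac(&k.secret, message), signature),
            None => false,
        }
    }
}

/// Compare two byte strings without stopping at the first mismatching byte.
fn bytes_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

After one rotation both keys verify; after a second rotation the oldest key is dropped, which bounds how long a leaked key remains useful.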

Agent States:

enum AgentState {
    Pending,      // Registered but never connected
    Online,       // Connected and healthy
    Offline,      // Disconnected
    Degraded,     // Connected but health checks failing
    Maintenance,  // Manually placed in maintenance mode
}
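
The spec lists the states but not the transitions between them. One possible reading, sketched as a pure transition function; the `AgentEvent` names and the exact rules are assumptions made for illustration:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AgentState {
    Pending,
    Online,
    Offline,
    Degraded,
    Maintenance,
}

/// Hypothetical events; the spec does not name them.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AgentEvent {
    Connected,
    Disconnected,
    HealthCheckFailed,
    HealthCheckPassed,
    MaintenanceEnabled,
    MaintenanceDisabled,
}

impl AgentState {
    /// Return the state that follows `self` after `event`.
    pub fn on(self, event: AgentEvent) -> AgentState {
        match (self, event) {
            // Maintenance is sticky: only an explicit disable leaves it
            // (assumes the agent is still connected at that point).
            (Self::Maintenance, AgentEvent::MaintenanceDisabled) => Self::Online,
            (Self::Maintenance, _) => Self::Maintenance,
            (_, AgentEvent::MaintenanceEnabled) => Self::Maintenance,
            (Self::Pending | Self::Offline, AgentEvent::Connected) => Self::Online,
            // A registered agent that never connected ignores everything else.
            (Self::Pending, _) => Self::Pending,
            (_, AgentEvent::Disconnected) => Self::Offline,
            (Self::Online | Self::Degraded, AgentEvent::HealthCheckFailed) => Self::Degraded,
            (Self::Online | Self::Degraded, AgentEvent::HealthCheckPassed) => Self::Online,
            (state, _) => state,
        }
    }
}
```

Keeping the transition logic in one function makes it straightforward to test and to audit when states are added.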

Agent Metadata:

struct Agent {
    id: Uuid,
    workspace_id: Uuid,
    name: String,
    hostname: String,
    ip_address: String,
    version: String,
    state: AgentState,
    deployment_mode: DeploymentMode,  // DockerSidecar, K8sSidecar, Standalone
    last_seen_at: DateTime,
    capabilities: Vec<String>,  // e.g., ["http3", "websocket", "rate_limiting"]
    labels: HashMap<String, String>,  // e.g., {"env": "prod", "region": "us-east"}
}

API Endpoints:

POST /api/v1/agents/register - Register new agent
GET /api/v1/agents - List agents
GET /api/v1/agents/{id} - Get agent details
POST /api/v1/agents/{id}/tokens - Generate registration token
DELETE /api/v1/agents/{id} - Deregister agent

CF-003: Real-time Configuration Distribution

Description: Push configuration changes to agents in real-time with delivery guarantees.

Requirements:

Config changes propagate to all affected agents within 5 seconds
Support for targeted updates (specific agents or groups)
Config versioning with rollback capability
Delivery confirmation from agents

Configuration Scope:

enum ConfigScope {
    Global,           // All agents
    Workspace,        // All agents in workspace
    AgentGroup(String), // Agents with specific label selector
    Agent(Uuid),      // Single agent
}
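
To illustrate how a scope might resolve to a set of agents, the sketch below checks one agent against one scope. Several details are assumptions: `u128` stands in for `Uuid`, the label selector is taken to be a comma-separated list of `key=value` pairs, and group scopes are limited to the config's own workspace.

```rust
use std::collections::HashMap;

/// Stand-in for Uuid so the sketch needs no external crate.
pub type Id = u128;

pub enum ConfigScope {
    Global,
    Workspace,
    AgentGroup(String),
    Agent(Id),
}

pub struct AgentInfo {
    pub id: Id,
    pub workspace_id: Id,
    pub labels: HashMap<String, String>,
}

/// Decide whether a config owned by `config_workspace` applies to `agent`.
pub fn scope_matches(scope: &ConfigScope, config_workspace: Id, agent: &AgentInfo) -> bool {
    match scope {
        ConfigScope::Global => true,
        ConfigScope::Workspace => agent.workspace_id == config_workspace,
        ConfigScope::AgentGroup(selector) => {
            agent.workspace_id == config_workspace && selector_matches(selector, &agent.labels)
        }
        ConfigScope::Agent(id) => agent.id == *id,
    }
}

/// Assumed selector syntax: "key=value,key2=value2"; every pair must match.
fn selector_matches(selector: &str, labels: &HashMap<String, String>) -> bool {
    selector
        .split(',')
        .map(str::trim)
        .filter(|pair| !pair.is_empty())
        .all(|pair| match pair.split_once('=') {
            Some((key, value)) => labels.get(key.trim()).map(String::as_str) == Some(value.trim()),
            None => false,
        })
}
```

Note that an empty selector matches every agent in the workspace under this reading; a stricter implementation might reject it.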

Delivery Guarantees:

At-least-once delivery
Automatic retry with exponential backoff
Config checksum verification
Offline agents receive updates on reconnection
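
The retry schedule is not specified beyond "exponential backoff". A minimal sketch, assuming the delay doubles per attempt and is capped at a maximum (`backoff_delay` and its parameters are hypothetical; jitter is omitted to keep the example deterministic):

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing on large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 500 ms base and a 60 s cap, the delays run 500 ms, 1 s, 2 s, 4 s and so on until they reach the cap. In production, adding random jitter helps avoid many agents retrying at the same instant.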

Master Features

MF-001: RESTful API

Description: Comprehensive REST API for all operations.

Base URL: /api/v1

Resource Endpoints:

Resource	Endpoints
Organizations	GET, POST, PATCH, DELETE `/organizations`
Workspaces	GET, POST, PATCH, DELETE `/workspaces`
Agents	GET, POST, PATCH, DELETE `/agents`
VirtualHosts	GET, POST, PATCH, DELETE `/virtual-hosts`
Upstreams	GET, POST, PATCH, DELETE `/upstreams`
Certificates	GET, POST, DELETE `/certificates`
AccessLogs	GET `/access-logs`
Metrics	GET `/metrics`

Response Format:

{
  "data": { ... },
  "meta": {
    "page": 1,
    "per_page": 20,
    "total": 100
  },
  "links": {
    "self": "/api/v1/agents?page=1",
    "next": "/api/v1/agents?page=2",
    "prev": null
  }
}
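
The `links` object can be derived from the `meta` values. A sketch of that derivation, assuming 1-based page numbers and a `page` query parameter as in the example above (`page_links` and `PageLinks` are hypothetical names):

```rust
#[derive(Debug, PartialEq)]
pub struct PageLinks {
    pub self_link: String,
    pub next: Option<String>,
    pub prev: Option<String>,
}

/// Build pagination links for a collection endpoint.
pub fn page_links(path: &str, page: u64, per_page: u64, total: u64) -> PageLinks {
    // Ceiling division; an empty collection still has one (empty) page.
    let last_page = if per_page == 0 {
        1
    } else {
        (total / per_page + u64::from(total % per_page != 0)).max(1)
    };
    let link = |p: u64| format!("{path}?page={p}");
    PageLinks {
        self_link: link(page),
        next: if page < last_page { Some(link(page + 1)) } else { None },
        prev: if page > 1 { Some(link(page - 1)) } else { None },
    }
}
```

For page 1 of 100 items at 20 per page this yields the same `self`, `next`, and `prev` values as the example response.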

Error Format:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid configuration",
    "details": [
      {"field": "server_name", "message": "Invalid domain format"}
    ]
  }
}

MF-002: Web-based Admin Console (Embedded)

Description: Modern web UI for managing the entire system. Built with React + Vite and served as static files embedded directly in the master binary.

Pages:

Page	Features
Dashboard	Agent status, recent events, traffic overview
Agents	List, detail view, logs, metrics graphs
Configurations	Virtual host editor, upstream management
Certificates	SSL certificate list, expiration alerts
Access Control	Users, roles, permissions management
Settings	Organization settings, integrations

Key UI Features:

Real-time updates via WebSocket
Monaco editor for nginx configuration
Visual topology view (agent connections)
Dark/light mode support
Responsive design

MF-003: Configuration Template Engine

Description: Templating system for generating nginx configurations.

Template Variables:

# Example virtual host template
server {
    listen {{port}} {{#if ssl}}ssl{{/if}} {{#if http2}}http2{{/if}};
    server_name {{server_name}};
    
    {{#if ssl}}
    ssl_certificate {{ssl_certificate_path}};
    ssl_certificate_key {{ssl_certificate_key_path}};
    {{/if}}
    
    location {{location_path}} {
        proxy_pass http://{{upstream_name}};
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        
        {{#each custom_headers}}
        add_header {{name}} "{{value}}";
        {{/each}}
        
        {{#if rate_limiting}}
        limit_req zone={{rate_limit_zone}} burst={{rate_limit_burst}};
        {{/if}}
    }
}
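
To make the conditional sections concrete, the function below produces the line that the `listen` template line is meant to expand to. It is a hand-written equivalent for illustration, not the template engine itself, and `render_listen` is a hypothetical name.

```rust
/// Build the nginx `listen` directive for a virtual host.
pub fn render_listen(port: u16, ssl: bool, http2: bool) -> String {
    let mut line = format!("listen {port}");
    if ssl {
        line.push_str(" ssl");
    }
    if http2 {
        line.push_str(" http2");
    }
    line.push(';');
    line
}
```

Rendering the template literally leaves extra spaces where a disabled `{{#if}}` section used to be; nginx ignores repeated whitespace between parameters, so both forms are equivalent.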

Built-in Templates:

default - Standard reverse proxy
spa - Single Page Application (with fallback to index.html)
api - API gateway with rate limiting
static - Static file serving with caching
websocket - WebSocket proxy with connection upgrades

MF-004: Certificate Management (ACME)

Description: Automatic SSL/TLS certificate provisioning via Let's Encrypt.

Features:

ACME v2 protocol support
HTTP-01 and DNS-01 challenges
Automatic renewal (30 days before expiry)
Wildcard certificate support (DNS-01)
Certificate monitoring and alerts
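
The renewal rule (renew 30 days before expiry) reduces to a simple time comparison. A sketch using only the standard library; `needs_renewal` is a hypothetical helper, and treating already-expired certificates as due for renewal is an assumption:

```rust
use std::time::{Duration, SystemTime};

/// Renew once the remaining lifetime drops to 30 days or less.
pub const RENEWAL_WINDOW: Duration = Duration::from_secs(30 * 24 * 60 * 60);

pub fn needs_renewal(expires_at: SystemTime, now: SystemTime, auto_renew: bool) -> bool {
    if !auto_renew {
        return false;
    }
    match expires_at.duration_since(now) {
        Ok(remaining) => remaining <= RENEWAL_WINDOW,
        // `expires_at` is in the past: the certificate has already expired.
        Err(_) => true,
    }
}
```

Passing `now` as a parameter rather than reading the clock inside the function keeps the check deterministic in tests.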

Certificate Entity:

struct Certificate {
    id: Uuid,
    workspace_id: Uuid,
    domain: String,
    is_wildcard: bool,
    provider: CertificateProvider,  // LetsEncrypt, Custom
    status: CertificateStatus,      // Pending, Active, Expired, Error
    issued_at: DateTime,
    expires_at: DateTime,
    auto_renew: bool,
    certificate_pem: Option<String>,  // Encrypted at rest
    private_key_pem: Option<String>,  // Encrypted at rest
}

Agent Features

AF-001: Nginx Lifecycle Management

Description: Agent manages nginx process lifecycle based on deployment mode.

Docker Sidecar Mode:

Shares PID namespace with nginx container (via pid: service:nginx)
Directly signals nginx process for reload/restart
Monitors nginx via health checks

Standalone Mode:

Direct process management (signals to PID from file)
systemd integration (optional, for service management)
PID file monitoring

Lifecycle Actions:

start - Start nginx
stop - Graceful shutdown
reload - Hot reload configuration
restart - Full restart
test - Validate configuration
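
One way to map these actions onto nginx is shown below. The `nginx -s quit`, `nginx -s reload`, and `nginx -t` invocations and the QUIT/HUP signals are standard nginx behavior; the enum and method names are hypothetical, and which path the agent uses (command or signal) is an assumption that depends on deployment mode.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LifecycleAction {
    Start,
    Stop,
    Reload,
    Restart,
    Test,
}

impl LifecycleAction {
    /// nginx command lines to run, in order.
    pub fn commands(self) -> Vec<Vec<&'static str>> {
        match self {
            LifecycleAction::Start => vec![vec!["nginx"]],
            // `quit` is the graceful shutdown; `stop` would be the fast one.
            LifecycleAction::Stop => vec![vec!["nginx", "-s", "quit"]],
            LifecycleAction::Reload => vec![vec!["nginx", "-s", "reload"]],
            // The caller must wait for the old master process to exit
            // before running the second command.
            LifecycleAction::Restart => vec![vec!["nginx", "-s", "quit"], vec!["nginx"]],
            LifecycleAction::Test => vec![vec!["nginx", "-t"]],
        }
    }

    /// Signal to send to the nginx master process when signalling directly,
    /// as in sidecar mode. `None` means the action is not a single signal.
    pub fn signal(self) -> Option<&'static str> {
        match self {
            LifecycleAction::Stop => Some("QUIT"),
            LifecycleAction::Reload => Some("HUP"),
            _ => None,
        }
    }
}
```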

AF-002: Configuration Rendering and Application

Description: Agent renders nginx configs from master templates and applies them using atomic symlink swaps for zero-downtime updates.

Config Directory Structure:

/etc/nginx/
├── nginx.conf              # Contains: include /etc/nginx/conf.d/current/*.conf
├── conf.d/
│   ├── current -> ./20260302143000/    # Symlink to active deployment
│   ├── 20260302143000/                 # Active config (timestamped)
│   │   ├── default.conf
│   │   └── upstream.conf
│   ├── 20260302141500/                 # Previous deployment (for rollback)
│   │   ├── default.conf
│   │   └── upstream.conf
│   └── 20260302140000/                 # Older deployment (cleanup candidate)

Config Rendering Flow:

Receive ConfigUpdate from master
Create new deployment folder: ./conf.d/<timestamp>/
Render nginx config files into timestamped folder
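Validate new config: nginx -t -c /etc/nginx/conf.d/<timestamp>/nginx.conf (this assumes a top-level nginx.conf that includes the new folder's *.conf files is rendered alongside them for validation; the directory structure above shows only the included fragments)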
If validation passes, atomically update symlink: current → <timestamp>/
Execute graceful nginx reload
Verify reload success (health check)
Report status to master
Cleanup old deployments (keep N recent versions)

Atomic Config Swap:

async fn apply_config(&self, config: ConfigUpdate) -> Result<()> {
    let timestamp = generate_timestamp();
    let deploy_dir = self.conf_d_path.join(&timestamp);
    let symlink_path = self.conf_d_path.join("current");
    
    // 1. Render config to new timestamped directory
    self.render_config(&config, &deploy_dir).await?;
    
    // 2. Validate BEFORE switching symlink (point to new folder directly)
    self.validate_config(&deploy_dir).await?;
    
    // 3. Atomic symlink swap (Unix: symlink + rename)
    let temp_link = self.conf_d_path.join("current.tmp");
    // Remove a stale temp link left by an interrupted run; a missing file is fine
    let _ = tokio::fs::remove_file(&temp_link).await;
    tokio::fs::symlink(&deploy_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;  // Atomic operation
    
    // 4. Reload nginx (picks up new symlink target)
    self.reload_nginx().await?;
    
    // 5. Verify and cleanup
    self.verify_health().await?;
    self.cleanup_old_deployments(5).await?;  // Keep last 5 versions
    
    self.report_success(config.id, timestamp).await;
    Ok(())
}
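
The cleanup step is not spelled out in the spec. A sketch of the selection logic, assuming deployment folders are named with the sortable timestamp format shown in the directory structure and that the active deployment must never be deleted (`deployments_to_remove` is a hypothetical helper):

```rust
/// Pick the deployment folders that can be deleted: everything beyond the
/// `keep` newest, except the currently active one.
pub fn deployments_to_remove(mut all: Vec<String>, current: &str, keep: usize) -> Vec<String> {
    // Names follow YYYYMMDDHHMMSS, so lexicographic order is chronological.
    all.sort();
    all.reverse(); // newest first
    all.into_iter()
        .enumerate()
        .filter(|(index, name)| *index >= keep && name.as_str() != current)
        .map(|(_, name)| name)
        .collect()
}
```

Protecting the active folder matters after a rollback, when `current` may point at a deployment that is no longer among the newest.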

Rollback Strategy:

async fn rollback(&self, target_timestamp: &str) -> Result<()> {
    let target_dir = self.conf_d_path.join(target_timestamp);
    let symlink_path = self.conf_d_path.join("current");
    
    // Verify target exists
    if !target_dir.exists() {
        return Err(Error::RollbackTargetNotFound);
    }
    
    // Atomic symlink swap back to previous deployment
    let temp_link = self.conf_d_path.join("current.tmp");
    // Remove a stale temp link left by an interrupted run; a missing file is fine
    let _ = tokio::fs::remove_file(&temp_link).await;
    tokio::fs::symlink(&target_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;
    
    // Reload nginx
    self.reload_nginx().await?;
    Ok(())
}

AF-003: Health Monitoring and Reporting

Description: Continuous health monitoring of nginx and the host system.

Health Checks:

Nginx Health: HTTP request to nginx health endpoint
Configuration Health: Verify current config matches expected
Resource Health: CPU, memory, disk usage
Connection Health: Active connections, request rate

Health Report Structure:

struct HealthReport {
    agent_id: Uuid,
    timestamp: DateTime,
    nginx_status: NginxStatus,
    system_metrics: SystemMetrics,
    config_checksum: String,
    alerts: Vec<Alert>,
}

struct NginxStatus {
    is_running: bool,
    pid: Option<u32>,
    uptime_seconds: u64,
    active_connections: u32,
    requests_per_second: f64,
}

struct SystemMetrics {
    cpu_percent: f64,
    memory_used_mb: u64,
    memory_total_mb: u64,
    disk_used_gb: u64,
    disk_total_gb: u64,
}

Reporting Interval: Configurable (default: 30 seconds)

AF-004: Metrics Collection and Export

Description: Collect and expose metrics in Prometheus format.

Metrics Endpoint: GET /metrics (on agent)

Built-in Metrics:

# Nginx metrics (parsed from stub_status)
nxmesh_nginx_connections_active{agent_id="..."} 42
nxmesh_nginx_connections_reading{agent_id="..."} 5
nxmesh_nginx_connections_writing{agent_id="..."} 30
nxmesh_nginx_connections_waiting{agent_id="..."} 7
nxmesh_nginx_requests_total{agent_id="..."} 1234567

# Agent metrics
nxmesh_agent_uptime_seconds{agent_id="..."} 86400
nxmesh_agent_master_connection_status{agent_id="..."} 1
nxmesh_agent_config_version{agent_id="...",version="123"} 1

# System metrics
nxmesh_system_cpu_percent{agent_id="..."} 25.5
nxmesh_system_memory_used_bytes{agent_id="..."} 1073741824
nxmesh_system_disk_used_bytes{agent_id="..."} 53687091200
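
The nginx metrics above are parsed from `stub_status`, whose plain-text output lists seven counters in a fixed order. A sketch of the parsing and export step; the struct and function names are hypothetical, and the parser relies only on that fixed order rather than on the surrounding labels:

```rust
#[derive(Debug, PartialEq)]
pub struct StubStatus {
    pub active: u64,
    pub accepts: u64,
    pub handled: u64,
    pub requests: u64,
    pub reading: u64,
    pub writing: u64,
    pub waiting: u64,
}

/// Parse the body returned by nginx's stub_status module.
pub fn parse_stub_status(body: &str) -> Option<StubStatus> {
    // Collect every run of digits; the module prints exactly seven numbers.
    let numbers: Vec<u64> = body
        .split(|c: char| !c.is_ascii_digit())
        .filter(|chunk| !chunk.is_empty())
        .filter_map(|chunk| chunk.parse().ok())
        .collect();
    match numbers.as_slice() {
        [active, accepts, handled, requests, reading, writing, waiting] => Some(StubStatus {
            active: *active,
            accepts: *accepts,
            handled: *handled,
            requests: *requests,
            reading: *reading,
            writing: *writing,
            waiting: *waiting,
        }),
        _ => None,
    }
}

impl StubStatus {
    /// Render the nginx metrics in Prometheus text exposition format.
    pub fn to_prometheus(&self, agent_id: &str) -> String {
        let series = [
            ("nxmesh_nginx_connections_active", self.active),
            ("nxmesh_nginx_connections_reading", self.reading),
            ("nxmesh_nginx_connections_writing", self.writing),
            ("nxmesh_nginx_connections_waiting", self.waiting),
            ("nxmesh_nginx_requests_total", self.requests),
        ];
        series
            .iter()
            .map(|(name, value)| format!("{name}{{agent_id=\"{agent_id}\"}} {value}\n"))
            .collect()
    }
}
```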

Custom Metrics: Agents can collect custom metrics from nginx access logs

AF-005: Offline Operation and Recovery

Description: Agent can operate independently when master is unreachable.

Offline Capabilities:

Continue serving traffic with cached configuration
Local health monitoring continues
Metrics are buffered for later transmission
Automatic reconnection attempts

Recovery Flow:

Detect disconnection from master
Enter "offline mode"
Continue operating with cached config
Buffer metrics and logs
Attempt reconnection with exponential backoff
On reconnection:
- Sync configuration (compare checksums)
- Transmit buffered metrics
- Resume normal operation
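
Buffering metrics and logs while offline needs a bound, or a long outage will exhaust memory. The spec does not say what happens when the buffer fills; the sketch below assumes the oldest entries are dropped first and counts how many were lost (`OfflineBuffer` is a hypothetical type):

```rust
use std::collections::VecDeque;

/// Bounded FIFO buffer for items collected while the master is unreachable.
pub struct OfflineBuffer<T> {
    items: VecDeque<T>,
    capacity: usize,
    dropped: u64,
}

impl<T> OfflineBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        OfflineBuffer { items: VecDeque::new(), capacity, dropped: 0 }
    }

    /// Store an item, evicting the oldest one if the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.capacity == 0 {
            self.dropped += 1;
            return;
        }
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    /// Take everything out in arrival order, for transmission on reconnection.
    pub fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }

    /// Number of items lost to eviction since the buffer was created.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```

Reporting the dropped count to the master after reconnection lets operators see that a gap in the metrics is due to the outage rather than to missing traffic.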

Configuration Management

CM-001: Virtual Host Configuration

Description: Define nginx server blocks (virtual hosts) via API/UI.

VirtualHost Entity:

struct VirtualHost {
    id: Uuid,
    workspace_id: Uuid,
    name: String,              // Human-readable name
    server_name: String,       // Domain name(s), comma-separated
    listen_port: u16,          // Usually 80 or 443
    ssl_enabled: bool,
    ssl_certificate_id: Option<Uuid>,
    
    // Routing configuration
    locations: Vec<Location>,
    
    // Advanced settings
    http2_enabled: bool,
    http3_enabled: bool,
    gzip_enabled: bool,
    rate_limiting: Option<RateLimitConfig>,
    
    // Target agents
    target_agents: AgentSelector,
}

struct Location {
    path: String,              // e.g., "/api" or "~ \.php$"
    proxy_pass: Option<String>, // e.g., "http://backend"
    upstream_id: Option<Uuid>,
    root: Option<String>,      // For static files
    index: Option<String>,     // e.g., "index.html"
    custom_headers: Vec<Header>,
    rewrite_rules: Vec<RewriteRule>,
}

Validation Rules:

server_name must be valid domain(s)
listen_port must be 1-65535
SSL certificate must exist if ssl_enabled is true
At least one location must be defined
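
The validation rules can be sketched as a function that returns field-level errors in the shape used by the API error format. This is an illustration under assumptions: `VirtualHostInput` is a trimmed-down stand-in for the entity, the domain check is a simplified hostname syntax check that also accepts a leading `*.` wildcard, and the messages are made up.

```rust
/// Reduced view of a VirtualHost with only the fields the rules need.
pub struct VirtualHostInput<'a> {
    pub server_name: &'a str,
    pub listen_port: u16,
    pub ssl_enabled: bool,
    pub ssl_certificate_exists: bool,
    pub location_count: usize,
}

/// Return (field, message) pairs; an empty list means the input is valid.
pub fn validate(vh: &VirtualHostInput) -> Vec<(&'static str, String)> {
    let mut errors = Vec::new();
    for name in vh.server_name.split(',').map(str::trim) {
        if !is_valid_domain(name) {
            errors.push(("server_name", format!("Invalid domain format: {name:?}")));
        }
    }
    // u16 already caps the port at 65535, so only 0 is out of range.
    if vh.listen_port == 0 {
        errors.push(("listen_port", "Port must be 1-65535".to_string()));
    }
    if vh.ssl_enabled && !vh.ssl_certificate_exists {
        errors.push(("ssl_certificate_id", "SSL is enabled but no certificate exists".to_string()));
    }
    if vh.location_count == 0 {
        errors.push(("locations", "At least one location is required".to_string()));
    }
    errors
}

/// Simplified hostname syntax check (labels of letters, digits, and hyphens).
fn is_valid_domain(name: &str) -> bool {
    let name = name.strip_prefix("*.").unwrap_or(name);
    !name.is_empty()
        && name.len() <= 253
        && name.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
                && !label.starts_with('-')
                && !label.ends_with('-')
        })
}
```

Collecting all errors instead of stopping at the first lets the UI highlight every invalid field in one round trip.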

CM-002: Upstream Configuration

Description: Define backend server pools for load balancing.

Upstream Entity:

struct Upstream {
    id: Uuid,
    workspace_id: Uuid,
    name: String,              // Used as upstream identifier
    
    // Load balancing algorithm
    algorithm: LoadBalanceAlgorithm,  // RoundRobin, LeastConnections, IPHash, etc.
    
    // Backend servers
    servers: Vec<UpstreamServer>,
    
    // Health check configuration
    health_check: Option<HealthCheckConfig>,
    
    // Connection settings
    keepalive_connections: Option<u32>,
    keepalive_timeout: Option<u32>,
}

struct UpstreamServer {
    address: String,           // IP:port or hostname:port
    weight: u32,               // Default: 1
    backup: bool,              // Backup server
    down: bool,                // Temporarily down
    max_fails: u32,            // Default: 1
    fail_timeout: u32,         // Seconds, default: 10
}

enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}
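
For illustration, the entity above could be rendered into an nginx `upstream` block roughly as follows. The directives used (`least_conn`, `ip_hash`, `server` parameters, `keepalive`) are standard nginx; the function name, the reduced parameter list, and the indentation are assumptions. Round robin is nginx's default method, so both round-robin variants emit no method directive and rely on per-server weights.

```rust
pub enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}

pub struct UpstreamServer {
    pub address: String,
    pub weight: u32,
    pub backup: bool,
    pub down: bool,
    pub max_fails: u32,
    pub fail_timeout: u32,
}

/// Render an nginx `upstream` block (hypothetical helper).
pub fn render_upstream(
    name: &str,
    algorithm: &LoadBalanceAlgorithm,
    servers: &[UpstreamServer],
    keepalive_connections: Option<u32>,
) -> String {
    let mut out = format!("upstream {name} {{\n");
    match algorithm {
        LoadBalanceAlgorithm::LeastConnections => out.push_str("    least_conn;\n"),
        LoadBalanceAlgorithm::IPHash => out.push_str("    ip_hash;\n"),
        LoadBalanceAlgorithm::RoundRobin | LoadBalanceAlgorithm::WeightedRoundRobin => {}
    }
    for server in servers {
        out.push_str(&format!(
            "    server {} weight={} max_fails={} fail_timeout={}s",
            server.address, server.weight, server.max_fails, server.fail_timeout
        ));
        if server.backup {
            out.push_str(" backup");
        }
        if server.down {
            out.push_str(" down");
        }
        out.push_str(";\n");
    }
    // nginx expects the balancing method to appear before `keepalive`.
    if let Some(connections) = keepalive_connections {
        out.push_str(&format!("    keepalive {connections};\n"));
    }
    out.push_str("}\n");
    out
}
```

A validation step would still be needed before rendering: nginx rejects the `backup` parameter in combination with `ip_hash`, for example.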

CM-003: Configuration Versioning

Description: Track all configuration changes with full history.

Versioning Features:

Every change creates a new version
Versions are immutable
Rollback to any previous version
Diff between versions
Audit log of who changed what

Version Entity:

struct ConfigVersion {
    id: Uuid,
    resource_type: String,     // "virtual_host", "upstream", etc.
    resource_id: Uuid,
    version_number: u64,       // Auto-incrementing
    data: Json,                // Full configuration snapshot
    checksum: String,          // SHA-256 of data
    created_by: Uuid,          // User ID
    created_at: DateTime,
    change_summary: String,    // Human-readable description
}
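
Because versions are immutable, one natural way to implement rollback is to append a new version that copies the target's data rather than rewinding history. The sketch below assumes that design; the in-memory `VersionHistory` type is hypothetical, and `String` stands in for the `Json` snapshot.

```rust
pub struct ConfigVersion {
    pub version_number: u64,
    pub data: String,
    pub change_summary: String,
}

/// Append-only version history for one resource.
#[derive(Default)]
pub struct VersionHistory {
    versions: Vec<ConfigVersion>,
}

impl VersionHistory {
    /// Record a new version and return its auto-incremented number (from 1).
    pub fn commit(&mut self, data: String, change_summary: String) -> u64 {
        let version_number = self.versions.len() as u64 + 1;
        self.versions.push(ConfigVersion { version_number, data, change_summary });
        version_number
    }

    pub fn get(&self, version_number: u64) -> Option<&ConfigVersion> {
        let index = version_number.checked_sub(1)?;
        self.versions.get(index as usize)
    }

    pub fn latest(&self) -> Option<&ConfigVersion> {
        self.versions.last()
    }

    /// Roll back by committing a copy of `target` as the newest version.
    /// Returns the new version number, or None if `target` does not exist.
    pub fn rollback(&mut self, target: u64) -> Option<u64> {
        let data = self.get(target)?.data.clone();
        Some(self.commit(data, format!("Rollback to version {target}")))
    }
}
```

With this approach the audit log stays complete: the rollback itself is a recorded change, and the versions it skipped over remain available for diffing.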

API Endpoints:

GET /api/v1/virtual-hosts/{id}/versions - List versions
GET /api/v1/virtual-hosts/{id}/versions/{version} - Get specific version
POST /api/v1/virtual-hosts/{id}/rollback - Rollback to version
GET /api/v1/virtual-hosts/{id}/diff?from=v1&to=v2 - Compare versions

Observability

OB-001: Structured Logging

Description: Comprehensive logging with structured format.

Log Levels: ERROR, WARN, INFO, DEBUG, TRACE

Log Fields:

{
  "timestamp": "2026-03-02T10:30:00Z",
  "level": "INFO",
  "component": "agent",
  "agent_id": "550e8400-e29b-41d4-a716-446655440000",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "Configuration applied successfully",
  "fields": {
    "config_id": "config-123",
    "version": 42,
    "duration_ms": 150
  }
}

Log Targets:

Master: systemd journal, file, or centralized (ELK/Loki)
Agent: stdout (Docker), file (standalone), or remote

OB-002: Distributed Tracing

Description: OpenTelemetry tracing for request flow visualization.

Traced Operations:

Configuration push (master → agent → nginx)
Health check cycles
Certificate issuance
API requests

Span Attributes:

nxmesh.agent_id
nxmesh.config_id
nxmesh.workspace_id
nxmesh.organization_id

OB-003: Access Log Aggregation

Description: Collect and query nginx access logs from all agents.

Features:

Centralized access log storage
Real-time log streaming
SQL-like query interface
Log retention policies

Access Log Schema:

struct AccessLogEntry {
    id: Uuid,
    agent_id: Uuid,
    timestamp: DateTime,
    
    // Request details
    remote_addr: String,
    method: String,
    uri: String,
    protocol: String,
    host: String,
    
    // Response details
    status: u16,
    body_bytes_sent: u64,
    response_time_ms: f64,
    
    // Additional fields
    user_agent: Option<String>,
    referer: Option<String>,
    request_id: Option<String>,
}

Query API:

# Example query
query {
  accessLogs(
    filter: {
      agentId: "...",
      timeRange: { from: "2026-03-01", to: "2026-03-02" },
      statusCode: { gte: 500 }
    },
    limit: 100
  ) {
    timestamp
    method
    uri
    status
    responseTimeMs
  }
}

Security Features

SF-001: Authentication and Authorization

Description: Multi-method authentication with fine-grained RBAC.

Authentication Methods:

JWT (for API/Web UI)
Password-based login (local user accounts)
OAuth2/OIDC (Google, GitHub, enterprise SSO)
API Keys (for service accounts)
TLS + Shared Secret (for agent communication)
- Server-side TLS (auto-generated self-signed or custom certificates)
- Bootstrap token for initial registration
- Session key with HMAC signing for ongoing requests
- Primary/secondary key rotation

RBAC Model:

struct Role {
    id: Uuid,
    name: String,
    permissions: Vec<Permission>,
}

enum Permission {
    // Organization scope
    OrganizationRead,
    OrganizationWrite,
    OrganizationDelete,
    
    // Workspace scope
    WorkspaceRead,
    WorkspaceWrite,
    WorkspaceDelete,
    
    // Agent scope
    AgentRead,
    AgentWrite,
    AgentReload,
    AgentDelete,
    
    // Config scope
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
    ConfigDelete,
    
    // Certificate scope
    CertificateRead,
    CertificateWrite,
    CertificateDelete,
    
    // User management
    UserRead,
    UserWrite,
    UserDelete,
}
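
A permission check against this model can be as small as the sketch below, which assumes a user holds a set of roles and that permissions are purely additive (no deny rules). Only a subset of the `Permission` variants is repeated here to keep the example short, and `is_allowed` is a hypothetical helper.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permission {
    AgentRead,
    AgentReload,
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
}

pub struct Role {
    pub name: String,
    pub permissions: Vec<Permission>,
}

/// A request is allowed if any of the user's roles grants the permission.
pub fn is_allowed(roles: &[Role], required: Permission) -> bool {
    roles.iter().any(|role| role.permissions.contains(&required))
}
```

Separating `ConfigWrite` from `ConfigDeploy`, as the model does, makes it possible to let a role edit configurations without being able to push them to agents.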

SF-002: Secret Management

Description: Secure storage and distribution of sensitive data.

Secrets:

SSL private keys
API tokens
Database passwords
External service credentials

Security Measures:

Encryption at rest (AES-256-GCM)
Encryption in transit (TLS 1.3)
Automatic secret rotation
Audit logging for secret access

SF-003: Network Security

Description: Network-level security controls.

Features:

IP allowlisting for agent connections
Rate limiting on API endpoints
DDoS protection recommendations
Security headers enforcement (HSTS, CSP, etc.)
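
IP allowlisting for agent connections typically means matching the peer address against a list of CIDR ranges. A sketch using only the standard library, limited to IPv4 for brevity; the `Cidr` type and `ip_allowed` function are hypothetical, and treating a bare address as a /32 is an assumption.

```rust
use std::net::Ipv4Addr;

/// An IPv4 network in CIDR notation, stored as network address and mask.
pub struct Cidr {
    network: u32,
    mask: u32,
}

impl Cidr {
    /// Parse "a.b.c.d/prefix"; a bare address is treated as a /32.
    pub fn parse(text: &str) -> Option<Cidr> {
        let (addr, prefix) = text.split_once('/').unwrap_or((text, "32"));
        let addr: Ipv4Addr = addr.parse().ok()?;
        let prefix: u32 = prefix.parse().ok()?;
        if prefix > 32 {
            return None;
        }
        // Shifting a u32 by 32 would overflow, so /0 is handled separately.
        let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
        Some(Cidr { network: u32::from(addr) & mask, mask })
    }

    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        u32::from(ip) & self.mask == self.network
    }
}

/// Allow the connection if any allowlist entry contains the peer address.
pub fn ip_allowed(ip: Ipv4Addr, allowlist: &[Cidr]) -> bool {
    allowlist.iter().any(|cidr| cidr.contains(ip))
}
```

Under this sketch an empty allowlist rejects every address; whether an empty list should instead mean "no restriction" is a policy decision the spec leaves open.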

Agent Connection Security:

TLS Encryption: Server-side TLS (auto-generated or custom certificates)
- Development: Self-signed certificates auto-generated on first start
- Production: Valid certificates (Let's Encrypt or corporate CA)
Bootstrap Authentication: One-time token for initial registration
Session Authentication: HMAC-signed requests with shared session key
Key Rotation: Primary/secondary key design for seamless rotation
Certificate Pinning: Optional fingerprint verification for additional security
