NxMesh Feature Specification
Table of Contents
- Core Features
- Master Features
- Agent Features
- Configuration Management
- Observability
- Security Features
Core Features
CF-001: Multi-tenancy with Organizations and Workspaces
Description: Support for multiple organizations with isolated workspaces within each organization.
Requirements:
- Organizations are top-level resource containers
- Each organization can have multiple workspaces
- Resources (agents, configs, certificates) are scoped to a workspace
- Cross-workspace visibility is configurable
Data Model:
struct Organization {
id: Uuid,
name: String,
slug: String, // URL-friendly identifier
created_at: DateTime,
settings: OrganizationSettings,
}
struct Workspace {
id: Uuid,
organization_id: Uuid,
name: String,
slug: String,
created_at: DateTime,
}
API Endpoints:
- GET /api/v1/organizations - List organizations
- POST /api/v1/organizations - Create organization
- GET /api/v1/organizations/{id}/workspaces - List workspaces
- POST /api/v1/organizations/{id}/workspaces - Create workspace
CF-002: Agent Registration and Lifecycle Management
Description: Agents must register with the master before receiving configurations.
Registration Flow:
- Administrator generates bootstrap token in Master UI
- Token is provided to agent via environment variable or config file
- Agent establishes TLS connection to master (verifies server certificate)
- Agent sends bootstrap token for registration
- Master validates token and establishes shared secret:
  - Master generates session_key (per-agent) + key_id
  - Session key used for HMAC request signing
  - Primary/secondary key design for rotation
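One way to picture the primary/secondary key design is a two-slot key ring: new requests are signed with the primary key, while the previous key stays valid until the next rotation. The sketch below is only an illustration under that assumption. `SessionKey`, `KeyRing`, `rotate`, and `lookup` are hypothetical names, and the HMAC computation itself (which would come from a crypto crate) is left out.

```rust
/// A per-agent session key as issued by the master.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SessionKey {
    pub key_id: String,
    pub secret: Vec<u8>,
}

/// Two-slot key ring: `primary` signs new requests, `secondary` keeps the
/// previous key valid while a rotation is in progress.
#[derive(Debug)]
pub struct KeyRing {
    primary: SessionKey,
    secondary: Option<SessionKey>,
}

impl KeyRing {
    pub fn new(primary: SessionKey) -> Self {
        Self { primary, secondary: None }
    }

    /// Promote `next` to primary; the old primary becomes the secondary
    /// and any older secondary is discarded.
    pub fn rotate(&mut self, next: SessionKey) {
        let old = std::mem::replace(&mut self.primary, next);
        self.secondary = Some(old);
    }

    /// Find the key a request claims to be signed with, by its key_id.
    pub fn lookup(&self, key_id: &str) -> Option<&SessionKey> {
        if self.primary.key_id == key_id {
            return Some(&self.primary);
        }
        self.secondary.as_ref().filter(|k| k.key_id == key_id)
    }
}
```

With this shape, a request signed just before a rotation still verifies against the secondary slot, and a key that is two rotations old is rejected.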
Agent States:
enum AgentState {
Pending, // Registered but never connected
Online, // Connected and healthy
Offline, // Disconnected
Degraded, // Connected but health checks failing
Maintenance, // Manually placed in maintenance mode
}
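The specification lists the states but not the transitions between them. The following sketch shows one plausible transition function. The `AgentEvent` enum and every transition rule are assumptions made for illustration, not part of the specification.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Pending,
    Online,
    Offline,
    Degraded,
    Maintenance,
}

/// Hypothetical events that could drive state changes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentEvent {
    Connected,
    Disconnected,
    HealthCheckFailed,
    HealthCheckPassed,
    MaintenanceEnabled,
    MaintenanceDisabled,
}

/// Compute the next state for an agent (assumed rules, see comments).
pub fn next_state(current: AgentState, event: AgentEvent) -> AgentState {
    use AgentEvent::*;
    use AgentState::*;
    match (current, event) {
        // Maintenance is manual: only an explicit disable leaves it.
        // Assumption: the agent is treated as Online afterwards.
        (Maintenance, MaintenanceDisabled) => Online,
        (Maintenance, _) => Maintenance,
        (_, MaintenanceEnabled) => Maintenance,
        (_, Connected) => Online,
        // An agent that never connected stays Pending.
        (Pending, Disconnected) => Pending,
        (_, Disconnected) => Offline,
        (Online | Degraded, HealthCheckFailed) => Degraded,
        (Degraded, HealthCheckPassed) => Online,
        // Everything else leaves the state unchanged.
        (state, _) => state,
    }
}
```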
Agent Metadata:
struct Agent {
id: Uuid,
workspace_id: Uuid,
name: String,
hostname: String,
ip_address: String,
version: String,
state: AgentState,
deployment_mode: DeploymentMode, // DockerSidecar, K8sSidecar, Standalone
last_seen_at: DateTime,
capabilities: Vec<String>, // e.g., ["http3", "websocket", "rate_limiting"]
labels: HashMap<String, String>, // e.g., {"env": "prod", "region": "us-east"}
}
API Endpoints:
- POST /api/v1/agents/register - Register new agent
- GET /api/v1/agents - List agents
- GET /api/v1/agents/{id} - Get agent details
- POST /api/v1/agents/{id}/tokens - Generate registration token
- DELETE /api/v1/agents/{id} - Deregister agent
CF-003: Real-time Configuration Distribution
Description: Push configuration changes to agents in real-time with delivery guarantees.
Requirements:
- Config changes propagate to all affected agents within 5 seconds
- Support for targeted updates (specific agents or groups)
- Config versioning with rollback capability
- Delivery confirmation from agents
Configuration Scope:
enum ConfigScope {
Global, // All agents
Workspace, // All agents in workspace
AgentGroup(String), // Agents with specific label selector
Agent(Uuid), // Single agent
}
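To make the scopes concrete, the sketch below decides whether a configuration with a given scope applies to an agent. It makes several assumptions: `Uuid` is replaced by a `u128` stand-in so the example needs no external crate, `AgentInfo` is a hypothetical trimmed-down agent record, and the label selector uses a `key=value,key=value` syntax that the specification does not define.

```rust
use std::collections::HashMap;

/// Stand-in for a real UUID type, to keep the sketch dependency-free.
pub type Uuid = u128;

pub enum ConfigScope {
    Global,
    Workspace,
    AgentGroup(String),
    Agent(Uuid),
}

/// Hypothetical subset of the Agent struct needed for matching.
pub struct AgentInfo {
    pub id: Uuid,
    pub workspace_id: Uuid,
    pub labels: HashMap<String, String>,
}

/// Returns true if a config owned by `config_workspace` with the given
/// scope should be delivered to `agent`.
pub fn scope_matches(scope: &ConfigScope, config_workspace: Uuid, agent: &AgentInfo) -> bool {
    match scope {
        ConfigScope::Global => true,
        ConfigScope::Workspace => agent.workspace_id == config_workspace,
        ConfigScope::AgentGroup(selector) => selector_matches(selector, &agent.labels),
        ConfigScope::Agent(id) => agent.id == *id,
    }
}

/// Assumed selector syntax: comma-separated `key=value` terms, all of which
/// must match. An empty selector matches every agent.
fn selector_matches(selector: &str, labels: &HashMap<String, String>) -> bool {
    selector
        .split(',')
        .map(str::trim)
        .filter(|term| !term.is_empty())
        .all(|term| match term.split_once('=') {
            Some((k, v)) => labels.get(k.trim()).map(String::as_str) == Some(v.trim()),
            None => false,
        })
}
```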
Delivery Guarantees:
- At-least-once delivery
- Automatic retry with exponential backoff
- Config checksum verification
- Offline agents receive updates on reconnection
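The retry delay for exponential backoff can be computed with a small pure function. This is a minimal sketch: the function name, the base and cap parameters, and the absence of jitter are all assumptions, since the specification does not fix any of them.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// capped at `max`. Overflow saturates to `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 1 second base and a 60 second cap, the delays grow as 1s, 2s, 4s, 8s and so on, then stay at 60s. Production implementations often add random jitter so that many agents do not retry in lockstep.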
Master Features
MF-001: RESTful API
Description: Comprehensive REST API for all operations.
Base URL: /api/v1
Resource Endpoints:
| Resource | Endpoints |
|---|---|
| Organizations | GET, POST, PATCH, DELETE /organizations |
| Workspaces | GET, POST, PATCH, DELETE /workspaces |
| Agents | GET, POST, PATCH, DELETE /agents |
| VirtualHosts | GET, POST, PATCH, DELETE /virtual-hosts |
| Upstreams | GET, POST, PATCH, DELETE /upstreams |
| Certificates | GET, POST, DELETE /certificates |
| AccessLogs | GET /access-logs |
| Metrics | GET /metrics |
Response Format:
{
"data": { ... },
"meta": {
"page": 1,
"per_page": 20,
"total": 100
},
"links": {
"self": "/api/v1/agents?page=1",
"next": "/api/v1/agents?page=2",
"prev": null
}
}
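The `links` object in the response above can be derived from the pagination values in `meta`. The helper below is a hypothetical sketch of that derivation. `PageLinks` and `page_links` are illustrative names, and the `?page=N` query format is taken from the example response.

```rust
#[derive(Debug, PartialEq)]
pub struct PageLinks {
    pub self_link: String,
    pub next: Option<String>,
    pub prev: Option<String>,
}

/// Build self/next/prev links for a 1-based `page` of a collection
/// with `total` items and `per_page` items per page.
pub fn page_links(path: &str, page: u64, per_page: u64, total: u64) -> PageLinks {
    // Ceiling division; an empty collection still has one (empty) page.
    let last_page = if per_page == 0 {
        1
    } else {
        ((total + per_page - 1) / per_page).max(1)
    };
    let link = |p: u64| format!("{}?page={}", path, p);
    PageLinks {
        self_link: link(page),
        next: if page < last_page { Some(link(page + 1)) } else { None },
        prev: if page > 1 { Some(link(page - 1)) } else { None },
    }
}
```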
Error Format:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Invalid configuration",
"details": [
{"field": "server_name", "message": "Invalid domain format"}
]
}
}
MF-002: Web-based Admin Console (Embedded)
Description: Modern web UI for managing the entire system. Built with React + Vite and served as static files embedded directly in the master binary.
Pages:
| Page | Features |
|---|---|
| Dashboard | Agent status, recent events, traffic overview |
| Agents | List, detail view, logs, metrics graphs |
| Configurations | Virtual host editor, upstream management |
| Certificates | SSL certificate list, expiration alerts |
| Access Control | Users, roles, permissions management |
| Settings | Organization settings, integrations |
Key UI Features:
- Real-time updates via WebSocket
- Monaco editor for nginx configuration
- Visual topology view (agent connections)
- Dark/light mode support
- Responsive design
MF-003: Configuration Template Engine
Description: Templating system for generating nginx configurations.
Template Variables:
# Example virtual host template
server {
listen {{port}} {{#if ssl}}ssl{{/if}} {{#if http2}}http2{{/if}};
server_name {{server_name}};
{{#if ssl}}
ssl_certificate {{ssl_certificate_path}};
ssl_certificate_key {{ssl_certificate_key_path}};
{{/if}}
location {{location_path}} {
proxy_pass http://{{upstream_name}};
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
{{#each custom_headers}}
add_header {{name}} "{{value}}";
{{/each}}
{{#if rate_limiting}}
limit_req zone={{rate_limit_zone}} burst={{rate_limit_burst}};
{{/if}}
}
}
Built-in Templates:
- default - Standard reverse proxy
- spa - Single Page Application (with fallback to index.html)
- api - API gateway with rate limiting
- static - Static file serving with caching
- websocket - WebSocket proxy with connection upgrades
MF-004: Certificate Management (ACME)
Description: Automatic SSL/TLS certificate provisioning via Let's Encrypt.
Features:
- ACME v2 protocol support
- HTTP-01 and DNS-01 challenges
- Automatic renewal (30 days before expiry)
- Wildcard certificate support (DNS-01)
- Certificate monitoring and alerts
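The 30-day renewal rule reduces to a simple time comparison. The sketch below assumes `std::time::SystemTime` in place of the `DateTime` type used by the entity definitions, and the names `RENEWAL_WINDOW` and `needs_renewal` are illustrative.

```rust
use std::time::{Duration, SystemTime};

/// Renew when 30 days or less remain before expiry.
pub const RENEWAL_WINDOW: Duration = Duration::from_secs(30 * 24 * 60 * 60);

/// Decide whether a certificate should be renewed at time `now`.
pub fn needs_renewal(expires_at: SystemTime, now: SystemTime, auto_renew: bool) -> bool {
    if !auto_renew {
        return false;
    }
    match expires_at.duration_since(now) {
        Ok(remaining) => remaining <= RENEWAL_WINDOW,
        // `expires_at` is already in the past: renew immediately.
        Err(_) => true,
    }
}
```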
Certificate Entity:
struct Certificate {
id: Uuid,
workspace_id: Uuid,
domain: String,
is_wildcard: bool,
provider: CertificateProvider, // LetsEncrypt, Custom
status: CertificateStatus, // Pending, Active, Expired, Error
issued_at: DateTime,
expires_at: DateTime,
auto_renew: bool,
certificate_pem: Option<String>, // Encrypted at rest
private_key_pem: Option<String>, // Encrypted at rest
}
Agent Features
AF-001: Nginx Lifecycle Management
Description: Agent manages nginx process lifecycle based on deployment mode.
Docker Sidecar Mode:
- Shares PID namespace with nginx container (via pid: service:nginx)
- Directly signals nginx process for reload/restart
- Monitors nginx via health checks
Standalone Mode:
- Direct process management (signals to PID from file)
- systemd integration (optional, for service management)
- PID file monitoring
Lifecycle Actions:
- start - Start nginx
- stop - Graceful shutdown
- reload - Hot reload configuration
- restart - Full restart
- test - Validate configuration
AF-002: Configuration Rendering and Application
Description: Agent renders nginx configs from master templates and applies them using atomic symlink swaps for zero-downtime updates.
Config Directory Structure:
/etc/nginx/
├── nginx.conf # Contains: include /etc/nginx/conf.d/current/*.conf
├── conf.d/
│ ├── current -> ./20260302143000/ # Symlink to active deployment
│ ├── 20260302143000/ # Active config (timestamped)
│ │ ├── default.conf
│ │ └── upstream.conf
│ ├── 20260302141500/ # Previous deployment (for rollback)
│ │ ├── default.conf
│ │ └── upstream.conf
│ └── 20260302140000/ # Older deployment (cleanup candidate)
Config Rendering Flow:
- Receive ConfigUpdate from master
- Create new deployment folder: ./conf.d/<timestamp>/
- Render nginx config files into timestamped folder
- Validate new config: nginx -t -c /etc/nginx/conf.d/<timestamp>/nginx.conf
- If validation passes, atomically update symlink: current → <timestamp>/
- Execute graceful nginx reload
- Verify reload success (health check)
- Report status to master
- Cleanup old deployments (keep N recent versions)
Atomic Config Swap:
async fn apply_config(&self, config: ConfigUpdate) -> Result<()> {
    let timestamp = generate_timestamp();
    let deploy_dir = self.conf_d_path.join(&timestamp);
    let symlink_path = self.conf_d_path.join("current");
    // 1. Render config to new timestamped directory
    self.render_config(&config, &deploy_dir).await?;
    // 2. Validate BEFORE switching symlink (point to new folder directly)
    self.validate_config(&deploy_dir).await?;
    // 3. Atomic symlink swap (Unix: symlink + rename)
    let temp_link = self.conf_d_path.join("current.tmp");
    // Remove a stale temp link left by an interrupted run; a "not found" error is ignored
    let _ = tokio::fs::remove_file(&temp_link).await;
    tokio::fs::symlink(&deploy_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?; // Atomically replaces the old symlink
    // 4. Reload nginx (picks up new symlink target)
    self.reload_nginx().await?;
    // 5. Verify and cleanup
    self.verify_health().await?;
    self.cleanup_old_deployments(5).await?; // Keep last 5 versions
    self.report_success(config.id, timestamp).await;
    Ok(())
}
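The cleanup step has to decide which timestamped directories to delete. A minimal sketch of that selection follows. `deployments_to_remove` is a hypothetical helper, and it assumes directory names are fixed-width timestamps as in the directory tree above. It never selects the directory that `current` points to, which matters after a rollback to an older deployment.

```rust
/// Given deployment directory names such as "20260302143000", return the
/// ones that can be deleted while keeping the `keep` most recent.
/// The directory named `current` is never returned.
pub fn deployments_to_remove(mut names: Vec<String>, keep: usize, current: &str) -> Vec<String> {
    // Fixed-width timestamps sort chronologically as plain strings.
    names.sort();
    let cutoff = names.len().saturating_sub(keep);
    names
        .into_iter()
        .take(cutoff)
        .filter(|name| name.as_str() != current)
        .collect()
}
```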
Rollback Strategy:
async fn rollback(&self, target_timestamp: &str) -> Result<()> {
    let target_dir = self.conf_d_path.join(target_timestamp);
    let symlink_path = self.conf_d_path.join("current");
    // Verify target exists
    if !target_dir.exists() {
        return Err(Error::RollbackTargetNotFound);
    }
    // Atomic symlink swap back to previous deployment
    let temp_link = self.conf_d_path.join("current.tmp");
    tokio::fs::symlink(&target_dir, &temp_link).await?;
    tokio::fs::rename(&temp_link, &symlink_path).await?;
    // Reload nginx
    self.reload_nginx().await?;
    Ok(())
}
AF-003: Health Monitoring and Reporting
Description: Continuous health monitoring of nginx and the host system.
Health Checks:
- Nginx Health: HTTP request to nginx health endpoint
- Configuration Health: Verify current config matches expected
- Resource Health: CPU, memory, disk usage
- Connection Health: Active connections, request rate
Health Report Structure:
struct HealthReport {
agent_id: Uuid,
timestamp: DateTime,
nginx_status: NginxStatus,
system_metrics: SystemMetrics,
config_checksum: String,
alerts: Vec<Alert>,
}
struct NginxStatus {
is_running: bool,
pid: Option<u32>,
uptime_seconds: u64,
active_connections: u32,
requests_per_second: f64,
}
struct SystemMetrics {
cpu_percent: f64,
memory_used_mb: u64,
memory_total_mb: u64,
disk_used_gb: u64,
disk_total_gb: u64,
}
Reporting Interval: Configurable (default: 30 seconds)
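As an illustration of how the `alerts` field might be populated from `SystemMetrics`, the sketch below compares each resource against a single usage threshold. The threshold parameter, the `resource_alerts` name, and the use of plain strings in place of the `Alert` type (which the specification does not define) are all assumptions.

```rust
pub struct SystemMetrics {
    pub cpu_percent: f64,
    pub memory_used_mb: u64,
    pub memory_total_mb: u64,
    pub disk_used_gb: u64,
    pub disk_total_gb: u64,
}

/// Usage as a percentage; a zero total is reported as 0%.
fn percent(used: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        used as f64 * 100.0 / total as f64
    }
}

/// Return one message per resource whose usage is at or above
/// `threshold_percent`. Strings stand in for the spec's `Alert` type.
pub fn resource_alerts(m: &SystemMetrics, threshold_percent: f64) -> Vec<String> {
    let checks = vec![
        ("cpu", m.cpu_percent),
        ("memory", percent(m.memory_used_mb, m.memory_total_mb)),
        ("disk", percent(m.disk_used_gb, m.disk_total_gb)),
    ];
    checks
        .into_iter()
        .filter(|(_, value)| *value >= threshold_percent)
        .map(|(name, value)| format!("{} usage at {:.1}%", name, value))
        .collect()
}
```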
AF-004: Metrics Collection and Export
Description: Collect and expose metrics in Prometheus format.
Metrics Endpoint: GET /metrics (on agent)
Built-in Metrics:
# Nginx metrics (parsed from stub_status)
nxmesh_nginx_connections_active{agent_id="..."} 42
nxmesh_nginx_connections_reading{agent_id="..."} 5
nxmesh_nginx_connections_writing{agent_id="..."} 30
nxmesh_nginx_connections_waiting{agent_id="..."} 7
nxmesh_nginx_requests_total{agent_id="..."} 1234567
# Agent metrics
nxmesh_agent_uptime_seconds{agent_id="..."} 86400
nxmesh_agent_master_connection_status{agent_id="..."} 1
nxmesh_agent_config_version{agent_id="...",version="123"} 1
# System metrics
nxmesh_system_cpu_percent{agent_id="..."} 25.5
nxmesh_system_memory_used_bytes{agent_id="..."} 1073741824
nxmesh_system_disk_used_bytes{agent_id="..."} 53687091200
Custom Metrics: Agents can collect custom metrics from nginx access logs
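The nginx metrics above are described as parsed from stub_status. A minimal sketch of that parsing and of rendering the result in the Prometheus text format follows. It assumes the standard four-line stub_status body; `StubStatus`, `parse_stub_status`, and `to_prometheus` are illustrative names.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct StubStatus {
    pub active: u64,
    pub reading: u64,
    pub writing: u64,
    pub waiting: u64,
    pub requests: u64,
}

/// Parse the plain-text body served by nginx's stub_status module.
/// Returns None if the body does not have the expected shape.
pub fn parse_stub_status(body: &str) -> Option<StubStatus> {
    let mut lines = body.lines().map(str::trim).filter(|l| !l.is_empty());
    let active: u64 = lines
        .next()?
        .strip_prefix("Active connections:")?
        .trim()
        .parse()
        .ok()?;
    lines.next()?; // header line: "server accepts handled requests"
    // Third counter on this line is the total request count.
    let requests: u64 = lines.next()?.split_whitespace().nth(2)?.parse().ok()?;
    // Expected tokens: ["Reading:", "6", "Writing:", "179", "Waiting:", "106"]
    let rw: Vec<&str> = lines.next()?.split_whitespace().collect();
    if rw.len() != 6 {
        return None;
    }
    Some(StubStatus {
        active,
        reading: rw[1].parse().ok()?,
        writing: rw[3].parse().ok()?,
        waiting: rw[5].parse().ok()?,
        requests,
    })
}

/// Render the parsed values using the metric names listed above.
pub fn to_prometheus(s: &StubStatus, agent_id: &str) -> String {
    let rows = vec![
        ("connections_active", s.active),
        ("connections_reading", s.reading),
        ("connections_writing", s.writing),
        ("connections_waiting", s.waiting),
        ("requests_total", s.requests),
    ];
    rows.into_iter()
        .map(|(name, v)| format!("nxmesh_nginx_{}{{agent_id=\"{}\"}} {}\n", name, agent_id, v))
        .collect()
}
```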
AF-005: Offline Operation and Recovery
Description: Agent can operate independently when master is unreachable.
Offline Capabilities:
- Continue serving traffic with cached configuration
- Local health monitoring continues
- Metrics are buffered for later transmission
- Automatic reconnection attempts
Recovery Flow:
- Detect disconnection from master
- Enter "offline mode"
- Continue operating with cached config
- Buffer metrics and logs
- Attempt reconnection with exponential backoff
- On reconnection:
  - Sync configuration (compare checksums)
  - Transmit buffered metrics
  - Resume normal operation
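Buffering metrics while offline needs a bound, otherwise a long outage could exhaust memory. The sketch below shows one possible policy, a fixed-capacity queue that drops the oldest entries first. The `MetricBuffer` type and the drop-oldest policy are assumptions; the specification does not say how the buffer is bounded.

```rust
use std::collections::VecDeque;

/// Fixed-capacity buffer for items collected while the master is unreachable.
pub struct MetricBuffer<T> {
    capacity: usize,
    items: VecDeque<T>,
    dropped: u64,
}

impl<T> MetricBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, items: VecDeque::new(), dropped: 0 }
    }

    /// Add an item, evicting the oldest one if the buffer is full.
    pub fn push(&mut self, item: T) {
        if self.capacity == 0 {
            self.dropped += 1;
            return;
        }
        if self.items.len() == self.capacity {
            self.items.pop_front();
            self.dropped += 1;
        }
        self.items.push_back(item);
    }

    /// Take everything out in arrival order, e.g. on reconnection.
    pub fn drain(&mut self) -> Vec<T> {
        self.items.drain(..).collect()
    }

    /// Number of items lost to eviction so far.
    pub fn dropped(&self) -> u64 {
        self.dropped
    }
}
```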
Configuration Management
CM-001: Virtual Host Configuration
Description: Define nginx server blocks (virtual hosts) via API/UI.
VirtualHost Entity:
struct VirtualHost {
id: Uuid,
workspace_id: Uuid,
name: String, // Human-readable name
server_name: String, // Domain name(s), comma-separated
listen_port: u16, // Usually 80 or 443
ssl_enabled: bool,
ssl_certificate_id: Option<Uuid>,
// Routing configuration
locations: Vec<Location>,
// Advanced settings
http2_enabled: bool,
http3_enabled: bool,
gzip_enabled: bool,
rate_limiting: Option<RateLimitConfig>,
// Target agents
target_agents: AgentSelector,
}
struct Location {
path: String, // e.g., "/api" or "~ \.php$"
proxy_pass: Option<String>, // e.g., "http://backend"
upstream_id: Option<Uuid>,
root: Option<String>, // For static files
index: Option<String>, // e.g., "index.html"
custom_headers: Vec<Header>,
rewrite_rules: Vec<RewriteRule>,
}
Validation Rules:
- server_name must be valid domain(s)
- listen_port must be 1-65535
- SSL certificate must exist if ssl_enabled is true
- At least one location must be defined
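The validation rules can be sketched as a function that returns (field, message) pairs, mirroring the `details` array of the API error format. `VirtualHostInput` is a hypothetical, flattened stand-in for `VirtualHost`, and the domain check is deliberately simplified (ASCII labels plus an optional leading `*.`), not a full hostname validator.

```rust
/// Hypothetical flattened view of the fields the rules look at.
pub struct VirtualHostInput {
    pub server_name: String, // comma-separated domain names
    pub listen_port: u16,
    pub ssl_enabled: bool,
    pub has_ssl_certificate: bool,
    pub location_count: usize,
}

/// Simplified domain check: dot-separated labels of letters, digits and
/// hyphens, with an optional leading wildcard label.
fn is_valid_domain(name: &str) -> bool {
    let name = name.strip_prefix("*.").unwrap_or(name);
    !name.is_empty()
        && name.len() <= 253
        && name.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}

/// Apply the four rules; an empty result means the input is valid.
pub fn validate(vh: &VirtualHostInput) -> Vec<(&'static str, &'static str)> {
    let mut errors = Vec::new();
    if vh.server_name.split(',').any(|n| !is_valid_domain(n.trim())) {
        errors.push(("server_name", "Invalid domain format"));
    }
    // u16 already caps the port at 65535, so only 0 needs rejecting.
    if vh.listen_port == 0 {
        errors.push(("listen_port", "Port must be between 1 and 65535"));
    }
    if vh.ssl_enabled && !vh.has_ssl_certificate {
        errors.push(("ssl_certificate_id", "SSL certificate is required when SSL is enabled"));
    }
    if vh.location_count == 0 {
        errors.push(("locations", "At least one location must be defined"));
    }
    errors
}
```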
CM-002: Upstream Configuration
Description: Define backend server pools for load balancing.
Upstream Entity:
struct Upstream {
id: Uuid,
workspace_id: Uuid,
name: String, // Used as upstream identifier
// Load balancing algorithm
algorithm: LoadBalanceAlgorithm, // RoundRobin, LeastConnections, IPHash, etc.
// Backend servers
servers: Vec<UpstreamServer>,
// Health check configuration
health_check: Option<HealthCheckConfig>,
// Connection settings
keepalive_connections: Option<u32>,
keepalive_timeout: Option<u32>,
}
struct UpstreamServer {
address: String, // IP:port or hostname:port
weight: u32, // Default: 1
backup: bool, // Backup server
down: bool, // Temporarily down
max_fails: u32, // Default: 1
fail_timeout: u32, // Seconds, default: 10
}
enum LoadBalanceAlgorithm {
RoundRobin,
LeastConnections,
IPHash,
WeightedRoundRobin,
}
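To show how these entities could map onto nginx configuration, here is a hedged sketch that renders an `upstream` block. The `render_upstream` function is hypothetical. It assumes `RoundRobin` and `WeightedRoundRobin` both rely on nginx's default weighted round-robin (so they emit no directive), and it does not check nginx's own restrictions on combining server parameters with particular balancing methods.

```rust
pub enum LoadBalanceAlgorithm {
    RoundRobin,
    LeastConnections,
    IPHash,
    WeightedRoundRobin,
}

pub struct UpstreamServer {
    pub address: String,
    pub weight: u32,
    pub backup: bool,
    pub down: bool,
    pub max_fails: u32,
    pub fail_timeout: u32, // seconds
}

/// Render an nginx `upstream` block as text.
pub fn render_upstream(
    name: &str,
    algorithm: &LoadBalanceAlgorithm,
    servers: &[UpstreamServer],
    keepalive: Option<u32>,
) -> String {
    let mut out = format!("upstream {} {{\n", name);
    match algorithm {
        LoadBalanceAlgorithm::LeastConnections => out.push_str("    least_conn;\n"),
        LoadBalanceAlgorithm::IPHash => out.push_str("    ip_hash;\n"),
        // nginx defaults to weighted round-robin; weights are set per server.
        LoadBalanceAlgorithm::RoundRobin | LoadBalanceAlgorithm::WeightedRoundRobin => {}
    }
    for s in servers {
        let mut line = format!(
            "    server {} weight={} max_fails={} fail_timeout={}s",
            s.address, s.weight, s.max_fails, s.fail_timeout
        );
        if s.backup {
            line.push_str(" backup");
        }
        if s.down {
            line.push_str(" down");
        }
        line.push_str(";\n");
        out.push_str(&line);
    }
    if let Some(n) = keepalive {
        out.push_str(&format!("    keepalive {};\n", n));
    }
    out.push_str("}\n");
    out
}
```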
CM-003: Configuration Versioning
Description: Track all configuration changes with full history.
Versioning Features:
- Every change creates a new version
- Versions are immutable
- Rollback to any previous version
- Diff between versions
- Audit log of who changed what
Version Entity:
struct ConfigVersion {
id: Uuid,
resource_type: String, // "virtual_host", "upstream", etc.
resource_id: Uuid,
version_number: u64, // Auto-incrementing
data: Json, // Full configuration snapshot
checksum: String, // SHA-256 of data
created_by: Uuid, // User ID
created_at: DateTime,
change_summary: String, // Human-readable description
}
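Immutable versions combined with rollback suggest an append-only history in which a rollback records the old snapshot as a new version rather than rewriting anything. The sketch below illustrates that reading. It is an assumption about the intended semantics, with a trimmed-down `ConfigVersion` (plain `String` data, no checksum or author) and a hypothetical `VersionHistory` type.

```rust
/// Trimmed-down version record for this sketch.
pub struct ConfigVersion {
    pub version_number: u64,
    pub data: String,
    pub change_summary: String,
}

/// Append-only history for a single resource.
pub struct VersionHistory {
    versions: Vec<ConfigVersion>,
}

impl VersionHistory {
    pub fn new() -> Self {
        Self { versions: Vec::new() }
    }

    /// Store a new snapshot and return its auto-incremented version number.
    pub fn record(&mut self, data: String, change_summary: String) -> u64 {
        let version_number = self.versions.len() as u64 + 1;
        self.versions.push(ConfigVersion { version_number, data, change_summary });
        version_number
    }

    /// Look up a version by its 1-based number.
    pub fn get(&self, version_number: u64) -> Option<&ConfigVersion> {
        let index = version_number.checked_sub(1)?;
        self.versions.get(index as usize)
    }

    pub fn latest(&self) -> Option<&ConfigVersion> {
        self.versions.last()
    }

    /// Roll back by recording the target's data as a new version, so that
    /// existing versions are never modified.
    pub fn rollback(&mut self, target: u64) -> Option<u64> {
        let data = self.get(target)?.data.clone();
        Some(self.record(data, format!("Rollback to version {}", target)))
    }
}
```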
API Endpoints:
- GET /api/v1/virtual-hosts/{id}/versions - List versions
- GET /api/v1/virtual-hosts/{id}/versions/{version} - Get specific version
- POST /api/v1/virtual-hosts/{id}/rollback - Rollback to version
- GET /api/v1/virtual-hosts/{id}/diff?from=v1&to=v2 - Compare versions
Observability
OB-001: Structured Logging
Description: Comprehensive logging with structured format.
Log Levels: ERROR, WARN, INFO, DEBUG, TRACE
Log Fields:
{
"timestamp": "2026-03-02T10:30:00Z",
"level": "INFO",
"component": "agent",
"agent_id": "550e8400-e29b-41d4-a716-446655440000",
"trace_id": "abc123",
"span_id": "def456",
"message": "Configuration applied successfully",
"fields": {
"config_id": "config-123",
"version": 42,
"duration_ms": 150
}
}
Log Targets:
- Master: systemd journal, file, or centralized (ELK/Loki)
- Agent: stdout (Docker), file (standalone), or remote
OB-002: Distributed Tracing
Description: OpenTelemetry tracing for request flow visualization.
Traced Operations:
- Configuration push (master → agent → nginx)
- Health check cycles
- Certificate issuance
- API requests
Span Attributes:
- nxmesh.agent_id
- nxmesh.config_id
- nxmesh.workspace_id
- nxmesh.organization_id
OB-003: Access Log Aggregation
Description: Collect and query nginx access logs from all agents.
Features:
- Centralized access log storage
- Real-time log streaming
- SQL-like query interface
- Log retention policies
Access Log Schema:
struct AccessLogEntry {
id: Uuid,
agent_id: Uuid,
timestamp: DateTime,
// Request details
remote_addr: String,
method: String,
uri: String,
protocol: String,
host: String,
// Response details
status: u16,
body_bytes_sent: u64,
response_time_ms: f64,
// Additional fields
user_agent: Option<String>,
referer: Option<String>,
request_id: Option<String>,
}
Query API:
# Example query
query {
accessLogs(
filter: {
agentId: "...",
timeRange: { from: "2026-03-01", to: "2026-03-02" },
statusCode: { gte: 500 }
},
limit: 100
) {
timestamp
method
uri
status
responseTimeMs
}
}
Security Features
SF-001: Authentication and Authorization
Description: Multi-method authentication with fine-grained RBAC.
Authentication Methods:
- JWT (for API/Web UI)
- Password-based login (local user accounts)
- OAuth2/OIDC (Google, GitHub, enterprise SSO)
- API Keys (for service accounts)
- TLS + Shared Secret (for agent communication)
  - Server-side TLS (auto-generated self-signed or custom certificates)
  - Bootstrap token for initial registration
  - Session key with HMAC signing for ongoing requests
  - Primary/secondary key rotation
RBAC Model:
struct Role {
id: Uuid,
name: String,
permissions: Vec<Permission>,
}
enum Permission {
// Organization scope
OrganizationRead,
OrganizationWrite,
OrganizationDelete,
// Workspace scope
WorkspaceRead,
WorkspaceWrite,
WorkspaceDelete,
// Agent scope
AgentRead,
AgentWrite,
AgentReload,
AgentDelete,
// Config scope
ConfigRead,
ConfigWrite,
ConfigDeploy,
ConfigDelete,
// Certificate scope
CertificateRead,
CertificateWrite,
CertificateDelete,
// User management
UserRead,
UserWrite,
UserDelete,
}
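With this model, an authorization check reduces to asking whether any of a user's roles carries the required permission. The sketch below uses a small subset of the `Permission` variants and a hypothetical `role_allows` helper. How roles are attached to users and scoped to workspaces is not covered here.

```rust
/// Subset of the Permission enum, enough for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Permission {
    AgentRead,
    AgentReload,
    ConfigRead,
    ConfigWrite,
    ConfigDeploy,
}

pub struct Role {
    pub name: String,
    pub permissions: Vec<Permission>,
}

/// True if at least one of the roles grants `required`.
pub fn role_allows(roles: &[Role], required: Permission) -> bool {
    roles.iter().any(|role| role.permissions.contains(&required))
}
```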
SF-002: Secret Management
Description: Secure storage and distribution of sensitive data.
Secrets:
- SSL private keys
- API tokens
- Database passwords
- External service credentials
Security Measures:
- Encryption at rest (AES-256-GCM)
- Encryption in transit (TLS 1.3)
- Automatic secret rotation
- Audit logging for secret access
SF-003: Network Security
Description: Network-level security controls.
Features:
- IP allowlisting for agent connections
- Rate limiting on API endpoints
- DDoS protection recommendations
- Security headers enforcement (HSTS, CSP, etc.)
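IP allowlisting for agent connections can be sketched as a CIDR match using only the standard library. The example below handles IPv4 only and uses hypothetical names (`Cidr`, `ip_is_allowed`). The specification does not state the allowlist format, so CIDR notation is an assumption.

```rust
use std::net::Ipv4Addr;

/// An IPv4 network in CIDR notation, stored as network address and mask.
pub struct Cidr {
    network: u32,
    mask: u32,
}

impl Cidr {
    /// Parse strings such as "10.0.0.0/8"; returns None on malformed input.
    pub fn parse(s: &str) -> Option<Cidr> {
        let (addr, len) = s.split_once('/')?;
        let addr: Ipv4Addr = addr.parse().ok()?;
        let len: u32 = len.parse().ok()?;
        if len > 32 {
            return None;
        }
        // A /0 prefix needs special handling: shifting a u32 by 32 overflows.
        let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
        Some(Cidr { network: u32::from(addr) & mask, mask })
    }

    pub fn contains(&self, ip: Ipv4Addr) -> bool {
        (u32::from(ip) & self.mask) == self.network
    }
}

/// True if `ip` falls inside any network on the allowlist.
pub fn ip_is_allowed(allowlist: &[Cidr], ip: Ipv4Addr) -> bool {
    allowlist.iter().any(|cidr| cidr.contains(ip))
}
```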
Agent Connection Security:
- TLS Encryption: Server-side TLS (auto-generated or custom certificates)
  - Development: Self-signed certificates auto-generated on first start
  - Production: Valid certificates (Let's Encrypt or corporate CA)
- Bootstrap Authentication: One-time token for initial registration
- Session Authentication: HMAC-signed requests with shared session key
- Key Rotation: Primary/secondary key design for seamless rotation
- Certificate Pinning: Optional fingerprint verification for additional security