# Hexagonal Architecture

ztick uses **hexagonal architecture** (also called "ports and adapters") to maintain a clean separation of concerns and testability.

## Layer Structure

The codebase is organized into four layers with a strict dependency direction: a layer may depend only on layers closer to the Domain (inward only).

```
┌─────────────────────────────────────────────────────┐
│ Interfaces (CLI, Config)                            │
├─────────────────────────────────────────────────────┤
│ Infrastructure (Adapters: TCP, Shell, Persistence,  │
│                 Telemetry)                          │
├─────────────────────────────────────────────────────┤
│ Application (Scheduler, Storage, Query Handler)     │
├─────────────────────────────────────────────────────┤
│ Domain (Job, Rule, Runner, Execution)               │
└─────────────────────────────────────────────────────┘
     ↓ Dependencies flow inward only (toward Domain)
```

### Layer 1: Domain (`src/domain/`)

**Purpose**: Pure data types and business logic with zero external dependencies.

**Exports**:
- `Job` — Scheduled action with status
- `JobStatus` — `planned`, `triggered`, `executed`, `failed`
- `Rule` — Pattern-based job-to-runner mapping
- `Runner` — Tagged union of `shell`, `direct`, `http`, `awf`, `amqp`, `redis`
- `Instruction` — `set`, `remove`, `query` operations
- `Request`/`Response` — Query protocol

**Key property**: No imports from outer layers. Tests can run standalone.

**Example**: A job type carrying its identifier, execution time, and status
```zig
pub const Job = struct {
    identifier: []const u8,
    execution: i64,  // nanosecond timestamp
    status: JobStatus,
};
```
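
The `Runner` tagged union listed above can be pictured with the following minimal sketch. Only two of the six variants are shown, and the payload fields (`command`, `url`) as well as the `kind` helper are hypothetical; the real definitions live in `src/domain/`.

```zig
const std = @import("std");

// Sketch only: two of the six variants, with hypothetical payload fields.
pub const Runner = union(enum) {
    shell: struct { command: []const u8 },
    http: struct { url: []const u8 },

    // Hypothetical helper: returns a short label for the active variant.
    pub fn kind(self: Runner) []const u8 {
        return switch (self) {
            .shell => "shell",
            .http => "http",
        };
    }
};

test "runner variant is selected by its tag" {
    const runner = Runner{ .shell = .{ .command = "echo hello" } };
    try std.testing.expectEqualStrings("shell", runner.kind());
}
```

Because the `switch` must be exhaustive, adding a variant forces every dispatch site to handle it, and none of this needs an import from an outer layer.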

### Layer 2: Application (`src/application/`)

**Purpose**: Core scheduler logic—storage, pattern matching, query handling.

**Exports**:
- `Scheduler` — Main orchestrator
- `JobStorage` — In-memory HashMap + priority queue for efficient job insertion (O(log n))
- `RuleStorage` — Rule persistence and pattern matching
- `QueryHandler` — Instruction → response conversion
- `ExecutionClient` — Tracks pending job executions

**Key property**: Depends only on Domain. No I/O or side effects.

**Performance note**: `JobStorage` uses `std.PriorityQueue` ordered by execution time for sub-linear insertion and sorted retrieval ([F021](../../.specify/implementation/F021/)). This ensures scheduling throughput scales to thousands of jobs without linear scan overhead.

**Example**: Scheduler tick loop
```zig
pub fn tick(self: *Scheduler, now: i64) !void {
    const to_execute = try self.job_storage.get_to_execute(now);
    for (to_execute) |job| {
        const rule = self.rule_storage.find_rule_for(job.identifier);
        try self.execution_client.trigger(job, rule);
    }
}
```

### Layer 3: Infrastructure (`src/infrastructure/`)

**Purpose**: Adapters that connect the application to external systems.

**Exports**:
- `TcpServer` — Listens for TCP protocol connections (thread per connection)
- `HttpServer` — RESTful JSON API server (thread per connection, mirrors TCP pattern)
- `ShellRunner` — Executes shell commands via `std.process`
- `Encoder`/`Logfile` — Binary persistence (read/write jobs and rules)
- `Parser` — Line protocol parsing
- `Telemetry` — OpenTelemetry SDK initialization and OTLP export ([ADR-0004](../ADR/0004-opentelemetry-sdk-dependency.md))
- `Channel` — Thread-safe bounded FIFO with `drain()` for single-lock batch consumption and optional wake notification ([F021](../../.specify/implementation/F021/))
- `Clock` — Event-driven tick scheduling with condition-variable wakeup and framerate enforcement ([F021](../../.specify/implementation/F021/))

**Key property**: Depends on Domain and Application. Handles all I/O.

**Performance note**: The database thread uses `Channel.drain()` to consume incoming requests in a single lock/unlock cycle, reducing contention at high concurrency. Event-driven `Clock` wakes immediately on incoming requests rather than sleeping for fixed intervals, reducing single-worker latency to sub-millisecond. Both TCP and HTTP servers spawn a detached thread per connection with atomic counter tracking, enabling concurrent request handling without blocking.

**Example**: TCP adapter accepts connections and routes commands
```zig
pub const TcpServer = struct {
    pub fn handle_connection(
        self: *TcpServer,
        scheduler: *application.Scheduler,
        socket: std.net.Stream,
    ) !void {
        var parser = Parser.init(socket);
        while (try parser.next_instruction()) |instr| {
            const response = try scheduler.handle_query(instr);
            // writeAll keeps writing until the whole response has been sent.
            try socket.writeAll(response);
        }
    }
};
```

**Concurrency Pattern** ([F022](../../.specify/implementation/F022/)): Both TcpServer and HttpServer use an identical detached-thread pattern for handling concurrent connections:

1. **Accept loop** increments an atomic counter and spawns a detached thread per accepted connection
2. **Worker thread** processes the request and decrements the counter on exit (via `defer`)
3. **Graceful shutdown** via `join_all()` polls the counter until it reaches zero or a 5-second timeout elapses

Connection tracking itself takes no lock: each client gets its own thread, and the atomic counter is enough to coordinate shutdown across threads. Shared state such as `ResponseRouter` and `Channel` remains mutex-guarded, so workers contend only when they touch that state.

Key benefits:
- **No head-of-line blocking** — slow clients don't block fast clients
- **Simple reasoning** — one thread per connection makes debugging straightforward
- **Proven pattern** — TCP server established this pattern; HTTP replicates it exactly
- **Graceful shutdown** — `join_all()` waits up to 5 seconds for in-flight requests to drain before exit
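
The three steps of the concurrency pattern can be sketched as follows. This is an illustration under assumptions, not the actual server code: `ConnectionTracker`, `spawn_worker`, and the empty worker body are hypothetical, and the sketch assumes a recent Zig standard library that provides `std.atomic.Value` and `std.Thread.sleep`.

```zig
const std = @import("std");

// Hypothetical stand-in for the connection tracking used by both servers.
pub const ConnectionTracker = struct {
    active_connections: std.atomic.Value(usize) = std.atomic.Value(usize).init(0),

    // Accept-loop side: count the connection, then hand it to a detached worker.
    pub fn spawn_worker(self: *ConnectionTracker) !void {
        _ = self.active_connections.fetchAdd(1, .acq_rel);
        // Undo the increment if the thread could not be spawned.
        errdefer _ = self.active_connections.fetchSub(1, .acq_rel);
        const thread = try std.Thread.spawn(.{}, worker, .{self});
        thread.detach();
    }

    fn worker(self: *ConnectionTracker) void {
        // Decrement on every exit path, including early returns.
        defer _ = self.active_connections.fetchSub(1, .acq_rel);
        // ... read the request, forward it, write the response ...
    }

    // Polls until all workers have exited or the timeout elapses.
    // Returns true when the counter reached zero in time.
    pub fn join_all(self: *ConnectionTracker, timeout_ns: u64) bool {
        var timer = std.time.Timer.start() catch return false;
        while (self.active_connections.load(.acquire) != 0) {
            if (timer.read() >= timeout_ns) return false;
            std.Thread.sleep(std.time.ns_per_ms);
        }
        return true;
    }
};
```

A caller would invoke `spawn_worker()` once per accepted connection and `join_all(5 * std.time.ns_per_s)` during shutdown. Returning a `bool` lets the caller log when the timeout expired with workers still running.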

### Layer 4: Interfaces (`src/interfaces/`)

**Purpose**: Entry point—command-line parsing, configuration loading, component wiring.

**Exports**:
- `main()` — Parses args, loads config, spawns threads
- `Config` — TOML configuration
- `Cli` — Argument parsing

**Key property**: Depends on all layers. Orchestrates the entire system.

**Example**: Main function wires up all components
```zig
pub fn main() !void {
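    // `args` and `allocator` are assumed to be set up earlier in main (elided).
    const config = try load_config(args.config_path);
    var scheduler = try application.Scheduler.init(allocator);
    var tcp_server = try infrastructure.TcpServer.bind(config.controller.listen);

    // Adapters take a pointer to the scheduler (see handle_connection).
    try tcp_server.listen(&scheduler);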
}
```

## Dependency Inversion

The hexagonal pattern uses **dependency inversion** to keep dependencies flowing inward:

### Without Hexagonal (Tightly Coupled)
```
main.zig
  ├─ TcpServer
  │  ├─ Scheduler
  │  │  ├─ Job
  │  │  └─ Rule
  │  └─ Encoder (I/O)
  └─ Encoder (I/O)

Problem: Application depends on I/O; hard to test
```

### With Hexagonal (Inverted)
```
main.zig (orchestrates)
  ├─ TcpServer (adapter) → calls
  │  └─ Scheduler (application) → uses
  │     └─ Job, Rule (domain)

TcpServer is separate; Scheduler is testable without I/O
```
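
One common way to express the inversion in Zig is a small "port" struct made of a context pointer and a function pointer. The sketch below is illustrative only: `ExecutionPort`, `run_due_job`, and `CountingAdapter` are hypothetical names, not ztick's actual API.

```zig
const std = @import("std");

// Port: declared next to the application logic (hypothetical shape).
pub const ExecutionPort = struct {
    context: *anyopaque,
    trigger_fn: *const fn (context: *anyopaque, identifier: []const u8) void,

    pub fn trigger(self: ExecutionPort, identifier: []const u8) void {
        self.trigger_fn(self.context, identifier);
    }
};

// Application code sees only the port, never a concrete adapter.
pub fn run_due_job(port: ExecutionPort, identifier: []const u8) void {
    port.trigger(identifier);
}

// Adapter: would live in the infrastructure layer. This one only counts
// calls, which is also what a test double could look like.
pub const CountingAdapter = struct {
    calls: usize = 0,

    pub fn port(self: *CountingAdapter) ExecutionPort {
        return .{ .context = self, .trigger_fn = trigger };
    }

    fn trigger(context: *anyopaque, identifier: []const u8) void {
        _ = identifier;
        const self: *CountingAdapter = @ptrCast(@alignCast(context));
        self.calls += 1;
    }
};

test "application logic runs against a test double" {
    var adapter = CountingAdapter{};
    run_due_job(adapter.port(), "job.1");
    try std.testing.expectEqual(@as(usize, 1), adapter.calls);
}
```

The application function never names the adapter type, so the dependency points from infrastructure toward the application, and the test runs without any I/O.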

## Testing Strategy

Each layer is tested independently:

### Domain Tests
- Pure data structures
- Status transitions
- Pattern matching logic
- **No I/O or allocation tracking needed**

Example: `src/domain/job.zig` includes inline tests

```zig
test "job lifecycle" {
    var job = Job{ .identifier = "test", .execution = 0, .status = .planned };
    job.status = .triggered;
    try std.testing.expectEqual(JobStatus.triggered, job.status);
}
```

### Application Tests
- Scheduler behavior
- Storage operations
- Rule resolution
- **No actual TCP or file I/O**

Example: `src/application/scheduler.zig` tests

```zig
test "scheduler triggers matching jobs" {
    var scheduler = try Scheduler.init(allocator);
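    // handle_query returns a response, which this test does not inspect.
    _ = try scheduler.handle_query(Request{ .instruction = .{ .set = ... } });
    try scheduler.tick(1000);
    // `job` stands for the stored job, looked up after the tick (lookup elided).
    try std.testing.expect(job.status == .triggered);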
}
```

### Infrastructure Tests
- Parsing (protocol, TOML)
- Encoding/decoding (persistence)
- Channel correctness
- **Mock I/O where possible; real I/O in integration tests**

Example: `src/infrastructure/protocol/parser.zig` tests

```zig
test "parse protocol line" {
    var parser = Parser.init("SET job.1 1234567890");
    // next_instruction returns an optional; unwrap it before inspecting.
    const instr = (try parser.next_instruction()).?;
    try std.testing.expectEqual(InstructionType.set, instr.type);
}
```

### Functional Tests
- End-to-end behavior
- Component interaction
- Full tick cycle

Example: `src/functional_tests.zig`

```zig
test "scheduler processes job from query to executed" {
    var scheduler = try Scheduler.init(allocator);
    // The response is not inspected here, so it is discarded explicitly.
    _ = try scheduler.handle_query(Request{ .instruction = .{ .set = ... } });
    try scheduler.tick(1000);
    // Verify job is executed
}
```

## Adding a New Feature

1. **Define domain types** → `src/domain/new_concept.zig`
   - No external dependencies
   - Include unit tests

2. **Implement application logic** → `src/application/handler.zig`
   - Uses domain types
   - Tested without I/O

3. **Add infrastructure adapter** → `src/infrastructure/adapter.zig`
   - Implements the interface
   - Handles side effects

4. **Wire in interfaces** → `src/main.zig`
   - Compose the feature into the system
   - Update CLI/config as needed

5. **Add functional test** → `src/functional_tests.zig`
   - Verify end-to-end behavior

## Performance Optimizations

### Database Thread Throughput ([F021](../../.specify/implementation/F021/))

The database thread processes incoming job requests and triggers scheduled jobs. Three optimizations keep it from becoming the throughput bottleneck at scale:

**1. Priority Queue Storage (O(log n) insertion)**
- `JobStorage.to_execute` uses `std.PriorityQueue` ordered by execution time instead of sorted array
- Insertion: O(log n) vs O(n) linear scans + array shifts
- Supports thousands of scheduled jobs without throughput degradation
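
A minimal sketch of a queue ordered by execution time, using `std.PriorityQueue` from the standard library. The `Entry` type and the `by_execution` comparator are hypothetical stand-ins for whatever `JobStorage` actually stores, and the `init(allocator, context)` form assumes a Zig release where `std.PriorityQueue` is the managed variant.

```zig
const std = @import("std");

// Hypothetical queue entry; the real JobStorage layout is not shown here.
const Entry = struct {
    identifier: []const u8,
    execution: i64, // nanosecond timestamp
};

// The earliest execution time has the highest priority.
fn by_execution(context: void, a: Entry, b: Entry) std.math.Order {
    _ = context;
    return std.math.order(a.execution, b.execution);
}

const ExecutionQueue = std.PriorityQueue(Entry, void, by_execution);

test "due jobs come out earliest first" {
    var queue = ExecutionQueue.init(std.testing.allocator, {});
    defer queue.deinit();

    // Insertion order does not matter; each add is O(log n).
    try queue.add(.{ .identifier = "late", .execution = 300 });
    try queue.add(.{ .identifier = "early", .execution = 100 });
    try queue.add(.{ .identifier = "middle", .execution = 200 });

    // Pop everything that is due at `now`, stopping at the first future job.
    const now: i64 = 250;
    var due: usize = 0;
    while (queue.peek()) |next| {
        if (next.execution > now) break;
        _ = queue.remove();
        due += 1;
    }
    try std.testing.expectEqual(@as(usize, 2), due);
    try std.testing.expectEqualStrings("late", queue.peek().?.identifier);
}
```

Peeking before removing means a tick touches only the jobs that are actually due, instead of scanning the whole collection.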

**2. Batch Request Drain (single lock/unlock)**
- `Channel.drain()` copies all buffered requests in one critical section
- Reduces lock acquisitions from 500+ per second (one per individual `try_receive`) to roughly one per tick
- Multi-worker throughput scales closer to linearly
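
The batch drain can be illustrated with a small bounded FIFO. This is a sketch under assumptions: `BoundedChannel`, `try_send`, and the fixed-size array storage are hypothetical, and the real `Channel` also supports wake notification, which is left out here.

```zig
const std = @import("std");

// Minimal bounded FIFO sketch; the type name and fields are hypothetical.
pub fn BoundedChannel(comptime T: type, comptime capacity: usize) type {
    return struct {
        const Self = @This();

        mutex: std.Thread.Mutex = .{},
        items: [capacity]T = undefined,
        len: usize = 0,

        // Appends one item; returns false when the channel is full.
        pub fn try_send(self: *Self, item: T) bool {
            self.mutex.lock();
            defer self.mutex.unlock();
            if (self.len == capacity) return false;
            self.items[self.len] = item;
            self.len += 1;
            return true;
        }

        // Copies every buffered item into `out` under a single lock and
        // returns how many were copied, preserving FIFO order.
        pub fn drain(self: *Self, out: *[capacity]T) usize {
            self.mutex.lock();
            defer self.mutex.unlock();
            const count = self.len;
            @memcpy(out[0..count], self.items[0..count]);
            self.len = 0;
            return count;
        }
    };
}
```

The consumer calls `drain()` once per tick and then processes the returned batch without holding the lock, so producers are blocked only for the duration of the copy.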

**3. Event-Driven Tick Scheduling (sub-millisecond latency)**
- `Clock` uses condition variable `timedWait()` instead of unconditional `Thread.sleep()`
- Wakes immediately when requests arrive; sleeps only when idle
- Single-worker p50 latency drops from 2.24ms to <1ms
- Framerate acts as a tick cap to prevent busy-spinning under sustained load
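
The wake-on-request behavior can be sketched with `std.Thread.Condition.timedWait`. The `Waker` type and its `notify`/`wait` methods are hypothetical, and the framerate cap described above is not modeled; the sketch only shows waiting for either a notification or the end of the tick interval.

```zig
const std = @import("std");

// Hypothetical wake-up primitive in the spirit of `Clock`.
pub const Waker = struct {
    mutex: std.Thread.Mutex = .{},
    condition: std.Thread.Condition = .{},
    pending: bool = false,

    // Producer side: flag new work and wake the waiting thread.
    pub fn notify(self: *Waker) void {
        self.mutex.lock();
        defer self.mutex.unlock();
        self.pending = true;
        self.condition.signal();
    }

    // Consumer side: returns true if woken by `notify`, false if the
    // interval elapsed with no work.
    pub fn wait(self: *Waker, interval_ns: u64) !bool {
        self.mutex.lock();
        defer self.mutex.unlock();
        var timer = try std.time.Timer.start();
        // Loop to tolerate spurious wakeups; re-check the flag each time.
        while (!self.pending) {
            const elapsed = timer.read();
            if (elapsed >= interval_ns) return false;
            // A timeout is not an error here; the loop re-evaluates state.
            self.condition.timedWait(&self.mutex, interval_ns - elapsed) catch {};
        }
        self.pending = false;
        return true;
    }
};
```

A tick loop built on this would call `wait(interval_ns)` instead of sleeping unconditionally, so a request arriving mid-interval is handled right away while an idle system still ticks at the configured rate.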

**Benchmark targets** (8 TCP workers):
- Throughput: >3000 msg/s (6x improvement)
- p50 latency: <5ms (68% reduction)
- p99 latency: <15ms (53% reduction)

See [Building the Project](building.md#performance-profiling) for benchmark instructions.

### HTTP Server Concurrency ([F022](../../.specify/implementation/F022/))

The HTTP server spawns a dedicated thread per accepted connection, mirroring the TCP server's detached-thread pattern:

- **Thread per connection** — Each HTTP request is handled in its own thread, eliminating head-of-line blocking in the accept loop
- **Atomic counter tracking** — An `active_connections` atomic counter tracks live worker threads for graceful shutdown
- **Graceful shutdown** — `join_all()` polls the counter with a 5-second timeout, giving in-flight requests a bounded window to complete before process exit
- **Shared state safety** — `ResponseRouter` and `Channel` are mutex-guarded; `next_client_id` uses atomic `fetchAdd`

**Benchmark targets** (8 HTTP workers):
- Throughput: >2000 msg/s (4x improvement over sequential baseline)
- p50 latency: <5ms (vs ~56ms under 16 concurrent workers previously)

## Key Principles

1. **Domain is pure** — no I/O, no dependencies
2. **Application is testable** — depends only on domain
3. **Infrastructure is flexible** — adapters are swappable
4. **Interfaces are thin** — just wiring and config
5. **Tests are co-located** — each file includes its own tests

## See Also

- **[Building](building.md)** — How to compile and test the project
- **[Contributing](contributing.md)** — Code style and submission guidelines
