zero-dependency C substrate for typed, effectful, deterministic compositional computation (graphs, plans, provenance)
Thymos is a typed, effectful, deterministic compositional computation substrate in zero-dependency C. ML is the first morphism library on top, not the project itself. The three levels are (i) category-theoretic graphical design, (ii) compiler with passes, (iii) kernel implementation. The use case splits as WHAT (declarative: opt.h, models/, tools/, quant/), HOW (compiler, fusion, th_Graph + th_Plan), WHERE (kernels & backends via th_run + registry for CPU/AVX2/BLAS/CUDA/Metal/etc.).
The moat is the small algebra: typed values, declared effects, content-addressed deterministic plans, and audit-grade provenance. Agent tools, classical models, quantization, actuarial work, and vulnerability research are all morphism packs over the same core. It targets embeddability on memory-constrained hardware (Jetson-class, edge) without Python/PyTorch drag.
Non-negotiable invariants: zero forced dependencies (pure C core), flat repo layout, determinism by default (same graph+inputs → byte-identical outputs unless non-det effect declared), effects first-class (every morphism declares its set; planner respects barriers), content-addressable plans, provenance emitted on every compile, embeddability (no unscoped globals; th_Context scopes everything), and no hidden allocations in the hot loop (steady-state runs on pre-planned memory layout).
Design motif: typed morphisms. Every operation has explicit inputs, outputs, effects, and context. No hidden state. Writing the core in C from scratch keeps that property all the way down.
Phase 0: Precondition for everything else. The detailed build plan defines the Definition of Done (DoD) and the invariants.
Prelude (th_Error, th_DType, th_Id, th_Shape, th_Type, th_Hash). Context + arena allocator + th_MemoryPlan + execution policy. Tensor (strided views, elementwise + linear kernels, no allocs in steady state). ACT IR (th_Object, th_Morphism, th_Signature, th_Category, th_HyperGraph, th_Compiler). Eight compiler passes are required before Phase 1: type_check, effect_check, capability_check, topo_sort, liveness, memory_plan, content_address, and emit_provenance. A minimal CLI demo builds a graph, compiles it, runs it, emits a provenance bundle, and produces byte-identical results on replay.
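The liveness and memory_plan passes named above can be sketched as follows. This is a minimal illustration, not the Thymos implementation: the `sk_` names, the `sk_Op` layout, and the first-fit strategy are all assumptions made for the example, and alignment and pinned graph outputs are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_IN 2

/* Hypothetical node record: ops are stored in topological order. */
typedef struct {
    size_t size;            /* bytes produced by this op */
    size_t n_in;            /* number of inputs */
    size_t in[SK_MAX_IN];   /* producer indices, each < own index */
} sk_Op;

/* Liveness: last_use[i] is the index of the last op that reads value i,
   or i itself if nothing reads it. */
static void sk_liveness(const sk_Op *ops, size_t n, size_t *last_use) {
    for (size_t i = 0; i < n; i++) {
        last_use[i] = i;
        for (size_t k = 0; k < ops[i].n_in; k++) {
            assert(ops[i].in[k] < i);       /* topological order */
            last_use[ops[i].in[k]] = i;     /* i only grows, so this is the max */
        }
    }
}

/* Memory plan: first-fit offsets such that values whose live ranges
   overlap never share bytes. Returns the peak number of bytes needed. */
static size_t sk_memory_plan(const sk_Op *ops, size_t n,
                             const size_t *last_use, size_t *offset) {
    size_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        size_t cand = 0;
        int moved = 1;
        while (moved) {                     /* rescan until no conflict remains */
            moved = 0;
            for (size_t j = 0; j < i; j++) {
                int live = last_use[j] >= i;   /* j is still needed when i runs */
                int overlap = cand < offset[j] + ops[j].size &&
                              offset[j] < cand + ops[i].size;
                if (live && overlap) {
                    cand = offset[j] + ops[j].size;
                    moved = 1;
                }
            }
        }
        offset[i] = cand;
        if (cand + ops[i].size > peak) peak = cand + ops[i].size;
    }
    return peak;
}
```

For a four-op chain of 16-byte values, this plan ping-pongs between two slots and needs 32 bytes rather than 64, which is the kind of fixed layout a steady-state run can reuse without allocating.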
Current: implementation in progress per the public PLAN. No working end-to-end yet; the plan is the artifact and the contract.
Phase 1: First domain pack; the agentic moat.
Typed, verifiable plans over registered tools (http, shell, file, llm-call, ...). th_plan_verify (effects, capabilities, budget, sandbox) before any execution. Deterministic replay from provenance. Idempotency keys, control-flow combinators (Sequence/Parallel/Branch/Approve), cache hits on unchanged sub-plans. A demo that registers tools, verifies/plans/runs a multi-step task, replays from bundle, and rejects unauthorized plans.
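The verify-before-execute idea can be sketched in a few lines. Everything below is an assumption made for illustration: the `sk_` names, the capability bits, and the abstract cost units are hypothetical, and the real th_plan_verify is also described as checking effects and sandbox constraints, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; a real tool registry may differ. */
enum { SK_CAP_HTTP = 1u << 0, SK_CAP_FILE = 1u << 1, SK_CAP_SHELL = 1u << 2 };

typedef struct {
    const char *tool;   /* registered tool name */
    uint32_t    caps;   /* capabilities this step requires */
    uint32_t    cost;   /* abstract budget units */
} sk_Step;

typedef struct {
    uint32_t caps_granted;  /* capabilities the caller has authorized */
    uint64_t budget;        /* total budget units available */
} sk_Policy;

typedef enum { SK_PLAN_OK = 0, SK_PLAN_DENIED_CAP, SK_PLAN_OVER_BUDGET } sk_Verdict;

/* Checks every step before anything runs. On failure, *bad_step holds the
   index of the first offending step. The out-param comes last. */
static sk_Verdict sk_plan_verify(const sk_Step *steps, size_t n,
                                 const sk_Policy *policy, size_t *bad_step) {
    assert(steps != NULL || n == 0);
    uint64_t spent = 0;
    for (size_t i = 0; i < n; i++) {
        /* Any required capability bit that is not granted rejects the plan. */
        if ((steps[i].caps & ~policy->caps_granted) != 0) {
            *bad_step = i;
            return SK_PLAN_DENIED_CAP;
        }
        spent += steps[i].cost;
        if (spent > policy->budget) {
            *bad_step = i;
            return SK_PLAN_OVER_BUDGET;
        }
    }
    return SK_PLAN_OK;
}
```

Because the check is a pure function of the plan and the policy, an unauthorized plan is rejected without a single tool call being made.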
Phase 2: Proves the performance thesis.
Static compilation of preprocess + predict pipelines (linear/logistic, trees, GBM, small MLP, calibration, drift) to flat memory plans. Zero allocations in the prediction hot loop. Measurable single-sample p99 latency win vs sklearn on realistic tabular pipelines. Train and predict express the same morphism category (training-serving skew structurally impossible). Wrap as Tool for Phase 1 planner.
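As a rough picture of what "compiled to a flat memory plan" could mean for a standardize + linear pipeline, consider the sketch below. The `sk_` names and the buffer layout are hypothetical, chosen only for the example. It returns the raw linear score; a logistic model would apply a sigmoid to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plan: all parameters and scratch space live in one flat
   float buffer, addressed by offsets fixed at compile (plan) time. */
typedef struct {
    size_t n;            /* number of features */
    size_t off_mean;     /* n floats: per-feature mean */
    size_t off_inv_std;  /* n floats: per-feature 1/std */
    size_t off_weight;   /* n floats: model weights */
    size_t off_scratch;  /* n floats: standardized features */
    float  bias;
} sk_LinearPlan;

/* Fixes the layout and returns the number of floats the caller must
   provide once, before the first prediction. */
static size_t sk_plan_layout(sk_LinearPlan *p, size_t n, float bias) {
    p->n = n;
    p->off_mean = 0;
    p->off_inv_std = n;
    p->off_weight = 2 * n;
    p->off_scratch = 3 * n;
    p->bias = bias;
    return 4 * n;
}

/* Hot loop: preprocess and predict in one pass, with no allocation. */
static float sk_predict(const sk_LinearPlan *p, float *mem, const float *x) {
    assert(p != NULL && mem != NULL && x != NULL);
    const float *mean = mem + p->off_mean;
    const float *inv_std = mem + p->off_inv_std;
    const float *weight = mem + p->off_weight;
    float *scratch = mem + p->off_scratch;
    float score = p->bias;
    for (size_t i = 0; i < p->n; i++) {
        scratch[i] = (x[i] - mean[i]) * inv_std[i];   /* preprocess */
        score += weight[i] * scratch[i];              /* predict */
    }
    return score;
}
```

The same plan struct and buffer could serve both fitting and serving, which is one way to read the claim that train and predict share a morphism category.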
Long-term: Why the invariants exist.
Optional CUDA/Metal/ROCm backends. Jetson Orin Nano deployment under tight memory. quant/ (time-aware effects, Black-Scholes, backtest=production), actuary/ (IFRS 17 audit packs from provenance), vuln/ (taint lattice, symbolic + concrete as parallel functors). All as morphism packs over the same substrate.
The runtime is organized around a graph of typed ops over a single arena/context. Allocation is bump-style inside the arena; the entire context is freed in one shot. Core structs follow a strict naming convention (th_Pascal for types, th_snake_case for functions, TH_SCREAMING for constants). Public API returns th_Error; out-params are last. No globals except th_default_ctx; everything else lives in th_Context.
// arena + context (planned API, subject to change; names follow the th_ convention above)
th_Arena *arena; th_Context *ctx; th_Tensor *a, *b, *c;
th_Error err; // every public call returns th_Error; check it after each call in real code
err = th_arena_create((size_t)512 << 20, &arena); // 512 MiB, out-param last
err = th_context_init(arena, &ctx);
// typed tensors as graph nodes
err = th_tensor(ctx, TH_F32, (int[]){4, 4}, 2, &a);
err = th_tensor(ctx, TH_F32, (int[]){4, 4}, 2, &b);
err = th_matmul(ctx, a, b, &c);
// validate, plan memory, execute
err = th_graph_validate(ctx);
err = th_graph_run(ctx);
th_arena_destroy(arena); // single free
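For readers unfamiliar with bump allocation, a minimal arena looks roughly like the sketch below. The `sk_` names are hypothetical and are not the planned Thymos API; alignment is applied to the offset within the backing block, which `malloc` already aligns for any standard type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch type, not the Thymos arena. */
typedef struct {
    uint8_t *base;  /* start of the single backing allocation */
    size_t   cap;   /* total bytes available */
    size_t   used;  /* bump offset: bytes handed out so far */
} sk_Arena;

/* Returns 0 on success, -1 if the backing allocation fails. */
static int sk_arena_init(sk_Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Bump-allocates `size` bytes at an offset aligned to `align` (a power
   of two). Returns NULL when the arena is exhausted; never calls malloc. */
static void *sk_arena_alloc(sk_Arena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* One free releases everything that was allocated from the arena. */
static void sk_arena_destroy(sk_Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

Allocation is a pointer bump plus a bounds check, and there is no per-object free, which is what makes the single teardown call sufficient.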
ACT IR is the long-term moat: th_Object (typed IR value), th_Morphism (effect + capability flags + apply fn), th_Signature, th_Category, th_HyperGraph, th_Compiler. Eight passes (type_check, effect_check, capability_check, topo_sort, liveness, memory_plan, content_address, emit_provenance) run before any execution. Provenance bundle (graph hash, per-node hashes, effects, capabilities, memory plan, schedule) is a first-class compile output — identical builds on different machines produce identical bundles (timestamps isolated).
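One way to picture the content_address pass is a hash that depends only on what each node computes, not on where it lives in memory. The sketch below uses 64-bit FNV-1a purely as a stand-in: the source does not say which hash Thymos uses, a real implementation would likely want a cryptographic hash, and the `sk_` names and node layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_IN 4
#define SK_FNV_OFFSET 0xcbf29ce484222325ULL
#define SK_FNV_PRIME  0x100000001b3ULL

/* Hypothetical node record, stored in topological order. */
typedef struct {
    uint32_t op;             /* opcode */
    uint64_t attrs;          /* digest of dtype, shape, constants */
    uint32_t n_in;           /* number of inputs */
    uint32_t in[SK_MAX_IN];  /* producer indices, each < own index */
} sk_Node;

/* Mixes a 64-bit value into an FNV-1a state, byte by byte in a fixed
   little-endian order so the result does not depend on the host. */
static uint64_t sk_mix_u64(uint64_t h, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        h ^= (uint8_t)(v >> (8 * i));
        h *= SK_FNV_PRIME;
    }
    return h;
}

/* Writes one hash per node into out[] and returns a hash of the whole
   graph. Inputs are referenced by their producers' hashes, so the result
   depends on graph content only. */
static uint64_t sk_content_address(const sk_Node *nodes, size_t n, uint64_t *out) {
    uint64_t graph = SK_FNV_OFFSET;
    for (size_t i = 0; i < n; i++) {
        uint64_t h = SK_FNV_OFFSET;
        h = sk_mix_u64(h, nodes[i].op);
        h = sk_mix_u64(h, nodes[i].attrs);
        h = sk_mix_u64(h, nodes[i].n_in);
        for (uint32_t k = 0; k < nodes[i].n_in; k++) {
            assert(nodes[i].in[k] < i);   /* topological order */
            h = sk_mix_u64(h, out[nodes[i].in[k]]);
        }
        out[i] = h;
        graph = sk_mix_u64(graph, h);
    }
    return graph;
}
```

Hashing the same graph twice gives the same per-node and graph hashes, and changing one node changes only that node's hash and those downstream of it, which is the property that lets unchanged sub-plans hit a cache.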
Morphism layer: explicit domain/codomain + declared effects on every op. Composition and reordering are checked statically and constrained by effects. The planner cannot move TH_EFF_NET_IO or TH_EFF_IRREVERSIBLE across barriers. This is what makes verifiable agent plans possible, and training-serving skew structurally impossible, in the later phases.
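The barrier rule can be expressed as a small static check over a schedule. The sketch below is an assumption-laden illustration: the `SK_EFF_` bits mirror the TH_EFF_ names in the text but their values are invented, the slot representation is hypothetical, and data dependencies (which a real planner must also respect) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical effect bits; the values are illustrative only. */
enum {
    SK_EFF_PURE         = 0,
    SK_EFF_NET_IO       = 1u << 0,
    SK_EFF_IRREVERSIBLE = 1u << 1
};

/* Effects that may never be moved across a barrier. */
#define SK_EFF_PINNED (SK_EFF_NET_IO | SK_EFF_IRREVERSIBLE)

/* One schedule slot: either a barrier or an op with declared effects. */
typedef struct {
    bool     is_barrier;
    uint32_t effects;
} sk_Slot;

/* May the op at index `from` be moved to index `to`? Ops without pinned
   effects may go anywhere; pinned ops must stay within their
   barrier-delimited region. */
static bool sk_move_is_legal(const sk_Slot *s, size_t n, size_t from, size_t to) {
    assert(from < n && to < n && !s[from].is_barrier);
    if ((s[from].effects & SK_EFF_PINNED) == 0) return true;
    size_t lo = from < to ? from : to;
    size_t hi = from < to ? to : from;
    for (size_t i = lo; i <= hi; i++) {
        if (s[i].is_barrier) return false;   /* the move would cross a barrier */
    }
    return true;
}
```

Since effects are declared on every morphism, a check of this shape can run at compile time, before any tool call or kernel executes.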
phase / milestone                                                    status
─────────────────────────────────────────────────────────────────────────────
Phase 0: prelude + ctx + arena + tensor                              active
Phase 0: ACT IR + 8 compiler passes                                  active
Phase 0: memory_plan + provenance                                    active
Phase 0: minimal CLI demo + DoD                                      active
Phase 1: tools/ (http, shell, llm, plan verify/run, sandbox, cache)  next
Phase 1: verifiable agent plans + deterministic replay               next
Phase 2: models/ (linear, trees, pipelines) + bench vs sklearn       after P1
Phase 2: zero-alloc inference + train=predict morphisms              after P1
long-term: quant/, actuary/, vuln/, CUDA/Jetson                      later
A milestone is not done until its Definition of Done (demo runs on clean checkout, invariants hold, provenance reproducible) is met. The public PLAN.md is the contract. No numbers or claims before the demos exist.