Real measurements. 500 to 2,000 iterations per benchmark. Local hardware. No cherry-picked runs, no cloud variance, no simulation.
The core reasoning engine processes binary decisions, multi-class classification, and risk assessments without leaving the microsecond range.
| Operation | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| **Binary Decision**<br>Two-outcome classification with evidence weighting | < 10 µs | < 10 µs | < 10 µs |
| **Multi-Class Decision**<br>Four competing outcomes with supporting evidence | < 10 µs | < 10 µs | < 10 µs |
| **Risk Assessment**<br>Factor-weighted risk evaluation with uncertainty | < 10 µs | < 10 µs | < 10 µs |

Median values — consistent across all iteration counts
Every decision-layer operation completes in under 10 microseconds — 100x faster than a single millisecond. The numbers are deterministically stable: the 500-iteration and 2,000-iteration medians agree to within 0.1 µs.
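The source does not show its harness, but a median-over-N-iterations measurement like the ones above can be sketched as follows. `bench_median_us` and the toy `binary_decision` are hypothetical stand-ins, not the engine's actual API:

```python
import statistics
import time

def bench_median_us(fn, iterations=500):
    """Time fn() `iterations` times and return the median latency in microseconds.

    Medians are reported rather than means so a handful of
    scheduler-induced outliers cannot skew the result.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns -> µs
    return statistics.median(samples)

# Trivial stand-in: a two-outcome decision over weighted evidence.
def binary_decision(weights=(0.6, 0.4)):
    return weights[0] >= weights[1]

median_us = bench_median_us(binary_decision, iterations=500)
```

Running the same harness at 500, 1,000, and 2,000 iterations is what produces the side-by-side columns in the tables.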
End-to-end latency from problem ingestion through routing, layer selection, and decision output.
| Operation | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| **Full Inference Cascade**<br>Input → route → resolve → decision output | < 200 µs | < 200 µs | < 200 µs |
| **Quantum-Enhanced Inference**<br>Quantum-probabilistic constraint processing | < 200 µs | < 200 µs | < 200 µs |

Median values — all runs sub-millisecond
Every interaction is stored, indexed, and instantly retrievable. Semantic search across thousands of entries in under 100 microseconds. Exact recall in under 10 microseconds.
| Operation | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| **Exact Recall**<br>Key-based memory retrieval from persistent store | < 10 µs | < 10 µs | < 10 µs |
| **Store (semantic indexing)**<br>Semantic encoding, indexing, and persistent write | < 1 ms | < 1 ms | < 1 ms |
| **Semantic Search**<br>Similarity search across 2,000 indexed entries | < 100 µs | < 100 µs | < 100 µs |

Median values — every memory operation is sub-millisecond
The memory system enables infinite context. Every fact, decision, and interaction is stored persistently and searchable in under 100 microseconds, and exact key recall, at under 10 microseconds, is deterministic. Store operations are the slowest path because they include semantic encoding and indexing — yet even that full end-to-end write completes in under 1 millisecond.
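The two retrieval paths described above can be sketched in a few lines. `MemoryStore` is a hypothetical stand-in, and the SHA-256 digest is only a placeholder encoder — a real system would produce a semantic embedding at this step:

```python
import hashlib

class MemoryStore:
    """Sketch of the store/recall split: store() pays for encoding and
    indexing (the < 1 ms path); recall() is an O(1) hash-map lookup
    (the < 10 µs path)."""

    def __init__(self):
        self._by_key = {}          # exact-recall index
        self._semantic_index = []  # (key, encoding) pairs for similarity search

    def _encode(self, text):
        # Placeholder: stands in for a real semantic embedding step.
        return hashlib.sha256(text.encode()).digest()

    def store(self, key, text):
        self._by_key[key] = text
        self._semantic_index.append((key, self._encode(text)))

    def recall(self, key):
        return self._by_key.get(key)

mem = MemoryStore()
mem.store("fact-1", "the accelerated path is 22x faster")
```

Keeping the exact-recall index separate from the semantic index is what lets key lookups stay deterministic while search cost scales with entry count.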
Quantum gate sequences executed on NVIDIA GPU with hardware-accelerated state evolution. Real computation, real silicon, real speedup.
| Operation | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| **Quantum Gate Sequence (accelerated)**<br>Multi-gate quantum sequence, hardware-accelerated path | 0.25 ms | 0.25 ms | 0.26 ms |
| **Quantum Gate Sequence (baseline)**<br>Multi-gate quantum sequence, standard GPU path | 5.58 ms | 5.58 ms | 5.58 ms |
| **Measured Speedup**<br>Accelerated vs. standard path | 22.0x | 22.0x | 22.0x |
| **Quantum Gate Sequence (extended)**<br>Extended quantum sequence, standard GPU path | 14.1 ms | 14.1 ms | 14.1 ms |

Median values — speedup consistently ~22x across all runs
The hardware-accelerated quantum path completes multi-gate sequences in roughly 250 microseconds — a 22x speedup over the standard GPU path.
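The 22x figure is simply the ratio of the two medians; a quick sanity check using the rounded values from the table above:

```python
baseline_ms = 5.58     # standard GPU path, median from the table
accelerated_ms = 0.25  # hardware-accelerated path, median from the table

# Ratio of the rounded medians; the reported 22.0x presumably comes from
# per-run speedups computed on unrounded samples.
speedup = baseline_ms / accelerated_ms
```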
Semantic search latency across increasing memory sizes. The system stays sub-millisecond even as the knowledge base grows.
| Memory Size | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| 100 Entries | < 100 µs | < 100 µs | < 100 µs |
| 1,000 Entries | < 100 µs | < 100 µs | < 100 µs |
| 5,000 Entries | < 200 µs | < 200 µs | < 200 µs |
| 10,000 Entries | < 200 µs | < 200 µs | < 200 µs |

Median values — scaling behavior is deterministic
At 10,000 stored memories, search completes in under 200 microseconds. Going from 100 to 10,000 entries (100x more data) adds minimal latency. This is how infinite context works: nothing is forgotten, and recall barely slows down.
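One common reason search cost grows so slowly is that scoring is a single vectorized matrix-vector product rather than a per-entry loop. A minimal brute-force sketch (the 256-wide embedding and NumPy index are assumptions — the source does not describe its index structure):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 256  # assumed embedding width; not stated in the source

def build_index(n_entries):
    """Unit-normalized embedding matrix standing in for the semantic index."""
    vecs = rng.standard_normal((n_entries, DIM)).astype(np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def search(index, query, k=5):
    """One matrix-vector product scores every entry at once, which is why
    100x more entries adds so little wall-clock latency."""
    scores = index @ query  # cosine similarity on unit vectors
    return np.argsort(scores)[::-1][:k]

index = build_index(10_000)
query = build_index(1)[0]
top = search(index, query)
```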
Raw sustained throughput of the decision engine under continuous load.
| Operation | 500 iter | 1,000 iter | 2,000 iter |
|---|---|---|---|
| **Binary Decision Throughput**<br>Sustained bursts of 1,000 consecutive decisions | 200K+ dec/s | 200K+ dec/s | 200K+ dec/s |
| **Multi-Class Decision Throughput**<br>Four competing outcomes per decision, sustained | 150K+ dec/s | 150K+ dec/s | 150K+ dec/s |

Throughput is stable — no variance across iteration counts
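A burst-throughput measurement like the one above divides one burst's decision count by its wall-clock time. A sketch with a trivial stand-in decision (`decide` and `throughput_dps` are hypothetical names):

```python
import time

def decide(evidence):
    # Stand-in binary decision: sign of the summed evidence.
    return sum(evidence) >= 0

def throughput_dps(burst=1_000):
    """Issue one burst of consecutive decisions; return decisions/second."""
    start = time.perf_counter()
    for _ in range(burst):
        decide((0.6, -0.4))
    elapsed = time.perf_counter() - start
    return burst / elapsed

dps = throughput_dps()
```

Sustained figures come from repeating the burst back-to-back and reporting the median burst rate.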
The complete recall path: similarity matching plus linked memory retrieval. This is how the system actually retrieves context.
| Operation | Median | P5 | P95 |
|---|---|---|---|
| **Full Recall Pipeline**<br>Full recall across 200 indexed entries | < 300 µs | < 300 µs | < 400 µs |
| **Linked Recall**<br>Related memory retrieval from any entry | < 100 µs | < 100 µs | < 200 µs |
Full brain recall completes in under 300 microseconds. Linked memory retrieval alone is under 100 µs. Both paths execute in parallel and merge results. This is how the system retrieves relevant context from across its entire memory in under a third of a millisecond.
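The run-both-paths-and-merge pattern described above can be sketched with a thread pool. The recall functions here are hypothetical stand-ins returning fixed `(key, score)` pairs, not the system's API:

```python
from concurrent.futures import ThreadPoolExecutor

def similarity_recall(query):
    # Stand-in for similarity matching over the indexed entries.
    return [("entry-7", 0.91), ("entry-3", 0.88)]

def linked_recall(query):
    # Stand-in for following links out of already-relevant entries.
    return [("entry-9", 0.80)]

def full_recall(query):
    """Run both recall paths concurrently and merge, best score first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sim = pool.submit(similarity_recall, query)
        linked = pool.submit(linked_recall, query)
        merged = sim.result() + linked.result()
    return sorted(merged, key=lambda kv: kv[1], reverse=True)

results = full_recall("recent context")
```

Because the two paths overlap in time, the pipeline's latency is bounded by the slower path plus the cheap merge, not by their sum.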
Store N entries, retrieve all N, verify every single one. This is the benchmark that proves "0 information loss" — not a claim, a measurement.
| Scale | Retention | Retrieval Speed |
|---|---|---|
| 1,000 Entries | 100.0% | < 10 µs/op |
| 5,000 Entries | 100.0% | < 10 µs/op |
| 10,000 Entries | 100.0% | < 10 µs/op |

100% retention at every scale — retrieval speed is constant regardless of store size
Every entry stored is retrievable. 100% retention at 1,000, 5,000, and 10,000 entries — with retrieval speed holding flat under 10 µs/op regardless of store size. This isn't caching — it's persistent, indexed, deterministic memory.
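The retention benchmark is simple to state precisely: store N entries, read every one back, count matches. A sketch of that loop, using a plain dict in place of the persistent store:

```python
def verify_retention(store, n):
    """Store n entries, retrieve all n, and return the retention fraction."""
    for i in range(n):
        store[f"key-{i}"] = f"value-{i}"
    retrieved = sum(
        1 for i in range(n) if store.get(f"key-{i}") == f"value-{i}"
    )
    return retrieved / n

# An in-memory dict stands in for the persistent, indexed store here.
retention = verify_retention({}, 10_000)
```

A result of exactly 1.0 at every scale is what the table above reports as 100.0% retention.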
Per-decision latency under concurrent load. The system doesn't just perform well in isolation — it maintains microsecond latency when processing many decisions simultaneously.
| Concurrency | Median / Decision | P5 | P95 |
|---|---|---|---|
| 10 Parallel Decisions | < 10 µs | < 10 µs | < 30 µs |
| 50 Parallel Decisions | < 10 µs | < 10 µs | < 10 µs |
| 100 Parallel Decisions | < 10 µs | < 10 µs | < 10 µs |
Per-decision latency holds flat as concurrency increases — and the tail actually tightens, with P95 dropping from under 30 µs at 10 parallel decisions to under 10 µs at 100. The decision engine shows no contention overhead; concurrent requests benefit from better CPU pipeline utilization. Even the P95 at 100 parallel decisions stays under 10 µs — a full 100x below the 1 ms threshold.
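A concurrency sweep like this one times each decision individually while many are in flight, then takes the median per level. A sketch with a thread pool and a trivial stand-in decision (all names hypothetical):

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def timed_decision(_):
    """Time one stand-in binary decision; return latency in nanoseconds."""
    start = time.perf_counter_ns()
    _decision = 0.6 >= 0.4  # trivial stand-in for the real decision kernel
    return time.perf_counter_ns() - start

def per_decision_median_us(concurrency):
    """Median per-decision latency with `concurrency` decisions in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies_ns = list(pool.map(timed_decision, range(concurrency)))
    return statistics.median(latencies_ns) / 1_000  # ns -> µs

m10 = per_decision_median_us(10)
m100 = per_decision_median_us(100)
```

Comparing the medians and tails across concurrency levels is what produces the table above.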
Underlying hardware throughput powering the cognitive architecture.
| Metric | Value |
|---|---|
| **CPU Compute Throughput**<br>Dense matrix multiplication (FP32) | 1,329 GFLOPS |
| **Memory Bandwidth**<br>Sequential read throughput | 38.1 GB/s |
| **Vector State Clone**<br>High-dimensional state copy | 120 ns |
| **GPU**<br>Multi-GPU compute, consumer hardware | Consumer NVIDIA (dual) |
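A GFLOPS figure like the CPU row above is typically derived by timing a dense FP32 matmul: multiplying two n × n matrices costs roughly 2n³ floating-point operations. A minimal NumPy sketch (the matrix size is an assumption, and a single untimed warm-up run and best-of-several timing would be needed for a stable figure):

```python
import time
import numpy as np

def measure_gflops(n=512):
    """Estimate FP32 dense-matmul throughput in GFLOPS."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    _ = a @ b  # one timed multiply: ~2 * n^3 floating-point operations
    elapsed = time.perf_counter() - start
    return (2 * n**3) / elapsed / 1e9

gflops = measure_gflops()
```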
Every benchmark validated across multiple iteration counts. All results shown side-by-side to demonstrate stability.
Production codebase with real data structures. No synthetic shortcuts or toy problems.
Consumer GPU. No cloud instances, no container overhead, no network latency.