State Flow Machine: 62% Accuracy at 4× Training Length

A developer has built State Flow Machine (SFM), a non-transformer architecture for tasks that require tracking state across long sequences. The model runs on a single Huawei Ascend 910 ProA NPU and targets a known weakness of transformers: simulating a process step by step on sequences longer than those seen in training.

Architecture Details

Instead of attention heads, SFM uses a bank of explicit memory slots (small fixed-size vectors). At each token, a gating mechanism decides which slots to update and how. The model reads from the slots, computes an update, and writes it back, so the bank functions like a tiny differentiable register file. The approach is related to DeltaNet, Linear Attention, and state-space models (Mamba, RWKV), but it is more explicit: slots are directly addressable and updated via learned gates rather than forming an implicit recurrent state.
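To make the read, update, write cycle concrete, here is a minimal sketch in plain Python. It is not the developer's implementation: the softmax addressing, the tanh candidate, the convex write rule, and the names slot_step and slot_keys are all assumptions chosen for illustration, and the random slot_keys stand in for parameters that a real model would learn.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def slot_step(slots, token_vec, slot_keys):
    """One read-update-write step over a bank of memory slots (illustrative only).

    slots:     list of slot vectors, each of length dim
    token_vec: embedding of the current token, length dim
    slot_keys: one addressing key per slot (hypothetical stand-in for learned weights)
    """
    # Gate: score every slot against the token and normalize the scores.
    gates = softmax([dot(key, token_vec) for key in slot_keys])
    # Read: gate-weighted sum of the current slot contents.
    dim = len(token_vec)
    read = [sum(g * s[i] for g, s in zip(gates, slots)) for i in range(dim)]
    # Update: candidate content computed from the read vector and the token.
    candidate = [math.tanh(r + t) for r, t in zip(read, token_vec)]
    # Write: each slot moves toward the candidate in proportion to its gate.
    return [
        [(1.0 - g) * s_i + g * c_i for s_i, c_i in zip(s, candidate)]
        for g, s in zip(gates, slots)
    ]

random.seed(0)
num_slots, dim = 4, 8
slot_keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]
slots = [[0.0] * dim for _ in range(num_slots)]
for _ in range(5):
    token = [random.gauss(0, 1) for _ in range(dim)]
    slots = slot_step(slots, token, slot_keys)
```

The state carried between tokens is just the slots list, whose size does not depend on sequence length. That fixed-size, explicitly addressed state is the property the article contrasts with attention over a growing context.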

Benchmark Setup

The synthetic program state tracking benchmark involves sequences like x = 42; x += 17; x -= 8; x *= 2; ... where the model must predict the final value of x (integer 0–100, framed as 101-class classification).
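The ground-truth label for such a program can be computed with a small interpreter. The sketch below is an assumption-laden illustration, not the benchmark's code: the article does not say how values are kept inside 0–100, so wrapping modulo 101 after every statement is a guess, as are the `//=` and `%=` spellings for integer divide and modulo and the function name run_program.

```python
def run_program(source, modulus=101):
    """Evaluate a program such as 'x = 42; x += 17; x -= 8; x *= 2'."""
    x = 0
    for stmt in source.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        # Each statement has the form: x <op> <integer>
        _, op, operand = stmt.split()
        value = int(operand)
        if op == "=":
            x = value
        elif op == "+=":
            x += value
        elif op == "-=":
            x -= value
        elif op == "*=":
            x *= value
        elif op == "//=":
            x //= value  # raises ZeroDivisionError if value is 0
        elif op == "%=":
            x %= value   # raises ZeroDivisionError if value is 0
        else:
            raise ValueError(f"unknown operator: {op}")
        # Assumption: wrap after every statement so x stays in 0..modulus-1.
        x %= modulus
    return x

print(run_program("x = 42; x += 17; x -= 8; x *= 2"))  # prints 1 under the wrap assumption
```

Under this wrap rule the four statements shown give 42, 59, 51, then 102 mod 101 = 1. A different range rule, such as clamping, would give a different label, which is why the rule is exposed as a parameter.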

Training data: 10,000 programs with 10–27 operations, hard difficulty (all ops: add, subtract, multiply, integer divide, modulo, set), seed 42
Validation: 1,000 programs, same distribution
Evaluation: test at 1× (in-distribution), 2×, 4×, 8×, 16×, and 32× the training program length
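A length-scaled evaluation split along these lines could be generated as follows. This is a sketch under stated assumptions rather than the benchmark's generator: the operand range of 0–100, the base length of 10 operations for 1× (taken from the labels in the results), the choice not to count the initial assignment as an operation, and the names make_program and make_eval_sets are all hypothetical.

```python
import random

# Operator spellings are assumed; the article only names the operations.
OPS = ["=", "+=", "-=", "*=", "//=", "%="]

def make_program(num_ops, rng):
    """Build one program: an initial assignment followed by num_ops random statements."""
    stmts = [f"x = {rng.randint(0, 100)}"]
    for _ in range(num_ops):
        op = rng.choice(OPS)
        # Keep divisors non-zero so every program is well defined.
        low = 1 if op in ("//=", "%=") else 0
        stmts.append(f"x {op} {rng.randint(low, 100)}")
    return "; ".join(stmts)

def make_eval_sets(base_ops=10, multipliers=(1, 2, 4, 8, 16, 32), per_set=3, seed=42):
    """Return {multiplier: [program, ...]} with programs of base_ops * multiplier operations."""
    rng = random.Random(seed)
    return {
        m: [make_program(base_ops * m, rng) for _ in range(per_set)]
        for m in multipliers
    }

eval_sets = make_eval_sets()
for m, programs in eval_sets.items():
    print(f"{m}x: {len(programs)} programs, {len(programs[0].split('; ')) - 1} ops each")
```

Seeding a dedicated random.Random instance keeps the split reproducible without touching the global random state.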

Results

Exact Match Accuracy:

1× (10 ops): State Slots 99.9%, Transformer-Fair 100.0%, Transformer-Large 100.0%
2× (20 ops): State Slots 92.9%, Transformer-Fair 99.0%, Transformer-Large 99.5%
4× (40 ops): State Slots 62.0%, Transformer-Fair 1.9%, Transformer-Large 3.1%
8× (80 ops): State Slots 35.3%, Transformer-Fair 1.3%, Transformer-Large 1.0%
16× (160 ops): State Slots 5.1%, Transformer-Fair 0.9%, Transformer-Large 0.7%
32× (320 ops): State Slots 5.0%, Transformer-Fair 1.0%, Transformer-Large 0.8%

Generalization ratio (accuracy retention):

State Slots: 4×/1× = 0.62×, 8×/1× = 0.35×
Transformer-Fair: 4×/1× = 0.02×, 8×/1× = 0.01×
Transformer-Large: 4×/1× = 0.03×, 8×/1× = 0.01×
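
These ratios are each model's accuracy at the longer length divided by its 1× accuracy. The short check below reproduces them from the exact-match figures reported above; the dictionary layout and the function name retention are illustrative choices.

```python
# Exact-match accuracy (%) at 1x, 4x and 8x, copied from the results above.
accuracy = {
    "State Slots": {1: 99.9, 4: 62.0, 8: 35.3},
    "Transformer-Fair": {1: 100.0, 4: 1.9, 8: 1.3},
    "Transformer-Large": {1: 100.0, 4: 3.1, 8: 1.0},
}

def retention(acc, length):
    """Accuracy at the given length multiplier relative to in-distribution (1x) accuracy."""
    return round(acc[length] / acc[1], 2)

ratios = {
    name: {m: retention(acc, m) for m in (4, 8)}
    for name, acc in accuracy.items()
}
for name, r in ratios.items():
    print(f"{name}: 4x/1x = {r[4]:.2f}, 8x/1x = {r[8]:.2f}")
```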

Mean Absolute Error at extrapolation lengths (scale 0–100):

4×: State Slots 14.03, Transformer-Fair 40.33, Transformer-Large 36.76
8×: State Slots 26.73, Transformer-Fair 41.71, Transformer-Large 41.19

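The transformers are essentially at chance from 4× onward. Their exact-match accuracy of roughly 1–3% is close to the 1/101 ≈ 1% expected from guessing uniformly among 101 classes, and their MAE of 37–42 is no better than the roughly 34 expected from a uniform random guess (assuming targets are spread uniformly over 0–100). State Slots continues to make meaningful predictions at 4× and 8×, although its accuracy falls to about 5% at 16× and 32×.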
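
The random-guess baseline can be checked by direct enumeration. The sketch below assumes that both the guess and the true value are independent and uniform over the integers 0–100; the article does not report the actual target distribution, so these figures are a reference point rather than a measured baseline.

```python
from fractions import Fraction

values = range(101)  # the 101 possible classes, 0..100

# Expected |guess - target| when both are independent and uniform over 0..100.
total_abs_diff = sum(abs(g - t) for g in values for t in values)
expected_mae = Fraction(total_abs_diff, len(values) ** 2)

# Expected error when always guessing the midpoint, for comparison.
constant_mae = Fraction(sum(abs(50 - t) for t in values), len(values))

# Probability of an exact match when guessing uniformly at random.
chance_accuracy = Fraction(1, len(values))

print(f"uniform random guess MAE: {float(expected_mae):.2f}")   # 33.66
print(f"always guessing 50 MAE:   {float(constant_mae):.2f}")   # 25.25
print(f"chance exact match:       {float(chance_accuracy):.2%}")  # 0.99%
```

Under these assumptions, an MAE near 40 is somewhat worse than uniform random guessing, and exact-match accuracy near 1% matches the 1-in-101 chance level.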