Empirical CS Research · Bucknell University

DO TRANSFORMERS
LEARN RULES
OR JUST PATTERNS?

An empirical study investigating whether transformer models truly learn structural rules in formal languages or simply memorize statistical surface patterns within the training distribution.

Hannah Tran & Chris Mitsch
Formal Languages
DFA-Based Evaluation
Out-of-Distribution Generalization

Central Question

Can transformers learn the actual generative structure of a language, or do they only approximate patterns seen during training?

  • 2: Transformer architectures tested
  • OOD: Long-sequence generalization evaluation
  • DFA: Fully controlled formal language generation
  • EOS: Primary structural failure point identified

Overview

HIGH ACCURACY
DOES NOT NECESSARILY
MEAN UNDERSTANDING

Transformer models routinely achieve impressive performance on sequence modeling tasks, yet performance metrics alone cannot determine whether these systems genuinely understand the rules governing a structure or merely approximate patterns within a dataset.

This project isolates that distinction through deterministic finite automata, where the true generative rules are fully known. By creating a controlled experimental environment, we can directly observe where transformers succeed, where they fail, and whether those failures reveal deeper architectural limitations.

Research Gap

Prior work often focuses on either benchmark accuracy or theoretical learnability in isolation. Few studies combine behavioral evaluation with structural probing to test whether models genuinely learn rules.

This Project Explores

  • In-distribution vs out-of-distribution performance
  • EOS prediction failure patterns
  • Structural generalization breakdown
  • Whether scaling improves true rule learning

Methodology

A CONTROLLED SYSTEM
WHERE THE TRUTH
IS ALWAYS KNOWN

All datasets were generated using deterministic finite automata, ensuring every sequence has a precisely computable structure. This allows evaluation beyond simple accuracy metrics and enables direct probing of structural understanding.

Language

Regular languages generated through DFA transitions over constrained alphabets.

  • Positive sequences only
  • Next-token prediction objective
  • Controlled sequence lengths
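
The data setup described above can be sketched as follows. This is a minimal illustration, not the project's actual generator: the example language (strings over {a, b} with an even number of a's), the names `DFA`, `EVEN_A`, `sample_positive`, and `build_dataset`, and the use of rejection sampling are all assumptions chosen for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class DFA:
    start: int
    accepting: frozenset
    delta: dict  # maps (state, symbol) -> next state

    def run(self, s):
        """Return the state reached after reading string s."""
        state = self.start
        for ch in s:
            state = self.delta[(state, ch)]
        return state

    def accepts(self, s):
        return self.run(s) in self.accepting


# Hypothetical example language: an even number of 'a' over {a, b}.
EVEN_A = DFA(
    start=0,
    accepting=frozenset({0}),
    delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
)


def sample_positive(dfa, alphabet, length, rng, max_tries=10_000):
    """Rejection-sample one accepted string of exactly `length` symbols."""
    for _ in range(max_tries):
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if dfa.accepts(s):
            return s
    raise RuntimeError("no accepted string found at this length")


def build_dataset(dfa, alphabet, lengths, per_length, seed=0):
    """Positive-only dataset with a fixed number of strings per length."""
    rng = random.Random(seed)
    return [
        sample_positive(dfa, alphabet, n, rng)
        for n in lengths
        for _ in range(per_length)
    ]
```

Calling `build_dataset(EVEN_A, "ab", range(1, 7), per_length=5)` would yield 30 accepted strings with lengths 1 through 6. Because the DFA is known, every string can be labeled with its exact state sequence.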

Models

GPT-style transformer architectures evaluated under identical training conditions.

  • 128 hidden / 2 layers
  • 256 hidden / 4 layers
  • OOD evaluation up to length 14
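
To give a rough sense of the size gap between the two configurations, the sketch below estimates parameter counts. Only the hidden sizes and layer counts come from the poster. The `GPTConfig` name, the vocabulary size, the maximum length, and the 4x MLP width are assumptions, and the common 12·d² per-layer estimate ignores biases and layer norms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    d_model: int          # hidden size (from the poster)
    n_layers: int         # number of transformer blocks (from the poster)
    vocab_size: int = 4   # ASSUMPTION: e.g. two symbols plus BOS and EOS
    max_len: int = 16     # ASSUMPTION: covers OOD length 14 plus BOS and EOS

    def approx_params(self):
        # ASSUMPTION: standard GPT block with a 4x MLP.
        # Attention projections ~ 4*d^2, MLP ~ 8*d^2, so ~ 12*d^2 per layer.
        blocks = 12 * self.n_layers * self.d_model ** 2
        # Token embeddings plus learned positional embeddings.
        embeddings = (self.vocab_size + self.max_len) * self.d_model
        return blocks + embeddings


SMALL = GPTConfig(d_model=128, n_layers=2)
LARGE = GPTConfig(d_model=256, n_layers=4)

if __name__ == "__main__":
    print(SMALL.approx_params(), LARGE.approx_params())
```

Under these assumptions the larger configuration has roughly eight times as many parameters as the smaller one.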

Evaluation

Structural failure was measured through targeted probing rather than benchmark accuracy alone.

  • Perplexity analysis
  • Token-level evaluation
  • EOS prediction probing
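
One way to separate EOS behavior from symbol behavior is to compute accuracy and perplexity per token type. The sketch below assumes the model's outputs have already been reduced to records of (target token, predicted token, log-probability of the target). The record layout, the `<eos>` token name, and the function names are hypothetical.

```python
import math

EOS = "<eos>"  # ASSUMPTION: name of the end-of-sequence token


def perplexity(log_probs):
    """exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(log_probs) / len(log_probs))


def split_metrics(records):
    """records: iterable of (target, predicted, log_prob_of_target).

    Returns accuracy and perplexity separately for EOS positions
    and for ordinary symbol positions.
    """
    groups = {"eos": [], "symbol": []}
    for target, predicted, lp in records:
        key = "eos" if target == EOS else "symbol"
        groups[key].append((target == predicted, lp))

    out = {}
    for name, rows in groups.items():
        if rows:  # skip empty groups to avoid division by zero
            out[name] = {
                "accuracy": sum(ok for ok, _ in rows) / len(rows),
                "perplexity": perplexity([lp for _, lp in rows]),
            }
    return out
```

Reporting the two groups side by side makes it visible when aggregate accuracy hides a failure concentrated at sequence termination.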

Failure Beyond
Distribution

While transformers achieved strong in-distribution validation performance, their performance collapsed on longer out-of-distribution sequences, revealing reliance on statistical locality rather than structural rule learning.

Validation Dynamics

Models appear stable during training, but stability alone does not imply abstraction.

Both classification and next-token models demonstrate smooth validation convergence during training. However, later experiments reveal that this apparent success does not transfer outside the original training distribution.

[Figure: Validation dynamics comparison]
[Figure: Classification validation accuracy]

High in-distribution performance

Validation accuracy exceeds 90%, consistent with a close fit to the training distribution.

[Figure: OOD perplexity spike]

Sharp OOD failure

A dramatic perplexity spike appears immediately beyond the longest training length, indicating a failure to generalize algorithmically.
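
A spike of this kind can be located by bucketing perplexity by sequence length. The sketch below is illustrative only: it assumes per-token log-probabilities are available for each evaluated sequence, and `train_max_len` is a hypothetical parameter, since the exact training length boundary is not stated here.

```python
import math
from collections import defaultdict


def perplexity_by_length(sequences):
    """sequences: iterable of (length, [per-token log-probs]).

    Pools tokens across sequences of equal length and returns
    {length: perplexity}, ordered by length.
    """
    pooled = defaultdict(list)
    for length, log_probs in sequences:
        pooled[length].extend(log_probs)
    return {
        n: math.exp(-sum(lps) / len(lps))
        for n, lps in sorted(pooled.items())
    }


def ood_ratio(ppl_by_len, train_max_len):
    """Mean OOD perplexity divided by mean in-distribution perplexity."""
    id_vals = [p for n, p in ppl_by_len.items() if n <= train_max_len]
    ood_vals = [p for n, p in ppl_by_len.items() if n > train_max_len]
    return (sum(ood_vals) / len(ood_vals)) / (sum(id_vals) / len(id_vals))
```

A ratio near 1 would indicate that the model extrapolates across the boundary, while a large ratio corresponds to the sharp failure described above.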

Key Observation

PERFORMANCE
IS NOT
UNDERSTANDING

[Figure: OOD perplexity vs. training length]

Scaling Analysis

Larger models do not consistently solve the problem.

Increasing parameter count alone fails to guarantee stronger structural generalization. Longer training contexts improve OOD performance more reliably than naive scaling.

[Figure: EOS accuracy]

EOS prediction collapse

End-of-sequence prediction deteriorates sharply outside the training range.
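
Because the DFA is known, EOS legality has an exact ground truth: a string may end only when the current state is accepting. The sketch below scores a model against that ground truth. The example DFA (even number of a's), the function names, and the probability threshold used to decide whether the model treats EOS as possible are all assumptions, not the project's actual probe.

```python
# Hypothetical example DFA: even number of 'a' over {a, b}.
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
START = 0
ACCEPTING = {0}


def eos_legal(delta, start, accepting, prefix):
    """EOS is legal exactly when the prefix ends in an accepting state."""
    state = start
    for ch in prefix:
        state = delta[(state, ch)]
    return state in accepting


def eos_probe_accuracy(delta, start, accepting, prefix_probs, threshold=0.05):
    """prefix_probs: {prefix: model probability of EOS after that prefix}.

    ASSUMPTION: the model is counted as allowing EOS when it assigns
    at least `threshold` probability to it.
    """
    hits = 0
    for prefix, p_eos in prefix_probs.items():
        predicted_legal = p_eos >= threshold
        hits += predicted_legal == eos_legal(delta, start, accepting, prefix)
    return hits / len(prefix_probs)
```

Running this probe separately on in-distribution and longer prefixes would show whether termination decisions track the DFA state or only the lengths seen in training.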

[Figure: EOS perplexity]

Perplexity instability

The model becomes highly uncertain once sequence lengths exceed those seen during training.

[Figure: Symbol accuracy]

Local token prediction survives

Symbol-level prediction remains relatively stable, suggesting dependence on local statistical cues.

[Figure: Symbol perplexity]

But abstraction still breaks

OOD perplexity spikes indicate that the model does not consistently recover the underlying DFA structure.

Findings

FAILURE IS
STRUCTURED,
NOT RANDOM

Models performed well on sequences similar to training data, but degraded sharply on longer unseen sequences. The failures consistently concentrated around sequence termination, suggesting the absence of true structural understanding.

01

Strong ID Performance

Transformers achieved stable perplexity and strong next-token prediction within the training distribution.

02

OOD Collapse

Performance degraded sharply on longer unseen sequences, indicating failure to extrapolate structural rules.

03

EOS Failure

End-of-sequence prediction emerged as the primary breakdown point, revealing weak state-sensitive reasoning.

04

Scaling Wasn't Enough

Larger models improved in-distribution metrics but did not resolve structural generalization failures.

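A MODEL HAS LEARNED THE RULE ONLY IF ITS PREDICTIONS FACTOR THROUGH THE STATE MAPPING
Structural Learning Condition
D(x) ≈ (g ∘ φ)(x)

Here D(x) is the model's next-token prediction for a prefix x, φ(x) is the DFA state reached after reading x, and g is some function of that state alone.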

Our findings suggest transformers trained through next-token prediction do not reliably learn functions of this form. Instead, they often rely on local statistical correlations that remain effective only within the original training distribution.
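
The factoring condition suggests a direct check: if predictions depend only on the DFA state, then any two prefixes that reach the same state should receive nearly identical predicted distributions. The sketch below measures the worst-case disagreement within each state. The use of total variation distance, the function names, and the input layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def total_variation(p, q):
    """Total variation distance between two {token: prob} distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def within_state_spread(phi, predictions):
    """phi: function mapping a prefix to its DFA state.
    predictions: {prefix: {token: prob}} from the model.

    Returns {state: max pairwise TV distance among prefixes reaching it}.
    A state reached by a single prefix gets a spread of 0.0.
    """
    groups = defaultdict(list)
    for prefix, dist in predictions.items():
        groups[phi(prefix)].append(dist)
    return {
        state: max(
            (total_variation(p, q) for p, q in combinations(dists, 2)),
            default=0.0,
        )
        for state, dists in groups.items()
    }
```

A spread close to zero for every state is what the condition predicts for a model that has learned the rule. A spread that grows with prefix length would point to dependence on something other than the state, such as position or local context.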

Future Work

WHERE THIS
RESEARCH GOES NEXT

The DFA framework creates a controlled environment for studying structural learning and opens several directions for future work surrounding interpretability, inductive bias, and symbolic reasoning.

Context-Free Languages
Mechanistic Interpretability
Negative Samples
Alternative Architectures
State-Sensitive Objectives
Generalization Theory