Empirical CS Research · Bucknell University

DO TRANSFORMERS
LEARN RULES
OR JUST PATTERNS?

An empirical study of whether transformer models learn the structural rules of formal languages or only memorize surface statistics of the training distribution.

Hannah Tran & Chris Mitsch

Formal Languages

DFA-Based Evaluation

Out-of-Distribution Generalization

Central Question

Can transformers learn the actual generative structure of a language, or do they only approximate patterns seen during training?

Transformer architectures tested

OOD

Long-sequence generalization evaluation

DFA

Fully controlled formal language generation

EOS

Primary structural failure point identified

Overview

HIGH ACCURACY
DOES NOT NECESSARILY
MEAN UNDERSTANDING

Transformer models routinely achieve impressive performance on sequence modeling tasks, yet performance metrics alone cannot determine whether these systems genuinely understand the rules governing a structure or merely approximate patterns within a dataset.

This project isolates that distinction through deterministic finite automata, where the true generative rules are fully known. By creating a controlled experimental environment, we can directly observe where transformers succeed, where they fail, and whether those failures reveal deeper architectural limitations.

Research Gap

Prior work often focuses on benchmark accuracy or on theoretical learnability, each in isolation. Few studies combine behavioral evaluation with structural probing to test whether models learn the rules themselves.

This Project Explores

In-distribution vs out-of-distribution performance
EOS prediction failure patterns
Structural generalization breakdown
Whether scaling improves true rule learning

Methodology

A CONTROLLED SYSTEM
WHERE THE TRUTH
IS ALWAYS KNOWN

All datasets were generated using deterministic finite automata, ensuring every sequence has a precisely computable structure. This allows evaluation beyond simple accuracy metrics and enables direct probing of structural understanding.

Language

Regular languages generated through DFA transitions over constrained alphabets.

Positive sequences only
Next-token prediction objective
Controlled sequence lengths
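
The generation setup above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the two-state parity DFA, the alphabet {a, b}, the `<eos>` marker, and the names `accepts`, `sample_positive`, and `make_dataset` are all assumptions made for the example.

```python
import random

EOS = "<eos>"          # assumed end-of-sequence marker
ALPHABET = ["a", "b"]  # assumed alphabet

# Hypothetical DFA: accepts strings containing an even number of 'a's.
DFA = {
    "start": "even",
    "accepting": {"even"},
    "delta": {
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    },
}

def accepts(dfa, symbols):
    """Run the DFA over the symbols and report whether it ends in an accepting state."""
    state = dfa["start"]
    for symbol in symbols:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accepting"]

def sample_positive(dfa, length, rng, max_tries=10_000):
    """Rejection-sample one accepted string with exactly `length` symbols."""
    for _ in range(max_tries):
        candidate = [rng.choice(ALPHABET) for _ in range(length)]
        if accepts(dfa, candidate):
            return candidate
    raise RuntimeError(f"no accepted string of length {length} found")

def make_dataset(dfa, min_len, max_len, per_length, seed=0):
    """Positive sequences only, with a controlled number of samples per length."""
    rng = random.Random(seed)
    data = []
    for n in range(min_len, max_len + 1):
        for _ in range(per_length):
            data.append(sample_positive(dfa, n, rng) + [EOS])
    return data

train = make_dataset(DFA, min_len=1, max_len=6, per_length=5)
```

Because every sequence comes from a known automaton, the DFA state after any prefix can be recomputed exactly. This is what makes the later structural probes possible.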

Models

GPT-style transformer architectures evaluated under identical training conditions.

128 hidden / 2 layers
256 hidden / 4 layers
OOD evaluation up to length 14
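
For a rough sense of how far apart the two configurations are, the sketch below estimates their non-embedding parameter counts. It assumes a standard GPT-2-style block (four attention projections plus an MLP with 4x expansion, ignoring biases and layer norms). The poster does not state these details, so `ModelConfig` and the resulting numbers are an approximation only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    d_model: int   # hidden size
    n_layers: int  # number of transformer blocks

    def approx_block_params(self) -> int:
        # Assumption: 4*d^2 for Q, K, V, and output projections,
        # plus 8*d^2 for an MLP with 4x expansion; biases and norms ignored.
        return 12 * self.n_layers * self.d_model ** 2

SMALL = ModelConfig(d_model=128, n_layers=2)
LARGE = ModelConfig(d_model=256, n_layers=4)

ratio = LARGE.approx_block_params() / SMALL.approx_block_params()
```

Under these assumptions the larger model has roughly eight times the block parameters of the smaller one.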

Evaluation

Structural failure was measured through targeted probing rather than benchmark accuracy alone.

Perplexity analysis
Token-level evaluation
EOS prediction probing
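
As a minimal sketch of two of these measurements, the code below computes perplexity from the probabilities a model assigned to the true tokens, and probes EOS behavior by comparing the model's EOS probability with whether the DFA state after each prefix is accepting. The parity DFA and `oracle_eos_prob` are hypothetical stand-ins. In practice `eos_prob_fn` would wrap a trained model, and the study's exact probing procedure may differ.

```python
import math

def perplexity(true_token_probs):
    """exp of the mean negative log-probability assigned to the true tokens."""
    mean_nll = -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)
    return math.exp(mean_nll)

def _mean(values):
    return sum(values) / len(values) if values else None

def eos_probe(delta, start, accepting, string, eos_prob_fn):
    """Return (mean P(EOS) at accepting prefixes, mean P(EOS) at non-accepting prefixes).

    A model that tracks DFA state should put almost no EOS mass on
    prefixes that end in a non-accepting state.
    """
    legal, illegal = [], []
    state = start
    for i in range(len(string) + 1):
        p_eos = eos_prob_fn(string[:i])
        (legal if state in accepting else illegal).append(p_eos)
        if i < len(string):
            state = delta[(state, string[i])]
    return _mean(legal), _mean(illegal)

# Hypothetical DFA: even number of 'a's.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

def oracle_eos_prob(prefix):
    # Stand-in for a model: a rule-following predictor for the parity language.
    return 0.3 if prefix.count("a") % 2 == 0 else 0.0

legal_mass, illegal_mass = eos_probe(DELTA, "even", {"even"}, "abab", oracle_eos_prob)
```

The probe reports raw probability mass rather than thresholded predictions. Under positive-only training, even a correct model may assign well under 0.5 to EOS in an accepting state, since the string could legitimately continue.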

Experimental Results

Failure Beyond
Distribution

While transformers achieved strong in-distribution validation performance, their performance collapsed on longer out-of-distribution sequences, suggesting reliance on local statistical patterns rather than learned structural rules.

Validation Dynamics

Models appear stable during training, but stability alone does not imply abstraction.

Both classification and next-token models demonstrate smooth validation convergence during training. However, later experiments reveal that this apparent success does not transfer outside the original training distribution.

High in-distribution performance

Validation accuracy exceeds 90%, suggesting the models have fit, and possibly memorized, the training distribution.

Sharp OOD failure

A dramatic perplexity spike appears immediately beyond the training boundary, indicating failure to generalize algorithmically.
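
One way to locate such a spike is to bucket perplexity by sequence length and compare lengths inside and beyond the training range. The sketch below assumes evaluation records of the form (sequence length, probabilities assigned to the true tokens). The record format and the name `perplexity_by_length` are illustrative, and the example values are made up to show the shape of the output, not measured results.

```python
import math
from collections import defaultdict

def perplexity_by_length(records):
    """Map each sequence length to the perplexity over all its tokens.

    records: iterable of (sequence_length, [probability of each true token]).
    """
    nll_sum = defaultdict(float)
    token_count = defaultdict(int)
    for length, probs in records:
        nll_sum[length] += -sum(math.log(p) for p in probs)
        token_count[length] += len(probs)
    return {n: math.exp(nll_sum[n] / token_count[n]) for n in sorted(token_count)}

# Made-up probabilities, for illustration only.
example_records = [
    (2, [0.5, 0.5, 0.5]),
    (2, [0.5, 0.5, 0.5]),
    (3, [0.25, 0.25, 0.25, 0.25]),
]
by_length = perplexity_by_length(example_records)
```

Aggregating at the token level within each length keeps long sequences from being underweighted relative to a per-sequence average.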

Key Observation

PERFORMANCE
IS NOT
UNDERSTANDING

Scaling Analysis

Larger models do not consistently solve the problem.

Increasing parameter count alone fails to guarantee stronger structural generalization. Longer training contexts improve OOD performance more reliably than naive scaling.

EOS prediction collapse

End-of-sequence prediction deteriorates sharply outside the training range.

Perplexity instability

Perplexity rises sharply once sequence lengths exceed those seen during training.

Local token prediction survives

Symbol-level prediction remains relatively stable, suggesting dependence on local statistical cues.

But abstraction still breaks

OOD perplexity spikes indicate that the model does not consistently recover underlying DFA structure.
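
The contrast between stable symbol prediction and failing termination can be made explicit by splitting the loss by token type. The following is a minimal sketch, assuming an `<eos>` marker and per-token probabilities for the true tokens. The function name and the example values are illustrative, not taken from the study.

```python
import math

EOS = "<eos>"  # assumed end-of-sequence marker

def split_nll(tokens, true_token_probs):
    """Return (mean NLL over symbol positions, mean NLL over EOS positions)."""
    symbol_nll, eos_nll = [], []
    for token, p in zip(tokens, true_token_probs):
        (eos_nll if token == EOS else symbol_nll).append(-math.log(p))

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(symbol_nll), mean(eos_nll)

# Made-up values: confident on symbols, very unsure about termination.
symbol_loss, eos_loss = split_nll(["a", "b", EOS], [0.5, 0.5, 0.01])
```

A large gap between the two numbers on long sequences would match the pattern described here: local symbol statistics hold up while the state-dependent decision to stop does not.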

Findings

FAILURE IS
STRUCTURED,
NOT RANDOM

Models performed well on sequences similar to training data, but degraded sharply on longer unseen sequences. The failures consistently concentrated around sequence termination, suggesting the absence of true structural understanding.

Strong ID Performance

Transformers achieved stable perplexity and strong next-token prediction within the training distribution.

OOD Collapse

Performance degraded sharply on longer unseen sequences, indicating failure to extrapolate structural rules.

EOS Failure

End-of-sequence prediction emerged as the primary breakdown point, revealing weak state-sensitive reasoning.

Scaling Wasn't Enough

Larger models improved in-distribution metrics but did not resolve structural generalization failures.
