I'm happy to announce that our paper on Leela's BT4 architecture, the "Chessformer", has been accepted for publication at the International Conference on Learning Representations! (ICLR 2026, see
https://openreview.net/forum?id=2ltBRzEHyd).
To summarize our findings, we use a 64-token encoder-only transformer with a position encoding called the Geometric Attention Bias (previously referred to as Smolgen) to achieve state-of-the-art searchless chess strength and human move-matching with 20x fewer parameters and FLOPs. We also trained transcoders on the MLP activations of these models, finding features corresponding to several nontrivial human chess concepts, such as "a square on b3, b6, f3, or f6 in the opening that has been weakened by the absence of a bishop pawn". The improvement in modeling strong human play is particularly notable: prior approaches required search to model this caliber of play well, but a raw Chessformer is enough to outperform them by up to 5%.
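To give a rough intuition for the idea, here is a minimal numpy sketch of single-head attention over 64 square tokens where an additive bias is generated dynamically from the token embeddings themselves, rather than taken from a fixed positional encoding. This is an illustrative assumption-laden toy, not the paper's implementation: the shapes, the `W_bias` projection, and the single-head setup are all hypothetical simplifications of the actual Geometric Attention Bias.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d_model = 64, 32  # one token per board square (toy width)

x = rng.normal(size=(n_tokens, d_model))              # square embeddings
Wq = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
Wk = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
# Hypothetical projection that lets each square's embedding emit
# one row of a 64x64 additive attention bias.
W_bias = rng.normal(size=(d_model, n_tokens)) / np.sqrt(d_model)

q, k = x @ Wq, x @ Wk
logits = (q @ k.T) / np.sqrt(d_model)                 # content logits (64, 64)
bias = x @ W_bias                                     # position-dependent bias (64, 64)
attn = softmax(logits + bias, axis=-1)                # rows sum to 1
```

The key design point the sketch tries to capture is that the bias term is a function of the current position, so the attention pattern can adapt to the board state instead of being tied to fixed square coordinates.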
We are in the process of open-sourcing the code repository, but for now the big takeaways are:
(1) The Chessformer is capable of achieving state-of-the-art results on a variety of chess modeling tasks, giving it the potential to either greatly increase output quality or greatly decrease resource cost compared to prior architectures.
(2) Strong, searchless, human-like play will now be cheaply available to projects like the Leela Odds bots, which aim to produce results that are strong *and* human-geared. The 2% increase in move-matching accuracy at ratings below 2000 from taking move history into account may also prove useful.
(3) Having interpretable, square-attributable features across a range of skill levels could enable a better understanding of how humans process chess and allow for the design of better features for NNUE architectures.
(4) More broadly, strong tokenization and position-encoding choices are critical to the success of a transformer.