Big transformer 5. Improvements over BT4 include removing the biases from the QKV projections, removing both the centering and the bias terms from the layer normalizations, and replacing the previous "smolgen" module with a new relative position encoding. Training started in late June 2024 and is expected to take 6 months. The network has 15 layers with an embedding size of 1024, 32 heads per layer, and a feed-forward (dff) size of 4096, roughly tripling the dff size over BT4 and doubling the overall model size. A minimal sketch of one such layer appears below.
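The following is a minimal sketch, in PyTorch, of a single encoder layer with the settings described above: bias-free QKV projections, RMS-style normalization (no centering, no bias), 1024 embedding size, 32 heads, and a 4096 dff. The block layout (pre-norm residual blocks), class names, and the omission of the relative position encoding are assumptions for illustration only; this is not the actual BT5 training code, whose details may differ.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Normalization without centering and without a bias term (gain only)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.scale


class EncoderLayer(nn.Module):
    """Illustrative BT5-style block: bias-free QKV, RMS-style norms, 4096 dff."""

    def __init__(self, d_model: int = 1024, n_heads: int = 32, d_ff: int = 4096):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # QKV and output projections carry no bias terms
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.norm1 = RMSNorm(d_model)
        self.norm2 = RMSNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 64 board squares, d_model)
        b, t, d = x.shape
        h = self.norm1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # the new relative position encoding would act on the attention
        # logits here; omitted in this sketch
        attn = nn.functional.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        x = x + self.proj(attn)
        x = x + self.ffn(self.norm2(x))
        return x


# 15 layers stacked, matching the stated depth
body = nn.Sequential(*[EncoderLayer() for _ in range(15)])
```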