Topic: Hauptforen / CSS-Forum / Scaling of BT3 and BT4 versus SF
- - By Lothar Jung Date 2024-04-29 08:57 Upvotes 1
No news, except that BT3 and BT4 may need at least 150 knpm to scale better than SF.

https://discord.com/channels/425419482568196106/539960268982059008/1234354466199830610

https://cdn.discordapp.com/attachments/539960268982059008/1234354699608522853/image.png?ex=66306de5&is=662f1c65&hm=e5e173ba4184358489fa8693ed7a7affb67a7fe64d0479f8da887444e3246099&
Parent - - By Geri Schmidt Date 2024-05-18 02:32
Lothar Jung wrote:

No news, except that BT3 and BT4 may need at least 150 knpm to scale better than SF.

https://discord.com/channels/425419482568196106/539960268982059008/1234354466199830610

https://cdn.discordapp.com/attachments/539960268982059008/1234354699608522853/image.png?ex=66306de5&is=662f1c65&hm=e5e173ba4184358489fa8693ed7a7affb67a7fe64d0479f8da887444e3246099&


How is the BT4 net actually trained? The training client automatically downloads the latest 768x15 net.
Parent - By Lothar Jung Date 2024-05-18 07:57 Edited 2024-05-18 08:00
Here is the link to the training page:

https://training.lczero.org/

This is how the engines and nets are announced on Discord:

Big transformer 3. New network architecture using smolgen-augmented self-attention from BT2. It has embedding size 768, FFN projection size 1024, 24 heads per layer, and 15 smolgen encoder layers with mish activation. There are also CUDA optimizations available, which should reduce latency by 10 to 15%. It has 3 policy heads: vanilla, optimistic, and soft. Vanilla and optimistic can be used for play, while soft helps speed up training. The optimistic policy head improves policy predictions drastically in tactical positions. It has 3 value heads: winner, q, and st. The winner head is trained on game outcome, while the q head is trained on position Q-value. The ST value head is a weighted average of short-term future value from the current position. The ideas are from KataGo's methods: https://github.com/lightvector/KataGo/blob/master/docs/KataGoMethods.md#optimistic-policy.

Quickstart:
The new engine version can be found at https://github.com/Ergodice/lc0/tree/master
Executable files can be found at https://ci.appveyor.com/project/Ergodice/lc0/builds/48346860
BT3 can be found at https://storage.lczero.org/files/networks-contrib/BT3-768x15x24h-swa-2790000.pb.gz

To enable all the new features, put the following in the config file:

```
--uncertainty-weighting-cap=1.03
--uncertainty-weighting-coefficient=0.13
--uncertainty-weighting-exponent=-1.76
--use-uncertainty-weighting=true
```
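The announcement does not spell out how these three numbers combine. Purely as a guess from the flag names, they suggest a power-law weight in the value-head uncertainty, clipped at the cap; the real formula lives in the lc0 source, so treat this sketch as an illustration of the parameter roles only:

```python
def uncertainty_weight(u, cap=1.03, coeff=0.13, exponent=-1.76):
    """Hypothetical shape implied by the flag names: a power law in the
    prediction uncertainty u, clipped at the cap. This is a guess, not
    the actual lc0 formula; defaults mirror the config values above."""
    return min(cap, coeff * u ** exponent)

# With a negative exponent, low uncertainty drives the weight up until
# the cap takes over; at u = 1 the weight is just the coefficient.
print(uncertainty_weight(0.01))  # 1.03 (capped)
print(uncertainty_weight(1.0))   # 0.13
```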

If you are using a single GPU, add

```
--backend-opts=policy_head=vanilla,value_head=winner
```

Otherwise, check the GitHub repository for instructions.
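The same options can also be passed directly on the command line instead of through the config file. A sketch of a full invocation, assuming the BT3 network file linked above sits next to the binary (the filename and flag placement are my assumptions, not from the announcement):

```
./lc0 --weights=BT3-768x15x24h-swa-2790000.pb.gz \
      --use-uncertainty-weighting=true \
      --uncertainty-weighting-cap=1.03 \
      --uncertainty-weighting-coefficient=0.13 \
      --uncertainty-weighting-exponent=-1.76 \
      --backend-opts=policy_head=vanilla,value_head=winner
```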

YAML specification at https://discord.com/channels/425419482568196106/1192835078099828895

Last updated: 3/7/2024
Parent - - By Max Siegfried Date 2024-05-18 09:10
How is the BT5 net actually trained?
Parent - By Lothar Jung Date 2024-05-18 09:30 Edited 2024-05-18 09:59
Like the BT3 net.

Big transformer 4. New network architecture which builds off of BT3 by adding two types of auxiliary heads, future heads and categorical value heads. The categorical value heads predict a distribution over values of q rather than a WDL outcome distribution, and the future heads predict the moves that will be played over the next two plies. The hope is that these heads will give additional information to the net to improve training speed. We've also fixed half-precision training, so this model will be larger. BT4 training started in mid-October and is expected to take a few months. It has 15 layers with 1024 embedding size, 32 heads per layer, and 1536 dff size, for roughly a doubling in size over BT3. Training logs can be accessed at https://leela.plutie.ca/
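The dimensions quoted for BT3 above and BT4 here allow a rough sanity check of the "roughly a doubling in size" claim. This back-of-the-envelope estimate is my own: it counts only the four attention projections (4·d²) and the feed-forward block (2·d·dff) per layer, ignoring smolgen, the extra heads, embeddings, and biases:

```python
def transformer_params(d_model, d_ff, n_layers):
    """Rough parameter count: 4*d^2 for the Q/K/V/output projections
    plus 2*d*d_ff for the feed-forward block, per layer. Ignores
    smolgen, auxiliary heads, embeddings, and biases."""
    attn = 4 * d_model * d_model
    ffn = 2 * d_model * d_ff
    return n_layers * (attn + ffn)

bt3 = transformer_params(768, 1024, 15)
bt4 = transformer_params(1024, 1536, 15)
print(f"BT3 ~{bt3 / 1e6:.0f}M, BT4 ~{bt4 / 1e6:.0f}M, ratio {bt4 / bt3:.2f}")
# → BT3 ~59M, BT4 ~110M, ratio 1.87
```

A ratio of about 1.87 under these simplifications is consistent with "roughly a doubling" over BT3.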

YAML specification at https://discord.com/channels/425419482568196106/1192835078099828895

Quickstart:
Network: https://plutie.ca/lc0/BT4-1024x15x32h-swa-6147500.pb.gz
Binary: https://ci.appveyor.com/project/Ergodice/lc0/builds/48488418

Powered by mwForum 2.29.3 © 1999-2014 Markus Wichitill