Up Topic Hauptforen / CSS-Forum / LC0 BT5
- By Max Siegfried Date 2024-03-21 12:08
Here is some information a friend sent me, hence it is in English:

Big Transformer 5: tentative network architecture. The only real architectural changes we have so far are removing the biases in the QKV projections and replacing the layer normalizations with RMS norms, which improves training speed by about 10%. BT5 is likely to be substantially larger than BT4.
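
To make those two changes concrete, here is a minimal PyTorch sketch (module names and dimensions are illustrative and not taken from the actual Lc0/BT5 training code):

    import torch
    import torch.nn as nn

    class RMSNorm(nn.Module):
        # RMS norm: rescale by the root mean square with a learned gain.
        # Unlike LayerNorm there is no mean subtraction and no bias term,
        # which makes it slightly cheaper to compute.
        def __init__(self, dim, eps=1e-6):
            super().__init__()
            self.eps = eps
            self.weight = nn.Parameter(torch.ones(dim))

        def forward(self, x):
            rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
            return x * rms * self.weight

    embed_dim = 768                     # illustrative width, not the BT5 value
    norm = RMSNorm(embed_dim)
    # QKV projection without bias terms, as described above
    qkv_proj = nn.Linear(embed_dim, 3 * embed_dim, bias=False)

    x = torch.randn(1, 64, embed_dim)   # 64 squares of one chess position
    q, k, v = qkv_proj(norm(x)).chunk(3, dim=-1)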

Happy to present some new things ready for BT5. The first is relative positional encodings (RPE, https://arxiv.org/pdf/1803.02155.pdf), which, when replacing smolgen, add roughly 0.5% policy accuracy without much throughput loss (in the attached plot, the green line is a compact version of RPE with 15x15 params per channel, blue is a larger version with 64x64 params per channel, and orange is just smolgen). We're also finding that large MLP hidden depths greatly improve performance when combined with RPE, consistent with other transformers; why large hidden depths didn't work before we have yet to explain. We've also been experimenting with mixture of experts (specifically expert choice, https://arxiv.org/pdf/2202.09368v1.pdf) and found around a 1% gain without an increase in FLOPs.
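
A note on the RPE numbers: on an 8x8 board the relative (rank, file) offset between any two squares falls in a 15x15 grid, which is presumably what the compact "15x15 params per channel" variant indexes, while 64x64 would be a full per-square-pair table. Here is a hedged PyTorch sketch of such a learned attention-logit bias (per head rather than per channel, purely for brevity; nothing here is taken from the Lc0 code):

    import torch
    import torch.nn as nn

    class RelativePositionBias2D(nn.Module):
        # Learned attention-logit bias indexed by the relative (rank, file)
        # offset between query and key squares of an 8x8 board.
        def __init__(self, num_heads, board_size=8):
            super().__init__()
            span = 2 * board_size - 1                      # 15 offsets per axis
            self.table = nn.Parameter(torch.zeros(num_heads, span, span))
            # For every (query square, key square) pair, precompute the index
            # of its relative offset into the flattened 15x15 table.
            coords = torch.stack(torch.meshgrid(
                torch.arange(board_size), torch.arange(board_size), indexing="ij"
            ), dim=-1).reshape(-1, 2)                      # (64, 2)
            rel = coords[:, None, :] - coords[None, :, :]  # (64, 64, 2) in [-7, 7]
            rel = rel + board_size - 1                     # shift to [0, 14]
            self.register_buffer("index", rel[..., 0] * span + rel[..., 1])

        def forward(self, attn_logits):
            # attn_logits: (batch, heads, 64, 64)
            bias = self.table.flatten(1)[:, self.index]    # (heads, 64, 64)
            return attn_logits + bias.unsqueeze(0)

And a similarly hedged sketch of the expert-choice routing from the cited paper, in which each expert selects its own top-k tokens, so load is balanced by construction and per-token compute stays fixed. Whether and where BT5 applies this is not stated in the post; the expert count and capacity below are made up:

    class ExpertChoiceFFN(nn.Module):
        # Expert-choice MoE applied to the feed-forward block (illustrative).
        def __init__(self, dim, hidden, num_experts=4, capacity=16):
            super().__init__()
            self.capacity = capacity
            self.router = nn.Linear(dim, num_experts, bias=False)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
                for _ in range(num_experts)
            ])

        def forward(self, x):
            # x: (tokens, dim), e.g. the 64 squares of one position
            scores = self.router(x).softmax(dim=-1)           # token-to-expert affinity
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                gate, idx = scores[:, e].topk(self.capacity)  # expert e picks its tokens
                out[idx] += gate.unsqueeze(-1) * expert(x[idx])
            return out

With 64 tokens, 4 experts, and a capacity of 16 each, every token is processed by one expert on average, which is the sense in which the quoted gain comes without a FLOPs increase.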

Will it be possible to push the policy accuracy to 100%?