- - By Lothar Jung Date 2023-06-30 21:38 Upvotes 1
Here is the current blog post:

Big Transformer 3 (BT3). Tentative new network architecture using smolgen-augmented self-attention (see BT2). Will probably switch the main activation from squared ReLU to Mish. The plan is to have embedding size 1024, FFN projection size 1024, 32 heads per layer, and 10 total layers. We have also found a way of preprocessing the inputs that prevents early layers from doing nothing, which should slightly improve performance. Removing layer norms after attention also improves performance and slightly decreases latency. Experiments are currently in progress to quantize the dense layers to int8 precision to improve speed. There are also CUDA optimizations available, which should reduce latency by 10 to 15%.
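
To make the quoted figures and the activation switch a bit more concrete, here is a minimal sketch in plain Python. The constant and function names are my own for illustration and are not taken from the Lc0 code base. The per-head width assumes the embedding is split evenly across the heads, which the blog post does not state.

```python
import math

# Planned sizes as quoted in the blog post (tentative, not final).
EMBEDDING_SIZE = 1024
FFN_SIZE = 1024
NUM_HEADS = 32
NUM_LAYERS = 10

# Assumption: the embedding is divided evenly across the attention heads.
HEAD_DIM = EMBEDDING_SIZE // NUM_HEADS


def square_relu(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2 (the current main activation)."""
    return max(0.0, x) ** 2


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish: x * tanh(softplus(x)) (the proposed replacement)."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    print("head dim under the even-split assumption:", HEAD_DIM)
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  square_relu={square_relu(x):.4f}  mish={mish(x):.4f}")
```

One visible difference: squared ReLU is exactly zero for all negative inputs, while Mish lets small negative values through.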

This announces several performance improvements.
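
For anyone wondering what quantizing dense layers to int8 means in practice, here is a rough sketch of one common scheme: symmetric per-tensor quantization. The blog post does not say which scheme the Lc0 developers are experimenting with, so this is a generic illustration with made-up function names, not their implementation.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to scale 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


def int8_dot(x_codes, x_scale, w_codes, w_scale):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    acc = sum(a * b for a, b in zip(x_codes, w_codes))
    return acc * x_scale * w_scale


if __name__ == "__main__":
    x = [1.0, 2.0, -3.0]
    w = [0.5, -1.0, 0.25]
    xq, xs = quantize_int8(x)
    wq, ws = quantize_int8(w)
    print("float result:", sum(a * b for a, b in zip(x, w)))
    print("int8 result: ", int8_dot(xq, xs, wq, ws))
```

The speed gain comes from doing the multiply-accumulate work in small integers; the price is a small rounding error per value, at most half a quantization step.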
Parent - - By Daniel Reist Date 2023-07-02 12:39 Upvotes 1
Hello Lothar,

Thanks for the information.
Is there a BT3 net yet?
Parent - By Lothar Jung Date 2023-07-02 12:46 Upvotes 1
Hello Daniel,

No, not yet.
It should arrive this summer, though.

Progress on the quality of network generation is sluggish.
Stockfish 16 has set a high benchmark, after all.
However, GPUs get significantly more powerful every two years.


