Up Topic Hauptforen / CSS-Forum / Alpha Go Zero - Algorithmus, Programmier-Tricks
- By Guenter Stertenbrink Date 2018-04-22 04:41 Upvotes 1
here is a good description of AlphaGo Zero in English:
Alpha (Chess) Zero presumably uses the same "tricks".
Nothing about this in the AlphaZero paper?!?
No search hits there for "residual", "headed", "multi", "lookahead", "2048",

---------------------------------
[in AlphaGo Zero]
they trained networks that could look at:
- The current board position
- Which player was playing
- The sequence of recent moves (necessary to rule out certain moves as illegal)
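A minimal sketch of how such inputs might be encoded as stacked binary feature planes. The exact plane layout below (8 history steps, one colour-to-move plane) is an assumption for illustration, not taken from the quoted text:

```python
import numpy as np

def encode_position(board_history, current_player, board_size=19, history_len=8):
    """Stack binary planes: one pair (own/opponent stones) per history step,
    plus one constant plane marking whose turn it is.
    board_history: list of (own_stones, opp_stones) boolean arrays, most recent first.
    """
    planes = []
    for i in range(history_len):
        if i < len(board_history):
            own, opp = board_history[i]
        else:
            # pad missing history with empty boards
            own = opp = np.zeros((board_size, board_size), dtype=np.float32)
        planes.append(own.astype(np.float32))
        planes.append(opp.astype(np.float32))
    # colour-to-move plane: all ones for black, all zeros for white (assumption)
    planes.append(np.full((board_size, board_size),
                          float(current_player == "black"), dtype=np.float32))
    return np.stack(planes)  # shape: (2*history_len + 1, board_size, board_size)
```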
---------------------------
The neural network was trained to play moves that reflected the improved evaluations
from performing the "lookahead" search.
[= "analysing" the self-play games??]
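One common way to turn the lookahead results into a training target is to normalise the MCTS visit counts into a probability distribution that the network learns to imitate. A small sketch (the temperature parameter is an assumption):

```python
import numpy as np

def mcts_policy_target(visit_counts, temperature=1.0):
    """Turn MCTS visit counts into the probability distribution the policy
    is trained to imitate (the 'improved evaluation' from lookahead)."""
    counts = np.asarray(visit_counts, dtype=np.float64) ** (1.0 / temperature)
    return counts / counts.sum()
```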
------------------------------------------------------
Trick #2: the "two-headed" neural network. This is quite unusual.
This trick accounted for half of AlphaGo Zero's increase in playing strength over AlphaGo.
(This trick is known more technically as Multi-Task Learning with Hard Parameter Sharing;
Sebastian Ruder has a great overview here.)
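The "hard parameter sharing" idea can be sketched in a few lines: one shared trunk feeds two separate output heads, so both tasks train the same representation. This is a plain-numpy toy, not the real network (which is a deep residual convolutional net):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TwoHeadedNet:
    """Hard parameter sharing: one trunk feeds both a policy head and a value head."""
    def __init__(self, n_in, n_hidden, n_moves):
        self.W_trunk = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W_policy = rng.normal(0, 0.1, (n_hidden, n_moves))
        self.W_value = rng.normal(0, 0.1, (n_hidden, 1))

    def forward(self, x):
        h = relu(x @ self.W_trunk)            # shared representation
        logits = h @ self.W_policy
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()                # move probabilities (policy head)
        value = np.tanh(h @ self.W_value)[0]  # scalar in [-1, 1] (value head)
        return policy, value
```

Gradients from both heads flow into `W_trunk`, which is what makes the two tasks regularise each other.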
----------------------------------
Trick #3: “Residual” Nets [pioneered 2015]
----------------------------------------
Each of these two neural-network-related tricks (switching from a convolutional
to a residual architecture, and using the "Two Headed Monster" neural network
architecture instead of separate neural networks) would on its own have produced
about half of the increase in playing strength that was achieved when both
were combined.
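The residual trick from Trick #3 reduces to adding the block's input back onto its output, so each block only has to learn a correction. A toy dense version (the actual AlphaGo Zero blocks use convolutions and batch normalisation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)): the skip connection lets the block learn only the
    residual F, which is what makes very deep stacks trainable (He et al., 2015)."""
    return relu(x + relu(x @ W1) @ W2)
```

Note that with all-zero weights the block degenerates to `relu(x)`, i.e. near-identity: a fresh residual block does not disturb the signal, unlike a fresh plain layer.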
------------------------------------------------------
3. As these self-play games are happening, sample 2,048 positions from the
most recent 500,000 games, along with whether the game was won or lost.
For each move, record both A) the results of the MCTS evaluations of those
positions — how “good” the various moves in these positions were based on
lookahead — and B) whether the current player won or lost the game.
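Step 3 amounts to a replay buffer over recent games. A minimal sketch, assuming each record is a (position, MCTS policy, outcome-for-player-to-move) triple; the class name and record layout are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Keep positions from the most recent self-play games and sample training
    minibatches of (position, MCTS policy, game outcome) triples."""
    def __init__(self, max_games=500_000):
        self.games = deque(maxlen=max_games)  # old games drop off automatically

    def add_game(self, records):
        """records: list of (position, mcts_policy, outcome_for_player_to_move)."""
        self.games.append(records)

    def sample(self, batch_size=2048):
        pool = [rec for game in self.games for rec in game]
        return random.sample(pool, min(batch_size, len(pool)))
```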
-------------------------------------------------------
- - By Guenter Stertenbrink Date 2018-05-04 01:44
There are two things Leela tries to predict when presented with a chess position:
the outcome of the game (1, 1/2, 0), the "value prediction",
and the probability distribution over the possible moves from that position, the "policy prediction".

The training data consist of a set of positions filtered out of self-play games,
together with these two pieces of additional information for each position.

This is how the value prediction and the policy prediction are improved.

When Leela traverses the search tree and hits a leaf node, it runs a value prediction
on the leaf position, which is then backed up to the root position to update the
probability distribution. Also, plain MCTS traverses the game tree at random,
but Leela uses its policy prediction to choose more intelligently which moves to explore.
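The two mechanisms described above, policy-guided move selection and value backup to the root, can be sketched as follows. This is a PUCT-style selection rule as commonly used in AlphaZero-family engines; the constant `c_puct` and the class layout are illustrative, not Leela's actual code:

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # policy-prediction probability for the move into this node
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Instead of random MCTS playouts, prefer moves with high value estimates,
    biased toward moves the policy prediction considers likely (PUCT rule)."""
    total = math.sqrt(node.visits + 1)
    return max(
        node.children.items(),
        key=lambda mc: mc[1].q() + c_puct * mc[1].prior * total / (1 + mc[1].visits),
    )

def backup(path, leaf_value):
    """Propagate the value prediction at the leaf back to the root,
    flipping the sign at each ply because the players alternate."""
    v = leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += v
        v = -v
```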
Parent - By Peter Martan Date 2018-05-04 08:37
Thanks, Günter!