After the Move training phase there can be a Game training phase using self-play.
It would start from the then-existing set of weights.
neural net, MCTS
It seems that only the "network weights" parameters are updated [how many?];
the games themselves are not stored.
Instead of a handcrafted evaluation function and move-ordering heuristics, AlphaZero
utilises a deep neural network (p, v) = f_θ(s) with parameters θ.
This neural network takes the board position s as an input and outputs a vector of
move probabilities p with components p_a = Pr(a|s) for each action a, and a scalar
value v estimating the expected outcome z from position s, v ≈ E[z|s].
AlphaZero learns these move probabilities and value estimates entirely from self-play;
these are then used to guide its search.
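As a toy illustration of the (p, v) = f_θ(s) interface (not DeepMind's actual architecture, which is a deep residual convolutional network; the layer sizes and parameter names here are assumptions), a one-hidden-layer version in numpy:

```python
import numpy as np

def init_params(n_features, n_hidden, n_actions, seed=0):
    """Randomly initialised parameters theta (toy stand-in for the deep net)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_features)),
        "Wp": rng.normal(0, 0.1, (n_actions, n_hidden)),  # policy head
        "Wv": rng.normal(0, 0.1, (1, n_hidden)),          # value head
    }

def f_theta(theta, s):
    """(p, v) = f_theta(s): move probabilities p and a scalar value v."""
    h = np.tanh(theta["W1"] @ s)
    logits = theta["Wp"] @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax: p_a = Pr(a | s)
    v = np.tanh(theta["Wv"] @ h)[0]   # v ~ E[z | s], squashed into (-1, 1)
    return p, v
```

The two output heads share the hidden representation h, mirroring how the real network shares a trunk between policy and value.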
The parameters θ of the deep neural network in AlphaZero are trained by self-play
reinforcement learning, starting from randomly initialised parameters θ.
The neural network parameters θ are updated so as to minimise the
error between the predicted outcome v_t and the game outcome z,
and to maximise the similarity of the policy vector p_t to the search probabilities π_t.
The updated parameters are used in subsequent games of self-play.
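The paper combines these two objectives (plus L2 regularisation) into the loss l = (z − v)² − π^T log p + c‖θ‖². A minimal numpy sketch of that per-position loss; the value c = 1e-4 is an assumed regularisation constant, not taken from the source:

```python
import numpy as np

def alphazero_loss(p, v, pi, z, theta, c=1e-4):
    """Per-position loss: value MSE + policy cross-entropy + L2 penalty."""
    value_loss = (z - v) ** 2                      # (z - v)^2
    policy_loss = -np.dot(pi, np.log(p + 1e-12))   # -pi^T log p
    l2 = c * sum(np.sum(W ** 2) for W in theta.values())
    return value_loss + policy_loss + l2
```

Note that minimising the cross-entropy term is what "maximise the similarity of p_t to π_t" means in practice.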
Self-play games are generated by using the latest parameters for this neural network,
omitting the evaluation step and the selection of best player.
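Schematically, that loop looks as follows (a toy sketch, not the real pipeline: `self_play_game` and `update` are hypothetical stand-ins for MCTS self-play and the gradient step on the value/policy loss):

```python
import random

def self_play_game(theta):
    """Hypothetical stand-in: play one game guided by theta,
    returning (states, search policies pi, final outcome z)."""
    states = [random.random() for _ in range(5)]
    pis = [[0.5, 0.5] for _ in states]
    z = random.choice([-1, 1])
    return states, pis, z

def update(theta, games):
    """Hypothetical stand-in for the gradient step on the loss."""
    return theta + 0.01 * sum(z for _, _, z in games) / len(games)

def train(theta=0.0, n_iterations=3, games_per_iter=4):
    for _ in range(n_iterations):
        # always self-play with the LATEST parameters --
        # no evaluation match, no "best player" selection
        games = [self_play_game(theta) for _ in range(games_per_iter)]
        theta = update(theta, games)
    return theta
```

The point of the sketch is the control flow: every iteration generates games with the newest θ, unlike AlphaGo Zero's candidate-vs-best gating.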
-------------------------------
https://github.com/suragnair/alpha-zero-general/tree/master/pytorch_classification
they have ~20M parameters (1M-70M)
https://github.com/crypt3lx2k/Zerofish
Currently uses a completely different model than the one from the paper!
This model has a very different layout than the one from the paper.
Significantly reduced number of parameters in value and policy output heads.
https://www.reddit.com/r/chess/comments/7igro1/alphazero_reactions_from_top_gms_stockfish_author/
Harawaldr wrote:
It is also worth noting that their approach was inefficient to say the least,
because they intended to prove the general purpose quality of their method.
Whoever adapts the A0 method for chess, with the intention of creating the
best possible engine, will optimise hyperparameters for chess,
they will use tablebases, they will experiment with different state representations.
I hypothesise that the chess world will see software based on the A0 architecture,
but better, and runnable on high end consumer hardware within three years.
https://www.reddit.com/r/MachineLearning/comments/7qdwb5/is_it_possible_to_train_the_value_output_in_a/
Zeta_36
picardythird wrote:
my own chess implementation of AGZ
https://github.com/trebledawson/chess
my neural network was too small; increased the number of parameters
until it was as large as my computer could store.
zeta_36 wrote: I really think DeepMind did something they didn't explain
thousands of parameters
https://sjeng.org/leela.html
Leela Go FAQ
no new version since Oct.