After the Move training phase there can be a Game training phase using self-play.
It would start from the then-existing set of weights.
neural net, MCTS
It seems that only the "network weights" parameters are updated [how many?];
the games themselves are not stored.
Instead of a handcrafted evaluation function and move-ordering heuristics, AlphaZero
utilises a deep neural network (p, v) = f_θ(s) with parameters θ.
This neural network takes the board position s as an input and outputs a vector of
move probabilities p with components p_a = Pr(a|s) for each action a, and a scalar
value v estimating the expected outcome z from position s, v ≈ E[z|s].
AlphaZero learns these move probabilities and value estimates entirely from self-play;
these are then used to guide its search.
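As a toy illustration of the (p, v) = f_θ(s) interface (not DeepMind's actual architecture, which is a deep residual convolutional network; the layer sizes and parameter names here are assumptions), a one-hidden-layer version in numpy:

```python
import numpy as np

def init_params(n_features, n_hidden, n_actions, seed=0):
    """Randomly initialised parameters theta (toy stand-in for the deep net)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_features)),
        "Wp": rng.normal(0, 0.1, (n_actions, n_hidden)),  # policy head
        "Wv": rng.normal(0, 0.1, (1, n_hidden)),          # value head
    }

def f_theta(theta, s):
    """(p, v) = f_theta(s): move probabilities p and a scalar value v."""
    h = np.tanh(theta["W1"] @ s)
    logits = theta["Wp"] @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax: p_a = Pr(a | s)
    v = np.tanh(theta["Wv"] @ h)[0]   # v ~ E[z | s], squashed into (-1, 1)
    return p, v
```

The two output heads share the hidden representation h, mirroring how the real network shares a trunk between policy and value.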
The parameters θ of the deep neural network in AlphaZero are trained by self-play
reinforcement learning, starting from randomly initialised parameters θ.
The neural network parameters θ are updated so as to minimise the
error between the predicted outcome v_t and the game outcome z,
and to maximise the similarity of the policy vector p_t to the search probabilities π_t.
The updated parameters are used in subsequent games of self-play.
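The paper combines these two objectives (plus L2 regularisation) into the loss l = (z − v)² − π^T log p + c‖θ‖². A minimal numpy sketch of that per-position loss; the value c = 1e-4 is an assumed regularisation constant, not taken from the source:

```python
import numpy as np

def alphazero_loss(p, v, pi, z, theta, c=1e-4):
    """Per-position loss: value MSE + policy cross-entropy + L2 penalty."""
    value_loss = (z - v) ** 2                      # (z - v)^2
    policy_loss = -np.dot(pi, np.log(p + 1e-12))   # -pi^T log p
    l2 = c * sum(np.sum(W ** 2) for W in theta.values())
    return value_loss + policy_loss + l2
```

Note that minimising the cross-entropy term is what "maximise the similarity of p_t to π_t" means in practice.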
Self-play games are generated by using the latest parameters for this neural network,
omitting the evaluation step and the selection of best player.
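Schematically, that loop looks as follows (a toy sketch, not the real pipeline: `self_play_game` and `update` are hypothetical stand-ins for MCTS self-play and the gradient step on the value/policy loss):

```python
import random

def self_play_game(theta):
    """Hypothetical stand-in: play one game guided by theta,
    returning (states, search policies pi, final outcome z)."""
    states = [random.random() for _ in range(5)]
    pis = [[0.5, 0.5] for _ in states]
    z = random.choice([-1, 1])
    return states, pis, z

def update(theta, games):
    """Hypothetical stand-in for the gradient step on the loss."""
    return theta + 0.01 * sum(z for _, _, z in games) / len(games)

def train(theta=0.0, n_iterations=3, games_per_iter=4):
    for _ in range(n_iterations):
        # always self-play with the LATEST parameters --
        # no evaluation match, no "best player" selection
        games = [self_play_game(theta) for _ in range(games_per_iter)]
        theta = update(theta, games)
    return theta
```

The point of the sketch is the control flow: every iteration generates games with the newest θ, unlike AlphaGo Zero's candidate-vs-best gating.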
-------------------------------
https://github.com/suragnair/alpha-zero-general/tree/master/pytorch_classification
they have ~20M parameters (1M-70M)
https://github.com/crypt3lx2k/Zerofish
Currently uses a completely different model than the one from the paper!
This model has a very different layout than the one from the paper.
Significantly reduced number of parameters in value and policy output heads.
https://www.reddit.com/r/chess/comments/7igro1/alphazero_reactions_from_top_gms_stockfish_author/
Harawaldr wrote:
It is also worth noting that their approach was inefficient to say the least,
because they intended to prove the general purpose quality of their method.
Whoever adapts the A0 method for chess, with the intention of creating the
best possible engine, will optimise hyperparameters for chess,
they will use tablebases, they will experiment with different state representations.
I hypothesise that the chess world will see software based on the A0 architecture,
but better, and runnable on high end consumer hardware within three years.
https://www.reddit.com/r/MachineLearning/comments/7qdwb5/is_it_possible_to_train_the_value_output_in_a/
Zeta_36
picardythird wrote:
my own chess implementation of AGZ
https://github.com/trebledawson/chess
my neural network was too small; increased the number of parameters
until it was as large as my computer could store.
zeta_36 wrote: I really think DeepMind did something they didn't explain
thousands of parameters
https://sjeng.org/leela.html
Leela Go FAQ
no new version since Oct.