Comments
to the Swedish Rating List 1/2022, January 6
This last list of 2021 and first list of 2022 is dedicated to the memory of the late Guy Haworth.
On this New Year's list, we can present seven new programs on our two hardware levels.
First out is Mark Lefler and Larry Kaufman's latest version of Dragon Komodo, named 2.51. We have initially tested the non-MCTS version, and after the first 240 games it has reached a rating of 3569, which is 22 points stronger than the last non-MCTS Dragon Komodo we tested. It is now just four points behind the leader from the last rating list, Lc0, and nine points below Stockfish 13, which has now taken over the leading spot in the rating list. Dragon Komodo 2.51 is more than 100 points stronger than the best non-NNUE version of Komodo we have tested! As before, we have used Erdogan Gunes's opening book "out10-35.bin" for the testing of Dragon Komodo.
Next out is the latest creation from team Stockfish, namely Stockfish 14. After the first 202 games it has reached a rating of 3554, which is 24 points behind Stockfish 13 in first place and 4 points behind Stockfish 12. For the testing of Stockfish 14 we have used Fauzi Dabat's opening book "Aggressive 5.0 by Fauzi.abk". More games will probably be needed to stabilise the rating against the other engines at the top, lower the error bars, and see whether it can overtake the older versions as more games are played.
We have also tested two new engines from Jon Dart: Arasan 22.3 and the NNUE version, Arasan 23.01. For the testing of these two engines we have used Arasan's own opening book. Arasan 22.3 has reached a rating of 3363 after the first 240 games, which is 78 points stronger than the last Arasan version we tested, 21.2. The NNUE version, Arasan 23.01, gains a further 74 points on that, resulting in a rating of 3437 after the first 240 games.
We also have the latest (and last) free Pedone version in this rating list: Pedone 3.1 by Fabio Gobbato. Like version 3.0 of the same program, it uses NNUE. We have tested it on both our hardware levels. On the 1800X, Pedone 3.1 has reached a rating of 3423 after 220 games; on the Q6600, it has achieved a rating of 3376 after 260 games. The difference between the two hardware levels is 47 points. Pedone 3.1 on the 1800X is 37 points stronger than Pedone 3.0 on the same hardware, and on the Q6600 the difference between Pedone 3.1 and 3.0 is 53 points. We have used Pedone's own opening book for the testing of Pedone 3.1.
We can also present a rating for Alex Morozov's latest Booot program, Booot 6.5, which we have tested on two of our hardware levels. On the 1800X, Booot 6.5 has received a rating of 3432 after 200 games; on the Q6600, it has reached a rating of 3359 after 240 games. The difference between the two hardware levels is 73 points. Version 6.5 is 64 points ahead of the previously tested Booot 6.4 on the 1800X, and on the Q6600 the difference between Booot 6.5 and Booot 6.3.1 is 68 points. We have used Sedat Canbaz's "Perfect2021.abk" for the testing of Booot 6.5.
We are also able to welcome a Swedish newcomer to this rating list, probably the first Swedish program in the list since Per Ola Valfridsson's Ruffian. It is Martin Danielsson's engine Marvin 5.1.0, which we have tested on both our hardware levels, using Marvin's own opening book. Since version 5.0.0, Marvin is an NNUE program. On the 1800X, Marvin 5.1.0 has reached a rating of 3324 after 240 games; on the Q6600, it has gotten a rating of 3225 after the first 147 games. The difference between the two hardware levels is just shy of 100 points.
Last, but not least, are the two Wasp programs from John Stanback. The first is Wasp 4.5, which uses an ordinary search and has received a rating of 3264 after the first 140 games, nine points ahead of the previously tested Wasp 4. Wasp 5.0 introduces NNUE, which has proven valuable for Wasp (as for other NNUE engines): Wasp 5.0 on the 1800X has received a rating of 3378 after the first 120 games. Although it is early in the testing, the difference between the two versions already stands at 114 points!
By Andreas Mader
Date 2022-01-08 19:42
Edited 2022-01-08 19:59
Benno Hartwig wrote:
What should one actually make of this strange ordering of the Stockfish versions:
<code>No Program               Hardware        Elo   +   -  Games  Score  Opp.
 1 Stockfish 13 x64      1800X 3.6 GHz  3578  42 -38    360    71%  3424
 4 Stockfish 12 NNUE x64 1800X 3.6 GHz  3558  30 -29    560    62%  3471
 5 Stockfish 14 x64      1800X 3.6 GHz  3554  51 -48    202    59%  3502</code>
It seems to me that such small numbers of games (202 for Stockfish 14!) should not be used for a published list.
The results really do look a bit crazy.
Behind the Elo numbers there are two columns containing the margin of error. Based on the games played so far, the true Elo of Stockfish 14 therefore lies between 3554-48 and 3554+51, and even that "only" with 95% probability.
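That margin can be reproduced fairly well with a standard normal approximation. A minimal back-of-the-envelope sketch (my own illustration, not SSDF's actual method; it treats each game as an independent win/loss trial and ignores draws, which in practice narrow the interval somewhat):

```python
import math

def elo_margin_95(games: int, score: float) -> float:
    """Approximate 95% margin of error on an Elo performance rating.

    Normal approximation: standard error of the score fraction,
    propagated through the slope of the Elo curve E = 400*log10(p/(1-p)).
    """
    # Standard error of the observed score fraction p
    se = math.sqrt(score * (1.0 - score) / games)
    # Derivative dE/dp of the Elo curve at the observed score
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * se * slope

# Stockfish 14 in the list above: 202 games, 59% score
print(round(elo_margin_95(202, 0.59)))  # -> 49, close to the listed +51/-48
```

Since the margin shrinks with the square root of the number of games, roughly four times as many games are needed to halve the error bars.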
In a time when Elo figures are quoted to two decimal places, and every single tournament whose final standings fail to follow the exact Elo order to the letter is labelled "worthless", it seems truly anachronistic that the Swedes continue to behave in a mathematically and statistically correct manner. They cannot offer definitive answers, because there are none, but that is not at all well received. If you read the list properly, instead of stubbornly staring only at the rating and the ranking order, it is more valuable than most other rankings.
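As a rough sanity check on how the listed ratings relate to the score percentages and average opponent strength, the classic Elo performance formula already comes close. (This is only an illustration: SSDF presumably computes its ratings iteratively over all individual pairings, so small deviations from the published numbers are expected.)

```python
import math

def performance_rating(opp_avg: float, score: float) -> float:
    """Elo performance: average opponent rating plus the score-dependent offset."""
    return opp_avg + 400.0 * math.log10(score / (1.0 - score))

# First table row above: 71% score against a 3424 average opposition
print(round(performance_rating(3424, 0.71)))  # -> 3580, within a couple of points of the listed 3578
```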
Best regards
Andreas