Main Forums / CSS-Forum / CCRL update (31st October 2008)
By Graham Banks, 2008-11-01 21:21
The latest updates of the CCRL Rating Lists and Statistics are available for viewing at:
http://www.computerchess.org.uk/ccrl/4040/ (40/40)
http://computerchess.org.uk/ccrl/404/ (40/4)
The live link to the 40/4 list given below is currently the most up to date for that list.

The lists sometimes get updated during the week and these updates can be viewed here:
http://www.computerchess.org.uk/ccrl/4040.live/ (40/40)
http://computerchess.org.uk/ccrl/404.live/ (40/4)
However, no game downloads are available from these live links.

The links to the various rating lists can be found just beneath the default Best Versions list.
For example, there is a 32-bit Single CPU list.

Our 40 moves in 40 minutes (repeating) and 40 moves in 4 minutes (repeating) time controls are both adjusted to an AMD64 X2 4600+ (2.4 GHz).
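Adjusting a time control to a reference machine amounts to giving a faster test machine proportionally less clock time, and a slower one more. The sketch below assumes a simple linear scaling by a benchmark speed ratio; the function name and the 1.25 ratio are hypothetical, and the actual CCRL benchmarking procedure is not described in this post.

```python
def adjusted_minutes(base_minutes: float, speed_ratio: float) -> float:
    """Scale a time control so a test machine searches roughly as much as the reference.

    speed_ratio = benchmark speed of the test machine / benchmark speed of the reference
    (here the reference would be the AMD64 X2 4600+). Assumes a simple linear model:
    a machine twice as fast as the reference gets half the clock time.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return base_minutes / speed_ratio

# Hypothetical test machine benchmarked 1.25x faster than the reference.
print(adjusted_minutes(40, 1.25))  # minutes per 40 moves at the 40/40 level
print(adjusted_minutes(4, 1.25))   # minutes per 40 moves at the 40/4 level
```

Under this assumption, such a machine would play 40 moves in 32 minutes instead of 40, and 40 moves in 3.2 minutes instead of 4.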

Currently active testers are:
Graham Banks, Ray Banks, Shaun Brewer, Kirill Kryukov, Dom Leste, Tom Logan, Wassim Saeed, Charles Smith, George Speight and Gabor Szots.
Currently inactive testers are:
Sarah Bird, Andreas Schwartmann, Chris Taylor, Martin Thoresen and Chuck Wilson.

Be aware that in the early stages of testing, an engine's rating can often fluctuate a lot.
It is strongly advised to also look at the many other rating lists available in order to get a more accurate overall picture of an engine's rating relative to others.
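Why early ratings fluctuate: the statistical error of a rating shrinks only with the square root of the number of games. A rough sketch, assuming a normal approximation and an opponent of equal strength (Bayeselo computes its error bars differently), using the 36.7% draw rate from the 40/40 testing summary quoted further down this thread:

```python
import math

def elo_margin_95(games: int, draw_rate: float) -> float:
    """Approximate 95% error margin (in Elo) of a rating measured over `games` games
    against equal opposition (expected score 50%).

    Normal approximation only: each game scores 1, 0.5 or 0. At a 50% score, draws
    add no variance and each decisive game deviates from the mean by 0.5.
    """
    score_var = (1.0 - draw_rate) * 0.25           # per-game variance of the score
    score_se = math.sqrt(score_var / games)        # standard error of the mean score
    elo_per_score = 400.0 / (math.log(10) * 0.25)  # slope of the Elo curve at 50%
    return 1.96 * score_se * elo_per_score

for n in (50, 200, 800):
    print(n, round(elo_margin_95(n, 0.367)))
```

Under these assumptions the margin is roughly ±77 Elo after 50 games, ±38 after 200 and ±19 after 800: quadrupling the number of games only halves the uncertainty.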

40/40 Notes

There are currently just under 150,000 games in the 40/40 database.

4CPU 64-bit Engines

Rybka 3 is almost 150 Elo clear at the top, ahead of both Naum 3.1 and Zappa Mexico II, which are very even in strength.
A closely bunched group 50+ Elo further back includes Deep Shredder 11, Deep Sjeng 3.0, Toga II 1.4.1SE, Hiarcs 12, Bright 0.4a (private) and Deep Fritz 10.1. We have not tested Deep Sjeng WC2008 in this category yet.
Glaurung 2.1 and Loop M1-T are next in the pecking order.

The relative ratings of the 2CPU engines that have been well tested are pretty much the same as their 4CPU counterparts.
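For a feel of what these gaps mean over the board, the logistic Elo formula converts a rating difference into an expected score per game. This is only a rough reading: Bayeselo's model also has draw and White-advantage parameters, so its ratings do not map onto exactly this curve.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

for gap in (50, 150, 190):
    print(gap, f"{expected_score(gap):.1%}")
```

On this formula, a 50 Elo lead is worth about 57%, a 150 Elo lead about 70% and a 190 Elo lead about 75% of the points.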

Single CPU Engines

Rybka 3 is roughly 190 Elo ahead of the other engines in this category. Although more games are still needed, it appears that the default settings are better than both the dynamic and human settings.
Naum 3.1, Zappa Mexico II and Fritz 11 are all pretty close in strength. It is expected that Deep Sjeng WC2008 will join them once we've tested it more extensively.
There is a small margin back to Shredder 11 and Toga II 1.4.1SE.
Hiarcs 12 is further back still, ahead of the group that includes Bright 0.4a (private), Fruit 2.3.1, Loop 13.6, Glaurung 2.1, Cyclone 1.0 and Thinker 5.1e Passive.

Free Single CPU Engines

Rybka 2.2 heads the field with a 50+ Elo gap back to Toga II 1.4.1SE.
There is a similar gap back to Fruit 2.3.1, Glaurung 2.1, Cyclone 1.0 and Thinker 5.2e Passive. Spike 1.2 Turin and Bright 0.3a are a further 20+ Elo behind, but clearly stronger than the next group, which includes Frenzee Feb08, Twisted Logic 20080620 and Delfi 5.4.

CCRL tests a wide range of free engines, right down to the 1900 Elo level. The intention is to get well over 200 games for each of these engines. This rating list is certainly our most extensive one.

Recently released engines that seem to have made big strides are Twisted Logic, Cyrano, DanaSah, Rotor, Pupsi and NanoSzachy.

Blitz Notes

An enormous amount of work goes into the blitz list, and with over 350,000 games in the database, it is well worth a visit.

Of special interest to some will be the best free 1CPU engines list which is being constructed through a systematic testing approach as mentioned here:
http://kirill-kryukov.com/chess/discussion-board/viewtopic.php?f=7&t=3271

FRC Notes

Ray tests only those engines that can play FRC (Fischer Random Chess) through the Shredder Classic GUI.
If engine authors have a new and stable version of their engine that will run under this GUI, they should contact Ray if they wish to see it tested.

Rybka 3 has a massive 200 elo lead over the closely grouped Shredder 11, Naum 3.1 and Deep Sjeng 3.0.
Hiarcs Paderborn 2007 in fifth spot is well ahead of Fruit 051103 and Loop 10.32f (the most recent Loop version that could play FRC).

For FRC, the best list to look at is the pure list.
http://www.computerchess.org.uk/ccrl/404FRC/

Stats/Presentation Notes

The LOS (likelihood of superiority) figures to the right of each rating list give the likelihood, as a percentage, that each engine is superior to the engine directly below it.
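A widely used approximation of LOS between two engines looks only at their decisive head-to-head games. It is shown here purely for illustration and is an assumption: the LOS table on the site is produced by Bayeselo from all games, so its numbers will not match this formula.

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from head-to-head results (draws ignored).

    Normal approximation: LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))).
    """
    if wins + losses == 0:
        return 0.5  # no decisive games, so no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

print(f"{los(30, 20):.1%}")
```

For example, 30 wins and 20 losses give an LOS of about 92%; adding draws does not change the value under this approximation.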

All games are available for download by engine, by month or by ECO code.
Elo ratings are now saved in all game databases for those engines that have 200 games or more.
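A sketch of the 200-game rule for saved ratings, assuming the ratings are written into the standard PGN `WhiteElo`/`BlackElo` header tags. The function, engine names, ratings and game counts below are all hypothetical.

```python
MIN_GAMES = 200  # threshold from the note above

def elo_tags(white: str, black: str, ratings: dict, games: dict) -> list:
    """Return PGN Elo tag lines for the players that have at least MIN_GAMES games."""
    tags = []
    for tag, name in (("WhiteElo", white), ("BlackElo", black)):
        if games.get(name, 0) >= MIN_GAMES and name in ratings:
            tags.append(f'[{tag} "{ratings[name]}"]')
    return tags

# Hypothetical data: Engine B has too few games, so it gets no Elo tag.
ratings = {"Engine A": 2950, "Engine B": 2700}
games = {"Engine A": 512, "Engine B": 87}
print(elo_tags("Engine A", "Engine B", ratings, games))
```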

Clicking on an engine name will give details as to opponents played plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page lists the number of games played for each ECO code, together with the draw percentage and the White win percentage. Clicking on a column heading will sort the list by that column.
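The openings report is essentially a group-by over the `ECO` and `Result` header tags of the games. A minimal sketch with hypothetical sample games; the real report page is generated by CCRL's own scripts, which are not shown here.

```python
from collections import defaultdict

def openings_report(games):
    """Summarise games by ECO code.

    `games` is an iterable of (eco, result) pairs, with results in PGN notation:
    "1-0" (White win), "0-1" (Black win) or "1/2-1/2" (draw).
    Returns (eco, games, draw %, White win %) rows sorted by ECO code.
    """
    counts = defaultdict(lambda: {"games": 0, "draws": 0, "white": 0})
    for eco, result in games:
        c = counts[eco]
        c["games"] += 1
        if result == "1/2-1/2":
            c["draws"] += 1
        elif result == "1-0":
            c["white"] += 1
    rows = []
    for eco, c in counts.items():
        n = c["games"]
        rows.append((eco, n, 100.0 * c["draws"] / n, 100.0 * c["white"] / n))
    return sorted(rows)

sample = [("B90", "1-0"), ("B90", "1/2-1/2"), ("C42", "1/2-1/2"), ("B90", "0-1")]
for row in openings_report(sample):
    print(row)
```

Sorting by another column, as clicking a heading does, is then a one-liner such as `sorted(rows, key=lambda r: r[2], reverse=True)` for the draw percentage.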
By Ingo Bauer, 2008-11-01 22:25
Hello Graham

Just a question, to help me understand:

Have a look here:

http://www.computerchess.org.uk/ccrl/4040/cgi/compare_engines.cgi?class=Single-CPU+engines&print=Rating+list&print=Results+table&print=LOS+table&table_size=12&cross_tables_for_best_versions_only=1

and here:

http://www.computerchess.org.uk/ccrl/4040/rating_list_pure_single_cpu.html

One list is "Single CPU engines" and the other is "Pure single CPU engines", but they give different rankings (places 5, 6 and 7, and maybe others).

Unfortunately, I have to say that every time I look at your list I find some "flaws". As soon as you look into the details, it becomes obvious that your concept of different testers + different hardware + different times + different opponents + different numbers of games for each entry seems somehow "suspicious".

Sorry, but please have a look at all of your basic test conditions. This really puzzles me, as I am trying to find a good and reliable testing method for engines.

Nevertheless, I have very high respect for the amount of work you are investing. Keep up the good work (but check your methods).
Ingo Bauer
By Graham Banks, 2008-11-01 22:38
The "pure" list only includes games played against other engines in that list. Hope this helps.

Cheers, Graham.
By Ingo Bauer, 2008-11-01 22:45
Hi

[quote="Graham Banks"]
The "pure" list only includes games played against other engines in that list. Hope this helps.

Cheers, Graham.
[/quote]

Not really! I see different numbers of games in the two lists, but does this mean that the single-CPU list has games against engines which are NOT in the list? I did not check this, but who do you expect to understand that?

Now completely puzzled
Ingo
By Graham Banks, 2008-11-01 22:56
From the notes provided at the top of each pure list:

"Pure" list removes rating distortion

The "Pure" list is computed to remove the distortion that may affect the main rating list. Distortion appears when several versions or settings of the same engine are included together in the testing study. Suppose you have engine A and several versions of engine B: B1, B2, B3. Suppose also that A is particularly strong against any version of B, which often happens in real testing because of some characteristics of those engines. In such a case, A will get a higher rating than in a study where only one version of B is present. The same thing may happen in reverse when A is weak against B, giving A a lower rating.

To remove that distortion, a separate game database is constructed from games played only by the best version in each engine "family". To save some space and time, the pure database has all moves stripped out; it contains the PGN headers and results only. The "Pure list" is then computed from that "pure" database using Bayeselo.
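The pure-database step described above can be sketched as a filter that keeps only games between best versions and drops the moves. The engine names follow the A/B1/B2/B3 example, with a hypothetical third engine C; the `best_version` set is a hypothetical input (how the best version of each family is chosen is not covered here), and the rating calculation itself, done with Bayeselo, is not shown.

```python
def pure_games(games, best_version):
    """Keep only games in which both players are the best version of their family.

    games: iterable of dicts with "White", "Black" and "Result" header values
    (plus anything else, such as the moves, which is discarded).
    best_version: set of engine names selected as the best of each family.
    """
    return [
        {"White": g["White"], "Black": g["Black"], "Result": g["Result"]}
        for g in games
        if g["White"] in best_version and g["Black"] in best_version
    ]

games = [
    {"White": "A", "Black": "B3", "Result": "1-0", "Moves": "1. e4 e5"},
    {"White": "B1", "Black": "A", "Result": "0-1", "Moves": "1. d4 d5"},
    {"White": "B3", "Black": "C", "Result": "1/2-1/2", "Moves": "1. c4 c5"},
]
print(pure_games(games, best_version={"A", "B3", "C"}))
```

The A versus B1 game is dropped because B1 is not the chosen version of engine B, so A's results against the older B versions no longer count towards its pure rating.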
By Ingo Bauer, 2008-11-01 23:14
Hello Graham

[quote="Graham Banks"]
From the notes provided at the top of each pure list:

[/quote]

This is the top of the pure list as I see it:

-------------------------------------------------------------------------------
Contents:

    * Home
    * 40/4
    * 40/4 FRC
    * 40/40
    * Forum

 
CCRL 40/40
Downloads and Statistics
October 31, 2008
 
Testing summary:
Total: 147'043 games
played by 531 programs
21374 CPU days (X2 4600+)

White wins: 53'156 (36.1%)
Black wins: 39'891 (27.1%)
Draws: 53'996 (36.7%)
White score: 54.5%

    * Index
    * About
    * Complete list
    * Pure list
    * Games
    * Correlation
    * History
    * Thanks
    * Links

Pure list for single-CPU engines
Pure database download

To save space, pure database has all moves stripped out, it contains PGN header and results only. This pure database is useful only for rating calculation or similar analysis, it does not have actual games, only the results.
Download pure database, 7'620 games:  0.02 MB
CCRL 40/40 Rating List -- Pure single-CPU engines
------------------------------------------------------------------------------------------------------------------
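The White score in the quoted summary counts each draw as half a point, and the three outcome counts add up to the stated total:

```python
# Outcome counts copied from the testing summary quoted above.
white_wins, black_wins, draws = 53_156, 39_891, 53_996
total = white_wins + black_wins + draws            # should equal the stated total
white_score = (white_wins + 0.5 * draws) / total   # a draw is worth half a point
print(f"{total:,} games, White score {white_score:.1%}")
```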

At first glance, I see nothing there like what you described. But OK, I understand the principle now!

One question:

Suppose engine A is very good against every version of engine B, engine C is very bad against every version of engine B, and engine A is very bad against every version of engine C. Now your pure list has a problem, because you only count the "best" versions, while it might be that engine A is very good against everything other than the lower versions of C.

Yes, this is a constructed case, but valid anyhow!
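Ingo's scenario is a non-transitive cycle (A beats B, B beats C, C beats A), which no single rating per engine can represent. A minimal sketch using a plain Bradley-Terry fit with the classic MM update; Bayeselo builds on this family of models but adds draw and White-advantage terms and a prior, none of which are modelled here. The engines and the 70% results in each pairing are hypothetical.

```python
import math

def bradley_terry(results, iterations=2000):
    """Fit Bradley-Terry strengths with the MM update and return Elo-like ratings.

    results: dict mapping (player, opponent) -> (points scored by player, games played);
    each pairing appears once, and draws may be counted as half points.
    Every player needs at least one point, otherwise its strength collapses to zero.
    """
    players = sorted({p for pair in results for p in pair})
    wins = {p: 0.0 for p in players}
    games = {}  # games[(i, j)] = games between i and j, stored in both directions
    for (i, j), (score, n) in results.items():
        wins[i] += score
        wins[j] += n - score
        games[(i, j)] = games.get((i, j), 0) + n
        games[(j, i)] = games.get((j, i), 0) + n
    strength = {p: 1.0 for p in players}
    for _ in range(iterations):
        new = {}
        for i in players:
            denom = sum(n / (strength[i] + strength[j])
                        for (a, j), n in games.items() if a == i)
            new[i] = wins[i] / denom
        # Normalise so the geometric mean strength is 1 (ratings then average 0).
        g = math.exp(sum(math.log(s) for s in new.values()) / len(new))
        strength = {p: s / g for p, s in new.items()}
    return {p: 400.0 * math.log10(s) for p, s in strength.items()}

cycle = {("A", "B"): (70, 100), ("B", "C"): (70, 100), ("C", "A"): (70, 100)}
chain = {("A", "B"): (70, 100), ("B", "C"): (70, 100)}
print({p: round(r) for p, r in bradley_terry(cycle).items()})  # all three equal
print({p: round(r) for p, r in bradley_terry(chain).items()})  # A > B > C
```

With the full cycle, all three engines come out equal and every pairing is predicted at 50%, even though each one actually finished 70-30. Dropping only the C-A games turns the same engines into a clean A > B > C ladder with steps of about 147 Elo, which shows how strongly the choice of games included in a list can change the ranking.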

I have to think about this myself!

Again: I know about the amount of work you are doing and I appreciate that!

Bye
Ingo
By Graham Banks, 2008-11-01 23:24
Hi Ingo,

Kirill is more the statistician than I am, so I'll get him to answer your questions.
We don't mind people asking questions or posting constructive criticism about what we do.
We fully realise that our rating lists will have some anomalies. All rating lists do, which is why we recommend that all rating lists should be looked at to provide an overall picture of engines' strength relative to each other.

Regards, Graham.
