[STS v13.0] - "Pawn Play in the Center"

By Swaminathan N Date 2010-10-08 18:45

Version 13 of the Strategic Test Suite series is now available for download.

* Consists of 100 carefully selected test positions on "Pawn Play in the Center"

Information:

Theme: pushing a pawn to the center of the board, from the early to the late middlegame.

The 4 central squares (e4, d4, d5, e5) are the most important squares on the board, so each side wants to take control of them before the opponent does.

Tests the engine's strength in pawn structure, defense, piece activity evaluation, king safety, center play and mobility.

Key Characteristics:

* Pawn Preponderance in the center.
* The idea is to have more influence in the center, which in turn yields better mobility.
* Pawn moves in the center exert pressure on the other side by threatening to open a file.
* Trying to open up the center reduces the pressure on the flank and diverts the opponent's attention away from the kingside, especially when he is in the process of setting up an attack.

Download it here:

https://sites.google.com/site/strategictestsuite/home/sts13

Test Suite release time: 8th of October, 2010
Swaminathan and Dann Corbit

By Swaminathan N Date 2010-10-08 18:46

Anyone is welcome to post results from any engine.

It would be interesting to know how it rates in terms of difficulty relative to other EPD's.

By Peter Martan Date 2010-10-09 00:49 Edited 2010-10-09 00:56

Hi Swami!

Many thanks for the next test set.
5 engines at first glance: Ivanhoe T0.4 93, Houdini 1.03 91, R4 89, Stockfish 1.9 87, Shredder12 82 out of 100, each on a single 2.5GHz core with 128 MB hash, 10 seconds per position and 1 extra ply search depth.
(Noncompetitive, by the way: ChessGenius 7.2 made 70/100 on a 3.GHz core with 60 seconds per position.)
Some positions might need more time on a single core to change the results appreciably, for example the very first one:

1k1r4/1p3p2/p1bq1p1p/4p3/3r1P1Q/P5P1/1PP1B2P/K2RR3 b - -

1... e4 (1... Re8 2. Bh5 (2. Bg4 exf4) 2... Rxd1+ (2... Rd2
3. Qg4) 3. Rxd1 Qe7 4. f5) 2. Qxh6 Rd2 3. Qh5 e3 4. Bd3 Qd5
5. Qxd5 Bxd5 6. Rc1 Re8
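
For anyone who wants to check the center of this position without a board, here is a minimal sketch in Python. It only reads the piece placement field of the FEN given above; the function name `fen_to_board` is made up for illustration and is not part of STS or any engine.

```python
def fen_to_board(fen):
    """Return a dict mapping square names like 'e4' to FEN piece letters."""
    placement = fen.split()[0]          # first FEN field: piece placement
    board = {}
    for rank_index, row in enumerate(placement.split("/")):
        rank = 8 - rank_index           # FEN lists rank 8 first
        file_index = 0
        for ch in row:
            if ch.isdigit():
                file_index += int(ch)   # a digit is a run of empty squares
            else:
                board["abcdefgh"[file_index] + str(rank)] = ch
                file_index += 1
    return board

FEN = "1k1r4/1p3p2/p1bq1p1p/4p3/3r1P1Q/P5P1/1PP1B2P/K2RR3 b - -"
board = fen_to_board(FEN)

# Occupants of the four central squares; "." marks an empty square.
center = {sq: board.get(sq, ".") for sq in ("d4", "e4", "d5", "e5")}
print(center)
```

Lowercase letters are black pieces. The output shows a black pawn on e5 and a black rook on d4, with e4 and d5 empty, so the push 1... e4 goes to a free central square.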

Looking only a few moves into these two variations, I find no punch lines, only the need to look much deeper into them. The main line 1... e4 gets 10 points and the alternative 1... Re8 gets 7, so within only 10 seconds the difference in eval between the two must be mainly a matter of static evaluation, I guess.
(And for a positional test, this could be one of the goals?) On the other hand, it shows how important the breakdown of points per position is, rather than just solved or unsolved. The difference between counting one whole "unsolved" and losing only 3 points when itemized is obvious in the overall result.

Considering this, I think one of the arguments against test positions, that they are too easy or too difficult for a certain combination of time and hardware, can be refuted: it is not a matter of a single result to compare with other strengths or weaknesses of engines, as measured by rating lists for example. It is a matter of the proportions between certain special abilities in positional or tactical play. The numerical results matter much less and depend on hardware, time and of course on the individual position.
From my personal point of view, the proportions between engines and between versions of engines matter much more, and they can be measured very finely this way, in addition to ratings from games. Distinguishing engines and versions that score only marginally differently might sometimes be much easier.

I myself still prefer looking very closely at every single test position and at the engines' analysis output (single and multi-variation mode), with longer time and longer variations, letting the engines analyze forward and backward, to get my own estimate for my own kind of positions (especially the opening variations I want the engines to play best). Even so, I appreciate test sets like yours very much too. Thanks again.

By Swaminathan N Date 2010-10-09 04:26 Edited 2010-10-09 04:32

Thanks for testing it, Peter. Scores of those 6 engines correspond well with their actual ratings.

Some people hardly go through engine games to see in which areas the engines are weaker; they only care about ratings and major bugs like underpromotion.
With these test suites, I like the idea that they can point out "areas" of opportunity for an engine, so that weaknesses can be reduced and overall performance optimized in future versions.

Each test position comes with 3 or 4 best moves, each with a score in the range of 10, 7, 4, 2 etc. As you can see, the "total" score out of 1000 actually gives a more realistic representation of the engine's knowledge. Therefore it's certainly one of the goals.
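
The point scheme can be sketched in a few lines of Python. This is only an illustration, not the STS tooling: the point table holds just the two values quoted earlier in this thread for the first position (1... e4 = 10, 1... Re8 = 7), the engine answers are hypothetical, and scoring an unlisted move as 0 is an assumption.

```python
def score_move(points, engine_move):
    """Points for the engine's move; unlisted moves score 0 (an assumption)."""
    return points.get(engine_move, 0)

# Point table for the first STS 13 position, using only the two values
# quoted in this thread; the real entry may list further moves.
position_1 = {"e4": 10, "Re8": 7}

# Hypothetical answers from three engines to the same position.
answers = ["e4", "Re8", "Qc7"]
scores = [score_move(position_1, move) for move in answers]

points_total = sum(scores)                    # itemized points
solved = sum(1 for s in scores if s == 10)    # strict solved/unsolved count
max_total = 10 * 100                          # 100 positions, 10 points each

print(scores, points_total, solved, max_total)
```

With strict solved/unsolved counting, only the first answer would count. With points, the engine that plays 1... Re8 loses just 3 points instead of the whole position, which is why the total out of 1000 is the finer measure.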

Knowledge is one of the most important things that can definitely boost an engine's play and understanding. For an engine to gain better knowledge, programmers have to implement it in code, with the goal of eliminating the weak points shown in the test results.

I also like the statistical representation output by STS Stat. It gives a broad overview and makes comparison of versions easier. Currently, rating predictions based on just 13 test sets would give ratings in the range of +/- 100. With more test sets, I hope the ratings will be more accurate. STS appears to give better predictions of an engine's rating than other test suites out there, because we have 1300 positions so far and it is therefore statistically superior!

Best regards,
Swami

By Swaminathan N Date 2010-10-10 06:04 Edited 2010-10-10 09:12

Updated:

Download ALL Epd's in One File:
https://sites.google.com/site/strategictestsuite/piece-specific-epd-s/PieceSpecificEPD%27s.rar

Download Piece-Specific EPD's (for tuning)
https://sites.google.com/site/strategictestsuite/piece-specific-epd-s/PieceSpecificEPD%27s.rar

Best wishes,
Swami

PS: If anyone has a CBH/PGN version of the STS EPD's, please send it to me at nswami15 at yahoo dot com. I'll put it up on the site, since a few people have asked for it. Thanks!