Hi Swami!
Many thanks for the next test set.
Five engines at first glance: Ivanhoe T0.4 93, Houdini 1.03 91, R4 89, Stockfish 1.9 87, Shredder12 82 out of 100, each on a single 2.5 GHz core with 128 MB hash, 10 seconds per position and 1 extra ply of search depth.
(noncompetitive: ChessGenius 7.2 made 70/100 having a 3.GHz core and 60 seconds per position, by the way)
Some positions would probably need more time on a single core for the results to change appreciably, for example the very first one:
1k1r4/1p3p2/p1bq1p1p/4p3/3r1P1Q/P5P1/1PP1B2P/K2RR3 b - -
1... e4 (1... Re8 2. Bh5 (2. Bg4 exf4) 2... Rxd1+ (2... Rd2
3. Qg4) 3. Rxd1 Qe7 4. f5) 2. Qxh6 Rd2 3. Qh5 e3 4. Bd3 Qd5
5. Qxd5 Bxd5 6. Rc1 Re8
Looking only a few moves into these two variations, I find no punch lines, only the need to look much deeper into them. With the main line 1... e4 getting 10 points and the alternative 1... Re8 getting 7, the difference in eval between the two within only 10 seconds must be mainly a matter of static evaluation, I guess.
(And for a positional test, could this be one of the goals?) On the other hand it shows how important it is that the points are broken down per position instead of counting each position only as solved or unsolved. Itemized, the difference here is only 3 points, where otherwise it would be one whole "unsolved" position, and that obviously matters when counting the result as a whole.
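To make the itemized counting concrete, here is a minimal sketch in Python. Only the two point values for the first position (1... e4 = 10, 1... Re8 = 7) come from the test set as quoted above; the position key "pos1", the function names and the two example engines are made up for illustration.

```python
# Hypothetical point table: position id -> {move: points}.
# Only e4 = 10 and Re8 = 7 for the first position are taken from the post.
POINTS = {
    "pos1": {"e4": 10, "Re8": 7},
}

def graded_score(points, answers):
    """Itemized counting: each chosen move earns its listed points, else 0."""
    return sum(points[pos].get(move, 0) for pos, move in answers.items())

def solved_only_score(points, answers):
    """Solved/unsolved counting: only the top-rated move of a position scores."""
    total = 0
    for pos, move in answers.items():
        best = max(points[pos].values())
        if points[pos].get(move, 0) == best:
            total += best
    return total

# Two made-up engines that differ only in the first position.
engine_a = {"pos1": "e4"}
engine_b = {"pos1": "Re8"}

for name, answers in (("A", engine_a), ("B", engine_b)):
    print(name, graded_score(POINTS, answers), solved_only_score(POINTS, answers))
```

With itemized points the two engines are 3 points apart on this position; with plain solved/unsolved counting the second engine would lose the whole position.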
Considering this, I think one of the arguments against test positions, that they are too easy or too difficult for a certain combination of time and hardware, can be refuted: it is not a matter of a single result to be compared with other strengths or weaknesses of engines, as measured by rating lists for example. It is a matter of proportions between certain special abilities in positional or tactical tasks. The numerical results matter much less and depend on hardware, time and of course on the individual position.
From my personal point of view, the proportions between engines and between versions of engines matter much more, and this way they can be measured very finely, in addition to ratings from games. Distinguishing engines and versions that score only marginally differently might sometimes be much easier.
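As a rough illustration of looking at proportions instead of raw totals, the sketch below expresses each of the five results quoted at the top relative to the best one. The function name is made up, and dividing by the top score is just one possible way to form such a proportion, not anything prescribed by the test set.

```python
# Results out of 100 as quoted at the top of this post.
SCORES = {
    "Ivanhoe T0.4": 93,
    "Houdini 1.03": 91,
    "R4": 89,
    "Stockfish 1.9": 87,
    "Shredder12": 82,
}

def relative_to_best(scores):
    """Return each engine's score as a fraction of the highest score."""
    best = max(scores.values())
    return {name: value / best for name, value in scores.items()}

for name, ratio in relative_to_best(SCORES).items():
    print(f"{name:15s} {ratio:.3f}")
```

The same proportions could be recomputed on other hardware or with other time limits to see whether the order and the gaps between engines stay roughly the same.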
Even though I myself still prefer looking very closely at every single test position and at the engines' output in analysis (single- and multi-variation mode), with longer time and longer variations, letting the engines analyze forward and backward to get an estimate of my own for my own kind of positions (especially the opening variations I want the engines to play best), I appreciate test sets like yours very much too. Thanks again.