
![]() |
Last update: 01/10/97 |
|
My test suite is still growing, and I carefully select each position so that they
demonstrate something about the programs solving or trying to solve them. They are mostly
very difficult, and most of them cannot be easily solved within 3 minutes, which is what
I would consider as a reasonable time for actual game-playing.
But notice that in some of my tests (i.e. Test 81) all of the
programs tested find the correct move, however I do NOT
consider any of them to have solved
the problem, because their evaluation of the position shows that they
don't see the draw,
and do not even know how to maintain the draw afterwards.
In my opinion, a meaningful test should include not only
time and solution, but also the minimum value the evaluation
should have to consider that the program has really found the solution.
For instance, test XX should be solved within 10 minutes, it should find Nb3,
and it should have Eval >= +5.00 points because that move
wins the queen. If the program does find Nb3 within 10 min., but evaluates the move
as +0.32, then the program has not seen that the queen is won,
and certainly has NOT solved the problem.
Perhaps if let alone for another 20 minutes it would then evaluate it as +5.12 in
which case it would have solved it, but not with Eval = +0.32.
Same goes, as another example, for certain drawn positions (see Test 81),
where white can draw enclosing the enemy king in a fortress. The promoted queen cannot
mate the white king alone, and so the position is drawn.
If the program evaluates this as Eval =-9.00 because of the queen, the program
is not seeing the draw and is not understanding
the position at all, so much in fact that many times it will ruin the position
even after making the correct first move. Thus, it cannot be considered to have solved
the problem, either.
|

|
Taking all of this into account, I hereby advocate that test suites should include for each and every test position, not just the desirable move or moves, the moves to avoid, and some time limit, but also the minimum value the desirable move must have to consider the program has really seen the consequences of the move, and is not chosing the right move for the wrong reasons.
|
