On 12/22/2023 12:18 PM, MK wrote:
> On December 13, 2023 at 7:27:56 AM UTC-7, Timothy Chow wrote:
>> If you have a lot of trials then you can be very
>> confident that you are learning "what the bot
>> really thinks" and that it is very unlikely to
>> change its mind even if you increase the number
>> of trials to infinity.
> This isn't necessarily true and indeed incomplete.
> While random errors decrease, systematic errors
> may increase (accumulate and compound), thus
> cause the bot to change its mind.
No, this is not correct, at least when you are simply extending
a specific rollout. Systematic errors can indeed accumulate and
compound over the course of a game, but a rollout trial repeatedly
samples an entire game, so *each individual* trial is subject to
the accumulated systematic error. There will be some randomness
involved from trial to trial, of course; some trials may be "lucky"
enough to avoid the variations that suffer from a lot of accumulated
systematic error, while other trials may be "unlucky" enough to hit
those variations, but in the long run these fluctuations will even
out, and the rollout will converge. The final result will be an
average over all accumulated systematic errors.
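To make the convergence claim concrete, here is a minimal toy sketch in Python. It is not how any real bot performs a rollout; every constant and function name in it is made up for illustration. Each trial is modeled as the true equity, plus a systematic error that depends on whether the trial wanders into a badly played variation, plus dice luck. The standard error shrinks as trials are added, while the mean settles on the true equity plus the average systematic error.

```python
import random
import statistics

# All numbers below are invented for illustration only.
TRUE_EQUITY = 0.10   # the "correct" equity, which a rollout cannot see
SMALL_ERROR = 0.01   # systematic error in ordinary variations
BIG_ERROR = 0.20     # systematic error in variations the bot misplays
P_BAD = 0.2          # chance that a trial hits a misplayed variation
LUCK = 1.0           # standard deviation of dice luck in a single trial


def one_trial(rng):
    """One simulated game result: truth + systematic error + luck."""
    error = BIG_ERROR if rng.random() < P_BAD else SMALL_ERROR
    return TRUE_EQUITY + error + rng.gauss(0.0, LUCK)


def rollout(n_trials, seed=0):
    """Return (mean, standard error) over n_trials independent trials."""
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(n_trials)]
    mean = statistics.fmean(results)
    std_err = statistics.stdev(results) / n_trials ** 0.5
    return mean, std_err


def expected_limit():
    """Value the rollout converges to: truth plus the average systematic error."""
    return TRUE_EQUITY + P_BAD * BIG_ERROR + (1 - P_BAD) * SMALL_ERROR


print(f"limit = {expected_limit():+.4f}")
for n in (100, 10_000, 100_000):
    mean, std_err = rollout(n)
    print(f"{n:>7} trials: mean = {mean:+.4f}, std err = {std_err:.4f}")
```

Extending the rollout only narrows the spread around that limit. It does not move the limit itself, which is why more trials of the same rollout should not make the bot change its mind.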
> I assume you mean look-ahead plies? Can you (or
> someone else) expand on this and explain/clarify
> how plies work during play and during rollouts?
The GNU team can answer this better than I can. One thing to note
is that during rollouts, the bots will apply some kind of move
filter to screen out unpromising plays. That is, if you perform
a 3-ply rollout, the bot doesn't necessarily evaluate every legal
move at 3-ply and pick the highest-scoring one. It will evaluate
all the options at the lowest ply but then discard a lot of them
as not likely to emerge as the top play.
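The idea of a move filter can be sketched roughly as follows. This is only an illustration of the general technique, not the actual filter used by GNU Backgammon or any other bot; the function name and the `keep` and `tolerance` parameters are hypothetical, and the evaluators are stand-ins supplied by the caller.

```python
def filtered_best_move(moves, cheap_eval, deep_eval, keep=3, tolerance=0.16):
    """Pick a move by scoring all candidates cheaply and only a few deeply.

    moves      -- non-empty list of hashable candidate moves
    cheap_eval -- fast, low-ply evaluator (stand-in)
    deep_eval  -- slow, higher-ply evaluator (stand-in)
    keep       -- hypothetical cap on how many candidates survive
    tolerance  -- hypothetical maximum cheap-score gap to the cheap leader
    """
    if not moves:
        raise ValueError("no legal moves to choose from")

    # Step 1: score every legal move with the cheap evaluator.
    cheap = {m: cheap_eval(m) for m in moves}
    ranked = sorted(moves, key=cheap.get, reverse=True)

    # Step 2: discard moves that look unlikely to end up on top.
    leader = cheap[ranked[0]]
    survivors = [m for m in ranked[:keep] if leader - cheap[m] <= tolerance]

    # Step 3: spend the expensive evaluation only on the survivors.
    return max(survivors, key=deep_eval)


# Made-up scores for four candidate plays.
cheap_scores = {"A": 0.30, "B": 0.28, "C": 0.05, "D": -0.40}
deep_scores = {"A": 0.25, "B": 0.31, "C": 0.40, "D": 0.00}

choice = filtered_best_move(list(cheap_scores), cheap_scores.get, deep_scores.get)
print(choice)
```

In this made-up example, play C would have scored highest under the deep evaluator, but it is filtered out at the cheap stage and never examined deeply. That is the price of the speedup: the filter trades a small risk of missing the best play for a large saving in evaluation time.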
> I won't argue against self-consistency if you can
> prove that your equilibrium play is actually that.
The *theoretical* equilibrium play is *defined* in terms of a
system of equations that expresses self-consistency. If you insist
on an empirical definition, though, then self-consistency can't be
proved.
>> so you can "cross-examine" the bot and see its
>> answers are self-consistent.
> This would be most interesting for me to see. Has
> any bot been cross-examined for this and how?
I don't know if anyone has done this in a systematic fashion, but
certainly, if you take some crazy superbackgame or containment
position, you can observe inconsistency yourself. Note down the
3-ply equity (for example). Then run through all the possible rolls,
and note down their 3-ply equities. Average them, and you'll find
that they don't average out to the original 3-ply equity. This means
that the 3-ply equity isn't (entirely) self-consistent. In many
positions, the top play will still be the top play, but in the crazy
superbackgame positions, this experiment can result in wild swings
that drastically change the top play.
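The experiment can be sketched in Python as below. The `equity` and `best_reply` callables are hypothetical stand-ins for whatever bot interface is available, and I am assuming that `equity` always reports from the same player's point of view. Note that the average must be weighted: of the 21 distinct rolls, each non-double occurs with probability 2/36 and each double with probability 1/36.

```python
from itertools import combinations_with_replacement


def roll_average(position, equity, best_reply):
    """Probability-weighted average equity over the 21 distinct rolls.

    equity(position)           -- stand-in for the bot's evaluation
    best_reply(position, roll) -- stand-in: position after the bot's
                                  chosen play for that roll
    """
    total = 0.0
    for d1, d2 in combinations_with_replacement(range(1, 7), 2):
        weight = 1 / 36 if d1 == d2 else 2 / 36
        total += weight * equity(best_reply(position, (d1, d2)))
    return total


def consistency_gap(position, equity, best_reply):
    """Direct equity minus the roll-averaged equity; zero if self-consistent."""
    return equity(position) - roll_average(position, equity, best_reply)


# Toy stand-ins (NOT backgammon): a "position" is just a number.
def toy_equity(pos):
    return pos / 100


def toy_best_reply(pos, roll):
    return pos - sum(roll)


print(consistency_gap(50, toy_equity, toy_best_reply))
```

A nonzero gap means the evaluator disagrees with its own answers one roll later. With a real bot, the interesting cases are the positions where this gap is large enough to reorder the candidate plays.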
>> But again, the arguments are only heuristic, and
>> we certainly can't be completely sure in any
>> particular instance that stronger settings are
>> giving us more "accurate" answers.
> I argue that we can if we have unbiased bots that
> are trained not only through cubeless, single-game
> play but also through cubeful and "matchful" play,
> eliminating extrapolated cubeful/matchful equities.
There are certainly ways to improve the way bots are trained, but it
will still be true that we won't be *completely* sure that we're getting
more accurate answers in every position. That would require more
computing power than is available in the observable universe.
---
Tim Chow