• Re: ChatGPT 5.0 wrote this Rubric so that every LLM system evaluates DD

    From Richard Damon@21:1/5 to olcott on Wed Aug 20 22:04:40 2025
    On 8/20/25 11:52 AM, olcott wrote:
    ChatGPT 5.0 and Gemini had to be specifically told
    to think it all the way through without guessing.

    ChatGPT 4.0, Claude AI, and Grok did not have to be
    told not to guess. They figured out the
    *recursive simulation non-halting behavior pattern*
    and that HHH(DD)==0 is correct all on their own without
    prompting.

    https://claude.ai/share/da9e56ba-f4e9-45ee-9f2c-dc5ffe10f00c

    https://chatgpt.com/share/68939ee5-e2f8-8011-837d-438fe8e98b9c

    https://grok.com/share/c2hhcmQtMg%3D%3D_810120bb-5ab5-4bf8-af21-eedd0f09e141

    <Rubric>
    Rules of Simulation:
    - (a) If a non-terminating behavior pattern is detected during
    simulation, abort immediately and return 0.
    - (b) If the simulated program reaches a return statement without
    earlier detection, return 1.
    This ignores the third possibility: the decider just runs forever if it
    never finds a non-halting pattern and never reaches a final state.

    - Priority: (a) always takes precedence over (b).

    Shouldn't be needed: if you CAN get to (b), you can't have (a).

    Procedure:
    1. Restate rules before simulation.
    2. Simulate step by step using a trace table with columns:
       | Step | Code Location | State (variables, call stack) | Event / Action | Rule Applied |
    3. Stop immediately if recursion, infinite loop, or repeating state
    occurs → apply (a).
    4. If a return is reached first → apply (b).
    5. Conclude with:

    What if you never hit one of them?

       - Which rule fired first
       - The correct return value
    6. Optionally, provide:
       - (i) Simulation result under rules
       - (ii) Actual runtime behavior in C

    </Rubric>

    <Test Program>
    typedef int (*ptr)();
    int HHH(ptr P);

    int DD()
    {
      int Halt_Status = HHH(DD);
      if (Halt_Status)
        HERE: goto HERE;
      return Halt_Status;
    }
    </Test Program>

    <Required Output Format>
    1. Restatement of rules
    2. Step-by-step execution trace in a table
    3. Identification of first rule triggered
    4. Final result: return value of HHH(program)
    5. Optional: actual C runtime behavior if relevant
    </Required Output Format>

    </Benchmark Task>

    Sorry, your Rubric is in error, as it ignores a possible case and ASSUMES
    that the decider can find an answer.
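
    To make the ignored case concrete, here is a minimal sketch in C. It is
    a toy model only, not HHH or DD: the Machine type, the step functions,
    the UNKNOWN verdict, and the MAX_STEPS budget are all assumptions
    introduced for illustration. Rule (a) is approximated by exact
    repeated-state detection, which is sound here only because the toy
    machine is deterministic and its whole configuration is one integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy machine: the entire configuration is one integer. */
typedef struct { long state; bool halted; } Machine;
typedef void (*StepFn)(Machine *m);

/* 0 and 1 mirror rules (a) and (b); UNKNOWN is the case the rubric omits. */
enum Verdict { NON_HALTING = 0, HALTS = 1, UNKNOWN = 2 };

#define MAX_STEPS 1000  /* assumed budget; the rubric itself has none */

static enum Verdict simulate(StepFn step, long start) {
    long seen[MAX_STEPS];
    int n = 0;
    Machine m = { start, false };
    assert(step != NULL);
    while (n < MAX_STEPS) {
        /* Rule (a): an exact repeat of a full state means a loop. */
        for (int i = 0; i < n; i++)
            if (seen[i] == m.state) return NON_HALTING;
        seen[n++] = m.state;
        step(&m);
        /* Rule (b): the simulated program reached its final state. */
        if (m.halted) return HALTS;
    }
    /* Neither rule fired. Without the budget, this loop never exits. */
    return UNKNOWN;
}

/* Halts: counts down to zero, then stops. */
static void count_down(Machine *m) {
    if (m->state == 0) m->halted = true; else m->state--;
}

/* Loops: alternates between 0 and 1, so a state repeats. */
static void toggle(Machine *m) { m->state = 1 - m->state; }

/* Never halts and never repeats a state within the budget. */
static void count_up(Machine *m) { m->state++; }
```

    With these definitions, simulate(count_down, 5) returns HALTS and
    simulate(toggle, 0) returns NON_HALTING, but simulate(count_up, 0)
    matches neither rule and only returns (as UNKNOWN) because of the
    added step budget. Under the rubric as written, with no budget, that
    call would simply never return.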
