• Can Prologers produce 100% Prolog Code? (Was: Do Prologers know the Uni

    From Mild Shock@21:1/5 to Somebody on Fri Jun 27 13:22:33 2025
    Somebody wrote:

    It seems that it reads in as ðŸ‘\u008D but writes out as ðŸ‘\\x8D\\.

    Can one then do ‘\uXXXX’ in 100% Prolog as
    well, even including surrogates? Of course.
    Here is a DCG generator snippet from Dogelog
    Player, which is 100% Prolog. It is from the
    Java backend, because I didn’t introduce ‘\uXXXX’
    in my Prolog system, since it is not part of
    the ISO core standard. The ISO core standard would
    want '\xXX\' instead:

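    % Code points up to 0xFFFF: emit \u followed by the hex digits,
    % with M leading zeros so that there are four digits in total.
    crossj_escape_code2(X) --> {X =< 0xFFFF}, !,
       {atom_integer(J, 16, X), atom_codes(J, H),
       length(H, N), M is 4-N}, [0'\\, 0'u],
       cross_escape_zeros(M),
       cross_escape_codes2(H).
    % Code points above 0xFFFF: emit high and low surrogate in turn.
    crossj_escape_code2(X) --> {crossj_high_surrogate(X, Y),
       crossj_low_surrogate(X, Z)},
       crossj_escape_code2(Y),
       crossj_escape_code2(Z).

    % 0xD7C0 is 0xD800 - (0x10000 >> 10)
    crossj_high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.

    crossj_low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.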
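
    The snippet above relies on atom_integer/3 and on two helpers
    that are not shown. As a rough, self-contained sketch of the
    same idea, assuming only standard arithmetic and append/3, one
    could write something like the following. The names
    java_escape/2, hex4/2 and hex_digit/3 are made up for this
    illustration and are not part of Dogelog Player, and the choice
    of upper-case hex digits is an assumption as well:

    ```prolog
    :- use_module(library(lists)).

    % java_escape(+Code, -Codes): Codes spells the Java escape of Code.
    % Hypothetical helper, not taken from Dogelog Player.
    java_escape(X, [0'\\, 0'u | Hex]) :-
        X >= 0, X =< 0xFFFF, !,
        hex4(X, Hex).
    java_escape(X, Codes) :-
        X > 0xFFFF, X =< 0x10FFFF,
        High is (X >> 10) + 0xD7C0,      % same formula as above
        Low is (X /\ 0x3FF) + 0xDC00,
        java_escape(High, Cs1),
        java_escape(Low, Cs2),
        append(Cs1, Cs2, Codes).

    % hex4(+N, -Codes): four upper-case hex digits of N, zero padded.
    hex4(N, [A, B, C, D]) :-
        hex_digit(N, 12, A),
        hex_digit(N, 8, B),
        hex_digit(N, 4, C),
        hex_digit(N, 0, D).

    % hex_digit(+N, +Shift, -Code): the hex digit of N at bit Shift.
    hex_digit(N, Shift, Code) :-
        V is (N >> Shift) /\ 0xF,
        (   V < 10
        ->  Code is 0'0 + V
        ;   Code is 0'A + V - 10
        ).
    ```

    For 0x1F600 the high surrogate is 0x7D + 0xD7C0 = 0xD83D and
    the low surrogate is 0x200 + 0xDC00 = 0xDE00, so the query
    ?- java_escape(0x1F600, Cs), atom_codes(A, Cs). should give an
    atom that spells \uD83D\uDE00.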

    Mild Shock wrote:
    The official replacement character is 0xFFFD:

    Replacement Character
    https://www.compart.com/de/unicode/U+FFFD

    Well, that is what people did in the past: replace
    non-printables by the ever same code, instead of
    using ‘\uXXXX’ notation. I have studied
    library(portray_text) extensively, and my conclusion
    is still that it is extremely ancient.

    For example I find:

    mostly_codes([H|T], Yes, No, MinFactor) :-
        integer(H),
        H >= 0,
        H =< 0x1ffff,
        [...]
       ;   catch(code_type(H, print),error(_,_),fail),
        [...]

    https://github.com/SWI-Prolog/swipl-devel/blob/eddbde61be09b95eb3ca2e160e73c2340744a3d2/library/portray_text.pl#L235


    Why 0x1ffff and not 0x10ffff? This is a bug,
    do you want to starve is_text_code/1? The official
    Unicode range is 0x0 to 0x10ffff. Ulrich Neumerkel
    often confused the range in some of his code snippets,
    maybe based on a limited interpretation of Unicode.
    But if one switched to chars, one could easily
    support any Unicode code point even without
    knowing the range. Just do this:

    mostly_chars([H|T], Yes, No, MinFactor) :-
        atom(H),
        atom_length(H, 1),
        [...]
       ;  /* printable check not needed */
        [...]
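
    The elided parts stay elided, but the char test itself can be
    spelled out. A minimal sketch, assuming a proper list as input;
    is_char/1 and all_chars/1 are hypothetical names and are not
    taken from library(portray_text):

    ```prolog
    % is_char(+X): X is an atom made of exactly one character.
    % No numeric range check is involved.
    is_char(X) :-
        atom(X),
        atom_length(X, 1).

    % all_chars(+List): every element of the proper list is a char.
    all_chars([]).
    all_chars([H|T]) :-
        is_char(H),
        all_chars(T).
    ```

    With this, ?- atom_chars(abc, Cs), all_chars(Cs). succeeds,
    while ?- all_chars([0'a]). fails, since 0'a is an integer and
    not an atom.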

    Mild Shock wrote:
    Hi,

    The most radical approach is Novacore from
    Dogelog Player. It consists of the following
    major incisions in the ISO core standard:

    - We do not forbid chars, for example
       lists of the form [a,b,c], and we also
       provide the char_code/2 predicate bidirectionally.

    - We do not provide any _chars built-in
       predicates, and there is nothing _strings either.
       The Prolog system is clever enough to not put
       every atom it sees in an atom table. There
       is only a predicate table.

    - Some host languages have garbage collection that
       deduplicates strings. For example some Java
       versions have an option to do that. But we
       make no effort to deduplicate atoms,
       which are simply plain strings.

    - Some languages have constant pools. For example
       the Java byte code format includes a constant
       pool in every class header. We do not do that
       during transpilation, but we could of course.
       But it begs the question, why only deduplicate
       strings and not other constant expressions as well?

    - We are totally happy that we have only codes;
       chances are that the host languages use
       tagged pointers to represent them. So they
       are represented similarly to the tagged pointers
       that SWI-Prolog uses for small integers.

    - But the tagged pointer argument is moot,
       since atom length=1 entities can also be
       represented as tagged pointers, and some
       programming languages do that. Dogelog Player
       would use such tagged pointers without
       polluting the atom table.

    - What else?

    Bye

    Mild Shock wrote:

    Technically SWI-Prolog doesn't prefer codes.
    The library `library(pure_input)` might prefer codes.
    But this is again an issue of the library being
    improved by some non-existent SWI-Prolog community.

    The ISO core standard is silent about a flag
    back_quotes, but has a lot of API requirements
    that support both codes and chars, for example it
    requires atom_codes/2 and atom_chars/2.
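
    To make the two views concrete, here is a small sketch that
    converts between them with the ISO built-in char_code/2. The
    name codes_chars/2 is hypothetical, chosen only for this
    illustration:

    ```prolog
    % codes_chars(?Codes, ?Chars): element-wise char_code/2.
    % At least one of the two lists has to be instantiated.
    codes_chars([], []).
    codes_chars([Code|Codes], [Char|Chars]) :-
        char_code(Char, Code),
        codes_chars(Codes, Chars).
    ```

    The query ?- atom_codes(abc, Cs), codes_chars(Cs, Chars).
    binds Cs to the code list [97,98,99] and Chars to [a,b,c],
    which is the same list that atom_chars(abc, Chars) delivers.
    The helper also runs in the other direction, from chars to
    codes.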

    Implementation-wise there can be an issue:
    one might decide to implement the atoms
    of length=1 more efficiently, since with Unicode
    there is now an explosion of them.

    Not sure whether Trealla Prolog and Scryer
    Prolog thought about this problem, that the
    atom table gets quite large, whereas codes don't
    eat the atom table. Maybe they forbid predicates
    that have an atom of length=1 as head:

    h(X) :-
         write('Hello '), write(X), write('!'), nl.

    Does this still work?

    Mild Shock wrote:
    Concerning library(portray_text) which is in limbo:

    Libraries are (often) written for either
    and thus the libraries make the choice.

    But who writes these libraries? The SWI-Prolog
    community. And who doesn’t improve these libraries,
    and instead floods the web with workaround tips?
    The SWI-Prolog community.

    Conclusion: the SWI-Prolog community has trapped
    itself in an ancient status quo, creating an island.
    It cannot improve its own tooling and is not willing
    to support code from elsewhere that uses chars.

    Same with the missed AI boom.

    (*) Code from elsewhere is dangerous. People
    might use other Prolog systems than only SWI-Prolog,
    like for example Trealla Prolog and Scryer Prolog.

    (**) Keeping the status quo is comfy. No need to
    think in terms of program code. It's like biology
    teachers versus pathology staff: biology teachers
    do not see opened corpses every day.


    Mild Shock wrote:

    Inductive logic programming at 30
    https://arxiv.org/abs/2102.10556

    The paper contains not a single reference to autoencoders!
    Still they show this example:

    Fig. 1 ILP systems struggle with structured examples that
    exhibit observational noise. All three examples clearly
    spell the word "ILP", with some alterations: 3 noisy pixels,
    shifted and elongated letters. If we would be to learn a
    program that simply draws "ILP" in the middle of the picture,
    without noisy pixels and elongated letters, that would
    be a correct program.

    I guess ILP is 30 years behind the AI boom. An early autoencoder
    turned into a transformer was already reported here (*):

    SERIAL ORDER, Michael I. Jordan - May 1986
    https://cseweb.ucsd.edu/~gary/PAPER-SUGGESTIONS/Jordan-TR-8604-OCRed.pdf

    Well, ILP might have its merits. Maybe we should not ask
    for a marriage of LLM and Prolog, but of autoencoders and ILP.
    But it's tricky, I am still trying to decode the da Vinci code of
    things like stacked tensors: are they related to k-literal clauses?
    The paper I referenced is found in this excellent video:

    The Making of ChatGPT (35 Year History)
    https://www.youtube.com/watch?v=OFS90-FX6pg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Mild Shock@21:1/5 to Mild Shock on Fri Jun 27 13:36:14 2025
    Attention: Java is an example of a language that doesn’t
    understand \UXXXXXXXX, so one has to be careful when
    introducing \uXXXX and \UXXXXXXXX at the same time.

    In Python, however, this works:

    emoji = "\U0001F600" # 😀 GRINNING FACE

    Python strings can even distinguish between the
    original code point and its surrogate translation, since
    their elements can be up to 32 bits wide. For the grinning
    face, Java accepts only the surrogate translation, since
    its strings consist of 16-bit words:

    String emoji = "\uD83D\uDE00"; // 😀 GRINNING FACE
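
    Going from the Java form back to the Python form means
    combining the two surrogates into one code point again. A
    minimal sketch in Prolog, assuming both arguments are
    integers; surrogates_code/3 is a hypothetical name used only
    for this illustration:

    ```prolog
    % surrogates_code(+High, +Low, -Code): combine a surrogate pair.
    % This is the inverse of the formulas (X >> 10) + 0xD7C0 and
    % (X /\ 0x3FF) + 0xDC00.
    surrogates_code(High, Low, Code) :-
        High >= 0xD800, High =< 0xDBFF,
        Low >= 0xDC00, Low =< 0xDFFF,
        Code is ((High - 0xD7C0) << 10) + (Low - 0xDC00).
    ```

    For the pair above, 0xD83D - 0xD7C0 is 0x7D, shifted left by
    10 gives 0x1F400, and adding 0xDE00 - 0xDC00 = 0x200 yields
    0x1F600, the code point that Python writes as \U0001F600.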
