• TeX's line breaking in the grub sesh

    From Stefan Ram@21:1/5 to All on Mon Aug 5 13:09:16 2024
    During my grub sesh today, I crushed out TeX's line breaking
    in Python in like a hot minute - 36 to be exact. Tryna use
    it for plain text, like, monospaced fonts and whatnot. Natch,
    I stripped it down to the bare bones, but it's still hella
    tight. Already got that "parshape" action goin' on (which
    Knuth-Plass can't hang with, if I'm not trippin'). Next up,
    I'm finna tackle those "discretionary items" - that's gonna be
    gnarly! For sure there's some janky bugs in there, but peep this:

    wrap.py

    source = 'Ich habe das gebackene Profi-Bettuch bereits gesehen. '

    active0 =[ 0, 0, 0, 0 ] # previous, position, quality, line_number
    active =[ active0 ]
    parshape =[ 10, 20, 20, 20, 20, 20, 20, 20 ]

    p = 1
    while p < len( source ):
    ch = source[ p ]
    if ch == ' ':
    new_active = []
    best_quality = -10000
    best_act = active[ 0 ]
    for act in active:
    a = act[ 1 ]
    line_number = act[ 3 ]
    target_length = parshape[ line_number ]
    this_length = p - a
    if this_length > target_length:
    pass
    else:
    quality = this_length - target_length
    new_active.append( act )
    if quality > best_quality:
    best_quality = quality
    best_act = act
    new_active.append( [ best_act, p, best_quality, best_act[ 3 ]+1 ] )
    active = new_active
    p += 1
    act = active[ -1 ]
    buff = []
    while act[ 0 ]:
    prev = act[ 0 ]
    buff.append( source[ prev[1]: act[1] ])
    act = prev
    for line in reversed( buff ):
    print(line)

    output

    Ich habe das
    gebackene Profi-Bettuch
    bereits gesehen.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Stefan Ram on Wed Aug 28 18:34:59 2024
    [email protected] (Stefan Ram) wrote or quoted:
    For sure there's some janky bugs in there, but peep this:

    Turns out there was still a glitch in the code! The latest
    build now shows a paragraph break with (fingers crossed)
    global optimization, taking parshape into account. Now that
    I've finally squashed the bug, discretionary items haven't
    been baked in yet. That's next on my to-do list though.

    Ironically, lines of the following Python 3.12 source code have NOT
    been wrapped to the 72 characters recommended for Usenet posts!

    main.py

    from dataclasses import dataclass
    from typing import Optional, List, Iterator
    import bisect

    source_text = list( ' Lorem ipsum dolor sit amet, consectetur adipiscing elit, '
    'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad '
    'minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea '
    'commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit ' 'esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat '
    'non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. ' )

    parshape =[ 80, 80, 80, 40 ]

    @dataclass
    class ActiveEntry:
    previous: Optional[ 'ActiveEntry' ]= None
    position: int = 0 # position in the text
    branch: int = 0 # for future version
    sum_quality: int = 0 # the sum of all "merits" up to this point
    line_number: int = 0 # line started with this (first line = 0)

    parshape_length = len( parshape )

    def print_up_to( this ):
    '''prints the wrapped paragraph up to the point "this".
    It starts at the back point "this" and then goes forward
    in the text via the linked chain of points. Finally, for
    printing the text in the normal order, it then goes forward
    again.'''
    buff = []
    qual = []
    sum_quality = this.sum_quality
    while this.previous:
    previous = this.previous
    line = source_text[ previous.position: this.position ]
    # print( f'{line = }' )
    buff.append( line )
    qual.append( this.sum_quality )
    this = previous
    start_position = 0
    # we went backwards, but actually want to print in the normal direction
    first = 1
    for i,( line, qual ) in enumerate( zip( reversed( buff ), reversed( qual ))):
    text = ''.join( line[ start_position: ])
    target_length = parshape[ i ]if i < parshape_length else parshape[ -1 ]# dupe!
    output = text
    print( output[ first: ])
    first = 0
    start_position = 1 # skip an initial space or something
    print()
    print( 'Total merits:', sum_quality )
    print()

    active0 = ActiveEntry()

    active_list =[ active0 ]

    current_position = 1
    source_length = len( source_text )
    while current_position < len( source_text ):
    ch = source_text[ current_position ]
    if ch == ' ': # possible breakpoint
    new_active_list = [] # next active list
    best_sum_quality = None # best quality summed across this and previous lines, not yet determined
    best_act = active_list[ 0 ] # preliminary choice
    for active in active_list:
    active_position = active.position
    line_number = active.line_number
    target_length = parshape[ line_number ]if line_number < parshape_length else parshape[ -1 ]# dupe!
    distance = current_position - active_position
    adjustment = target_length - distance
    if adjustment < 0:
    # "When an active breakpoint a is encountered for which
    # the line from a to b has an adjustment ratio less
    # than -1 (that is, when the line can't be shrunk to
    # fit the desired length), breakpoint a is removed
    # from the active list."
    pass # do not transfer into the new active list
    else:
    new_active_list.append( active )
    this_line_quality = -adjustment**2
    have_reached_final_space = current_position == source_length - 1 # final ' ' on end of last line
    if have_reached_final_space: this_line_quality = 0 # arbitrary whitespace at end is accepted
    this_sum_quality = active.sum_quality + this_line_quality
    if \
    best_sum_quality is None or \
    this_sum_quality > best_sum_quality:
    best_sum_quality = this_sum_quality
    best_predecessor = active
    if best_sum_quality is not None:
    # make a new active point from current position, linking it to the best active point found
    new_active_list.append( ActiveEntry( previous=best_predecessor, position=current_position, branch=0, sum_quality=best_sum_quality, line_number=best_predecessor.line_number+1 ))
    active_list = new_active_list
    current_position += 1
    active = active_list[ -1 ] # the final space

    print_up_to( active )

    output

    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit
    in voluptate velit esse cillum dolore
    eu fugiat nulla pariatur. Excepteur
    sint occaecat cupidatat non proident,
    sunt in culpa qui officia deserunt
    mollit anim id est laborum.

    Total merits: -80

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Ram@21:1/5 to Stefan Ram on Fri Jan 31 22:08:14 2025
    [email protected] (Stefan Ram) wrote or quoted:
    new_active_list.append( active )
    this_line_quality = -adjustment**2

    In August 2024, I posted an implementation of TeX's paragraph
    algorithm for plain text in Python intended to be used to
    generate ragged-right output. I'm still working on this code
    and intend to publish the new version of the code once it's
    finished. Here is a preliminary report:

    This is an output of the algorithm:

    Ich will das natürlich nicht überbewerten, aber eine gewisse Be-
    schäftigung mit dem neuen/unbekannten Code vor Ausführung desselben
    fände ich nicht schlecht, und sei es nur, die Zitatzeichen zu ent-
    fernen und dabei mal einen Blick auf den Inhalt zu werfen ...

    Now, I set "wrapper.adjdemerits = 1000":

    Ich will das natürlich nicht überbewerten, aber eine gewisse Be-
    schäftigung mit dem neuen/unbekannten Code vor Ausführung dessel-
    ben fände ich nicht schlecht, und sei es nur, die Zitatzeichen zu
    entfernen und dabei mal einen Blick auf den Inhalt zu werfen ...

    Above, we see that the program now tries harder to make
    the ragged right border less ragged. But we now have two
    consecutive discretionary hyphens!

    wrapper.adjdemerits = 1000
    wrapper.doublehyphendemerits = 10000

    Ich will das natürlich nicht überbewerten, aber eine gewisse Be-
    schäftigung mit dem neuen/unbekannten Code vor Ausführung desselben
    fände ich nicht schlecht, und sei es nur, die Zitatzeichen zu ent-
    fernen und dabei mal einen Blick auf den Inhalt zu werfen ...

    Now, we have a discretionary break in then penultimate line!

    wrapper.adjdemerits = 1000
    wrapper.doublehyphendemerits = 10000
    wrapper.finalhyphendemerits = 100000

    Ich will das natürlich nicht überbewerten, aber eine gewisse Be-
    schäftigung mit dem neuen/unbekannten Code vor Ausführung dessel-
    ben fände ich nicht schlecht, und sei es nur, die Zitatzeichen
    zu entfernen und dabei mal einen Blick auf den Inhalt zu werfen ...

    The price to be paid for the removal of the final hyphen is the
    reintroduction of the double hyphen. I guess you just can't have
    everything you wish for!

    All of the above is actual program output, so these TeX parameters
    (and some other) are already implemented, albeit with the numeric
    values possibly not corresponding directly to TeX's values.
    (So, "1000" in my code might not have the same meaning as in TeX.)

    Now I plan to add some of my own ideas for parameters and then
    to clean up the code and add some documentation . . .

    One final example, wrapper.hyphenpenalty = 1000:

    Ich will das natürlich nicht überbewerten, aber eine gewisse
    Beschäftigung mit dem neuen/unbekannten Code vor Ausführung
    desselben fände ich nicht schlecht, und sei es nur, die Zitatzeichen
    zu entfernen und dabei mal einen Blick auf den Inhalt zu werfen ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)