Forum: >>> Magnum BBS <<<

Re: Parsing timestamps?

From Anthony Howe@21:1/5 to dxf on Sun Oct 6 11:35:24 2024

On 2024-10-06 03:51, dxf wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.

Isn't there an ISO 8601 parsing package?

--
Anthony C Howe
[email protected] BarricadeMX & Milters http://nanozen.snert.com/ http://software.snert.com/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From [email protected]@21:1/5 to [email protected] on Mon Oct 7 13:00:10 2024

In article <[email protected]>,
dxf <[email protected]> wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.

empty decimal

: SPLIT ( a u c -- a2 u2 a3 u3 ) >r 2dup r> scan 2swap 2 pick - ;
: >INT ( adr len -- u ) 0 0 2swap >number 2drop drop ;

: /T ( a u -- $hour $min $sec )
2 0 do [char] : split 2swap dup if 1 /string then loop
2 0 do dup 0= if 2rot 2rot then loop ;

: .T 2swap 2rot cr >int . ." hr " >int . ." min " >int . ." sec " ;

s" 1:2:3" /t .t
s" 02:03" /t .t
s" 03" /t .t
s" 23:59:59" /t .t
s" 0:00:03" /t .t

After ca. 50 years I have completed the $@ $! $+! $C+ $/ with
$\ . Now I can do this

"12:03:43" &: $\ TYPE &: $\ TYPE &: $\ TYPE
43 03 12 OK

"12:03:43" &: $/ TYPE &: $/ TYPE &: $/ TYPE
12 03 43 OK

Insert
"hr" TYPE
as required.

I can't believe the long posts this sparks.

Groetjes Albert
--
Temu exploits Christians: (Disclaimer, only 10 apostles)
Last Supper Acrylic Suncatcher - 15Cm Round Stained Glass- Style Wall
Art For Home, Office And Garden Decor - Perfect For Windows, Bars,
And Gifts For Friends Family And Colleagues.


From sjack@21:1/5 to dxf on Mon Oct 7 16:20:29 2024

dxf <[email protected]> wrote:

The HH:MM:SS format is easy but how to deal with the variants shown above? They occur in the real world.

Toad code:
fload job
: xx. 0 <# bl hold # # #> type ;
: tab3. tab rot xx. swap xx. xx. ;

-- &num ( g -- s )
-- Convert g-string to numeric string address
: &num drop 1- ;
-- Note g-string is ANS string ( addr u )

-- ts_elms ( "[hh:][mm:]ss<bl>" -- 0 0 ss | 0 mm ss | hh mm ss )
-- Parse timestamp elements: hh=hours mm=minutes ss=seconds
-- Input hh: element and hh:mm: combination elements may be left out
-- if zero(s).
: ts_elms
bl word here count
o+s do i c@ asc : = if bl i c! then loop
0 0 0 here count
begin
bl split dup 0> while &num number drop
5 roll drop -rot
repeat 4drop
;

ts_elms 25
i. tab3. --> 00 00 25
i. ts_elms 25 tab3. --> 00 00 25
i. ts_elms 10:25 tab3. --> 00 10 25
i. ts_elms 2:10:25 tab3. --> 02 10 25
OK


From [email protected]@21:1/5 to [email protected] on Tue Oct 8 09:41:07 2024

In article <[email protected]>,
dxf <[email protected]> wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.


This problem is ill-posed. You don't specify what happens
with fewer than 3 fields, or the meaning of the fields.
Normally I write the tests to run before attempting to code.
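
As an illustration of writing the tests first, here is a minimal sketch in standard Forth. It assumes the fields mean hours:minutes:seconds, that missing leading fields count as zero, and that the result is a single-cell total of seconds (so cells must be wider than 16 bits). The names DIGIT?, CHECK and this stand-in PARSE-TIME are made up for the example and are not from dxf's post.

```forth
\ Hypothetical sketch: DIGIT?, CHECK and this PARSE-TIME are illustrative only.
DECIMAL

: DIGIT? ( c -- flag )  [CHAR] 0 - 10 U< ;

\ Stand-in parser: every run of digits is one field, each earlier field
\ is worth 60 of the next one, anything else is a separator.
: PARSE-TIME ( adr len -- seconds )
   0 ROT ROT                        \ n adr len
   BEGIN DUP WHILE
      OVER C@ DIGIT? IF
         2>R 60 * 0 0 2R> >NUMBER   \ n*60 ud adr' len'
         2>R D>S + 2R>              \ n' adr' len'
      ELSE
         1 /STRING                  \ skip one separator character
      THEN
   REPEAT 2DROP ;

\ One test case: parse the string and compare with the expected total.
\ Prints nothing on success, the offending string on failure.
: CHECK ( adr len expected -- )
   >R 2DUP PARSE-TIME R> =
   IF 2DROP ELSE CR ." FAILED: " TYPE THEN ;

\ The tests double as the specification of the missing-field cases.
S" 1:2:3"    3723  CHECK
S" 02:03"    123   CHECK
S" 03"       3     CHECK
S" 23:59:59" 86399 CHECK
S" 0:00:03"  3     CHECK
```

With the expected values written down first, the behaviour for one or two fields is no longer open to interpretation; a different spec only needs different numbers in the table.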

Groetjes Albert

From sjack@21:1/5 to Ahmed on Tue Oct 8 15:19:25 2024

Ahmed <[email protected]> wrote:

I know you don't care about this case, but:

Yes, originally I had syntax checks, but left them out to focus more on getting
the zeros in.

-- ts_elms ( "[hh:][mm:]ss<bl>" -- 0 0 ss | 0 mm ss | hh mm ss )
-- Parse timestamp elements: hh=hours mm=minutes ss=seconds
-- Input hh: element and hh:mm: combination elements may be left out
-- if zero(s).
: ts_elms
bl word here count
over c@ asc : = >r ( leading char check )
2dup + 1- c@ asc : = ( lagging char check )
r> or if ." --Invalid " 2drop rdrop exit then
o+s do i c@ asc : = if bl i c! then loop
0 0 0 here count
begin
bl split dup 0> while &num number drop
5 roll drop -rot
repeat 4drop
;

[s] Invalid syntax
i. ts_elms 25: tab3. --> --Invalid
i. ts_elms :25 tab3. --> --Invalid
i. ts_elms :25: tab3. --> --Invalid

[s] Valid syntax
ts_elms 25
i. tab3. --> 00 00 25
i. ts_elms 25 tab3. --> 00 00 25
i. ts_elms 10:25 tab3. --> 00 10 25
i. ts_elms 2:10:25 tab3. --> 02 10 25
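
For readers without Toad, the leading/trailing colon check might look roughly like this in standard Forth. This is only a sketch: BAD-EDGES? is a made-up name, and treating the empty string as invalid is an assumption, not something stated above.

```forth
\ Hypothetical helper, not from the thread: true if the string is empty
\ or if it starts or ends with a colon.
: BAD-EDGES? ( adr len -- flag )
   DUP 0= IF 2DROP TRUE EXIT THEN   \ assumption: empty input is invalid
   OVER C@ [CHAR] : = >R            \ leading colon?
   + 1- C@ [CHAR] : =               \ trailing colon?
   R> OR ;

S" 25:"   BAD-EDGES? .   \ prints -1
S" :25"   BAD-EDGES? .   \ prints -1
S" 10:25" BAD-EDGES? .   \ prints 0
```

Doubled separators in the middle of the string (as in 11::13) are not caught by this check and would need a separate rule.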


From Gerry Jackson@21:1/5 to dxf on Fri Oct 18 15:46:27 2024

On 06/10/2024 08:51, dxf wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.


Another solution

: /t ( ca u -- sec min hour )
3 \ a count, decremented every recurse
[: -rot dup 0>
if 0. 2swap >number 1 /string 2swap drop ( -- ct ca' u' n1 )
>r rot 1-
recurse r> swap exit
then 2drop
;] execute
0 ?do 0 loop \ 0 hours and minutes if missing in source string
;
: .t cr . ." hr " . ." min " . ." sec " ;

cr
s" 1:2:3" /t .t
s" 02:03" /t .t
s" 03" /t .t
s" 23:59:59" /t .t
s" 0:00:03" /t .t
s" " /t .t
s" :" /t .t
s" :53" /t .t
s" 11/12/13" /t .t \ Different separator
s" 11::13" /t .t
s" :::" /t .t
s" 3:" /t .t
s" 1:2:" /t .t

\ Results
1 hr 2 min 3 sec
0 hr 2 min 3 sec
0 hr 0 min 3 sec
23 hr 59 min 59 sec
0 hr 0 min 3 sec
0 hr 0 min 0 sec
0 hr 0 min 0 sec
0 hr 0 min 53 sec
11 hr 12 min 13 sec
11 hr 0 min 13 sec
0 hr 0 min 0 sec
0 hr 0 min 3 sec
0 hr 1 min 2 sec

The last two could be regarded as wrong but you indicated elsewhere that
they wouldn't occur.

Any non-digit is a separator

--
Gerry


From B. Pym@21:1/5 to mhx on Sat Jun 7 12:38:40 2025

mhx wrote:

On Sun, 6 Oct 2024 7:51:31 +0000, dxf wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.


Why don't you use the fact that >NUMBER returns the given
string starting with the first unconverted character?
SPLIT should be redundant.

-marcel

: CHAR-NUMERIC? 48 58 WITHIN ;
: SKIP-NON-NUMERIC ( adr u -- adr2 u2)
BEGIN
DUP IF OVER C@ CHAR-NUMERIC? NOT ELSE 0 THEN
WHILE
1 /STRING
REPEAT ;

: SCAN-NEXT-NUMBER ( n adr len -- n2 adr2 len2)
2>R 60 * 0. 2R> >NUMBER
2>R D>S + 2R> ;

: PARSE-TIME ( adr len -- seconds)
0 -ROT
BEGIN
SKIP-NON-NUMERIC
DUP
WHILE
SCAN-NEXT-NUMBER
REPEAT
2DROP ;

S" hello 1::36 world" PARSE-TIME CR .
96 ok
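
The stated end goal was a double number in centiseconds, while PARSE-TIME above leaves a single-cell count of seconds. A minimal sketch of the remaining step, assuming the seconds count is unsigned and fits in one cell; SECONDS>CS is a made-up name, not a word from the thread:

```forth
\ Hypothetical helper, not from the thread:
\ widen a single-cell seconds count to double-cell centiseconds.
: SECONDS>CS ( u -- ud )  100 UM* ;

\ Usage with a literal seconds count (96 = 1 min 36 sec):
96 SECONDS>CS D.   \ prints 9600
```

UM* is used so that the multiplication cannot overflow a single cell, which matters on 16-bit systems where one cell holds only about 10 minutes' worth of centiseconds.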


From B. Pym@21:1/5 to B. Pym on Mon Jun 9 12:34:18 2025


Using regular expressions in SP-Forth.

( pcre.dll must be in your path.)
REQUIRE PcreMatch ~ac/lib/string/regexp.f \ PCRE wrapper
REQUIRE S>NUM ~nn\lib\s2num.f \ String to number

VARIABLE HOW-MANY-SECONDS
: INCREMENT-SECONDS ( n adr --)
SWAP OVER @ 60 * + SWAP ! ;

: PARSE-TIME ( adr len -- seconds)
0 HOW-MANY-SECONDS !
BEGIN
S" \d+(.*)" PcreGetMatch
WHILE
S>NUM HOW-MANY-SECONDS INCREMENT-SECONDS
REPEAT
HOW-MANY-SECONDS @ ;

S" hello 20::_:55 world" PARSE-TIME CR .

1255


From B. Pym@21:1/5 to B. Pym on Tue Jun 10 09:18:33 2025


: SCAN-NUMBER-OR-SKIP ( n adr len -- n' adr' len')
DUP >R
0 0 2SWAP >NUMBER
DUP R> =
IF 2SWAP 2DROP 1 /STRING
ELSE
2>R D>S SWAP 60 * + 2R>
THEN ;

: PARSE-TIME ( adr len -- seconds)
0 -ROT
BEGIN
DUP
WHILE
SCAN-NUMBER-OR-SKIP
REPEAT
2DROP ;

S" hi 5 or 1 is 44 ho " PARSE-TIME CR .
18104


From Stephen Pelc@21:1/5 to All on Tue Jun 10 12:07:39 2025

On 9 Jun 2025 at 23:40:28 CEST, "Hans Bezemer" <[email protected]> wrote:

Third, any statement must come with proof. And in this case that means
extended benchmarking. I can tell you beforehand that I've never seen
significant differences between locals and stack. I'm sorry to say that
- but it's true.

I suspect that the lack of difference comes from the underlying Forth system.
For threaded code systems, the threading costs a lot of performance. In
our tests, subroutine threaded code on 32 bit systems averages 2.2
times the performance of direct threaded code for 68k class CPUs.

Once full native code compilation and optimisation is turned on, you
can get surprising results. At one stage we (MPE) de-localled a
substantial portion of the PowerNet TCP/IP stack - all in high-level
Forth. For the modified code, size decreased by 25% and performance
increased by 50%.

Stephen
--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974 http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads


From Anton Ertl@21:1/5 to Stephen Pelc on Tue Jun 10 14:06:37 2025

Stephen Pelc <[email protected]> writes:

Once full native code compilation and optimisation is turned on, you
can get surprising results. At one stage we (MPE) de-localled a
substantial portion of the PowerNet TCP/IP stack - all in high-level
Forth. For the modified code, size decreased by 25% and performance
increased by 50%.

This demonstrates that you implemented locals less efficiently than
stack manipulation, not that locals are inevitably slow. For more
information, see

@InProceedings{ertl22-locals,
author = {M. Anton Ertl},
title = {Are Locals Inevitably Slow?},
crossref = {euroforth22},
pages = {48--49},
url = {http://www.euroforth.org/ef22/papers/ertl-locals.pdf},
url-slides = {http://www.euroforth.org/ef22/papers/ertl-locals-slides.pdf},
video = {https://www.youtube.com/watch?v=tPjSKetEJn0},
OPTnote = {presentation slides},
abstract = {Code quality of locals on two code examples on
various systems}
}

@Proceedings{euroforth22,
title = {38th EuroForth Conference},
booktitle = {38th EuroForth Conference},
year = {2022},
key = {EuroForth'22},
url = {http://www.euroforth.org/ef22/papers/proceedings.pdf}
}

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/


From B. Pym@21:1/5 to B. Pym on Wed Jun 11 09:25:50 2025


Using local variables.

: SCAN-NUMBER-OR-SKIP { n adr len -- n' adr' len' }
0. adr len >NUMBER { adr' len' } D>S { m }
len' len =
IF n adr len 1 /STRING
ELSE
n 60 * m + adr' len'
THEN ;


From [email protected]@21:1/5 to [email protected] on Wed Jun 11 12:18:19 2025

In article <nnd$215a8dbf$3352f729@c328f69c56780220>,
Hans Bezemer <[email protected]> wrote:

On 11-06-2025 03:49, dxf wrote:

On 11/06/2025 3:34 am, LIT wrote:

...
Fourth, if the definition is extremely time-critical, those
tricky stack manipulators, (e.g., ROT ROT) can really eat up
clock cycles. Direct access to variables is faster."

Pushing variables on the stack, executing them, along with their
associated @ and ! eats clock cycles. This is certainly the case
in the systems you use.

Agreed.

Yes, Brodie warns us next "but careful with variables' use
too" - and I still think my use of variables in two examples
I recently pasted wasn't "legit" in any way. It was just
applying the tips you see above.

When is it "legit" to give up? I've written routines I believed
needed VARIABLEs. But after a 'cooling off' period, I can look
at the problem again afresh and find I can do better. Folks will
say in the real world one couldn't afford this. That's true and
likely why I'm a hobbyist and not a professional programmer.
OTOH it's pretty rare that I write routines with variables in them
to begin with.

As a guy who used Forth programming in a professional environment, I can
at least tell you how I did it..

When you're on the spot, you're on the spot - and you got to provide in
the allotted time, even if it means making sub-optimal code. That's just
the way it is, that's corporate life.

Seriously? They ask me beforehand. Philips matlab got upset
because they were not used to people finishing in the time they
estimate.
In another project I was 10% accurate on a total of 30 bugs, and
within 30% of each bug individually.

Then I was dropped as project leader into a project that had to finish
in three months, and I succeeded. If I failed, nobody would complain.

Fokker Space had an architectural design disapproved by ESO. It was due
at a certain date. Big kudos if you succeed, and I did.

If you tell your boss "Brodie told you to", he's gonna shake his head,
ask who Brodie is and then ship you to the corporate shrink for an
emergency session.

There are stupid bosses who insist on a one-line change over
a 10-line change, as if this made the change more "reliable".
At the same time they disapprove of test automation.

But what I did was to either collect stuff in advance ("Hey, that's a
nice comma'd printout word by Ed. Better make it work in 4tH!") - or
make certain libraries beforehand. In that case, all you have to do is
to shove all those elements together and you're done. The tricky stuff
is already in your tool chest..

Take a look at the 4tH library and notice how much of this stuff is of
no interest at all to the occasional user. Well, that was because it
wasn't written for you. It was written to be applied at work, so I can
do miracles and save my reputation. If you wanna win, you gotta cheat ;-)

I don't agree that tool chests built on practice are personal.

Hans Bezemer

Groetjes Albert

From Stephen Pelc@21:1/5 to All on Wed Jun 11 11:39:48 2025

On 10 Jun 2025 at 22:56:28 CEST, "LIT" <LIT> wrote:

3. As Mr. Pelc remarked, stack operators are faster.

This is what Mr. Pelc remarked, regarding such style
of programming - yes, many years ago I was guilty of
that too - already 15 years ago:

https://groups.google.com/g/comp.lang.forth/c/m9xy5k5BfkY/m/FFmH9GE5UJAJ

"Although the code is compilable and can be made efficient,
the source code is a maintenance nightmare!"

Maybe he changed his mind since that time - well, since
he's here, you may want to ask him a question.

I take the last paragraph as a sort of passive-aggressive asking of some indirectly
asked question. Hence I don't really know what your/the question is. Never mind,
you opened the box.

Of course I change my mind in 15 years. I'm a human being and so am entitled to
do so and will do so.

Working code beats all. Clear maintainable code that is easy to understand is
best. I am still maintaining 40 year old code and my brain is not as fast as it used
to be. Keep it simple. Poking around in a Forth source tree of 1.4 million lines
of source code is not what I want to do.

Once two implementation techniques provide performance within (say) a factor of
1.5 or 2 of each other, I stop worrying. Short words are better than long ones.

The locals or Forth stack discussions between Anton and myself show up a design
flaw I made 30 years or more ago when the VFX native code generator was new.
The use of ADDR <local> to return the address of the local is a mistake that can
be replaced by a LOCAL[ size ] buffer. When this is done, code generation of
locals can become significantly better. However, I'm not going to wreck client
code for it. My successors can argue about it.

On Usenet, I take people who use their real names more seriously than those who
do not. Just get a better flame-proof suit and stop being so precious.

Stephen


From Kerr-Mudd, John@21:1/5 to Hans Bezemer on Tue Jun 24 10:29:08 2025

On Tue, 24 Jun 2025 11:12:00 +0200
Hans Bezemer <[email protected]> wrote:

On 23-06-2025 20:48, LIT wrote:

OK, have another song: Mr. FIFO stating that
"arrays aren't variables" (maybe need a link?).
Where did they taught you that? At that 'college'
of yours, "elite programmer"? :D

Oh honey, you don't understand? That's not a problem, hon. Go to mummy,
she will explain it to you. But daddy doesn't have time for you. He's
talking to the grown ups. Go play with your dolls and be a good girl!

Please, give up the insults; let your coding skills do the talking.

--
Bah, and indeed Humbug.


From Anton Ertl@21:1/5 to Hans Bezemer on Tue Jun 24 16:23:19 2025

Hans Bezemer <[email protected]> writes:

I'm also puzzled why there is always so much emphasis on the "speed" issue. I
mean - if you want speed, do your program in C -O3 so to say. It'll blow
any Forth out of the water.

Take a look at the bubble benchmark in Figure 1 of <https://www.complang.tuwien.ac.at/papers/ertl24-interpreter-speed.pdf>. SwiftForth, VFX, and Gforth with all optimizations (the baseline) are
faster than gcc-12 -O3. The reason for that is:

|For bubble, gcc -O3 auto-vectorizes, and the result is that there is
|partial overlap between a store and a following load, which results
|in the hardware taking a slow path rather than performing one of its
|store-to-load forwarding optimizations.

- anton

From Paul Rubin@21:1/5 to dxf on Tue Jun 24 22:38:33 2025

dxf <[email protected]> writes:

Forth forces an average programmer to adopt a level of organisation
sooner than a locals- based language. I suspect forthers that promote
locals are well aware forth is readable and maintainable but are
pursuing personal agendas of style which requires implying the
opposite.

IDK, I've seen some unreadable Forth code that was written by experts.
Whether locals could have helped, I don't know.


From Paul Rubin@21:1/5 to Hans Bezemer on Wed Jun 25 00:21:23 2025

Hans Bezemer <[email protected]> writes:

Fundamentally. I explained the sensation at the end
of "Why Choose Forth". I've been able to tackle things I would never
have been able to tackle with a C mindset. ( https://youtu.be/MXKZPGzlx14 )

I just watched this video and enjoyed it, but I don't understand how a C mindset is different. In C you pass stuff as function parameters
instead of on the stack: what's the big deal? And particularly, the
video said nothing about the burning question of locals ;).

It seems to me all the examples mentioned in the video (parsing CSV
files or floating point numerals) are what someone called
micro-problems. Today they are much easier with languages like Python, and
back in Forth's heyday there was Lisp, which occupied a mindspace like
Python does now.

I agree that Thinking Forth is a great book.


From [email protected]@21:1/5 to [email protected] on Wed Jun 25 12:14:30 2025

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

dxf <[email protected]> writes:

Forth forces an average programmer to adopt a level of organisation
sooner than a locals- based language. I suspect forthers that promote
locals are well aware forth is readable and maintainable but are
pursuing personal agendas of style which requires implying the
opposite.

IDK, I've seen some unreadable Forth code that was written by experts.
Whether locals could have helped, I don't know.

Maybe you are mistaken. I have seen unmaintainable code written by
some self-proclaimed experts.
I have not seen unmaintainable code written by real experts.
I have seen a lot of code. 70% of my 40+ years was spent on removing
defects or enhancing existing code.

There is a problem with the word unreadable. The 800 page proof
of the Fermat theorem is unreadable. At first sight my manx code
is unreadable. I read children's stories in Chinese; to most Westerners
they are unreadable.

The story goes that a Boeing doesn't fly until the documentation weight
rivals that of the airplane. I was in the middle of such a process in
Dutch railway design where safety matters.
Pick a document from this gigantic pile and you can't make heads or tails
of it.
Or take the seminal DeepSeek publication. The only thing that most people
understand is the "11 million dollars" somehow spent.

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.


From Paul Rubin@21:1/5 to [email protected] on Wed Jun 25 21:05:15 2025

[email protected] writes:

Maybe you are mistaken. I have seen unmaintainable code written by
some self-proclaimed experts.
I have not seen unmaintainable code written by real experts.

IDK about unmaintainable by other experts, but I've looked at cmForth
(written by Moore) and figForth (written I guess by Bill Ragsdale) and
found both incomprehensible. If those guys aren't experts then the bar
must be pretty high. I do find eForth readable.


From Waldek Hebisch@21:1/5 to Hans Bezemer on Thu Jun 26 05:20:13 2025

Hans Bezemer <[email protected]> wrote:

No really, I'm not kidding. When done properly Forth actually changes
the way you work. Fundamentally. I explained the sensation at the end of
"Why Choose Forth". I've been able to tackle things I would never have
been able to tackle with a C mindset. ( https://youtu.be/MXKZPGzlx14 )

I do not look at videos (mostly because they are an extremely wasteful
way of transmitting concepts; with words one can do this faster).
So I will comment mostly on what you wrote.

Like I always wanted to do a real programming language - no matter how primitive. Now I've done at least a dozen - and that particular trick
seems to get easier by the day.

I am not sure what you mean by "do a real programming language".
I have written compilers. The ones where I did all the work
I consider to be toys. But I am pretty confident that if
I wanted I could extend them to a practical language. I also
work on real compilers, but there the majority of the work was done by
other people and I only worked on parts. Still, while in
a single compiler "my" part (or parts) is a minority, together
they cover all stages of a practical compiler.

I have not written a serious interpreter or even a part of one,
but I have looked at the code of several interpreters and I think that
I understand the subject well enough to write one if needed.

And IMHO a lot can be traced back to the very simple principles Forth is based upon - like a stack. Or the triad "Execute-Number-Error". Or the dictionary. But also the lessons from ThinkForth.

The traditional way to implement Forth is just one way. It is
relatively simple, so this may be attractive. But I would
say it is not the simplest one: bytecode interpreters are less
clever, so in a sense simpler (at the cost of slower execution).
Compilers generating native code can be simple too, and
one can argue that they also need less cleverness than Forth
(but probably more object code).

You'll also find it in my C work. There are a lot more "small functions"
than in your average C program. It works for me like an "inner API". Not
to mention uBasic/4tH - There are plenty of "one-liners" in my
uBasic/4tH programs.

But that train of thought needs to be maintained - and it can only be maintained by submitting to the very philosophy Forth was built upon. I
feel like if I would give in to locals, I'd be back to being an average
C programmer.

I still do C from time to time - but it's not my prime language. For
this reason - and because I'm often just plain faster when using Forth.
It just results in a better program.

My philosophy for developing programs is "follow the problem".
That is, we have a problem to solve (a task to do). We need to
understand it, introduce some data structures and specify the
needed computation. This is mostly independent of the programming
language. When the problem is not well understood we need
to do some research. Here experiments may help a lot,
and having an interactive programming language is useful
(so this is a plus of Forth compared to C). Once we have
the data structures and know what computation is needed, we
need to encode (represent) this in the chosen language.
I would say that the large-scale structure of the program
will be mostly independent of the programming language.
There will be differences at small scale, as different
languages have different idioms. "Builtin" features of the
language or "standard" libraries may do a significant
part of the work. The effort of coding may vary widely,
depending on how much is supported by the language and the
surrounding ecosystem and how much must be newly
coded. Also, the debugging features of the programming
system affect the speed of coding.

Frankly, I do not see how missing language features
can improve design. I mean, there are people who
try to use fancy features when they are not needed.
But the large-scale structure of a program should not be
affected by this. And at smaller scale, with some
experience it is not hard to avoid unneeded features.
I would say that there is a natural way to approach a
given problem and usually the best program is the one that
follows the natural way. Now, if a problem naturally needs
several interdependent attributes, we need to represent
them in some way. If the dependence is naturally stack-like,
then a stack is a good fit. If the dependence is not
naturally stack-like, using a stack may be possible
after some reorganisation. But my experience is
that if a given structure does not naturally appear
after some research, then reorganisation is not
very likely to lead to such a structure. And even if
one manages to tweak the program into such a structure, it
is not clear that it is a gain. Anyway, there is a substantial
number of problems where a stack is unlikely to work in a
natural way. So how to represent attributes? If they
are needed only inside a single function, then the natural
way is using local variables. One can use globals, but
for variables that are not needed outside a function
this is unnatural. One can use stack juggling; this
works, but IMO is unnatural. One can collect attributes
in a single structure dynamically allocated at
function entry and freed at exit. This works, but
again is unnatural and needs extra code.

Of course, sometimes other solutions are possible. Maybe
instead of separate variables one can recompute attributes
from something more basic. Maybe some group of attributes
is needed in several functions, then keeping them as part
of a single structure is natural. But assuming that you
write the program in a natural way, you would choose the alternative
that is natural and choose locals only when they
are a good fit.

You have some point about length of functions. While
pretty small functions using locals are possible, I
have a few longer functions where main reason for keeping
code in one function is because various parts need access
to the same local variables. But I doubt that eliminating
locals and splitting such functions leads to better code:
we get a cluster of functions which depend on each other via common
attributes. This dependence is there regardless of having
a single bigger function or several smaller ones (and
regardless of how one represents attributes). But with a
single function the dependence is explicit, and for me easier
to manage.

Avoiding dependence helps, but above I mean unavoidable
dependence. And in fact, I find locals useful to avoid
false dependencies (where a bunch of functions look like
they depend on something but in fact they do not).

I still do C from time to time - but it's not my prime language. For
this reason - and because I'm often just plain faster when using Forth.
It just results in a better program.

The only thing I can say is, "it works for me". And when I sometimes
view the works of others - especially when resorting to a C style - I
feel like it could work for you as well.

Nine times out of ten one doesn't need the amount of locals which are
applied. One doesn't need a 16 line word - at least not when you
actually want to maintain the darn thing. One could tackle the problem
much more elegantly.

My policy is that a variable should be a single logical thing.
Which means that frequently I have more variables than
"strictly necessary". That is, I do not reuse a variable for
a different purpose even if that would be possible. IMO
savings here are the compiler's job, and in cases where the compiler is
not doing this, the savings are not worth the extra effort (and
IMO worse program structure). Note that in a reasonable program
we are talking here about something like say 100 words
or maybe 1000 words, which may be significant on a small
embedded system (but compilers for such systems are reasonably
good at reusing variables), but is irrelevant for bigger systems.

--
Waldek Hebisch


From Paul Rubin@21:1/5 to dxf on Thu Jun 26 00:12:41 2025

dxf <[email protected]> writes:

Define 'unreadable'. In general I don't need to understand the nitty
gritty of a routine. But should I and no stack commentary exists, I've
no objections to creating it. It's par for the course in Forth. If it
bugged me I wouldn't be doing Forth.

Unreadable = I look at the code and have no idea what it's doing. The
logic is often obscured by stack manipulation. The values in the stack
are meaningful to the program's operation, but what is the meaning? In
most languages, meaningful values have names, and the names convey the
meaning. In Forth, you can write comments for that purpose. Years
after cmForth was published, someone wrote a set of shadow screens for
it, and that helped a lot.

With no named values and no explanatory comments, the program becomes
opaque.


From Paul Rubin@21:1/5 to dxf on Thu Jun 26 21:03:46 2025

dxf <[email protected]> writes:

Yet forthers have no problem with this. Take the SwiftForth source code.
At best you'll get a general comment as to what a function does. How do
they maintain it - the same way anyone proficient in C maintains C code.

Certainly it was a Forther who found cmForth needed that extra
documentation, and took the trouble to write it. C code as I mentioned
partially self-documents because it uses named variables in places where
Forth would have the value in an anonymous stack slot.

I looked at some of the SwiftForth library code (the stuff on their web
site) and I did find that pretty readable.


From [email protected]@21:1/5 to [email protected] on Fri Jun 27 15:15:38 2025

In article <[email protected]>,
dxf <[email protected]> wrote:

On 26/06/2025 5:12 pm, Paul Rubin wrote:

dxf <[email protected]> writes:

Define 'unreadable'. In general I don't need to understand the nitty
gritty of a routine. But should I and no stack commentary exists, I've
no objections to creating it. It's par for the course in Forth. If it
bugged me I wouldn't be doing Forth.

Unreadable = I look at the code and have no idea what it's doing. The
logic is often obscured by stack manipulation. The values in the stack
are meaningful to the program's operation, but what is the meaning? In
most languages, meaningful values have names, and the names convey the
meaning. In Forth, you can write comments for that purpose. Years
after cmForth was published, someone wrote a set of shadow screens for
it, and that helped a lot.

With no named values and no explanatory comments, the program becomes
opaque.

Yet forthers have no problem with this. Take the SwiftForth source code.
At best you'll get a general comment as to what a function does. How do
they maintain it - the same way anyone proficient in C maintains C code.
Albert is correct. Familiarity is key to readability. That's not to say
code deserving documentation shouldn't have it. OTOH one shouldn't be
expecting documentation (including stack commentary) for what's an everyday
affair in Forth.

The comment about fig-Forth is incorrect. All snippets of code are
at most 10 lines long, and the function is documented in the glossary.
So a minimal comprehension of 8086 code is sufficient.
OTOH, a Debian developer pointed me to a github archive from him
as shining example.
I studied it. Not a single source file contains a header what it
was about. Moreover a global description was missing. I had no clue
what the purpose of the program was. I lost a lot of trust in Debian.

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.


From Paul Rubin@21:1/5 to dxf on Sun Jun 29 15:26:32 2025

dxf <[email protected]> writes:

But aren't 'locals' actually PICK/ROLL in disguise?

Do PICK/ROLL skim all the values off the stack and stuff them in
variables to be later popped on and off the stack like a yo-yo?

Locals can be (and I thought usually are) implemented with the
equivalent of PICK and POST, on either the R stack or a separate L
stack. ROLL is different, "n ROLL" actually shuffles n items around and
in most situations seems kind of nuts.
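
A small sketch of the difference, assuming the zero-based PICK and ROLL of
Forth-94 and Forth-2012 (0 PICK is DUP, 1 PICK is OVER; 1 ROLL is SWAP,
2 ROLL is ROT). The demo word names are hypothetical.

```forth
\ PICK copies one deep item and leaves everything else in place.
: pick-demo ( -- 10 20 30 40 20 )
   10 20 30 40  2 PICK ;

\ ROLL removes the deep item and shifts the items above it down by one.
: roll-demo ( -- 10 30 40 20 )
   10 20 30 40  2 ROLL ;
```

PICK touches one cell regardless of depth, while n ROLL has to move n
cells, which is why it is the more expensive and more suspicious of the
two.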


From [email protected]@21:1/5 to It was thus on Mon Jun 30 01:43:24 2025

It was thus said that the Great LIT <[email protected]> once stated:

The more common complaint is that you use some feature they dislike
(typically locals) when you would otherwise DUP ROT instead.

But aren't 'locals' actually PICK/ROLL in disguise?

In my implementation [1], it's a PICK off the return stack (technically,
from a set point in the return stack) as locals aren't allowed to remain on
the data stack per the ANS standard.

-spc

[1] <https://github.com/spc476/ANS-Forth>, specifically, forth.asm
starting at line 7793.


From Paul Rubin@21:1/5 to dxf on Sun Jun 29 23:18:56 2025

dxf <[email protected]> writes:

What is POST ?

Opposite of PICK. Overwrites a slot in the middle of the stack. I saw
that name in another clf post pretty recently, idk if it is commonly
implemented. I think I've also seen it called POKE. I'd call it
barbaric from a stack programming perspective if done in user code, but
if the compiler does it in some specific implementation, I don't mind.


From Paul Rubin@21:1/5 to LIT on Mon Jun 30 02:44:35 2025

[email protected] (LIT) writes:

"Pick and Roll are the generic operators which treat the data stack as
an array. If you find you need to use them, you are probably doing it
wrong. Look for ways to refactor your code to be simpler instead."

What is the origin of that quote? PICK treats the stack like an array,
but ROLL treats it more as a circular shift register.

Most CPUs these days have a register file, which is essentially an array
with only immediate-like addressing mode. Presumably that design
evolved because programmers found it useful.

PICK afaict is mostly used with literal offsets as well. Having a
variable offset is suspicious.

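\ PICK is zero-based in Forth-94 and later (0 PICK = DUP), so the third
\ item is 2 PICK; under the one-based Forth-79 PICK it would be 3 PICK.
: 3DUP ( a b c -- a b c a b c ) 2 PICK 2 PICK 2 PICK ;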

Seems clearer than some mess of ROT and return stack temporaries.
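
For comparison, a sketch of the same word written three ways, assuming a
Forth-94/Forth-2012 system where PICK counts from zero (so the third item
is 2 PICK). The suffixed names are only there so the variants can coexist,
and the locals version assumes the Forth-2012 {: ... :} syntax.

```forth
\ Indexed access: copy the third item three times.
: 3DUP-pick  ( a b c -- a b c a b c )  2 PICK 2 PICK 2 PICK ;

\ Classic stack words only.
: 3DUP-stack ( a b c -- a b c a b c )
   DUP          \ a b c c
   2OVER        \ a b c c a b
   ROT ;        \ a b c a b c

\ Locals: name the items, then list them twice.
: 3DUP-local ( a b c -- a b c a b c )  {: a b c :} a b c a b c ;
```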


From [email protected]@21:1/5 to [email protected] on Mon Jun 30 21:05:47 2025

In article <nnd$7a5cfad1$76518ba7@2a4e6190d58511e2>,
Hans Bezemer <[email protected]> wrote:

On 25-06-2025 09:21, Paul Rubin wrote:

Hans Bezemer <[email protected]> writes:

Fundamentally. I explained the sensation at the end
of "Why Choose Forth". I've been able to tackle things I would never
have been to tackle with a C mindset. ( https://youtu.be/MXKZPGzlx14 )

I just watched this video and enjoyed it, but I don't understand how a C
mindset is different. In C you pass stuff as function parameters
instead of on the stack: what's the big deal? And particularly, the
video said nothing about the burning question of locals ;).

It seems to me all the examples mentioned in the video (parsing CSV
files or floating point numerals) are what someone called
micro-problems. Today they are much easier with languages like Python, and
back in Forth's heyday there was Lisp, which occupied a mindspace like
Python does now.

I agree that Thinking Forth is a great book.

It's hard to illustrate things with a multi-KLOC program IMHO. You can
only illustrate principles by using examples that are "contained" in a way.

But I'll try to illustrate a thing or two. Let's say you want to tackle
a problem. And it doesn't go your way. You have to add this thing and
that thing - and hold on to that value. You know what I mean.

about to add. Take a look at getopt() - I think that's a good example.
You can almost see how it grew almost organically by the author's hand.
He never seemed to think "Hmm, maybe I'll make a separate function of it".

getopt is a design error in Forth philosophy. You are writing an interpreter
and Forth is the only interpreter, first commandment.

EXAMPLE:

: option? DROP C@ &- = ;
: handle-arg 1 ARG[] 2DUP option? IF handle-option ELSE handle-file THEN ;
: handle-args BEGIN ARGC 1 > WHILE handle-arg REPEAT intel-hex? ;

\ Execute help directly. We don't want any interference.
: -h help BYE ;

: -c arg-number DROP DECIMAL 1000 * frequency ! 1 multiple !
    ARGC 1 <> ABORT" calculate requires 1 argument" ;

\ Note the `arg-number word is no good for multiple arguments.
: -m 1 ARG[] 2 <> ABORT" incorrect multiple args"
    SHIFT-ARGS \ rid of -m
    frequency DUP 4 CELLS + SWAP DO
        0. 1 ARG[] >NUMBER 0<> 107 ?ERROR 2DROP
        1000 * I ! SHIFT-ARGS
    1 CELLS +LOOP 4 multiple !
    ARGC 1 <> ABORT" multiple requires 4 arguments"
;

: main
    defaults 0 multiple ! handle-args
    multiple @ 0= IF ." specify -h, -c or -m " CR BYE THEN
    init custom-action init-calibration-flash
    doit
;
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.


From Paul Rubin@21:1/5 to dxf on Mon Jun 30 13:43:09 2025

dxf <[email protected]> writes:

The stack ops THEMSELVES may be, in a way, "canonical" — but not
solving "each and every" programming task using them
"no-matter-what", IMHO.

But such would indicate a deficiency in Forth. Do C programmers reach a
point at which they can't go forward?

Assembly language programmers reach a point where they run out of
machine registers and have to do clumsy things to swap stuff between
registers and memory. C compilers automate that process. Every C
compiler with register allocation has to deal with register spilling.
The programmer doesn't have to deal with it, but it's similar clumsy
assembly code coming out of the compiler.

In Forth without using locals, "register allocation" (deciding what is
in each stack slot) is manual and there are fewer "registers" to begin
with (basically TOS, NOS, TOR, and the 3rd stack element that you can
reach with ROT). Modern CPUs by comparison generally have 16 or more
addressable registers. The PDP-11 and 8086 had 8 registers and
programmers found that to be painful.


From Stephen Pelc@21:1/5 to All on Tue Jul 1 13:26:16 2025

On 30 Jun 2025 at 15:57:28 CEST, "Hans Bezemer" <[email protected]> wrote:

On 24-06-2025 18:23, Anton Ertl wrote:

Hans Bezemer <[email protected]> writes:

I'm also puzzled why there is always so much emphasis on the "speed" issue. I
mean - if you want speed, do your program in C -O3 so to say. It'll blow
any Forth out of the water.

One of our clients makes a construction estimating package
https://www.rib-software.com/en/rib-candy

When ported from MPE's last threaded-code Forth to the early VFX Forth,
screen redraw for the plan of part of a very large building improved by a
factor of ten. The client was very pleased - this was a visible change for
their users.

I have also found that fast code enables me not to use programming tricks,
but to code for readability and the maintenance programmer. I'm still
maintaining code I first saw 40 years ago - not much of it, but always a
pain to maintain.

Still, in general - GCC beats Forth. Although I have to admit I've got a renewed respect for VFX Forth! Kudos!

Thanks ... blush.

Compiler output performance depends very much on the amount of time and
money spent on developing it. The VFX code generator was part of the
planned output of an EU ESPRIT project.

Stephen

--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974
http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads


From Paul Rubin@21:1/5 to Hans Bezemer on Tue Jul 1 11:40:38 2025

Hans Bezemer <[email protected]> writes:

But such would indicate a deficiency in Forth. Do C programmers reach a
point at which they can't go forward? ...

Another great argument to leave Forth and embrace C! Why painfully
create kludge to cram into a language that was clearly not created for
that when you have a language available that was actually DESIGNED
with those requirements in mind?!

I'm not sure what you're getting at here, though I see the sarcasm.

Is the kludge locals? They don't seem that kludgy to me. Implementing
them in Forth is straightforward and lots of people have done it.

The point where one can't go forward is basically "running out of
registers". In assembly language those are the machine registers, and
in Forth they're the top few stack slots. In both cases, when you run
out, you have to resort to contorted code.

In C that isn't a problem for the programmer. You can use as many
variables as you like, and if the compiler runs out of registers and has
to make contorted assembly code, it does so without your having to care.

In a traditional Forth with locals, the locals are stack allocated so
accessing them usually costs a memory reference. The programmer gets
the same convenience as a C programmer. The runtime takes a slowdown
compared to code from a register-allocating compiler, but such a
slowdown is already present in a threaded interpreter, so it's fine.

Finally, a fancy enough Forth compiler can do the same things that a C
compiler does. Those compilers are difficult to write, but they exist
(VFX, lxf, etc.). I don't know if locals make writing the compiler more
difficult. But the user shouldn't have to care.


From Paul Rubin@21:1/5 to minforth on Tue Jul 1 12:56:02 2025

[email protected] (minforth) writes:

Nobody seems to care about that time. Instead, the focus seems to be
primarily on code runtime, even though the difference is only
microseconds or less.

Forth was designed for threaded interpreter implementation and the whole
notion of an optimizing Forth compiler is at best an abstraction
inversion. But, supposedly, VFX compiler output runs 10x as fast as
the same code under an interpreter.

I think in the Moore era, you got two speedups: 1) interpreted Forth was
10x faster than its main competitor, interpreted BASIC; and 2) if your
Forth program was still too slow, you'd identify a few hot spots and
rewrite those in assembler.

Today instead of BASIC we have Python, and interpreted Forth is still a
lot faster than Python. That speed is sufficient for most things, like
it always was, but even more so on modern hardware.

So I don't see much legitimate complaint about slowdowns due to Forth
locals. The objection is based on other considerations, either
legitimate ones that I don't yet understand, or essentially bogus ones
that I don't completely see through. Maybe some combination of the two.


From peter@21:1/5 to Paul Rubin on Tue Jul 1 23:47:05 2025

On Tue, 01 Jul 2025 11:40:38 -0700
Paul Rubin <[email protected]> wrote:

Hans Bezemer <[email protected]> writes:

But such would indicate a deficiency in Forth. Do C programmers
reach a point at which they can't go forward? ...

Another great argument to leave Forth and embrace C! Why painfully
create kludge to cram into a language that was clearly not created
for that when you have a language available that was actually
DESIGNED with those requirements in mind?!

I'm not sure what you're getting at here, though I see the sarcasm.

Is the kludge locals? They don't seem that kludgy to me.
Implementing them in Forth is straightforward and lots of people have
done it.

The point where one can't go forward is basically "running out of
registers". In assembly language those are the machine registers, and
in Forth they're the top few stack slots. In both cases, when you run
out, you have to resort to contorted code.

In C that isn't a problem for the programmer. You can use as many
variables as you like, and if the compiler runs out of registers and
has to make contorted assembly code, it does so without your having
to care.

In a traditional Forth with locals, the locals are stack allocated so
accessing them usually costs a memory reference. The programmer gets
the same convenience as a C programmer. The runtime takes a slowdown
compared to code from a register-allocating compiler, but such a
slowdown is already present in a threaded interpreter, so it's fine.

Finally, a fancy enough Forth compiler can do the same things that a C
compiler does. Those compilers are difficult to write, but they exist
(VFX, lxf, etc.). I don't know if locals make writing the compiler
more difficult. But the user shouldn't have to care.

The code generator in lxf has no knowledge of what a local is.
Locals are conceptually placed on the return stack. lxf is as smart
about the return stack as the data stack. That is why it can produce
very efficient code for simple examples like 3DUP. The actual
implementation of locals in the interpreter is just a few lines of code.
The difference with locals will be seen when you have a block boundary,
an IF statement, a call etc. that requires a known state of the stacks.
The real problem for me with locals is that their scope is to the end
of the definition. With the stack you end the scope of an item with a
drop and extend it with a dup, very elegant!
A multipass compiler can of course find the scope of each local but at
the cost of more complexity.

In lxf64 I have introduced a local stack with the same capabilities as
the data and return stack. I am not sure yet if this is better.

The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.
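
As a rough idea of what such words could look like at user level, here is
a minimal sketch in standard Forth. It is not lxf64's implementation, which
is not shown in the thread; LS-DEPTH, LS-BASE, LSP, park and fetch2 are
hypothetical names, the capacity is arbitrary, and there are no overflow
or underflow checks.

```forth
32 CONSTANT LS-DEPTH                    \ arbitrary capacity, in cells
CREATE LS-BASE  LS-DEPTH CELLS ALLOT    \ storage for the local stack
VARIABLE LSP    LS-BASE LSP !           \ points at the next free cell

: >LS ( x -- )  LSP @ !  1 CELLS LSP +! ;          \ push
: LS> ( -- x )  1 CELLS NEGATE LSP +!  LSP @ @ ;   \ pop
: LS@ ( -- x )  LSP @ 1 CELLS - @ ;                \ copy the top item

\ Unlike >R and R>, the push and the pop may live in different words,
\ because calls and returns do not touch this stack.
: park   ( x -- )    >LS ;
: fetch2 ( -- x x )  LS@ LS> ;
```

For example, 7 park 9 park fetch2 leaves 9 9 on the data stack, and a later
LS> retrieves the 7.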

BR
Peter


From minforth@21:1/5 to All on Wed Jul 2 01:26:18 2025

On 01.07.2025 at 23:47, peter wrote:

On Tue, 01 Jul 2025 11:40:38 -0700
Paul Rubin <[email protected]> wrote:

Hans Bezemer <[email protected]> writes:

But such would indicate a deficiency in Forth. Do C programmers
reach a point at which they can't go forward? ...

Another great argument to leave Forth and embrace C! Why painfully
create kludge to cram into a language that was clearly not created
for that when you have a language available that was actually
DESIGNED with those requirements in mind?!

I'm not sure what you're getting at here, though I see the sarcasm.

Is the kludge locals? They don't seem that kludgy to me.
Implementing them in Forth is straightforward and lots of people have
done it.

Finally, a fancy enough Forth compiler can do the same things that a C
compiler does. Those compilers are difficult to write, but they exist
(VFX, lxf, etc.). I don't know if locals make writing the compiler
more difficult. But the user shouldn't have to care.

The code generator in lxf has no knowledge of what a local is.
locals are conceptually placed on the return stack. lxf is as smart
about the return stack as the data stack. that is why it can produce
very efficient code for simple examples like 3DUP. The actual
implementation of local in the interpreter is just a few lines of code.
The difference with locals will be seen when you have a boundary block,
IF statement, a call etc that require a known state of the stacks.
The real problem for me with locals is that their scope is to the end
of the definition. With the stack you end the scope of an item with a
drop and extend it with a dup, very elegant!
A multipass compiler can of course find the scope of each local but at
the cost of more complexity.

In lxf64 I have introduced a local stack with the same capabilities as
the data and return stack. I am not sure yet if this is better.

The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.

In a sense, such locals become global. I am not sure if this opens the
way inadvertently for hard-to-detect bugs. One rarely discussed property
of locals is that they offer data encapsulation (or have scope in C
terminology).

Only one useful application comes to my mind: sharing locals between
quotation and its parent function, i.e. for creating closures. But who
needs them anyway?


From minforth@21:1/5 to All on Wed Jul 2 05:00:21 2025

On 01.07.2025 at 21:56, Paul Rubin wrote:

[email protected] (minforth) writes:

Nobody seems to care about that time. Instead, the focus seems to be
primarily on code runtime, even though the difference is only
microseconds or less.

I think in the Moore era, you got two speedups: 1) interpreted Forth was
10x faster than its main competitor, interpreted BASIC; and 2) if your
Forth program was still too slow, you'd identify a few hot spots and
rewrite those in assembler.

Today instead of BASIC we have Python, and interpreted Forth is still a
lot faster than Python. That speed is sufficient for most things, like
it always was, but even more so on modern hardware.

Today, you could go insane if you had to write assembler code
with SSE1/2/3/4/AVX/AES etc. extended CPU commands (or take GPU
programming...)

Even chip manufacturers provide C libraries with built-ins and
intrinsics to handle this complexity, and optimising C compilers
for selecting the best operations.

IMO assembler programming in Forth is mostly for retro enthusiasts


From minforth@21:1/5 to All on Wed Jul 2 07:34:16 2025

On 02.07.2025 at 05:00, minforth wrote:

On 01.07.2025 at 21:56, Paul Rubin wrote:

[email protected] (minforth) writes:

Nobody seems to care about that time. Instead, the focus seems to be
primarily on code runtime, even though the difference is only
microseconds or less.

I think in the Moore era, you got two speedups: 1) interpreted Forth was
10x faster than its main competitor, interpreted BASIC; and 2) if your
Forth program was still too slow, you'd identify a few hot spots and
rewrite those in assembler.

Today instead of BASIC we have Python, and interpreted Forth is still a
lot faster than Python. That speed is sufficient for most things, like
it always was, but even more so on modern hardware.

Today, you could go insane if you had to write assembler code
with SSE1/2/3/4/AVX/AES etc. extended CPU commands (or take GPU
programming...)

Even chip manufacturers provide C libraries with built-ins and
intrinsics to handle this complexity, and optimising C compilers
for selecting the best operations.

IMO assembler programming in Forth is mostly for retro enthusiasts

P.S. I forgot to mention that this is not true for MCUs and embedded
systems.

I have the utmost respect for Matthias Koch's Mecrisp Stellaris.


From Stephen Pelc@21:1/5 to dxf on Wed Jul 2 09:33:48 2025

On 2 Jul 2025 at 05:39:52 CEST, "dxf" <[email protected]> wrote:

On 1/07/2025 10:22 pm, Hans Bezemer wrote:

On 27-06-2025 03:39, dxf wrote:

Yet forthers have no problem with this. Take the SwiftForth source code.
At best you'll get a general comment as to what a function does. How do
they maintain it - the same way anyone proficient in C maintains C code.
Albert is correct. Familiarity is key to readability. That's not to say
code deserving documentation shouldn't have it. OTOH one shouldn't be
expecting documentation (including stack commentary) for what's an everyday
affair in Forth.

I think you and Albert are on the right track here. Familiarity is a large
part of this "readability" thingy. There are a few notes I want to add,
though:

1. "Infix notation" is part of this familiarity. I know I've commented every
single expression in TEONW, since I understand those "infix" expressions much
better than all those RPN thingies - and you got something to check your code
against;

2. Intentionality. I do this a LOT. E.g. if you find OVER OVER in my code,
you may be certain those two items have nothing to do with each other. If you
find 2DUP it's a string, a double number or another "addr/count" array. CHOP
replaces 1 /STRING. Also: stack patterns can be codified like SPIN or STOW;

3. Brevity. Short definitions are easier to understand. If you can abstract
it, put a name on it, and can spare the performance - split it up.

4. Naming. I give this a LOT of thought. I prefer reading a name and having a
pretty good idea of what that code does (especially in the context of a
library or a program). See:
https://sourceforge.net/p/forth-4th/wiki/What%27s%20in%20a%20name%3F/

Feel free to disagree. It may not work for you, but at least it works for me.

Recently someone told me about Christianity - how it wasn't meant to be easy -
supposed to be, among other things, a denial of the senses. I'm hearing much
the same in Forth. That it's a celibate practice in which one denies everyday
sensory pleasures including readability and maintainability in order to achieve
programming nirvana. Heck, if that's how folks see Forth then perhaps they
should stop before the cognitive dissonance sends them crazy or they pop a cork.

IMHO religious belief is not a denial of the senses but a retraining. That
does not mean that the retraining leads to anything valuable, but it can
do depending very much on the trainer and trainee.

Stephen

--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974
http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads


From Anton Ertl@21:1/5 to Hans Bezemer on Wed Jul 2 15:22:22 2025

Hans Bezemer <[email protected]> writes:

And just before you're done you put your stuff on the stack and like a
tiny assembly line it is transported to the next thing. This means that
the function call overhead is MINIMAL - much less than C.

Oh, really? Wasn't it you who wrote
<nnd$34fd6cd6$25a88dac@ac6bb1addf3a4136>:

|if you want speed, do your program in C -O3 so to say. It'll blow
|any Forth out of the water.

And if we look at the results for fib (a benchmark that performs lots
of calls) in Figure 1 of
<https://www.complang.tuwien.ac.at/papers/ertl24-interpreter-speed.pdf>,
gcc -O3 outperforms the fastest Forth system, and gcc -O1 outperforms
the fastest Forth system by even more.

And that's not the solution - it's the PROBLEM. You can add loads of
complexity without much (immediate) penalty. You're not compelled to
study - or even *think* about your algorithm. You most probably will end
up with code that works - without you understanding why.

And that will either bite you later, or limit your capability to expend
on that code.

Yes, you can expend a lot of effort on code that's hard to write and
hard to understand, but that's not limited to Forth.

If you mean that, by making code hard to write, Forth without locals
makes it easier to extend the code, I very much doubt it. In some
cases it may not be harder, but in others (where the extension
requires, e.g., dealing with additional data in existing colon
definitions) it is harder.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/


From Anton Ertl@21:1/5 to minforth on Wed Jul 2 15:44:40 2025

minforth <[email protected]> writes:

Today, you could go insane if you had to write assembler code
with SSE1/2/3/4/AVX/AES etc. extended CPU commands (or take GPU
programming...)

Even chip manufacturers provide C libraries with built-ins and
intrinsics to handle this complexity, and optimising C compilers
for selecting the best operations.

Not really. Each AVX intrinsic corresponds to an instruction, and I
expect the compiler to produce that instruction. The benefit of the
intrinsics is that you can mix this assembly language with C code, and
the C compiler will do the register allocation for you, but normally
not a "better" operation. That being said, I have seen a case where
an AVX256 intrinsic was translated to two AVX128 or SSE2 instructions
because that sequence was supposed to be faster on some Intel CPU (and
it's Intel who writes the code for AVX intrinsics).

In any case, given that there is one intrinsic for each SIMD
instruction, you go just as insane with the plethora of intrinsics as
with the plethora of SIMD instructions.

The C way to dealing with SIMD instructions is auto-vectorization. It
does not work particularly well, however, but given that it works on
existing benchmarks, it has an unsurmountable advantage over explicit
(manual) vectorization.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to minforth on Wed Jul 2 16:07:11 2025

minforth <[email protected]> writes:

Am 01.07.2025 um 23:47 schrieb peter:

In lxf64 I have introduced a local stack with the same capabilities as
the data and return stack. I am not sure yet if this is better.

The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.

In a sense, such locals become global. I am not sure if this opens the
way inadvertently for hard-to-detect bugs.

This stack combines some properties of the data stack (it is not
disturbed by calls) with some of the return stack (you put stuff there
and remove it explicitly, with most words not doing anything
permanent there). However, the interaction of explicit use with use
through locals will mean restrictions; we have a similar situation
with the return stack and counted loops and calls and returns, and we
have learned to deal with that.
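
As a rough illustration of the explicit-use side, a third stack of this kind can be sketched in standard Forth. This is not lxf64's implementation: the 32-cell size and the names LS-SPACE, LSP, STASH and FETCH-SUM are assumptions made for the example, and there is no overflow or underflow checking.

```forth
\ Hypothetical third stack; size and names are assumptions for illustration.
CREATE LS-SPACE 32 CELLS ALLOT      \ storage for up to 32 items
VARIABLE LSP   LS-SPACE LSP !       \ points at the next free cell

: >LS ( x -- )  LSP @ !  1 CELLS LSP +! ;       \ push x
: LS> ( -- x )  -1 CELLS LSP +!  LSP @ @ ;      \ pop top item
: LS@ ( -- x )  LSP @ 1 CELLS - @ ;             \ copy top item

\ One word leaves items on the third stack, another one removes them.
: STASH     ( a b -- )  >LS >LS ;
: FETCH-SUM ( -- n )    LS> LS> + ;

3 4 STASH FETCH-SUM .   \ prints 7
```

Unlike >R and R>, the pair >LS and LS> can be split across two words, which is the property peter describes; it also means that a word leaving items there relies on some later word to remove them.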

Only one useful application comes to my mind: sharing locals between
quotation and its parent function, i.e. for creating closures.

This does not create closures; for some limited usage it behaves like
a closure would behave, but in other uses it does not. Such problems
already plagued Algol 60 compilers, and Knuth wrote the man-or-boy
test to check for them.

But who
needs them anyway?

Since we implemented closures in 2018 [ertl&paysan18], we have finally
found a compelling use of closures:

We have an actor-like model for letting tasks (threads) talk to each
other, inspired by Heinz Schnitter's Open Network Forth. One task
sends a word to another task, and that task executes that word at some
point. Now we want to send parameterized words to another task (e.g.,
do not just print "hello" in the other task, print something that may
reflect data from the sending task). To do this, we create a one-shot
closure that passes data along with the executed code to the receiving
task and burns (deletes) itself after execution; see
<file:///home/anton/gforth/doc/gforth/Message-queues.html>.

We originally had a separate mechanism for passing data, but once we
had closures, this became superfluous and was simplified away.

@InProceedings{ertl&paysan18,
author = {M. Anton Ertl and Bernd Paysan},
title = {Closures --- the {Forth} way},
crossref = {euroforth18},
pages = {17--30},
url = {https://www.complang.tuwien.ac.at/papers/ertl%26paysan.pdf},
url2 = {http://www.euroforth.org/ef18/papers/ertl.pdf},
slides-url = {http://www.euroforth.org/ef18/papers/ertl-slides.pdf},
video = {https://wiki.forth-ev.de/doku.php/events:ef2018:closures},
OPTnote = {refereed},
abstract = {In Forth 200x, a quotation cannot access a local
defined outside it, and therefore cannot be
parameterized in the definition that produces its
execution token. We present Forth closures; they
lift this restriction with minimal implementation
complexity. They are based on passing parameters on
the stack when producing the execution token. The
programmer has to explicitly manage the memory of
the closure. We show a number of usage examples.
We also present the current implementation, which
takes 109~source lines of code (including some extra
features). The programmer can mechanically convert
lexical scoping (accessing a local defined outside)
into code using our closures, by applying assignment
conversion and flat-closure conversion. The result
can do everything one expects from closures,
including passing Knuth's man-or-boy test and living
beyond the end of their enclosing definitions.}
}

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to Paul Rubin on Wed Jul 2 16:41:44 2025

Paul Rubin <[email protected]d> writes:

Forth was designed for threaded interpreter implementation and the whole
notion of an optimizing Forth compiler is at best an abstraction
inversion.

Looking at <https://en.wikipedia.org/wiki/Abstraction_inversion>, I
don't see that at all. Or if it is, it's at least as bad for
optimizing compilers for other languages. I cannot invoke any innards
of C optimizing compilers, whereas in Forth we at least have LITERAL
COMPILE, etc. to generate code. In Gforth (development version) you
also can invoke SET-OPTIMIZER to specify how a given word is
optimized.

But, supposedly, VFX compiler output runs 10x as fast as
the same code under an interpreter.

You can see some data in Figure 1 of <https://www.complang.tuwien.ac.at/papers/ertl24-interpreter-speed.pdf>

I think in the Moore era, you got two speedups: 1) interpreted Forth was
10x faster than its main competitor, interpreted BASIC

Not sure what you mean by Moore era; he has been active for many
decades.

Maybe on home computers, Forth's main competitor was interpreted
BASIC, but in the environment where Moore discovered Forth
(minicomputers like the IBM 1130 and the PDP-11), it wasn't. If you
read up on the history of Forth, BASIC is not even mentioned. Fortran
and Algol are.

So I don't see much legitimate complaint about slowdowns due to Forth
locals. The objection is based on other considerations, either
legitimate ones that I don't yet understand, or essentially bogus ones
that I don't completely see through.

Those who have a Forth system that implements locals don't object to
the use of locals, those whose Forth system does not implement them,
do. Looks like the objections are sour-grapes arguments.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Paul Rubin@21:1/5 to [email protected] on Wed Jul 2 12:02:01 2025

[email protected] writes:

I had a beef with Andrew Tanenbaum, stating that it is hard to write a
C compiler for the 6502. In reality the 6502 is a brilliant
design. You must realize that the 6502 has 128 16-bit registers on the
zero page.

It's even hard to write compact assembly code, which is why Steve
Wozniak wrote SWEET16.

I briefly used Aztec C on the Apple II, IIRC. I think it generated
bytecode for an interpreter, but am not sure.

From Paul Rubin@21:1/5 to peter on Wed Jul 2 12:07:05 2025

peter <[email protected]> writes:

The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.

As I remember, Flashforth also has a 3rd stack like that, without having locals. It's called P so you have >P etc.

From minforth@21:1/5 to All on Thu Jul 3 03:14:54 2025

Am 03.07.2025 um 01:59 schrieb Paul Rubin:

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth.

I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.

I would not implement locals for simple integers only. Forth has enough
stack gymnastics words for that.

IMO locals only make sense if you can at least additionally handle
floats and dynamic strings, preferably also structs and arrays.
Such an implementation is certainly not trivial.

From Paul Rubin@21:1/5 to Hans Bezemer on Wed Jul 2 16:59:34 2025

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth.

I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.

From Anton Ertl@21:1/5 to Hans Bezemer on Thu Jul 3 08:43:38 2025

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth. Sure, you don't got the badly designed and much too heavy
Forth-2012 implementation,

There is no Forth-2012 implementation of locals. The proposal
includes a reference implementation, but that is based on a
non-standard word BUILDLV and is therefore not included in
<http://www.forth200x.org/reference-implementations/>; instead, you
find there two implementations written in Forth-94:

http://www.forth200x.org/reference-implementations/locals.fs
http://www.forth200x.org/reference-implementations/extended-locals.fs

Of these two the locals.fs implementation is the shorter and nicer
one. You can read about these two implementations in <[email protected]>.

However, looking at
<https://forth-standard.org/standard/locals/bColon>, it seems that the
editor included a variation of extended-locals.fs.

4tH v3.64.2 will even support a *MUCH* lighter, but
fully conformant Forth-2012 LOCALS implementation.

Great! How good that Forth-2012 is not an implementation standard.

If anything, yours is a prime
example of a "sour grape argument".

Which grapes do you suppose that I am unable to reach?

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From [email protected]@21:1/5 to [email protected] on Thu Jul 3 13:54:54 2025

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth.

I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.

Not one line, but short, leaning on existing words.
Also, these LOCAL's are not usable in recursive definitions.
Example in the context of ciforth:

WANT VALUE [{
: LOCAL
POSTPONE [{ _ VALUE }] POSTPONE TO LATEST >LFA @
POSTPONE LITERAL POSTPONE EXECUTE
; IMMEDIATE

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From [email protected]@21:1/5 to [email protected] on Thu Jul 3 14:13:02 2025

In article <[email protected]>,
minforth <[email protected]> wrote:

Am 03.07.2025 um 01:59 schrieb Paul Rubin:

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth.

I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.

I would not implement locals for simple integers only. Forth has enough
stack gymnastics words for that.

IMO locals only make sense if you can at least additionally handle
floats and dynamic strings, preferably also structs and arrays.
Such an implementation is certainly not trivial.

Second that. iforth sports not only LOCAL (values), but also
FLOCAL DLOCAL DFLOCAL. You end up establishing a whole menagerie
of shadow Forth words.

It is much simpler to allow definitions in a [ .. ] sequence.

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From [email protected]@21:1/5 to [email protected] on Thu Jul 3 14:17:08 2025

In article <nnd$57e17bcd$463b2e07@d86e5bbc05746f06>,
Hans Bezemer <[email protected]> wrote:

On 03-07-2025 01:59, Paul Rubin wrote:

Hans Bezemer <[email protected]> writes:

1. Adding general locals is trivial. It takes just one single line of
Forth.

I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.

"Short" in my dictionary is. One. Single. Screen. No more. No less (pun
intended).

And this one is one single screen. Even with the dependencies.
https://youtu.be/FH4tWf9vPrA

Typical use:

variable a
variable b

: divide
local a
local b

b ! a ! a @ b @ / ;

Does recursion, the whole enchilada. One line.
Thanks to Fred Behringer - and Albert, who condensed it to a
single-line definition. Praise is where praise is due.

Although 'local variables' like this are much preferred (superior),
LOCAL (value) is what is asked for.
If you don't have the awkward, forward-parsing TO already defined, you
are bound to do more work.

Hans Bezemer

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From [email protected]@21:1/5 to [email protected] on Thu Jul 3 14:27:02 2025

In article <[email protected]>,
Paul Rubin <[email protected]d> wrote:

peter <[email protected]> writes:

The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.

As I remember, Flashforth also has a 3rd stack like that, without having
locals. It's called P so you have >P etc.

Most of Marcel Hendrix's Forths have a 3rd stack called the "system stack":

>S S> S@ (apart from LOCAL stacks).

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From [email protected]@21:1/5 to Anton Ertl on Thu Jul 3 14:51:38 2025

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:

Hans Bezemer <[email protected]> writes:

And just before you're done you put your stuff on the stack and like a
tiny assembly line it is transported to the next thing. This means that
the function call overhead is MINIMAL - much less than C.

Oh, really? Wasn't it you who wrote
<nnd$34fd6cd6$25a88dac@ac6bb1addf3a4136>:

|if you want speed, do your program in C -O3 so to say. It'll blow
|any Forth out of the water.

And if we look at the results for fib (a benchmark that performs lots
of calls) in Figure 1 of
<https://www.complang.tuwien.ac.at/papers/ertl24-interpreter-speed.pdf>,
gcc -O3 outperforms the fastest Forth system, and gcc -O1 outperforms
the fastest Forth system by even more.

I'm with Knuth here. No algorithms he describes use recursion, only
explicit stacks.
Don't try to optimise recursive functions;
use an explicit stack.
: FIB >R 1 0 R> 0 ?DO SWAP OVER + LOOP NIP ;
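
For anyone who wants to check the one-liner, here is the same definition spread out with stack comments, plus a few sample calls. Only the comments and the test values are added; the pair kept on the stack is ( F(i-1) F(i) ), seeded with F(-1)=1 and F(0)=0.

```forth
\ Iterative Fibonacci, same words as the one-liner above.
: FIB ( n -- F[n] )
   >R 1 0 R>                 \ seed the pair: ( 1 0 n )
   0 ?DO SWAP OVER + LOOP    \ each pass: ( a b -- b a+b )
   NIP ;                     \ keep only the newer element

0 FIB .    \ prints 0
1 FIB .    \ prints 1
10 FIB .   \ prints 55
20 FIB .   \ prints 6765
```

No return-stack frames are built per step, so the cost is one loop iteration per Fibonacci number rather than an exponential number of calls.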

And that's not the solution - it's the PROBLEM. You can add loads of
complexity without much (immediate) penalty. You're not compelled to
study - or even *think* about your algorithm. You most probably will end
up with code that works - without you understanding why.

And that will either bite you later, or limit your capability to expend
on that code.

Yes, you can expend a lot of effort on code that's hard to write and
hard to understand, but that's not limited to Forth.

If you mean that, by making code hard to write, Forth without locals
makes it easier to extend the code, I very much doubt it. In some
cases it may not be harder, but in others (where the extension
requires, e.g., dealing with additional data in existing colon
definitions) it is harder.

I'd like to remind you of Wagner's FORTH2020 talk on YouTube. It concerns
the motion of aircraft: position, speed, pitch, roll, yaw etc.
Terribly complicated, no LOCAL's. There was a question whether LOCAL's
could have made Wagner's code easier.
He stated the ideal (paraphrased by me) that "code is its own comment".

My most involved programs are ciasdis and manx. No LOCAL's in sight.
I don't want to imply that these or Wagner's programs are easy to write,
but the effort pays off.

If Beez wants to say that Forth without locals tends to be more
architecturally sound, and therefore easier to extend, I agree.

- anton

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From [email protected]@21:1/5 to [email protected] on Fri Jul 4 12:01:51 2025

In article <[email protected]>,
dxf <[email protected]> wrote:

On 3/07/2025 10:51 pm, [email protected] wrote:

...
I'd like to remind you of Wagner's FORTH2020 talk on YouTube. It concerns
the motion of aircraft: position, speed, pitch, roll, yaw etc.
Terribly complicated, no LOCAL's. There was a question whether LOCAL's
could have made Wagner's code easier.
He stated the ideal (paraphrased by me) that "code is its own comment".

That was an interesting video even if more a rundown of his (long) history
as a professional forth programmer. Here's the link for anyone curious:

https://youtu.be/V9ES9UZHaag

He said he uses the hardware fp stack for speed. Is he really only
using 8 levels of stack?

8 levels are plenty as long as you refrain from recursion, which in
Wagner's context would not be even remotely useful.

Groetjes Albert

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From Anton Ertl@21:1/5 to dxf on Sat Jul 5 08:49:22 2025

dxf <[email protected]> writes:
[8 stack items on the FP stack]

Puzzling because of a thread here not long ago in which scientific users
appear to suggest the opposite. Such concerns have apparently been around
a long time:

https://groups.google.com/g/comp.lang.forth/c/CApt6AiFkxo/m/wwZmc_Tr1PcJ

I have read through the thread. It's unclear to me which scientific
users you have in mind. My impression is that 8 stack items was
deemed sufficient by many, and preferable (on 387) for efficiency
reasons.

Certainly, of the two points this thread is about, there was a
Forth200x proposal for standardizing a separate FP stack, and this
proposal was accepted. There was no proposal for increasing the
minimum size of the FP stack; Forth-2012 still says:

|The size of a floating-point stack shall be at least 6 items.
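
A program can ask the system how deep its FP stack is through the standard FLOATING-STACK environment query. A minimal sketch, assuming the system answers environment queries at all (it may report the query as unknown); the word name .FP-STACK-DEPTH is made up for the example.

```forth
\ Report what the system says about its floating-point stack.
\ .FP-STACK-DEPTH is a hypothetical name, not a standard word.
: .FP-STACK-DEPTH ( -- )
   S" FLOATING-STACK" ENVIRONMENT? IF     \ ( n ) when the query is known
      ?DUP IF
         ." separate FP stack, maximum depth " .
      ELSE
         ." FP items are kept on the data stack"
      THEN
   ELSE
      ." FLOATING-STACK query not answered"
   THEN CR ;

.FP-STACK-DEPTH
```

On a standard system with a separate FP stack the reported depth should be 6 or more; a 387-based package that uses only the hardware stack cannot offer more than 8.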

One interesting aspect is that VFX 5.x finally includes an FP package
by default, and it started by including an SSE2-based FP package which
supports a deep FP stack. However, MPE received customer complaints
about the lower number of significant digits in SSE2 (binary64)
vs. 387 (80-bit FP values), so they switched the default to the
387-based FP package that only has 8 FP stack items. Apparently no
MPE customer complains about that limitation.

OTOH, iForth-5.1-mini uses the 387 instructions, but stores FP stack
items in memory at least on call boundaries. Maybe Marcel Hendrix can
give some insight into what made him take this additional
implementation effort.

FORTH> : foo f+ f* ; ok
FORTH> : bar f@ f@ f@ execute f! ; ok
FORTH> ' foo idis
$10226000  : foo                             488BC04883ED088F4500  H.@H.m..E.
$1022600A  fld     [r13 0 +] tbyte           41DB6D00              A[m.
$1022600E  fld     [r13 #16 +] tbyte         41DB6D10              A[m.
$10226012  fxch    ST(2)                     D9CA                  YJ
$10226014  lea     r13, [r13 #32 +] qword    4D8D6D20              M.m
$10226018  faddp   ST(1), ST                 DEC1                  ^A
$1022601A  fxch    ST(1)                     D9C9                  YI
$1022601C  fpopswap,                         41DB6D00D9CA4D8D6D10  A[m.YJM.m.
$10226026  fmulp   ST(1), ST                 DEC9                  ^I
$10226028  fpush,                            4D8D6DF0D9C941DB7D00  M.mpYIA[}.
$10226032  ;                                 488B45004883C508FFE0  H.E.H.E..` ok
FORTH> ' bar idis
$10226080  : bar                             488BC04883ED088F4500  H.@H.m..E.
$1022608A  pop     rbx                       5B                    [
$1022608B  fld     [rbx] tbyte               DB2B                  [+
$1022608D  pop     rbx                       5B                    [
$1022608E  fld     [rbx] tbyte               DB2B                  [+
$10226090  pop     rbx                       5B                    [
$10226091  fld     [rbx] tbyte               DB2B                  [+
$10226093  lea     r13, [r13 #-48 +] qword   4D8D6DD0              M.mP
$10226097  fxch    ST(3)                     D9CB                  YK
$10226099  fstp    [r13 #32 +] tbyte         41DB7D20              A[}
$1022609D  fstp    [r13 0 +] tbyte           41DB7D00              A[}.
$102260A1  fstp    [r13 #16 +] tbyte         41DB7D10              A[}.
$102260A5  pop     rbx                       5B                    [
$102260A6  or      rbx, rbx                  4809DB                H.[
$102260A9  je      $102260B1 offset NEAR     0F8402000000          ......
$102260AF  call    rbx                       FFD3                  .S
$102260B1  pop     rbx                       5B                    [
$102260B2  fpop,                             41DB6D00D9C94D8D6D10  A[m.YIM.m.
$102260BC  fstp    [rbx] tbyte               DB3B                  [;
$102260BE  ;                                 488B45004883C508FFE0  H.E.H.E..` ok

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From [email protected]@21:1/5 to Anton Ertl on Sat Jul 5 14:21:44 2025

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
<SNIP>

One interesting aspect is that VFX 5.x finally includes an FP package
by default, and it started by including an SSE2-based FP package which
supports a deep FP stack. However, MPE received customer complaints
about the lower number of significant digits in SSE2 (binary64)
vs. 387 (80-bit FP values), so they switched the default to the
387-based FP package that only has 8 FP stack items. Apparently no
MPE customer complains about that limitation.

Interesting indeed! I would rather expect customers to complain
about the lack of IEEE compliance.

OTOH, iForth-5.1-mini uses the 387 instructions, but stores FP stack
items in memory at least on call boundaries. Maybe Marcel Hendrix can
give some insight into what made him take this additional
implementation effort.

Once an assembler is in place, FP using only the internal stack
merely costs 23 screens in ciforth. This includes the transcendental
functions, which are mostly wrappers around an assembler instruction:
CODE FCOS FCOS, NEXT, END-CODE
The most involved are hyperbolic sine etc., which must be constructed
by combining exponentials and demand attention to ranges.

I investigated the instruction set, and I found no way to detect
whether the 8-register stack is full.
Detecting that would offer the possibility to spill registers to memory
only if it is needed.

- anton

--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From minforth@21:1/5 to All on Sat Jul 5 14:41:11 2025

Am 05.07.2025 um 14:21 schrieb [email protected]:

I investigated the instruction set, and I found no way to detect
if the 8 registers stack is full.
This would offer the possibility to spill registers to memory only
if it is needed.

IIRC signaling and handling fp-stack overflow is not an easy task.
At most, the computer would crash.
IOW, spilling makes sense.

From Anton Ertl@21:1/5 to minforth on Sat Jul 5 14:28:02 2025

minforth <[email protected]> writes:

Am 05.07.2025 um 14:21 schrieb [email protected]:

I investigated the instruction set, and I found no way to detect
if the 8 registers stack is full.
This would offer the possibility to spill registers to memory only
if it is needed.

IIRC signaling and handling fp-stack overflow is not an easy task.

The story I read is that Kahan and the 8087 architects intended to
support extending the 8087 stack into memory with an exception
handler, but that part of the specification did not get implemented as
intended, and it was then extremely hard or impossible to implement
that feature. The problem was not noticed until it was too
late; apparently 8 stack items were sufficient for most uses
outside the Forth context, too.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From minforth@21:1/5 to All on Sat Jul 5 16:24:37 2025

Am 05.07.2025 um 14:41 schrieb minforth:

Am 05.07.2025 um 14:21 schrieb [email protected]:

I investigated the instruction set, and I found no way to detect
if the 8 registers stack is full.
This would offer the possibility to spill registers to memory only
if it is needed.

IIRC signaling and handling fp-stack overflow is not an easy task.
At most, the computer would crash.
IOW, spilling makes sense.

A deep dive into the manual

.. the C1 condition code flag is used for a variety of functions.
When both the IE and SF flags in the x87 FPU status word are set,
indicating a stack overflow or underflow exception (#IS), the C1
flag distinguishes between overflow (C1=1) and underflow (C1=0).
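
The rule quoted from the manual can be written down as a small decoder. A sketch only: it takes a status word value that has already been fetched (reading it needs FNSTSW or similar from the system's assembler, which is not shown), and the names IE-BIT, SF-BIT, C1-BIT, IS-MASK and FP-STACK-FAULT are invented for the example. The bit positions are those of the x87 status word: IE is bit 0, SF is bit 6, C1 is bit 9.

```forth
\ Decode an x87 status word for stack overflow or underflow (#IS).
\ All names here are hypothetical; only the bit positions come from the x87.
HEX
0001 CONSTANT IE-BIT    \ invalid-operation exception flag
0040 CONSTANT SF-BIT    \ stack-fault flag
0200 CONSTANT C1-BIT    \ condition code C1
DECIMAL
IE-BIT SF-BIT OR CONSTANT IS-MASK

: FP-STACK-FAULT ( sw -- n )   \ 0 = no stack fault, 1 = overflow, -1 = underflow
   DUP IS-MASK AND IS-MASK = IF
      C1-BIT AND IF 1 ELSE -1 THEN
   ELSE
      DROP 0
   THEN ;

HEX
241 FP-STACK-FAULT .    \ IE, SF and C1 set: prints 1
 41 FP-STACK-FAULT .    \ IE and SF set, C1 clear: prints -1
  1 FP-STACK-FAULT .    \ IE alone, no stack fault: prints 0
DECIMAL
```

This only tells you after the fact that a push or pop went wrong; it does not say how many of the 8 registers are in use beforehand.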

From Stephen Pelc@21:1/5 to dxf on Sun Jul 6 08:46:53 2025

On 6 Jul 2025 at 04:52:37 CEST, "dxf" <[email protected]> wrote:

On 5/07/2025 6:49 pm, Anton Ertl wrote:

dxf <[email protected]> writes:
[8 stack items on the FP stack]

Puzzling because of a thread here not long ago in which scientific users
appear to suggest the opposite. Such concerns have apparently been around
a long time:

https://groups.google.com/g/comp.lang.forth/c/CApt6AiFkxo/m/wwZmc_Tr1PcJ

I have read through the thread. It's unclear to me which scientific
users you have in mind. My impression is that 8 stack items was
deemed sufficient by many, and preferable (on 387) for efficiency
reasons.

AFAICS both Skip Carter (proponent) and Julian Noble were suggesting the
6 level minimum was inadequate. A similar sentiment was expressed here
only several months ago. AFAIK all major forths supporting x87 hardware
offer software stack options.

Certainly, of the two points this thread is about, there was a
Forth200x proposal for standardizing a separate FP stack, and this
proposal was accepted. There was no proposal for increasing the
minimum size of the FP stack; Forth-2012 still says:

|The size of a floating-point stack shall be at least 6 items.

Only because nothing further was heard. What became of the review
Elizabeth announced I've no idea.

One interesting aspect is that VFX 5.x finally includes an FP package
by default, and it started by including an SSE2-based FP package which
supports a deep FP stack. However, MPE received customer complaints
about the lower number of significant digits in SSE2 (binary64)
vs. 387 (80-bit FP values), so they switched the default to the
387-based FP package that only has 8 FP stack items. Apparently no
MPE customer complains about that limitation.
...

AFAIK x87 hardware stack was always MPE's main and best supported FP
package. As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate. My impression of MPE's SSE2 implementation
is that it's 'a work in progress'. The basic precision is there but
transcendentals appear to be limited to single-precision. That'd be
the reason I'd stick with MPE's x87 package. Other reason is it's now
quite difficult and error-prone to switch FP packages as it involves
rebuilding the system. The old scheme was simpler and idiot-proof.

You do not have to rebuild the system to switch. Just read the manual.

"The old scheme was simpler and idiot-proof." Yes, that's why we
did it that way, but a certain "guru" who only does testing kept
moaning. If people would prefer us to go back to the old scheme,
VFX 6 still has time for changes. The whole idea that compiling
one file is improper is very non-Forth, or even anti-Forth.

I may be getting grumpier as I get older.

Stephen
--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974
http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads

From Anton Ertl@21:1/5 to dxf on Sun Jul 6 11:30:27 2025

dxf <[email protected]> writes:

On 5/07/2025 6:49 pm, Anton Ertl wrote:

dxf <[email protected]> writes:
[8 stack items on the FP stack]

Puzzling because of a thread here not long ago in which scientific users
appear to suggest the opposite. Such concerns have apparently been around
a long time:

https://groups.google.com/g/comp.lang.forth/c/CApt6AiFkxo/m/wwZmc_Tr1PcJ

I have read through the thread. It's unclear to me which scientific
users you have in mind. My impression is that 8 stack items was
deemed sufficient by many, and preferable (on 387) for efficiency
reasons.

AFAICS both Skip Carter (proponent) and Julian Noble were suggesting the
6 level minimum was inadequate.

Skip Carter did not post in this thread, but given that he proposed
the change, he probably found 6 to be too few; or maybe it was just a
phenomenon that we also see elsewhere as range anxiety. In any case,
he made no such proposal to Forth-200x, so apparently the need was not
pressing.

Julian Noble ignored the FP stack size issue in his first posting in
this thread, unlike the separate FP stack issue, which he
supported. So it seems that he did not care about a larger FP stack
size. In the other posting he endorsed moving FP stack items to the
data stack, but he did not write why; for all we know he might have
wanted that as a first step for getting the mantissa, exponent and
sign of the FP value as integer (and the other direction for
synthesizing FP numbers from these parts).

AFAIK all major forths supporting x87 hardware
offer software stack options.

Certainly on SwiftForth-4.0 I find no such option; it apparently
proved unnecessary. The manual mentions fpconfig.f, but no such file
exists in a SwiftForth-4.0 directory in the versions I have installed.

There exists such a file on various SwiftForth-3.x versions, but on
most of our machines SwiftForth-3.x segfaults (I have not investigated
why; it used to work). Ok, so I found an old system where it does not segfault, but trying to load FP on that system produced no joy:

[k8:~:118696] sf-3.11.0
SwiftForth i386-Linux 3.11.0 23-Feb-2021
require fpmath File not found

[k8:~:118699] sf-3.11.0 "include /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f"
/nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f
49: REQUIRES fpconfig >>> File not found

[k8:~:118700] sf-3.11.0 "include /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/linux/fpconfig.f include /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f"
/nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f
49: REQUIRES fpconfig >>> File not found

[k8:~:118702] sf-3.11.0 "include /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/linux/fpconfig.f"
ok
include /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f /nfs/nfstmp/anton/SwiftForth-3.11.0/lib/options/fpmath.f
49: REQUIRES fpconfig >>> File not found

Certainly, of the two points this thread is about, there was a
Forth200x proposal for standardizing a separate FP stack, and this
proposal was accepted. There was no proposal for increasing the
minimum size of the FP stack; Forth-2012 still says:

|The size of a floating-point stack shall be at least 6 items.

Only because nothing further was heard. What became of the review
Elizabeth announced I've no idea.

The ANS Forth committee gave up after a price increase by the
organization under whose umbrella they did their work (it's a
Tom-Sawyer-like business model: You work for them, and they charge you
money for that).

Several years later, we started Forth-200x, and we started with dpANS6/Forth-94, not with whatever the state their revision was in
when they gave up. Concerning these two issues, the separate FP stack
was proposed and accepted; the larger stack depth was not even
proposed, not by Skip Carter, and not by anyone else. If you think
that a larger number of guaranteed FP stack items is necessary,
propose it.

The old scheme was simpler and idiot-proof.

Maybe for using a different FP package, which is something I have done
only once (IIRC I modified the 387 package to store and load FP
values in 8 bytes, in order to investigate whether that explains a
performance difference). But thinking about it, no, it was anything
but simple. I had to find the VFX manual every time, then look up the
name of the FP package (which is named as unmemorizable as possible
without going to random names), then search for that package in the
files on the system, and finally cut and paste the path of that file.

A typical sequence of commands was:

locate -i vfx|grep pdf
xpdf /usr/share/doc/VfxForth/VfxLin.pdf
bg
locate ndp387.fth
locate p387.fth
vfxlin "include /usr/local/VfxLinEval/Lib/x86/Ndp387.fth"

If I want to switch from the default FP package to a different
package, I essentially have to take the same steps, I only have to add
two additional commands before including the FP package; the last
command for including the SSE implementation becomes:

vfx64 "integers remove-FP-pack include /nfs/nfstmp/anton/VfxForth64Lin-5.43/Lib/x64/FPSSE64S.fth"

(A special twist here is that the documentation says that the file is
called FPSSE64.fth (with only 2 S characters), so I needed a few more
locate invocations to find the right one).

If you find the former simple, why not the latter (apart from the
documentation mistake)?

In any case, in almost all cases I use the default FP pack, and here
the VFX-5 and SwiftForth-4 approach is unbeatable in simplicity.
Instead of performing the sequence of commands shown above, I just
start the Forth system, and FP words are ready.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From [email protected]@21:1/5 to Anton Ertl on Mon Jul 7 11:30:10 2025

In article <[email protected]>,
Anton Ertl <[email protected]> wrote:
<SNIP>

Skip Carter did not post in this thread, but given that he proposed
the change, he probably found 6 to be too few; or maybe it was just a
phenomenon that we also see elsewhere as range anxiety. In any case,
he made no such proposal to Forth-200x, so apparently the need was not
pressing.

Note that the vast experience Wagner has trumps the anxiety others
may or may not have.

<SNIP>

In any case, in almost all cases I use the default FP pack, and here
the VFX-5 and SwiftForth-4 approach is unbeatable in simplicity.
Instead of performing the sequence of commands shown above, I just
start the Forth system, and FP words are ready.

And even
WANT -fp-
is not much of a hassle in ciforth.

<SNIP>

- anton

Groetjes Albert
--
The Chinese government is satisfied with its military superiority over USA.
The next 5 year plan has as primary goal to advance life expectancy
over 80 years, like Western Europe.

From Paul Rubin@21:1/5 to dxf on Wed Jul 9 15:10:30 2025

dxf <[email protected]> writes:

As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate.

SSE2 is/was first and foremost a vectorizing extension, and it has been superseded quite a few times, indicating it was never all that
adequate. I don't know whether any of its successors support extended precision though.

W. Kahan was a big believer in extended precision (that's why the 8087
had it from the start). I believe IEEE specifies both 80 bit and 128
bit formats in addition to 64 bit. The RISC-V spec includes encodings
for 128 bit IEEE but I don't know if any RISC-V hardware actually
implements it. I think there are some IBM mainframe CPUs that have it.

From minforth@21:1/5 to All on Thu Jul 10 02:18:50 2025

On 10.07.2025 at 00:10, Paul Rubin wrote:

dxf <[email protected]> writes:

As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate.

SSE2 is/was first and foremost a vectorizing extension, and it has been superseded quite a few times, indicating it was never all that
adequate. I don't know whether any of its successors support extended precision though.

You don't need 64-bit doubles for signal or image processing.
Most vector/matrix operations on streaming data don't require
them either. Whether SSE2 is adequate or not to handle such data
depends on the application. "Industry" can manage well with 32-bit
floats or even smaller with non-standard number formats.

The AVX extension introduced YMM registers that can do simultaneous
math on four 64-bit double-precision floating-point numbers.
The intended application domain was scientific computing.

The determining factors are data throughput and storage space.
Today, with GPUs, speed and power consumption, driven by AI.

From Paul Rubin@21:1/5 to minforth on Wed Jul 9 21:32:42 2025

minforth <[email protected]> writes:

You don't need 64-bit doubles for signal or image processing.
Most vector/matrix operations on streaming data don't require
them either. Whether SSE2 is adequate or not to handle such data
depends on the application.

Sure, and for that matter, AI inference uses 8 bit and even 4 bit
floating point. Kahan on the other hand was interested in engineering
and scientific applications like PDE solvers (airfoils, fluid dynamics,
FEM, etc.). That's an area where roundoff error builds up after many iterations, thus extended precision.

"Industry" can manage well with 32-bit floats or even smaller with non-standard number formats.

Depends on your notion of "industry".

From Paul Rubin@21:1/5 to dxf on Wed Jul 9 21:35:20 2025

dxf <[email protected]> writes:

I suspect IEEE simply standardized what had become common practice among implementers.

No, it was really new and interesting. https://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html

What little I know about SSE2 it's not as well thought out or organized
as Intel's original effort. E.g. doing something as simple as changing
sign of an fp number is a pain when NANs are factored in.

I wonder if later SSE/AVX/whatever versions fixed this stuff.

From Paul Rubin@21:1/5 to minforth on Wed Jul 9 22:59:00 2025

minforth <[email protected]> writes:

I don't do parallelization, but I was still surprised by the good
results using FMA. In other words, increasing floating-point number
size is not always the way to go.

Kahan was an expert in clever numerical algorithms that avoid roundoff
errors, Kahan summation being one such algorithm. But he realized that
most programmers don't have the numerics expertise to come up with
schemes like that. A simpler and usually effective way to avoid
roundoff error swamping the result is simply to use double or extended precision. So that is what he often suggested.
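
For readers who have not seen Kahan summation, here is a minimal Python sketch of it next to a naive loop. The function names and the test data are illustrative assumptions, not something taken from the posts; Python floats are IEEE binary64, and math.fsum serves as a correctly rounded reference.

```python
import math

def naive_sum(values):
    """Plain left-to-right summation in binary64."""
    total = 0.0
    for x in values:
        total += x
    return total

def kahan_sum(values):
    """Kahan (compensated) summation: carry the rounding error forward."""
    total = 0.0
    comp = 0.0                  # low-order bits lost in earlier additions
    for x in values:
        y = x - comp            # re-inject what was lost last time
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover (algebraically zero) what was lost
        total = t
    return total

# One large term followed by many terms that are each below half an ulp of 1.0.
values = [1.0] + [1e-16] * 100_000

print(naive_sum(values))   # every small term is rounded away
print(kahan_sum(values))   # close to the reference below
print(math.fsum(values))   # correctly rounded reference
```

The naive loop never moves off 1.0 because each individual addend is too small to change the running total, while the compensated loop keeps track of the lost bits. Using a wider format, as suggested above, attacks the same problem without changing the algorithm.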

Here's an example of a FEM calculation that works well with 80 bit but
poorly with 64 bit FP:

https://people.eecs.berkeley.edu/~wkahan/Cantilever.pdf

Anyhow, first step is to select the best fp rounding method ....

Kahan advised compiling the program three times, once for each IEEE
rounding mode. Run all three programs and see if the outputs differ by
enough to care about. If they do, you have some precision loss to deal
with somehow, possibly by use of wider floats.

https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf
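
The rounding-mode check can be imitated without recompiling anything. The following is only a sketch under stated assumptions: it uses Python's decimal module (decimal arithmetic at 8 digits, not binary IEEE hardware), the choice of the three directions (nearest, down, up) is my reading of the advice, and the harmonic sum is a made-up workload.

```python
import decimal

def harmonic(n, rounding, prec=8):
    """Sum 1/1 + 1/2 + ... + 1/n with every operation rounded in one mode."""
    with decimal.localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = rounding
        total = decimal.Decimal(0)
        for k in range(1, n + 1):
            total += decimal.Decimal(1) / decimal.Decimal(k)
        return total

modes = {
    "nearest": decimal.ROUND_HALF_EVEN,
    "down": decimal.ROUND_FLOOR,
    "up": decimal.ROUND_CEILING,
}

# Run the same computation once per rounding mode and compare the outputs.
results = {name: harmonic(1000, mode) for name, mode in modes.items()}
spread = results["up"] - results["down"]

for name, value in results.items():
    print(name, value)
print("spread:", spread)
```

If the spread is large relative to the accuracy you need, the computation is losing precision somewhere, and raising prec (the analogue of wider floats) is one way to see whether that helps.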

From minforth@21:1/5 to All on Thu Jul 10 07:37:02 2025

On 10.07.2025 at 06:32, Paul Rubin wrote:

minforth <[email protected]> writes:

You don't need 64-bit doubles for signal or image processing.
Most vector/matrix operations on streaming data don't require
them either. Whether SSE2 is adequate or not to handle such data
depends on the application.

Sure, and for that matter, AI inference uses 8 bit and even 4 bit
floating point.

Or fuzzy control for instance.

Kahan on the other hand was interested in engineering
and scientific applications like PDE solvers (airfoils, fluid dynamics,
FEM, etc.). That's an area where roundoff error builds up after many iterations, thus extended precision.

That's why I use Kahan summation for dot products. It is slow but
rounding error accumulation remains small. A while ago I read an
article about this issue in which the author(s) performed extensive tests
of different dot product calculation algorithms on many serial
data sets from finance, geology, oil industry, meteorology etc.
Their target criterion was to find an acceptable balance between
computational speed and minimal error.

The 'winner' was a chained fused-multiply-add algorithm (many
CPUs/GPUs can perform FMA in hardware) which makes for shorter code
(good for caching). And it supports speed improvement by
parallelization (recursive halving of the sets until manageable
vector size followed by parallel computation).

I don't do parallelization, but I was still surprised by the good
results using FMA. In other words, increasing floating-point number
size is not always the way to go. Anyhow, first step is to select
the best fp rounding method ....

From Anton Ertl@21:1/5 to Paul Rubin on Thu Jul 10 07:47:23 2025

Paul Rubin <[email protected]d> writes:

dxf <[email protected]> writes:

As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate.

SSE2 is/was first and foremost a vectorizing extension, and it has been
superseded quite a few times, indicating it was never all that
adequate.

But SSE2 was also the way to finally implement mainstream floating
point: double precision instead of extended precision (with its
double-rounding woes when trying to implement double precision) and
registers (for which register allocation algorithms have been worked
on for a long time) instead of the stack. So starting with AMD64
(which was guaranteed to include SSE2) SSE2 became the preferred
scalar floating point instruction set, which is also reflected in the
ABIs on AMD64. And in this function SSE2 has not been superseded.

Concerning vectors, AVX allows 256 bits of width, eliminates the
alignment brain damage of SSE/SSE2, and gives us three-address
instructions. AVX2 gives us integer instructions. The various
AVX-512 extensions are a mess of overlapping extensions (to be unified
by AVX10) that generally provide up to 512 bits of width and control
of individual lanes with mask registers.

I don't know whether any of its successors support extended
precision though.

No.

W. Kahan was a big believer in extended precision (that's why the 8087
had it from the start). I believe IEEE specifies both 80 bit and 128
bit formats in addition to 64 bit.

Not 80-bit format. binary128 and binary256 are specified.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to dxf on Thu Jul 10 08:07:02 2025

dxf <[email protected]> writes:

I suspect IEEE simply standardized what had become common practice among
implementers.

Not at all. There was no common practice at the time.

There was some sentiment to standardize the VAX FP stuff, and as
far as number formats are concerned, they almost did (IEEE binary32
uses the same format as the VAX F, IEEE binary64 uses the same format
as VAX G, and IEEE binary128 uses the same format as VAX H), if we
ignore the perverse byte order of the VAX formats. However, IEEE FP
uses a different bias for the exponent, and requires implementing
denormal numbers, infinities and NaNs.

So actually none of the hardware manufacturers implemented IEEE FP at
the time, not DEC, not IBM, and not Cray. And yet, industry accepted
IEEE FP and within a few years all new architectures supported IEEE
FP, and new models of existing hardware usually also implemented IEEE
FP.

By using 80 bits /internally/ Intel went a long way to
achieving IEEE's spec for double precision.

The 8087 did not just use 80 bits internally, it exposed them to
programmers. When Intel released the 8087, IEEE 754 was not finished.
But Kahan was both active in the standardization community and in the
8087 development, so you can find his ideas in both. His and Intel's
idea was that the 8087 would be IEEE standard-conforming, but given
that the standard came out later, that was not quite the case.

E.g. doing something as simple as changing
sign of an fp number is a pain when NANs are factored in.

I don't see that. When you change the sign of a NaN, it's still a
NaN.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Stephen Pelc@21:1/5 to minforth on Thu Jul 10 08:50:43 2025

On 10 Jul 2025 at 02:18:50 CEST, "minforth" <[email protected]> wrote:

"Industry" can manage well with 32-bit
floats or even smaller with non-standard number formats.

My customers beg to differ and some use 128 bit numbers for
their work. In a construction estimate for one runway for the
new Hong Kong airport, the cost difference between a 64 bit FP
calculation and the integer calculation was US 10 million dollars.
This was for pile capping which involves a large quantity of relatively
small differences.

Stephen

--
Stephen Pelc, [email protected]
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974 http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads

From Anton Ertl@21:1/5 to dxf on Thu Jul 10 08:35:49 2025

dxf <[email protected]> writes:

The catch with SSE is there's nothing like FCHS or FABS
so depending on how one implements them, results vary across implementations.

You can see in Gforth how to implement FNEGATE and FABS with SSE2:

see fnegate
Code fnegate
0x000055e6a78a8274: add $0x8,%rbx
0x000055e6a78a8278: xorpd 0x24d8f(%rip),%xmm15 # 0x55e6a78cd010
0x000055e6a78a8281: mov %r15,%r9
0x000055e6a78a8284: mov (%rbx),%rax
0x000055e6a78a8287: jmp *%rax
end-code
ok
0x55e6a78cd010 16 dump
55E6A78CD010: 00 00 00 00 00 00 00 80 - 00 00 00 00 00 00 00 00
ok
see fabs
Code fabs
0x000055e6a78a84fe: add $0x8,%rbx
0x000055e6a78a8502: andpd 0x24b15(%rip),%xmm15 # 0x55e6a78cd020
0x000055e6a78a850b: mov %r15,%r9
0x000055e6a78a850e: mov (%rbx),%rax
0x000055e6a78a8511: jmp *%rax
end-code
ok
0x55e6a78cd020 16 dump
55E6A78CD020: FF FF FF FF FF FF FF 7F - 00 00 00 00 00 00 00 00

The actual implementation is the xorpd instruction for FNEGATE, and
the andpd instruction for FABS. The memory locations contain masks:
for FNEGATE only the sign bit is set, for FABS everything but the sign
bit is set.

Sure you can implement FNEGATE and FABS in more complicated ways, but
you can also implement them in more complicated ways if you use the
387 instruction set. Here's an example of more complicated
implementations:

see fnegate
FNEGATE
( 004C4010 4833C0 ) XOR RAX, RAX
( 004C4013 F34D0F7EC8 ) MOVQ XMM9, XMM8
( 004C4018 664C0F6EC0 ) MOVQ XMM8, RAX
( 004C401D F2450F5CC1 ) SUBSD XMM8, XMM9
( 004C4022 C3 ) RET/NEXT
( 19 bytes, 5 instructions )
ok
see fabs
FABS
( 004C40B0 E8FBEFFFFF ) CALL 004C30B0 FS@
( 004C40B5 4885DB ) TEST RBX, RBX
( 004C40B8 488B5D00 ) MOV RBX, [RBP]
( 004C40BC 488D6D08 ) LEA RBP, [RBP+08]
( 004C40C0 0F8D05000000 ) JNL/GE 004C40CB
( 004C40C6 E845FFFFFF ) CALL 004C4010 FNEGATE
( 004C40CB C3 ) RET/NEXT
( 28 bytes, 7 instructions )
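
The difference between the two styles can be tried out in a few lines of Python. This is a sketch, not either system's code: the mask constants are read off the two dumps above (little-endian), the helper names are hypothetical, and fnegate_sub models my reading of the second listing, which appears to compute 0 - x.

```python
import math
import struct

SIGN_MASK = 0x8000000000000000  # only the sign bit set (the FNEGATE mask)
ABS_MASK = 0x7FFFFFFFFFFFFFFF   # everything but the sign bit (the FABS mask)

def bits(x):
    """Reinterpret a binary64 value as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    """Reinterpret a 64-bit unsigned integer as a binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fnegate_mask(x):
    """Flip the sign bit, as xorpd with the sign mask does."""
    return from_bits(bits(x) ^ SIGN_MASK)

def fabs_mask(x):
    """Clear the sign bit, as andpd with the inverted mask does."""
    return from_bits(bits(x) & ABS_MASK)

def fnegate_sub(x):
    """Negate by subtracting from zero (assumed model of the longer variant)."""
    return 0.0 - x

for x in (1.5, -2.0, 0.0, float("inf"), float("nan")):
    print(x, fnegate_mask(x), fabs_mask(x), fnegate_sub(x))
```

The mask versions treat every input uniformly: a NaN stays a NaN and +0.0 becomes -0.0. Subtracting from zero gives the same answer for ordinary numbers but returns +0.0 for an input of +0.0 under round-to-nearest, which is one concrete way results can vary between implementations.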

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From minforth@21:1/5 to All on Thu Jul 10 12:14:24 2025

On 10.07.2025 at 10:50, Stephen Pelc wrote:

On 10 Jul 2025 at 02:18:50 CEST, "minforth" <[email protected]> wrote:

"Industry" can manage well with 32-bit
floats or even smaller with non-standard number formats.

My customers beg to differ and some use 128 bit numbers for
their work. In a construction estimate for one runway for the
new Hong Kong airport, the cost difference between a 64 bit FP
calculation and the integer calculation was US 10 million dollars.
This was for pile capping which involves a large quantity of relatively
small differences.

You are right. "Industry" is one of those non-words that should be
used with care, or avoided altogether, before it becomes a tautology.

IIRC I only had one real application for 128-bit floats: simulation
of heat propagation through thick-walled tubes. The simulation
involved numerical integration which can be prone to error accumulation.
One variant of MinForth's fp-number wordset can be built with gcc's
libquadmath library. It is slower, but speed is not always important.

From Paul Rubin@21:1/5 to Anton Ertl on Thu Jul 10 12:33:52 2025

[email protected] (Anton Ertl) writes:

I believe IEEE specifies both 80 bit and 128 bit formats in addition
to 64 bit.

Not 80-bit format. binary128 and binary256 are specified.

I see, 80 bits is considered double-extended. "The x87 and Motorola
68881 80-bit formats meet the requirements of the IEEE 754-1985 double
extended format,[12] as does the IEEE 754 128-bit binary format." (https://en.wikipedia.org/wiki/Extended_precision)

Interestingly, Kahan's 1997 report on IEEE 754's status does say 80 bit
is specified. But it sounds like that omits some nuance.

https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

From minforth@21:1/5 to All on Thu Jul 10 23:16:27 2025

On 10.07.2025 at 21:33, Paul Rubin wrote:

[email protected] (Anton Ertl) writes:

I believe IEEE specifies both 80 bit and 128 bit formats in addition
to 64 bit.

Not 80-bit format. binary128 and binary256 are specified.

I see, 80 bits is considered double-extended. "The x87 and Motorola
68881 80-bit formats meet the requirements of the IEEE 754-1985 double extended format,[12] as does the IEEE 754 128-bit binary format." (https://en.wikipedia.org/wiki/Extended_precision)

Interestingly, Kahan's 1997 report on IEEE 754's status does say 80 bit
is specified. But it sounds like that omits some nuance.

https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

Kahan was also overly critical of dynamic Unum/Posit formats.

Time has shown that he was partially wrong: https://spectrum.ieee.org/floating-point-numbers-posits-processor

From Paul Rubin@21:1/5 to minforth on Thu Jul 10 18:40:32 2025

minforth <[email protected]> writes:

Kahan was also overly critical of dynamic Unum/Posit formats.
Time has shown that he was partially wrong: https://spectrum.ieee.org/floating-point-numbers-posits-processor

I don't feel qualified to draw a conclusion from this. I wonder what
the numerics community thinks, if there is any consensus. I remember
being dubious of posits when I first heard of them, though Kahan
probably influenced that. I do know that IEEE 754 took a lot of trouble
to avoid undesirable behaviours that never would have occurred to most
of us. No idea how well posits do at that. I guess though, given the continued attention they get, they must be more interesting than I had
thought.

I saw one of the posit articles criticizing IEEE 754 because IEEE 754
addition is not always associative. But that is inherent in how
floating point arithmetic works, and I don't see how posit addition can
avoid it. Let a = 1e100, b = -1e100, and c=1. So mathematically,
a+b+c=1. You should get that from (a+b)+c in your favorite floating
point format. But a+(b+c) will almost certainly be 0, without very high precision (300+ bits).
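
The example is easy to check in any language with IEEE binary64; here is a minimal Python sketch (Python floats are binary64; the variable names left and right are mine).

```python
import math

a, b, c = 1e100, -1e100, 1.0

left = (a + b) + c    # a and b cancel exactly first, then 1 is added
right = a + (b + c)   # c is far below one ulp of b, so b + c rounds back to b

print(left)    # 1.0
print(right)   # 0.0

# Rough count of the significand bits needed to hold 1e100 + 1 exactly.
print(math.ceil(math.log2(a)))   # 333
```

The last line shows where the "300+ bits" figure comes from: 1e100 is a little over 2^332, so a significand has to span about 333 bits before the added 1 survives.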

From Paul Rubin@21:1/5 to dxf on Thu Jul 10 20:17:55 2025

dxf <[email protected]> writes:

When someone begins with the line it rarely ends well:
"Twenty years ago anarchy threatened floating-point arithmetic."
One floating-point to rule them all.

This gives a good perspective on posits:

https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinechin_etal_2019.pdf

Floating point arithmetic in the 1960s (before my time) was really in a terrible state. Kahan has written about it. Apparently IBM 360
floating point arithmetic had to be redesigned after the fact, because
the original version had such weird anomalies.

From minforth@21:1/5 to All on Fri Jul 11 05:15:49 2025

On 11.07.2025 at 03:40, Paul Rubin wrote:

minforth <[email protected]> writes:

Kahan was also overly critical of dynamic Unum/Posit formats.
Time has shown that he was partially wrong:
https://spectrum.ieee.org/floating-point-numbers-posits-processor

I don't feel qualified to draw a conclusion from this. I wonder what
the numerics community thinks, if there is any consensus. I remember
being dubious of posits when I first heard of them, though Kahan
probably influenced that. I do know that IEEE 754 took a lot of trouble
to avoid undesirable behaviours that never would have occurred to most
of us. No idea how well posits do at that. I guess though, given the continued attention they get, they must be more interesting than I had thought.

I saw one of the posit articles criticizing IEEE 754 because IEEE 754 addition is not always associative. But that is inherent in how
floating point arithmetic works, and I don't see how posit addition can
avoid it. Let a = 1e100, b = -1e100, and c=1. So mathematically,
a+b+c=1. You should get that from (a+b)+c in your favorite floating
point format. But a+(b+c) will almost certainly be 0, without very high precision (300+ bits).

AFAIK Cuda does not support posits (yet). BFLOAT16 etc. still win the
game, until the AI industry pours big money into the chip foundries
for posit math GPUs.

Even then, it is questionable, whether or when it would seep into the general-purpose CPU market.

For Forthers to play with, of course. ;o)

From minforth@21:1/5 to All on Fri Jul 11 09:09:00 2025

On 11.07.2025 at 05:17, Paul Rubin wrote:

dxf <[email protected]> writes:

When someone begins with the line it rarely ends well:
"Twenty years ago anarchy threatened floating-point arithmetic."
One floating-point to rule them all.

This gives a good perspective on posits:

https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinechin_etal_2019.pdf

Quintessence:

Overburdened or incompetent programmers +
Posits are tricky beasts ==>
Programmers _need_ AI co-workers to avoid pitfalls

Modern times....

From Paul Rubin@21:1/5 to dxf on Fri Jul 11 00:55:43 2025

dxf <[email protected]> writes:

But was it the case by the mid/late 70's - or certain individuals saw an opportunity to influence the burgeoning microprocessor market? Notions of single and double precision already existed in software floating point -

Hardware floating point also had single and double precision. The
really awful 1960s systems were gone by the mid 70s. But there were a
lot of competing formats, ranging from bad to mostly-ok. VAX floating
point was mostly ok, DEC wanted IEEE to adopt it, Kahan was ok with
that, but Intel thought "go for the best possible". Kahan's
retrospectives on this stuff are good reading:

http://people.eecs.berkeley.edu/~wkahan/index.htm

I've linked a few of them. I liked the quote

It was remarkable that so many hardware people there, knowing how
difficult p754 would be, agreed that it should benefit the community
at large. If it encouraged the production of floating-point software
and eased the development of reliable software, it would help create a
larger market for everyone's hardware. This degree of altruism was so
astonishing that MATLAB's creator Dr. Cleve Moler used to advise
foreign visitors not to miss the country's two most awesome
spectacles: the Grand Canyon, and meetings of IEEE p754.

from http://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html

From Anton Ertl@21:1/5 to minforth on Fri Jul 11 07:02:05 2025

minforth <[email protected]> writes:

On 10.07.2025 at 21:33, Paul Rubin wrote:
Kahan was also overly critical of dynamic Unum/Posit formats.

Time has shown that he was partially wrong:
https://spectrum.ieee.org/floating-point-numbers-posits-processor

What is supposed to be partially wrong?

FP numbers have a number of not-so-nice properties, and John L.
Gustafson uses that somewhat successfully to sell his alternatives to
the gullible. The way to do that is to give some examples where
traditional FP numbers fail and his alternative under consideration
works. I have looked at a (IIRC) slide deck by Kahan where he shows
examples where the alternative by Gustafson (don't remember which
one he looked at in that slide deck) fails and traditional FP numbers
work.

Where does that leave us? Kahan makes the good argument that
numerical analysts have worked out techniques to deal with the
shortcomings of traditional FP numbers for over 70 years. For
Gustafson's number formats these techniques are not applicable; maybe
one can find new ones for these number formats, but that's not clear.

For Posits (Type III Unums), which are close to traditional FP in many respects, one can see how that would work out; while traditional FP
has a fixed division between mantissa and exponents, in Posits the
division depends on the size of the exponent. This means that
reasoning about the accuracy of the computation would have to consider
the size of the exponent, and is therefore more complex than for
traditional FP; with a little luck you can produce a result that gives
an error bound based on the smallest mantissa size, but that error
bound will be worse than for traditional FP.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Paul Rubin@21:1/5 to Anton Ertl on Fri Jul 11 01:15:00 2025

[email protected] (Anton Ertl) writes:

I have looked at a (IIRC) slide deck by Kahan where he shows examples
where the alternative by Gustafson (don't remember which one he
looked at in that slide deck) fails and traditional FP numbers work.

Maybe this: http://people.eecs.berkeley.edu/~wkahan/UnumSORN.pdf

From Anton Ertl@21:1/5 to Paul Rubin on Fri Jul 11 07:27:19 2025

Paul Rubin <[email protected]d> writes:

I guess though, given the continued attention they get, they must be
more interesting than I had thought.

IMO it's the usual case of a somewhat complex topic where existing
solutions have undesirable properties, and someone promises a solution
that supposedly solves these problems. The attention comes from the
problems, not from the merits of the promised solution.

There has been attention given to research into the philosopher's
stone for many centuries; I don't think that makes it interesting
other than as an example of how people fall for promises.

I saw one of the posit articles criticizing IEEE 754 because IEEE 754
addition is not always associative. But that is inherent in how
floating point arithmetic works, and I don't see how posit addition can
avoid it.

If you only added posits of a given width, you couldn't. Therefore
the posit specification also defines quire<n> types, which are
fixed-point numbers that can represent all the values of the posit<n>
types plus additional bits such that a sequence of a lot of additions
does not overflow. If you add the posits using a quire as
accumulator, and only then convert back to a posit, the whole thing is associative.

Of course you could also introduce a fixed-point accumulator for
traditional FP numbers and get the same benefit without using posits
for the rest.

A problem is how these accumulator types are represented in
programming languages. If somebody writes

0e n 0 ?do a i th f@ f+ loop x f!

should the 0e be stored in the accumulator, and F+ be translated to an
addition to the accumulator, and should the F! then convert the
accumulator to FP? What about

0e x f! n 0 ?do x f@ a i th f@ f+ x f! loop

In Forth I would make the accumulator explicit, with separate
FP-to-accumulator addition operations and explicit accumulator-to-fp conversion, but I expect that many people (across programming
languages) would prefer an automatic approach that works with existing
source code. We see that with auto-vectorization.
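
To make the explicit style concrete, here is a toy sketch, not a real quire. The words ACC, ACC-SCALE, ACC-CLEAR, ACC-F+ and ACC>F are hypothetical names invented for this illustration. The accumulator is a double-cell fixed-point number with an assumed scale factor of 2^-32, so it is only exact for inputs that are multiples of 2^-32 and whose scaled value fits in a double cell (64-bit cells assumed, and an F>D that covers the full double-cell range).

```forth
2variable acc                        \ toy fixed-point accumulator (hypothetical)
4294967296e fconstant acc-scale      \ 2^32: assumed scale factor for this sketch

: acc-clear ( -- )              0. acc 2! ;
: acc-f+    ( -- ) ( F: r -- )  acc-scale f* f>d  acc 2@ d+  acc 2! ;
: acc>f     ( -- ) ( F: -- r )  acc 2@ d>f  acc-scale f/ ;

\ The same three values added in two different orders:
acc-clear  1e acc-f+  1e16 acc-f+  -1e16 acc-f+  acc>f f.
acc-clear  1e16 acc-f+  -1e16 acc-f+  1e acc-f+  acc>f f.
```

Because the additions happen as integer additions, the order should not matter under the stated assumptions, whereas `1e 1e16 f+ -1e16 f+` and `1e16 -1e16 f+ 1e f+` can differ in binary64.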

How big would the accumulator be? Looking at <https://en.wikipedia.org/wiki/Unum_(number_format)#Quire>, for
posit32 (the largest format given on the page) the quire32 type would
have 512 bits, and would allow adding up of 2^151 posit32 numbers.

Let's see how big an accumulator for binary32 would have to be: There
are exponents for finite numbers from -126..127, i.e., 254 finite
exponent values, and 23 mantissa bits, plus the sign bit, so every
binary32 number can be represented as a 278-bit fixed-point number
(with scale factor 2^-149). If you want to also allow intermediate
results of, say, 2^64 additions (good for 97 years of additions at 6G
additions per second), that increases the accumulator to 342 bits; but
note that the bigger numbers can only be represented as infinity in
binary32.
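
The bit counts above can be rechecked with a few lines of Forth. This is only a sketch of the arithmetic; it counts bit positions from 2^127 down to 2^-149 instead of adding 254 exponent values and 23 mantissa bits, which gives the same total. The year length of 31557600 seconds is an assumption made here.

```forth
decimal
127 -149 - 1+    \ bit positions 2^127 down to 2^-149: 277 magnitude bits
1+ dup .         \ plus the sign bit: prints 278
64 + .           \ headroom for 2^64 additions: prints 342

\ years needed for 2^64 additions at 6G additions per second, about 97
2e 64e f**  6e9 f/  31557600e f/  f.
```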

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to dxf on Fri Jul 11 08:33:12 2025

dxf <[email protected]> writes:

On 11/07/2025 1:17 pm, Paul Rubin wrote:

This gives a good perspective on posits:

https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinechin_etal_2019.pdf

Yes, that looks ok. One thing I noticed is that they suggest
implementing the smaller posit formats by intelligent table lookup.
If we have small bit widths and table lookup, I wonder if we should go
for any variant of FP (including posits) at all, or if an
exponent-only (i.e., logarithmic) representation would not be better.
E.g., for 8 bits, out of the 256 values, 2 would represent infinities,
one would represent NaN, and one would represent 0, leaving 252
remaining values. If we use 2^(1/11) (~1.065) as base B, this would give
a number range of B^-126=0.000356 to B^125=2635. You can vary B to
either give a more fine-grained resolution at the expense of a smaller
number range or a larger number range at the expense of a coarser
resolution. <https://developer.nvidia.com/blog/floating-point-8-an-introduction-to-efficient-lower-precision-ai-training/>
presents E4M3 with +-448 range, and E5M2 with +-57344 range. But note
that the next number after 1 is 1.125 for E4M3 and 1.25 for E5M2, both
more coarse-grained than the 1.065 that an exponent-only format with
B=2^(1/11) gives you.

Addition and subtraction would be performed by table lookup (and would
almost always be approximate), for multiplication and division an
integer adder can be used.
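
A small sketch of such an exponent-only format in Forth, reading the base as the 11th root of 2 (about 1.065). The words LNS>F, LNS* and LNS/ are hypothetical names; sign, zero, infinities, NaN and exponent overflow are all left out, and addition (which would need the lookup table) is not shown.

```forth
\ A value is stored as a small integer exponent e and stands for B^e,
\ with B = 2^(1/11) assumed for this sketch.
: lns>f ( e -- ) ( F: -- r )  s>d d>f 11e f/  2e fswap f** ;
: lns*  ( e1 e2 -- e3 )  + ;    \ multiplication is an integer addition
: lns/  ( e1 e2 -- e3 )  - ;    \ division is an integer subtraction

-126 lns>f f.            \ smallest magnitude, about 0.000356
 125 lns>f f.            \ largest magnitude, about 2635
   1 lns>f f.            \ next value after 1, about 1.065
  22 11 lns* lns>f f.    \ 4 * 2 computed by adding exponents, about 8
```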

Floating point arithmetic in the 1960s (before my time) was really in a
terrible state. Kahan has written about it. Apparently IBM 360
floating point arithmetic had to be redesigned after the fact, because
the original version had such weird anomalies.

But was it the case by the mid/late 70's - or certain individuals saw an
opportunity to influence the burgeoning microprocessor market?

Yes, that's the thing with FP. Some people just do their computations
and who cares if the results might be an artifact of numerical
instability. For weather forecasts, there is no telling if a bad
prediction is due to a numerical error, due to imperfect measurement
data, or because of the butterfly effect (which is a convenient
excuse).

Other people care more about the results, and perform numerical
analysis. There are only a few specialists for that, and they have
asked for and gotten features in IEEE 754 and the hardware that the
vast majority of programmers never consciously uses, e.g., rounding
modes or the inexact "exception" (actually a flag, not a Forth
exception), which allows them to tell if there was a rounding error in
a computation. But when you use a library designed with the help of
numerical analysis, you might benefit from the existence of these
features.

They have also asked for and gotten things like denormal numbers,
infinities and NaNs that result in fewer numerical pitfalls for
programmers who are not numerical analysts. These features may be
irrelevant for those who do weather prediction, but I expect that
those who found that binary64 provided by VFX's SSE2-based package was
not good enough may benefit from such features.

In any case, FP numbers are used in very diverse ways. Not everybody
needs all the features, and even fewer features are consciously
needed, but that's the usual case with things that are not
custom-tailored for your application.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to Paul Rubin on Fri Jul 11 10:14:49 2025

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

I have looked at a (IIRC) slide deck by Kahan where he shows examples
where the alternative by Gustafson (don't remember which one he
looked at in that slide deck) fails and traditional FP numbers work.

Maybe this: http://people.eecs.berkeley.edu/~wkahan/UnumSORN.pdf

Yes.

Here's a quote:

| These claims pander to Ignorance and Wishful Thinking.

That's my impression, too, and not just for Type I unums.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Anton Ertl@21:1/5 to dxf on Sun Jul 13 09:01:41 2025

dxf <[email protected]> writes:

On 11/07/2025 8:22 pm, Anton Ertl wrote:

The rest of the industry has standardized on binary64 and binary32,
and they prefer bit-equivalent results for ease of testing. So as
soon as SSE2 gave that to them, they flocked to SSE2.
...

I wonder how much of this is academic or trend inspired?

Is ease of testing an academic concern or a trend?

AFAICS Forth
clients haven't flocked to it else vendors would have SSE2 offerings at
the same level as their x387 packs.

For Forth, Inc. and MPE AFAIK their respective IA-32 Forth system was
the only one with hardware FP for many years, so there probably was
little pressure from users for bit-identical results with, say, SPARC,
because they did not have a Forth system that ran on SPARC.

And when they did their IA-32 systems, SSE2 did not exist, so of
course they used the 387. Plus, 387 was guaranteed to be available
with Intel's Pentium and AMD's K5, while SSE2 was only available on
the Pentium 4 and the Athlon 64; so for many years there was a good
reason to prefer 387 over SSE2 if you compiled for IA-32. And gcc
generates 387 code to this day if you ask it to produce code for
IA-32. Only with AMD64 was SSE2 guaranteed, and only there does gcc
default to it if you use float or double. And SwiftForth and VFX have
only been ported to AMD64 relatively recently.

And as long as customers did not ask for bit-identical results to
those on, say, a Raspi, there was little reason to reimplement FP with
SSE2. I wonder if the development of the SSE2 package for VFX was
influenced by the availability of VFX for the Raspi.

These Forth systems also don't do global register allocation or auto-vectorization, so two other reasons why, e.g., C compilers chose
to use SSE2 on AMD64 (where SSE2 was guaranteed to be available) don't
exist for them.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From peter@21:1/5 to Anton Ertl on Mon Jul 14 09:09:00 2025

On Mon, 14 Jul 2025 06:04:13 GMT
[email protected] (Anton Ertl) wrote:

dxf <[email protected]> writes:

On 13/07/2025 7:01 pm, Anton Ertl wrote:

...
For Forth, Inc. and MPE AFAIK their respective IA-32 Forth system
was the only one with hardware FP for many years, so there
probably was little pressure from users for bit-identical results
with, say, SPARC, because they did not have a Forth system that
ran on SPARC.

What do you mean by "bit-identical results"? Since SSE2 comes
without transcendentals (or basics such as FABS and FNEGATE) and
implementers are expected to supply their own, if anything, I expect
results across platforms and compilers to vary.

There are operations for which IEEE 754 specifies the result to the
last bit (except that AFAIK the representation of NaNs is not
specified exactly), among them F+ F- F* F/ FSQRT, probably also
FNEGATE and FABS. It does not specify the exact result for
transcendental functions, but if your implementation performs the same
bit-exact operations for computing a transcendental function on two
IEEE 754 compliant platforms, the result will be bit-identical (if it
is a number). So just use the same implementations of transcendental
functions, and your results will be bit-identical; concerning the
NaNs, if you find a difference, check if the involved values are NaNs.

- anton

This of course excludes the use of libm or other math libraries provided
by the distribution. They will change between releases.
I have successfully used fdlibm, which is the base for many others. It
gives max 1 ulp rounding error. I have now also tested the core-math
project https://gitlab.inria.fr/core-math/core-math This gives
correctly rounded functions at the cost of being 10 times the compiled
size! A complete library with trig, log, pow etc comes in at 500k.

Peter

From Anton Ertl@21:1/5 to dxf on Mon Jul 14 06:04:13 2025

dxf <[email protected]> writes:

On 13/07/2025 7:01 pm, Anton Ertl wrote:

...
For Forth, Inc. and MPE AFAIK their respective IA-32 Forth system was
the only one with hardware FP for many years, so there probably was
little pressure from users for bit-identical results with, say, SPARC,
because they did not have a Forth system that ran on SPARC.

What do you mean by "bit-identical results"? Since SSE2 comes without
transcendentals (or basics such as FABS and FNEGATE) and implementers
are expected to supply their own, if anything, I expect results across
platforms and compilers to vary.

There are operations for which IEEE 754 specifies the result to the
last bit (except that AFAIK the representation of NaNs is not
specified exactly), among them F+ F- F* F/ FSQRT, probably also
FNEGATE and FABS. It does not specify the exact result for
transcendental functions, but if your implementation performs the same
bit-exact operations for computing a transcendental function on two
IEEE 754 compliant platforms, the result will be bit-identical (if it
is a number). So just use the same implementations of transcendental
functions, and your results will be bit-identical; concerning the
NaNs, if you find a difference, check if the involved values are NaNs.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Paul Rubin@21:1/5 to Anton Ertl on Mon Jul 14 01:24:03 2025

[email protected] (Anton Ertl) writes:

So just use the same implementations of transcendental functions, and
your results will be bit-identical

Same implementations = same FP operations in the exact same order? That
seems hard to ensure, if the functions are implemented in a language
that leaves anything up to a compiler.

Also, in the early implementations x87, 68881, NS320something(?),
transcendentals were included in the coprocessor and the workings
weren't visible. There is a proposal to add this to RISC-V
(https://libre-soc.org/ztrans_proposal/). It looks like there was an
AVX-512 ER subset that also does transcendentals, but it only appeared
on some Xeon Phi processors now discontinued (per Wikipedia article on
AVX). No idea about other processors.

From Anton Ertl@21:1/5 to Paul Rubin on Mon Jul 14 10:11:57 2025

Paul Rubin <[email protected]d> writes:

[email protected] (Anton Ertl) writes:

So just use the same implementations of transcendental functions, and
your results will be bit-identical

Same implementations = same FP operations in the exact same order?

Same operations with the same data flow. Independent operations can
be reordered.

That
seems hard to ensure, if the functions are implemented in a language
that leaves anything up to a compiler.

Even gcc heeds data flow of FP operations unless you tell it with
-ffast-math that anything goes.

Also, in the early implementations x87, 68881, NS320something(?),
transcendentals were included in the coprocessor and the workings
weren't visible.

The bigger problem with at least x87 is that you don't always get
bit-identical results even for basic operations such as addition,
thanks to double rounding. So even if you implement transcendentals
yourself based on basic operations, you can see results that are not
bit-identical.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

From Paul Rubin@21:1/5 to mhx on Mon Jul 14 11:31:24 2025

[email protected] (mhx) writes:

This looks very interesting. I can find Kahan and Neumaier, but
"tree addition" didn't turn up (There is a suspicious looking
reliability paper about the approach which surely is not what
you meant). Or is it pairwise addition what I should look for?

I think the idea is to split (say) a 1024-element sum into two
512-element sums that you compute separately, then add the results
together. You do the 512-element sums the same way, recursively.
Sometimes you can parallelize the computations, and depending on the CPU
you might be able to use vector or SIMD instructions once the chunks are
small enough.
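
A minimal sketch of that recursive splitting in standard Forth. The names PAIRWISE-SUM, PDATA and FILL-PDATA are made up for this illustration, and this is not the benchmark code discussed elsewhere in the thread; a practical version would probably switch to a plain loop once the chunks get small.

```forth
: pairwise-sum ( addr u -- ) ( F: -- r )
  dup 0= if 2drop 0e exit then       \ empty range sums to 0
  dup 1 = if drop f@ exit then       \ one element: just fetch it
  dup 2/ >r                          \ R: size of the first half
  over r@ recurse                    \ F: sum of the first half
  r@ -  swap r> floats + swap        \ addr and size of the second half
  recurse f+ ;                       \ add the sum of the second half

\ Usage: five floats holding 1 2 3 4 5
falign here 5 floats allot constant pdata
: fill-pdata ( -- )  5 0 do  i 1+ s>d d>f  pdata i floats + f!  loop ;
fill-pdata
pdata 5 pairwise-sum f.              \ 1+2+3+4+5 = 15
```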

From minforth@21:1/5 to All on Wed Jul 16 18:15:08 2025

Am 16.07.2025 um 13:25 schrieb Anton Ertl:

I did not do any accuracy measurements, but I did performance
measurements

YMMV but "fast but wrong" would not be my goal. ;-)

From Anton Ertl@21:1/5 to minforth on Wed Jul 16 16:23:03 2025

minforth <[email protected]> writes:

Am 16.07.2025 um 13:25 schrieb Anton Ertl:

I did not do any accuracy measurements, but I did performance
measurements

YMMV but "fast but wrong" would not be my goal. ;-)

I did test correctness with cases where roundoff errors do not play a
role.

As mentioned, the RECursive balanced-tree sum (which is also the
fastest on several systems and absolutely) is expected to be more
accurate in those cases where roundoff errors do play a role. But if
you care about that, better design a test and test it yourself. It
will be interesting to see how you find out which result is more
accurate when they differ.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html

From minforth@21:1/5 to All on Wed Jul 16 19:17:16 2025

Am 16.07.2025 um 18:23 schrieb Anton Ertl:

minforth <[email protected]> writes:

Am 16.07.2025 um 13:25 schrieb Anton Ertl:

I did not do any accuracy measurements, but I did performance
measurements

YMMV but "fast but wrong" would not be my goal. ;-)

I did test correctness with cases where roundoff errors do not play a
role.

As mentioned, the RECursive balanced-tree sum (which is also the
fastest on several systems and absolutely) is expected to be more
accurate in those cases where roundoff errors do play a role. But if
you care about that, better design a test and test it yourself. It
will be interesting to see how you find out which result is more
accurate when they differ.

Meanwhile many years ago, comparative tests were carried out with a
couple of representative archived serial data (~50k samples) by
using a Java 128-bit quadruple fp-math class to perform summations
and calculate dot-product results.

The results were compared with those of naive linear summation and multiplication and pairwise divide&conquer summation at different
rounding modes, for float32 and float64. Ultimately, Kahan summation
was the winner. It is slow, but there were no in-the-loop
requirements, so for a background task, Kahan was fast enough.
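
For reference, a sketch of Kahan (compensated) summation in standard Forth. The names KSUM, KCOMP, KAHAN-SUM and KDATA are hypothetical, and this is not the code that was used in those tests; it keeps the sum and the compensation in FVARIABLEs for readability rather than speed.

```forth
fvariable ksum     \ running sum
fvariable kcomp    \ running compensation (low-order bits lost so far)

: kahan-sum ( addr u -- ) ( F: -- r )
  0e ksum f!  0e kcomp f!
  0 ?do
    dup i floats + f@  kcomp f@ f-   \ F: y       y  = x - c
    fdup ksum f@ f+                  \ F: y t     t  = sum + y
    fdup ksum f@ f-  frot f-         \ F: t c'    c' = (t - sum) - y
    kcomp f!  ksum f!
  loop
  drop ksum f@ ;

\ Usage: 1 followed by two terms that are too small to change 1 on their own
falign here 3 floats allot constant kdata
1e kdata f!   1e-16 kdata float+ f!   1e-16 kdata float+ float+ f!
kdata 3 kahan-sum f.
```

With plain left-to-right F+ in binary64 the two small terms would be rounded away one at a time; the compensated loop can carry them along until together they are large enough to affect the sum.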

From peter@21:1/5 to Anton Ertl on Thu Jul 17 10:14:00 2025

On Wed, 16 Jul 2025 15:39:26 GMT
[email protected] (Anton Ertl) wrote:

[email protected] (Anton Ertl) writes:

I did not do any accuracy measurements, but I did performance
measurements on a Ryzen 5800X:

cycles:u
gforth-fast iforth lxf SwiftForth VFX
3_057_979_501 6_482_017_334 6_087_130_593 6_021_777_424 6_034_560_441 NAI
6_601_284_920 6_452_716_125 7_001_806_497 6_606_674_147 6_713_703_069 UNR
3_787_327_724 2_949_273_264 1_641_710_689 7_437_654_901 1_298_257_315 REC
9_150_679_812 14_634_786_781 SR

cycles:u

This second table is about instructions:u

gforth-fast iforth lxf SwiftForth VFX
13_113_842_702 6_264_132_870 9_011_308_923 11_011_828_048 8_072_637_768 NAI
6_802_702_884 2_553_418_501 4_238_099_417 11_277_658_203 3_244_590_981 UNR
9_370_432_755 4_489_562_792 4_955_679_285 12_283_918_226 3_915_367_813 REC
51_113_853_111 29_264_267_850 SR

- anton

I have run this test now on my Ryzen 9950X for lxf, lxf64 and a snapshot of gforth

Here are the results

Ryzen 9950X

lxf64
5,010,566,495 NAI cycles:u
2,011,359,782 UNR cycles:u
646,926,001 REC cycles:u
3,589,863,082 SR cycles:u

lxf64
7,019,247,519 NAI instructions:u
4,128,689,843 UNR instructions:u
4,643,499,656 REC instructions:u
25,019,182,759 SR instructions:u

gforth-fast 20250219
2,048,316,578 NAI cycles:u
7,157,520,448 UNR cycles:u
3,589,638,677 REC cycles:u
17,199,889,916 SR cycles:u

gforth-fast 20250219
13,107,999,739 NAI instructions:u
6,789,041,049 UNR instructions:u
9,348,969,966 REC instructions:u
50,108,032,223 SR instructions:u

lxf
6,005,617,374 NAI cycles:u
6,004,157,635 UNR cycles:u
1,303,627,835 REC cycles:u
9,187,422,499 SR cycles:u

lxf
9,010,888,196 NAI instructions:u
4,237,679,129 UNR instructions:u
4,955,258,040 REC instructions:u
26,018,680,499 SR instructions:u

Doing the milliseconds timing gives

lxf64 native code
timer-reset ' naive-sum bench .elapsed 889 ms elapsed ok
timer-reset ' unrolled-sum bench .elapsed 360 ms elapsed ok
timer-reset ' recursive-sum bench .elapsed 114 ms elapsed ok
timer-reset ' shift-reduce-sum bench .elapsed 647 ms elapsed ok

lxf64 token code
timer-reset ' naive-sum bench .elapsed 2284 ms elapsed ok
timer-reset ' unrolled-sum bench .elapsed 2723 ms elapsed ok
timer-reset ' recursive-sum bench .elapsed 3474 ms elapsed ok
timer-reset ' shift-reduce-sum bench .elapsed 6842 ms elapsed ok

lxf
timer-reset ' naive-sum bench .elapsed 1073 milli-seconds ok
timer-reset ' unrolled-sum bench .elapsed 1103 milli-seconds ok
timer-reset ' recursive-sum bench .elapsed 234 milli-seconds ok
timer-reset ' shift-reduce-sum bench .elapsed 1632 milli-seconds ok

It is interesting to note how the "best algorithm" changes depending
on the underlying system implementation.
lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack.

Thanks for these tests, they uncovered a problem with the lxf64 code
generator. It could only handle 114 immediate values in a basic block!
Both sum128 and nsum128 compile to gigantic functions of over 2k of compiled code.

Best Regards
Peter

From Anton Ertl@21:1/5 to peter on Thu Jul 17 12:54:29 2025

peter <[email protected]> writes:

Ryzen 9950X

lxf64
5,010,566,495 NAI cycles:u
2,011,359,782 UNR cycles:u
646,926,001 REC cycles:u
3,589,863,082 SR cycles:u

lxf64
7,019,247,519 NAI instructions:u
4,128,689,843 UNR instructions:u
4,643,499,656 REC instructions:u
25,019,182,759 SR instructions:u

gforth-fast 20250219
2,048,316,578 NAI cycles:u
7,157,520,448 UNR cycles:u
3,589,638,677 REC cycles:u
17,199,889,916 SR cycles:u

gforth-fast 20250219
13,107,999,739 NAI instructions:u
6,789,041,049 UNR instructions:u
9,348,969,966 REC instructions:u
50,108,032,223 SR instructions:u

lxf
6,005,617,374 NAI cycles:u
6,004,157,635 UNR cycles:u
1,303,627,835 REC cycles:u
9,187,422,499 SR cycles:u

lxf
9,010,888,196 NAI instructions:u
4,237,679,129 UNR instructions:u
4,955,258,040 REC instructions:u
26,018,680,499 SR instructions:u

lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack

Apparently the latency of ADDSD (SSE2) is down to 2 cycles on Zen5
(visible in lxf64 UNR and gforth-fast NAI) while the latency of FADD
(387) is still 6 cycles (lxf NAI and UNR). I have no explanation why
on lxf64 NAI performs so much worse than UNR, and in gforth-fast UNR
so much worse than NAI.

For REC the latency should not play a role. There lxf64 performs at
7.2IPC and 1.55 F+/cycle, whereas lxf performs only at 3.8IPC and 0.77 F+/cycle. My guess is that FADD can only be performed by one FPU, and
that's connected to one dispatch port, and other instructions also
need or are at least assigned to this dispatch port.
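
The IPC figures can be rechecked from the quoted REC counts with a few floating-point divisions. This is only a sketch of the arithmetic; the total number of F+ operations is not given in the posting, so the last line merely shows what the quoted 1.55 F+/cycle would imply.

```forth
4643499656e  646926001e f/ f.    \ lxf64 REC instructions per cycle, about 7.2
4955258040e 1303627835e f/ f.    \ lxf REC instructions per cycle, about 3.8
1.55e 646926001e f* f.           \ implied F+ count for lxf64 REC, about 1.0e9
```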

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html

From Anton Ertl@21:1/5 to minforth on Thu Jul 17 13:56:36 2025

minforth <[email protected]> writes:

Meanwhile many years ago, comparative tests were carried out with a
couple of representative archived serial data (~50k samples)

Representative of what? Serial: what series?

Anyway, since I don't have these data, I won't repeat this experiment
with the routines I have written.

Ultimately, Kahan summation
was the winner. It is slow, but there were no in-the-loop
requirements, so for a background task, Kahan was fast enough.

I wanted to see how slow, so I added KAHAN-SUM to

https://www.complang.tuwien.ac.at/forth/programs/pairwise-sum.4th

and on the Ryzen 5800X I got (data for the other routines from the
earlier posting):

cycles:u
gforth-fast iforth lxf SwiftForth VFX
3_057_979_501 6_482_017_334 6_087_130_593 6_021_777_424 6_034_560_441 NAI
6_601_284_920 6_452_716_125 7_001_806_497 6_606_674_147 6_713_703_069 UNR
3_787_327_724 2_949_273_264 1_641_710_689 7_437_654_901 1_298_257_315 REC
9_150_679_812 14_634_786_781 SR
57_819_112_550 28_621_991_440 28_431_247_791 28_409_857_650 28_462_276_524 KAH

instructions:u
gforth-fast iforth lxf SwiftForth VFX
13_113_842_702 6_264_132_870 9_011_308_923 11_011_828_048 8_072_637_768 NAI
6_802_702_884 2_553_418_501 4_238_099_417 11_277_658_203 3_244_590_981 UNR
9_370_432_755 4_489_562_792 4_955_679_285 12_283_918_226 3_915_367_813 REC
51_113_853_111 29_264_267_850 SR
54_114_197_272 18_264_494_804 21_011_621_955 27_012_178_800 20_072_845_336 KAH

The versions used are still:
Gforth 0.7.9_20250625
iForth 5.1-mini
lxf 1.7-172-983
SwiftForth x64-Linux 4.0.0-RC89
VFX Forth 64 5.43 [build 0199] 2023-11-09

KAHan-sum is more than 20 times slower than REC on VFX64. The
particular slowness of gforth-fast is probably due to the weaknesses
of FP stack caching in Gforth.

One can do something like Kahan summation also for pairwise addition.
The base step (half of the additions) becomes simpler (no compensation
in any input), but more complicated in the inner additions (one
compensation each). The main benefit would be that several additions
can be done in parallel, and the expected error is even smaller.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html

From minforth@21:1/5 to All on Thu Jul 17 18:02:56 2025

Am 17.07.2025 um 15:56 schrieb Anton Ertl:

minforth <[email protected]> writes:

Meanwhile many years ago, comparative tests were carried out with a
couple of representative archived serial data (~50k samples)

Representative of what? Serial: what series?

Measured process signals and machine vibrations.

From peter@21:1/5 to Anton Ertl on Thu Jul 17 22:48:25 2025

On Thu, 17 Jul 2025 12:54:29 GMT
[email protected] (Anton Ertl) wrote:

peter <[email protected]> writes:

Ryzen 9950X

lxf64
5,010,566,495 NAI cycles:u
2,011,359,782 UNR cycles:u
646,926,001 REC cycles:u
3,589,863,082 SR cycles:u

lxf64
7,019,247,519 NAI instructions:u
4,128,689,843 UNR instructions:u
4,643,499,656 REC instructions:u
25,019,182,759 SR instructions:u

gforth-fast 20250219
2,048,316,578 NAI cycles:u
7,157,520,448 UNR cycles:u
3,589,638,677 REC cycles:u
17,199,889,916 SR cycles:u

gforth-fast 20250219
13,107,999,739 NAI instructions:u
6,789,041,049 UNR instructions:u
9,348,969,966 REC instructions:u
50,108,032,223 SR instructions:u

lxf
6,005,617,374 NAI cycles:u
6,004,157,635 UNR cycles:u
1,303,627,835 REC cycles:u
9,187,422,499 SR cycles:u

lxf
9,010,888,196 NAI instructions:u
4,237,679,129 UNR instructions:u
4,955,258,040 REC instructions:u
26,018,680,499 SR instructions:u

lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack

Apparently the latency of ADDSD (SSE2) is down to 2 cycles on Zen5
(visible in lxf64 UNR and gforth-fast NAI) while the latency of FADD
(387) is still 6 cycles (lxf NAI and UNR). I have no explanation why
on lxf64 NAI performs so much worse than UNR, and in gforth-fast UNR
so much worse than NAI.

For REC the latency should not play a role. There lxf64 performs at
7.2IPC and 1.55 F+/cycle, whereas lxf performs only at 3.8IPC and 0.77 F+/cycle. My guess is that FADD can only be performed by one FPU, and
that's connected to one dispatch port, and other instructions also
need or are at least assigned to this dispatch port.

- anton

I did a test coding the sum128 as a code word with avx-512 instructions
and got the following results

285,584,376 cycles:u
941,856,077 instructions:u

timing was
timer-reset ' recursive-sum bench .elapsed 51 ms elapsed

so half the time of the original recursive.
with 32 zmm registers I could have done a sum256 also

the code is below for reference
r13 is the fp stack pointer
rbx top of stack
xmm0 top of fp stack

code asum128

movsd [r13-0x8], xmm0
lea r13, [r13-0x8]

vmovapd zmm0, [rbx]
vmovapd zmm1, [rbx+64]
vmovapd zmm2, [rbx+128]
vmovapd zmm3, [rbx+192]
vmovapd zmm4, [rbx+256]
vmovapd zmm5, [rbx+320]
vmovapd zmm6, [rbx+384]
vmovapd zmm7, [rbx+448]
vmovapd zmm8, [rbx+512]
vmovapd zmm9, [rbx+576]
vmovapd zmm10, [rbx+640]
vmovapd zmm11, [rbx+704]
vmovapd zmm12, [rbx+768]
vmovapd zmm13, [rbx+832]
vmovapd zmm14, [rbx+896]
vmovapd zmm15, [rbx+960]

vaddpd zmm0, zmm0, zmm1
vaddpd zmm2, zmm2, zmm3
vaddpd zmm4, zmm4, zmm5
vaddpd zmm6, zmm6, zmm7
vaddpd zmm8, zmm8, zmm9
vaddpd zmm10, zmm10, zmm11
vaddpd zmm12, zmm12, zmm13
vaddpd zmm14, zmm14, zmm15

vaddpd zmm0, zmm0, zmm2
vaddpd zmm4, zmm4, zmm6
vaddpd zmm8, zmm8, zmm10
vaddpd zmm12, zmm12, zmm14

vaddpd zmm0, zmm0, zmm4
vaddpd zmm8, zmm8, zmm12

vaddpd zmm0, zmm0, zmm8

; Horizontal sum of zmm0

vextractf64x4 ymm1, zmm0, 1
vaddpd ymm2, ymm1, ymm0

vextractf64x2 xmm3, ymm2, 1
vaddpd ymm4, ymm3, ymm2

vhaddpd xmm0, xmm4, xmm4

ret
end-code

lxf64 uses a modified fasm as the backend assembler
so full support for all instructions

BR
Peter

From Anton Ertl@21:1/5 to dxf on Fri Jul 18 15:34:05 2025

dxf <[email protected]> writes:

So in mandating bit-identical results, not only in calculations but also
input/output

I don't think that IEEE 754 specifies I/O, but I could be wrong.

IEEE 754 is all about giving the illusion of truth in
floating-point when, if anything, they should be warning users don't be
fooled.

I don't think that IEEE 754 mentions truth. It does, however, specify
the inexact "exception" (actually a flag), which allows you to find
out if the results of the computations are exact or if some rounding
was involved.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2025 CFP: http://www.euroforth.org/ef25/cfp.html

From Paul Rubin@21:1/5 to dxf on Mon Jul 21 13:28:11 2025

dxf <[email protected]> writes:

AFAICS IEEE 754 offers nothing particularly useful for the end-user.
Either one's fp application works - or it doesn't. IEEE hasn't
changed that.

The purpose of IEEE FP was to improve the numerical accuracy of
applications that used it as opposed to other formats.

IEEE's relevance is that it spurred Intel into making an FPU which in
turn made implementing fp easy.

Exactly the opposite, Intel decided that it wanted to make an FPU and it
wanted the FPU to have the best FP arithmetic possible. So it
commissioned Kahan (a renowned FP expert) to design the FP format.
Kahan said "Why not use the VAX format? It is pretty good". Intel said
it didn't want pretty good, it wanted the best, so Kahan said "ok" and
designed the 8087 format.

The IEEE standardization process happened AFTER the 8087 was already in progress. Other manufacturers signed onto it, some of them overcoming
initial resistance, after becoming convinced that it was the right
thing.

http://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html

From B. Pym@21:1/5 to B. Pym on Tue Jul 29 15:22:17 2025

B. Pym wrote:

: get-number ( accum adr len -- accum' adr' len' )
{ adr len }
0. adr len >number { adr' len' }
len len' =
if
2drop adr len 1 /string
else
d>s swap 60 * +
adr' len'
then ;

: parse-time ( adr len -- seconds)
0 -rot
begin
dup
while
get-number
repeat
2drop ;

s" foo-bar" parse-time . 0
s" foo55bar" parse-time . 55
s" foo 1 bar 55 zoo" parse-time . 155

Actually prints 115.

From B. Pym@21:1/5 to B. Pym on Tue Jul 29 15:07:23 2025

B. Pym wrote:

mhx wrote:

On Sun, 6 Oct 2024 7:51:31 +0000, dxf wrote:

Is there an easier way of doing this? End goal is a double number representing centi-secs.

empty decimal

: SPLIT ( a u c -- a2 u2 a3 u3 ) >r 2dup r> scan 2swap 2 pick - ;
: >INT ( adr len -- u ) 0 0 2swap >number 2drop drop ;

: /T ( a u -- $hour $min $sec )
2 0 do [char] : split 2swap dup if 1 /string then loop
2 0 do dup 0= if 2rot 2rot then loop ;

: .T 2swap 2rot cr >int . ." hr " >int . ." min " >int . ." sec " ;

s" 1:2:3" /t .t
s" 02:03" /t .t
s" 03" /t .t
s" 23:59:59" /t .t
s" 0:00:03" /t .t

Why don't you use the fact that >NUMBER returns the given
string starting with the first unconverted character?
SPLIT should be redundant.

-marcel

: CHAR-NUMERIC? 48 58 WITHIN ;
: SKIP-NON-NUMERIC ( adr u -- adr2 u2)
  BEGIN
    DUP IF OVER C@ CHAR-NUMERIC? NOT ELSE 0 THEN
  WHILE
    1 /STRING
  REPEAT ;

: SCAN-NEXT-NUMBER ( n adr len -- n2 adr2 len2)
  2>R 60 * 0. 2R> >NUMBER
  2>R D>S + 2R> ;

: PARSE-TIME ( adr len -- seconds)
  0 -ROT
  BEGIN
    SKIP-NON-NUMERIC
    DUP
  WHILE
    SCAN-NEXT-NUMBER
  REPEAT
  2DROP ;

S" hello 1::36 world" PARSE-TIME CR .
96 ok

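: get-number ( accum adr len -- accum' adr' len' )
  { adr len }
  0. adr len >number { adr' len' }  \ converted double stays on the stack
  len len' =                        \ true if no digit was consumed
  if
    2drop adr len 1 /string         \ discard the zero double, skip one char
  else
    d>s swap 60 * +                 \ accum' = accum * 60 + number
    adr' len'
  then ;

: parse-time ( adr len -- seconds)
  0 -rot                            \ start with accum = 0
  begin
    dup                             \ loop while characters remain
  while
    get-number
  repeat
  2drop ;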

s" foo-bar" parse-time . 0
s" foo55bar" parse-time . 55
s" foo 1 bar 55 zoo" parse-time . 155
s" and9foo 1 bar 55 zoo" parse-time . 32515
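
As a cross-check on the last result: 32515 is what comes out when 9, 1 and 55 are read as hours, minutes and seconds (9*3600 + 1*60 + 55). A minimal sketch that takes the total apart again, assuming a standard system with the DOUBLE word set for D. ; the name .HMS is hypothetical and not from the thread:

```forth
: .hms ( seconds -- )
  60 /mod 60 /mod                 \ -- sec min hr
  . ." hr " . ." min " . ." sec " ;

32515 .hms                        \ prints 9 hr 1 min 55 sec

\ The original question asked for a double number of centi-seconds;
\ M* multiplies two singles into a double.
32515 100 m* d.                   \ prints 3251500
```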

