Is there an easier way of doing this? End goal is a double number representing centi-secs.
empty decimal
: SPLIT ( a u c -- a2 u2 a3 u3 ) >r 2dup r> scan 2swap 2 pick - ;
: >INT ( adr len -- u ) 0 0 2swap >number 2drop drop ;
: /T ( a u -- $hour $min $sec )
  2 0 do [char] : split 2swap dup if 1 /string then loop
  2 0 do dup 0= if 2rot 2rot then loop ;
: .T 2swap 2rot cr >int . ." hr " >int . ." min " >int . ." sec " ;
s" 1:2:3" /t .t
s" 02:03" /t .t
s" 03" /t .t
s" 23:59:59" /t .t
s" 0:00:03" /t .t
The HH:MM:SS format is easy but how to deal with the variants shown above? They occur in the real world.
I know you don't care about this case, but:
On Sun, 6 Oct 2024 7:51:31 +0000, dxf wrote:
> Is there an easier way of doing this? End goal is a double number representing centi-secs.
> [...]
Why don't you use the fact that >NUMBER returns the given
string starting with the first unconverted character?
SPLIT should be redundant.
-marcel
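A minimal sketch of that suggestion, assuming well-formed input ("h:m:s", "m:s" or "s") and a system with the Double-Number word set (M*/ and D+). TIME>SECS and TIME>CS are hypothetical names, not from the posts above. Each pass multiplies the running total by 60, lets >NUMBER convert one field, then steps over the single separator character that >NUMBER stopped at. The result is a double, as the original question asks for:

```forth
\ Sketch only: TIME>SECS and TIME>CS are made-up names.
\ Assumes well-formed input; an empty field still counts as a field of 0.
: time>secs ( c-addr u -- ud )
   0. 2swap                       \ running total (double), then the string
   begin dup while
      2>r 60 1 m*/ 2r>            \ total = total * 60
      0. 2swap >number            \ convert one field; string now starts at the separator
      2>r d+ 2r>                  \ add the field to the total
      dup if 1 /string then       \ step over the separator, if any
   repeat 2drop ;

: time>cs ( c-addr u -- ud )      \ same, scaled to centiseconds
   time>secs 100 1 m*/ ;
```

With these definitions, s" 1:2:3" time>cs d. should print 372300, and the shorter variants fall out of the same loop because a missing leading field simply means fewer passes.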
mhx wrote:
> Why don't you use the fact that >NUMBER returns the given
> string starting with the first unconverted character?
> SPLIT should be redundant.
: CHAR-NUMERIC? 48 58 WITHIN ;
: SKIP-NON-NUMERIC ( adr u -- adr2 u2)
  BEGIN
    DUP IF OVER C@ CHAR-NUMERIC? NOT ELSE 0 THEN
  WHILE
    1 /STRING
  REPEAT ;
: SCAN-NEXT-NUMBER ( n adr len -- n2 adr2 len2)
  2>R 60 * 0. 2R> >NUMBER
  2>R D>S + 2R> ;
: PARSE-TIME ( adr len -- seconds)
  0 -ROT
  BEGIN
    SKIP-NON-NUMERIC
    DUP
  WHILE
    SCAN-NEXT-NUMBER
  REPEAT
  2DROP ;
S" hello 1::36 world" PARSE-TIME CR .
96 ok
Third, any statement must come with proof. And in this case that means extended benchmarking. I can tell you beforehand that I've never seen significant differences between locals and stack. I'm sorry to say that - but it's true.
Once full native code compilation and optimisation is turned on, you
can get surprising results. At one stage we (MPE) de-localled a
substantial portion of the PowerNet TCP/IP stack - all in high-level
Forth. For the modified code, size decreased by 25% and performance
increased by 50%.
B. Pym wrote:
> S" hello 1::36 world" PARSE-TIME CR .
> 96 ok
: SCAN-NUMBER-OR-SKIP ( n adr len -- n' adr' len')
  DUP >R
  0 0 2SWAP >NUMBER
  DUP R> =
  IF 2SWAP 2DROP 1 /STRING
  ELSE
    2>R D>S SWAP 60 * + 2R>
  THEN ;
: PARSE-TIME ( adr len -- seconds)
  0 -ROT
  BEGIN
    DUP
  WHILE
    SCAN-NUMBER-OR-SKIP
  REPEAT
  2DROP ;
S" hi 5 or 1 is 44 ho " PARSE-TIME CR .
18104
On 11-06-2025 03:49, dxf wrote:
On 11/06/2025 3:34 am, LIT wrote:
...
Fourth, if the definition is extremely time-critical, those
tricky stack manipulators, (e.g., ROT ROT) can really eat up
clock cycles. Direct access to variables is faster."
Pushing variables on the stack, executing them, along with their
associated @ and ! eats clock cycles. This is certainly the case
in the systems you use.
Agreed.
Yes, Brodie warns us next "but careful with variables' use
too" - and I still think my use of variables in two examples
I recently pasted wasn't "legit" in any way. It was just
applying the tips you see above.
When is it "legit" to give up? I've written routines I believed
needed VARIABLEs. But after a 'cooling off' period, I can look
at the problem again afresh and find I can do better. Folks will
say in the real world one couldn't afford this. That's true and
likely why I'm a hobbyist and not a professional programmer.
OTOH it's pretty rare that I write routines with variables in them
to begin with.
As a guy who used Forth programming in a professional environment, I can
at least tell you how I did it..
When you're on the spot, you're on the spot - and you got to provide in
the allotted time, even if it means making sub-optimal code. That's just
the way it is, that's corporate life.
If you tell your boss "Brodie told you to", he's gonna shake his head,
ask who Brodie is and then ship you to the corporate shrink for an
emergency session.
But what I did was to either collect stuff in advance ("Hey, that's a
nice comma'd printout word by Ed. Better make it work in 4tH!") - or
make certain libraries beforehand. In that case, all you have to do is
to shove all those elements together and you're done. The tricky stuff
is already in your tool chest..
Take a look at the 4tH library and notice how much of this stuff is of
no interest at all to the occasional user. Well, that was because it
wasn't written for you. It was written to be applied at work, so I can
do miracles and save my reputation. If you wanna win, you gotta cheat ;-)
Hans Bezemer
3. As Mr. Pelc remarked, stack operators are faster.
This is what Mr. Pelc remarked, regarding such style
of programming - yes, many years ago I was guilty of
that too - already 15 years ago:
https://groups.google.com/g/comp.lang.forth/c/m9xy5k5BfkY/m/FFmH9GE5UJAJ
"Although the code is compilable and can be made efficient,
the source code is a maintenance nightmare!"
Maybe he changed his mind since that time - well, since
he's here, you may want to ask him a question.
On 23-06-2025 20:48, LIT wrote:
OK, have another song: Mr. FIFO stating that
"arrays aren't variables" (maybe need a link?).
Where did they teach you that? At that 'college'
of yours, "elite programmer"? :D
Oh honey, you don't understand? That's not a problem, hon. Go to mummy,
she will explain it to you. But daddy doesn't have time for you. He's
talking to the grown ups. Go play with your dolls and be a good girl!
I'm also puzzled why there is always so much emphasis on the "speed" issue. I
mean - if you want speed, do your program in C -O3 so to say. It'll blow
any Forth out of the water.
dxf writes:
> Forth forces an average programmer to adopt a level of organisation
> sooner than a locals-based language. I suspect forthers that promote
> locals are well aware forth is readable and maintainable but are
> pursuing personal agendas of style which requires implying the
> opposite.
IDK, I've seen some unreadable Forth code that was written by experts.
Whether locals could have helped, I don't know.
Maybe you are mistaken. I have seen unmaintainable code written by
some self-proclaimed experts.
I have not seen unmaintainable code written by real experts.
No really, I'm not kidding. When done properly Forth actually changes
the way you work. Fundamentally. I explained the sensation at the end of
"Why Choose Forth". I've been able to tackle things I would never have
been able to tackle with a C mindset. ( https://youtu.be/MXKZPGzlx14 )
Like I always wanted to do a real programming language - no matter how primitive. Now I've done at least a dozen - and that particular trick
seems to get easier by the day.
And IMHO a lot can be traced back to the very simple principles Forth is based upon - like a stack. Or the triad "Execute-Number-Error". Or the dictionary. But also the lessons from ThinkForth.
You'll also find it in my C work. There are a lot more "small functions"
than in your average C program. It works for me like an "inner API". Not
to mention uBasic/4tH - There are plenty of "one-liners" in my
uBasic/4tH programs.
But that train of thought needs to be maintained - and it can only be maintained by submitting to the very philosophy Forth was built upon. I
feel like if I would give in to locals, I'd be back to being an average
C programmer.
I still do C from time to time - but it's not my prime language. For
this reason - and because I'm often just plain faster when using Forth.
It just results in a better program.
The only thing I can say is, "it works for me". And when I sometimes
view the works of others - especially when resorting to a C style - I
feel like it could work for you as well.
Nine times out of ten one doesn't need the amount of locals which are applied. One doesn't need a 16 line word - at least not when you
actually want to maintain the darn thing. One could tackle the problem
much more elegantly.
On 26/06/2025 5:12 pm, Paul Rubin wrote:
> dxf writes:
>> Define 'unreadable'. In general I don't need to understand the nitty
>> gritty of a routine. But should I and no stack commentary exists, I've
>> no objections to creating it. It's par for the course in Forth. If it
>> bugged me I wouldn't be doing Forth.
> Unreadable = I look at the code and have no idea what it's doing. The
> logic is often obscured by stack manipulation. The values in the stack
> are meaningful to the program's operation, but what is the meaning? In
> most languages, meaningful values have names, and the names convey the
> meaning. In Forth, you can write comments for that purpose. Years
> after cmForth was published, someone wrote a set of shadow screens for
> it, and that helped a lot.
> With no named values and no explanatory comments, the program becomes
> opaque.
Yet forthers have no problem with this. Take the SwiftForth source code.
At best you'll get a general comment as to what a function does. How do
they maintain it - the same way anyone proficient in C maintains C code.
Albert is correct. Familiarity is key to readability. That's not to say
code deserving documentation shouldn't have it. OTOH one shouldn't be
expecting documentation (including stack commentary) for what's an everyday
affair in Forth.
>> The more common complaint is that you use some feature they dislike
>> (typically locals) when you would otherwise DUP ROT instead.
> But aren't 'locals' actually PICK/ROLL in disguise?
Do PICK/ROLL skim all the values off the stack and stuff them in
variables to be later popped on and off the stack like a yo-yo?
What is POST ?
"Pick and Roll are the generic operators which treat the data stack as
an array. If you find you need to use them, you are probably doing it
wrong. Look for ways to refactor your code to be simpler instead."
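To make the PICK comparison concrete, here is a small sketch. The names 3DUP-PICK and 3DUP-LOCALS are made up for illustration, and the second definition assumes a system with the Forth-2012 {: ... :} locals syntax. Both have the same stack effect; one indexes into the data stack as if it were an array, the other names the three items first:

```forth
\ Illustration only; both names are hypothetical.
\ With three items on the stack, 2 PICK copies the third one from the top.
: 3dup-pick ( a b c -- a b c a b c )
   2 pick 2 pick 2 pick ;

\ Same effect with named locals (requires the Forth-2012 Locals word set).
: 3dup-locals ( a b c -- a b c a b c )
   {: a b c :} a b c a b c ;
```

Whether the second form counts as PICK in disguise depends on the implementation; the source text is the same on any system that supports the syntax.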
On 25-06-2025 09:21, Paul Rubin wrote:
> Hans Bezemer writes:
>> Fundamentally. I explained the sensation at the end
>> of "Why Choose Forth". I've been able to tackle things I would never
>> have been able to tackle with a C mindset. ( https://youtu.be/MXKZPGzlx14 )
> I just watched this video and enjoyed it, but I don't understand how a C
> mindset is different. In C you pass stuff as function parameters
> instead of on the stack: what's the big deal? And particularly, the
> video said nothing about the burning question of locals ;).
> It seems to me all the examples mentioned in the video (parsing CSV
> files or floating point numerals) are what someone called
> micro-problems. Today they are much easier with languages like Python, and
> back in Forth's heyday there was Lisp, which occupied a mindspace like
> Python does now.
> I agree that Thinking Forth is a great book.
It's hard to illustrate things with a multi-KLOC program IMHO. You can
only illustrate principles by using examples that are "contained" in a way.
But I'll try to illustrate a thing or two. Let's say you want to tackle
a problem. And it doesn't go your way. You have to add this thing and
that thing - and hold on to that value. You know what I mean.
about to add. Take a look at getopt() - I think that's a good example.
You can almost see how it grew almost organically by the author's hand.
He never seemed to think "Hmm, maybe I'll make a separate function of it".
The stack ops THEMSELVES may be, in a way, "canonical" — but not
solving "each and every" programming task using them
"no-matter-what", IMHO.
But such would indicate a deficiency in Forth. Do C programmers reach a point at which they can't go forward?
On 24-06-2025 18:23, Anton Ertl wrote:
> Hans Bezemer writes:
>> I'm also puzzled why there is always so much emphasis on the "speed" issue. I
>> mean - if you want speed, do your program in C -O3 so to say. It'll blow
>> any Forth out of the water.
Still, in general - GCC beats Forth. Although I have to admit I've got a renewed respect for VFX Forth! Kudos!
> But such would indicate a deficiency in Forth. Do C programmers reach a
> point at which they can't go forward? ...
Another great argument to leave Forth and embrace C! Why painfully
create kludge to cram into a language that was clearly not created for
that when you have a language available that was actually DESIGNED
with those requirements in mind?!
Hans Bezemer writes:
>> But such would indicate a deficiency in Forth. Do C programmers
>> reach a point at which they can't go forward? ...
> Another great argument to leave Forth and embrace C! Why painfully
> create kludge to cram into a language that was clearly not created
> for that when you have a language available that was actually
> DESIGNED with those requirements in mind?!
I'm not sure what you're getting at here, though I see the sarcasm.
Is the kludge locals? They don't seem that kludgy to me.
Implementing them in Forth is straightforward and lots of people have
done it.
The point where one can't go forward is basically "running out of
registers". In assembly language those are the machine registers, and
in Forth they're the top few stack slots. In both cases, when you run
out, you have to resort to contorted code.
In C that isn't a problem for the programmer. You can use as many
variables as you like, and if the compiler runs out of registers and
has to make contorted assembly code, it does so without your having
to care.
In a traditional Forth with locals, the locals are stack allocated so accessing them usually costs a memory reference. The programmer gets
the same convenience as a C programmer. The runtime takes a slowdown compared to code from a register-allocating compiler, but such a
slowdown is already present in a threaded interpreter, so it's fine.
Finally, a fancy enough Forth compiler can do the same things that a C compiler does. Those compilers are difficult to write, but they exist
(VFX, lxf, etc.). I don't know if locals make writing the compiler
more difficult. But the user shouldn't have to care.
On Tue, 01 Jul 2025 11:40:38 -0700
Paul Rubin wrote:
> Finally, a fancy enough Forth compiler can do the same things that a C
> compiler does. Those compilers are difficult to write, but they exist
> (VFX, lxf, etc.). I don't know if locals make writing the compiler
> more difficult. But the user shouldn't have to care.
The code generator in lxf has no knowledge of what a local is.
Locals are conceptually placed on the return stack. lxf is as smart
about the return stack as the data stack. That is why it can produce
very efficient code for simple examples like 3DUP. The actual
implementation of local in the interpreter is just a few lines of code.
The difference with locals will be seen when you have a boundary block,
IF statement, a call etc that require a known state of the stacks.
The real problem for me with locals is that their scope is to the end
of the definition. With the stack you end the scope of an item with a
drop and extend it with a dup, very elegant!
A multipass compiler can of course find the scope of each local but at
the cost of more complexity.
In lxf64 I have introduced a local stack with the same capabilities as
the data and return stack. I am not sure yet if this is better.
The nice thing is that I now have >ls ls> and ls@. Compared with the
return stack this also works across words. One word can put stuff on
the localstack and another retrieve it. This is sometimes very useful.
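A third stack of this kind can be sketched in a few lines of standard Forth. This is not how lxf64 implements it; LSTACK, LSP, the fixed depth of 32 cells and the SET-SCALE / SCALED / DROP-SCALE words are assumptions for illustration, and there is no overflow or underflow checking:

```forth
\ Hypothetical local stack: grows upward, LSP points at the next free cell.
create lstack 32 cells allot
variable lsp   lstack lsp !

: >ls ( x -- )  lsp @ !  1 cells lsp +! ;     \ push x
: ls> ( -- x )  -1 cells lsp +!  lsp @ @ ;    \ pop top item
: ls@ ( -- x )  lsp @ 1 cells - @ ;           \ copy top item

\ Unlike >R and R>, the value survives across word boundaries:
: set-scale  ( n -- )   >ls ;                 \ one word deposits it
: scaled     ( n -- n2 ) ls@ * ;              \ another word reads it
: drop-scale ( -- )     ls> drop ;            \ a third one cleans up
```

For example, 3 set-scale 5 scaled leaves 15, and the 3 stays available to later words until drop-scale removes it. That persistence is what makes the mechanism useful across words, and it is also why a value left behind by mistake is easy to miss.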
minforth writes:
> Nobody seems to care about that time. Instead, the focus seems to be
> primarily on code runtime, even though the difference is only
> microseconds or less.
I think in the Moore era, you got two speedups: 1) interpreted Forth was
10x faster than its main competitor, interpreted BASIC; and 2) if your
Forth program was still too slow, you'd identify a few hot spots and
rewrite those in assembler.
Today instead of BASIC we have Python, and interpreted Forth is still a
lot faster than Python. That speed is sufficient for most things, like
it always was, but even more so on modern hardware.
On 01.07.2025 at 21:56, Paul Rubin wrote:
> I think in the Moore era, you got two speedups: 1) interpreted Forth was
> 10x faster than its main competitor, interpreted BASIC; and 2) if your
> Forth program was still too slow, you'd identify a few hot spots and
> rewrite those in assembler.
Today, you could go insane if you had to write assembler code
with SSE1/2/3/4/AVX/AES etc. extended CPU commands (or take GPU programming...)
Even chip manufacturers provide C libraries with built-ins and
intrinsics to handle this complexity, and optimising C compilers
for selecting the best operations.
IMO assembler programming in Forth is mostly for retro enthusiasts.
On 1/07/2025 10:22 pm, Hans Bezemer wrote:
> On 27-06-2025 03:39, dxf wrote:
>> Yet forthers have no problem with this. Take the SwiftForth source code.
>> At best you'll get a general comment as to what a function does. How do
>> they maintain it - the same way anyone proficient in C maintains C code.
>> Albert is correct. Familiarity is key to readability. That's not to say
>> code deserving documentation shouldn't have it. OTOH one shouldn't be
>> expecting documentation (including stack commentary) for what's an everyday
>> affair in Forth.
> I think you and Albert are on the right track here. Familiarity is a large
> part of this "readability" thingy. There are a few notes I want to add,
> though:
> 1. "Infix notation" is part of this familiarity. I know I've commented every
> single expression in TEONW, since I understand those "infix" expressions much
> better than all those RPN thingies - and you got something to check your code
> against;
> 2. Intentionality. I do this a LOT. E.g. if you find OVER OVER in my code,
> you may be certain those two items have nothing to do with each other. If you
> find 2DUP it's a string, a double number or another "addr/count" array. CHOP
> replaces 1 /STRING. Also: stack patterns can be codified like SPIN or STOW;
> 3. Brevity. Short definitions are easier to understand. If you can abstract
> it, put a name on it and can spare the performance - split it up.
> 4. Naming. I give this a LOT of thought. I prefer reading a name and having a
> pretty good idea of what that code does (especially in the context of a
> library or a program). See:
> https://sourceforge.net/p/forth-4th/wiki/What%27s%20in%20a%20name%3F/
> Feel free to disagree. It may not work for you, but at least it works for me.
Recently someone told me about Christianity - how it wasn't meant to be easy -
supposed to be, among other things, a denial of the senses. I'm hearing much the same in Forth. That it's a celibate practice in which one denies everyday
sensory pleasures including readability and maintainability in order to achieve
programming nirvana. Heck, if that's how folks see Forth then perhaps they should stop before the cognitive dissonance sends them crazy or they pop a cork.
On 01.07.2025 at 23:47, peter wrote:
> In lxf64 I have introduced a local stack with the same capabilities as
> the data and return stack. I am not sure yet if this is better.
> The nice thing is that I now have >ls ls> and ls@. Compared with the
> return stack this also works across words. One word can put stuff on
> the localstack and another retrieve it. This is sometimes very useful.
In a sense, such locals become global. I am not sure if this opens the
way inadvertently for hard-to-detect bugs.
Only one useful application comes to my mind: sharing locals between
a quotation and its parent function, i.e. for creating closures.
But who needs them anyway?
Forth was designed for threaded interpreter implementation and the whole
notion of an optimizing Forth compiler is at best an abstraction
inversion.
But, supposedly, VFX compiler output runs 10x as fast as
the same code under an interpreter.
So I don't see much legitimate complaint about slowdowns due to Forth
locals. The objection is based on other considerations, either
legitimate ones that I don't yet understand, or essentially bogus ones
that I don't completely see through.
I had a beef with Andrew Tanenbaum, stating that it is hard to write a C compiler for the 6502. In reality the 6502 is a brilliant
design. You must realize that the 6502 has 128 16-bit registers on the
zero page.
Hans Bezemer writes:
> 1. Adding general locals is trivial. It takes just one single line of
> Forth.
I don't see how to do it in one line, and trivial is a subjective term.
I'd say in any case that it's not too difficult, but one line seems
overoptimistic. Particularly, you need something like (LOCAL) in the
VM. The rest is just some extensions to the colon compiler. Your
mention of it taking 3-4 screens sounded within reason to me, and I
don't consider that to be a lot of code.
1. Adding general locals is trivial. It takes just one single line of
Forth. Sure, you don't got the badly designed and much too heavy
Forth-2012 implementation,
4tH v3.64.2 will even support a *MUCH* lighter, but
fully conformant Forth-2012 LOCALS implementation.
If anything, yours is a prime
example of a "sour grape argument".
On 03.07.2025 at 01:59, Paul Rubin wrote:
> Hans Bezemer writes:
>> 1. Adding general locals is trivial. It takes just one single line of
>> Forth.
> I don't see how to do it in one line, and trivial is a subjective term.
> [...]
I would not implement locals for simple integers only. Forth has enough
stack gymnastics words for that.
IMO locals only make sense if you can at least additionally handle
floats and dynamic strings, preferably also structs and arrays.
Such an implementation is certainly not trivial.
On 03-07-2025 01:59, Paul Rubin wrote:
> Your mention of it taking 3-4 screens sounded within reason to me, and I
> don't consider that to be a lot of code.
"Short" in my dictionary is. One. Single. Screen. No more. No less (pun
intended).
And this one is one single screen. Even with the dependencies.
https://youtu.be/FH4tWf9vPrA
Typical use:
variable a
variable b
: divide
local a
local b
b ! a ! a @ b @ / ;
Does recursion, the whole enchilada. One line.
Thanks to Fred Behringer - and Albert, who condensed it to a
single line definition. Praise is where praise is due.
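The definition of LOCAL itself is not shown in the post. As a hand-written sketch of what such a word presumably automates (an assumption: it saves a variable's current value and restores it when the definition exits, which is what would make recursion work), the same DIVIDE can be spelled out with the return stack. A and B are redeclared so the snippet stands alone; DIVIDE-SAVED is a made-up name:

```forth
\ Sketch under the assumption stated above; not the LOCAL from the video.
variable a
variable b

: divide-saved ( n1 n2 -- q )
   a @ >r  b @ >r          \ save whatever the caller had in A and B
   b ! a !                 \ bind the arguments
   a @ b @ /               \ the actual work
   r> b !  r> a ! ;        \ restore the caller's values on the way out
```

After 5 a ! 7 b ! the phrase 12 4 divide-saved leaves 3, and A and B still hold 5 and 7 afterwards. The bookkeeping on both ends of the definition is the part a one-line LOCAL would hide.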
Hans Bezemer
peter writes:
> The nice thing is that I now have >ls ls> and ls@. Compared with the
> return stack this also works across words. One word can put stuff on
> the localstack and another retrieve it. This is sometimes very useful.
As I remember, Flashforth also has a 3rd stack like that, without having
locals. It's called P so you have >P etc.
S S> S@ (apart from LOCAL stacks).
Hans Bezemer writes:
> And just before you're done you put your stuff on the stack and like a
> tiny assembly line it is transported to the next thing. This means that
> the function call overhead is MINIMAL - much less than C.
Oh, really? Wasn't it you who wrote
<nnd$34fd6cd6$25a88dac@ac6bb1addf3a4136>:
|if you want speed, do your program in C -O3 so to say. It'll blow
|any Forth out of the water.
And if we look at the results for fib (a benchmark that performs lots
of calls) in Figure 1 of
<https://www.complang.tuwien.ac.at/papers/ertl24-interpreter-speed.pdf>,
gcc -O3 outperforms the fastest Forth system, and gcc -O1 outperforms
the fastest Forth system by even more.
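For readers who have not seen it, a call-heavy fib of the kind used in such benchmarks looks roughly like this. It is a common formulation, not necessarily the exact source measured in the paper:

```forth
\ Doubly recursive Fibonacci: almost all of the run time is call overhead.
\ With this base case, fib(0) = fib(1) = 1.
: fib ( n -- f )
   dup 2 < if
      drop 1
   else
      dup 1- recurse  swap 2 - recurse  +
   then ;
```

Here 10 fib leaves 89. Because each call does only a comparison and an addition, the benchmark mostly measures how cheaply a system can call and return.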
> And that's not the solution - it's the PROBLEM. You can add loads of
> complexity without much (immediate) penalty. You're not compelled to
> study - or even *think* about your algorithm. You most probably will end
> up with code that works - without you understanding why.
> And that will either bite you later, or limit your capability to expend
> on that code.
Yes, you can expend a lot of effort on code that's hard to write and
hard to understand, but that's not limited to Forth.
If you mean that, by making code hard to write, Forth without locals
makes it easier to extend the code, I very much doubt it. In some
cases it may not be harder, but in others (where the extension
requires, e.g., dealing with additional data in existing colon
definitions) it is harder.
- anton
On 3/07/2025 10:51 pm, [email protected] wrote:
...
I like to remind of the youtube FORTH2020 of Wagner. This concerns
motions of aircraft, position speed, pitch roll and yaw etc.
Terribly complicated, no LOCAL's. There was a question whether LOCAL's
could have made Wagners code easier.
He stated the ideal (paraphrased by me) that "code is its own comment"
That was an interesting video even if more a rundown of his (long) history
as a professional forth programmer. Here's the link for anyone curious:
https://youtu.be/V9ES9UZHaag
He said he uses the hardware fp stack for speed. Is he really only
using 8 levels of stack?
Puzzling because of a thread here not long ago in which scientific users appear to suggest the opposite. Such concerns have apparently been around
a long time:
https://groups.google.com/g/comp.lang.forth/c/CApt6AiFkxo/m/wwZmc_Tr1PcJ
One interesting aspect is that VFX 5.x finally includes an FP package
by default, and it started by including an SSE2-based FP package which supports a deep FP stack. However, MPE received customer complaints
about the lower number of significant digits in SSE2 (binary64)
vs. 387 (80-bit FP values), so they switched the default to the
387-based FP package that only has 8 FP stack items. Apparently no
MPE customer complains about that limitation.
OTOH, iForth-5.1-mini uses the 387 instructions, but stores FP stack
items in memory at least on call boundaries. Maybe Marcel Hendrix can
give some insight into what made him take this additional
implementation effort.
- anton
I investigated the instruction set, and I found no way to detect
if the 8 registers stack is full.
This would offer the possibility to spill registers to memory only
if it is needed.
On 05.07.2025 at 14:21, [email protected] wrote:
I investigated the instruction set, and I found no way to detect
if the 8 registers stack is full.
This would offer the possibility to spill registers to memory only
if it is needed.
IIRC signaling and handling fp-stack overflow is not an easy task.
At most, the computer would crash.
IOW, spilling makes sense.
On 5/07/2025 6:49 pm, Anton Ertl wrote:
dxf <[email protected]> writes:
[8 stack items on the FP stack]
Puzzling because of a thread here not long ago in which scientific users appear to suggest the opposite. Such concerns have apparently been around a long time:
https://groups.google.com/g/comp.lang.forth/c/CApt6AiFkxo/m/wwZmc_Tr1PcJ
I have read through the thread. It's unclear to me which scientific
users you have in mind. My impression is that 8 stack items was
deemed sufficient by many, and preferable (on 387) for efficiency
reasons.
AFAICS both Skip Carter (proponent) and Julian Noble were suggesting the
6 level minimum was inadequate. A similar sentiment was expressed here
only several months ago. AFAIK all major forths supporting x87 hardware offer software stack options.
Certainly, of the two points this thread is about, there was a
Forth200x proposal for standardizing a separate FP stack, and this
proposal was accepted. There was no proposal for increasing the
minimum size of the FP stack; Forth-2012 still says:
|The size of a floating-point stack shall be at least 6 items.
Only because nothing further was heard. What became of the review
Elizabeth announced I've no idea.
One interesting aspect is that VFX 5.x finally includes an FP package
by default, and it started by including an SSE2-based FP package which
supports a deep FP stack. However, MPE received customer complaints
about the lower number of significant digits in SSE2 (binary64)
vs. 387 (80-bit FP values), so they switched the default to the
387-based FP package that only has 8 FP stack items. Apparently no
MPE customer complains about that limitation.
...
AFAIK x87 hardware stack was always MPE's main and best supported FP
package. As for SSE2 it wouldn't exist if industry didn't consider double-precision adequate. My impression of MPE's SSE2 implementation
is that it's 'a work in progress'. The basic precision is there but transcendentals appear to be limited to single-precision. That'd be
the reason I'd stick with MPE's x87 package. Other reason is it's now
quite difficult and error-prone to switch FP packages as it involves rebuilding the system. The old scheme was simpler and idiot-proof.
The old scheme was simpler and idiot-proof.
Skip Carter did not post in this thread, but given that he proposed
the change, he probably found 6 to be too few; or maybe it was just a phenomenon that we also see elsewhere as range anxiety. In any case,
he made no such proposal to Forth-200x, so apparently the need was not pressing.
In any case, in almost all cases I use the default FP pack, and here
the VFX-5 and SwiftForth-4 approach is unbeatable in simplicity.
Instead of performing the sequence of commands shown above, I just
start the Forth system, and FP words are ready.
- anton
dxf <[email protected]> writes:
As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate.
SSE2 is/was first and foremost a vectorizing extension, and it has been superseded quite a few times, indicating it was never all that
adequate. I don't know whether any of its successors support extended precision though.
You don't need 64-bit doubles for signal or image processing.
Most vector/matrix operations on streaming data don't require
them either. Whether SSE2 is adequate or not to handle such data
depends on the application.
"Industry" can manage well with 32-bit floats or even smaller with non-standard number formats.
I suspect IEEE simply standardized what had become common practice among implementers.
What little I know about SSE2 it's not as well thought out or organized
as Intel's original effort. E.g. doing something as simple as changing
sign of an fp number is a pain when NANs are factored in.
I don't do parallelization, but I was still surprised by the good
results using FMA. In other words, increasing floating-point number
size is not always the way to go.
Anyhow, first step is to select the best fp rounding method ....
minforth <[email protected]> writes:
You don't need 64-bit doubles for signal or image processing.
Most vector/matrix operations on streaming data don't require
them either. Whether SSE2 is adequate or not to handle such data
depends on the application.
Sure, and for that matter, AI inference uses 8 bit and even 4 bit
floating point.
Kahan on the other hand was interested in engineering
and scientific applications like PDE solvers (airfoils, fluid dynamics,
FEM, etc.). That's an area where roundoff error builds up after many iterations, thus extended precision.
dxf <[email protected]> writes:
As for SSE2 it wouldn't exist if industry didn't consider
double-precision adequate.
SSE2 is/was first and foremost a vectorizing extension, and it has been superseded quite a few times, indicating it was never all that
adequate.
I don't know whether any of its successors support extended
precision though.
W. Kahan was a big believer in extended precision (that's why the 8087
had it from the start). I believe IEEE specifies both 80 bit and 128
bit formats in addition to 64 bit.
I suspect IEEE simply standardized what had become common practice among implementers.
By using 80 bits /internally/ Intel went a long way to
achieving IEEE's spec for double precision.
E.g. doing something as simple as changing
sign of an fp number is a pain when NANs are factored in.
"Industry" can manage well with 32-bit
floats or even smaller with non-standard number formats.
The catch with SSE is there's nothing like FCHS or FABS
so depending on how one implements them, results vary across implementations.
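To make the point concrete, here are two plausible ways to build a negate when the hardware offers no FCHS, as a rough sketch in standard Forth. The names FNEGATE-BITS, FNEGATE-SUB and SCRATCH are made up for this example, and the bit-flipping version assumes a little-endian machine, 8-bit address units and the Floating-Ext words DFALIGN, DFLOATS, DF! and DF@. It is not meant to show how any particular Forth system implements FNEGATE.

```forth
\ Sketch only; all names are hypothetical, not taken from any Forth system.
\ Assumes: little-endian CPU, 8-bit address units, Floating-Ext word set.
DFALIGN HERE 1 DFLOATS ALLOT CONSTANT SCRATCH   \ 8-byte scratch area

: FNEGATE-BITS ( F: r -- -r )          \ flip the binary64 sign bit
  SCRATCH DF!
  SCRATCH 7 + DUP C@ 128 XOR SWAP C!   \ sign bit = top bit of the last byte
  SCRATCH DF@ ;

: FNEGATE-SUB ( F: r -- -r )           \ compute 0 - r
  0e FSWAP F- ;

1e FNEGATE-BITS F.    \ prints minus one
1e FNEGATE-SUB  F.    \ prints minus one as well
```

The two agree on ordinary numbers but part ways on special values. For r = +0, 0 - r is +0 under round-to-nearest, while the bit flip yields -0. The bit flip also toggles the sign bit of a NaN, which 0 - r is not required to do.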
On 10 Jul 2025 at 02:18:50 CEST, "minforth" <[email protected]> wrote:
"Industry" can manage well with 32-bit
floats or even smaller with non-standard number formats.
My customers beg to differ and some use 128 bit numbers for
their work. In a construction estimate for one runway for the
new Hong Kong airport, the cost difference between a 64 bit FP
calculation and the integer calculation was US 10 million dollars.
This was for pile capping which involves a large quantity of relatively
small differences.
I believe IEEE specifies both 80 bit and 128 bit formats in addition
to 64 bit.
Not 80-bit format. binary128 and binary256 are specified.
[email protected] (Anton Ertl) writes:
I believe IEEE specifies both 80 bit and 128 bit formats in addition
to 64 bit.
Not 80-bit format. binary128 and binary256 are specified.
I see, 80 bits is considered double-extended. "The x87 and Motorola
68881 80-bit formats meet the requirements of the IEEE 754-1985 double extended format,[12] as does the IEEE 754 128-bit binary format." (https://en.wikipedia.org/wiki/Extended_precision)
Interestingly, Kahan's 1997 report on IEEE 754's status does say 80 bit
is specified. But it sounds like that omits some nuance.
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
Kahan was also overly critical of dynamic Unum/Posit formats.
Time has shown that he was partially wrong: https://spectrum.ieee.org/floating-point-numbers-posits-processor
When someone begins with the line it rarely ends well:
"Twenty years ago anarchy threatened floating-point arithmetic."
One floating-point to rule them all.
minforth <[email protected]> writes:
Kahan was also overly critical of dynamic Unum/Posit formats.
Time has shown that he was partially wrong:
https://spectrum.ieee.org/floating-point-numbers-posits-processor
I don't feel qualified to draw a conclusion from this. I wonder what
the numerics community thinks, if there is any consensus. I remember
being dubious of posits when I first heard of them, though Kahan
probably influenced that. I do know that IEEE 754 took a lot of trouble
to avoid undesirable behaviours that never would have occurred to most
of us. No idea how well posits do at that. I guess though, given the continued attention they get, they must be more interesting than I had thought.
I saw one of the posit articles criticizing IEEE 754 because IEEE 754 addition is not always associative. But that is inherent in how
floating point arithmetic works, and I don't see how posit addition can
avoid it. Let a = 1e100, b = -1e100, and c=1. So mathematically,
a+b+c=1. You should get that from (a+b)+c in your favorite floating
point format. But a+(b+c) will almost certainly be 0, without very high precision (300+ bits).
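The example is easy to try. A minimal sketch, assuming a system with the Floating-Point word set whose floats are binary64 or x87 extended; the word names LEFT-FIRST and RIGHT-FIRST are made up here.

```forth
\ Sketch only; LEFT-FIRST and RIGHT-FIRST are hypothetical names.
: LEFT-FIRST  ( F: a b c -- r )  FROT FROT F+ F+ ;   \ (a+b)+c
: RIGHT-FIRST ( F: a b c -- r )  F+ F+ ;             \ a+(b+c)

1e100 -1e100 1e LEFT-FIRST  F.   \ one: a+b cancels exactly, then c is added
1e100 -1e100 1e RIGHT-FIRST F.   \ zero: b+c rounds back to b, then a+b cancels
```

Nothing here is specific to IEEE 754: any format with a fixed number of significand bits loses c when it is added to something 100 decimal orders of magnitude larger.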
dxf <[email protected]> writes:
When someone begins with the line it rarely ends well:
"Twenty years ago anarchy threatened floating-point arithmetic."
One floating-point to rule them all.
This gives a good perspective on posits:
https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinechin_etal_2019.pdf
But was it the case by the mid/late 70's - or certain individuals saw an opportunity to influence the burgeoning microprocessor market? Notions of single and double precision already existed in software floating point -
On 10.07.2025 at 21:33, Paul Rubin wrote:
Kahan was also overly critical of dynamic Unum/Posit formats.
Time has shown that he was partially wrong: https://spectrum.ieee.org/floating-point-numbers-posits-processor
I have looked at a (IIRC) slide deck by Kahan where he shows examples
where the alternative by Gustafson (don't remember which one he
looked at in that slide deck) fails and traditional FP numbers work.
I guess though, given the
continued attention they get, they must be more interesting than I had thought.
I saw one of the posit articles criticizing IEEE 754 because IEEE 754 addition is not always associative. But that is inherent in how
floating point arithmetic works, and I don't see how posit addition can
avoid it.
On 11/07/2025 1:17 pm, Paul Rubin wrote:
This gives a good perspective on posits:
https://people.eecs.berkeley.edu/~demmel/ma221_Fall20/Dinechin_etal_2019.pdf
Floating point arithmetic in the 1960s (before my time) was really in a
terrible state. Kahan has written about it. Apparently IBM 360
floating point arithmetic had to be redesigned after the fact, because
the original version had such weird anomalies.
But was it the case by the mid/late 70's - or certain individuals saw an opportunity to influence the burgeoning microprocessor market?
[email protected] (Anton Ertl) writes:
I have looked at a (IIRC) slide deck by Kahan where he shows examples
where the alternative by Gustafson (don't remember which one he
looked at in that slide deck) fails and traditional FP numbers work.
Maybe this: http://people.eecs.berkeley.edu/~wkahan/UnumSORN.pdf
On 11/07/2025 8:22 pm, Anton Ertl wrote:
The rest of the industry has standardized on binary64 and binary32,
and they prefer bit-equivalent results for ease of testing. So as
soon as SSE2 gave that to them, they flocked to SSE2.
...
I wonder how much of this is academic or trend inspired?
AFAICS Forth
clients haven't flocked to it else vendors would have SSE2 offerings at
the same level as their x387 packs.
dxf <[email protected]> writes:
On 13/07/2025 7:01 pm, Anton Ertl wrote:
...
For Forth, Inc. and MPE AFAIK their respective IA-32 Forth system
was the only one with hardware FP for many years, so there
probably was little pressure from users for bit-identical results
with, say, SPARC, because they did not have a Forth system that
ran on SPARC.
What do you mean by "bit-identical results"? Since SSE2 comes
without transcendentals (or basics such as FABS and FNEGATE) and implementers are expected to supply their own, if anything, I expect results across platforms and compilers to vary.
There are operations for which IEEE 754 specifies the result to the
last bit (except that AFAIK the representation of NaNs is not
specified exactly), among them F+ F- F* F/ FSQRT, probably also
FNEGATE and FABS. It does not specify the exact result for
transcendental functions, but if your implementation performs the same bit-exact operations for computing a transcendental function on two
IEEE 754 compliant platforms, the result will be bit-identical (if it
is a number). So just use the same implementations of transcendental functions, and your results will be bit-identical; concerning the
NaNs, if you find a difference, check if the involved values are NaNs.
- anton
[email protected] (Anton Ertl) writes:
So just use the same implementations of transcendental functions, and
your results will be bit-identical
Same implementations = same FP operations in the exact same order?
That
seems hard to ensure, if the functions are implemented in a language
that leaves anything up to a compiler.
Also, in the early implementations x87, 68881, NS320something(?), transcendentals were included in the coprocessor and the workings
weren't visible.
This looks very interesting. I can find Kahan and Neumaier, but
"tree addition" didn't turn up (There is a suspicious looking
reliability paper about the approach which surely is not what
you meant). Or is it pairwise addition what I should look for?
On 16.07.2025 at 13:25, Anton Ertl wrote:
I did not do any accuracy measurements, but I did performance
measurements
YMMV but "fast but wrong" would not be my goal. ;-)
minforth <[email protected]> writes:
On 16.07.2025 at 13:25, Anton Ertl wrote:
I did not do any accuracy measurements, but I did performance
measurements
YMMV but "fast but wrong" would not be my goal. ;-)
I did test correctness with cases where roundoff errors do not play a
role.
As mentioned, the RECursive balanced-tree sum (which is also the
fastest on several systems and absolutely) is expected to be more
accurate in those cases where roundoff errors do play a role. But if
you care about that, better design a test and test it yourself. It
will be interesting to see how you find out which result is more
accurate when they differ.
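For readers who have not met the technique, a recursive balanced-tree (pairwise) sum can be sketched as below. This is not the REC word that was benchmarked, only a minimal illustration; PAIRWISE-SUM, FLOAT, and NUMS are names made up here, and a system with the Floating-Point word set and a separate FP stack is assumed.

```forth
\ Sketch only; not the benchmarked REC code. All names are hypothetical.
: PAIRWISE-SUM ( f-addr u -- ) ( F: -- sum )
  DUP 0= IF  2DROP 0e EXIT  THEN     \ empty range sums to zero
  DUP 1 = IF  DROP F@ EXIT  THEN     \ single element
  DUP 2/ >R                          \ R: half = u/2
  OVER R@ RECURSE                    \ F: left = sum of the first half
  R@ -  SWAP R> FLOATS +  SWAP       \ f-addr' u' now describe the second half
  RECURSE                            \ F: left right
  F+ ;

: FLOAT, ( F: r -- )  HERE 1 FLOATS ALLOT F! ;   \ helper for building test data
FALIGN HERE  1e FLOAT, 2e FLOAT, 3e FLOAT, 4e FLOAT, 5e FLOAT,
CONSTANT NUMS

NUMS 5 PAIRWISE-SUM F.   \ fifteen
```

Each addition combines two partial sums of similar size, which is why the roundoff error is expected to grow more slowly than in a left-to-right loop. Note that one partial sum stays on the FP stack per recursion level, so this version needs about log2(u)+1 FP stack items.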
[email protected] (Anton Ertl) writes:
I did not do any accuracy measurements, but I did performance
measurements on a Ryzen 5800X:
cycles:u
gforth-fast iforth lxf SwiftForth VFX
3_057_979_501 6_482_017_334 6_087_130_593 6_021_777_424 6_034_560_441 NAI
6_601_284_920 6_452_716_125 7_001_806_497 6_606_674_147 6_713_703_069 UNR
3_787_327_724 2_949_273_264 1_641_710_689 7_437_654_901 1_298_257_315 REC
9_150_679_812 14_634_786_781 SR
instructions:u
gforth-fast iforth lxf SwiftForth VFX
13_113_842_702 6_264_132_870 9_011_308_923 11_011_828_048 8_072_637_768 NAI
6_802_702_884 2_553_418_501 4_238_099_417 11_277_658_203 3_244_590_981 UNR
9_370_432_755 4_489_562_792 4_955_679_285 12_283_918_226 3_915_367_813 REC
51_113_853_111 29_264_267_850 SR
- anton
Ryzen 9950X
lxf64
5,010,566,495 NAI cycles:u
2,011,359,782 UNR cycles:u
646,926,001 REC cycles:u
3,589,863,082 SR cycles:u
lxf64
7,019,247,519 NAI instructions:u
4,128,689,843 UNR instructions:u
4,643,499,656 REC instructions:u
25,019,182,759 SR instructions:u
gforth-fast 20250219
2,048,316,578 NAI cycles:u
7,157,520,448 UNR cycles:u
3,589,638,677 REC cycles:u
17,199,889,916 SR cycles:u
gforth-fast 20250219
13,107,999,739 NAI instructions:u
6,789,041,049 UNR instructions:u
9,348,969,966 REC instructions:u
50,108,032,223 SR instructions:u
lxf
6,005,617,374 NAI cycles:u
6,004,157,635 UNR cycles:u
1,303,627,835 REC cycles:u
9,187,422,499 SR cycles:u
lxf
9,010,888,196 NAI instructions:u
4,237,679,129 UNR instructions:u
4,955,258,040 REC instructions:u
26,018,680,499 SR instructions:u
lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack
Meanwhile many years ago, comparative tests were carried out with a
couple of representative archived serial data (~50k samples)
Ultimately, Kahan summation
was the winner. It is slow, but there were no in-the-loop
requirements, so for a background task, Kahan was fast enough.
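For reference, Kahan (compensated) summation can be sketched in Forth as follows. This is not the code used in those tests; KSUM, KCOMP, KAHAN+, KAHAN-SUM, FLOAT, and SAMPLES are names made up here, and the comments about results assume binary64 floats.

```forth
\ Sketch only; all names are hypothetical.
FVARIABLE KSUM    \ running sum
FVARIABLE KCOMP   \ running compensation: low-order bits lost so far

: KAHAN+ ( F: x -- )        \ add x to the compensated running sum
  KCOMP F@ F-               \ F: y          y = x - c
  FDUP KSUM F@ F+           \ F: y t        t = sum + y
  FDUP KSUM F@ F-           \ F: y t d      d = t - sum
  FROT F-                   \ F: t c'       c' = d - y
  KCOMP F!  KSUM F! ;

: KAHAN-SUM ( f-addr u -- ) ( F: -- sum )   \ sum u floats starting at f-addr
  0e KSUM F!  0e KCOMP F!
  0 ?DO  DUP F@ KAHAN+  FLOAT+  LOOP  DROP
  KSUM F@ ;

: FLOAT, ( F: r -- )  HERE 1 FLOATS ALLOT F! ;   \ helper for building test data
FALIGN HERE  1e FLOAT, 1e-16 FLOAT, 1e-16 FLOAT,  CONSTANT SAMPLES

SAMPLES 3 KAHAN-SUM F.   \ slightly above 1; a plain F+ loop gives exactly 1 in binary64
```

The cost is four floating-point operations per element instead of one, and every step depends on the previous one, which fits the remark that it is slow but fine for a background task.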
minforth <[email protected]> writes:
Meanwhile many years ago, comparative tests were carried out with a
couple of representative archived serial data (~50k samples)
Representative of what? Serial: what series?
peter <[email protected]> writes:
[lxf64, gforth-fast 20250219 and lxf results on a Ryzen 9950X]
lxf uses the x87 builtin fp stack, lxf64 uses sse4 and a large fp stack
Apparently the latency of ADDSD (SSE2) is down to 2 cycles on Zen5
(visible in lxf64 UNR and gforth-fast NAI) while the latency of FADD
(387) is still 6 cycles (lxf NAI and UNR). I have no explanation why
on lxf64 NAI performs so much worse than UNR, and in gforth-fast UNR
so much worse than NAI.
For REC the latency should not play a role. There lxf64 performs at
7.2IPC and 1.55 F+/cycle, whereas lxf performs only at 3.8IPC and 0.77 F+/cycle. My guess is that FADD can only be performed by one FPU, and
that's connected to one dispatch port, and other instructions also
need or are at least assigned to this dispatch port.
- anton
So in mandating bit-identical results, not only in calculations but also input/output
IEEE 754 is all about giving the illusion of truth in
floating-point when, if anything, they should be warning users don't be fooled.
AFAICS IEEE 754 offers nothing particularly useful for the end-user.
Either one's fp application works - or it doesn't. IEEE hasn't
changed that.
IEEE's relevance is that it spurred Intel into making an FPU which in
turn made implementing fp easy.
: get-number ( accum adr len -- accum' adr' len' )
  { adr len }
  0. adr len >number { adr' len' }
  len len' =
  if                        \ nothing converted: drop the ud, skip one char
    2drop adr len 1 /string
  else                      \ accum' = accum * 60 + converted number
    d>s swap 60 * +
    adr' len'
  then ;
: parse-time ( adr len -- seconds )
  0 -rot
  begin
    dup
  while
    get-number
  repeat
  2drop ;
s" foo-bar" parse-time .           \ 0
s" foo55bar" parse-time .          \ 55
s" foo 1 bar 55 zoo" parse-time .  \ 115
mhx wrote:
On Sun, 6 Oct 2024 7:51:31 +0000, dxf wrote:
Is there an easier way of doing this? End goal is a double number representing centi-secs.
empty decimal
: SPLIT ( a u c -- a2 u2 a3 u3 ) >r 2dup r> scan 2swap 2 pick - ;
: >INT ( adr len -- u ) 0 0 2swap >number 2drop drop ;
: /T ( a u -- $hour $min $sec )
2 0 do [char] : split 2swap dup if 1 /string then loop
2 0 do dup 0= if 2rot 2rot then loop ;
: .T 2swap 2rot cr >int . ." hr " >int . ." min " >int . ." sec " ;
s" 1:2:3" /t .t
s" 02:03" /t .t
s" 03" /t .t
s" 23:59:59" /t .t
s" 0:00:03" /t .t
Why don't you use the fact that >NUMBER returns the given
string starting with the first unconverted character?
SPLIT should be redundant.
-marcel
: CHAR-NUMERIC? ( c -- flag ) 48 58 WITHIN ;   \ true for '0'..'9'
: SKIP-NON-NUMERIC ( adr u -- adr2 u2 )
  BEGIN
    DUP IF OVER C@ CHAR-NUMERIC? 0= ELSE 0 THEN
  WHILE
    1 /STRING
  REPEAT ;
: SCAN-NEXT-NUMBER ( n adr len -- n2 adr2 len2 )
  2>R 60 * 0. 2R> >NUMBER    \ n2 = n * 60 + converted number
  2>R D>S + 2R> ;
: PARSE-TIME ( adr len -- seconds )
  0 -ROT
  BEGIN
    SKIP-NON-NUMERIC
    DUP
  WHILE
    SCAN-NEXT-NUMBER
  REPEAT
  2DROP ;
S" hello 1::36 world" PARSE-TIME CR .
96 ok