Hello Waldek, Stephen and others!
Not so long ago my compatriot professor Waldek Hebisch "has coded" the Forth version of the Heap Sort Algorithm.
http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs
Heap Sort is a beautiful algorithm which learning will give a lot to each of you! You will learn what "binary trees" are, when they are "complete", how to represent a binary tree as an ordinary array, why this array should be indexed from 1, and what
relates this binary index to moving through the tree. You will learn what "heap condition" a complete binary tree must satisfy in order to be a "heap". Finally, you will learn the surprising non-obvious thing, why you are creating a heap of arbitrary
data in O(n) pessimistic time, which is faster than you thought!
The Heap Sort Algorithm sorts in pessimistic O(n*log n) time which is better than QuickSort which does it in O(n^2). Admittedly Merge Sort has this time too, but Heap Sort sorts in place and Merge Sort needs extra second memory.
You can INCLUDE the above file after downloading it and after SET-SIZE to 2GB and reloading 64-bit VFX Forth or I will paste 2 words here:
: sift
( rra ra ir ii )
DUP 2 * 1+ ( rra ra ir ii jj )
BEGIN
>R OVER R@ SWAP R> OVER OVER ( rra ra ir ii jj ir jj ir jj )
> WHILE ( rra ra ir ii jj ir jj )
1+ > IF ( rra ra ir ii jj )
>R >R OVER R> SWAP R> SWAP OVER ( rra ra ir ii jj ra jj )
CELLS + DUP @ SWAP CELL+ @ < IF ( rra ra ir ii jj )
1 +
THEN
THEN ( rra ra ir ii jj )
>R >R >R OVER OVER R> ROT ROT R> ROT ROT R@
ROT ROT R> ( rra ra ir ii jj rra ra jj )
CELLS OVER + @ ROT ( rra ra ir ii jj ra rr_jj rra )
OVER < IF ( rra ra ir ii jj ra rr_jj )
>R >R OVER CELLS R> + R> SWAP !
SWAP DROP DUP DUP + 1+
ELSE
DROP DROP DROP ( rra ra ir ii )
OVER 1 +
THEN ( rra ra ir ii jj )
REPEAT
DROP DROP DROP SWAP DROP ( rra ra ii )
CELLS + !
;
: heapsort ( ra n )
0 OVER 1 - 2 / DO
OVER OVER OVER ( ra n ra n ra )
I CELLS + @ ROT ROT I ( ra n rra ra n I )
sift
-1 +LOOP
1 - 1 SWAP DO ( ra )
DUP DUP I CELLS + ( ra ra ra+I )
DUP @ ( ra ra ra+I rra )
SWAP ROT ( ra rra ra+I ra )
@ SWAP ! ( ra rra )
I 1 = IF
OVER !
ELSE
OVER I 0 ( rra ra I 0 )
sift
THEN
-1 +LOOP
;
As everybody can see the above code is ugly. Alas, this code is even an example of why not to use Forth. Admit, the code discourages you.
You probably want to see how it works. Let's create an array of 10 numbers and sort it:
create example 13 , 7 , 9 , 12 , 1 , 8 , 2 , 99 , 14 , 3 ,
: print 10 0 do example i cells + @ . loop ;
print
example 10 heapsort
print
Note 1: The word heapsort incorrectly leaves an address on the stack. There is no DROP before ; in heapsort.
There is even more junk on the stack after all the rama_taxis is done.
Question to Waldek: But basically your heapsort works so it could be used all over the world. The question of copyright arises. Do you give your heapsort code to the community for free and agree to any use of your heapsort, even without mentioning that
it is you who "has coded" it?
Question to all of you: Can you rewrite this algorithm so that it becomes Forth's pride and not Forth's insult?
Maybe you might want to use variables or locals or arrange human comments?
You don't have to stay in CORE, but even if you want to stay like Waldek, then VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear of using variables by Forthers comes from.
But when we use VFX Forth, the above code becomes beautiful.
It's because of Stephen who designed the VFX code generator.
Be sure to disassemble words sift and heapsort to see this beauty!
dasm sift
( 00977880 488BD3 ) MOV RDX, RBX
( 00977883 48D1E3 ) SHL RBX, # 1
( 00977886 48FFC3 ) INC RBX
( 00977889 488D6DF8 ) LEA RBP, [RBP+-08]
( 0097788D 48895500 ) MOV [RBP], RDX
( 00977891 90 ) NOP
( 00977892 90 ) NOP
( 00977893 90 ) NOP
( 00977894 90 ) NOP
( 00977895 90 ) NOP
( 00977896 90 ) NOP
( 00977897 90 ) NOP
( 00977898 53 ) PUSH RBX
( 00977899 488B1C24 ) MOV RBX, [RSP]
( 0097789D 5A ) POP RDX
( 0097789E 483B5508 ) CMP RDX, [RBP+08]
( 009778A2 488D6DF0 ) LEA RBP, [RBP+-10]
( 009778A6 488B4D18 ) MOV RCX, [RBP+18]
( 009778AA 48894D00 ) MOV [RBP], RCX
( 009778AE 48895D08 ) MOV [RBP+08], RBX
( 009778B2 488BDA ) MOV RBX, RDX
( 009778B5 0F8DC1000000 ) JNL/GE 0097797C
( 009778BB 48FFC3 ) INC RBX
( 009778BE 483B5D00 ) CMP RBX, [RBP]
( 009778C2 488B5D08 ) MOV RBX, [RBP+08]
( 009778C6 488D6D10 ) LEA RBP, [RBP+10]
( 009778CA 0F8D2A000000 ) JNL/GE 009778FA
( 009778D0 53 ) PUSH RBX
( 009778D1 48FF7500 ) PUSH QWORD [RBP]
( 009778D5 5B ) POP RBX
( 009778D6 5A ) POP RDX
( 009778D7 488BCA ) MOV RCX, RDX
( 009778DA 48C1E103 ) SHL RCX, # 03
( 009778DE 48034D10 ) ADD RCX, [RBP+10]
( 009778E2 488B01 ) MOV RAX, 0 [RCX]
( 009778E5 483B4108 ) CMP RAX, [RCX+08]
( 009778E9 48895D00 ) MOV [RBP], RBX
( 009778ED 488BDA ) MOV RBX, RDX
( 009778F0 0F8D04000000 ) JNL/GE 009778FA
( 009778F6 4883C301 ) ADD RBX, # 01
( 009778FA 53 ) PUSH RBX
( 009778FB 48FF7500 ) PUSH QWORD [RBP]
( 009778FF 48FF7508 ) PUSH QWORD [RBP+08]
( 00977903 5B ) POP RBX
( 00977904 5A ) POP RDX
( 00977905 488B0C24 ) MOV RCX, [RSP]
( 00977909 58 ) POP RAX
( 0097790A 48C1E003 ) SHL RAX, # 03
( 0097790E 48034510 ) ADD RAX, [RBP+10]
( 00977912 4C8B00 ) MOV R8, 0 [RAX]
( 00977915 4C3B4518 ) CMP R8, [RBP+18]
( 00977919 488D6DF0 ) LEA RBP, [RBP+-10]
( 0097791D 488B4520 ) MOV RAX, [RBP+20]
( 00977921 48894500 ) MOV [RBP], RAX
( 00977925 48894D08 ) MOV [RBP+08], RCX
( 00977929 48895510 ) MOV [RBP+10], RDX
( 0097792D 48895D18 ) MOV [RBP+18], RBX
( 00977931 498BD8 ) MOV RBX, R8
( 00977934 0F8E31000000 ) JLE/NG 0097796B
( 0097793A 53 ) PUSH RBX
( 0097793B 48FF7500 ) PUSH QWORD [RBP]
( 0097793F 488B5D10 ) MOV RBX, [RBP+10]
( 00977943 48C1E303 ) SHL RBX, # 03
( 00977947 5A ) POP RDX
( 00977948 4803DA ) ADD RBX, RDX
( 0097794B 5A ) POP RDX
( 0097794C 488913 ) MOV 0 [RBX], RDX
( 0097794F 488B5D08 ) MOV RBX, [RBP+08]
( 00977953 48035D08 ) ADD RBX, [RBP+08]
( 00977957 48FFC3 ) INC RBX
( 0097795A 488B5508 ) MOV RDX, [RBP+08]
( 0097795E 48895510 ) MOV [RBP+10], RDX
( 00977962 488D6D10 ) LEA RBP, [RBP+10]
( 00977966 E90C000000 ) JMP 00977977
( 0097796B 488B5D18 ) MOV RBX, [RBP+18]
( 0097796F 4883C301 ) ADD RBX, # 01
( 00977973 488D6D10 ) LEA RBP, [RBP+10]
( 00977977 E91CFFFFFF ) JMP 00977898
( 0097797C 488B5D10 ) MOV RBX, [RBP+10]
( 00977980 48C1E303 ) SHL RBX, # 03
( 00977984 48035D20 ) ADD RBX, [RBP+20]
( 00977988 488B5528 ) MOV RDX, [RBP+28]
( 0097798C 488913 ) MOV 0 [RBX], RDX
( 0097798F 488B5D30 ) MOV RBX, [RBP+30]
( 00977993 488D6D38 ) LEA RBP, [RBP+38]
( 00977997 C3 ) RET/NEXT
( 280 bytes, 86 instructions )
dasm heapsort
( 009779D0 488BD3 ) MOV RDX, RBX
( 009779D3 4883C3FF ) ADD RBX, # -01
( 009779D7 B902000000 ) MOV ECX, # 00000002
( 009779DC 488BC3 ) MOV RAX, RBX
( 009779DF 488BDA ) MOV RBX, RDX
( 009779E2 4899 ) CQO
( 009779E4 48F7F9 ) IDIV RCX
( 009779E7 488D6DF0 ) LEA RBP, [RBP+-10]
( 009779EB 48C7450000000000 ) MOV QWord [RBP], # 00000000
( 009779F3 48895D08 ) MOV [RBP+08], RBX
( 009779F7 488BD8 ) MOV RBX, RAX
( 009779FA E81195A9FF487A970000000 CALL 00410F10 (DO) 0000000000977A48
( 00977A07 90 ) NOP
( 00977A08 498BD6 ) MOV RDX, R14
( 00977A0B 48C1E203 ) SHL RDX, # 03
( 00977A0F 48035500 ) ADD RDX, [RBP]
( 00977A13 498BCE ) MOV RCX, R14
( 00977A16 488D6DE0 ) LEA RBP, [RBP+-20]
( 00977A1A 48895D00 ) MOV [RBP], RBX
( 00977A1E 488B4520 ) MOV RAX, [RBP+20]
( 00977A22 48894508 ) MOV [RBP+08], RAX
( 00977A26 488B02 ) MOV RAX, 0 [RDX]
( 00977A29 48894510 ) MOV [RBP+10], RAX
( 00977A2D 48895D18 ) MOV [RBP+18], RBX
( 00977A31 488BD9 ) MOV RBX, RCX
( 00977A34 E847FEFFFF ) CALL 00977880 SIFT
( 00977A39 4983C6FF ) ADD R14, # -01
( 00977A3D 4983C7FF ) ADD R15, # -01
( 00977A41 71C5 ) JNO 00977A08
( 00977A43 415E ) POP R14
( 00977A45 415F ) POP R15
( 00977A47 58 ) POP RAX
( 00977A48 4883C3FF ) ADD RBX, # -01
( 00977A4C 488D6DF8 ) LEA RBP, [RBP+-08]
( 00977A50 48C7450001000000 ) MOV QWord [RBP], # 00000001
( 00977A58 E8B394A9FFD77A970000000 CALL 00410F10 (DO) 0000000000977AD7
( 00977A65 90 ) NOP
( 00977A66 90 ) NOP
( 00977A67 90 ) NOP
( 00977A68 498BD6 ) MOV RDX, R14
( 00977A6B 48C1E203 ) SHL RDX, # 03
( 00977A6F 4803D3 ) ADD RDX, RBX
( 00977A72 488B0A ) MOV RCX, 0 [RDX]
( 00977A75 488B03 ) MOV RAX, 0 [RBX]
( 00977A78 488902 ) MOV 0 [RDX], RAX
( 00977A7B 498BD6 ) MOV RDX, R14
( 00977A7E 4883FA01 ) CMP RDX, # 01
( 00977A82 488D6DF8 ) LEA RBP, [RBP+-08]
( 00977A86 48895D00 ) MOV [RBP], RBX
( 00977A8A 488BD9 ) MOV RBX, RCX
( 00977A8D 0F8514000000 ) JNZ/NE 00977AA7
( 00977A93 488B5500 ) MOV RDX, [RBP]
( 00977A97 48891A ) MOV 0 [RDX], RBX
( 00977A9A 488B5D00 ) MOV RBX, [RBP]
( 00977A9E 488D6D08 ) LEA RBP, [RBP+08]
( 00977AA2 E921000000 ) JMP 00977AC8
( 00977AA7 498BD6 ) MOV RDX, R14
( 00977AAA 488D6DE8 ) LEA RBP, [RBP+-18]
( 00977AAE 48895500 ) MOV [RBP], RDX
( 00977AB2 488B5518 ) MOV RDX, [RBP+18]
( 00977AB6 48895508 ) MOV [RBP+08], RDX
( 00977ABA 48895D10 ) MOV [RBP+10], RBX
( 00977ABE BB00000000 ) MOV EBX, # 00000000
( 00977AC3 E8B8FDFFFF ) CALL 00977880 SIFT
( 00977AC8 4983C6FF ) ADD R14, # -01
( 00977ACC 4983C7FF ) ADD R15, # -01
( 00977AD0 7196 ) JNO 00977A68
( 00977AD2 415E ) POP R14
( 00977AD4 415F ) POP R15
( 00977AD6 58 ) POP RAX
( 00977AD7 C3 ) RET/NEXT
( 264 bytes, 71 instructions )
Everyone can see now how powerful Stephen's Optimizer in VFX Forth is.
Even ugly Forth code it turned into beautiful assembly code.
I will not show off my broken English to describe how beautifully VFX Forth shortened and rearranged everything.
For example, let's look at the inside of the loop in our PRINT word:
( 009778D8 498BD6 ) MOV RDX, R14
( 009778DB 488B14D530789700 ) MOV RDX, [+RDX*8+00977830]
( 009778E3 488D6DF8 ) LEA RBP, [RBP+-08]
( 009778E7 48895D00 ) MOV [RBP], RBX
( 009778EB 488BDA ) MOV RBX, RDX
( 009778EE E8555DAAFF ) CALL 0041D648 .
You can see that 5 words: example i cells + @ are coded into almost 1 assembly instruction!
Question to Stephen Pelc: After all, your wonderful optimization fairy tale can be carried on. Why not to use single instruction:
MOV RDX, [8*R14+00977830]
instead of two of yours:
MOV RDX, R14
MOV RDX, [8*RDX+00977830]
or even load the target registry rbx right away:
MOV RBX, [8*R14+00977830]
instead of your 3 instructions?
MOV RDX, R14
MOV RDX, [8*RDX+00977830]
MOV RBX, RDX
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)