On Thursday, March 18, 2021 at 4:28:08 PM UTC-7,
[email protected] wrote:
This is pretty vague but I've been mulling over this idea in my head for a while. There is a core standard for 6502 assembler source code - the mnemonics are invariant (I hope), and the syntax for hex ($0A), addressing modes, register names etc. might
also be completely standard and invariant from one assembler to another.
However, the standard halts there and each assembler designer from that point elaborates on the language in their own way, leading to the existence of multiple dialects with varying degrees of similarity. Pseudo-ops (EQU, ORG, ASC), rules for forming
labels, comments etc. are all areas of syntactic departure. Dialects include Merlin, Apple Pascal, Big Mac, ORCA for 6502 ... basically each assembler defines one more dialect of the language ... even different versions of one assembler might result in
more "dialects", although the differences there are likely minuscule.
I briefly started doing some of this in AppleWin's debugger but got side-tracked.
https://github.com/AppleWin/AppleWin/blob/master/source/Debugger/Debugger_Assembler.cpp#L302
There are _many_ small differences. Take for example a simple instruction:
STA $0002
Should this be?
* Absolute? 8D 02 00 STA $0002
* Zero-page? 85 02 STA $02
Also,
What does the assembler default to? Does it generate the 2-byte opcode or the 3-byte opcode?
How do you force the assembler to generate the _other_ opcode?
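To make the ambiguity concrete, here is a minimal Python sketch of the two encodings. The name encode_sta and the force_absolute flag are made up for illustration, and the default policy shown (prefer zero page when the address fits in one byte) is an assumption, not the documented behaviour of any particular assembler.

```python
STA_ZERO_PAGE = 0x85  # STA zp  (2 bytes total)
STA_ABSOLUTE = 0x8D   # STA abs (3 bytes total)


def encode_sta(address, force_absolute=False):
    """Encode STA for a 16-bit address.

    Assumed policy: use the zero-page form when the address fits in one
    byte, unless the caller explicitly forces the absolute form.
    """
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address out of 16-bit range")
    if address <= 0xFF and not force_absolute:
        return bytes([STA_ZERO_PAGE, address])
    # Absolute operands are stored little-endian: low byte first.
    return bytes([STA_ABSOLUTE, address & 0xFF, address >> 8])
```

A translator between dialects has to know both the default and the override syntax of each assembler, otherwise "STA $0002" can silently change size.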
Strings are another pain point. Due to Woz's unfortunate, unconventional use of high-bit ASCII there are times you want:
* low-bit ASCII,
* other times you want high-bit ASCII,
* times you want them mixed, and
* times you want to have raw hex (such as zero termination.)
For example, one of the things that makes ca65 garbage out of the box is that it wasn't designed by someone who actually _does_ Apple 2 6502 programming, so you have to jump through all sorts of macro garbage to solve practical problems. Merlin32 makes this trivial by using ' and " syntax:
48 65 6C 6C 6F ASC 'Hello' ; Using simple quote, the high bit is set to 0 (standard ASCII)
C8 E5 EC EC EF ASC "Hello" ; Using double quotes, the high bit is set to 1 (for Text Screen encoding)
41 8D 00 ASC 'A',8D,00
Partially (and shamelessly) copied from the Merlin32 help page:
https://brutaldeluxe.fr/products/crossdevtools/merlin/
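For anyone writing a translator, the two quote styles boil down to one bit per character. A rough Python sketch (the helper names asc and hexdump are mine, not part of any assembler):

```python
def asc(text, high_bit):
    """Encode text as ASCII with bit 7 cleared (low) or set (high)."""
    mask = 0x80 if high_bit else 0x00
    # encode("ascii") raises UnicodeEncodeError on non-ASCII input.
    return bytes(b | mask for b in text.encode("ascii"))


def hexdump(data):
    """Format bytes the way the listing above shows them."""
    return " ".join(f"{b:02X}" for b in data)


print(hexdump(asc("Hello", high_bit=False)))  # 48 65 6C 6C 6F
print(hexdump(asc("Hello", high_bit=True)))   # C8 E5 EC EC EF
```

Mixed strings and raw hex are then just concatenation, e.g. asc("A", False) + bytes([0x8D, 0x00]).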
Another common example is "pad to end of page". Every assembler has its own syntax for doing this. Again Merlin32 makes this trivial:
A0 A0 A0 ... DS \,$A0 ; Fill memory with 0xA0 values until the next memory page
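Whatever the syntax, the arithmetic underneath is the same. A small Python sketch, assuming 256-byte pages and assuming that nothing is emitted when the program counter already sits on a page boundary (check your assembler; the function name is made up):

```python
PAGE_SIZE = 0x100  # 6502 pages are 256 bytes


def pad_to_next_page(pc, fill=0xA0):
    """Return the fill bytes needed to advance pc to the next page start.

    Assumption: if pc is already page-aligned, no bytes are emitted.
    """
    count = (-pc) % PAGE_SIZE
    return bytes([fill]) * count
```

For example, with the program counter at $08FD this yields three $A0 bytes, so the next byte lands at $0900.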
If I come across an ASM listing in a magazine, the code or the article may or may not identify the target assembler for this particular unit of source code. As well, I may have a different assembler but not the one targeted by the author. So I'd wish
for a tool that could
a) take a source file and detect the assembler(s) it is targeted to, or compatible with; and
b) automatically translate from dialect A to dialect B
Given that there are so few assemblers you could just "brute force" it and see what assembles without errors.
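A rough Python sketch of that brute-force idea follows. The command lines are deliberately left to the caller, since I'm not going to guess each assembler's flags; "{src}" is a placeholder convention I made up. It also assumes each tool reports failure through a non-zero exit status, which may not hold for every assembler.

```python
import subprocess


def detect_dialects(source_path, commands, timeout=30):
    """Return the names of the assemblers that accept source_path.

    commands maps a name to an argv list supplied by the caller; any
    "{src}" inside an argument is replaced by source_path.
    Assumption: exit status 0 means "assembled without errors".
    """
    compatible = []
    for name, argv_template in sorted(commands.items()):
        argv = [arg.replace("{src}", source_path) for arg in argv_template]
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            continue  # tool missing or hung: treat as "not compatible"
        if result.returncode == 0:
            compatible.append(name)
    return compatible
```

Native Apple 2 assemblers would need to run inside an emulator, so this only covers the cross assemblers directly.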
I'm interested in this as a possible programming challenge. Currently I foresee much slogging through assembler manuals to document the syntax rules of each dialect. Does such a tool already exist in some form?
No. The closest I know of is 6502bench.
https://github.com/fadden/6502bench
Or, do you know of any techniques or information sources that would to some degree address this functional requirement?
You may want to think about this process in reverse:
Given a series of opcodes, generate the 7 different ways it would be written in 7 different assemblers. At this point you are basically doing a string compare.
Why? Because the opcodes are the "ground truth".
Opcodes = representation
Assembly = presentation
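Here is a toy Python sketch of that reverse approach. The two dialects, "dialect_a" and "dialect_b", are invented placeholders (one prints upper case, one lower case) and do not describe any real assembler; the real per-assembler rules would come from the manuals. Only the two STA opcodes from earlier are included.

```python
# Ground truth: opcode byte -> (mnemonic, addressing mode).
OPCODES = {0x85: ("STA", "zp"), 0x8D: ("STA", "abs")}

# Hypothetical presentation rules, one per (made-up) dialect.
RENDERERS = {
    "dialect_a": lambda mnemonic, operand: f"{mnemonic} {operand}",
    "dialect_b": lambda mnemonic, operand: f"{mnemonic.lower()} {operand.lower()}",
}


def operand_text(mode, operand_bytes):
    """Render the operand bytes as hex text for the given mode."""
    if mode == "zp":
        return f"${operand_bytes[0]:02X}"
    value = operand_bytes[0] | (operand_bytes[1] << 8)  # little-endian
    return f"${value:04X}"


def render_all(code):
    """Write one instruction the way each dialect would present it."""
    mnemonic, mode = OPCODES[code[0]]
    operand = operand_text(mode, code[1:])
    return {name: render(mnemonic, operand) for name, render in RENDERERS.items()}


def matching_dialects(source_line, code):
    """The string compare: which dialects would have written this line?"""
    wanted = source_line.strip()
    return sorted(n for n, text in render_all(code).items() if text == wanted)
```

For example, render_all(bytes([0x85, 0x02])) gives "STA $02" for dialect_a and "sta $02" for dialect_b.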
What I would do to start is:
1. Create a GitHub project
2. Create a directory that has:
a) an HDV with all the (popular?) native Apple 2 assemblers (Apple Pascal, Big Mac, DOS Tool Kit, Merlin, ORCA, etc.),
b) cross assemblers (Merlin32, sbasm3, ca65, etc.), and
c) a batch file that assembles the 7+ different assembly source files
3. Generate a complete 8x256 spreadsheet / text file matrix for every valid opcode, where opcodes run vertically and the assemblers run horizontally. (Each column is the input for one assembler.) I would *highly* recommend you have the assemblers in alphabetical order:
Opcode | Apple Pascal | Big Mac | CA65 | DOS Toolkit | Merlin | Merlin32 | ORCA | SBASM3 | Notes
00
:
85 02
:
8D 02 01
:
FF
4. Next you'll want to catalog radix input (What is the character prefix for a binary literal?)
5. Next you'll want to catalog all the pseudo-directives such as ORG, EQU
6. Next you'll want to catalog all the byte, word, address, storage layouts
7. Next you'll want to catalog string types
8. Next you'll want to catalog expressions
9. Next you'll want to catalog PC usage and utility functionality (pad to end of page, etc.)
10. Next you'll want to catalog local and global labels
11. Lastly you'll want to cover macros
I'm probably missing a few different types of syntax but that should be enough to get you started.
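To get step 3 off the ground, a short Python sketch can spit out the empty matrix as CSV. It lists all 256 first-byte values and leaves the cells blank for you to fill in (undefined opcodes can be flagged in the Notes column); the function name and the CSV layout are just my suggestion.

```python
import csv
import io

ASSEMBLERS = ["Apple Pascal", "Big Mac", "CA65", "DOS Toolkit",
              "Merlin", "Merlin32", "ORCA", "SBASM3"]


def matrix_skeleton(assemblers=ASSEMBLERS):
    """Return CSV text: one row per opcode byte, one column per assembler."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    columns = sorted(assemblers, key=str.lower)  # alphabetical, as advised
    writer.writerow(["Opcode", *columns, "Notes"])
    for opcode in range(256):
        # Empty cells for each assembler plus the Notes column.
        writer.writerow([f"{opcode:02X}"] + [""] * (len(columns) + 1))
    return out.getvalue()
```

Write the result to a file and open it in any spreadsheet; the same skeleton works for the radix, pseudo-op, and string catalogs if you swap the first column.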
Disclaimer: I can barely write two lines of ASM to save my life, although I've been intending to learn it for the last 40 years. This exercise would go a long way to scratching that itch!
Never too late to learn! Mark Lemmert learning 6502 assembly language and shipping his Nox Archaist game proved that all that you need is determination, time, and capacity.
Keep us posted on your progress!