FWIW, the cp2 (CiderPress2) test data includes a text file with (I think) every possible Applesoft token and various tricky cases: trailing spaces, embedded control characters, tokenswithoutspacesbetween, tokens with spaces in the middle, PRINT statements without a closing quote, "PRINT" replaced with "?", lower case, etc. The first half of it is executable; then I got lazy.
https://github.com/fadden/CiderPress2/blob/main/TestData/fileconv/ALL.TOKENS.mod.txt
For comparison, the tokenized form can be found in the disk images in the same directory.
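
To make a few of those tricky cases concrete, here is a rough Python sketch of a toy tokenizer. It is not the CiderPress2 code and not a faithful model of the Applesoft ROM parser. The names `KEYWORDS`, `match_keyword` and `toy_tokenize` are made up for illustration, the keyword list is a tiny subset, tokens are shown as symbolic markers like `<PRINT>` rather than real token bytes, and folding lower case to upper case is an assumption about what a converter might choose to do.

```python
KEYWORDS = ["PRINT", "GOTO", "REM"]  # hypothetical tiny subset, not the full table


def match_keyword(text, i, kw):
    """Return the index just past kw if it matches at text[i:], ignoring
    embedded spaces and letter case; return -1 if it does not match."""
    for ch in kw:
        while i < len(text) and text[i] == " ":
            i += 1
        if i >= len(text) or text[i].upper() != ch:
            return -1
        i += 1
    return i


def toy_tokenize(line):
    """Split one line of source text into symbolic tokens and literal pieces."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == " ":
            # Spaces outside strings and REM text are dropped.
            i += 1
            continue
        if ch == '"':
            # A string runs to the closing quote, or to end of line if the
            # closing quote is missing.
            end = line.find('"', i + 1)
            end = len(line) if end < 0 else end + 1
            out.append(line[i:end])
            i = end
            continue
        if ch == "?":
            # "?" is shorthand for PRINT.
            out.append("<PRINT>")
            i += 1
            continue
        for kw in KEYWORDS:
            j = match_keyword(line, i, kw)
            if j >= 0:
                out.append("<" + kw + ">")
                i = j
                if kw == "REM":
                    # Everything after REM is kept verbatim.
                    out.append(line[i:])
                    i = len(line)
                break
        else:
            # Not a keyword: keep the character (upper-cased, by assumption).
            out.append(ch.upper())
            i += 1
    return out


print(toy_tokenize('? "HI'))        # "?" shorthand plus unterminated string
print(toy_tokenize("printgoto10"))  # lower case, no spaces between tokens
print(toy_tokenize("G O TO 10"))    # spaces in the middle of a token
```

The real parser has more quirks than this (ambiguous keyword prefixes, DATA statements, control characters), which is what the test file is meant to exercise.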