=============================================================================
Tokenize - Turn ASCII Files into Tokenized BASIC                 Version 1.00

(c) Stephen Fryatt, 2014-2021                                 31 October 2021
=============================================================================


Licence
-------

  Tokenize is licensed under the EUPL, Version 1.2 only (the "Licence"); you
  may not use this work except in compliance with the Licence.

  You may obtain a copy of the Licence at
  http://joinup.ec.europa.eu/software/page/eupl

  Unless required by applicable law or agreed to in writing, software
  distributed under the Licence is distributed on an "AS IS" basis, WITHOUT
  WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

  See the Licence for the specific language governing permissions and
  limitations under the Licence.

  The source for Tokenize can be found on GitHub, at
  https://github.com/steve-fryatt/tokenize



Introduction
------------

  Tokenize is a tool for converting ASCII text files into tokenized BASIC (of
  the ARM BASIC V variety) at the command line.  It is written in C, and can
  be compiled to run on platforms other than RISC OS.

  In addition to tokenizing files, Tokenize can link multiple library files
  into a single output -- both by passing multiple source files on the
  command line and by processing (and removing) "LIBRARY" statements found in
  the source.

  Finally, Tokenize can perform the actions of the BASIC "CRUNCH" command on
  the tokenized output, substitute 'constant' variables with values passed in
  on the command-line, and convert textual "SYS" names into their numeric
  equivalents.

  Be aware that Tokenize is NOT a BASIC syntax checker.  While it will check
  some aspects of syntax that it requires to do its job, it will not verify
  that a program is completely syntactically correct and that it will load
  into the BASIC Interpreter and execute as intended.



Command Line Usage
------------------

  Tokenize is a command line tool, and should be placed in your library or
  on your path as appropriate.  Once installed, it can be used in its most
  basic form as

  "tokenize <source file 1> [<source file 2> ...] -out <output> [<options>]"

  where one or more source files are tokenized into a single output file.

  It is possible to add the -verbose switch, in which case Tokenize will
  generate more detailed information about what it is doing.

  For more information about the options available, use "tokenize -help".
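
  For example, the following illustrative invocation tokenizes and links
  two source files, converting SWI names and reporting progress as it goes
  (the filenames here are hypothetical):

    tokenize MainSrc LibSrc -out Example -link -swi -verbose

  The -link and -swi parameters are described in the sections which follow.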


  Source Files
  ------------

  The source files used by Tokenize are plain ASCII text: as saved from BASIC
  using the "TEXTSAVE" command, or by loading a BASIC file into a text editor
  and saving it as text.

  The same requirements in terms of spacing and layout are placed on the
  files as would be applied by BASIC's own tokenizer.  The only difference is
  that -- as with loading files with "TEXTLOAD" and editing BASIC in most
  text editors on RISC OS -- line numbers are optional.  Thus a file could
  read

     80 FOR I% = 1 TO 10
     90   PRINT "Hello World!"
    100 NEXT I%

  but it could equally read

    FOR I% = 1 TO 10
      PRINT "Hello World!"
    NEXT I%

  In the latter case, the line numbers will be applied automatically as
  described in the following section.

  Keyword abbreviations are fully supported, using the rules defined in the
  BASIC Reference Manual.  Thus "PRINT" and "P." would both convert into the
  same token, while "OR" and "OR." would not.  For clarity, the use of
  abbreviations in source code is discouraged.


  Line Numbers
  ------------

  All BASIC programs must have line numbers, but if they are not included in
  the supplied source files then Tokenize will add them automatically.

  A line number is considered to be any run of digits at the start of a
  line; a line without a number must therefore start with a non-numeric
  character.  The whitespace
  around line numbers is optional; if a number is present, the space before
  it will be ignored for the purposes of indenting what follows.  Numbers can
  fall in the range from 0 to 65279 inclusive, which is the range allowed by
  BASIC.

  If line numbers are present in the source files, then they will be used in
  the output.  If they are not present, then Tokenize will number the lines
  it creates automatically: starting at the value given to the -start
  parameter and incrementing by the value given to the -increment parameter
  -- these both default to 10 if not specified.

  If line numbers are given, they must be sequential: duplicate numbers are
  not allowed, and lines can not be numbered backwards.  It is possible to
  mix numbered and un-numbered lines, in which case Tokenize will switch
  between the two numbering methods: numbers must still be sequential, and if
  the numbered lines break the automatic sequence an error will be given.
  Thus

       FOR i% = 1 TO 10
    15   PRINT "Hello World!"
       NEXT i%

  would be valid using the default value for -start and -increment and would
  result in lines 10, 15 and 25 being generated.  However

       FOR i% = 1 TO 10
     5   PRINT "Hello World!"
       NEXT i%

  would not be valid, and would give the error "Line number 5 out of sequence
  at line 2 of 'example file'" (because this would be trying to create the
  sequence 10, 5 and 15 using the default settings of -start and -increment).
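
  The numbering rules above can be sketched in Python.  This is an
  illustrative model only, not Tokenize's actual implementation (which is
  written in C); the helper name is hypothetical.

```python
def assign_line_numbers(lines, start=10, increment=10):
    """Model of the line numbering rules described above.

    'lines' holds one entry per source line: an explicit line number,
    or None for a line to be numbered automatically.
    """
    numbered = []
    previous = None
    for position, number in enumerate(lines, 1):
        if number is None:
            # Automatic numbering: -start for the first line, then the
            # previous number plus -increment.
            number = start if previous is None else previous + increment
        elif previous is not None and number <= previous:
            raise ValueError("Line number %d out of sequence at line %d"
                             % (number, position))
        if not 0 <= number <= 65279:
            raise ValueError("Line number %d out of range" % number)
        numbered.append(number)
        previous = number
    return numbered
```

  With the default settings, the first example above yields lines 10, 15
  and 25, while the second raises the out-of-sequence error for line 5.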


  Linking and Libraries
  ---------------------

  Used with a single source file, Tokenize will take the file and convert it
  into a single tokenized BASIC program:

  "tokenize <source file> -out <output file>"

  If more than one source file is specified, then the files will be
  concatenated in the order that they are included in the parameters:

  "tokenize <source 1> <source 2> <source 3> -out <output file>"

  would result in the files being linked in the order

    REM File 1
    REM File 2
    REM File 3

  In addition to files explicitly given on the command line, Tokenize can
  link in files referenced via "LIBRARY" commands in the source files
  themselves if the -link parameter is used.  Thus if the statement
  "LIBRARY "MoreCode"" was included in a source file, the file MoreCode would
  be added to the list of files to be linked in and the statement would be
  removed from the tokenized output.  Such files will be added to the queue
  for linking after any given on the command line, and will be processed in
  the order that they are found in the source.

  "LIBRARY" commands must appear alone in a statement in order to be
  considered for linking.  In other words, these two examples would be linked

    LIBRARY "WimpLib"
    PROCfoo : LIBRARY "BarLib" : REM This is OK

  but on the other hand this example

    IF foo% THEN LIBRARY "BarLib"

  would not, because the "LIBRARY" statement can not be processed in
  isolation from the rest of the code.  If a "LIBRARY" command is skipped in
  this way, a warning will be given.

  All "LIBRARY" commands will be considered for linking, including any found
  within linked source files.  The filenames given to "LIBRARY" must be
  string constants: if a variable is encountered (eg. "LIBRARY variable$")
  then the statement will be left in the tokenized output and a warning will
  be given.  Filenames found in "LIBRARY" commands are assumed to be in
  'local' format, including being treated as having relative location if
  applicable.

  When used on RISC OS, filenames are passed straight to the filing system:
  any path variables must be set correctly before use -- using the
  conventional "*Set" command -- to point to ASCII versions of the specified
  library files.  The -path parameter described below is NOT AVAILABLE on
  RISC OS, and will result in an error if used.

  When used on other platforms, it is possible to specify 'path variables' to
  Tokenize so that statements such as "LIBRARY "BASIC:Library"" can be used.
  On RISC OS, such a filename would resolve to the file Library somewhere on
  "<BASIC$Path>".  Tokenize allows paths to be specified using the -path
  parameter: "-path BASIC:libs/" would mean that "BASIC:" would expand to
  "libs/" and therefore result in "LIBRARY "libs/Library"" for the example
  here.  As with RISC OS path variables, paths must end with a directory
  separator in the local format.  More than one -path parameter can be
  specified on the command line if required.
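
  The path expansion described above can be modelled in Python as follows;
  this is a sketch of the behaviour, not Tokenize's own code, and the
  function name is hypothetical.

```python
def expand_library_path(filename, paths):
    """Expand a leading 'path variable' prefix, as set up by the
    -path parameter on non-RISC OS platforms.

    'paths' maps prefixes such as "BASIC:" on to local directories
    such as "libs/".
    """
    for prefix, directory in paths.items():
        if filename.startswith(prefix):
            return directory + filename[len(prefix):]
    return filename
```

  A filename with no matching prefix is passed through unchanged, in the
  same way that ordinary relative filenames are.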


  Constant Variables
  ------------------

  When tokenizing a file, Tokenize can replace specific variables with
  constant values given via the -define parameter on the command line.  Any
  instances of the variable found in the code will be replaced by the
  constant value, unless they are being assigned: in which case the statement
  containing the assignment will be removed completely.

  A -define parameter is followed by a variable assignment in the form
  "<name>=<value>".  To assign an integer variable "int%" the constant value
  10, for example, use "-define int%=10"; to give the string variable "name$"
  the value "Hello World", use "-define "name$=Hello World"".  Note that
  string values are not surrounded by double quotation marks, but if the
  value contains spaces then the whole parameter value must be enclosed as
  here.
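
  The shape of a -define argument can be modelled with a small Python
  sketch (the helper name is hypothetical, and Tokenize's own parsing may
  differ in detail):

```python
def parse_define(argument):
    """Split a -define argument of the form <name>=<value>."""
    name, separator, value = argument.partition("=")
    if not separator or not name:
        raise ValueError("Bad -define argument: " + argument)
    return name, value
```

  Note how the value keeps any spaces that it contains: quoting the whole
  parameter, as described above, is a job for the shell.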

  Note that support for constants is currently limited: all references to the
  variable will be replaced, including those defined as "LOCAL".  In general,
  Tokenize will correctly identify assembler mnemonics within assembler
  blocks, but there may be some sequences which are not recognised and get
  treated as variables by mistake.  It is advisable to check the generated
  code carefully when substitutions are being carried out.

  Remember that there are some locations in BASIC where variables can be used
  but constant values can not.  An example is dyadic indirection, where

    PRINT a%!0

  is valid and will print the integer value stored at the location pointed to
  by a%, while

    PRINT &8200!0

  is unlikely to have the expected effect (printing 33280 followed by the
  integer value stored at &0).  Tokenize will not validate your choice of
  substitutions for you.

  Within these limitations, however, it is possible to use constants to
  insert details such as build dates into code.  For example, given the code

    REM >Build Date
    :
    VERSION$ = "1.23"
    BUILD_DATE$ = "12 Mar 2014"
    :
    PRINT "Example Program"
    PRINT VERSION$;" (";BUILD_DATE$;")"

  then passing the definitions "-define VERSION$=0.12 -define "BUILD_DATE$=01
  Apr 2014"" would result in the output

    REM >Build Date
    :
    :
    PRINT "Example Program"
    PRINT "0.12";" (";"01 Apr 2014";")"

  being produced.


  SWI Number Conversion
  ---------------------

  When tokenising a BASIC program, Tokenize can convert any SWI names used in
  "SYS" commands into numeric constants -- removing the need for the BASIC
  interpreter to look them up when the program is run.  To perform this
  operation, use the -swi parameter on the command line.

  When running on RISC OS, Tokenize will look the SWI names up with the help
  of the operating system: as with any other BASIC cruncher, the modules used
  to provide any extension SWIs must be loaded when the tokenisation happens,
  or the names will be left in place (with a warning).

  When running on other platforms, the option of looking up SWI names via the
  OS is not available.  Instead, Tokenize can read the details from suitable
  C header files: the swis.h supplied with GCC and Acorn C is a good option.
  While it can not parse the files in the same way that a C preprocessor
  would, Tokenize will look for any "#define" lines which appear to be
  followed by a valid SWI name and number combination; for example

    #define OS_WriteC    0x000000
    #define OS_WriteS    0x000001
    #define OS_Write0    0x000002

  The X versions of SWIs are inferred from their 'non-X' forms, and vice
  versa: the three "#define" statements above would implicitly add

    #define XOS_WriteC   0x020000
    #define XOS_WriteS   0x020001
    #define XOS_Write0   0x020002

  at the same time.  Were the X versions encountered first, the 'non-X'
  versions would instead be added implicitly.
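
  The inference of X and non-X SWI pairs comes down to bit 17 (&20000) of
  the SWI number.  The following Python sketch models the behaviour
  described above; the helper name and table layout are illustrative only.

```python
X_BIT = 0x20000    # set in the SWI number of every 'X' SWI

def add_swi(table, name, number):
    """Record a SWI name and infer its X or non-X counterpart."""
    table[name] = number
    if name.startswith("X") and number & X_BIT:
        # An X SWI implies the corresponding non-X SWI.
        table[name[1:]] = number & ~X_BIT
    else:
        # A non-X SWI implies the corresponding X SWI.
        table["X" + name] = number | X_BIT
```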

  Files containing SWI names should be passed to Tokenize using the -swis
  parameter.  When used alongside the GCCSDK, it may be possible to access
  the header via the GCCSDK environment variables -- for example:

    -swis $GCCSDK_INSTALL_CROSSBIN/../arm-unknown-riscos/include/swis.h


  Tabs, Indentation and Crunching
  -------------------------------

  Tokenize can adjust the indentation and spacing of the BASIC source when
  generating tokenized output.

  By default, any whitespace before a leading line number is ignored.
  Whitespace following the line number is retained, and used in the output.
  Thus both

    REM >Example
    REM
    :
    FOR i% = 1 TO 10
      PRINT "Hello World!"
    NEXT i%

  and

     10REM >Example
     20REM
     30:
    100FOR i% = 1 TO 10
    110  PRINT "Hello World!"
    120NEXT i%

  would result in the same indentation.  If a space were left between
  the line number and the rest of the statement in the second example, then
  this would be copied to the resulting file.

  If tabs are used anywhere outside of string constants, then by default
  these will be expanded into spaces so as to align with the next tab stop if
  all the preceding keywords are fully expanded.  By default tab stops are 8
  columns apart, but this can be changed using the -tab parameter: for
  example "tokenize -tab 4" to use four column tabs.  It is recommended to
  configure Tokenize to use the same tab width as set in the text editor used
  to edit the source code.
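
  Tab expansion to the next tab stop can be sketched as follows.  This is
  a simplification: it ignores string constants and the keyword-expansion
  subtlety described above, and the function name is hypothetical.

```python
def expand_tabs(line, tab=8):
    """Expand tab characters to the next multiple-of-'tab' column."""
    expanded = ""
    for character in line:
        if character == "\t":
            # Pad with spaces up to the next tab stop.
            expanded += " " * (tab - len(expanded) % tab)
        else:
            expanded += character
    return expanded
```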

  If -tab is used with a value of zero, then tabs will not be expanded and
  will instead be passed intact to the output file.  Due to the requirement
  to expand keywords, using tabs can have unexpected effects if abbreviated
  keywords are used in the source.

  The -crunch parameter can be used to make Tokenize reduce whitespace within
  the tokenized BASIC file.  In use it operates very much like BASIC's
  "CRUNCH" command.  It takes a series of letters after it, which indicate
  what crunching to apply -- in some instances, these are case sensitive.

  E

    Setting E will cause empty statements to be removed from the file, along
    with any empty lines (whether already there or created by removing empty
    statements).

  I

    The I option will cause all start-of-line indentation to be removed (tabs
    and spaces) so that all lines start in the first column.

  L

    Setting L will cause completely empty lines to be removed from the file;
    any whitespace will cause them to be retained.  This gives compatibility
    with the behaviour of TEXTLOAD.  The E parameter includes the behaviour
    of L.

  R

    The r and R options allow comments to be stripped from the source code;
    if used in conjunction with E then any lines which end up being empty
    will get removed.  An upper case R will strip all comments; a lower case
    r will only strip comments after the first contiguous block of lines
    containing only "REM" statements at the head of the first file.  In other
    words

      REM >Example
      REM
      REM This is the end of the head comment.

      REM This line will be removed.

      PRINT "Hello World!"

    and even

      REM >Example
      REM
      REM This is the end of the head comment.
      PRINT "Hello World!" : REM This comment will be removed.
      REM This line will be removed.

    would both result in

      REM >Example
      REM
      REM This is the end of the head comment.
      PRINT "Hello World!"

    being output to the tokenized BASIC file given a setting of "-crunch er".

  T

    Setting T will cause trailing spaces to be stripped from lines, and
    lines containing only whitespace to be reduced to a single space.  This gives
    compatibility with the behaviour of TEXTLOAD.  The E parameter includes
    the behaviour of T.

  W

    The w and W options allow whitespace to be removed from within lines.
    Using the lower case w results in all blocks of contiguous whitespace
    (tabs and spaces) being reduced to a single space, while the upper case W
    will cause it to be removed completely.
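
  The effect of the w and W options on a single line can be sketched with
  a regular expression; note that this simplified model ignores string
  constants, which Tokenize leaves untouched.

```python
import re

def crunch_spaces(line, keep_one):
    """Collapse runs of spaces and tabs: keep_one=True models the w
    option (reduce to a single space), keep_one=False models W
    (remove completely)."""
    return re.sub(r"[ \t]+", " " if keep_one else "", line)
```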



Errors and Warnings
-------------------

  There are a number of error and warning messages that Tokenize can generate
  while running.  These can indicate problems with the source BASIC code, or
  with the availability of files on disc, and are detailed here.

  Using the -warn flag will turn on additional warnings relating to the
  structure of the source code.  In addition, adding the -verbose flag to the
  command line will cause Tokenize to produce additional information about
  what it is doing during linking and tokenizing.


  Errors
  ------

  Errors are generated during the linking and tokenization process when a
  problem occurs which can not be resolved.  The process will be halted, and
  must be re-started once the error has been fixed.

  AUTO line number too large

    Automatic line numbering applied via the -start and -increment parameters
    has risen above the maximum allowed value of 65279.

  Constant variable <variable> already defined

    Variables can only be assigned as constants on the command line once.  If
    a variable is listed more than once, this error is given.

  Failed to open source file '<file>'

    A source file -- either specified on the command line or via a linked
    "LIBRARY" statement -- could not be opened for processing.  This could be
    because it did not have the correct permissions, or because it did not
    exist in the location specified.  Remember that on some platforms,
    filenames will be case-sensitive -- references that work on RISC OS's
    case-insensitive Filecore systems might fail on other platform's
    case-sensitive filesystems.

  Line number <n> out of range

    A line number explicitly specified in a source file is too large or too
    small.  BASIC can only handle numbers between 0 and 65279.

  Line too long

    A line can only be 251 bytes long once tokenized.

  Misformed deleted statement

    If Tokenize needs to remove a statement -- such as a linked in "LIBRARY"
    or a "REM" that is to be crunched -- then it must be able to determine that
    the statement terminates cleanly.  If it finds additional text where it
    is expecting to find a colon or line ending, then an error would be
    raised.  For example

      LIBRARY "LibFile" ELSE

    would cause this error.  Note that code which would raise this specific
    error -- as opposed to warnings relating to the processing of "LIBRARY"
    commands -- would almost certainly raise a Syntax Error from BASIC
    itself, anyway.


  Warnings
  --------

  Warnings are generated during linking and tokenization when an event occurs
  which the user should be aware of but which may well not prevent the
  tokenized program from working.

  Constant variable assignment to <variable> removed

    A variable defined on the command line has been found as the target of an
    assignment, resulting in the entire statement being removed.

  Line number <n> out of sequence

    A line number explicitly specified in a source file is out of sequence,
    and would be less than or equal to the number of the line before.  This
    could be due to a clear error such as

      20 REM This will give an error.
      10 PRINT "Hello World!"

    or it could be more subtle.  In this example

      10 REM >Example
         REM
         REM This will give an error.
      20 PRINT "Hello World!"

    the default -increment of 10 will result in the two intermediate lines
    being given numbers of 20 and 30.  By the time the "PRINT" statement is
    reached, the line 20 would need to be 41 or greater.

  SYS <name> not found on lookup

    If the -swi option is in force, Tokenize failed to find a match for a
    textual SWI name <name> and therefore could not convert it into numeric
    form.  This could be due to an error in the source file, or it could be
    because the name does not appear in the lookup table used by Tokenize.
    On RISC OS this could be as a result of the module providing the SWI not
    being loaded; if SWI definitions have been supplied via the -swis option
    (on all platforms), then it means that the SWI is not defined in these.

  Unisolated LIBRARY not linked

    If the -link option is in force then in order to be able to remove
    "LIBRARY" statements after linking their associated files, Tokenize needs
    to know that they are self-contained.  If "LIBRARY" does not appear at
    the start of a statement (such as if it appears in an "IF ... THEN"
    construct) then it will not be linked.  Note that this will not catch
    "LIBRARY" as part of a multi-line "IF ... THEN ... ENDIF" -- be aware
    that

      IF condition% = TRUE THEN
        LIBRARY "LibCode"
      ENDIF

    would be linked, but might have some unexpected consequences.

  Unterminated string

    String constants must be enclosed by double quotes ("..."), and a pair of
    double quotes ("") together in a string is treated as a single double
    quote.  Tokenize will raise a warning if it reaches the end of a line in
    a source file whilst thinking that it is still inside a string.
    Generally unterminated strings should be avoided, but in some
    circumstances it is possible to write (bad) code which BASIC will accept
    despite them being present -- for this reason, their presence does not
    raise an error.
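
    The check described here can be sketched in Python: scan the line,
    toggling a flag at each quote, and treat "" inside a string as a
    literal quote.  The function name is illustrative only.

```python
def ends_inside_string(line):
    """Return True if the line ends part-way through a string."""
    in_string = False
    index = 0
    while index < len(line):
        if line[index] == '"':
            if (in_string and index + 1 < len(line)
                    and line[index + 1] == '"'):
                index += 1    # "" is an escaped quote: stay in the string
            else:
                in_string = not in_string
        index += 1
    return in_string
```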

  Variable LIBRARY not linked

    A "LIBRARY" statement with a variable following it was encountered while
    the -link option was in force. "LIBRARY" statements can only be linked if
    they are followed by a constant string ("LIBRARY "LibraryFile""); when
    followed by a variable ("LIBRARY lib_info$"), Tokenize can not determine
    the name of the file and will therefore leave the statement in-situ.


  Optional Warnings
  -----------------

  By using the -warn parameter on the command line, Tokenize will emit
  additional warnings relating to the structure of the source code.  At
  present it can check the use of variables, and the presence of function and
  procedure definitions in the code.  The parameter is followed by a series
  of letters, which specify the warnings to be enabled -- these may be
  case-sensitive.

  P

    If -warn p or -warn P are used, Tokenize will check the function and
    procedure definitions within the source files.  A lower case p will
    report on missing or duplicate definitions; an upper case P will also
    report on any unused definitions.

  V

    If -warn v or -warn V are used, Tokenize will check variable and array
    definitions.

    For variables, a record is kept of each time a variable is assigned to
    and each time it is read.  If a variable is found to have no assignments,
    then a warning will be raised.  If an upper case V is used, a warning
    will also be given for any variable which is assigned but never read.

    Note that a lack of warnings will not guarantee correct execution of the
    code.  Tokenize makes no effort to check the order in which reads and
    assignments are carried out, or to watch for conditional execution.  As a
    result, the following piece of code would fail to execute due to both
    variables in the "PRINT" statement being undefined, despite generating no
    warnings.

      IF FALSE THEN foo% = 19
      PRINT foo%, bar%
      bar% = 21

    For arrays, Tokenize does not track assignments but instead records the
    number of uses inside and outside of "DIM" statements.  A warning will be
    given if an array is used without any "DIM" statement; if an upper case V
    is used, a warning will also be given for any array that is dimensioned
    but never used.

    As with variables, no attempt is made to check the order of execution:
    the following would generate no warnings:

      array%(0) = 1
      DIM array%(10)

    Multiple instances of an array being dimensioned are also not reported,
    as this is valid in the context of "LOCAL" arrays.

  The warning messages which can be reported are as follows:

  No definition found for <name>

    This warning, which will be generated if -warn p or -warn P is included
    on the command line, indicates that the source contains "FN" or "PROC"
    calls for which "DEF" statements have not been seen.  These warnings are
    likely to be spurious unless libraries are being linked by Tokenize.

  <name> defined more than once

    Also generated if -warn p or -warn P is passed on the command line, this
    warning indicates that two or more "DEF" statements have been found for
    the same "FN" or "PROC" name.

  <name> is defined but not used

    This warning is only generated if -warn P is included on the command
    line, and indicates that a "DEF" statement has been found for an "FN" or
    "PROC" which is never called from within the source code.

  Variable <name> referenced but not assigned

    This warning is generated if -warn v or -warn V is passed on the command
    line, and indicates that a variable has been found whose value is never
    assigned.

  Variable <name> assigned but not referenced

    This warning is only generated if -warn V is passed on the command line,
    and indicates that a variable has been found whose value is assigned but
    apparently never referenced.

  Array <name> used but not defined

    This warning is generated if -warn v or -warn V is passed on the command
    line, and indicates that an array has been found which never features in
    a "DIM" statement.

  Array <name> defined but not used

    This warning is only generated if -warn V is passed on the command line,
    and indicates that an array has been found in a "DIM" statement which
    never appears anywhere else in the code.



Compatibility Issues
--------------------

  Since there's no definitive syntax for BBC BASIC, the performance of
  Tokenize has been tested against BASIC's "TEXTLOAD" command throughout
  development.  The test base currently comprises over 3,300 files, of which
  at present fewer than 100 differ in their tokenised output when compared to
  the BASIC from RISC OS 5.

  The following list details known areas that Tokenize differs from BASIC, or
  which may cause confusion.  If you find any others, please get in touch
  (details at the end of this file) -- short examples in ASCII format which
  exhibit the problem, plus the version of RISC OS and BASIC that were used,
  are very helpful in identifying issues.  Please be aware that even
  different versions of BASIC tokenise some pieces of source code in
  different ways.


  Exponential numeric constants
  -----------------------------

  BASIC's tokeniser does not recognise the exponential part of numeric
  constants of the form "1E6", whereas Tokenize does.  Both BASIC and
  Tokenize would treat the following statement in the same way

    IF A < 1E6 THEN GOTO 20

  However, if the spaces are removed to give

    IFA<1E6THENGOTO20

  then while Tokenize will still generate the expected code, BASIC would run
  the "E6" into "THEN" and prevent its tokenisation.

  While the behaviour of Tokenize differs from that of BASIC, it seems that
  it is actually more correct and so has been left in.


  Numeric constants for line numbers
  ----------------------------------

  BASIC treats numeric constants in its source differently depending on
  whether they are considered to be line numbers (such as the "100" in
  "GOTO 100") or just more general numeric expressions (such as the "42" in
  "L%=42").  While this distinction is generally handled correctly, the work
  done validating Tokenize has shown up a couple of confusing issues.

  BASIC prior to RISC OS 5 (including, presumably, RISC OS 6) can incorrectly
  treat numeric constants as line numbers when they follow "TRACE" in its
  function form.  Thus affected versions of BASIC would parse

    BPUT#TRACE, 2

  and treat the "2" as a line number despite it being a numeric parameter to
  "BPUT".

  A fix was implemented for this in RISC OS 5, but it inadvertently prevented
  the recognition of other -- valid -- line number constants.  One occurrence
  was conditional statements with the "THEN" omitted:

    IF A% = 0 GOTO 100

  would fail to spot the "100" as being a line number.  There is a
  possibility that it also affected constants following "RESTORE" and
  "ON GOTO"-like constructs.  A fix for this was implemented in RISC OS 5 in
  June 2014, since when RISC OS 5 BASIC and Tokenize appear to generate
  identical code.


  The QUIT keyword
  ----------------

  The "QUIT" keyword has two standard forms: a statement with no parameters,
  and a function.  As such, it can be used to start a variable name: the code

    QUITTING% = TRUE

  would set the variable "QUITTING%" to be "TRUE".  In RISC OS 5, however, a
  third form has been added: a statement taking a single parameter (a value to
  pass back to the caller as a return code).  This means that on RISC OS 5,
  this line would tokenise as the keyword "QUIT" followed by the variable
  "TING%" and then a nonsensical "= TRUE".

  Tokenize takes the same approach as RISC OS 5, such that "QUIT" can not be
  used to start variable names.


  Control characters in string constants
  --------------------------------------

  In BASIC, it's valid to include non-printing control characters in string
  constants if you can find a way to insert them.  Tokenize will honour any
  such characters that it finds within string constants, but -- just as with
  BASIC's "TEXTLOAD" -- the presence of a newline in a string constant will
  be seen as the end of the line.  This will result in an unterminated string
  warning, and the very likely failure to parse the following line of the
  source.

  In general, the inclusion of raw non-printing characters in string
  constants is probably unwise: use can be made of "CHR$()" to avoid this
  problem.  Similarly, the use of 'top-bit' characters when manipulating
  BASIC source on other platforms may cause confusion unless care is taken
  with character sets.  Again, the use of "CHR$()" can be helpful: such as

    PRINT CHR$(169);" John Smith"

  to include a copyright declaration, for example.


  Line numbers
  ------------

  BASIC's interpreter (as opposed to its tokeniser) is surprisingly relaxed
  about the use of line numbers if no reference is made to them (ie. if there
  are no "GOTO" or similar commands in the program).  Duplicate line numbers
  are possible, as are non-sequential numbers (such as lines appearing, and
  being executed, in an order like 10, 30, 20, 40) -- examples have even been
  seen where all the lines are numbered zero.  The thing that all such BASIC
  programs share is the fact that they have been generated by some software
  other than BASIC itself (such as a BASIC linker, cruncher or automated
  source management tool).

  Unfortunately, this relaxed approach to line numbering does not extend to
  re-tokenising code which has been saved out with such numbers as text.
  BASIC's "TEXTLOAD" and Tokenize differ in their handling of such source
  files, but both will generate tokenised files which do not match the
  original code.  In such cases, the only options are to renumber the source
  text files by hand, or to remove the numbers completely.


  Crunched code
  -------------

  Although not an issue specific to Tokenize, it's worth re-iterating that
  crunching BASIC can be a one-way operation.  While BASIC's "CRUNCH" command
  and Tokenize's -crunch parameter take care to ensure that the code they
  generate is still valid in ASCII form, most of the third-party crunchers
  are not concerned with the need to edit the compacted code.

  Converting such code to text via "TEXTSAVE", a text editor or similar, then
  passing it to either "TEXTLOAD" or Tokenize will often end in failure.  The
  lack of spaces can easily result in keywords being mistaken for variables
  and vice versa -- sometimes this will give a "line too long" error from the
  tokeniser; alternatively it may not show up until the affected parts of the
  program get executed by the interpreter.



Version History
---------------

  Here is a list of the versions of Tokenize, along with all the changes
  made.


  1.00 (31 October 2021)
  ----------------------

  Initial release build with version number.



Updates and Contacting Me
-------------------------

  If you have any comments about Tokenize, or would like to report any bugs
  that you find, you can email me at the address below.

  Updates to Tokenize and more software for RISC OS computers can be found
  on my website at http://www.stevefryatt.org.uk/risc-os/risc-os/

  Stephen Fryatt
  email: info@stevefryatt.org.uk
