TkScript reference guide | Scanner
1. Introduction
The scanner splits source files into tokens. This is done as a preprocessing step before the resulting token array is handed to the parser.
A token is a single reserved keyword, operator, identifier or character sequence.
A pool of unique Strings (identifiers and string constants) is built during the scanner pass.
The scanner keeps track of line numbers so that tokens (and later parser tree nodes) can be mapped back to source lines and modules.
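The scanner pass described above can be illustrated with a minimal Python sketch. This is not TkScript's actual implementation: the `Token` class and `scan` function are hypothetical names, tokens are simply split on whitespace, and for brevity every token text is pooled (the real scanner pools only identifiers and string constants).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str  # the token's character sequence
    line: int  # 1-based source line, kept so errors can be mapped back


def scan(source: str):
    """Split source into whitespace-separated tokens and pool unique strings."""
    tokens = []
    pool = {}  # text -> pool index; each distinct string is stored once
    for line_no, line in enumerate(source.splitlines(), start=1):
        for word in line.split():
            pool.setdefault(word, len(pool))
            tokens.append(Token(word, line_no))
    return tokens, list(pool)


tokens, pool = scan("int a = 1 ;\nint b = a ;")
print(tokens[8])  # Token(text='a', line=2)
print(pool)       # ['int', 'a', '=', '1', ';', 'b']
```

Although `int`, `a`, `=` and `;` each occur twice in the input, the pool holds them only once, while every token still remembers the line it came from.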
2. Charset
Script sources must use the 8bit / ASCII format.
2.1. Escape sequences
Sequence | Description |
\\ | backslash |
\' | single quotation mark |
\" | double quotation mark |
\n | linefeed |
\r | carriage return |
\t | tabulator |
\v | vertical tabulator |
\f | form feed |
\b | backspace |
\a | alert (beep/flash screen) |
\e | ESC character (decimal 27, octal 033). |
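As an illustration of the table above, the following Python sketch decodes the listed escape sequences. The `ESCAPES` table and `decode_escapes` function are hypothetical names, and keeping unknown sequences verbatim is an assumption; the guide does not say how the TkScript scanner treats them.

```python
# Maps the character after the backslash to the character it stands for.
ESCAPES = {
    "\\": "\\", "'": "'", '"': '"',
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
    "f": "\f", "b": "\b", "a": "\a",
    "e": "\x1b",  # ESC, decimal 27
}


def decode_escapes(raw: str) -> str:
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in ESCAPES:
            out.append(ESCAPES[raw[i + 1]])
            i += 2
        else:
            out.append(ch)  # assumption: unknown sequences are kept as-is
            i += 1
    return "".join(out)


print(repr(decode_escapes(r"a\tb")))  # 'a\tb'
print(ord(decode_escapes(r"\e")))     # 27
```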
2.1.1. Escaping embedded substrings
Strings can contain "quoted" substrings. Substrings should be surrounded by single or double quotation marks. Example:
String words1 = "\'hello\' \', \' \'world\'";
String words2 = '\"hello\" \", \" \"world\"';
trace "words1 = " + #(words1.splitSpace(true)); // true=scan for substrings
trace "words2 = " + #(words2.splitSpace(true));
2.1.2. Linefeed character in print/trace statements
The linefeed character '\n' at the end of a String printed using the print or trace statements can be omitted; it will be added automatically.
Example:
print "hello, world."; // print statement will add linefeed automatically
You can use stdout to print the string as-is (i.e. without any linefeed added). Example:
stdout "hello, world.\n"; // print string as-is
3. Identifiers, keywords, literals and constants
Identifiers are used to define unique names for variables, classes, functions and constants. Identifiers are case-sensitive, which means that e.g. MyVariable and myvariable are distinct identifiers.
The first char of an identifier has to be a letter [a-zA-Z] or the underscore _; identifiers must not contain delimiters and operators; reserved keywords may also not be used as identifiers. The length should not exceed 128 characters.
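The identifier rules above can be expressed as a small check. The Python sketch below uses hypothetical names, includes only an excerpt of the reserved keyword list, treats the 128-character length as a hard limit, and assumes that characters after the first may be letters, digits or underscores (the guide only specifies the first character).

```python
import re

# Small excerpt of the reserved keyword list, for illustration only.
RESERVED = {"class", "int", "float", "if", "else", "while", "return"}

# Assumption: characters after the first may be letters, digits or underscores.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_LEN = 128


def is_valid_identifier(name: str) -> bool:
    return (
        IDENT_RE.fullmatch(name) is not None
        and name not in RESERVED
        and len(name) <= MAX_LEN
    )


for candidate in ("MyVariable", "myvariable", "_tmp", "2fast", "a+b", "class"):
    print(candidate, is_valid_identifier(candidate))
```

The first three candidates pass; `2fast` starts with a digit, `a+b` contains an operator, and `class` is reserved.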
4. Delimiter chars
The source scanner uses the following delimiter chars when tokenizing a source script:
= > < == <= >= != , ! && || ++ -- + - * / & | ^ % << >> += -= *= /=
&= |= ^= %= <<= >>= ( ) { } #[ [ ] ;
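Several delimiters are prefixes of longer ones (for example < and << and <<=), so a tokenizer has to decide which one to pick. The Python sketch below assumes a longest-match rule, which is a common choice but is not confirmed by this guide; `split_delimiters` is a hypothetical helper.

```python
DELIMITERS = """
= > < == <= >= != , ! && || ++ -- + - * / & | ^ % << >> += -= *= /=
&= |= ^= %= <<= >>= ( ) { } #[ [ ] ;
""".split()

# Try longer delimiters first so that "<<=" wins over "<<" and "<".
BY_LENGTH = sorted(DELIMITERS, key=len, reverse=True)


def split_delimiters(text: str):
    tokens, word, i = [], "", 0
    while i < len(text):
        match = next((d for d in BY_LENGTH if text.startswith(d, i)), None)
        if match or text[i].isspace():
            if word:  # a delimiter or whitespace ends the current word
                tokens.append(word)
                word = ""
            if match:
                tokens.append(match)
                i += len(match)
            else:
                i += 1
        else:
            word += text[i]
            i += 1
    if word:
        tokens.append(word)
    return tokens


print(split_delimiters("a<<=b+1;"))  # ['a', '<<=', 'b', '+', '1', ';']
```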
5. Comments
5.1. Line comments
print "hello, world."; // This is a line comment
5.2. Block comments
print /* This is a block comment */ "hello, world.";
Note: Block comments should not be nested; nesting might confuse the syntax highlighters of certain text editors.
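To show how the two comment styles behave, here is a hedged Python sketch of a comment stripper. It is not the TkScript scanner: `strip_comments` is a hypothetical function that leaves string and character literals untouched and ends a block comment at the first `*/`, i.e. it does not support nesting.

```python
def strip_comments(src: str) -> str:
    out, i, n = [], 0, len(src)
    while i < n:
        ch = src[i]
        if ch in "\"'":  # string/char literal: copy verbatim
            j = i + 1
            while j < n and src[j] != ch:
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            out.append(src[i:j + 1])
            i = j + 1
        elif src.startswith("//", i):  # line comment: drop up to the newline
            j = src.find("\n", i)
            i = n if j == -1 else j
        elif src.startswith("/*", i):  # block comment: the first */ ends it
            j = src.find("*/", i + 2)
            i = n if j == -1 else j + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(strip_comments('print /* This is a block comment */ "hello, world.";'))
print(strip_comments("/* outer /* inner */ still here */"))
```

The second call leaves ` still here */` behind, which shows why a non-nesting scanner and nested block comments do not mix.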
6. Reserved keywords
The following keywords may not be used as identifier names:
- boolean
- break
- byte
- case
- catch
- char
- clamp
- class
- compile
- constraint
- default
- #define
- define
- deref
- delegate
- dtrace
- do
- else
- enum
- exception
- explain
- extends
- false
- finally
- float
- for
- function
- if
- instanceof
- int
- local
- loop
- method
- module
- namespace
- null
- Object
- prepare
- print
- private
- protected
- return
- returns
- short
- static
- String
- switch
- tag
- this
- throw
- trace
- true
- try
- use
- var
- void
- while
- wrap
7. Number formats and literals
The following forms of representation can be used to write constant values:
7.1. Decimal integer
7.2. 32bit floating point number
float f = 10.25;
float f2 = 10.25f;
7.3. Hexadecimal integer
int i = $fedcba98; // ASM-style hex literal
int htmlColor = #fedcba98; // ARGB32/HTML-style color literal
int chex = 0x900df00d; // C-style hex literal
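The three hexadecimal notations differ only in their prefix. The Python sketch below (hypothetical helper names) strips the prefix and parses the digits; it returns an unsigned Python integer and does not model how TkScript stores the value in a 32bit int. The A, R, G, B byte order in `split_argb` is an assumption taken from the "ARGB32" name.

```python
def parse_hex_literal(text: str) -> int:
    """Parse $..., #... or 0x... into an unsigned integer."""
    for prefix in ("0x", "$", "#"):
        if text.startswith(prefix):
            return int(text[len(prefix):], 16)
    raise ValueError(f"not a hex literal: {text!r}")


def split_argb(value: int):
    """Assumed layout: alpha in the top byte, then red, green, blue."""
    return (
        (value >> 24) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 8) & 0xFF,
        value & 0xFF,
    )


color = parse_hex_literal("#fedcba98")
print(hex(color))         # 0xfedcba98
print(split_argb(color))  # (254, 220, 186, 152)
```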
7.4. Binary integer
7.5. A single ASCII character
char c = '!';
char c2 = '\n'; // linefeed ASCII character
7.6. A sequence of one or more ASCII characters
String s = "a \'string\'\n";
7.7. An ANSI escape sequence (clear screen)
This example clears the screen if run within an ANSI compatible terminal emulator (e.g. Linux terminals):
print "\e[2J";
7.8. Common constant literals
7.8.1. true
This literal represents the integer/boolean value 1 (true).
Example:
boolean bPrintHello = true;
if(bPrintHello)
{
print "hello, world.";
}
7.8.2. false
This literal represents the integer/boolean value 0 (false).
Example:
boolean bDontPrintHello = false;
if(!bDontPrintHello)
{
print "hello, world.";
}
7.8.3. maybe
This literal represents the integer/tristate value -1 (maybe).
Example:
boolean bLastChoice = true;
boolean bPrintHello = maybe;
if(maybe == bPrintHello)
{
bPrintHello = bLastChoice;
}
if(bPrintHello)
{
print "hello, world.";
}
maybe is used quite rarely but can be useful, e.g. when dealing with UI preference dialogs that need to mark certain elements as unchanged.
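The tristate logic of the example above can be modelled in a few lines of Python. The constants mirror the values documented for true, false and maybe; the `resolve` function is a hypothetical helper, not part of TkScript.

```python
TRUE, FALSE, MAYBE = 1, 0, -1


def resolve(choice: int, last_choice: int) -> int:
    """Fall back to the previous choice when the setting was left unchanged."""
    return last_choice if choice == MAYBE else choice


print(resolve(MAYBE, TRUE))  # 1
print(resolve(FALSE, TRUE))  # 0
```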
7.8.4. The "null" literal
null represents a pointer to no Object.
// Object references are deleted by (pointer-)assigning 'null'
String s <= new String;
s <= null; // Deletes "s" since the variable "owns" the pointer
If a pointer variable is initialized with null during its declaration, no initial object will be allocated:
String s <= null; // Do not allocate initial String
7.9. String-number conversions at runtime
The number format conversion is also performed at runtime when numbers are assigned to strings or vice versa.
Example:
int i = String(42);
The number parser is also useful to initialize number objects (see Number objects):
Double d <= Double.News("3.1415926535897932384626433832795");
8. User defined constants
8.1. Module constants
The #define and define keywords are used to define a constant value in a source module.
Note that TkScript has no sophisticated preprocessor, so the constant value must be a single token.
See also class constants, which allow complex initialization expressions.
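The single-token restriction can be illustrated with a Python sketch that works on an already tokenized stream. This is an assumption about the general mechanism, not a description of TkScript's scanner, and `apply_defines` is a hypothetical function: it takes exactly one token after the constant name as the value and substitutes it wherever the name appears later.

```python
def apply_defines(tokens):
    """Expand '#define NAME VALUE', where VALUE is exactly one token."""
    defines, out, i = {}, [], 0
    while i < len(tokens):
        if tokens[i] == "#define":
            if i + 2 >= len(tokens):
                raise ValueError("#define needs a name and a value token")
            defines[tokens[i + 1]] = tokens[i + 2]
            i += 3
        else:
            out.append(defines.get(tokens[i], tokens[i]))
            i += 1
    return out


print(apply_defines(["#define", "NUMLOOPS", "42", "loop", "(", "NUMLOOPS", ")"]))
# ['loop', '(', '42', ')']
print(apply_defines(["#define", "N", "1", "+", "2", "print", "N"]))
# ['+', '2', 'print', '1']
```

In the second call only `1` becomes the value of `N`; the trailing `+ 2` is left in the token stream, which is why an expression cannot be used as a module constant under this model.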
8.1.1. Int constant example
#define NUMLOOPS 42
loop(NUMLOOPS) { /* ... */ }
8.1.2. String constant example
#define AUTHOR "Bastian Spiegel "
trace "This software was developed by " + AUTHOR ;
8.1.3. Enumeration example
enum { RED, GREEN, BLUE }; // => RED==0, GREEN==1, BLUE==2
enum { RED, GREEN=4, BLUE }; // => RED==0, GREEN==4, BLUE==5
Also see The define and enum statements.
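The numbering shown in the enumeration example (count up from 0, continue from the last explicit value) can be sketched in Python as follows. `assign_enum_values` is a hypothetical helper that reproduces the two commented results from the example; it is not TkScript code.

```python
def assign_enum_values(members):
    """members: list of (name, explicit_value_or_None) in declaration order."""
    values, next_value = {}, 0
    for name, explicit in members:
        if explicit is not None:
            next_value = explicit  # an explicit value restarts the counter
        values[name] = next_value
        next_value += 1
    return values


print(assign_enum_values([("RED", None), ("GREEN", None), ("BLUE", None)]))
# {'RED': 0, 'GREEN': 1, 'BLUE': 2}
print(assign_enum_values([("RED", None), ("GREEN", 4), ("BLUE", None)]))
# {'RED': 0, 'GREEN': 4, 'BLUE': 5}
```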
8.2. Class constants
TkScript allows constants to be declared in the scope of a script class.
In contrast to module constants, class constants are strongly typed and can use complex initialization expressions.
The reason for this is that the constants are actually initialized in the first parser pass, not the scanner :).
8.2.1. A simple class constants example
class CConst {
define int A = 1 + 3*4 - 2*3;
define float B = PI * 0.5f;
define String C = "hello, world.";
}
print CConst.A;
print CConst.B;
print CConst.C;
Also see Constants.
8.3. Plugin constants
Native C++ plugin classes can export constants via the yac plugin interface.
For example, take a look at the tkopengl plugin, which exports a lot of constants (see DisplayList and Texture).
9. System constants
The following is a list of pre-defined system constants:
1PI | - 1 divided by PI |
1SQRT2 | - 1 divided by sqrt(2) |
2PI | - PI multiplied by 2 (6.2..) |
BIG_ENDIAN | - big endian byte order, msb first |
default | - 0 or 0.0 or false |
E | - the math constant E |
false | - 0 |
IOS_IN | - input/output stream mode |
IOS_INOUT | - input/output stream mode |
IOS_OUT | - input/output stream mode |
LITTLE_ENDIAN | - little endian byte order, lsb first |
LN10 | - the math constant ln(10) |
LOG10 | - the math constant log(10) |
maybe | - -1 |
null | - NULL object pointer |
PI | - the math constant PI |
PI2 | - PI divided by 2 (1.5..) |
RAND_MAX | - maximum value returned by sirnd ASM opcode |
SEEK_BEG | - stream seek mode |
SEEK_CUR | - stream seek mode |
SEEK_END | - stream seek mode |
SQRT2 | - the math constant sqrt(2) |
true | - 1 |
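For orientation, the derived math constants in the table can be recomputed from their descriptions. The Python sketch below covers only the entries whose description is unambiguous (LOG10 is left out because the table does not state the base); the dictionary name is hypothetical and the values are Python doubles, not TkScript's own constants.

```python
import math

DERIVED_CONSTANTS = {
    "PI": math.pi,
    "2PI": 2 * math.pi,          # PI multiplied by 2
    "PI2": math.pi / 2,          # PI divided by 2
    "1PI": 1 / math.pi,          # 1 divided by PI
    "SQRT2": math.sqrt(2),
    "1SQRT2": 1 / math.sqrt(2),  # 1 divided by sqrt(2)
    "E": math.e,
    "LN10": math.log(10),        # natural logarithm of 10
}

for name, value in DERIVED_CONSTANTS.items():
    print(f"{name:7s} {value:.6f}")
```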
auto-generated by "DOG", the TkScript document generator. Wed, 31/Dec/2008 15:53:35