1 The Syntax of ALS Prolog
- 1.1 Constants
- 1.2 Variables
- 1.3 Compound Terms
- 1.4 Curly Braces
- 1.5 Lists
- 1.6 Strings
- 1.7 Operators
- 1.8 Comments
- 1.9 Preprocessor Directives: Syntax
This chapter describes the syntax of ALS Prolog, which is for the most part the syntax of the ISO Prolog standard. Prolog syntax is quite simple and regular, which is a great strength.
1.1 Constants
The simplest Prolog data type is a constant, which comes in two flavors:
- atoms (sometimes called symbols)
- numbers
The notion of a constant corresponds roughly to the notion of a name in a natural language. Names in natural languages refer to things (which covers a lot of ground), and constants in Prolog are used to refer to things when the language is interpreted.
1.1.1 Numbers
Prolog uses two representations for numbers:
- integer
- floating point
When it is impossible to use an integer representation due to the size of a nominal integer, a floating point representation can be used instead. This means that extremely large integers may actually require the extended precision of a floating point value. Any operation involving integers, such as a call to is/2, will first attempt to use an integer representation for the result, and will use a floating point value only when necessary.
This type coercion is carried out consistently within the Prolog system.
There is no automatic conversion of floating point numbers into integers. (Note that the ISO Prolog standard now forbids this kind of conversion.)
Integers
The textual representation of an integer consists of a sequence of one or more digits (0 through 9) optionally preceded by a ‘-’ to signify a negative number. The parser assumes that all integers are written using base ten, unless the special binary, octal, or hexadecimal notation is used.
The hexadecimal notation is a 0x followed by a sequence of valid hexadecimal digits. The following are valid hexadecimal digits:
0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
The octal notation is a 0o followed by a sequence of valid octal digits. The octal digits are:
0 1 2 3 4 5 6 7
The binary notation is a 0b followed by a sequence of 0’s and 1’s.
Here are some examples of integers:
0 4532 -273 0000001 0x1fff 0b1001 0o123
It is important to note that a term of the form +5 is not an integer, but instead is a structured term.
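As a quick check of the notations above, the following sketch compares each alternative-base literal with its decimal value. The predicate names are made up for illustration, and the second predicate assumes that + is a prefix operator, as listed in the operator tables of Section 1.7.

```prolog
% same_value_examples/0 is a hypothetical name used only for illustration.
% Each goal succeeds because the parser reads both sides as the same integer.
same_value_examples :-
    0x1fff  =:= 8191,    % hexadecimal: 1*4096 + 15*256 + 15*16 + 15
    0o123   =:= 83,      % octal: 1*64 + 2*8 + 3
    0b1001  =:= 9,       % binary: 8 + 1
    0000001 =:= 1.       % leading zeros do not change the value

% +5 is read as the compound term '+'(5), so integer/1 rejects it.
plus_five_is_compound :-
    T = +5,
    \+ integer(T),
    functor(T, '+', 1).
```

Both predicates take no arguments and simply succeed when the readings described above hold.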
Floating point numbers
Floating point numbers are slightly more complex than integers in that they may have a fractional part, an exponent, or both. A fractional floating point number consists of a sequence of one or more numeric characters, followed by a dot (‘.’), in turn followed by another sequence of one or more numeric characters; the entire expression may optionally be preceded by a ‘-’. Here are some examples of floating point numbers:
0.0 3.1415927 -3.4 000023.540000
You can also specify an exponent using scientific notation. An exponent is either an e or an E followed by an optional ‘-’, signifying a negative exponent, followed by a sequence of one or more numeric characters. Here are examples of floating point numbers with exponents:
0.1e-3 10E99 -44.66e-88 0E-0
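The sketch below exercises the two floating point forms using the standard type-checking builtins float/1 and integer/1. The predicate name float_examples/0 is hypothetical, and only literals that contain a fractional part are used, since those are the most portable.

```prolog
% float_examples/0 is a hypothetical name used only for illustration.
float_examples :-
    float(3.1415927),           % fractional form
    float(0.1e-3),              % fractional form with a negative exponent
    1.5e3 =:= 1500.0,           % the exponent scales by a power of ten
    000023.540000 =:= 23.54,    % extra zeros do not change the value
    \+ integer(2.0).            % a float is not an integer, even if whole
```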
ASCII Codes
ASCII (American Standard Code for Information Interchange) codes are small integers between 0 and 255 inclusive that represent characters. The parser will translate any printable character into its corresponding ASCII integer. In order to get the ASCII code for a character, precede the character by the characters 0'. For example, the codes for the characters ‘A’, ‘8’, and ‘%’ would be given by:
0'A 0'8 0'%
In addition, the ANSI C-style octal and hexadecimal escape forms can be used. Thus, all of the expressions below denote the number 65:
0'A 0'\101 0'\x41
The table below displays several example ASCII sequences:
Expr | Octal Expr | Hex Expr | ASCII Code | Char |
---|---|---|---|---|
0'A | 0'\101 | 0'\x41 | 65 | Upper case A |
0'c | 0'\143 | 0'\x63 | 99 | Lower case c |
0'~ | 0'\176 | 0'\x7e | 126 | Tilde character |
There also exists a small collection of symbolic control characters which can be thought of as synonyms for certain ASCII control character codes. These are presented in the following table:
Expr | Octal Expr | Hex Expr | ASCII Code | Char |
---|---|---|---|---|
0'\a | 0'\007 | 0'\x7 | 7 | alert (‘bell’) |
0'\b | 0'\010 | 0'\x8 | 8 | backspace |
0'\f | 0'\014 | 0'\xC | 12 | form feed |
0'\n | 0'\012 | 0'\xA | 10 | new line |
0'\r | 0'\015 | 0'\xD | 13 | return |
0'\t | 0'\011 | 0'\x9 | 9 | horizontal tab |
0'\v | 0'\013 | 0'\xB | 11 | vertical tab |
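Because 0' expressions are ordinary integers, they can be used directly in arithmetic comparisons, which tends to read better than bare numbers. The following is a minimal sketch; the predicate names is_upper_code/1, is_digit_code/1 and digit_value/2 are hypothetical helpers, not builtins.

```prolog
% Hypothetical helpers that classify a character code.
is_upper_code(C) :- integer(C), C >= 0'A, C =< 0'Z.
is_digit_code(C) :- integer(C), C >= 0'0, C =< 0'9.

% digit_value(+Code, -Value): numeric value of a decimal digit character.
digit_value(Code, Value) :-
    is_digit_code(Code),
    Value is Code - 0'0.
```

For example, the goal digit_value(0'8, V) binds V to 8, since the code for ‘8’ is 56 and the code for ‘0’ is 48.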
Atoms
An atom is a sequence of characters that are parsed together as a constant.
Alphanumeric atoms
An alphanumeric atom is a sequence of characters that begins with a lower case letter, and is followed by zero or more alphanumeric characters, possibly including ‘_’. Here are some examples of alphanumeric atoms:
foobar123 zIPPY bread_and_butter money
Quoted atoms
A quoted atom is formed by placing any sequence of characters between single quotes ('). A single quote can be included in the text of the atom by using two consecutive single quotes for each one desired, or by prefixing the embedded single quote with the backslash (\) escape character. The following are all quoted atoms:
'any char will do' '$*#!#@%#*'
'Can''t miss' 'Can\'t miss' '99999'
If the characters that compose a quoted atom can be interpreted as an atom when they occur without the enclosing single quotes, then it is not necessary to use the quoted form. However, if the atom contains characters that aren’t allowed in a simple atom, then the quotes are required. Note that the last example above is an atom whose print name is 99999, not the integer 99999.
Quoted atoms can span multiple lines, but in this case the end of each such line must
be preceded by the backslash escape character, as in the following example of an
atom:
'We are the stars which sing. \
We sing with our light; \
We are the birds of fire, \
We fly over the sky. \
-- Algonquin poem.'
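The quoting rules above can be checked with the standard comparison and type-checking builtins. This is a small sketch; the three predicate names are hypothetical and exist only for the illustration.

```prolog
% Both ways of embedding a single quote denote the same atom.
same_atom :-
    'Can''t miss' == 'Can\'t miss',
    atom_length('Can''t miss', 10).

% Quotes are optional when the text is already a valid alphanumeric atom.
quotes_optional :- 'money' == money.

% '99999' is an atom and 99999 is an integer; they are different terms.
atom_not_integer :-
    atom('99999'),
    integer(99999),
    '99999' \== 99999.
```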
Special atoms
A special atom is any sequence of characters from the following set:
+-*/\^<>=':.?@#&.
In addition, the atoms ‘[]’, ‘!’, ‘;’ and ‘,’ are considered to be special atoms. Some other examples of special atoms are:
+= && @>= == <---------
Most special atoms are automatically read as quoted atoms unless they have been declared as operators (see Section 1.7 Operators).
1.2 Variables
A variable consists of either a _ (underbar character) or an upper case letter, followed by a possibly empty sequence of alphanumeric characters and dollar signs. Here are some variables:
Variable X123a _a$bc _123 _
1.3 Compound Terms
A compound term consists of a symbolic constant, called a functor, followed by a left parenthesis, followed by one or more terms separated by commas, followed by a right parenthesis. The number of terms enclosed in the parentheses is called the arity of the structure. For example, the compound term
f(a,b(X),y)
has arity 3.
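A compound term can be taken apart at run time with the standard builtins functor/3 and =../2. The helper below is a minimal sketch; term_shape/4 is a hypothetical name, not part of the system.

```prolog
% term_shape(+Term, -Name, -Arity, -Args): hypothetical helper that reports
% the functor name, the arity, and the argument list of a compound term.
term_shape(Term, Name, Arity, Args) :-
    functor(Term, Name, Arity),
    Term =.. [Name|Args].
```

With this definition, the goal term_shape(f(a,b(X),y), N, A, Args) binds N to f, A to 3, and Args to [a,b(X),y].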
1.4 Curly Braces
Instead of prefixing a structured term with a functor, the curly brace notation allows a sequence of terms, separated by commas, to be grouped together in a comma list with ‘{}’ as the principal functor. For example,
{all,the,young,dudes}
parses internally into:
'{}'((all,the,young,dudes))
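Since the contents of the braces form a single comma list, a program that wants the individual items usually has to walk that comma list. The following sketch shows one way to do this; curly_items/2 and comma_list/2 are hypothetical helper names, and the input is assumed to be fully instantiated.

```prolog
% curly_items(+CurlyTerm, -Items): hypothetical helper that flattens the
% comma list inside {...} into an ordinary Prolog list.
curly_items('{}'(Body), Items) :-
    comma_list(Body, Items).

% comma_list(+CommaList, -Items): (A,B) contributes A and then the items of B;
% anything else is the last item.
comma_list((A,B), [A|Rest]) :- !, comma_list(B, Rest).
comma_list(A, [A]).
```

For example, curly_items({all,the,young,dudes}, L) binds L to [all,the,young,dudes].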
1.5 Lists
The simplest list is the empty list, represented by the atom ‘[]’. Any other list is a structured term with ./2 as principal functor and whose second argument is a list.
Lists can be written by using ‘.’ explicitly as a functor, or using the special list notation.
A list using list notation is written as a [ followed by the successive first arguments of all the sublists in order separated by commas, followed by ]. The following are all different ways of writing the same list:
a.b.c.[]
[a,b,c]
'.'(a,'.'(b,'.'(c,[])))
Unless specified, the last tail of a list is assumed to be []. A tail of a list can be specified explicitly by using |, as in these examples:
[a|X]
[1,2,3|[]]
[Head|Tail]
The list notation for lists is preferable to using ‘.’ explicitly because the dot is also used in floating point numbers and to signal termination of input terms.
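The equivalence of the bracket and tail notations can be confirmed with ==/2, and the two-argument structure of a non-empty list can be inspected with functor/3 and arg/3. This is a sketch only; same_list/0, head_tail/3 and list_cell/3 are hypothetical names. The functor name is deliberately left unchecked in list_cell/3 so that the example does not depend on how a particular system names the list constructor.

```prolog
% All of these spellings are read as the same three-element list.
same_list :-
    [a,b,c] == [a|[b,c]],
    [a,b,c] == [a,b|[c]],
    [a,b,c] == [a,b,c|[]].

% head_tail(+List, -Head, -Tail): split a non-empty list using | notation.
head_tail([Head|Tail], Head, Tail).

% list_cell(+List, -Head, -Tail): the same split, done by treating the list
% as a compound term of arity 2.
list_cell(List, Head, Tail) :-
    functor(List, _Functor, 2),
    arg(1, List, Head),
    arg(2, List, Tail).
```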
1.6 Strings
A string is any sequence of characters enclosed in double quotes ("). The parser automatically translates any string into the list of ASCII codes that corresponds to the characters between the quotes. For example, the string
"It's a dog's life"
is translated into
[73,116,39,115,32,97,32,100,111,103,39,115,32,108,105,102,101]
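Because a string is just a list of character codes, it can be passed to any builtin that expects such a list. The sketch below uses the standard builtin atom_codes/2; string_atom/2 and codes_example/0 are hypothetical names. The second predicate assumes the translation described above, so it may not hold in Prolog systems that give double quoted text a different meaning.

```prolog
% string_atom(+String, -Atom): hypothetical helper that builds an atom from a
% string, that is, from a list of character codes.
string_atom(String, Atom) :- atom_codes(Atom, String).

% Assuming strings are read as code lists, a string unifies with an explicit
% list of 0' expressions for the same characters.
codes_example :- "dog" = [0'd, 0'o, 0'g].
```

For example, string_atom([108,105,102,101], A) binds A to the atom life.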
Double quotes can be embedded in strings by either repeating the double quote or by using the backslash escape character before the embedded double quote, as for example in
"She said, ""hi.""".
"She said, \"hi.\"".
1.7 Operators
The prefix functor notation is convenient for writing terms with many arguments. However, Prolog allows a program to define a more readable syntax for structured terms with one or two arguments. For example, the parser recognizes the text
a+b+c
as an expression representing
+(+(a,b),c)
because the special atom + is declared as an infix operator. Infix operators are written between their two arguments. For the other operator types, prefix and postfix, the operator (functor) is written before (prefix) or after (postfix) the single argument to the term.
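Operator notation is purely a matter of how text is read: the resulting term is identical to the one written with the functor in front. The following sketch checks this with ==/2 for the predefined + and - operators; operator_notation/0 is a hypothetical name.

```prolog
% Each goal compares operator notation with the equivalent functor notation.
operator_notation :-
    a+b+c == '+'('+'(a,b),c),    % infix + with two arguments, grouped left
    -x == '-'(x),                % prefix - with a single argument
    functor(a+b+c, '+', 2).      % the principal functor is + with arity 2
```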
What Makes an Operator?
Operators are either alphanumeric atoms or special atoms which have a corresponding precedence and associativity. The associativity is sometimes referred to as the type of an operator. Operators may be declared by using the op/3 builtin. Precedences range from 1 to 1200 with the lower precedences having the tightest binding. Another way of looking at this is that in an expression such as 1*X+Y, the operator with the highest precedence will be the principal functor. So 1*X+Y is equivalent to '+'('*'(1,X),Y) because the ‘*’ binds tighter than the ‘+’. The types of operators are named
fx, fy, xf, yf, xfx, yfx, and xfy,
where the ‘f’ shows the position of the operator. Hence, fx and fy indicate prefix operators, yf and xf indicate postfix operators, and xfx, yfx, and xfy indicate infix operators. An ‘x’ indicates that the operator will not associate with operators of the same or greater precedence, while a ‘y’ indicates that it will associate with operators of the same or lower precedence, but not operators of greater precedence. The table below describes all of the predefined binary operators in ALS Prolog:
Operator | Specifier | Precedence | Operator | Specifier | Precedence |
---|---|---|---|---|---|
:- | xfx | 1200 | =:= | xfx | 700 |
--> | xfx | 1200 | =\= | xfx | 700 |
==> | xfy | 1200 | < | xfx | 700 |
when | xfx | 1190 | =< | xfx | 700 |
where | xfx | 1180 | > | xfx | 700 |
with | xfx | 1170 | >= | xfx | 700 |
if | xfx | 1160 | := | xfy | 600 |
; | xfy | 1100 | + | yfx | 500 |
\| | xfy | 1100 | - | yfx | 500 |
-> | xfy | 1050 | /\ | yfx | 500 |
, | xfy | 1000 | \/ | yfx | 500 |
: | xfy | 950 | xor | yfx | 500 |
. | xfy | 800 | or | yfx | 500 |
= | xfx | 700 | and | yfx | 500 |
\= | xfx | 700 | * | yfx | 400 |
== | xfx | 700 | / | yfx | 400 |
\== | xfx | 700 | // | yfx | 400 |
@< | xfx | 700 | div | yfx | 400 |
@=< | xfx | 700 | rem | yfx | 400 |
@> | xfx | 700 | mod | yfx | 400 |
@>= | xfx | 700 | << | yfx | 400 |
=.. | xfx | 700 | >> | yfx | 400 |
is | xfx | 700 | ** | xfx | 200 |
^ | xfy | 200 |
The next table describes the predefined prefix operators for ALS Prolog:
Operator | Specifier | Precedence | Operator | Specifier | Precedence |
---|---|---|---|---|---|
:- | fx | 1200 | nospy | fx | 800 |
?- | fx | 1200 | - | fy | 200 |
vi | fx | 1125 | + | fy | 200 |
edit | fx | 1125 | \ | fy | 200 |
ls | fx | 1125 | export | fx | 1200 |
cd | fx | 1125 | use | fx | 1200 |
dir | fx | 1125 | module | fx | 1200 |
not | fx | 900 | ’’ | fx | 925 |
\+ | fx | 900 | ’ | fx | 930 |
trace | fx | 800 | ~ | fy | 300 |
spy | fx | 800 |
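The following sketch shows how the precedence and type rules play out in practice. It declares two operators with op/3 and then checks how several expressions are grouped. The operator names likes and really are hypothetical and chosen only for the illustration; the remaining goals rely on the predefined entries for *, +, - and the comma in the tables above.

```prolog
% Hypothetical operator declarations, for illustration only.
:- op(700, xfx, likes).     % infix; xfx means it does not chain with itself
:- op(900, fy, really).     % prefix; fy allows stacking: really really X

likes_is_functor :- (mary likes wine) == likes(mary, wine).
stacked_prefix   :- (really really ok) == really(really(ok)).

% * (400) binds tighter than + (500); yfx operators group to the left,
% while xfy operators such as the comma group to the right.
grouping :-
    1*x+y   == '+'('*'(1,x), y),
    a-b-c   == '-'('-'(a,b), c),
    (a,b,c) == ','(a, ','(b,c)).
```

The parentheses around (mary likes wine) are needed because likes and == are both declared as xfx at precedence 700, so neither may appear directly as an argument of the other.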
Special Cases
It is possible to declare an operator via op/3 that can never be parsed. Even though quoted atoms can be assigned a precedence and associativity, the parser will only interpret alphanumeric atoms or special atoms as operators.
White space
White space, or layout characters, refers to the part of source code, data, and goals that is not made up of readable characters. The term white space comes from the fact that these unreadable characters appear white when source code is printed on a sheet of white paper. White space is any sequence of spaces, tabs, or new lines. Generally speaking, white space has little meaning to the parser. It is occasionally important for recognizing full stops, and for delimiting constructs which, if they were run together, would not be recognizable as separate constructs. There are also places where additional white space is either inappropriate or changes the meaning of the text. For example, you can’t embed a space in a number.
1.8 Comments
Comments can be put anywhere white space can occur. Comments can take one of two forms:
- A line comment: anything following a percent sign (%) is ignored until the end of line.
- A block comment: anything enclosed in a ‘/* */’ pair is ignored. Block comments may span many lines if desired. Block comments may be nested, thus allowing commented code to be commented out.
/*
*
* /*
* * This is one way to use block comments
* */
*
*/
connected(footbone, legbone).
/* Here's another */
connected(headbone, neckbone). % line comments can
connected(doggone, fishbone). % look good
% next to code that
% you write
1.9 Preprocessor Directives: Syntax
ALS Prolog supports preprocessor directives which can affect the text at the time the program is compiled (or loaded into an image). These expressions include the following:
#include #if #else #elif #endif
Each of these must occur at the beginning of a line of program text. Each of #include, #if, and #elif must be followed by a Prolog term, but each of #else and #endif must stand on a line by itself. The #include directive should be followed by a Prolog double quoted string, intended to name a file:
#include "/mydir/foo.pro"
No full stop (.) should follow this expression, nor the expressions following #if and #elif. The expression following #if or #elif can be an arbitrary Prolog term.
The expressions #if, #else, #elif, #endif must be organized as conditionals in a manner similar to their use in C programs. Thus, the first expression occurring must be an #if, and the last must be an #endif. Between them there can be zero or more occurrences of #else and #elif. There can be at most one occurrence of #else between a given #if ... #endif pair, and it must follow all of the zero or more occurrences of #elif between the same pair. The semantics of preprocessor directives are described in Section 2.3 Preprocessor Directives.
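The layout rules above can be sketched as follows. This fragment illustrates only the arrangement of the directives. The condition terms verbose_mode and quiet_mode and the fact log_level/1 are hypothetical placeholders, and how a condition term is actually evaluated is a matter for Section 2.3, not something this sketch settles.

```prolog
% Every directive starts at the beginning of its line, and the terms after
% #if and #elif are not followed by a full stop.
% verbose_mode and quiet_mode are hypothetical condition terms.
#if verbose_mode
log_level(debug).
#elif quiet_mode
log_level(error).
#else
log_level(info).
#endif
```

Note that the single #else comes after the #elif, and that #else and #endif each stand alone on their lines.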