Last modified on 12/08/09 13:58:54

Grammar

This is a proposed definition of the language.

Statements

Statements are terminated with a semicolon, as in most C-like languages. Whitespace in the middle of statements, including newlines, is ignored in tokenization.

The preprocessor, however, uses standard C preprocessor semantics, where a newline terminates the macro unless preceded by \.

Comments

C multiline comments using /* */ are supported. Nested comments are not supported and will cause errors. C99/C++ single-line comments using // are supported. A # on a line by itself is ignored, but it is parsed as an empty preprocessor directive, not as a comment; extra text on such a line will cause errors.

Data Types

Most data types can optionally take "constructor-like" parameters to further restrict their behavior, à la SQL. These are called user constraints. User constraints are treated as hints to the runtime, and variables with the same underlying data type are considered identical regardless of any user constraints. Thus, "integer(0,5)" and "integer" are considered the same type, whereas "sinteger" and "uinteger" are not.
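
The type-identity rule above can be modeled in a few lines of Python. This is only a sketch: TypeSpec and same_type are hypothetical names introduced for illustration, not part of the proposed language.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TypeSpec:
    """A declared type: an underlying type name plus optional user constraints."""
    base: str                  # e.g. "integer", "sinteger", "uinteger"
    constraints: Tuple = ()    # e.g. (0, 5); treated as hints only


def same_type(a: TypeSpec, b: TypeSpec) -> bool:
    """Compare only the underlying type; user constraints are ignored."""
    return a.base == b.base


print(same_type(TypeSpec("integer", (0, 5)), TypeSpec("integer")))  # True
print(same_type(TypeSpec("sinteger"), TypeSpec("uinteger")))        # False
```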

Scalars

All scalar types are first-class values.

  • null - No user constraints exist. NULL is both a type and a value, and is only ever equal to itself.
  • boolean - No user constraints exist. A boolean always occupies a single byte in memory and may only contain the values 0 and 1.
  • [s|u]integer(min[, max]) - A user may constrain integers to minimum and maximum values. All integers are 64 bits wide, and may be signed or unsigned. "integer" by itself is an unbounded signed integer. In any case where it's relevant, all values are stored in big-endian order.
  • double - No user constraints exist. A double is a double-precision 64-bit floating-point value which can represent the appropriate range of values, including positive and negative Infinity, Zero, and NaN.
  • string(maxlen) - A user may constrain strings to a maximum number of characters (not bytes). All strings are manipulated as UTF-8 and stored in Unicode Normalization Form D.
  • buffer(maxlen) - A user may constrain buffers to a maximum number of bytes. A buffer is a blob of raw binary data which is never interpreted by the language runtime in any way.
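
To make the storage rules for strings and integers concrete, here is a rough Python model. The function names are hypothetical, and counting one "character" as one code point of the stored NFD form is an assumption; the text above does not say how characters are counted.

```python
import unicodedata


def store_string(text: str) -> str:
    """Return the Normalization Form D representation used for storage."""
    return unicodedata.normalize("NFD", text)


def within_maxlen(text: str, maxlen: int) -> bool:
    # Assumption: a "character" is one code point of the stored (NFD) form.
    return len(store_string(text)) <= maxlen


def store_integer(value: int, signed: bool = True) -> bytes:
    """Encode an integer as 64 bits (8 bytes) in big-endian order."""
    return value.to_bytes(8, byteorder="big", signed=signed)


composed = "\u00e9"  # e with acute accent as a single precomposed code point
stored = store_string(composed)
# Code points before NFD, code points after NFD, UTF-8 bytes after NFD.
print(len(composed), len(stored), len(stored.encode("utf-8")))  # 1 2 3
print(store_integer(1).hex())  # 0000000000000001
```

The example shows why a limit in characters and a limit in bytes are not interchangeable: the decomposed form has more code points than the composed form, and more bytes than code points.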

Compounds

Compound types may or may not be first-class values.

  • array(type) - A user may constrain an array to contain only a single type of value. Arrays are ordered, unbounded, and cannot contain index holes. For a non-consecutively numbered array, use a dictionary(uinteger, ...).
  • dictionary(type[, type]) - A user may constrain a dictionary to contain only specific key and value types. Dictionaries are efficient maps of keys to values. Both keys and values may be any type by default.
  • Users may define structures, which are implemented as dictionaries with a constrained set of keys and key types. A structure is a compound type which can only be converted to a dictionary, not to another structure type.
  • object - A user may constrain an object to any point in the defined class hierarchy by using the class name in place of the "object" keyword. Objects are not dictionaries.

Other

  • mixed - A pseudo-type which represents the "any type" concept when defining parameter types, function return types, array or dictionary constraints, structure members, and so forth.
  • opaque - A cover-all type which is used by the runtime to implement opaque access to resources such as opened file descriptors and database connection handles, similar to PHP's "resource" type.
  • function - A function pointer, which can be constructed from a function name, a class and static method, an object and method, or a closure/lambda function.
  • undef - The type of a variable which hasn't been declared or set.

Variables

Variables are identifiers preceded with a $. Variable names may even be keywords; the $ character is considered unambiguous. A variable may be of any type, and is undefined until either assigned a value or explicitly declared. Variables have standard lexical scope; variables defined in the global scope are available everywhere.

Operators

Arithmetic

The standard ADD +, SUB -, MUL *, DIV /, MOD %, and NEG (unary) - operators are supported.

The division operator yields a double value if either of its operands is a double. Otherwise it yields an sinteger if either operand is signed, or a uinteger if both are unsigned.
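
The result-type rule for division can be written as a small decision function. This Python sketch is illustrative only: division_result_type is a hypothetical helper, and it accepts just the three type names used in the rule.

```python
def division_result_type(left: str, right: str) -> str:
    """Return the result type of left / right under the rule described above."""
    allowed = {"double", "sinteger", "uinteger"}
    if left not in allowed or right not in allowed:
        raise TypeError("division is modeled here for numeric types only")
    if "double" in (left, right):
        return "double"      # any double operand forces a double result
    if "sinteger" in (left, right):
        return "sinteger"    # otherwise any signed operand forces a signed result
    return "uinteger"        # both operands are unsigned


print(division_result_type("uinteger", "double"))    # double
print(division_result_type("uinteger", "sinteger"))  # sinteger
print(division_result_type("uinteger", "uinteger"))  # uinteger
```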

Bitwise

The standard AND &, OR |, XOR ^, NOT ~, SHL <<, and SHR >> operators are supported.

Additionally, SAL >>+ (shift arithmetic left), ROL <<| (rotate left), and ROR >>| (rotate right) are supported.
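
As an illustration of the rotate operators, the following Python sketch models ROL and ROR on 64-bit values. The names rol64 and ror64 are hypothetical, and reducing the shift count modulo 64 is an assumption that the text does not specify.

```python
MASK64 = (1 << 64) - 1  # all integers are 64 bits wide


def rol64(value: int, count: int) -> int:
    """Rotate a 64-bit value left; bits shifted out on the left re-enter on the right."""
    count %= 64  # assumption: counts wrap modulo the word size
    value &= MASK64
    return ((value << count) | (value >> (64 - count))) & MASK64


def ror64(value: int, count: int) -> int:
    """Rotate a 64-bit value right; bits shifted out on the right re-enter on the left."""
    count %= 64
    value &= MASK64
    return ((value >> count) | (value << (64 - count))) & MASK64


print(hex(rol64(1 << 63, 1)))  # 0x1
print(hex(ror64(1, 1)))        # 0x8000000000000000
```

Unlike a plain shift, a rotation loses no bits, so rotating left and then right by the same count returns the original value.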

Comparison

The standard EQ ==, NEQ !=, LT <, GT >, LTE <=, and GTE >= operators are supported. PHP's === type-equality comparison does not exist.

Logical

The standard NOT !, AND &&, and OR || operators are supported. There are no "not", "and", or "or" keywords.

Assignment

The standard ASSIGN =, ADD +=, SUB -=, MUL *=, DIV /=, MOD %=, AND &=, OR |=, XOR ^=, SHL <<=, SHR >>=, INC ++, and DEC -- operators are supported.

Additionally, SAL >>+=, ROL <<|=, and ROR >>|= are supported.

String

The CONCAT . and CONCAT .= string concatenation and concatenating assignment operators are supported. It is an error to use these operators with non-string types.

Conditional

The standard TERNARY ?: conditional operator is supported.

Additionally, IFNULL ?? and IFUNDEF ?| operators are supported. The expression "$n ?? 5" evaluates to $n if $n is not null, otherwise to 5. The expression "$n ?| 5" evaluates to $n if $n is defined, otherwise to 5.
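
The difference between the two operators is easiest to see side by side. In this Python sketch, None stands in for null and a sentinel object stands in for undef; ifnull, ifundef, and UNDEF are hypothetical names. The sketch evaluates the fallback eagerly, which the real operators might not do.

```python
UNDEF = object()  # sentinel standing in for the "undef" state of a variable


def ifnull(value, fallback):
    """Model of `$n ?? fallback`: use the fallback only when the value is null."""
    return fallback if value is None else value


def ifundef(value, fallback):
    """Model of `$n ?| fallback`: use the fallback only when the value is undefined."""
    return fallback if value is UNDEF else value


print(ifnull(None, 5))    # 5
print(ifnull(0, 5))       # 0
print(ifundef(UNDEF, 5))  # 5
print(ifundef(None, 5))   # None
```

Note that a null value is still a defined value, so ?| passes it through unchanged.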

Index

The standard INDEX [] operator is supported.

The index operator has different semantics for each type to which it can be applied:

  • Applied to either integer type, the index operator extracts an individual byte in LSB-to-MSB order as a uinteger, where the index must be a uinteger(0,7).
  • Applied to a string, the index operator extracts the individual character (not byte) at the given uinteger index as a string.
  • Applied to a buffer, the index operator extracts the individual byte at the given integer index as a uinteger.
  • Applied to an array, the index operator extracts the value at the given integer index as its original type.
  • Applied to a dictionary, the index operator extracts the value for the given key, which may be of any type, as its original type.
  • Applied to any other type, the index operator is an error.
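
The integer and string cases above are the least conventional, so here is a short Python model of both. The function names are hypothetical, and treating negative values as 64-bit two's complement is an assumption.

```python
MASK64 = (1 << 64) - 1


def index_integer(value: int, index: int) -> int:
    """Extract byte `index` of a 64-bit integer, counting from the least significant byte."""
    if not 0 <= index <= 7:
        raise IndexError("integer index must be in the range 0..7")
    # Assumption: negative values are viewed as 64-bit two's complement.
    return ((value & MASK64) >> (8 * index)) & 0xFF


def index_string(text: str, index: int) -> str:
    """Extract a single character (not a byte) as a one-character string."""
    return text[index]


print(hex(index_integer(0x0102030405060708, 0)))  # 0x8
print(hex(index_integer(0x0102030405060708, 7)))  # 0x1
print(index_string("na\u00efve", 2) == "\u00ef")  # True
```

Index 0 always selects the least significant byte, regardless of the big-endian storage order.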

Control Structures

The supported control structures are "if/else if/else", "while", "do/while", "for", "foreach", "switch", "continue", "break", and "return". Each has the expected C-style semantics.

A "foreach" loop has different semantics for each type to which it can be applied:

  • Applied to a string, a foreach loop will yield the index and character for each character in the string.
  • Applied to a buffer, a foreach loop will yield the index and byte for each byte in the buffer.
  • Applied to an array, a foreach loop will yield the index and value for each value in the array.
  • Applied to a dictionary, a foreach loop will yield the key and value for each entry in the dictionary.
  • Applied to any other type, the foreach structure is an error.
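
The per-type behavior of foreach can be sketched as a Python generator that dispatches on the type of its argument. The mapping of language types to Python types (str, bytes, list, dict) is an assumption made for illustration.

```python
def foreach(value):
    """Yield (index-or-key, element) pairs according to the type of `value`."""
    if isinstance(value, str):
        yield from enumerate(value)      # string: index and character
    elif isinstance(value, (bytes, bytearray)):
        yield from enumerate(value)      # buffer: index and byte (as an integer)
    elif isinstance(value, list):
        yield from enumerate(value)      # array: index and value
    elif isinstance(value, dict):
        yield from value.items()         # dictionary: key and value
    else:
        raise TypeError("foreach cannot be applied to this type")


print(list(foreach("ab")))          # [(0, 'a'), (1, 'b')]
print(list(foreach(b"\x01\x02")))   # [(0, 1), (1, 2)]
print(list(foreach({"a": 1})))      # [('a', 1)]
```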

Functions

All functions are closures. As with Lua, the only difference between a named function and an anonymous one is syntactic sugar. Thus:
function foo() { }
is the same as:
$foo = function () { };
The difference is that the first foo() can be called without a $ before it. However, the address of the first foo() can still be taken:
$foo = ptrfunction(foo);
The ptrfunction function (recursive, isn't it?) uses its parameters to return a closure-style reference to the function, if there is one. It can be called in any of the following ways, using the type and number of parameters to differentiate:

$foo = ptrfunction(foo); // Take address of foo()
$foo = ptrfunction(Circle, Pi); // Take address of Circle::Pi()
$foo = ptrfunction($a_circle, radius); // Take address of $a_circle->radius() (calling $foo() will set $this appropriately)
$foo = function () { /* ... */ }; // Create an anonymous closure
$foo = ptrfunction(function () { /* ... */ }); // Longer version of previous line
$foo = ptrfunction($foo); // If parameter is already a function pointer, return it.
$foo = ptrfunction($class_name_string, $method_name_string); // take address of $class_name_string::$method_name_string()
$foo = ptrfunction($func_name_string); // take address of function named by $func_name_string;

Because ptrfunction() is itself a function, it can be given as a parameter to itself and will return a valid reference.
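
The dispatch on type and number of parameters might look roughly like the following Python model. This is a sketch under several assumptions: names are passed as Python objects or strings rather than bare identifiers, and the functions and classes lookup tables are hypothetical stand-ins for the runtime's symbol tables.

```python
def ptrfunction(target, name=None, *, functions=None, classes=None):
    """Resolve the arguments to a callable, dispatching on their type and count."""
    functions = functions or {}   # hypothetical table of named functions
    classes = classes or {}       # hypothetical table of named classes
    if name is None:
        if callable(target):
            return target                 # already a function pointer or closure
        if isinstance(target, str):
            return functions[target]      # function named by a string
        raise TypeError("cannot take the address of this value")
    if isinstance(target, str):
        target = classes[target]          # class named by a string
    # A class yields its static method; an object yields a bound method.
    return getattr(target, name)


class Circle:
    def __init__(self, r):
        self._radius = r

    @staticmethod
    def Pi():
        return 3.14159

    def radius(self):
        return self._radius


def foo():
    return "foo"


print(ptrfunction(foo)())                     # foo
print(ptrfunction(Circle, "Pi")())            # 3.14159
print(ptrfunction(Circle(2), "radius")())     # 2
print(ptrfunction(ptrfunction) is ptrfunction)  # True
```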

Preprocessor

The preprocessor functions a great deal like the C preprocessor. It understands most of the same directives:

#
The empty directive. Has no effect.
#line "filename" lineno
Sets the filename and line number for uses of __FILE__ and __LINE__.
#include "filename"
The standard include directive, which replaces the directive with the contents of the included file. There is an include path, but no <> syntax; only the "" syntax is allowed.
#import "filename"
Same as #include, but prevents the same file from being included twice. The restriction is obeyed only by import directives, never by includes.
#define macro(params) expansion
Has the same semantics as C's constant/macro expansion facility.
#undef macro
Undefine a previously defined macro. Fail silently if no such macro is defined.
#if cond/#elif cond/#else/#endif
Conditional branches in the preprocessor. Again the same semantics as C.
#warning/#error
Emits a diagnostic message from the preprocessor, stopping it on errors.
__pp_directive(...)
When used in macro expansions, expands to the given directive and given parameters, allowing for the use of macros within macros. Example:
// If the given name exists, define a POSIX_* constant to it, otherwise define the POSIX_* constant to the given value
#define POSIX_DEFINE(name, value)           \
    __pp_if(defined(name))                  \
        __pp_define(POSIX_ ## name, name)   \
    __pp_else()                             \
        __pp_define(POSIX_ ## name, value)  \
    __pp_endif()

POSIX_DEFINE(S_IRUSR, 0000400);
// expands to:
#if defined(S_IRUSR)
    #define POSIX_S_IRUSR S_IRUSR
#else
    #define POSIX_S_IRUSR 0000400
#endif
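
The difference between #include and #import described in the list above can be modeled with a toy expander over in-memory files. This Python sketch is an illustration only: expand is a hypothetical function, and it assumes that only #import records a file as seen, so an earlier #include of the same file does not suppress a later #import.

```python
def expand(name, files, imported=None):
    """Return the lines of `name` with #include and #import directives expanded."""
    if imported is None:
        imported = set()  # files already pulled in by #import
    output = []
    for line in files[name].splitlines():
        if line.startswith('#include "'):
            target = line.split('"')[1]
            output.extend(expand(target, files, imported))   # always expands
        elif line.startswith('#import "'):
            target = line.split('"')[1]
            if target not in imported:                       # expands at most once
                imported.add(target)
                output.extend(expand(target, files, imported))
        else:
            output.append(line)
    return output


files = {
    "main": '#import "a"\n#import "a"\n#include "b"\n#include "b"\nend',
    "a": "A",
    "b": "B",
}
print(expand("main", files))  # ['A', 'B', 'B', 'end']
```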

Classes and objects

The language supports the normal concepts of classes and objects. Methods and entire classes can be declared, as can static methods, in the normal fashion, as in PHP. Interfaces are also supported.

Methods, both static and instance, and properties share the same namespace. In this way, classes are effectively dictionaries containing a list of properties, some of which may be functions which will then become instance methods of instantiated objects. It is possible to assign new methods to an instance of a class at runtime by assigning function pointers as properties.
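
Python happens to share the "methods and properties live in one namespace" model, so it can serve as a loose analogy for attaching a method to a single instance at runtime. The example is an analogy only; the binding step via types.MethodType is a Python detail, and how the proposed language binds $this is not shown here.

```python
import types


class Circle:
    def __init__(self, radius):
        self.radius = radius


def diameter(self):
    return 2 * self.radius


a = Circle(2)
b = Circle(5)

# Assign a function as a property of one instance; it becomes a method of that instance only.
a.diameter = types.MethodType(diameter, a)

print(a.diameter())             # 4
print(hasattr(b, "diameter"))   # False
```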

Objects may be cast to another type from which they inherit in the normal way. Accessing a method on a cast object which is not defined at that level of the inheritance hierarchy will throw an error (a std::bad_cast exception in C++ terms).

For later definition

Still to be defined:

  • Namespaces
  • Exceptions
  • Operator precedence

Example

#!/usr/local/bin/qmd
// The above line would be optional.

<?qmd
// A processing instruction in the PHP style. This would also be optional.

#define CONSTANT    5
// A standard preprocessor macro. The syntax of the preprocessor is heavily based on C's but is probably not precisely the same.

$int1 = CONSTANT;
// Previously undefined variable "int1" is set to an integer value.

struct StructureType
{
    integer        $an_integer;
    struct
    {
        array                  $array_of_anything;
        dict                   $map_of_anything_to_anything;
        array(integer)         $array_of_only_integers;
        dict(integer, mixed)   $map_of_integer_to_anything;
    }              $embedded_structure;
    struct EmbeddedStructure2
    {
        double                 $a_double_precision_float;
    };
    array(EmbeddedStructure2)  $array_of_only_that_struct;
};
// A structure may contain any set of types. Arrays and dictionaries are limited by their initial specifier types, which default to "mixed" for arrays and "mixed, mixed" for dictionaries.
// Anonymous structures may be used for one-shot definitions. An embedded structure which gives a tag (name of structure) but no definition (no field name) is local to the containing
// structure and is most useful for defining arrays of internal structures.

$struct1 = StructureType{{ 1, {{ ((1, "a")), [["a" = 1, 1 = "a"]], ((1, 2, 3)), [[1 = "a", 2 = 3]] }}, (( {{ 1.0 }}, {{ 2.0 }}, {{ 3.0 }} )) }};
// This shows how a complex structure would be initialized. A much much simpler example would be:
$struct2 = Timeval{{ 10 /* sec */, 0 /* usec */ }};

$int2 = $int1 + $int5;
// It is an ERROR to refer to uninitialized variables; this line will cause a compile error but not a syntax error.

$array1 = ((1, 2, 3, "a", "b", "c"));
// Previously undefined variable "array1" is set to an array value.
// This is the use of array constructor syntax. The use of plain parentheses or [] brackets is highly ambiguous (do they refer to an expression/index or a constructor?).
// CONSIDER THIS MORE DEEPLY LATER

$dict1 = [["a" = 1, "b" = 2, "c" = 3]];
// Previously undefined variable "dict1" is set to a dictionary value.
// Dictionary constructor syntax, same comments as array constructor.

print("I am an octopus, wow, wow, {$array1} {$dict1}.\n");
// The print() function is in fact a function, not a language construct. Such language constructs as used by PHP are ambiguous and confusing, an unnecessary special case.
// Variables MUST be surrounded inside braces {} inside a string in order to be replaced by their string values. Again, removal of PHP's ambiguity.
// The conversion of any given type to a string will be strictly defined by the language. Consider possibly allowing user overrides for builtin types?

function factorial_recursive(integer(0,) $n)
{
    return ($n < 2 ? 1 : $n * factorial_recursive($n - 1));
}

function factorial_iterative(integer(0,) $n)
{
    $result = 1;
    for ($i = $n; $i > 1; --$i) {
        $result *= $i;
    }
    return $result;
}
// Functions are declared much as in PHP, plus the required type specifiers (use mixed to denote "any type").
// The integer type can be given an optional range qualifier, just as array and dictionaries can have optional type limiters.

// Obviously, C++-style comments are supported.
/* So are C-style comments. */
# Hash comments are NOT supported. This line is a syntax error.

?>
// End of processing instruction, only if the opening one was used of course.

Stuff outside the processing instruction is echoed as is, but if and only if there is no shell interpreter line.
Any such line, including the otherwise meaningless "#!", causes the entire file to be treated as inline code, making the processing instruction no longer valid.
Use one or the other in any given file, not both. This is true on a global basis; file inclusion is done by the preprocessor, which means that the resulting
parsed code is a concatenation of the parent file and all included files (as in C).