Last modified on 12/15/09 21:13:14

The QMD Preprocessor

The QMD preprocessor acts much like the more familiar C preprocessor, which allows for complex compile-time macro definitions and expansions, as well as file inclusion, within source code. Each directive available in the QMD preprocessor is described in detail here.

Invoking the preprocessor

The preprocessor may be invoked as a subcommand of the qmd binary at any time.

Modes

If a shebang is detected in a source file, the preprocessor will process the entire text of the file as a QMD script. This is referred to as "script" mode.
If, however, no shebang is detected, the preprocessor will only process text found within <?qmd ?> processing instructions. This is called "escape" mode.
The preprocessor makes no attempt whatsoever to process any text outside processing instructions in escape mode; QMD processing instructions that are commented out within the external source's context will still be processed normally.
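
As a rough illustration of escape mode, the following Python sketch collects the text found inside <?qmd ?> processing instructions and leaves everything else alone. The helper name and the use of a regular expression are assumptions made for illustration only; the real preprocessor presumably tokenizes its input, and this sketch does not handle a ?> that appears inside a QMD string.

```python
import re

# Hypothetical helper: non-greedy match from "<?qmd" to the next "?>".
PI_PATTERN = re.compile(r"<\?qmd\s(.*?)\?>", re.DOTALL)

def extract_escape_mode_code(source):
    """Return the QMD fragments found inside <?qmd ?> instructions."""
    return [m.group(1).strip() for m in PI_PATTERN.finditer(source)]

# The first instruction sits inside an HTML comment, which means nothing
# to the preprocessor, so it is picked up like any other instruction.
html = '<p>Hi</p>\n<!-- <?qmd print("hidden"); ?> -->\n<?qmd print("shown"); ?>\n'
fragments = extract_escape_mode_code(html)
```

Both instructions are returned, which mirrors the rule that instructions commented out in the surrounding document are still processed.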

For each included file processed, the preprocessor intelligently meshes the modes of the including and included files. The algorithm is:

  1. Detect the mode of the current file.
  2. If current file is the first file processed, done.
  3. If the current file uses "script" mode, then:
    1. Strip the shebang from the current file.
    2. If the including file uses "escape" mode, then treat the current file as if it were contained by <?qmd and ?>.
  4. If the current file uses "escape" mode, then:
    1. If the including file uses "script" mode, signal an error. Done. TODO: Delve further into this case later. Potential resolution: Add a preprocessing stage which converts all text outside the tags to print() function calls and properly escaped strings.
    2. Treat the current file as if it were contained by ?> and <?qmd.
  5. Substitute the contents of the current file in the including file. Done.
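
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, shebang detection is reduced to a check for a leading "#!", and step 2 (the first file processed) is left to the caller.

```python
def detect_mode(text):
    # Assumption: any first line starting with "#!" counts as a shebang here.
    return "script" if text.startswith("#!") else "escape"

def mesh_included_file(text, including_mode):
    """Return the text to substitute into the including file (steps 3 to 5)."""
    if detect_mode(text) == "script":
        # Step 3.1: strip the shebang line.
        _, _, body = text.partition("\n")
        if including_mode == "escape":
            # Step 3.2: treat the file as if contained by <?qmd and ?>.
            return "<?qmd " + body + " ?>"
        return body
    # Step 4: the included file uses escape mode.
    if including_mode == "script":
        # Step 4.1: signal an error.
        raise ValueError("escape-mode file included from a script-mode file")
    # Step 4.2: treat the file as if contained by ?> and <?qmd.
    return "?>" + text + "<?qmd "
```

For example, mesh_included_file("#!qmd\nprint(1);", "escape") yields "<?qmd print(1); ?>".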

Passes

By default, the preprocessor functions in a two-pass mode:

  1. Strip comments.
  2. Process directives in order.

Output

The preprocessor can produce multiple forms of output:

compiler
This output form is intended for the compiler, and consists of a stream of tokens with no comments, the absolute minimum necessary whitespace, and terse file/line markers. For maximum efficiency, this type of output is typically connected directly to the compiler. This form can also be requested in standalone mode, but there is rarely any reason to do so, as it is effectively unreadable for any non-trivial input.
C
This output form attempts to mimic the behavior of GNU CPP as described in the GNU CPP manual's section on output. Directives are replaced with blank lines and comments with spaces, and long runs of blank lines are discarded. This form is included for completeness' sake and its implementation is considered a low priority, as it has no foreseeable advantage over other output forms.
comments
This output form runs only the comment-stripping pass on the input, leaving everything else untouched. It has no obvious use except for debugging the preprocessor.
pretty
This output form attempts to preserve the original source as much as possible, by retaining comments and indentation, expanding macros with their original definition's indentation, and omitting #line directives and markers. It is intended to assist debugging of macros.

Any output form can be constrained by excluding imported and included headers from the output; this constraint, combined with the pretty form, is expected to be useful for debugging purposes.

TODO
Consider (and test) whether the efficiency gain of implementing each output form as conditional statements during input processing outweighs the clarity and maintainability of storing all possible information during processing and implementing output forms as pluggable modules (classes?) on the back side of the processor. Especially for the compiler form, it may be advantageous to avoid as much extra memory usage and processing overhead as possible. Definitely implement it using classes first, and consider branched code if this proves unmanageably slow, keeping in mind the likelihood that all performance-critical uses will make use of compiled and/or cached code as LLVM allows, thus making the compiler speed less important. Also remember that the QMD preprocessor is not expected to have to parse the sheer number of complex headers that a C compiler does, since there shouldn't be an endless maze of system headers.

Comments

The preprocessor is responsible for parsing and stripping comments from input source. Both C-style comments /* comment with any number of lines here */ and C++-style comments // This comment runs to the end of the line are supported.
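
A minimal Python sketch of a comment-stripping pass is shown below. It is not the actual implementation: the function name is hypothetical, only single- and double-quoted strings are protected (HereDoc syntax is not handled), and comments are simply removed rather than replaced with whitespace.

```python
def strip_comments(source):
    """Remove /* ... */ and // ... comments, leaving quoted strings intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "'\"":
            # Copy a quoted string verbatim, honoring backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("/*", i):
            # Skip to the end of the C-style comment (or end of input).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif source.startswith("//", i):
            # Drop everything up to, but not including, the line ending.
            while i < n and source[i] not in "\r\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The string check comes first so that comment markers inside quotes are left untouched.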

Directives

The preprocessor accepts a number of directives for manipulating the source text.

Directive  Name      Expansion  __pp_ macro
#          Empty     No         empty
#!         Shebang   No         none
#line      Line      Yes        line
#include   Include   Yes        include
#import    Import    Yes        import
#define    Define    Yes        define
#undef     Undefine  No         undefine
#if        If        Yes        if
#elif      Else If   Yes        elif
#else      Else      Yes        else
#endif     End If    No         endif
#warning   Warning   Yes        warning
#error     Error     Yes        error

General directive syntax

  • All preprocessor directives begin with a hash # character, which must occur at the beginning of a line. The beginning of a line is defined as the character immediately following a CR \r U+000D, LF \n U+000A, or CRLF \r\n U+000D U+000A sequence.
  • A preprocessor directive may never be preceded by whitespace, though it may be followed by extra whitespace. Preprocessor directives preceded by unexpected whitespace will be ignored and passed along to the compiler, where a syntax error will most likely result.
  • The preprocessor honors strings as defined by the language tokenizer. Thus, preprocessor directives that would otherwise be recognized are ignored when enclosed by single quotes ', double quotes ", or any form of HereDoc <<<TEXT syntax.
  • Preprocessor directives are ignored within multiline C comments.
  • Unrecognized directives will generate a fatal error message.
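
The beginning-of-line rule can be illustrated with a short Python sketch. The pattern and function name are assumptions for illustration only. The sketch deliberately ignores strings, HereDoc text, and comments, and it assumes that whitespace is allowed between the hash and the directive name.

```python
import re

# A directive starts with "#" at the start of input or right after a line
# ending. CRLF is listed first so that it is consumed as one line ending.
DIRECTIVE = re.compile(r"(?:\A|\r\n|\r|\n)#[ \t]*(!|[A-Za-z_]*)")

def find_directives(source):
    """Return directive names; "" is the empty directive, "!" the shebang."""
    return [m.group(1) for m in DIRECTIVE.finditer(source)]

# The indented "#define B 2" is preceded by whitespace and is not recognized.
names = find_directives("#define A 1\r\n  #define B 2\n#\n#if A\r#endif")
```

Here names contains define, the empty directive, if, and endif, covering all three line-ending styles.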

# - Empty directive

Syntax:

#

When # appears on a line by itself, or followed only by whitespace, it is recognized as valid, but causes the preprocessor to do nothing. This directive serves no purpose and exists solely for the benefit of closer compatibility with C.

# can be created by the __pp_empty macro, but there is little point to doing so. Either form may be useful for pretty-printing in the pretty output form.

#! - Shebang directive

Syntax:

#!qmd [options]
#!/path/to/qmd[suffix] [options]

The shebang directive serves two purposes:

  1. It unambiguously identifies a QMD file as not making use of processing instruction tags. Any QMD file with no shebang is assumed to consist of PHP-style "escape from HTML" text.
  2. It provides the standard UNIX functionality of identifying a command interpreter for a shell attempting to execute a QMD script.

Unlike all other preprocessor directives, a shebang directive may appear only as the first line in a file. It is considered a fatal syntax error for it to appear unescaped anywhere else.

It is platform-dependent whether the first form, with no path given, will automatically invoke QMD; it should not be relied upon to do so.

Any options given in a shebang line will be processed as if the QMD interpreter had been invoked with those options. If QMD was invoked from the command line, the shebang options are appended to the original command-line arguments. For example:

$ qmd --no-ini-file
#!qmd --use-ini-file=/path/to/qmd.ini
// more code here

This script will execute as if QMD had been invoked as "qmd --no-ini-file --use-ini-file=/path/to/qmd.ini".
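
The append rule can be sketched in Python as follows. The function name is hypothetical, and splitting the shebang line with shell-like rules via shlex is an assumption; the text above does not specify how QMD tokenizes shebang options.

```python
import shlex

def effective_arguments(cli_args, shebang_line):
    """Append options from the shebang line to the command-line arguments."""
    # Drop the leading "#!"; the first word is the interpreter itself and
    # everything after it is treated as options.
    parts = shlex.split(shebang_line[2:])
    return list(cli_args) + parts[1:]

args = effective_arguments(
    ["--no-ini-file"],
    "#!qmd --use-ini-file=/path/to/qmd.ini",
)
```

Here args ends up as ["--no-ini-file", "--use-ini-file=/path/to/qmd.ini"], matching the invocation described above.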

The shebang is not recognized if it does not match the PCRE regular expression "^.*?/qmd([^/ \t]*.*?)?$" (where / is replaced by a platform-dependent directory separator); this allows for shebangs to be used in other scripting languages where QMD may be embedded. For example, the following script, when run through first QMD and then PHP, will print "Hello, world!":

#!/usr/bin/php
<?php
print "<?qmd print("Hello, world!"); ?>";
?>
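
The recognition rule can be tried out with Python's re module, used here as a stand-in for PCRE (the pattern uses only constructs that both engines share). Matching against the whole first line, with / as the directory separator, is an assumption, and the function name is hypothetical.

```python
import re

# The pattern quoted in the text, with "/" as the directory separator.
SHEBANG = re.compile(r"^.*?/qmd([^/ \t]*.*?)?$")

def is_qmd_shebang(first_line):
    """Return True if the line is recognized as a QMD shebang."""
    return SHEBANG.match(first_line) is not None
```

With this sketch, "#!/usr/bin/qmd --use-ini-file=/path/to/qmd.ini" is recognized, while "#!/usr/bin/php" is not, so the PHP script above would be handled in escape mode.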

The shebang is considered both a preprocessor and compiler directive in that it is recognized and used by both.

The parameter to #! is not (and can not be) subject to macro expansion.
#! can not be created by a __pp_ macro.

#line - Line declaration directive

Syntax:

#line lineno
#line "filename" [lineno]

The line directive is more of a compiler directive than a preprocessor directive, but is listed here because it is recognized by both preprocessor and compiler. It is also typically emitted by the preprocessor so that the compiler may produce more useful error messages, and affects how the preprocessor handles the __FILE__ and __LINE__ constants.

In the first form, #line changes the current line number without affecting the current file. In the second form, #line changes the current file; if a line number is given, the current line is set to it, otherwise it becomes 1.
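
The two forms can be modelled with a small Python class. This is a sketch only: the class and method names are hypothetical, and it does not model whether the new line number applies to the directive's own line or to the line that follows it, since that is not stated here.

```python
class Position:
    """Tracks the current file and line as #line directives are seen."""

    def __init__(self, filename, line=1):
        self.filename = filename
        self.line = line

    def apply_line_directive(self, lineno=None, filename=None):
        if filename is None:
            # First form: #line lineno changes only the line number.
            self.line = lineno
        else:
            # Second form: #line "filename" [lineno] changes the file; the
            # line becomes lineno if given, otherwise 1.
            self.filename = filename
            self.line = 1 if lineno is None else lineno

pos = Position("main.qmd", 10)
pos.apply_line_directive(lineno=42)            # still main.qmd, line 42
pos.apply_line_directive(filename="lib.qmd")   # lib.qmd, line 1
```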

The parameters to #line are subject to macro expansion.
#line can be created by the __pp_line macro.

#include - File inclusion directive

Syntax:

#include "filename"

This is the familiar #include directive from C. The effect of the include directive is to replace itself with the contents of the referenced file. The filename must appear in double quotes ", and must exist somewhere in the filesystem - an include directive referencing a nonexistent file is a preprocessor error. Nor can the include directive make use of the include_path setting, as it is part of the runtime and not available to the preprocessor. To include potentially missing files or files referenced by the include_path, use the runtime include() function, which serves a similar purpose to the include directive but is considerably less efficient.

The search path for included files is taken from several sources:

  1. If the filename resolves to an absolute path, no other paths are searched.
  2. The directory of the current source file, i.e. __DIR__.
  3. Any -I options passed to the qmd interpreter by whatever means (including command line options and shebang).
  4. The default_include_path built into the qmd interpreter.
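
A possible reading of this list, sketched in Python: the candidate paths are produced in the order given above. The function name is hypothetical, POSIX-style paths are used for simplicity, and treating default_include_path as a list of directories searched in listed order is an assumption.

```python
import posixpath

def include_candidates(filename, current_dir, i_options, default_include_path):
    """Return the paths to try, in order, for #include "filename"."""
    if posixpath.isabs(filename):
        # An absolute path is used as-is; no other paths are searched.
        return [filename]
    # __DIR__ first, then -I directories, then the built-in default path.
    search_dirs = [current_dir, *i_options, *default_include_path]
    return [posixpath.join(d, filename) for d in search_dirs]

candidates = include_candidates(
    "lib/util.qmd", "/src", ["/opt/inc"], ["/usr/share/qmd"]
)
```

The first candidate that exists would be the file included; if none exists, the directive is an error.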

There is no <> form of inclusion as in C; all include paths are "user" paths.

If a filename resolves to the path of a file that has already been included, #include will include it again. To avoid this behavior, use #import instead.

The parameter to #include is subject to macro expansion.
#include can be created by the __pp_include macro.

#import - Exclusive file inclusion directive

Syntax:

#import "filename"

#import is exactly identical to #include, with one important exception: If the filename given to #import resolves to a path that has already been imported by #import, it will not be imported again. No error or warning is emitted. The function import() is the runtime counterpart to #import.

  • If a file is first included by #include and then imported by #import, it is the same as including it twice.
  • If a file is first imported by #import and then included by #include, it is the same as including it twice.

In other words, #include ignores the "already imported" list, and #import does not check whether a file has already been included by #include.
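
The interaction between the two directives can be modelled in a few lines of Python. The class is purely illustrative (its names are hypothetical); it records each substitution instead of reading files.

```python
class Inclusions:
    """Models which files get substituted by #include and #import."""

    def __init__(self):
        self.imported = set()  # paths already brought in by #import
        self.emitted = []      # every path whose contents were substituted

    def include(self, path):
        # #include never consults or updates the "already imported" list.
        self.emitted.append(path)

    def import_(self, path):
        # #import is silently skipped if the path was imported before.
        if path in self.imported:
            return
        self.imported.add(path)
        self.emitted.append(path)

state = Inclusions()
state.include("a.qmd")   # substituted
state.import_("a.qmd")   # substituted again: #include left no record
state.import_("a.qmd")   # skipped: already imported
state.include("a.qmd")   # substituted again: #include ignores the list
```

After these four directives the file has been substituted three times.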

The verbs "include" and "import" are not used interchangeably; their usage is consistent throughout the documentation.

It is considered best practice to use #include where it is known to be safe to do so. #import is not significantly slower for most cases, but can carry a small penalty if the number of imported files exceeds 20 or so.

The parameter to #import is subject to macro expansion.
#import can be created by the __pp_import macro.

#define - Macro definition directive

The directive

Syntax:

#define constant [expansion]
#define macro(params) [expansion]

The #define directive is intended to be as similar as possible to its C counterpart. It supports the same semantics and expansions, including the stringification # and token pasting ## operators and the __VA_ARGS__ GNU extension for vararg macro processing (as well as the token-pasted form for swallowing extra commas). Further documentation of the exact semantics can be found in the GNU CPP manual's section on Macros, as this is the standard on which the directive is based in QMD. The directive may be safely assumed to work as described therein, save for any exceptions documented here.

Note: QMD makes no use of any code from the GNU implementation of CPP, and makes no claim to compatibility with GNU CPP. QMD's only reference to GNU CPP is in the similarity of macro semantics.

See the #if directive for information on the semantics of the defined() macro function.

#define parameters are subject to macro expansion, but as in C, this is true only of the expansion. The macro name is never subject to expansion.
#define can be created with the __pp_define macro. See below for more information and discussions of recursion issues.

Meta-macros - __pp_*

In C-like languages which make use of the C preprocessor, it is not possible to do something like this:

#define DEFINE_WITH_PREFIX_AND_VALUE(name, value, prefix) \
  #ifndef name                  \
  #define prefix ## name value  \
  #else                         \
  #define prefix ## name name   \
  #endif

DEFINE_WITH_PREFIX_AND_VALUE(ENOENT, 2, ERRNO_)
// The desired effect is this expansion:
#ifndef ENOENT
# define ERRNO_ENOENT 2
#else
# define ERRNO_ENOENT ENOENT
#endif
// The actual result is a syntax error from CPP.

QMD's preprocessor provides a set of macros which make this possible:

#!qmd

#define DEFINE_WITH_PREFIX_AND_VALUE(name, value, prefix) \
  __pp_if(!defined(name))                \
      __pp_define(prefix ## name, value) \
  __pp_else()                            \
      __pp_define(prefix ## name, name)  \
  __pp_endif()

// This macro will produce exactly the desired effect in the previous fragment, with the exception that QMD has no #ifndef statement and thus defined() is used instead.
// No, you can't do __pp_define(prefix ## name, defined(name) ? name : value). Don't get greedy.

The __pp_* macros expand exactly as if they were macros which were defined to their respective directives, including argument prescan and tokenization. They are processed at the time of use, adding an extra scan to macro expansions for each time they appear.

The __pp_* macros are valid only within macro bodies. They are not valid functions in compiled code and can not be used with other directives.

Conceptually, and potentially in implementation, any #-syntax for a preprocessor directive can be replaced with the equivalent __pp_ call, e.g.:

#!qmd

// This is the same as the macro above:
__pp_define(DEFINE_WITH_PREFIX_AND_VALUE(name, value, prefix), \
  __pp_if(!defined(name))                \
      __pp_define(prefix ## name, value) \
  __pp_else()                            \
      __pp_define(prefix ## name, name)  \
  __pp_endif()                           \
); // The trailing semicolon is ignored, allowing for a natural C-like syntax

Predefined constants

The following constants are predefined by the preprocessor during pass two. There are several more macros defined by the compiler which are not listed here.

__PATH__
Expands to the full absolute path of the file currently being processed, as modified by any preceding #line directives. The path is a single-quoted constant string, e.g. '/path/to/source/main.qmd'.
__FILE__
Expands to the name of the file currently being processed. Exactly equivalent to basename(__PATH__).
__DIR__
Expands to the full absolute path of the directory containing the file currently being processed. Exactly equivalent to dirname(__PATH__).
__LINE__
Expands to the number of the line on which the macro appears, as modified by any preceding #line directives, as an unsigned integer constant.
__TIME__
Expands to the date and time at which the preprocessor is running, as a single-quoted constant string in ISO 8601 format: 'YYYY-MM-DDTHH:NN:SS±ZZZZ', where Y is year, M is month, D is day, H is hour, N is minute, S is second, and Z is the timezone offset from UTC in hours and minutes.
__QMD__
Always expands to boolean TRUE.
__QMD_VERSION__
__QMD_VERSION_MAJOR__
__QMD_VERSION_MINOR__
__QMD_VERSION_PATCH__
Expands to the numeric, major, minor, and patch versions of the running version of QMD, respectively, as unsigned integer constants.
__QMD_VERSION_PRETTY__
Expands to a single-quoted constant string describing the running version of QMD in the form '1.0.0'.
__QMD_PATH__
Expands to the absolute path of the running QMD interpreter process as a single-quoted constant string. If QMD is not being run as an independent process, expands to the empty string.
__COUNTER__
Expands to a monotonically increasing value as an unsigned integer constant; on each use, its value will increase by one.
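
To make the relationship between the path constants and the __TIME__ layout concrete, here is a short Python sketch. The function names are hypothetical, POSIX-style paths are assumed, and Python's basename, dirname, and strftime stand in for whatever QMD uses internally.

```python
import posixpath
from datetime import datetime, timedelta, timezone

def path_constants(path):
    """Derive __FILE__ and __DIR__ from __PATH__ (POSIX-style paths assumed)."""
    return {
        "__PATH__": path,
        "__FILE__": posixpath.basename(path),
        "__DIR__": posixpath.dirname(path),
    }

def format_time(moment):
    """Format a timezone-aware datetime in the __TIME__ layout."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")

constants = path_constants("/path/to/source/main.qmd")
stamp = format_time(
    datetime(2009, 12, 15, 21, 13, 14, tzinfo=timezone(timedelta(hours=-5)))
)
```

Here constants["__FILE__"] is "main.qmd", constants["__DIR__"] is "/path/to/source", and stamp is "2009-12-15T21:13:14-0500".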

#undef - Macro un-definition directive

Syntax:

#undef macro

#undef is very simple: It undoes the effect of a #define. A macro undefined by #undef no longer expands to anything; expansions of the macro that occur before the #undef are not affected. #undef only works on macros created by a #define processed by the preprocessor; it does not undeclare variables (use the unset() function for that) or functions.

The parameter to #undef is not subject to macro expansion.
#undef can be created with the __pp_undef macro.

#if/#elif/#else/#endif - Conditional preprocessing directives

The directives

Syntax:

#if conditional
#elif conditional
#else
#endif [conditional]

The conditional preprocessing directives operate much as they do in C, following nearly identical expression semantics. As with #define, these directives can be assumed to operate as specified by the GNU CPP manual's section on Conditionals unless otherwise noted here.

QMD does not support the #ifdef and #ifndef directives from C; use #if defined(name) and #if !defined(name) instead. The form #if defined name is also unsupported.

The #endif directive is allowed to take an uncommented parameter in QMD, but this parameter is entirely ignored, regardless of the original conditional. The only restriction is that the parameter(s) to #endif must parse as valid tokens (i.e. no unbalanced quotes, etc.).

The parameters to #if and #elif are subject to macro expansion. The parameter to #endif is entirely ignored and thus not subject to expansion.
All conditional directives may be generated with the appropriate __pp_* macros. It is an error if a __pp_* macro appears within the expression of a conditional directive, before or after expansion.

Operators in preprocessor conditionals

All arithmetic, bitwise, comparison, and logical operators are supported within preprocessor conditionals, as well as the CONCAT . string operator. They all have the same precedence and associativity as their QMD counterparts. See LanguageGrammar for more details.

The defined() function is considered an operator within preprocessor conditionals. It has left associativity and no precedence.

Assignment, string (except CONCAT .), conditional, and index operators are not supported.

All division in preprocessor conditionals is done with unsigned integers.

Literal values in preprocessor conditionals

Any valid QMD literal scalar constant may appear within a preprocessor conditional. NULL is cast to boolean FALSE.

#warning/#error - Signaling the user

Syntax:

#warning message
#error message

#warning and #error emit messages from the preprocessor for the user to see. QMD's handling of these messages is defined by standard error handling semantics. A #error halts preprocessing with a fatal error, while #warning only emits a message at the warning level.

The string message given to either directive is subject to standard macro expansion; the final message shown to the user is a concatenation of all arguments given to the directive. All arguments must be valid tokens; to escape text that may not parse as valid, surround it in quotes. Quotes will be stripped in the final display; to display a literal quoted string, surround it in quotes as well. The standard single- and double-quoted string escapes are recognized. Variable interpolation does not take place.

Examples

#warning This is something I "can't" do safely.
// Warning: This is something I can't do safely.
#define FOO 5
#error "It's impossible to" use the "FOO" macro this way when its value is FOO .
// Error: It's impossible to use the FOO macro this way when its value is 5.
#error "You said \"Do it right,\" but I can't when FOO is " FOO .
// Error: You said "Do it right," but I can't when FOO is 5.
#error "I forgot to put a space between the macro and the period when I said I wanted the value of " FOO.
// Error: I forgot to put a space between the macro and the period when I said I wanted the value of FOO.
#error "This message needs to span two lines.\nIt doesn't make much sense any other way."
// Error: This message needs to span two lines.
// It doesn't make much sense any other way.