reflex.cpp File Reference

updated Mon Apr 10 2017 by Robert van Engelen
 
RE/flex scanner generator replacement for Flex/Lex.

#include "reflex.h"

Macros

#define WITH_BOOST_PARTIAL_MATCH_BUG
 Work around the Boost.Regex partial_match bug by forcing the generated scanner to buffer all input.
 

Functions

int lower (int c)
 Convert to lower case.
 
static std::string file_ext (std::string &name, const char *ext)
 Add file extension if not present.
 
int main (int argc, char **argv)
 The main program instantiates the Reflex class and runs Reflex::main(argc, argv).
 

Variables

static const char * options_table []
 Table with command-line reflex options and lex specification %options.
 
static const Reflex::Library library_table []
 Table with regex library properties.
 
static const char * newline = "\n"
 

Detailed Description

RE/flex scanner generator replacement for Flex/Lex.

Author
Robert van Engelen - engelen@genivia.com

Macro Definition Documentation

#define WITH_BOOST_PARTIAL_MATCH_BUG

Work around the Boost.Regex partial_match bug by forcing the generated scanner to buffer all input.

Function Documentation

static std::string file_ext (std::string &name, const char *ext)

Add file extension if not present.

Returns
copy of file name string with extension ext
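
The behavior described above might look like the following minimal sketch. It is not the code in reflex.cpp: the name file_ext_sketch is hypothetical, the parameter is taken by const reference rather than by non-const reference, and "not present" is assumed to mean that the last path component has no extension at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for file_ext(); not the code in reflex.cpp.
// Returns a copy of `name`, appending ".<ext>" only when the last path
// component has no extension yet (an assumption about "if not present").
static std::string file_ext_sketch(const std::string& name, const char *ext)
{
  assert(ext != nullptr);
  // A dot counts as an extension separator only if it comes after the last
  // path separator, so "dir.d/file" is treated as having no extension.
  std::string::size_type slash = name.find_last_of("/\\");
  std::string::size_type dot = name.find_last_of('.');
  bool has_ext = dot != std::string::npos &&
                 (slash == std::string::npos || dot > slash);
  if (has_ext)
    return name;
  return name + "." + ext;
}
```

Under these assumptions, a name that already carries any extension is returned unchanged, and the caller passes the extension without a leading dot.
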

inline int lower (int c)

Convert to lower case.

Returns
lower case char.

int main (int argc, char **argv)

The main program instantiates the Reflex class and runs Reflex::main(argc, argv).

Variable Documentation

static const Reflex::Library library_table[]

Table with regex library properties.

This table is extensible and new regex libraries may be added. Each regex library is described by:

  • a unique name that is used for specifying the matcher=NAME option
  • the header file to be included
  • the pattern type or class used by the matcher class
  • the matcher class
  • the regex library signature
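
The five properties above could be held in a record such as the one sketched here. The type LibraryEntry, the table example_table, the function find_library, and every value in the rows are hypothetical placeholders for illustration; the real Reflex::Library type is declared in reflex.h and may be laid out differently.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical entry layout with the five listed properties; the real
// Reflex::Library type may differ.
struct LibraryEntry {
  const char *name;       // value accepted by the matcher=NAME option
  const char *header;     // header file to be included
  const char *pattern;    // pattern type or class used by the matcher class
  const char *matcher;    // matcher class
  const char *signature;  // regex library signature string
};

// Placeholder rows for illustration only; a null name ends the table.
static const LibraryEntry example_table[] = {
  { "example", "example/matcher.h", "example::Pattern", "example::Matcher", "imsx#=:abdfnrstvw?." },
  { "basic",   "basic/matcher.h",   "const char*",      "basic::Matcher",   "dnrstw" },
  { nullptr, nullptr, nullptr, nullptr, nullptr }
};

// Return the entry whose name equals `name`, or nullptr if there is none.
static const LibraryEntry *find_library(const LibraryEntry *table, const char *name)
{
  assert(table != nullptr && name != nullptr);
  for (const LibraryEntry *entry = table; entry->name != nullptr; ++entry)
    if (std::strcmp(entry->name, name) == 0)
      return entry;
  return nullptr;
}
```

With a layout like this, adding a regex library amounts to adding one row before the terminating entry.
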

A regex library signature is a string of the form "decls:escapes?+.", see reflex::convert.

The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:

  • non-capturing group (?:...) is supported
  • one or more of the characters "imsx" specify which (?imsx:...) modifiers are supported
  • # specifies that (?#...) comments are supported
  • = specifies that (?=...) lookahead is supported
  • < specifies that (?<...) lookbehind is supported
  • ! specifies that (?!...) negative lookahead and (?<!...) negative lookbehind are supported
  • ^ specifies that (?^...) negative (reflex) patterns are supported

The "escapes" characters specify which standard escapes are supported:

  • a for \a (BEL U+0007)
  • b for \b (BS U+0008) in brackets [\b] only AND the \b word boundary
  • c for \cX control character specified by X modulo 32
  • d for \d ASCII digit [0-9]
  • e for \e ESC U+001B
  • f for \f FF U+000C
  • h for \h ASCII blank [ \t] (SP U+0020 or TAB U+0009)
  • i for \i reflex indent boundary
  • j for \j reflex dedent boundary
  • l for \l ASCII lower case letter [a-z]
  • n for \n LF U+000A
  • p for \p{C} ASCII POSIX character class specified by C
  • r for \r CR U+000D
  • s for \s space (SP, TAB, LF, VT, FF, or CR)
  • t for \t TAB U+0009
  • u for \u ASCII upper case letter [A-Z] (when not followed by {XXXX})
  • v for \v VT U+000B
  • w for \w ASCII word-like character [0-9A-Z_a-z]
  • x for \xXX 8-bit character encoding in hexadecimal
  • y for \y word boundary
  • z for \z end of input anchor
  • 0 for \0nnn 8-bit character encoding in octal (a leading 0 is required)
  • ` for \` begin of input anchor
  • ' for \' end of input anchor
  • < for \< left word boundary
  • > for \> right word boundary
  • A for \A begin of input anchor
  • B for \B non-word boundary
  • D for \D ASCII non-digit [^0-9]
  • H for \H ASCII non-blank [^ \t]
  • L for \L ASCII non-lower case letter [^a-z]
  • P for \P{C} ASCII POSIX inverse character class specified by C
  • Q for \Q...\E quotations
  • S for \S ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
  • U for \U ASCII non-upper case letter [^A-Z]
  • W for \W ASCII non-word-like character [^0-9A-Z_a-z]
  • Z for \Z end of input anchor, before the final line break

The optional "?+" part specifies lazy and possessive quantifier support:

  • ? lazy quantifiers for repeats are supported
  • + possessive quantifiers for repeats are supported

The optional "." (dot) specifies that dot matches any character except newline.
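
To make the "decls:escapes?+." layout concrete, here is a minimal sketch of splitting a signature string into its parts. The Signature type and both functions are hypothetical and are not part of the RE/flex API; the sketch assumes that ':' appears only as the separator after the decls part and that the optional suffix characters occur at the very end in the order "?+.".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type; not part of the RE/flex API.
struct Signature {
  std::string decls;        // characters before ':' (empty if there is no "decls:" part)
  std::string escapes;      // characters naming the supported escapes
  bool lazy = false;        // '?' given: lazy quantifiers supported
  bool possessive = false;  // '+' given: possessive quantifiers supported
  bool dot = false;         // '.' given: dot matches any character except newline
};

// Split a signature of the form "decls:escapes?+." into its parts.
static Signature parse_signature(const char *text)
{
  assert(text != nullptr);
  Signature sig;
  std::string rest(text);
  // Everything up to the first ':' is the optional decls part.
  std::string::size_type colon = rest.find(':');
  if (colon != std::string::npos) {
    sig.decls = rest.substr(0, colon);
    rest.erase(0, colon + 1);
  }
  // Peel the optional suffix from the end, in reverse of the "?+." order.
  if (!rest.empty() && rest.back() == '.') { sig.dot = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '+') { sig.possessive = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '?') { sig.lazy = true; rest.pop_back(); }
  sig.escapes = rest;
  return sig;
}

// True if the escape named by character `c` is listed in the signature.
static bool supports_escape(const Signature& sig, char c)
{
  return sig.escapes.find(c) != std::string::npos;
}
```

For instance, parsing the made-up signature "imsx#=:abdn?+." would yield decls "imsx#=", escapes "abdn", and all three suffix flags set.
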

static const char* newline = "\n"

static const char* options_table[]

Table with command-line reflex options and lex specification %options.

The table consists of option names with hyphens replaced by underscores.
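
The naming convention can be illustrated with a small sketch that maps a command-line spelling to the underscore spelling. The function option_key is hypothetical and is not taken from reflex.cpp; stripping the leading dashes is an added assumption, and the option shown in the usage note is only an example input.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: convert a command-line option spelling to the
// underscore form used for table lookups.
static std::string option_key(const char *option)
{
  assert(option != nullptr);
  std::string key(option);
  // Drop any leading dashes (an assumption, e.g. for "--header-file").
  key.erase(0, key.find_first_not_of('-'));
  // Replace the remaining hyphens by underscores.
  std::replace(key.begin(), key.end(), '-', '_');
  return key;
}
```

With this sketch, option_key("--header-file") gives "header_file", which could then be compared against the table entries.
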