reflex Namespace Reference

updated Mon Apr 10 2017 by Robert van Engelen
 

Namespaces

 convert_flag
 
 Posix
 
 Unicode
 

Classes

class  AbstractLexer
 The abstract lexer class template that is the abstract root class of all reflex-generated scanners. More...
 
class  AbstractMatcher
 The abstract matcher base class template defines an interface for all pattern matcher engines. More...
 
class  Bits
 RE/flex Bits class for dynamic bit vectors. More...
 
class  BoostMatcher
 Boost matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the Boost::regex library. More...
 
class  BoostPerlMatcher
 Boost matcher engine class, extends reflex::BoostMatcher for Boost Perl regex matching. More...
 
class  BoostPosixMatcher
 Boost matcher engine class, extends reflex::BoostMatcher for Boost POSIX regex matching. More...
 
class  FlexLexer
 Flex-compatible FlexLexer abstract base class template derived from reflex::AbstractMatcher for the reflex-generated yyFlexLexer scanner class. More...
 
class  Input
 Input character sequence class for unified access to sources of input text. More...
 
struct  lazy_intersection
 Intersection of two ordered sets, with an iterator to get elements lazily. More...
 
struct  lazy_union
 Union of two ordered sets, with an iterator to get elements lazily. More...
 
class  Matcher
 RE/flex matcher engine class, implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators. More...
 
class  ORanges
 RE/flex ORanges (open-ended, ordinal value range) template class. More...
 
class  Pattern
 Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine. More...
 
class  PatternMatcher
 The pattern matcher class template extends abstract matcher base class. More...
 
struct  range_compare
 Functor to order ranges in the reflex::Ranges set container. More...
 
class  Ranges
 RE/flex Ranges template class. More...
 
class  regex_error
 Regex syntax error exceptions. More...
 
class  StdEcmaMatcher
 std matcher engine class, extends reflex::StdMatcher for ECMA std::regex::ECMAScript syntax and regex matching. More...
 
class  StdMatcher
 std matcher engine class implements reflex::PatternMatcher pattern matching interface with scan, find, split functors and iterators, using the C++11 std::regex library. More...
 
class  StdPosixMatcher
 std matcher engine class, extends reflex::StdMatcher for POSIX ERE std::regex::awk syntax and regex matching. More...
 
struct  TypeOp
 TypeOp<T>::Type = T, TypeOp<T>::ConstType = const T, TypeOp<T>::NonConstType = non-const T. More...
 
struct  TypeOp< const T >
 Template specialization of reflex::TypeOp. More...
 

Typedefs

typedef int convert_flag_type
 Conversion flags for reflex::convert. More...
 
typedef int regex_error_type
 Regex syntax error exception error code. More...
 
typedef timeval timer_type
 

Functions

int isword (int c)
 Check ASCII word-like character [A-Za-z0-9_]. More...
 
std::string convert (const char *pattern, const char *signature=NULL, convert_flag_type flags=convert_flag::none, const std::map< std::string, std::string > *macros=NULL) throw (regex_error)
 Returns the converted regex string given a regex library signature and conversion flags, throws regex_error. More...
 
std::string convert (const std::string &pattern, const char *signature=NULL, convert_flag_type flags=convert_flag::none, const std::map< std::string, std::string > *macros=NULL) throw (regex_error)
 
template<typename S1 , typename S2 >
bool is_disjoint (const S1 &s1, const S2 &s2)
 Check if sets s1 and s2 are disjoint. More...
 
template<typename T , typename S >
bool is_in_set (const T &x, const S &s)
 Check if value x is in set s. More...
 
template<typename S1 , typename S2 >
bool is_subset (const S1 &s1, const S2 &s2)
 Check if set s1 is a subset of set s2. More...
 
template<typename S1 , typename S2 >
void set_insert (S1 &s1, const S2 &s2)
 Insert set s2 into set s1. More...
 
template<typename S1 , typename S2 >
void set_delete (S1 &s1, const S2 &s2)
 Delete elements of set s2 from set s1. More...
 
void timer_start (timer_type &t)
 Start timer. More...
 
float timer_elapsed (timer_type &t)
 Return the elapsed time in milliseconds (ms), with microsecond precision, since the last call. The value wraps if the elapsed time exceeds 1 minute. More...
 
std::string latin1 (int a, int b, int esc= 'x', bool brackets=true)
 Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern. More...
 
std::string utf8 (int a, int b, int esc= 'x', const char *par="(", bool strict=true)
 Convert a UCS-4 range [a,b] to a UTF-8 regex pattern. More...
 
size_t utf8 (int c, char *s)
 Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED. More...
 
int utf8 (const char *s, const char **r=NULL)
 Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes). More...
 

Typedef Documentation

typedef int reflex::convert_flag_type

Conversion flags for reflex::convert.

typedef int reflex::regex_error_type

Regex syntax error exception error code.

typedef timeval reflex::timer_type

Function Documentation

std::string reflex::convert ( const char *  pattern,
const char *  signature = NULL,
convert_flag_type  flags = convert_flag::none,
const std::map< std::string, std::string > *  macros = NULL 
) throw (regex_error)

Returns the converted regex string given a regex library signature and conversion flags, throws regex_error.

A regex library signature is a string of the form "decls:escapes?+.".

The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:

  • non-capturing group (?:...) is supported
  • one or all of "imsx" specify which (?ismx:...) modifiers are supported
  • # specifies that (?#...) comments are supported
  • = specifies that (?=...) lookahead is supported
  • < specifies that (?<...) lookbehind is supported
  • ! specifies that (?!...) negative lookahead and (?<!...) negative lookbehind are supported
  • ^ specifies that (?^...) negative (reflex) patterns are supported

The "escapes" characters specify which standard escapes are supported:

  • a for \a (BEL U+0007)
  • b for \b (BS U+0008) in brackets [\b] only AND the \b word boundary
  • c for \cX control character specified by X modulo 32
  • d for \d ASCII digit [0-9]
  • e for \e ESC U+001B
  • f for \f FF U+000C
  • h for \h ASCII blank [ \t] (SP U+0020 or TAB U+0009)
  • i for \i reflex indent boundary
  • j for \j reflex dedent boundary
  • l for \l ASCII lower case letter [a-z]
  • n for \n LF U+000A
  • p for \p{C} ASCII POSIX character class specified by C
  • r for \r CR U+000D
  • s for \s space (SP, TAB, LF, VT, FF, or CR)
  • t for \t TAB U+0009
  • u for \u ASCII upper case letter [A-Z] (when not followed by {XXXX})
  • v for \v VT U+000B
  • w for \w ASCII word-like character [0-9A-Z_a-z]
  • x for \xXX 8-bit character encoding in hexadecimal
  • y for \y word boundary
  • z for \z end of input anchor
  • 0 for \0nnn 8-bit character encoding in octal (requires a leading 0)
  • ` for \` begin of input anchor
  • ' for \' end of input anchor
  • < for \< left word boundary
  • > for \> right word boundary
  • A for \A begin of input anchor
  • B for \B non-word boundary
  • D for \D ASCII non-digit [^0-9]
  • H for \H ASCII non-blank [^ \t]
  • L for \L ASCII non-lower case letter [^a-z]
  • P for \P{C} ASCII POSIX inverse character class specified by C
  • Q for \Q...\E quotations
  • S for \S ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
  • U for \U ASCII non-upper case letter [^A-Z]
  • W for \W ASCII non-word-like character [^0-9A-Z_a-z]
  • Z for \Z end of input anchor, before the final line break

The optional "?+" part specifies lazy and possessive quantifier support:

  • ? lazy quantifiers for repeats are supported
  • + possessive quantifiers for repeats are supported

The optional "." (dot) specifies that dot matches any character except newline.

Parameters
    pattern    regex string pattern to convert
    signature  regex library signature
    flags      conversion flags
    macros     {name} macros to expand
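
To illustrate the signature layout described above, the following sketch splits a signature string into its "decls", "escapes", "?+" and "." parts. It is not the parser used by reflex::convert. The Signature struct, the parse_signature name and the example signature in the usage note are hypothetical, and the sketch assumes that the first ':' ends the "decls" part.

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the parts of a "decls:escapes?+." signature.
struct Signature {
  std::string decls;        // modifiers and (?...) constructs, may be empty
  std::string escapes;      // supported escape letters
  bool lazy = false;        // trailing '?' present
  bool possessive = false;  // trailing '+' present
  bool dot = false;         // trailing '.' present
};

// Sketch only: assumes the first ':' separates "decls" from the rest and
// that the optional "?", "+" and "." appear in that order at the very end.
Signature parse_signature(const std::string& sig) {
  Signature r;
  std::string rest = sig;
  std::string::size_type colon = rest.find(':');
  if (colon != std::string::npos) {
    r.decls = rest.substr(0, colon);
    rest.erase(0, colon + 1);
  }
  if (!rest.empty() && rest.back() == '.') { r.dot = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '+') { r.possessive = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '?') { r.lazy = true; rest.pop_back(); }
  r.escapes = rest;
  return r;
}
```

With the made-up signature "imsx#=:abdnrstw?+.", parse_signature yields decls "imsx#=" and escapes "abdnrstw", with lazy, possessive and dot all set.
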
std::string reflex::convert ( const std::string &  pattern,
const char *  signature = NULL,
convert_flag_type  flags = convert_flag::none,
const std::map< std::string, std::string > *  macros = NULL 
) throw (regex_error)
inline

template<typename S1 , typename S2 >
bool reflex::is_disjoint ( const S1 &  s1,
const S2 &  s2 
)

Check if sets s1 and s2 are disjoint.

Returns
    true if s1 and s2 have no element in common, false otherwise

template<typename T , typename S >
bool reflex::is_in_set ( const T &  x,
const S &  s 
)
inline

Check if value x is in set s.

Returns
    true if x is in s, false otherwise

template<typename S1 , typename S2 >
bool reflex::is_subset ( const S1 &  s1,
const S2 &  s2 
)

Check if set s1 is a subset of set s2.

Returns
    true if every element of s1 is also in s2, false otherwise
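
The three set predicates above can be sketched for ordered standard containers such as std::set. This is a minimal illustration, not the reflex implementation: the *_sketch names are hypothetical, and the sketch assumes both containers iterate in ascending order and, for is_in_set_sketch, that the set provides find().

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// True if value x is an element of s (assumes S provides find()).
template<typename T, typename S>
bool is_in_set_sketch(const T& x, const S& s) {
  return s.find(x) != s.end();
}

// True if the ordered sets share no element: walk both in step and
// stop at the first common value.
template<typename S1, typename S2>
bool is_disjoint_sketch(const S1& s1, const S2& s2) {
  auto i = s1.begin();
  auto j = s2.begin();
  while (i != s1.end() && j != s2.end()) {
    if (*i < *j)
      ++i;
    else if (*j < *i)
      ++j;
    else
      return false;
  }
  return true;
}

// True if every element of s1 also occurs in s2.
template<typename S1, typename S2>
bool is_subset_sketch(const S1& s1, const S2& s2) {
  return std::includes(s2.begin(), s2.end(), s1.begin(), s1.end());
}
```

For example, with a = {1, 3, 5}, b = {2, 4} and c = {1, 2, 3, 4, 5}, a and b are disjoint, and a is a subset of c but not the other way around.
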
int reflex::isword ( int  c)
inline

Check ASCII word-like character [A-Za-z0-9_].

Returns
    nonzero if argument c is in [A-Za-z0-9_], zero otherwise.
Parameters
    c  character to check
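
A check equivalent to the documented behavior can be written with plain range comparisons. The isword_sketch name is hypothetical and the body is an assumption about one way to do it, not the library source.

```cpp
#include <cassert>

// Nonzero if c is an ASCII letter, digit or underscore, zero otherwise.
// Range comparisons keep the check independent of the current locale.
inline int isword_sketch(int c) {
  return (c >= 'A' && c <= 'Z') ||
         (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '_';
}
```

Values outside the ASCII range, such as 0xE9, yield zero in this sketch.
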
std::string reflex::latin1 ( int  a,
int  b,
int  esc = 'x',
bool  brackets = true 
)

Convert an 8-bit ASCII + Latin-1 Supplement range [a,b] to a regex pattern.

Returns
    regex string to match the 8-bit character range [a,b].
Parameters
    a         lower bound of UCS range
    b         upper bound of UCS range
    esc       escape char: 'x' for hex, or '0' or '\0' for octal \0nnn
    brackets  place the pattern in [ brackets ]
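
The following sketch shows one way an 8-bit range could be turned into a regex with hexadecimal escapes. The latin1_sketch name and the exact output format are assumptions; the string returned by reflex::latin1 may differ, and the octal escape option is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Sketch: render the 8-bit range [a,b] with \xXX escapes, optionally
// wrapped in brackets. Only the hex form is covered here.
std::string latin1_sketch(int a, int b, bool brackets = true) {
  assert(0 <= a && a <= b && b <= 0xFF);
  char buf[16];
  if (a == b)
    std::snprintf(buf, sizeof buf, "\\x%02x", a);
  else
    std::snprintf(buf, sizeof buf, "\\x%02x-\\x%02x", a, b);
  std::string s(buf);
  return brackets ? "[" + s + "]" : s;
}
```

In this sketch the range 0x41 to 0x5A becomes the text [\x41-\x5a], and a single value without brackets becomes one escape such as \x0a.
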
template<typename S1 , typename S2 >
void reflex::set_delete ( S1 &  s1,
const S2 &  s2 
)

Delete elements of set s2 from set s1.

template<typename S1 , typename S2 >
void reflex::set_insert ( S1 &  s1,
const S2 &  s2 
)
inline

Insert set s2 into set s1.
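
For standard containers such as std::set, inserting and deleting one set's elements in another can be sketched as follows. The *_sketch names are hypothetical, and the library's own versions may be specialized for its Ranges containers.

```cpp
#include <cassert>
#include <set>

// Add every element of s2 to s1.
template<typename S1, typename S2>
void set_insert_sketch(S1& s1, const S2& s2) {
  s1.insert(s2.begin(), s2.end());
}

// Remove from s1 every element that occurs in s2.
template<typename S1, typename S2>
void set_delete_sketch(S1& s1, const S2& s2) {
  for (const auto& x : s2)
    s1.erase(x);
}
```

Starting from s = {1, 2} and t = {2, 3}, inserting t gives {1, 2, 3}, and deleting t afterwards leaves {1}.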

float reflex::timer_elapsed ( timer_type &  t)
inline

Return the elapsed time in milliseconds (ms), with microsecond precision, since the last call. The value wraps if the elapsed time exceeds 1 minute.

Parameters
    t  timer to be updated

void reflex::timer_start ( timer_type &  t)
inline

Start timer.

Parameters
    t  timer to be initialized
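
Since timer_type is a timeval, a timer pair of this kind can be sketched on POSIX systems with gettimeofday(). The *_sketch names are hypothetical, and reducing the elapsed time modulo one minute is only an assumption about how the documented wrap-around could come about.

```cpp
#include <cassert>
#include <sys/time.h>

typedef timeval timer_type_sketch;

// Record the current time in t.
inline void timer_start_sketch(timer_type_sketch& t) {
  gettimeofday(&t, nullptr);
}

// Return the milliseconds elapsed since t was last set, then update t.
// The microsecond count is reduced modulo one minute, so longer
// intervals wrap around.
inline float timer_elapsed_sketch(timer_type_sketch& t) {
  timeval now;
  gettimeofday(&now, nullptr);
  long long us = 1000000LL * (now.tv_sec - t.tv_sec) + (now.tv_usec - t.tv_usec);
  if (us < 0)
    us = 0;  // guard against a wall clock that stepped backwards
  us %= 60000000LL;
  t = now;
  return static_cast<float>(us) / 1000.0f;
}
```

Because each call updates t, consecutive calls to timer_elapsed_sketch measure the interval between them rather than the time since timer_start_sketch.
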
std::string reflex::utf8 ( int  a,
int  b,
int  esc = 'x',
const char *  par = "(",
bool  strict = true 
)

Convert a UCS-4 range [a,b] to a UTF-8 regex pattern.

Returns
    regex string to match the UCS range encoded in UTF-8.
Parameters
    a       lower bound of UCS range
    b       upper bound of UCS range
    esc     escape char: 'x' for hex, or '0' or '\0' for octal \0nnn
    par     capturing "(" or non-capturing "(?:" parenthesis
    strict  returned regex is strict UTF-8 (true) or permissive and lean UTF-8 (false)
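
To show the idea behind converting a code point range to a byte-level UTF-8 pattern, the sketch below handles only ranges inside the two-byte region U+0080 to U+07FF and always uses hex escapes. The function names and the output format are hypothetical; reflex::utf8 covers the full UCS-4 range and may produce a different, more compact regex.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render one byte as a \xXX escape.
static std::string hex_byte(int byte) {
  char buf[8];
  std::snprintf(buf, sizeof buf, "\\x%02x", byte);
  return buf;
}

// Render a byte range as a single escape or a bracket expression.
static std::string byte_class(int lo, int hi) {
  return lo == hi ? hex_byte(lo) : "[" + hex_byte(lo) + "-" + hex_byte(hi) + "]";
}

// Sketch for U+0080..U+07FF only: split [a,b] by lead byte so that each
// alternative pairs a lead byte (or lead byte range) with the
// continuation bytes that are valid for it.
std::string utf8_range2_sketch(int a, int b, const char *par = "(") {
  assert(0x80 <= a && a <= b && b <= 0x7FF);
  int la = 0xC0 | (a >> 6), lb = 0xC0 | (b >> 6);      // lead bytes
  int ta = 0x80 | (a & 0x3F), tb = 0x80 | (b & 0x3F);  // trail bytes
  if (la == lb)
    return hex_byte(la) + byte_class(ta, tb);
  std::string re = std::string(par) + hex_byte(la) + byte_class(ta, 0xBF);
  if (la + 1 <= lb - 1)
    re += "|" + byte_class(la + 1, lb - 1) + byte_class(0x80, 0xBF);
  re += "|" + hex_byte(lb) + byte_class(0x80, tb) + ")";
  return re;
}
```

For the range U+00E0 to U+00FF both bounds share the lead byte 0xC3, so the sketch returns \xc3[\xa0-\xbf] without any grouping.
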
size_t reflex::utf8 ( int  c,
char *  s 
)
inline

Convert UCS-4 to UTF-8, fills with REFLEX_NONCHAR_UTF8 when out of range, or unrestricted UTF-8 with WITH_UTF8_UNRESTRICTED.

Returns
    length (in bytes) of UTF-8 character sequence stored in s.
Parameters
    c  UCS-4 character U+0000 to U+10FFFF (unless WITH_UTF8_UNRESTRICTED)
    s  points to the buffer to populate with UTF-8 (1 to 6 bytes), not NUL-terminated
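
Encoding a code point as UTF-8 can be sketched as below. The utf8_encode_sketch name is hypothetical, the sketch covers only the standard range up to U+10FFFF (1 to 4 bytes), and it writes U+FFFD for out-of-range input as a stand-in because the value of REFLEX_NONCHAR_UTF8 is not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: store the UTF-8 encoding of c in s (not NUL-terminated) and
// return the number of bytes written. s must have room for 4 bytes.
inline std::size_t utf8_encode_sketch(int c, char *s) {
  assert(s != nullptr);
  if (c < 0 || c > 0x10FFFF) {
    // Stand-in for the library's out-of-range filler: U+FFFD.
    s[0] = '\xEF'; s[1] = '\xBF'; s[2] = '\xBD';
    return 3;
  }
  if (c < 0x80) {
    s[0] = static_cast<char>(c);
    return 1;
  }
  if (c < 0x800) {
    s[0] = static_cast<char>(0xC0 | (c >> 6));
    s[1] = static_cast<char>(0x80 | (c & 0x3F));
    return 2;
  }
  if (c < 0x10000) {
    s[0] = static_cast<char>(0xE0 | (c >> 12));
    s[1] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    s[2] = static_cast<char>(0x80 | (c & 0x3F));
    return 3;
  }
  s[0] = static_cast<char>(0xF0 | (c >> 18));
  s[1] = static_cast<char>(0x80 | ((c >> 12) & 0x3F));
  s[2] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
  s[3] = static_cast<char>(0x80 | (c & 0x3F));
  return 4;
}
```

For example, U+20AC (the euro sign) is stored as the three bytes E2 82 AC, and U+1F600 as the four bytes F0 9F 98 80.
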
int reflex::utf8 ( const char *  s,
const char **  r = NULL 
)
inline

Convert UTF-8 to UCS, returns REFLEX_NONCHAR for invalid UTF-8 except for MUTF-8 U+0000 and 0xD800-0xDFFF surrogate halves (use WITH_UTF8_UNRESTRICTED to remove any limits on UTF-8 encodings up to 6 bytes).

Returns
    UCS character.
Parameters
    s  points to the buffer with UTF-8 (1 to 6 bytes)
    r  optional; points to the pointer that is set to the new position in s after the UTF-8 sequence
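
The reverse direction can be sketched with a small decoder that validates continuation bytes and reports the position after the sequence through r. The names and the NONCHAR_SKETCH sentinel value are hypothetical stand-ins for the library's REFLEX_NONCHAR. Unlike the function documented above, this sketch rejects every overlong form (including the MUTF-8 encoding of U+0000), does not treat surrogate halves specially, and handles at most 4 bytes.

```cpp
#include <cassert>

// Hypothetical sentinel returned for invalid input in this sketch.
const int NONCHAR_SKETCH = 0x200000;

// Sketch: decode one UTF-8 sequence from the NUL-terminated string s.
// If r is given, *r is set to the position after the bytes consumed.
inline int utf8_decode_sketch(const char *s, const char **r = nullptr) {
  assert(s != nullptr);
  const unsigned char *p = reinterpret_cast<const unsigned char*>(s);
  int c = p[0];
  int n = 0;    // number of continuation bytes expected
  int min = 0;  // smallest code point allowed for this length
  if (c < 0x80) {
    n = 0;
  } else if ((c & 0xE0) == 0xC0) {
    n = 1; c &= 0x1F; min = 0x80;
  } else if ((c & 0xF0) == 0xE0) {
    n = 2; c &= 0x0F; min = 0x800;
  } else if ((c & 0xF8) == 0xF0) {
    n = 3; c &= 0x07; min = 0x10000;
  } else {
    if (r) *r = s + 1;
    return NONCHAR_SKETCH;  // stray continuation or invalid lead byte
  }
  for (int i = 1; i <= n; ++i) {
    if ((p[i] & 0xC0) != 0x80) {  // also stops at the terminating NUL
      if (r) *r = s + i;
      return NONCHAR_SKETCH;
    }
    c = (c << 6) | (p[i] & 0x3F);
  }
  if (r) *r = s + n + 1;
  if (c < min || c > 0x10FFFF)
    return NONCHAR_SKETCH;  // overlong or out of range
  return c;
}
```

Decoding the bytes E2 82 AC returns 0x20AC and leaves *r pointing at the byte that follows them.
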