reflex::Pattern Class Reference

updated Mon Apr 10 2017 by Robert van Engelen
 

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

#include <pattern.h>

[Figure: collaboration diagram for reflex::Pattern]

Classes

struct  Option
 Global modifier modes, syntax flags, and compiler options.
 
struct  Position
 Finite state machine construction position information.
 
struct  State
 Finite state machine.
 

Public Types

enum  Const { IMAX = 0xFFFF }
 Common constants.
 
typedef uint16_t Index
 index into opcodes array Pattern::opc_ and subpattern indexing
 
typedef uint32_t Opcode
 32 bit opcode word
 
typedef void(* FSM) (class Matcher &)
 function pointer to FSM code
 

Public Member Functions

 Pattern (const char *regex, const char *options=NULL) throw (regex_error)
 Construct a pattern object given a regex string.
 
 Pattern (const char *regex, const std::string &options) throw (regex_error)
 Construct a pattern object given a regex string.
 
 Pattern (const std::string &regex, const char *options=NULL) throw (regex_error)
 Construct a pattern object given a regex string.
 
 Pattern (const std::string &regex, const std::string &options) throw (regex_error)
 Construct a pattern object given a regex string.
 
 Pattern (const Opcode *code) throw (regex_error)
 Construct a pattern object given an opcode table.
 
 Pattern (FSM fsm) throw (regex_error)
 Construct a pattern object given a function pointer to FSM code.
 
virtual ~Pattern (void)
 Destructor, deletes internal code array when owned and allocated.
 
Index size (void) const
 Number of subpatterns of this pattern object.
 
const std::string operator[] (Index choice) const
 Get subpattern of this pattern object.
 
bool reachable (Index choice) const
 Check if subpattern is reachable by a match.
 
size_t nodes (void) const
 Get the number of finite state machine nodes (vertices).
 
size_t edges (void) const
 Get the number of finite state machine edges (transitions on input characters).
 
size_t words (void) const
 Get the code size in number of words.
 
float parse_time () const
 Get elapsed regex parsing and analysis time.
 
float nodes_time () const
 Get elapsed DFA vertices construction time.
 
float edges_time () const
 Get elapsed DFA edges construction time.
 
float words_time () const
 Get elapsed code words assembly time.
 

Protected Member Functions

virtual void error (regex_error_type code, size_t pos=0) const throw (regex_error)
 Throw an error.
 

Private Types

enum  Meta {
  META_MIN = 0x100, META_NWB = 0x101, META_NWE = 0x102, META_BWB = 0x103,
  META_EWB = 0x104, META_BWE = 0x105, META_EWE = 0x106, META_BOL = 0x107,
  META_EOL = 0x108, META_BOB = 0x109, META_EOB = 0x10A, META_IND = 0x10B,
  META_DED = 0x10C, META_MAX
}
 Meta characters.
 
typedef unsigned int Char
 
typedef ORanges< Char > Chars
 represent (wide) char set as a set of ranges
 
typedef size_t Location
 
typedef ORanges< Location > Ranges
 
typedef std::set< Location > Set
 
typedef std::map< int, Ranges > Map
 
typedef std::set< Position > Positions
 
typedef std::map< Position, Positions > Follow
 
typedef std::pair< Chars, Positions > Move
 
typedef std::list< Move > Moves
 

Private Member Functions

void init (const char *options) throw (regex_error)
 Initialize the pattern at construction.
 
void init_options (const char *options)
 
void parse (Positions &startpos, Follow &followpos, Map &modifiers, Map &lookahead) throw (regex_error)
 
void parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
 
void parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
 
void parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
 
void parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
 
void parse_esc (Location &loc) const throw (regex_error)
 
void compile (State &start, Follow &followpos, const Map &modifiers, const Map &lookahead) throw (regex_error)
 
void lazy (const Positions &lazypos, Positions &pos) const
 
void lazy (const Positions &lazypos, const Positions &pos, Positions &pos1) const
 
void greedy (Positions &pos) const
 
void trim_lazy (Positions &pos) const
 
void compile_transition (State *state, Follow &followpos, const Map &modifiers, const Map &lookahead, Moves &moves) const throw (regex_error)
 
void transition (Moves &moves, const Chars &chars, const Positions &follow) const
 
Char compile_esc (Location loc, Chars &chars) const throw (regex_error)
 
void compile_list (Location loc, Chars &chars, const Map &modifiers) const throw (regex_error)
 
void posix (size_t index, Chars &chars) const
 
void flip (Chars &chars) const
 
void assemble (State &start) throw (regex_error)
 
void compact_dfa (State &start)
 
void encode_dfa (State &start) throw (regex_error)
 
void gencode_dfa (const State &start) const
 
void gencode_dfa_closure (FILE *fd, const State *start, int nest) const
 
void delete_dfa (State &start)
 
void export_dfa (const State &start) const
 
void export_code (void) const
 
Location find_at (Location loc, char c) const
 
Char at (Location k) const
 
bool eq_at (Location loc, const char *s) const
 
Char escape_at (Location loc) const
 
Char escapes_at (Location loc, const char *escapes) const
 

Static Private Member Functions

static bool is_modified (int mode, const Map &modifiers, Location loc)
 
static bool is_meta (Char c)
 
static Opcode opcode_take (Index index)
 
static Opcode opcode_redo (void)
 
static Opcode opcode_tail (Index index)
 
static Opcode opcode_head (Index index)
 
static Opcode opcode_goto (Char lo, Char hi, Index index)
 
static Opcode opcode_halt (void)
 
static bool is_opcode_redo (Opcode opcode)
 
static bool is_opcode_take (Opcode opcode)
 
static bool is_opcode_tail (Opcode opcode)
 
static bool is_opcode_head (Opcode opcode)
 
static bool is_opcode_halt (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode, Char a)
 
static bool is_opcode_match (Opcode opcode, unsigned char c)
 
static Char meta_of (Opcode opcode)
 
static Char lo_of (Opcode opcode)
 
static Char hi_of (Opcode opcode)
 
static Index index_of (Opcode opcode)
 

Private Attributes

Option opt_
 pattern compiler options
 
std::string rex_
 regular expression string
 
std::vector< Location > end_
 entries point to the subpattern's ending '|' or '\0'
 
std::vector< bool > acc_
 true if subpattern n is acceptable (state is reachable)
 
size_t vno_
 number of finite state machine vertices |V|
 
size_t eno_
 number of finite state machine edges |E|
 
const Opcode * opc_
 points to the opcode table
 
Index nop_
 number of opcodes generated
 
FSM fsm_
 function pointer to FSM code
 
float pms_
 ms elapsed time to parse regex
 
float vms_
 ms elapsed time to compile DFA vertices
 
float ems_
 ms elapsed time to compile DFA edges
 
float wms_
 ms elapsed time to assemble code words
 

Friends

class Matcher
 permit access by the reflex::Matcher engine
 

Detailed Description

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.


Member Typedef Documentation

typedef unsigned int reflex::Pattern::Char
private
typedef ORanges<Char> reflex::Pattern::Chars
private

represent (wide) char set as a set of ranges

typedef std::map<Position,Positions> reflex::Pattern::Follow
private
typedef void(* reflex::Pattern::FSM) (class Matcher &)

function pointer to FSM code
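
The shape of this typedef can be illustrated with a minimal, self-contained sketch. The `Matcher` class below is a hypothetical stand-in reduced to a counter, and `PatternSketch`, `counting_fsm`, and `run` are names invented for this example; none of them is the library's code. The sketch only shows how a plain function with the signature `void(Matcher&)` can be stored and later invoked through an `FSM` pointer.

```cpp
// Hypothetical stand-in for reflex::Matcher, reduced to a step counter.
class Matcher {
 public:
  Matcher() : steps(0) {}
  int steps;
};

// Same shape as the documented typedef: pointer to a function taking a Matcher&.
typedef void (*FSM)(Matcher&);

// A hypothetical FSM routine: it only records that it was called.
void counting_fsm(Matcher& m) { ++m.steps; }

// Hypothetical holder that stores the pointer, as the fsm_ attribute does.
class PatternSketch {
 public:
  explicit PatternSketch(FSM fsm) : fsm_(fsm) {}
  // Invoke the stored routine, if any, on the given matcher.
  void run(Matcher& m) const { if (fsm_ != 0) fsm_(m); }
 private:
  FSM fsm_;
};
```

Constructing `PatternSketch p(counting_fsm)` and calling `p.run(m)` on a `Matcher m` increments `m.steps` once.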

typedef uint16_t reflex::Pattern::Index

index into opcodes array Pattern::opc_ and subpattern indexing

typedef size_t reflex::Pattern::Location
private
typedef std::map<int,Ranges> reflex::Pattern::Map
private
typedef std::pair<Chars,Positions> reflex::Pattern::Move
private
typedef std::list<Move> reflex::Pattern::Moves
private
typedef uint32_t reflex::Pattern::Opcode

32 bit opcode word

typedef std::set<Position> reflex::Pattern::Positions
private
typedef ORanges<Location> reflex::Pattern::Ranges
private
typedef std::set<Location> reflex::Pattern::Set
private

Member Enumeration Documentation

enum reflex::Pattern::Const

Common constants.

Enumerator
IMAX 

max index, also serves as a marker
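
As a small illustration, IMAX is exactly the largest value that the 16-bit Index type can hold. The sketch below copies the documented typedef and constant into stand-alone declarations; the helper `is_marker` is hypothetical and only shows one way a reserved maximum value could act as a sentinel, which is an assumption about how the marker is used.

```cpp
#include <cstdint>
#include <limits>

// Stand-alone copies of the documented typedef and constant.
typedef std::uint16_t Index;
enum Const { IMAX = 0xFFFF };

// IMAX fills the whole 16-bit range of Index.
static_assert(IMAX == std::numeric_limits<Index>::max(),
              "IMAX is the largest 16-bit index");

// Hypothetical helper (an assumption): treat IMAX as a sentinel value
// rather than as a usable index.
inline bool is_marker(Index index) { return index == IMAX; }
```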

enum reflex::Pattern::Meta
private

Meta characters.

Enumerator
META_MIN 
META_NWB 

non-word at begin \Bx

META_NWE 

non-word at end x\B

META_BWB 

begin of word at begin \<x

META_EWB 

end of word at begin \>x

META_BWE 

begin of word at end x\<

META_EWE 

end of word at end x\>

META_BOL 

begin of line ^

META_EOL 

end of line $

META_BOB 

begin of buffer \A

META_EOB 

end of buffer \Z

META_IND 

indent boundary \i

META_DED 

dedent boundary \j (must be the largest META code)

META_MAX 

max meta characters
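
The meta codes start at 0x100, one past the largest 8-bit character value, so a Char (an unsigned int) can carry either an input byte or a meta code without ambiguity. The sketch below copies the documented enumerator values into a stand-alone enum; the test `is_meta_sketch` is a hypothetical illustration that assumes every value above META_MIN counts as a meta character, and it is not taken from the library.

```cpp
// Stand-alone copy of the documented values; not the library's code.
typedef unsigned int Char;

enum Meta {
  META_MIN = 0x100,
  META_NWB = 0x101, META_NWE = 0x102, META_BWB = 0x103,
  META_EWB = 0x104, META_BWE = 0x105, META_EWE = 0x106,
  META_BOL = 0x107, META_EOL = 0x108, META_BOB = 0x109,
  META_EOB = 0x10A, META_IND = 0x10B, META_DED = 0x10C,
  META_MAX  // implicitly 0x10D, one past META_DED
};

// Assumed rule: a code above META_MIN is a meta character,
// while 0x00..0xFF are ordinary 8-bit characters.
inline bool is_meta_sketch(Char c) { return c > META_MIN; }
```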

Constructor & Destructor Documentation

reflex::Pattern::Pattern (const char *regex, const char *options = NULL) throw (regex_error)
inline, explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const char *regex, const std::string &options) throw (regex_error)
inline, explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const std::string &regex, const char *options = NULL) throw (regex_error)
inline, explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const std::string &regex, const std::string &options) throw (regex_error)
inline, explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const Opcode *code) throw (regex_error)
inline, explicit

Construct a pattern object given an opcode table.

reflex::Pattern::Pattern (FSM fsm) throw (regex_error)
inline, explicit

Construct a pattern object given a function pointer to FSM code.

virtual reflex::Pattern::~Pattern (void)
inline, virtual

Destructor, deletes internal code array when owned and allocated.

Member Function Documentation

void reflex::Pattern::assemble (State &start) throw (regex_error)
private
Char reflex::Pattern::at (Location k) const
inline, private
void reflex::Pattern::compact_dfa (State &start)
private
void reflex::Pattern::compile (State &start, Follow &followpos, const Map &modifiers, const Map &lookahead) throw (regex_error)
private
Char reflex::Pattern::compile_esc (Location loc, Chars &chars) const throw (regex_error)
private
void reflex::Pattern::compile_list (Location loc, Chars &chars, const Map &modifiers) const throw (regex_error)
private
void reflex::Pattern::compile_transition (State *state, Follow &followpos, const Map &modifiers, const Map &lookahead, Moves &moves) const throw (regex_error)
private
void reflex::Pattern::delete_dfa (State &start)
private
size_t reflex::Pattern::edges ( void  ) const
inline

Get the number of finite state machine edges (transitions on input characters).

Returns
number of edges or 0 when no finite state machine was constructed by this pattern.
float reflex::Pattern::edges_time ( ) const
inline

Get elapsed DFA edges construction time.

void reflex::Pattern::encode_dfa (State &start) throw (regex_error)
private
bool reflex::Pattern::eq_at (Location loc, const char *s) const
inline, private
virtual void reflex::Pattern::error (regex_error_type code, size_t pos = 0) const throw (regex_error)
protected, virtual

Throw an error.

Parameters
code  error code
pos  optional location of the error in regex string Pattern::rex_
Char reflex::Pattern::escape_at (Location loc) const
inline, private
Char reflex::Pattern::escapes_at (Location loc, const char *escapes) const
inline, private
void reflex::Pattern::export_code (void) const
private
void reflex::Pattern::export_dfa (const State &start) const
private
Location reflex::Pattern::find_at (Location loc, char c) const
inline, private
void reflex::Pattern::flip (Chars &chars) const
private
void reflex::Pattern::gencode_dfa (const State &start) const
private
void reflex::Pattern::gencode_dfa_closure (FILE *fd, const State *start, int nest) const
private
void reflex::Pattern::greedy (Positions &pos) const
private
static Char reflex::Pattern::hi_of (Opcode opcode)
inline, static, private
static Index reflex::Pattern::index_of (Opcode opcode)
inline, static, private
void reflex::Pattern::init (const char *options) throw (regex_error)
private

Initialize the pattern at construction.

void reflex::Pattern::init_options (const char *options)
private
static bool reflex::Pattern::is_meta (Char c)
inline, static, private
static bool reflex::Pattern::is_modified (int mode, const Map &modifiers, Location loc)
inline, static, private
static bool reflex::Pattern::is_opcode_halt (Opcode opcode)
inline, static, private
static bool reflex::Pattern::is_opcode_head (Opcode opcode)
inline, static, private
static bool reflex::Pattern::is_opcode_match (Opcode opcode, unsigned char c)
inline, static, private
static bool reflex::Pattern::is_opcode_meta (Opcode opcode)
inline, static, private
static bool reflex::Pattern::is_opcode_meta (Opcode opcode, Char a)
inline, static, private
static bool reflex::Pattern::is_opcode_redo (Opcode opcode)
inline, static, private
static bool reflex::Pattern::is_opcode_tail (Opcode opcode)
inline, static, private
static bool reflex::Pattern::is_opcode_take (Opcode opcode)
inline, static, private
void reflex::Pattern::lazy (const Positions &lazypos, Positions &pos) const
private
void reflex::Pattern::lazy (const Positions &lazypos, const Positions &pos, Positions &pos1) const
private
static Char reflex::Pattern::lo_of (Opcode opcode)
inline, static, private
static Char reflex::Pattern::meta_of (Opcode opcode)
inline, static, private
size_t reflex::Pattern::nodes ( void  ) const
inline

Get the number of finite state machine nodes (vertices).

Returns
number of nodes or 0 when no finite state machine was constructed by this pattern.
float reflex::Pattern::nodes_time ( ) const
inline

Get elapsed DFA vertices construction time.

static Opcode reflex::Pattern::opcode_goto (Char lo, Char hi, Index index)
inline, static, private
static Opcode reflex::Pattern::opcode_halt (void)
inline, static, private
static Opcode reflex::Pattern::opcode_head (Index index)
inline, static, private
static Opcode reflex::Pattern::opcode_redo (void)
inline, static, private
static Opcode reflex::Pattern::opcode_tail (Index index)
inline, static, private
static Opcode reflex::Pattern::opcode_take (Index index)
inline, static, private
const std::string reflex::Pattern::operator[] ( Index  choice) const

Get subpattern of this pattern object.

Returns
subpattern string or "".
void reflex::Pattern::parse (Positions &startpos, Follow &followpos, Map &modifiers, Map &lookahead) throw (regex_error)
private
void reflex::Pattern::parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
private
void reflex::Pattern::parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
private
void reflex::Pattern::parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
private
void reflex::Pattern::parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (regex_error)
private
void reflex::Pattern::parse_esc (Location &loc) const throw (regex_error)
private
float reflex::Pattern::parse_time ( ) const
inline

Get elapsed regex parsing and analysis time.

void reflex::Pattern::posix (size_t index, Chars &chars) const
private
bool reflex::Pattern::reachable ( Index  choice) const
inline

Check if subpattern is reachable by a match.

Returns
true if subpattern is reachable.
Index reflex::Pattern::size ( void  ) const
inline

Number of subpatterns of this pattern object.

Returns
number of subpatterns.
void reflex::Pattern::transition (Moves &moves, const Chars &chars, const Positions &follow) const
private
void reflex::Pattern::trim_lazy (Positions &pos) const
private
size_t reflex::Pattern::words ( void  ) const
inline

Get the code size in number of words.

Returns
number of words or 0 when no code was generated by this pattern.
float reflex::Pattern::words_time ( ) const
inline

Get elapsed code words assembly time.

Friends And Related Function Documentation

friend class Matcher
friend

permit access by the reflex::Matcher engine

Member Data Documentation

std::vector<bool> reflex::Pattern::acc_
private

true if subpattern n is acceptable (state is reachable)

float reflex::Pattern::ems_
private

ms elapsed time to compile DFA edges

std::vector<Location> reflex::Pattern::end_
private

entries point to the subpattern's ending '|' or '\0'
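
To make the role of these end positions concrete, the following is a minimal sketch, assuming that subpatterns are the alternatives separated by top-level '|' characters. The functions `subpattern_ends` and `subpattern` are hypothetical helpers written for this example, not members of reflex::Pattern. The scanner is deliberately simplified: it only skips backslash escapes, bracket lists, and parenthesized groups, and numbering the choices from 1 is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: offsets of each top-level '|' plus the end of the
// string, mirroring the documented meaning of end_.
inline std::vector<std::size_t> subpattern_ends(const std::string& regex)
{
  std::vector<std::size_t> ends;
  int depth = 0;         // nesting depth of '(' ... ')'
  bool in_list = false;  // inside a bracket list '[' ... ']'
  for (std::size_t i = 0; i < regex.size(); ++i)
  {
    char c = regex[i];
    if (c == '\\')
      ++i;               // skip the escaped character
    else if (in_list)
      in_list = (c != ']');
    else if (c == '[')
      in_list = true;
    else if (c == '(')
      ++depth;
    else if (c == ')' && depth > 0)
      --depth;
    else if (c == '|' && depth == 0)
      ends.push_back(i);
  }
  ends.push_back(regex.size());  // offset of the terminating '\0'
  return ends;
}

// Hypothetical analogue of operator[]: choices are numbered from 1 here
// (an assumption); an out-of-range choice yields "".
inline std::string subpattern(const std::string& regex, std::size_t choice)
{
  std::vector<std::size_t> ends = subpattern_ends(regex);
  if (choice < 1 || choice > ends.size())
    return "";
  std::size_t begin = (choice == 1) ? 0 : ends[choice - 2] + 1;
  return regex.substr(begin, ends[choice - 1] - begin);
}
```

With the regex `a|(b|c)|[|]d`, the sketch records end offsets 1, 7, and 12, so choice 2 yields `(b|c)` and choice 3 yields `[|]d`; the '|' inside the group and the one inside the bracket list are not treated as separators.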

size_t reflex::Pattern::eno_
private

number of finite state machine edges |E|

FSM reflex::Pattern::fsm_
private

function pointer to FSM code

Index reflex::Pattern::nop_
private

number of opcodes generated

const Opcode* reflex::Pattern::opc_
private

points to the opcode table

Option reflex::Pattern::opt_
private

pattern compiler options

float reflex::Pattern::pms_
private

ms elapsed time to parse regex

std::string reflex::Pattern::rex_
private

regular expression string

float reflex::Pattern::vms_
private

ms elapsed time to compile DFA vertices

size_t reflex::Pattern::vno_
private

number of finite state machine vertices |V|

float reflex::Pattern::wms_
private

ms elapsed time to assemble code words


The documentation for this class was generated from the following file: