
uaflex

NAME

uaflex - Unicode-aware flexible lexical analyzer generator in Ada

SYNOPSIS

uaflex OPTIONS FILE

DESCRIPTION

uaflex is a tool that helps write lexical analyzers in Ada. It is the successor of aflex. Its main feature is support for Unicode characters in the generated analyzers.

uaflex accepts a set of rules that describe tokens at a high level and generates a set of Ada compilation units that make it easy to create a lexical analyzer.

OPTIONS

All arguments are required.

--types <package name>    Name of the Ada package that stores the types and constants used in the scanner code.
--tokens <package name>   Name of the Ada package where the token definitions are declared.
--handler <package name>  Name of the Ada package that stores the definition of the abstract token handler.
--scanner <package name>  Name of the Ada package where the scanner code is located.
<input file>              Name of the file with the token descriptions.
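A possible command line is sketched below. All package names and the input file name ada.l are hypothetical and are only meant to show how the four required options and the input file fit together.

```
uaflex --types Ada_Lexer_Types --tokens Ada_Tokens \
       --handler Ada_Lexer_Handlers --scanner Ada_Lexer_Scanners ada.l
```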

USING UAFLEX

Input file format and generated code

The input file contains two sections, a definitions section and a rules section, in this format:

<definitions>
%%
<rules>
%%

A definition is a kind of macro used in rules. A definition has this form:

name regular_expression

Each rule describes a token recognized by the analyzer. Instead of providing the code to execute when the token is found, a rule contains the name of a handler method to call. A rule has this form:

regular_expression method_name

A rule can optionally be prefixed with a list of start conditions.
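A small input file in this format might look like the following. It is a hypothetical sketch: the definition names (digit, letter) and the method names (Identifier, Numeric_Literal, Skip_Spaces) are illustrative, and the separation of the expression from the method name by whitespace is assumed from the forms shown above. Definitions are referenced in rules with the {name} form.

```
digit     [0-9]
letter    \p{Letter}
%%
{letter}({letter}|{digit}|_)*    Identifier
{digit}+                         Numeric_Literal
[\t\n ]+                         Skip_Spaces
%%
```

For such a file, uaflex would produce one abstract handler method for each of the three rules.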

For each rule, uaflex generates a method of the abstract handler and writes the code of this abstract handler to the package named by the --handler option. The user then derives a type from this handler to provide the actions performed on each recognized token.
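The handler pattern can be illustrated with a self-contained Ada sketch. It is not uaflex output. The package Toy_Handlers, its three rules and the shortened parameter lists are hypothetical, and the real generated methods may take more parameters. The procedure On_Accept here only imitates the routing role of the generated S.On_Accept, which is described below. It also assumes that setting Skip tells the scanner to discard the lexeme.

```ada
--  Hypothetical stand-in for a --handler package plus a user handler.
--  Names and parameter lists are illustrative, not uaflex output.
package Toy_Handlers is

   type Token_Kind is (No_Token, Identifier_Token, Number_Token);
   type Rule_Index is range 1 .. 3;

   --  Abstract handler: one abstract method per rule.
   type Handler is abstract tagged null record;

   procedure Identifier
     (Self  : not null access Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is abstract;

   procedure Number
     (Self  : not null access Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is abstract;

   procedure Spaces
     (Self  : not null access Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is abstract;

   --  Imitates On_Accept: route a matched rule to its handler method.
   procedure On_Accept
     (Self  : not null access Handler'Class;
      Rule  : Rule_Index;
      Token : out Token_Kind;
      Skip  : in out Boolean);

   --  User code: inherit and provide an action for every rule.
   type My_Handler is new Handler with null record;

   overriding procedure Identifier
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean);

   overriding procedure Number
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean);

   overriding procedure Spaces
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean);

end Toy_Handlers;

package body Toy_Handlers is

   procedure On_Accept
     (Self  : not null access Handler'Class;
      Rule  : Rule_Index;
      Token : out Token_Kind;
      Skip  : in out Boolean) is
   begin
      case Rule is  --  dispatching calls on the class-wide Self
         when 1 => Identifier (Self, Token, Skip);
         when 2 => Number (Self, Token, Skip);
         when 3 => Spaces (Self, Token, Skip);
      end case;
   end On_Accept;

   overriding procedure Identifier
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is
   begin
      Token := Identifier_Token;
   end Identifier;

   overriding procedure Number
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is
   begin
      Token := Number_Token;
   end Number;

   --  Whitespace yields no token: ask the scanner to skip it (assumed).
   overriding procedure Spaces
     (Self  : not null access My_Handler;
      Token : out Token_Kind;
      Skip  : in out Boolean) is
   begin
      Token := No_Token;
      Skip  := True;
   end Spaces;

end Toy_Handlers;
```

Keeping actions in overriding methods, rather than in code embedded in the input file, means that regenerating the analyzer never overwrites user code.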

In addition, uaflex generates the package S.Tables and the procedure S.On_Accept:

   package Tables is
      function To_Class (Value : Matreshka.Internals.Unicode.Code_Point)
        return Character_Class;
      pragma Inline (To_Class);

      function Switch (S : State; Class : Character_Class) return State;
      pragma Inline (Switch);

      function Rule (S : State) return Rule_Index;
      pragma Inline (Rule);
   end Tables;

   procedure On_Accept
     (Self    : not null access Handler'Class;
      Scanner : not null access S.Scanner'Class;
      Rule    : Rule_Index;
      Token   : out Parser_Tokens.Token;
      Skip    : in out Boolean);

where S is the package name given by the --scanner option.
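The three functions of S.Tables describe a table-driven automaton. To_Class maps a character to a character class, Switch gives the next state for a state and a class, and Rule gives the rule recognized in a state. The hand-written Ada sketch below has the same shape. The package name Toy_Tables, its three rules, the constants Error and Initial, and the use of Character instead of Unicode code points are all assumptions made to keep the example small.

```ada
--  Hypothetical toy tables in the shape of S.Tables.  Rules:
--    1: [a-z][a-z0-9]*    2: [0-9]+    3: " "+
--  Character replaces Unicode code points to keep the sketch small.
package Toy_Tables is
   type Character_Class is (Letter, Digit, Space, Other);
   type State is range 0 .. 4;
   Error   : constant State := 0;    --  no transition possible
   Initial : constant State := 1;
   type Rule_Index is range 0 .. 3;  --  0: state accepts no rule

   function To_Class (Value : Character) return Character_Class;
   pragma Inline (To_Class);

   function Switch (S : State; Class : Character_Class) return State;
   pragma Inline (Switch);

   function Rule (S : State) return Rule_Index;
   pragma Inline (Rule);
end Toy_Tables;

package body Toy_Tables is

   --  Transition table: one row per state, one column per class.
   Switch_Table : constant array (State, Character_Class) of State :=
     (0 => (others => Error),
      1 => (Letter => 2, Digit => 3, Space => 4, Other => Error),
      2 => (Letter | Digit => 2, others => Error),
      3 => (Digit => 3, others => Error),
      4 => (Space => 4, others => Error));

   --  Rule recognized when the automaton stops in each state.
   Rule_Table : constant array (State) of Rule_Index := (0, 0, 1, 2, 3);

   function To_Class (Value : Character) return Character_Class is
   begin
      case Value is
         when 'a' .. 'z' => return Letter;
         when '0' .. '9' => return Digit;
         when ' '        => return Space;
         when others     => return Other;
      end case;
   end To_Class;

   function Switch (S : State; Class : Character_Class) return State is
     (Switch_Table (S, Class));

   function Rule (S : State) return Rule_Index is (Rule_Table (S));

end Toy_Tables;
```

Grouping characters into classes before the table lookup keeps the transition table small. This matters when the input alphabet is all of Unicode.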

The scanner code is not generated by uaflex on each run. Instead, the user copies the UAFLEX.Scanners package from the uaflex sources and customizes it if needed. This package uses the generated S.Tables and S.On_Accept to perform lexical analysis. The scanner provides this interface:

package UAFLEX.Scanners is
   use UAFLEX.Lexer_Types;

   subtype Token is Parser_Tokens.Token;
   type Scanner is tagged limited private;

   procedure Set_Source
     (Self : in out Scanner'Class;
      Source : not null Abstract_Sources.Source_Access);

   procedure Set_Handler
     (Self    : in out Scanner'Class;
      Handler : not null UAFLEX.Handlers.Handler_Access);

   subtype Start_Condition is State;

   procedure Set_Start_Condition
     (Self : in out Scanner'Class; Condition : Start_Condition);

   function Get_Start_Condition
     (Self : Scanner'Class) return Start_Condition;

   procedure Get_Token (Self : access Scanner'Class; Result : out Token);

   function Get_Text
     (Self : Scanner'Class) return League.Strings.Universal_String;

   function Get_Token_Length (Self : Scanner'Class) return Positive;
   function Get_Token_Position (Self : Scanner'Class) return Positive;

private
   --  Full view of Scanner omitted.
end UAFLEX.Scanners;
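How a scanner like this can drive the tables is sketched below in self-contained Ada. This is not the code of UAFLEX.Scanners. The package Toy_Scanner, its record type and its simplified Get_Token are hypothetical: the procedure returns a rule number and the lexeme bounds instead of a Token, and its tiny tables stand in for the generated ones. Because Start_Condition is declared as a subtype of State, the sketch assumes that the current start condition is the automaton's initial state. It applies the usual lex-style longest-match rule. A real scanner would then pass the rule to On_Accept and read again whenever Skip comes back True.

```ada
--  Hypothetical longest-match Get_Token over S.Tables-like tables.
--  Tiny stand-in tables; rules: 1 = [a-z]+, 2 = [0-9]+.
package Toy_Scanner is
   type State is range 0 .. 3;          --  0: no transition possible
   subtype Start_Condition is State;    --  mirrors UAFLEX.Scanners
   Initial : constant Start_Condition := 1;
   type Rule_Index is range 0 .. 2;     --  0: no rule matched

   type Scanner (Length : Natural) is record
      Source    : String (1 .. Length);
      Next      : Positive := 1;               --  first unread character
      Condition : Start_Condition := Initial;  --  current start condition
   end record;

   --  Match the longest lexeme starting at Self.Next.  Rule = 0 means end
   --  of input or a character no rule accepts; Next is then unchanged.
   procedure Get_Token
     (Self        : in out Scanner;
      Rule        : out Rule_Index;
      First, Last : out Natural);
end Toy_Scanner;

package body Toy_Scanner is

   type Character_Class is (Letter, Digit, Other);

   function To_Class (C : Character) return Character_Class is
     (case C is
         when 'a' .. 'z' => Letter,
         when '0' .. '9' => Digit,
         when others     => Other);

   Switch : constant array (State, Character_Class) of State :=
     (0 => (others => 0),
      1 => (Letter => 2, Digit => 3, Other => 0),
      2 => (Letter => 2, others => 0),
      3 => (Digit => 3, others => 0));

   Accepts : constant array (State) of Rule_Index := (0, 0, 1, 2);

   procedure Get_Token
     (Self        : in out Scanner;
      Rule        : out Rule_Index;
      First, Last : out Natural)
   is
      S   : State := Self.Condition;  --  start condition = initial state
      Pos : Natural := Self.Next;
   begin
      Rule  := 0;
      First := Self.Next;
      Last  := Self.Next - 1;
      --  Run the automaton as far as it goes, remembering the last
      --  accepting position (longest match).
      while Pos <= Self.Length loop
         S := Switch (S, To_Class (Self.Source (Pos)));
         exit when S = 0;
         if Accepts (S) /= 0 then
            Rule := Accepts (S);
            Last := Pos;
         end if;
         Pos := Pos + 1;
      end loop;
      Self.Next := Last + 1;  --  resume right after the lexeme
   end Get_Token;

end Toy_Scanner;
```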

Types for states, rule indexes and character classes, as well as constants for start conditions, are generated in the package named by the --types option.

Regular expressions

The following forms have a special meaning in regular expressions:

Form     Meaning
x        the character "x", if x has no special meaning
"x"      the character "x", even if it has a special meaning
\x       the character "x", even if it has a special meaning
x+       one or more occurrences of x
x*       zero or more occurrences of x
x?       an optional occurrence of x
(x)      x (grouping)
.        any character except line feed
x|y      x or y
[xy]     the character x or the character y
[x-z]    any character in the range from x to z
[^x]     any character except x
{XX}     the expansion of the definition XX
^x       x at the beginning of a line
x$       x at the end of a line
<y>x     rule x, active in start condition y

Special characters can be written in regular expressions with the following escape sequences. All of them can also be used inside character classes.

\a                     Matches the bell character.
\e                     Matches the escape character.
\f                     Matches the form feed character.
\n                     Matches LF (line feed).
\r                     Matches CR (carriage return).
\t                     Matches the horizontal tab character.
\v                     Matches the vertical tab character.
\cX                    Matches the ASCII control character Control+X, where X is in the range A-Z.
\uFFFF                 Matches the character with the given Unicode code point, where FFFF are 4 hexadecimal digits.
\UFFFFFFFF             Matches the character with the given Unicode code point, where FFFFFFFF are 8 hexadecimal digits.
\p{name} or [:name:]   Matches a character that has the named binary property or general category value.
\P{name} or [:^name:]  Matches any character that does not have the named binary property or general category value.
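As an illustration, definitions could combine these escapes as follows. The definition names are hypothetical. Concatenation binds tighter than |, so line_end matches either the pair CR LF or one of the single line terminators.

```
line_end     \r\n|\n|\r|\u0085|\u2028|\u2029
blank        [\t\v\f \u00A0]
not_letter   \P{Letter}
```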

Operators have the following priorities, from highest to lowest:

" [] ()          highest
+ * ?
concatenation
|                lowest
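These priorities determine how an expression is grouped when no parentheses are given. A few worked examples:

```
ab|cd      is read as   (ab)|(cd)
ab*        is read as   a(b*)
a|b*       is read as   a|(b*)
"+"+       is read as   ("+")+, one or more plus signs
(ab)*      zero or more repetitions of ab
```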

Character classes are written in square brackets. Inside the brackets, characters lose their special meaning, except "\", "-" and "^", which have a different special meaning there. Examples:

[abc]         the character a, b or c
[^abc]        any character except a, b and c
[-+0-9]       the character "+", "-" or a digit from 0 to 9
[\t\n ]       tab, line feed or space
[:Letter:]    any Unicode character of the general category Letter

EXAMPLES

An example of an Ada 2012 scanner can be found in the Gela project.

Last modified on May 6, 2015, 1:43:04 PM