[[PageOutline]]

= Regular Expression Engine =

The regular expression engine uses Perl-style syntax with Unicode extensions. Its non-backtracking virtual machine guarantees that regular expression searches run in time linear in the size of the input.

== Example program ==

The following example shows how to use the API of the regular expression engine. It reads the pattern from the first line of the file named by the first command-line argument and the subject string from the first line of the file named by the second argument, then prints the position and text of the match and of each capture group.

{{{
with Ada.Command_Line;
with Ada.Strings.Wide_Wide_Fixed;
with Ada.Wide_Wide_Text_IO;

with League.Regexps;
with League.Strings;

procedure Demo is

   function Read (File_Name : String) return League.Strings.Universal_String;
   --  Returns the first line of the given file (at most 1024 characters).

   ----------
   -- Read --
   ----------

   function Read (File_Name : String) return League.Strings.Universal_String is
      File   : Ada.Wide_Wide_Text_IO.File_Type;
      Buffer : Wide_Wide_String (1 .. 1024);
      Last   : Natural;

   begin
      --  "wcem=8" is the GNAT form string that selects UTF-8 encoding.
      Ada.Wide_Wide_Text_IO.Open
        (File, Ada.Wide_Wide_Text_IO.In_File, File_Name, "wcem=8");
      Ada.Wide_Wide_Text_IO.Get_Line (File, Buffer, Last);
      Ada.Wide_Wide_Text_IO.Close (File);

      return League.Strings.To_Universal_String (Buffer (1 .. Last));
   end Read;

   Expression : League.Strings.Universal_String :=
     Read (Ada.Command_Line.Argument (1));
   String     : League.Strings.Universal_String :=
     Read (Ada.Command_Line.Argument (2));
   Pattern    : League.Regexps.Regexp_Pattern :=
     League.Regexps.Compile (Expression);
   Match      : League.Regexps.Regexp_Match := Pattern.Find_Match (String);

begin
   if Match.Is_Matched then
      --  Report the position and text of the whole match.
      Ada.Wide_Wide_Text_IO.Put_Line
        ("Match found:"
         & Integer'Wide_Wide_Image (Match.First_Index)
         & " .."
         & Integer'Wide_Wide_Image (Match.Last_Index)
         & " => '"
         & League.Strings.To_Wide_Wide_String (Match.Capture)
         & "'");

      --  Report the position and text of each capture group.
      for J in 1 .. Match.Capture_Count loop
         Ada.Wide_Wide_Text_IO.Put_Line
           (" \"
            & Ada.Strings.Wide_Wide_Fixed.Trim
                (Integer'Wide_Wide_Image (J), Ada.Strings.Both)
            & ":"
            & Integer'Wide_Wide_Image (Match.First_Index (J))
            & " .."
            & Integer'Wide_Wide_Image (Match.Last_Index (J))
            & " => '"
            & League.Strings.To_Wide_Wide_String (Match.Capture (J))
            & "'");
      end loop;

   else
      Ada.Wide_Wide_Text_IO.Put_Line ("Not matched");
   end if;
end Demo;
}}}

== Syntax ==

=== Characters ===

|| Any character except Pattern_Syntax and Pattern_White_Space || Any character that has neither the Pattern_Syntax nor the Pattern_White_Space property matches a single instance of itself. ||
|| '''.''' ''(dot)'' || Matches any single character. ||
|| '''\'''''X'' ''where X is Pattern_White_Space or Pattern_Syntax'' || Matches the character ''X'' itself (can be used in character classes). ||
|| '''\a''' || Matches the bell character (can be used in character classes). ||
|| '''\e''' || Matches the escape character (can be used in character classes). ||
|| '''\f''' || Matches the form feed character (can be used in character classes). ||
|| '''\n''' || Matches LF (can be used in character classes). ||
|| '''\r''' || Matches CR (can be used in character classes). ||
|| '''\t''' || Matches the horizontal tab character (can be used in character classes). ||
|| '''\v''' || Matches the vertical tab character (can be used in character classes). ||
|| '''\c'''''X'' ''where X in range A-Z'' || Matches the ASCII control character Control+A through Control+Z (can be used in character classes). ||
|| '''\u'''''FFFF'' ''where FFFF are 4 hexadecimal digits'' || Matches the character with the specified Unicode code point (can be used in character classes). ||
|| '''\U'''''FFFFFFFF'' ''where FFFFFFFF are 8 hexadecimal digits'' || Matches the character with the specified Unicode code point (can be used in character classes). ||
|| '''\Q''' ... '''\E''' || Matches the characters between '''\Q''' and '''\E''' literally, suppressing the meaning of special characters. ||

=== Named character classes ===

|| '''\p{'''''name'''''}''' ''where name is the name of a binary property or a general category value'' || Matches any character that has the specified binary property or general category value (can be used in character classes). ||
|| '''\P{'''''name'''''}''' ''where name is the name of a binary property or a general category value'' || Matches any character that does not have the specified binary property or general category value (can be used in character classes). ||
|| '''[:'''''name''''':]''' ''where name is the name of a binary property or a general category value'' || Matches any character that has the specified binary property or general category value (can be used in character classes). ||
|| '''[:!^'''''name''''':]''' ''where name is the name of a binary property or a general category value'' || Matches any character that does not have the specified binary property or general category value (can be used in character classes). ||

==== Supported binary properties ====

|| Short name || Full name || Alternative name ||
|| AHex || ASCII_Hex_Digit || ||
|| Alpha || Alphabetic || ||
|| Bidi_C || Bidi_Control || ||
|| Bidi_M || Bidi_Mirrored || ||
|| CE || Composition_Exclusion || ||
|| Comp_Ex || Full_Composition_Exclusion || ||
|| Dash || Dash || ||
|| Dep || Deprecated || ||
|| DI || Default_Ignorable_Code_Point || ||
|| Dia || Diacritic || ||
|| Ext || Extender || ||
|| Gr_Base || Grapheme_Base || ||
|| Gr_Ext || Grapheme_Extend || ||
|| Gr_Link || Grapheme_Link || ||
|| Hex || Hex_Digit || ||
|| Hyphen || Hyphen || ||
|| IDC || ID_Continue || ||
|| Ideo || Ideographic || ||
|| IDS || ID_Start || ||
|| IDSB || IDS_Binary_Operator || ||
|| IDST || IDS_Trinary_Operator || ||
|| Join_C || Join_Control || ||
|| LOE || Logical_Order_Exception || ||
|| Lower || Lowercase || ||
|| Math || Math || ||
|| NChar || Noncharacter_Code_Point || ||
|| OAlpha || Other_Alphabetic || ||
|| ODI || Other_Default_Ignorable_Code_Point || ||
|| OGr_Ext || Other_Grapheme_Extend || ||
|| OIDC || Other_ID_Continue || ||
|| OIDS || Other_ID_Start || ||
|| OLower || Other_Lowercase || ||
|| OMath || Other_Math || ||
|| OUpper || Other_Uppercase || ||
|| Pat_Syn || Pattern_Syntax || ||
|| Pat_WS || Pattern_White_Space || ||
|| QMark || Quotation_Mark || ||
|| Radical || Radical || ||
|| SD || Soft_Dotted || ||
|| STerm || STerm || ||
|| Term || Terminal_Punctuation || ||
|| UIdeo || Unified_Ideograph || ||
|| Upper || Uppercase || ||
|| VS || Variation_Selector || ||
|| WSpace || White_Space || space ||
|| XIDC || XID_Continue || ||
|| XIDS || XID_Start || ||
|| XO_NFC || Expands_On_NFC || ||
|| XO_NFD || Expands_On_NFD || ||
|| XO_NFKC || Expands_On_NFKC || ||
|| XO_NFKD || Expands_On_NFKD || ||

==== Supported values of general category property ====

|| Short name || Full name || Alternative name ||
|| C || Other || ||
|| Cc || Control || cntrl ||
|| Cf || Format || ||
|| Cn || Unassigned || ||
|| Co || Private_Use || ||
|| Cs || Surrogate || ||
|| L || Letter || ||
|| LC || Cased_Letter || ||
|| Ll || Lowercase_Letter || ||
|| Lm || Modifier_Letter || ||
|| Lo || Other_Letter || ||
|| Lt || Titlecase_Letter || ||
|| Lu || Uppercase_Letter || ||
|| M || Mark || ||
|| Mc || Spacing_Mark || ||
|| Me || Enclosing_Mark || ||
|| Mn || Nonspacing_Mark || ||
|| N || Number || ||
|| Nd || Decimal_Number || digit ||
|| Nl || Letter_Number || ||
|| No || Other_Number || ||
|| P || Punctuation || punct ||
|| Pc || Connector_Punctuation || ||
|| Pd || Dash_Punctuation || ||
|| Pe || Close_Punctuation || ||
|| Pf || Final_Punctuation || ||
|| Pi || Initial_Punctuation || ||
|| Po || Other_Punctuation || ||
|| Ps || Open_Punctuation || ||
|| S || Symbol || ||
|| Sc || Currency_Symbol || ||
|| Sk || Modifier_Symbol || ||
|| Sm || Math_Symbol || ||
|| So || Other_Symbol || ||
|| Z || Separator || ||
|| Zl || Line_Separator || ||
|| Zp || Paragraph_Separator || ||
|| Zs || Space_Separator || ||

=== Character classes ===

|| '''['''''members''''']''' || Matches any character specified by ''members''. ||
|| '''[!^'''''members''''']''' || Matches any character except those specified by ''members''. ||

==== Character class members ====

|| Any character except Pattern_Syntax and Pattern_White_Space || Any character that has neither the Pattern_Syntax nor the Pattern_White_Space property adds itself to the class. ||
|| ''x'''''-'''''y'' || Adds the range of characters from ''x'' to ''y'' to the class. ||

=== Quantifiers ===

|| '''?''' || Makes the preceding item optional. Greedy: the optional item is included in the match if possible. ||
|| '''??''' || Makes the preceding item optional. Lazy: the optional item is excluded from the match if possible. ||
|| '''*''' || Repeats the preceding item zero or more times. Greedy: as many repetitions as possible are matched, and fewer (down to none at all) only if the overall match requires it. ||
|| '''*?''' || Repeats the preceding item zero or more times. Lazy: the item is skipped if possible, and more repetitions are matched only if the overall match requires it. ||
|| '''+''' || Repeats the preceding item one or more times. Greedy: as many repetitions as possible are matched, and fewer (down to a single one) only if the overall match requires it. ||
|| '''+?''' || Repeats the preceding item one or more times. Lazy: the item is matched only once if possible, and more repetitions are matched only if the overall match requires it. ||
|| '''{'''''n'''''}''' ''where n is an integer >= 1'' || Repeats the preceding item exactly ''n'' times. ||
|| '''{'''''n''''','''''m'''''}''' ''where n >= 0 and m >= n'' || Repeats the preceding item between ''n'' and ''m'' times. Greedy: up to ''m'' repetitions are matched if possible, and fewer (down to ''n'') only if the overall match requires it. ||
|| '''{'''''n''''','''''m'''''}?''' ''where n >= 0 and m >= n'' || Repeats the preceding item between ''n'' and ''m'' times. Lazy: ''n'' repetitions are matched if possible, and more (up to ''m'') only if the overall match requires it. ||
|| '''{'''''n''''',}''' ''where n >= 0'' || Repeats the preceding item at least ''n'' times. Greedy: as many repetitions as possible are matched, and fewer (down to ''n'') only if the overall match requires it. ||
|| '''{'''''n''''',}?''' ''where n >= 0'' || Repeats the preceding item at least ''n'' times. Lazy: ''n'' repetitions are matched if possible, and more only if the overall match requires it. ||
|| '''{,'''''n'''''}''' ''where n >= 0'' || Repeats the preceding item between zero and ''n'' times. Greedy: up to ''n'' repetitions are matched if possible, and fewer (down to none at all) only if the overall match requires it. ||
|| '''{,'''''n'''''}?''' ''where n >= 0'' || Repeats the preceding item between zero and ''n'' times. Lazy: the item is skipped if possible, and more repetitions (up to ''n'') are matched only if the overall match requires it. ||

=== Composites ===

|| ''x'' ''y'' || Matches ''x'' followed by ''y''. ||
|| ''x'' '''|''' ''y'' || Matches either the expression on the left side or the expression on the right side. Alternatives can be chained into a series of options. The pipe has the lowest precedence of all operators; use grouping to alternate only part of the regular expression. ||

=== Grouping ===

|| '''('''regex''')''' || Parentheses group the regex between them. They capture the text matched by the regex inside them, which can be reused in a backreference, and they allow regex operators to be applied to the entire grouped regex. ||
|| '''(?:'''regex''')''' || Non-capturing parentheses group the regex so that regex operators can be applied to it, but they do not capture anything and do not create backreferences. ||

=== Comments ===

|| '''(?#'''comment''')''' || Everything between '''(?#''' and ''')''' is ignored by the regex engine. ||

=== Anchors ===

|| '''!^''' ''(caret)'' || Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. ||
|| '''$''' ''(dollar)'' || Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. ||
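
== Example: comparing greedy and lazy quantifiers ==

The difference between the greedy and lazy forms listed in the Quantifiers section can be observed with a small program. The following is a minimal sketch, not part of the library. It relies only on the calls already used in the example program (Compile, Find_Match, Is_Matched, Capture, Capture_Count, To_Universal_String and To_Wide_Wide_String). The procedure Quantifier_Demo, its helper Show, the patterns and the subject strings are hypothetical names and values chosen here for illustration. The patterns use only letters as literal characters, so no escaping of Pattern_Syntax characters is needed.

{{{
with Ada.Wide_Wide_Text_IO;

with League.Regexps;
with League.Strings;

procedure Quantifier_Demo is

   procedure Show (Expression : Wide_Wide_String; Subject : Wide_Wide_String);
   --  Illustrative helper (an assumption of this sketch, not a library
   --  subprogram): compiles Expression, searches Subject, and prints the
   --  text of the whole match and of every capture group.

   ----------
   -- Show --
   ----------

   procedure Show (Expression : Wide_Wide_String; Subject : Wide_Wide_String) is
      Pattern : constant League.Regexps.Regexp_Pattern :=
        League.Regexps.Compile
          (League.Strings.To_Universal_String (Expression));
      Match   : constant League.Regexps.Regexp_Match :=
        Pattern.Find_Match (League.Strings.To_Universal_String (Subject));

   begin
      if Match.Is_Matched then
         --  Text of the whole match.
         Ada.Wide_Wide_Text_IO.Put_Line
           (Expression & " on " & Subject & " => '"
            & League.Strings.To_Wide_Wide_String (Match.Capture) & "'");

         --  Text of each capture group.
         for J in 1 .. Match.Capture_Count loop
            Ada.Wide_Wide_Text_IO.Put_Line
              ("   capture" & Integer'Wide_Wide_Image (J) & " => '"
               & League.Strings.To_Wide_Wide_String (Match.Capture (J))
               & "'");
         end loop;

      else
         Ada.Wide_Wide_Text_IO.Put_Line (Expression & " => not matched");
      end if;
   end Show;

begin
   --  Same subject, greedy versus lazy form of the same quantifier.
   Show ("a(.+)b", "a1b2b");
   Show ("a(.+?)b", "a1b2b");
   Show ("x{2,3}", "xxxx");
   Show ("x{2,3}?", "xxxx");
end Quantifier_Demo;
}}}

According to the Quantifiers table, the greedy pattern in each pair should prefer the longer match and the lazy pattern the shorter one. Running the sketch against the engine is the way to confirm the exact captures.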