Changes between Version 1 and Version 2 of Reference/League/Universal_String


Timestamp:
Apr 7, 2011, 6:48:18 PM (10 years ago)
Author:
vadim.godunko
Comment:

--

  • Reference/League/Universal_String

    v1 v2  
    11[[PageOutline]]
    22
    3 = Universal_String Reference (League.Strings) =
    4 
    5 Universal_String is a base type to represent information in textual form as unbounded sequence of Unicode characters (Unicode code points). Several optimization techniques are used to optimize both space and performance characteristics.
    6 
    7 == Operations ==
     3= Universal_String Reference =
     4
     5Universal_String is a base type to represent information in textual form as unbounded sequence of Unicode characters (Unicode code points).
     6
     7== Public Subprograms ==
    88
    99||=  Operation                               =||=  Description  =||
    1010|| [#concatenation "&"]                       || Concatenate strings and characters ||
    11 || [#less "<"]                                || Compares two strings for binary less ||
    12 || [#lessequal "<="]                          || Compares two strings for binary less or equality ||
    13 || [#equal "="]                               || Compares two strings for binary equality ||
    14 || [#greater ">"]                             || Compares two strings for binary greater ||
    15 || [#greaterequal ">="]                       || Compares two strings for binary greater or equality ||
     11|| [#less "<"]                                || Compares two strings for binary less than ||
     12|| [#lessequal "<="]                          || Compares two strings for binary less than or equal to ||
     13|| [#equal "="]                               || Compares two strings for binary equal to ||
     14|| [#greater ">"]                             || Compares two strings for binary greater than ||
     15|| [#greaterequal ">="]                       || Compares two strings for binary greater than or equal to ||
    1616|| [#Append Append]                           || Appends string or character to the string ||
    1717|| [#Clear Clear]                             || Clears the string ||
     
    3535|| [#To_Wide_Wide_String To_Wide_Wide_String] || Converts Universal_String into Wide_Wide_String ||
    3636
    37 == Additional Subprograms ==
     37== Related Subprograms ==
    3838
    3939||=  Operation                               =||=  Description  =||
     
    4242== Detailed Description ==
    4343
     44Universal_String stores a string of valid Unicode characters (all Unicode characters in the code point range 16!#0000# .. 16!#10_FFFF# except the surrogate character range 16#D800# .. 16#DFFF#). The lower bound of the string is 1, and the upper bound may vary from 0 to Natural'Last.
     45
     46[http://www.unicode.org Unicode] is an international standard that supports most of the writing systems in use today.
     47
     48Several optimization techniques are used to improve both space and performance characteristics; see the [[#Optimization techniques]] section for more information.
     49
    4450=== "&" === #concatenation
    4551
     
    7480}}}
    7581
     82Each of the "&" functions returns a Universal_String obtained by concatenating the string or character given or represented by one of the parameters with the string or character given or represented by the other parameter.
     83
     84The "&" functions with parameters of type Wide_Wide_Character/Wide_Wide_String/Universal_Character raise Constraint_Error when a character is not a valid Unicode character.
     85
    7686=== "<" === #less
    7787
     
    8191}}}
    8292
     93Returns True if string Left is lexically less than string Right; otherwise returns False.
     94
     95The comparison is based exclusively on the numeric Unicode values of the characters and is very fast, but is not what a human would expect. Consider sorting user-interface strings using the [#Collation Collation] function.
     96
    8397=== "<=" === #lessequal
    8498
     
    88102}}}
    89103
     104Returns True if string Left is lexically less than or equal to string Right; otherwise returns False.
     105
     106The comparison is based exclusively on the numeric Unicode values of the characters and is very fast, but is not what a human would expect. Consider sorting user-interface strings using the [#Collation Collation] function.
     107
    90108=== "=" === #equal
    91109
     
    95113}}}
    96114
     115Returns True if string Left is lexically equal to string Right; otherwise returns False.
     116
     117The comparison is based exclusively on the numeric Unicode values of the characters and is very fast, but is not what a human would expect. Consider sorting user-interface strings using the [#Collation Collation] function.
     118
    97119=== ">" === #greater
    98120
     
    102124}}}
    103125
     126Returns True if string Left is lexically greater than string Right; otherwise returns False.
     127
     128The comparison is based exclusively on the numeric Unicode values of the characters and is very fast, but is not what a human would expect. Consider sorting user-interface strings using the [#Collation Collation] function.
     129
    104130=== ">=" === #greaterequal
    105131
     
    109135}}}
    110136
     137Returns True if string Left is lexically greater than or equal to string Right; otherwise returns False.
     138
     139The comparison is based exclusively on the numeric Unicode values of the characters and is very fast, but is not what a human would expect. Consider sorting user-interface strings using the [#Collation Collation] function.
     140
    111141=== Append ===
    112142
     
    129159}}}
    130160
     161Appends the string or character Item onto the end of this string.
     162
     163The procedures that have a parameter of type Wide_Wide_Character/Wide_Wide_String/Universal_Character raise Constraint_Error when a character is not a valid Unicode character.
     164
     165The Append procedures for characters and for small strings are typically very fast (constant time), because Universal_String preallocates extra space at the end of the string data, so it can grow without reallocating the entire string each time.
     166
    131167=== Clear ===
    132168
     
    134170   procedure Clear (Self : in out Universal_String'Class);
    135171}}}
     172
     173Clears the contents of the string and makes it empty.
    136174
    137175=== Collation ===
     
    299337== Optimization techniques ==
    300338
    301 Textual representation of the information is used application widely, thus it is important to provide efficient implementation.
    302 Several optimization techniques are used for Universal_String implementation, below is list of most important.
    303 
    304 1. Constant size of objects. Objects occupy constant size on the stack, independently of size of actual data. This is also important for multitasking application where size of the stack of each task is limited.
    305 1. Copy-on-write. Data can be shared between several objects till it is not modified. This makes assignment operation to be constant time operation and minimize occupied memory.
     339Textual representation of information is widely used in applications, so it is important to provide an efficient implementation. Several optimization techniques are used in the Universal_String implementation; the most important are listed below.
     340
     3411. Constant size of objects. Objects occupy a constant size on the stack, independent of the size of the actual string data. This is very important for multitasking applications, where the stack size of each task is limited; it also means the secondary stack is not required to return objects.
     3421. Copy-on-write. String data is shared between several objects until it is modified. This makes assignment a constant-time operation and minimizes memory usage.
    3063431. UTF-16 encoding. Internally, data stored using UTF-16 encoding, which provides balance between memory use and performance.
    307 1. SIMD optimization. On platforms where SIMD operations is available many string operations utilize special SIMD versions of algorithms.
     3441. SIMD optimization. On platforms where SIMD operations are available, many string operations use special SIMD versions of algorithms. On platforms where SIMD operations are not available, a pseudo-vectorization technique is used to process several characters in one operation using 32-bit or 64-bit registers.
     3451. Usage of flat arrays. Internally, string data is stored in flat arrays indexed starting from 0. This slightly improves the performance of string data traversal.
     3461. Memory preallocation. Memory allocation for internal string data takes memory allocation granularity into account. This significantly improves the performance of operations such as appending a character or a small string, because memory reallocation is usually not needed in these cases.
     3471. Null-termination. All internal data is null-terminated, although this is completely invisible to applications and doesn't preclude the use of the Unicode character with code point 0 in strings. In some cases (such as the Microsoft Windows platform or the SQLite database) this allows internal data to be passed to external libraries directly, without conversion. Null-termination also allows some operations to be optimized by removing range checks in surrogate pair handling, because the last character and the null terminator form an (invalid surrogate) pair.
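
The effect of copy-on-write and of binary comparison can be observed from client code. The following is only a minimal sketch, not part of the page itself: the procedure name is hypothetical, and To_Universal_String is assumed to be the conversion from Wide_Wide_String provided by League.Strings (it is not shown in this part of the page). Append, Clear, To_Wide_Wide_String and "<" are used as described above.

```ada
with Ada.Wide_Wide_Text_IO;
with League.Strings;

procedure Universal_String_Sketch is  --  hypothetical name, for illustration
   use type League.Strings.Universal_String;  --  makes "<" directly visible

   --  To_Universal_String is an assumption: the conversion from
   --  Wide_Wide_String is not listed in this part of the page.
   Original : constant League.Strings.Universal_String :=
     League.Strings.To_Universal_String ("Zebra");
   Other    : constant League.Strings.Universal_String :=
     League.Strings.To_Universal_String ("apple");

   --  Assignment may share the internal data with Original (copy-on-write).
   Copy     : League.Strings.Universal_String := Original;
begin
   --  Modifying Copy must leave Original untouched, whether or not the
   --  data was shared before this call.
   Copy.Append (" crossing");

   Ada.Wide_Wide_Text_IO.Put_Line (Original.To_Wide_Wide_String);
   Ada.Wide_Wide_Text_IO.Put_Line (Copy.To_Wide_Wide_String);

   --  Binary comparison uses code point values only: 'Z' (16#5A#) is
   --  less than 'a' (16#61#), so "Zebra" is less than "apple".
   if Original < Other then
      Ada.Wide_Wide_Text_IO.Put_Line ("Zebra < apple in binary order");
   end if;

   --  Clear makes the string empty again.
   Copy.Clear;
end Universal_String_Sketch;
```

The sketch relies only on the value semantics that the optimizations are meant to preserve; whether data is actually shared or preallocated is an internal detail that client code cannot observe directly.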