Opened 2 years ago

Closed 2 years ago

Last modified 21 months ago

#492 closed defect (fixed)

utf-8 codec fails on emoji

Reported by: Maxim Reznik Owned by: Maxim Reznik
Priority: major Milestone: 18.0
Component: Matreshka - League Version: 0.7.0
Keywords: utf-8 emoji Cc:

Description

Matreshka decodes \u1F60A (😊) into utf-8 incorrectly.
It gets 0xf0, 0xdf, 0x98, 0x8a
instead of 0xF0, 0x9F, 0x98, 0x8A

Test case:

with Ada.Streams;
with League.Strings;
with League.Text_Codecs;
procedure Main is
   use type Ada.Streams.Stream_Element_Array;
   Line : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#1F60A#));  --  😊
   Text : constant League.Strings.Universal_String :=
     League.Strings.To_Universal_String (Line);
   UTF8 : constant League.Text_Codecs.Text_Codec :=
     League.Text_Codecs.Codec (League.Strings.To_Universal_String ("utf-8"));
   Raw  : constant Ada.Streams.Stream_Element_Array :=
     UTF8.Encode (Text).To_Stream_Element_Array;
begin
   if Raw /= (16#F0#, 16#9F#, 16#98#, 16#8A#) then
      raise Program_Error;
   end if;
end Main;

Change History (2)

comment:1 by Maxim Reznik, 2 years ago

Owner: set to Maxim Reznik
Resolution: fixed
Status: newclosed

In 5819:

Fix utf-8 codec.

Closes #492

comment:2 by vadim.godunko, 21 months ago

Milestone: 0.8.018.0

Milestone renamed

Note: See TracTickets for help on using tickets.