Charset error


Glauber Alexandre Brossi da Cunha

Jul 5, 2023, 3:46:24 PM
to MariaDB ColumnStore
Good afternoon,

During my tests on MariaDB Community Server 11.0.2 and CS 4.6.7, I came across a charset error in a SELECT that joins a small InnoDB table with a ColumnStore table.

I have this InnoDB table:
CREATE TABLE Strategies_Temp (
   ESME_ID INT,
   ESME_CODE VARCHAR(100),
   ESGE_CODE VARCHAR(100),
   session_id INT
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

And this ColumnStore table:
CREATE TABLE `GEM_HISTORIC_CS` (
   `GEMH_DATA` DATE,
   `GEMH_ESME_ID` SMALLINT,
   `GEMH_FINANCIAL_DAY` DOUBLE,
   `GEMH_MES_FINANCERO` DOUBLE
) ENGINE=Columnstore DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;

Below are two records from the Estrategias_Temp table that contain accented characters:

ESME_ID | ESME_CODIGO
1 | Juros Resíduo
2 | Juros Internacional Inclinação

When I run a SELECT with GEM_HISTORICO_CS as the main table, joined with Estrategias_Temp, the records are displayed like this:

Juros Resíduo
Juros Internacional Inclinação

I looked for an existing topic about a similar problem, but didn't find anything.

Does anyone have any idea what it could be?
Thanks,

Regards,
Glauber
From Brazil

Glauber Alexandre Brossi da Cunha

Jul 5, 2023, 4:42:11 PM
to MariaDB ColumnStore
Update:

I recreated both tables with DEFAULT CHARSET=utf8mb4 instead of DEFAULT CHARSET=latin1.

After this change, the problem went away.

I would like to better understand why this happens with latin1, since all the tables in our database use latin1.

Right now I'm only testing in our DEV environment. If every InnoDB table has to use utf8mb4 to be joined with a ColumnStore table, I'm going to have problems in PROD, because I can't recreate the tables we might end up using in those JOINs.

Thanks,
Regards,
Glauber

Roland Noland

Jul 5, 2023, 6:21:27 PM
to Glauber Alexandre Brossi da Cunha, MariaDB ColumnStore
Greetings, 
What I find most interesting is how you got away with Portuguese characters in latin1. AFAIK, latin1 can't handle Portuguese; it is basically ASCII.
With the collation set to Swedish (latin1_swedish_ci), your joins do produce results, but they certainly don't follow Portuguese comparison rules.
The output you see is controlled by the session character set settings (such as character_set_results); it reflects how the MariaDB client interprets the bytes. MCS (MariaDB ColumnStore) doesn't change the contents.

Regards,
Roman
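To illustrate the point that what you see depends on how the client decodes the bytes, here is a minimal Python sketch. It is an analogy outside MariaDB, not the server's or client's actual code path, and it assumes the stored bytes are the UTF-8 encoding of the sample values; `SAMPLES` and `render` are hypothetical names. The same bytes are decoded once with the matching charset and once as latin1. Python's `latin-1` codec is ISO-8859-1; MariaDB's `latin1` is actually cp1252, which decodes these particular bytes the same way.

```python
# Sample values from the Estrategias_Temp rows in this thread.
SAMPLES = ["Juros Resíduo", "Juros Internacional Inclinação"]


def render(raw: bytes, client_charset: str) -> str:
    """Hypothetical helper: show how a client would display `raw`
    if it decoded the bytes with `client_charset`."""
    return raw.decode(client_charset)


for text in SAMPLES:
    # Assumption for illustration: the column holds the UTF-8 bytes of the value.
    stored = text.encode("utf-8")
    print(render(stored, "utf-8"))    # bytes decoded with the matching charset
    print(render(stored, "latin-1"))  # the very same bytes, mismatched charset
```

Nothing about the stored bytes changes between the two lines of each pair; only the interpretation does. Decoded as latin1, `Inclinação` comes out as `InclinaÃ§Ã£o`, the typical look of this kind of charset mix-up.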

On Wed, Jul 5, 2023, 20:46, Glauber Alexandre Brossi da Cunha <glaub...@gmail.com> wrote:

Todd Stoffel

Jul 5, 2023, 6:58:36 PM
to MariaDB ColumnStore
FYI: ColumnStore has always advised customers to set their system locale, charset, and collation to UTF8.

Glauber Alexandre Brossi da Cunha

Jul 7, 2023, 11:57:04 AM
to MariaDB ColumnStore
Good afternoon,

Just for the record:

I'm using this conversion in my SELECT to work around the problem:

CONVERT(C.MEGE_CODIGO USING utf8mb4)

This way, I can keep my tables' CHARSET as latin1.
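As a rough analogy only (Python, not MariaDB), `CONVERT(expr USING utf8mb4)` re-encodes characters from the column's character set into UTF-8. The sketch below assumes the column really contains latin1-encoded text and mimics that step by decoding the bytes as latin1 and re-encoding the result as UTF-8; `convert_latin1_to_utf8` is a hypothetical helper name. It also shows that each accented letter in the sample rows takes one byte in latin1 but two bytes after conversion.

```python
def convert_latin1_to_utf8(stored: bytes) -> bytes:
    """Hypothetical analogue of CONVERT(col USING utf8mb4) for a latin1 column."""
    # Interpret the stored bytes as latin1 characters, then re-encode them as UTF-8.
    return stored.decode("latin-1").encode("utf-8")


for text in ["Juros Resíduo", "Juros Internacional Inclinação"]:
    # Assumption: the column holds genuine latin1 bytes (one byte per character).
    latin1_bytes = text.encode("latin-1")
    utf8_bytes = convert_latin1_to_utf8(latin1_bytes)
    print(f"{text!r}: {len(text)} chars, "
          f"{len(latin1_bytes)} latin1 bytes, {len(utf8_bytes)} UTF-8 bytes")
```

For the two sample values this reports 13/13/14 and 30/30/32: the character count is unchanged, while the byte length grows by one byte per accented letter.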

Thank you for your help,
Regards,
Glauber

alexey vorovich

Jul 7, 2023, 12:10:09 PM
to Glauber Alexandre Brossi da Cunha, MariaDB ColumnStore
Hi,

Devil's advocate question: why not "set charset and collation to UTF8" globally, as recommended earlier in this thread? What would be the argument against that?


------ Original Message ------
From "Glauber Alexandre Brossi da Cunha" <glaub...@gmail.com>
To "MariaDB ColumnStore" <mariadb-c...@googlegroups.com>
Date 7/7/2023 11:57:04 AM
Subject Re: Charset erro

drrtuy

Jul 22, 2023, 10:30:18 AM
to MariaDB ColumnStore
Smart move indeed.
Be aware, though: if your data contains Portuguese characters encoded as UTF-8, the accented characters are two bytes wide, whereas a latin1 table assumes every character is one byte long. This mismatch can end up splitting a UTF-8 character in the middle and corrupting the value.

Regards,
Roman
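A minimal Python sketch of the failure mode described above. It assumes the values are stored as UTF-8 bytes in a latin1 column, so any length limit or substring applied to them effectively counts bytes rather than characters. The 10-byte limit and the `truncate_utf8` helper are hypothetical, chosen only so that the cut lands inside `í` (bytes `0xC3 0xAD`). A naive byte slice keeps half of the two-byte character and no longer decodes; backing off to the nearest character boundary avoids that.

```python
SAMPLE = "Juros Resíduo"
LIMIT = 10  # hypothetical byte budget; falls in the middle of "í" (0xC3 0xAD)


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Hypothetical helper: cut UTF-8 `data` to at most `limit` bytes
    without splitting a character."""
    if len(data) <= limit:
        return data
    end = limit
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; move the cut
    # left until it sits on the first byte of a character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


raw = SAMPLE.encode("utf-8")

naive = raw[:LIMIT]  # keeps 0xC3 but drops its continuation byte 0xAD
try:
    naive.decode("utf-8")
except UnicodeDecodeError as exc:
    print("naive cut is corrupt:", exc)

safe = truncate_utf8(raw, LIMIT)
print("safe cut:", safe.decode("utf-8"))
```

With `LIMIT = 10` the naive slice fails to decode, while `truncate_utf8` returns `b"Juros Res"`; a limit of 11 keeps the whole `í`.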

On Friday, July 7, 2023 at 18:57:04 UTC+3, Glauber Alexandre Brossi da Cunha wrote: