Encoding troubles with pythong, mysql and utf8mb4

186 views
Skip to first unread message

imad.youb...@gmail.com

unread,
Aug 22, 2018, 3:52:30 AM8/22/18
to sqlalchemy
I get the following warnings, when trying to save a simple dataframe to mysql.: 

> C:\...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\\x92\\xE9t\\xE9)' for column 'VARIABLE_VALUE' at row 518")
  result = self._query(query)

And
> C:\...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning:
> (3719, "'utf8' is currently an alias for the character set UTF8MB3,
> but will be an alias for UTF8MB4 in a future release. Please consider
> using UTF8MB4 in order to be unambiguous.")   result =
> self._query(query)

Environment info : I use Mysql8, python3.6 (pymysql 0.9.2, sqlalchemy 1.2.1)

I visited posts like the one linked bellow, none of which seem to give a solution as to how to avoid this warning.

 - [MySQL “incorrect string value” error when save unicode string in Django][1] -> Indication is to use UTF8


N.B : The Collation in the table within mysql doesn't seem to be set to the one I specified in the `create_db` function within the `Connection` class.

The executable code:

    import DataEngine.db.Connection as connection
    import random
    import pandas as pd
    
    if __name__ == "__main__":
        conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
        conn.create_db("raw_data")
        conn.establish("raw_data")
        l1 = []
        for i in range(100):
            l_nested = []
            for j in range(10):
                l_nested.append(random.randint(0, 100))
            l1.append(l_nested)
        df = pd.DataFrame(l1)
    
        conn.save(df, "random_df")
        # df2 = conn.retrieve("random_df")

My `Connection class`

    import sqlalchemy
    import pymysql
    import pandas as pd
    
    
    class Connection:
        def __init__(self: object, host: str, port: str, user: str, password: str):
            self.host = host
            self.port = port
            self.user = user
            self.password = password
            self.conn = None
    
        def create_db(self: object, db_name: str, charset: str = "utf8mb4", collate:str ="utf8mb4_unicode_ci",drop_if_exists: bool = True):
            c = pymysql.connect(host=self.host, user=self.user, password=self.password)
            if drop_if_exists:
                c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
            c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
            c.close()
            print("Database %s created with a %s charset" % (db_name, charset))
    
        def establish(self: object, db_name: str, charset: str = "utf8mb4"):
            self.conn = sqlalchemy.create_engine(
                "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
                "?charset=" + charset)
            print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
            print("Charset : %s" % charset)
    
        def retrieve(self, table):
            df = pd.read_sql_table(table, self.conn)
            return df
    
        def save(self: object, df: "Pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
            df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)

Some elements that might help:
[![Database is of charset utf8mb4][2]][2]
[![Table columns don't seem to have the correct collation even though I specified it when creating the db][3]][3]



Simon King

unread,
Aug 22, 2018, 4:37:30 AM8/22/18
to sqlal...@googlegroups.com
I've never used Pandas, so this may not make any sense, but where does
the column "VARIABLE_VALUE" come from? Is it a column in your
dataframe?

Simon
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

imad.youb...@gmail.com

unread,
Aug 22, 2018, 5:31:19 AM8/22/18
to sqlalchemy
Hello,

That's a good question actually. If you follow my Stackoverflow Post you'll see I've updated it with the following information :

The dataframe that is persisted in the database is :

   index   0   1   2   3   4   5   6   7   8   9
0      0  11  57  75  45  81  70  91  66  93  96
1      1  51  43   3  64   2   6  93   5  49  40
2      2  35  80  76  11  23  87  19  32  13  98
3      3  82  10  69  40  34  66  42  24  82  59
4      4  49  74  39  61  14  63  94  92  82  85
5      5  50  47  90  75  48  77  17  43   5  29
6      6  70  40  78  60  29  48  52  48  39  36
7      7  21  87  41  53  95   3  31  67  50  30
8      8  72  79  73  82  20  15  51  14  38  42
9      9  68  71  11  17  48  68  17  42  83  95


So I actually have no idea where that column name is referenced.

Best regards,
Imad.

imad.youb...@gmail.com

unread,
Aug 22, 2018, 5:36:57 AM8/22/18
to sqlalchemy
Sorry for my last messy message, here's a better version.

That's a good question actually. If you follow my Stackoverflow Post you'll see I've updated it with the following information :


The dataframe that is persisted in the database (See attached image):


DataFrame.JPG


So I actually have no idea wherethe column "VARIABLE_VALUE" is coming from.


On Wednesday, August 22, 2018 at 10:37:30 AM UTC+2, Simon King wrote:

Simon King

unread,
Aug 22, 2018, 6:20:22 AM8/22/18
to sqlal...@googlegroups.com
According to https://dev.mysql.com/doc/refman/5.7/en/variables-table.html, VARIABLE_VALUE is a column in the INFORMATION_SCHEMA GLOBAL_VARIABLES and SESSION_VARIABLES tables.

You could use the python warnings filter to turn these warnings into exceptions:

import warnings
warnings.simplefilter("error")

...which will give you an exception and a stack trace, so you can see where these warnings are coming from.

Simon

Mike Bayer

unread,
Aug 22, 2018, 12:14:46 PM8/22/18
to sqlal...@googlegroups.com
This user's question is also on bitbucket where I've summarized what is likely going on, so we can avoid doing the same work twice here:  https://bitbucket.org/zzzeek/sqlalchemy/issues/4085/warning-1366-incorrect-string-value#comment-47394537

To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscribe@googlegroups.com.

To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
http://www.sqlalchemy.org/
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages