MySQL parsing not working?

Yang Liu

Nov 28, 2017, 3:53:24 AM
to antlr-discussion
I used the .g4 files provided here: https://github.com/antlr/grammars-v4/tree/master/mysql and successfully generated MySqlParser.py, MySqlLexer.py, MySqlParserVisitor.py, and MySqlParserListener.py using the following commands:

java -Xmx500M -cp "/usr/local/lib/antlr-4.7-complete.jar:$CLASSPATH" org.antlr.v4.Tool -visitor -Dlanguage=Python3 MySqlLexer.g4
java -Xmx500M -cp "/usr/local/lib/antlr-4.7-complete.jar:$CLASSPATH" org.antlr.v4.Tool -visitor -Dlanguage=Python3 MySqlParser.g4


I want to get the list of tables referenced by a single query, so I wrote the following code:

# -*- coding: utf-8 -*-

from antlr4 import *
from MySqlLexer import MySqlLexer as Lexer
from MySqlParser import MySqlParser as Parser
from MySqlParserVisitor import MySqlParserVisitor as Visitor
from MySqlParserListener import MySqlParserListener as Listener

def evaluate(sql):
    input = InputStream(sql)
    lexer = Lexer(input)
    stream = CommonTokenStream(lexer)
    parser = Parser(stream)
    visitor = Visitor()
    print(parser.expression_list())

if __name__ == '__main__':
    evaluate("SELECT a FROM b")


When I run the Python file, it reports the following error:

(env) ➜  sqlparser3 python3 test.py
line 1:0 mismatched input 'SELECT' expecting {'CASE', 'CAST', 'CONVERT', 'CURRENT_USER', 'DEFAULT', 'EXISTS', 'FALSE', 'IF', 'INTERVAL', 'LEFT', <INVALID>, 'NOT', 'NULL', 'REPLACE', 'RIGHT', 'TRUE', 'VALUES', 'DATE', 'TIME', 'TIMESTAMP', 'DATETIME', 'YEAR', 'CHAR', 'BINARY', 'TEXT', 'ENUM', 'AVG', 'BIT_AND', 'BIT_OR', 'BIT_XOR', 'COUNT', 'GROUP_CONCAT', 'MAX', 'MIN', 'STD', 'STDDEV', 'STDDEV_POP', 'STDDEV_SAMP', 'SUM', 'VAR_POP', 'VAR_SAMP', 'VARIANCE', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'LOCALTIME', 'CURDATE', 'CURTIME', 'DATE_ADD', 'DATE_SUB', 'EXTRACT', 'LOCALTIMESTAMP', 'NOW', 'POSITION', 'SUBSTR', 'SUBSTRING', 'SYSDATE', 'TRIM', 'UTC_DATE', 'UTC_TIME', 'UTC_TIMESTAMP', 'ACTION', 'AFTER', 'ALGORITHM', 'ANY', 'AT', 'AUTHORS', 'AUTOCOMMIT', 'AUTOEXTEND_SIZE', 'AUTO_INCREMENT', 'AVG_ROW_LENGTH', 'BEGIN', 'BINLOG', 'BIT', 'BTREE', 'CASCADED', 'CHAIN', 'CHECKSUM', 'CIPHER', 'CLIENT', 'COALESCE', 'CODE', 'COLUMNS', 'COLUMN_FORMAT', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPLETION', 'COMPRESSED', 'CONCURRENT', 'CONNECTION', 'CONSISTENT', 'CONTAINS', 'CONTRIBUTORS', 'COPY', 'DATA', 'DATAFILE', 'DEFINER', 'DELAY_KEY_WRITE', 'DIRECTORY', 'DISABLE', 'DISCARD', 'DISK', 'DO', 'DUMPFILE', 'DUPLICATE', 'DYNAMIC', 'ENABLE', 'ENDS', 'ENGINE', 'ENGINES', 'ERRORS', 'ESCAPE', 'EVEN', 'EVENT', 'EVENTS', 'EVERY', 'EXCHANGE', 'EXCLUSIVE', 'EXPIRE', 'EXTENT_SIZE', 'FIELDS', 'FIRST', 'FIXED', 'FULL', 'FUNCTION', 'GLOBAL', 'GRANTS', 'HASH', 'HOST', 'IDENTIFIED', 'IMPORT', 'INITIAL_SIZE', 'INPLACE', 'INSERT_METHOD', 'INVOKER', 'ISOLATION', 'ISSUER', 'KEY_BLOCK_SIZE', 'LANGUAGE', 'LAST', 'LESS', 'LEVEL', 'LIST', 'LOCAL', 'LOGFILE', 'LOGS', 'MASTER', 'MAX_CONNECTIONS_PER_HOUR', 'MAX_QUERIES_PER_HOUR', 'MAX_ROWS', 'MAX_SIZE', 'MAX_UPDATES_PER_HOUR', 'MAX_USER_CONNECTIONS', 'MERGE', 'MID', 'MIN_ROWS', 'MODIFY', 'MUTEX', 'MYSQL', 'NAME', 'NAMES', 'NCHAR', 'NO', 'NODEGROUP', 'NONE', 'OFFLINE', 'OFFSET', 'OJ', 'OLD_PASSWORD', 'ONLINE', 'ONLY', 'OPTIONS', 'OWNER', 'PACK_KEYS', 
'PARSER', 'PARTIAL', 'PARTITIONING', 'PARTITIONS', 'PASSWORD', 'PLUGINS', 'PORT', 'PRESERVE', 'PROCESSLIST', 'PROFILE', 'PROFILES', 'PROXY', 'QUERY', 'QUICK', 'REBUILD', 'REDO_BUFFER_SIZE', 'REDUNDANT', 'RELAYLOG', 'REMOVE', 'REORGANIZE', 'REPAIR', 'REPLICATION', 'RETURNS', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'ROW_FORMAT', 'SAVEPOINT', 'SCHEDULE', 'SECURITY', 'SERVER', 'SESSION', 'SHARE', 'SHARED', 'SIGNED', 'SIMPLE', 'SLAVE', 'SNAPSHOT', 'SOCKET', 'SOME', 'SOUNDS', 'SQL_BUFFER_RESULT', 'SQL_CACHE', 'SQL_NO_CACHE', 'START', 'STARTS', 'STATS_AUTO_RECALC', 'STATS_PERSISTENT', 'STATS_SAMPLE_PAGES', 'STATUS', 'STORAGE', 'SUBJECT', 'SUBPARTITION', 'SUBPARTITIONS', 'TABLESPACE', 'TEMPORARY', 'TEMPTABLE', 'THAN', 'TRANSACTION', 'TRUNCATE', 'UNDEFINED', 'UNDOFILE', 'UNDO_BUFFER_SIZE', 'UNKNOWN', 'UPGRADE', 'USER', 'VALUE', 'VARIABLES', 'VIEW', 'WAIT', 'WARNINGS', 'WORK', 'WRAPPER', 'X509', 'XML', 'QUARTER', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'WEEK', 'SECOND', 'MICROSECOND', 'TABLES', 'ROUTINE', 'EXECUTE', 'FILE', 'PROCESS', 'RELOAD', 'SHUTDOWN', 'SUPER', 'PRIVILEGES', 'ARMSCII8', 'ASCII', 'BIG5', 'CP1250', 'CP1251', 'CP1256', 'CP1257', 'CP850', 'CP852', 'CP866', 'CP932', 'DEC8', 'EUCJPMS', 'EUCKR', 'GB2312', 'GBK', 'GEOSTD8', 'GREEK', 'HEBREW', 'HP8', 'KEYBCS2', 'KOI8R', 'KOI8U', 'LATIN1', 'LATIN2', 'LATIN5', 'LATIN7', 'MACCE', 'MACROMAN', 'SJIS', 'SWE7', 'TIS620', 'UCS2', 'UJIS', 'UTF16', 'UTF16LE', 'UTF32', 'UTF8', 'UTF8MB3', 'UTF8MB4', 'ARCHIVE', 'BLACKHOLE', 'CSV', 'FEDERATED', 'INNODB', 'MEMORY', 'MRG_MYISAM', 'MYISAM', 'NDB', 'NDBCLUSTER', 'PERFOMANCE_SCHEMA', 'REPEATABLE', 'COMMITTED', 'UNCOMMITTED', 'SERIALIZABLE', 'GEOMETRY', 'GEOMETRYCOLLECTION', 'LINESTRING', 'MULTILINESTRING', 'MULTIPOINT', 'MULTIPOLYGON', 'POINT', 'POLYGON', 'ABS', 'ACOS', 'ADDDATE', 'ADDTIME', 'AES_DECRYPT', 'AES_ENCRYPT', 'AREA', 'ASBINARY', 'ASIN', 'ASTEXT', 'ASWKB', 'ASWKT', 'ASYMMETRIC_DECRYPT', 'ASYMMETRIC_DERIVE', 'ASYMMETRIC_ENCRYPT', 'ASYMMETRIC_SIGN', 'ASYMMETRIC_VERIFY', 'ATAN', 
'ATAN2', 'BENCHMARK', 'BIN', 'BIT_COUNT', 'BIT_LENGTH', 'BUFFER', 'CEIL', 'CEILING', 'CENTROID', 'CHARACTER_LENGTH', 'CHARSET', 'CHAR_LENGTH', 'COERCIBILITY', 'COLLATION', 'COMPRESS', 'CONCAT', 'CONCAT_WS', 'CONNECTION_ID', 'CONV', 'CONVERT_TZ', 'COS', 'COT', 'CRC32', 'CREATE_ASYMMETRIC_PRIV_KEY', 'CREATE_ASYMMETRIC_PUB_KEY', 'CREATE_DH_PARAMETERS', 'CREATE_DIGEST', 'CROSSES', 'DATEDIFF', 'DATE_FORMAT', 'DAYNAME', 'DAYOFMONTH', 'DAYOFWEEK', 'DAYOFYEAR', 'DECODE', 'DEGREES', 'DES_DECRYPT', 'DES_ENCRYPT', 'DIMENSION', 'DISJOINT', 'ELT', 'ENCODE', 'ENCRYPT', 'ENDPOINT', 'ENVELOPE', 'EQUALS', 'EXP', 'EXPORT_SET', 'EXTERIORRING', 'EXTRACTVALUE', 'FIELD', 'FIND_IN_SET', 'FLOOR', 'FORMAT', 'FOUND_ROWS', 'FROM_BASE64', 'FROM_DAYS', 'FROM_UNIXTIME', 'GEOMCOLLFROMTEXT', 'GEOMCOLLFROMWKB', 'GEOMETRYCOLLECTIONFROMTEXT', 'GEOMETRYCOLLECTIONFROMWKB', 'GEOMETRYFROMTEXT', 'GEOMETRYFROMWKB', 'GEOMETRYN', 'GEOMETRYTYPE', 'GEOMFROMTEXT', 'GEOMFROMWKB', 'GET_FORMAT', 'GET_LOCK', 'GLENGTH', 'GREATEST', 'GTID_SUBSET', 'GTID_SUBTRACT', 'HEX', 'IFNULL', 'INET6_ATON', 'INET6_NTOA', 'INET_ATON', 'INET_NTOA', 'INSTR', 'INTERIORRINGN', 'INTERSECTS', 'ISCLOSED', 'ISEMPTY', 'ISNULL', 'ISSIMPLE', 'IS_FREE_LOCK', 'IS_IPV4', 'IS_IPV4_COMPAT', 'IS_IPV4_MAPPED', 'IS_IPV6', 'IS_USED_LOCK', 'LAST_INSERT_ID', 'LCASE', 'LEAST', 'LENGTH', 'LINEFROMTEXT', 'LINEFROMWKB', 'LINESTRINGFROMTEXT', 'LINESTRINGFROMWKB', 'LN', 'LOAD_FILE', 'LOCATE', 'LOG', 'LOG10', 'LOG2', 'LOWER', 'LPAD', 'LTRIM', 'MAKEDATE', 'MAKETIME', 'MAKE_SET', 'MASTER_POS_WAIT', 'MBRCONTAINS', 'MBRDISJOINT', 'MBREQUAL', 'MBRINTERSECTS', 'MBROVERLAPS', 'MBRTOUCHES', 'MBRWITHIN', 'MD5', 'MLINEFROMTEXT', 'MLINEFROMWKB', 'MONTHNAME', 'MPOINTFROMTEXT', 'MPOINTFROMWKB', 'MPOLYFROMTEXT', 'MPOLYFROMWKB', 'MULTILINESTRINGFROMTEXT', 'MULTILINESTRINGFROMWKB', 'MULTIPOINTFROMTEXT', 'MULTIPOINTFROMWKB', 'MULTIPOLYGONFROMTEXT', 'MULTIPOLYGONFROMWKB', 'NAME_CONST', 'NULLIF', 'NUMGEOMETRIES', 'NUMINTERIORRINGS', 'NUMPOINTS', 'OCT', 'OCTET_LENGTH', 'ORD', 
'OVERLAPS', 'PERIOD_ADD', 'PERIOD_DIFF', 'PI', 'POINTFROMTEXT', 'POINTFROMWKB', 'POINTN', 'POLYFROMTEXT', 'POLYFROMWKB', 'POLYGONFROMTEXT', 'POLYGONFROMWKB', 'POW', 'POWER', 'QUOTE', 'RADIANS', 'RAND', 'RANDOM_BYTES', 'RELEASE_LOCK', 'REVERSE', 'ROUND', 'ROW_COUNT', 'RPAD', 'RTRIM', 'SEC_TO_TIME', 'SESSION_USER', 'SHA', 'SHA1', 'SHA2', 'SIGN', 'SIN', 'SLEEP', 'SOUNDEX', 'SQL_THREAD_WAIT_AFTER_GTIDS', 'SQRT', 'SRID', 'STARTPOINT', 'STRCMP', 'STR_TO_DATE', 'ST_AREA', 'ST_ASBINARY', 'ST_ASTEXT', 'ST_ASWKB', 'ST_ASWKT', 'ST_BUFFER', 'ST_CENTROID', 'ST_CONTAINS', 'ST_CROSSES', 'ST_DIFFERENCE', 'ST_DIMENSION', 'ST_DISJOINT', 'ST_DISTANCE', 'ST_ENDPOINT', 'ST_ENVELOPE', 'ST_EQUALS', 'ST_EXTERIORRING', 'ST_GEOMCOLLFROMTEXT', 'ST_GEOMCOLLFROMTXT', 'ST_GEOMCOLLFROMWKB', 'ST_GEOMETRYCOLLECTIONFROMTEXT', 'ST_GEOMETRYCOLLECTIONFROMWKB', 'ST_GEOMETRYFROMTEXT', 'ST_GEOMETRYFROMWKB', 'ST_GEOMETRYN', 'ST_GEOMETRYTYPE', 'ST_GEOMFROMTEXT', 'ST_GEOMFROMWKB', 'ST_INTERIORRINGN', 'ST_INTERSECTION', 'ST_INTERSECTS', 'ST_ISCLOSED', 'ST_ISEMPTY', 'ST_ISSIMPLE', 'ST_LINEFROMTEXT', 'ST_LINEFROMWKB', 'ST_LINESTRINGFROMTEXT', 'ST_LINESTRINGFROMWKB', 'ST_NUMGEOMETRIES', 'ST_NUMINTERIORRING', 'ST_NUMINTERIORRINGS', 'ST_NUMPOINTS', 'ST_OVERLAPS', 'ST_POINTFROMTEXT', 'ST_POINTFROMWKB', 'ST_POINTN', 'ST_POLYFROMTEXT', 'ST_POLYFROMWKB', 'ST_POLYGONFROMTEXT', 'ST_POLYGONFROMWKB', 'ST_SRID', 'ST_STARTPOINT', 'ST_SYMDIFFERENCE', 'ST_TOUCHES', 'ST_UNION', 'ST_WITHIN', 'ST_X', 'ST_Y', 'SUBDATE', 'SUBSTRING_INDEX', 'SUBTIME', 'SYSTEM_USER', 'TAN', 'TIMEDIFF', 'TIMESTAMPADD', 'TIMESTAMPDIFF', 'TIME_FORMAT', 'TIME_TO_SEC', 'TOUCHES', 'TO_BASE64', 'TO_DAYS', 'TO_SECONDS', 'UCASE', 'UNCOMPRESS', 'UNCOMPRESSED_LENGTH', 'UNHEX', 'UNIX_TIMESTAMP', 'UPDATEXML', 'UPPER', 'UUID', 'UUID_SHORT', 'VALIDATE_PASSWORD_STRENGTH', 'VERSION', 'WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS', 'WEEKDAY', 'WEEKOFYEAR', 'WEIGHT_STRING', 'WITHIN', 'YEARWEEK', 'Y', 'X', '+', '-', '!', '~', '(', '0', '1', '2', CHARSET_REVERSE_QOUTE_STRING, 
START_NATIONAL_STRING_LITERAL, STRING_LITERAL, DECIMAL_LITERAL, HEXADECIMAL_LITERAL, REAL_LITERAL, NULL_SPEC_LITERAL, BIT_STRING, STRING_CHARSET_NAME, ID, REVERSE_QUOTE_ID, LOCAL_ID, GLOBAL_ID}
[]

What would be the problem?
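
One possible reading of the error above: `expression_list` is a rule for expressions, not for whole statements, so the parser rejects `SELECT` at position 1:0 because no expression can start with it. The trailing `[]` is probably just the string form of the returned context object rather than a parse tree. Below is a minimal sketch of starting from a statement-level rule instead. The helper name `parse_from_rule` is hypothetical, and the default rule name `root` is an assumption about this grammar; the real entry rule can be looked up in `MySqlParser.ruleNames`.

```python
def parse_from_rule(sql, lexer_cls, parser_cls, rule_name="root"):
    """Parse `sql` starting at `rule_name` and return the tree as text.

    `rule_name="root"` is an ASSUMPTION about the grammar's entry rule;
    check `parser_cls.ruleNames` for the names in your generated parser.
    """
    # Generated ANTLR Python parsers expose their rule names as a class attribute.
    if rule_name not in parser_cls.ruleNames:
        raise ValueError(
            f"unknown rule {rule_name!r}; first rules: {parser_cls.ruleNames[:10]}"
        )

    # Imported lazily so the rule-name check works without the runtime installed.
    from antlr4 import CommonTokenStream, InputStream

    parser = parser_cls(CommonTokenStream(lexer_cls(InputStream(sql))))
    tree = getattr(parser, rule_name)()      # e.g. parser.root()
    return tree.toStringTree(recog=parser)   # readable tree instead of "[]"


# Usage with the generated modules from this thread (not run here):
#   from MySqlLexer import MySqlLexer
#   from MySqlParser import MySqlParser
#   print(parse_from_rule("SELECT a FROM b", MySqlLexer, MySqlParser))
```

Checking the rule name first gives a clear error when the guessed entry rule does not exist, instead of an `AttributeError` from the parser object.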

I also tried SQLite.g4 with Python 2, but the generated Python file has a syntax error.

Yang Liu

Nov 28, 2017, 5:44:36 AM
to antlr-discussion
It seems that there are some bugs in the ANTLR Python target, because the generated file always has a syntax error (it contains code in another language).

On Tuesday, November 28, 2017 at 4:53:24 PM UTC+8, Yang Liu wrote: