Extract all references tables?

685 views
Skip to first unread message

Nix

unread,
Oct 21, 2010, 1:32:15 PM10/21/10
to sqlparse
I want to be able to extract a list of all the referenced tables
regardless of statement complexity, i.e. subquery depth. Is that
currently possible?
e.g.

select K.a from (select H.b from (select G.c from (select F.d from
(select E.e from A, B, C, D, E), F), G), H), I, J, K;

I want to know the query references tables A, B, C, D, E, F, G, H, I,
J, and K

Andi Albrecht

unread,
Oct 22, 2010, 7:18:25 AM10/22/10
to sqlp...@googlegroups.com
Nix <robert...@gmail.com> writes:

Hi,

thanks for asking!

The bad news are that there is no API for query introspection. The
(almost) good news are that such questions help to build such an
API... :-)

Here's an example how to extract table names from a query. This example
may not work with all kinds of SQL statements, but at least it
illustrates how it *could* work (and maybe it solves you problem
anyway).

I've added this script to the "examples/" directory too.

The approach here is very trivial - and maybe too simplistic as a
general approach. The first function (extract_from_part) simply returns
a stream of all elements found after the first occurrence of "FROM". It
also flattens the nested statements. The second function
(extract_table_identifiers) extracts the table names from that stream.

Andi


# -- extract_table_names.py
sql = """


select K.a from (select H.b from (select G.c from (select F.d from
(select E.e from A, B, C, D, E), F), G), H), I, J, K;
"""

import sqlparse
from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML

def is_subselect(parsed):
if not parsed.is_group():
return False
for item in parsed.tokens:
if item.ttype is DML and item.value.upper() == 'SELECT':
return True
return False

def extract_from_part(parsed):
from_seen = False
for item in parsed.tokens:
if from_seen:
if is_subselect(item):
for x in extract_from_part(item):
yield x
else:
yield item
elif item.ttype is Keyword and item.value.upper() == 'FROM':
from_seen = True

def extract_table_identifiers(token_stream):
for item in token_stream:
if isinstance(item, IdentifierList):
for identifier in item.get_identifiers():
yield identifier.get_name()
elif isinstance(item, Identifier):
yield item.get_name()
# It's a bug to check for Keyword here, but in the example
# above some tables names are identified as keywords...
elif item.ttype is Keyword:
yield item.value

def extract_tables():
stream = extract_from_part(sqlparse.parse(sql)[0])
return list(extract_table_identifiers(stream))

if __name__ == '__main__':
print 'Tables: %s' % ', '.join(extract_tables())

# -------------

Donald Lindsay

unread,
Nov 11, 2016, 3:22:35 AM11/11/16
to sqlparse, albrec...@googlemail.com
This doesn't seem to work with JOINs ?

I replaced the canned string (assigned to variable 'sql') with something I had lying around:

        SELECT c1  FROM t1 LEFT JOIN t2 ON t2.c2 = t1.c3;
       
The code only found table t1, and missed t2.  ??
Reply all
Reply to author
Forward
0 new messages