Quick follow-up: I can pass a data frame in instead of a CSV file, but the data frame is missing the table/query name that would identify it. Here’s a full example using Trino and its built-in test database. As you can see, the inclusions say “Pandas dataframe” instead of the CSV file name, which makes them ambiguous when column names overlap. Is there a way to set the data frame name? Thanks again! -Ryan.
from sqlalchemy import create_engine
import desbordante
import pandas as pd

# Connect to the tiny schema of Trino's tpch catalog.
engine = create_engine('trino://ry...@nuc.local:8080/tpch/tiny')

# Read both tables into data frames; the table names are not carried along.
df_table1 = pd.read_sql("lineitem", engine)
df_table2 = pd.read_sql("orders", engine)
TABLES = [df_table1, df_table2]

# Mine inclusion dependencies across the two data frames.
algo = desbordante.ind.algorithms.Default()
algo.load_data(tables=TABLES)
algo.execute()
inds = algo.get_inds()
for ind in inds:
    print(ind)
Prints:
(Pandas dataframe, [orderkey]) -> (Pandas dataframe, [orderkey])
(Pandas dataframe, [suppkey]) -> (Pandas dataframe, [partkey])
(Pandas dataframe, [linenumber]) -> (Pandas dataframe, [orderkey])
(Pandas dataframe, [linenumber]) -> (Pandas dataframe, [partkey])
(Pandas dataframe, [linenumber]) -> (Pandas dataframe, [suppkey])
(Pandas dataframe, [linenumber]) -> (Pandas dataframe, [orderkey])
(Pandas dataframe, [tax]) -> (Pandas dataframe, [discount])
(Pandas dataframe, [linestatus]) -> (Pandas dataframe, [orderstatus])
(Pandas dataframe, [commitdate]) -> (Pandas dataframe, [shipdate])
(Pandas dataframe, [commitdate]) -> (Pandas dataframe, [receiptdate])
(Pandas dataframe, [orderkey]) -> (Pandas dataframe, [orderkey])
(Pandas dataframe, [custkey]) -> (Pandas dataframe, [partkey])
But it’s hard to tell which data frame is which in the printout, because the table names (lineitem and orders) have been lost. Thanks again, Ryan
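One possible workaround, sketched here under an assumption I have not verified: since the printout shows the column names exactly as they appear in the data frame, prefixing every column with its table name before loading should carry the name through to the output. The helper `with_table_prefix` and the small stand-in frames below are hypothetical and use made-up values; `DataFrame.add_prefix` is a standard pandas method.

```python
import pandas as pd


def with_table_prefix(df: pd.DataFrame, table_name: str) -> pd.DataFrame:
    """Return a copy of df with columns renamed to '<table_name>.<column>'."""
    # add_prefix returns a new data frame and leaves the original untouched.
    return df.add_prefix(f"{table_name}.")


# Small stand-ins for the lineitem and orders tables (made-up values).
lineitem = pd.DataFrame({"orderkey": [1, 1, 2], "linenumber": [1, 2, 1]})
orders = pd.DataFrame({"orderkey": [1, 2], "custkey": [10, 20]})

TABLES = [
    with_table_prefix(lineitem, "lineitem"),
    with_table_prefix(orders, "orders"),
]

for table in TABLES:
    print(list(table.columns))
# ['lineitem.orderkey', 'lineitem.linenumber']
# ['orders.orderkey', 'orders.custkey']
```

The prefixed frames could then be passed to algo.load_data(tables=TABLES) as in the example above. If the assumption holds, the column lists in the printout would read [lineitem.orderkey] and [orders.orderkey], although the table label itself would presumably still say "Pandas dataframe".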