SQL PYTHON_SERVICES

390 views
Skip to first unread message

Ian Coetzer

unread,
Feb 6, 2021, 4:42:50 AM2/6/21
to modin-dev
Hi
I managed to install modin into the PYTHON_SERVICES\Scripts folder that comes installed with the SQL Server 'Machine Learning' option.

However, I cannot seem to use modin.pandas successfully within a SQL SCRIPT.
Am I doing something incorrectly?

(If I just pass the input data set out - or via normal pandas it works)
(But I cannot seem to consume the input data set into a modin.pandas data frame and then convert it back to normal pandas and send the output rows)?

CODE:

EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'

import modin.pandas as pd
import pandas as pdDefault

df1 = pd.DataFrame(InputDataSet)

df2 = pdDefault.DataFrame(df1.copy())
print(type(df1))
print(type(df2))
print(df2.count())

OutputDataSet = df2
',
@input_data_1 = N'SELECT TOP 5 BK_ItemId, 0 AS XXX FROM Sales.FactCreditNote'
WITH RESULT SETS ((BK_ItemId VARCHAR(250), XXX INT))


ERROR:


[11:40:57] Started executing query at Line 6
Msg 11537, Level 16, State 3, Line 1
EXECUTE statement failed because its WITH RESULT SETS clause specified 2 column(s) for result set number 1, but the statement sent 1 column(s) at run time. 
STDERR message(s) from external script:
D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\functions\RxSummary.py:4: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
from pandas import DataFrame, Index, Panel
UserWarning: The Dask Engine for Modin is experimental.
YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
UserWarning: Distributing <class 'pandas.core.frame.DataFrame'> object. This may take some time.

SqlSatellite cannot write data buffer. Error code:0x80004004.
SqlSatelliteCall error: SqlSatellite cannot write data buffer. Error code:0x80004004.distributed.nanny - WARNING - Restarting worker
distributed.nanny - ERROR - Nanny failed to start process
Traceback (most recent call last): 
STDERR message(s) from external script:
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\nanny.py", line 575, in start
await self.process.start()
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\process.py", line 34, in _call_and_set_future
res = func(*args, **kwargs)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\process.py", line 202, in _start
process.start()
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj) 
STDERR message(s) from external script:
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 948, in reduce_pipe_connection
dh = reduction.DupHandle(conn.fileno(), access)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 170, in fileno
self._check_closed() 
STDERR message(s) from external script:
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 136, in _check_closed
raise OSError("handle is closed")
OSError: handle is closed
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input 
STDOUT message(s) from external script:
<class 'modin.pandas.dataframe.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
0 2
dtype: int64
SqlSatelliteCall function failed. Please see the console output for more information.
Traceback (most recent call last):
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 605, in rx_sql_satellite_call
rx_native_call("SqlSatelliteCall", params)
File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\RxSerializable.py", line 375, in rx_native_call
ret = px_call(functionname, params)
RuntimeError: revoscalepy function failed. 
Total execution time: 00:00:06.975






Yaroslav Igoshev

unread,
Feb 8, 2021, 2:00:11 AM2/8/21
to modin-dev
Hi,

You can't pass Modin DataFrame to pandas DataFrame constructor because pandas doesn't know anything about it. You should use `df1._to_pandas()` to get entire pandas DataFrame instead.

Kind regards,
Yaroslav

суббота, 6 февраля 2021 г. в 12:42:50 UTC+3, ian.c...@gmail.com:

Ian Coetzer

unread,
Feb 8, 2021, 1:52:25 PM2/8/21
to modin-dev
Hi Yaroslav

Thank you! that makes 100% sense.. Pandas does not know about the Modin - it is the other way around ..
I have taken your advice and managed to take an InputDataSet that was sent to a Modin data frame and convert it successfully back to a Pandas Data Frame.

Do you perhaps know why I cannot apply a lamba function on my Modin frame - when I call this Python script via SQL?
- This works fine when running via a Jupyter notebook. (dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x)))




EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'

import modin.pandas as pd
import pandas as pdDefault

dfModin = pd.DataFrame(InputDataSet)
dfPandas = pdDefault.DataFrame(InputDataSet)

def myfunc(x):
    return x + 10

dfPandas["XXX"] = dfPandas["XXX"].apply(lambda x: myfunc(x))
dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))

OutputDataSet = dfPandas
',
@input_data_1 = N'SELECT TOP 5 CAST(BK_ItemId AS VARCHAR(250)) AS BK_ItemId, CAST(0  AS INT) AS XXX FROM Sales.FactCreditNote'
WITH RESULT SETS ((BK_ItemId VARCHAR(250), XXX INT))


Error in execution.  Check the output for more information.
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "D:\ProgramData\MSSQL02\Temp-PY\Appcontainer1\B239FEFC-A541-4C7D-B865-970CBFEFECD0\sqlindb_0.py", line 53, in transform
    dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))

Msg 39019, Level 16, State 2, Line 5
An external script error occurred: 
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\base.py", line 2638, in __getitem__
    return self._getitem(key)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\dataframe.py", line 2371, in _getitem
    return self._getitem_column(key)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\dataframe.py", line 1945, in _getitem_column
    query_compiler=self._query_compiler.getitem_column_array([key])
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\backends\pandas\query_compiler.py", line 2157, in getitem_column_array
    new_modin_frame = self._modin_frame.mask(col_indices=key)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 614, in mask

Msg 39019, Level 16, State 2, Line 5
An external script error occurred: 
    for row_idx, row_internal_indices in row_partitions_list.items()
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 616, in <listcomp>
    or len(row_internal_indices) > 0
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 612, in <listcomp>
    or len(col_internal_indices) > 0
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\dask\pandas_on_dask\frame\partition.py", line 99, in mask
    lambda df: pandas.DataFrame(df.iloc[row_indices, col_indices])
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\dask\pandas_on_dask\frame\partition.py", line 88, in add_to_apply_calls
    self.future, call_queue=self.call_queue + [[pkl.dumps(func), kwargs]]

Msg 39019, Level 16, State 2, Line 5
An external script error occurred: 
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 437, in dump
    self.save(obj)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj

Msg 39019, Level 16, State 2, Line 5
An external script error occurred: 
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 682, in _save_reduce_pickle5
    dictitems=dictitems, obj=obj
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 638, in save_reduce
    save(args)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 786, in save_tuple
    save(element)
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 771, in save_tuple
    save(element)

Msg 39019, Level 16, State 2, Line 5
An external script error occurred: 
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\dill\_dill.py", line 1119, in save_cell
    f = obj.cell_contents
ValueError: Cell is empty

SqlSatelliteCall error: Error in execution.  Check the output for more information.

Thank you

Ian Coetzer

unread,
Feb 8, 2021, 2:12:36 PM2/8/21
to modin-dev
Hi,

A small edit to my question:

I also tried to create a manual modin data frame and it works in a Jupyter notebook - but not calling same code from with T-SQL?
Why would this be - thank you:

This works as expected from a Jupyer notebook where "pd is imports modin.pandas as pd"

dfModin = pd.DataFrame()
dfModin["XXX"] = [1,2,3]
dfModin["GROUP"] = ["A", "B", "C"]

def myfunc(x):
    return x + 10

dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))
dfModin

the output in a Jupyer notebook is:

XXX    GROUP
11        A1
12        B2
13        C  



Ian Coetzer

unread,
Feb 8, 2021, 2:18:25 PM2/8/21
to modin-dev
Hi Everyone

To re-cap this works if I replace the import modin.pandas as pd with import pandas as pd.
I am trying to use modin since it can scale out to multiple CPUs and want to find out if Modin is useable from within a SQL Script?

When trying this with modin.pandas I hit this error:
Can't pickle local object 'PandasQueryCompiler.insert.<locals>.insert'


EXECUTE sp_execute_external_script @language = N'Python'
    , @script = N'

import modin.pandas as pd

df = pd.DataFrame()
df["XXX"] = [1,2,3]
df["GROUP"] = ["A", "B", "C"]

def myfunc(x):
    return x + 10

df["XXX"] = df["XXX"].apply(lambda x: myfunc(x))
df

OutputDataSet = df
'
WITH RESULT SETS ((XXX INT, [GROUP] VARCHAR(250)))

Devin Petersohn

unread,
Feb 8, 2021, 9:07:05 PM2/8/21
to Ian Coetzer, modin-dev
Hi Ian,

I am wondering if this is an issue with the cloudpickle installation. I am not familiar with SQL Server and how you install python dependencies, but can you try to upgrade your cloudpickle installation (`pip install -U cloudpickle`)? It also looks like dill is involved, so upgrading that as well might help.

Devin

--
You received this message because you are subscribed to the Google Groups "modin-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modin-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modin-dev/57914f24-2cbb-43be-86e8-5a49735844c3n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages