SQL PYTHON_SERVICES

Ian Coetzer

unread,

Feb 6, 2021, 4:42:50 AM2/6/21

to modin-dev

Hi

I managed to install modin into the PYTHON_SERVICES\Scripts folder that comes installed with the SQL Server 'Machine Learning' option.

However, I cannot seem to use modin.pandas successfully within a SQL SCRIPT.

Am I doing something incorrectly?

(If I just pass the input data set out - or via normal pandas it works)

(But I cannot seem to consume the input data set into a modin.pandas data frame and then convert it back to normal pandas and send the output rows)?

CODE:

EXECUTE sp_execute_external_script @language = N'Python'

, @script = N'

import modin.pandas as pd

import pandas as pdDefault

df1 = pd.DataFrame(InputDataSet)

df2 = pdDefault.DataFrame(df1.copy())

print(type(df1))

print(type(df2))

print(df2.count())

OutputDataSet = df2

',

@input_data_1 = N'SELECT TOP 5 BK_ItemId, 0 AS XXX FROM Sales.FactCreditNote'

WITH RESULT SETS ((BK_ItemId VARCHAR(250), XXX INT))

ERROR:

[11:40:57] Started executing query at Line 6

Msg 11537, Level 16, State 3, Line 1

EXECUTE statement failed because its WITH RESULT SETS clause specified 2 column(s) for result set number 1, but the statement sent 1 column(s) at run time.

STDERR message(s) from external script:

D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\functions\RxSummary.py:4: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version

from pandas import DataFrame, Index, Panel

UserWarning: The Dask Engine for Modin is experimental.

YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.

UserWarning: Distributing <class 'pandas.core.frame.DataFrame'> object. This may take some time.

SqlSatellite cannot write data buffer. Error code:0x80004004.

SqlSatelliteCall error: SqlSatellite cannot write data buffer. Error code:0x80004004.distributed.nanny - WARNING - Restarting worker

distributed.nanny - ERROR - Nanny failed to start process

Traceback (most recent call last):

STDERR message(s) from external script:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\nanny.py", line 575, in start

await self.process.start()

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\process.py", line 34, in _call_and_set_future

res = func(*args, **kwargs)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\distributed\process.py", line 202, in _start

process.start()

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\process.py", line 112, in start

self._popen = self._Popen(self)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\context.py", line 223, in _Popen

return _default_context.get_context().Process._Popen(process_obj)

STDERR message(s) from external script:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\context.py", line 322, in _Popen

return Popen(process_obj)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__

reduction.dump(process_obj, to_child)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\reduction.py", line 60, in dump

ForkingPickler(file, protocol).dump(obj)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 948, in reduce_pipe_connection

dh = reduction.DupHandle(conn.fileno(), access)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 170, in fileno

self._check_closed()

STDERR message(s) from external script:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\connection.py", line 136, in _check_closed

raise OSError("handle is closed")

OSError: handle is closed

Traceback (most recent call last):

File "<string>", line 1, in <module>

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\spawn.py", line 105, in spawn_main

exitcode = _main(fd)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\multiprocessing\spawn.py", line 115, in _main

self = reduction.pickle.load(from_parent)

EOFError: Ran out of input

STDOUT message(s) from external script:

0 2

dtype: int64

SqlSatelliteCall function failed. Please see the console output for more information.

Traceback (most recent call last):

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\computecontext\RxInSqlServer.py", line 605, in rx_sql_satellite_call

rx_native_call("SqlSatelliteCall", params)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\revoscalepy\RxSerializable.py", line 375, in rx_native_call

ret = px_call(functionname, params)

RuntimeError: revoscalepy function failed.

Total execution time: 00:00:06.975

Yaroslav Igoshev

unread,

Feb 8, 2021, 2:00:11 AM2/8/21

to modin-dev

Hi,

You can't pass Modin DataFrame to pandas DataFrame constructor because pandas doesn't know anything about it. You should use `df1._to_pandas()` to get entire pandas DataFrame instead.

Kind regards,

Yaroslav

суббота, 6 февраля 2021 г. в 12:42:50 UTC+3, ian.c...@gmail.com:

Ian Coetzer

unread,

Feb 8, 2021, 1:52:25 PM2/8/21

to modin-dev

Hi Yaroslav

Thank you! that makes 100% sense.. Pandas does not know about the Modin - it is the other way around ..

I have taken your advice and managed to take an InputDataSet that was sent to a Modin data frame and convert it successfully back to a Pandas Data Frame.

Do you perhaps know why I cannot apply a lamba function on my Modin frame - when I call this Python script via SQL?

- This works fine when running via a Jupyter notebook. (dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x)))

EXECUTE sp_execute_external_script @language = N'Python'

, @script = N'

import modin.pandas as pd

import pandas as pdDefault

dfModin = pd.DataFrame(InputDataSet)

dfPandas = pdDefault.DataFrame(InputDataSet)

def myfunc(x):

return x + 10

dfPandas["XXX"] = dfPandas["XXX"].apply(lambda x: myfunc(x))

dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))

OutputDataSet = dfPandas

',

@input_data_1 = N'SELECT TOP 5 CAST(BK_ItemId AS VARCHAR(250)) AS BK_ItemId, CAST(0 AS INT) AS XXX FROM Sales.FactCreditNote'

WITH RESULT SETS ((BK_ItemId VARCHAR(250), XXX INT))

Error in execution. Check the output for more information.

Traceback (most recent call last):

File "<string>", line 5, in <module>

File "D:\ProgramData\MSSQL02\Temp-PY\Appcontainer1\B239FEFC-A541-4C7D-B865-970CBFEFECD0\sqlindb_0.py", line 53, in transform

dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))

Msg 39019, Level 16, State 2, Line 5

An external script error occurred:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\base.py", line 2638, in __getitem__

return self._getitem(key)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\dataframe.py", line 2371, in _getitem

return self._getitem_column(key)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\pandas\dataframe.py", line 1945, in _getitem_column

query_compiler=self._query_compiler.getitem_column_array([key])

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\backends\pandas\query_compiler.py", line 2157, in getitem_column_array

new_modin_frame = self._modin_frame.mask(col_indices=key)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 614, in mask

Msg 39019, Level 16, State 2, Line 5

An external script error occurred:

for row_idx, row_internal_indices in row_partitions_list.items()

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 616, in <listcomp>

or len(row_internal_indices) > 0

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\base\frame\data.py", line 612, in <listcomp>

or len(col_internal_indices) > 0

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\dask\pandas_on_dask\frame\partition.py", line 99, in mask

lambda df: pandas.DataFrame(df.iloc[row_indices, col_indices])

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\modin\engines\dask\pandas_on_dask\frame\partition.py", line 88, in add_to_apply_calls

self.future, call_queue=self.call_queue + [[pkl.dumps(func), kwargs]]

Msg 39019, Level 16, State 2, Line 5

An external script error occurred:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 102, in dumps

cp.dump(obj)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 563, in dump

return Pickler.dump(self, obj)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 437, in dump

self.save(obj)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save

f(self, obj) # Call unbound method with explicit self

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 745, in save_function

*self._dynamic_function_reduce(obj), obj=obj

Msg 39019, Level 16, State 2, Line 5

An external script error occurred:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 682, in _save_reduce_pickle5

dictitems=dictitems, obj=obj

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 638, in save_reduce

save(args)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save

f(self, obj) # Call unbound method with explicit self

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 786, in save_tuple

save(element)

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save

f(self, obj) # Call unbound method with explicit self

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 771, in save_tuple

save(element)

Msg 39019, Level 16, State 2, Line 5

An external script error occurred:

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\pickle.py", line 504, in save

f(self, obj) # Call unbound method with explicit self

File "D:\Software\Microsoft SQL Server\MSSQL15.MSSQL02\PYTHON_SERVICES\lib\site-packages\dill\_dill.py", line 1119, in save_cell

f = obj.cell_contents

ValueError: Cell is empty

SqlSatelliteCall error: Error in execution. Check the output for more information.

Thank you

Ian Coetzer

unread,

Feb 8, 2021, 2:12:36 PM2/8/21

to modin-dev

Hi,

A small edit to my question:

I also tried to create a manual modin data frame and it works in a Jupyter notebook - but not calling same code from with T-SQL?

Why would this be - thank you:

This works as expected from a Jupyer notebook where "pd is imports modin.pandas as pd"

dfModin = pd.DataFrame()

dfModin["XXX"] = [1,2,3]

dfModin["GROUP"] = ["A", "B", "C"]

def myfunc(x):

return x + 10

dfModin["XXX"] = dfModin["XXX"].apply(lambda x: myfunc(x))

dfModin

the output in a Jupyer notebook is:

XXX GROUP

11 A1
12 B2
13 C

Ian Coetzer

unread,

Feb 8, 2021, 2:18:25 PM2/8/21

to modin-dev

Hi Everyone

To re-cap this works if I replace the import modin.pandas as pd with import pandas as pd.

I am trying to use modin since it can scale out to multiple CPUs and want to find out if Modin is useable from within a SQL Script?

When trying this with modin.pandas I hit this error:

Can't pickle local object 'PandasQueryCompiler.insert.<locals>.insert'

EXECUTE sp_execute_external_script @language = N'Python'

, @script = N'

import modin.pandas as pd

df = pd.DataFrame()

df["XXX"] = [1,2,3]

df["GROUP"] = ["A", "B", "C"]

def myfunc(x):

return x + 10

df["XXX"] = df["XXX"].apply(lambda x: myfunc(x))

df

OutputDataSet = df

'

WITH RESULT SETS ((XXX INT, [GROUP] VARCHAR(250)))

Devin Petersohn

unread,

Feb 8, 2021, 9:07:05 PM2/8/21

to Ian Coetzer, modin-dev

Hi Ian,

I am wondering if this is an issue with the cloudpickle installation. I am not familiar with SQL Server and how you install python dependencies, but can you try to upgrade your cloudpickle installation (`pip install -U cloudpickle`)? It also looks like dill is involved, so upgrading that as well might help.

Devin

--
You received this message because you are subscribed to the Google Groups "modin-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modin-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/modin-dev/57914f24-2cbb-43be-86e8-5a49735844c3n%40googlegroups.com.

Reply all

Reply to author

Forward