BigDL Orca training with a text dataset


AN-TRUONG Tran Phan

Nov 11, 2023, 8:53:37 PM
to User Group for BigDL
Dear all,

I have a project to detect whether an email is spam or ham. I finished training with TensorFlow 2 and Keras in a single-machine environment. When I changed the environment to BigDL Orca on Spark, my code raised the error below. I need help with a tf.data.Dataset whose content is text.

Thank you so much

  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/serialization.py", line 361, in serialize
    return self._serialize_to_msgpack(value)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/serialization.py", line 341, in _serialize_to_msgpack
    self._serialize_to_pickle5(metadata, python_objects)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/serialization.py", line 301, in _serialize_to_pickle5
    raise e
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/serialization.py", line 298, in _serialize_to_pickle5
    value, protocol=5, buffer_callback=writer.buffer_callback)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1074, in __reduce__
    return convert_to_tensor, (self._numpy(),)
  File "/home/ubuntu/anaconda3/envs/py37/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1117, in _numpy
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot convert a Tensor of dtype variant to a NumPy array.
Stopping orca context
Attachments: log, mail_data.csv, code.py

huangka...@gmail.com

Nov 12, 2023, 9:16:17 PM
to User Group for BigDL
Hi,

I think you need to put the dataset in a creator function as well, e.g. https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/tf/transfer_learning.py#L65
Can you try that and see if it resolves the error?
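
Roughly like this (an untested sketch; the CSV column names, the spam/ham label mapping, and model_creator below are assumptions, not taken from your code.py):

import pandas as pd
import tensorflow as tf
from bigdl.orca.learn.tf2 import Estimator

def train_data_creator(config, batch_size):
    # Build the tf.data.Dataset inside the creator so every Ray worker
    # constructs it locally, instead of Ray trying to pickle dataset tensors
    # (which is what raises the "dtype variant" error).
    df = pd.read_csv("mail_data.csv")
    texts = df["Message"].astype(str).values                     # assumed text column
    labels = (df["Category"] == "spam").astype("int32").values   # assumed label column
    ds = tf.data.Dataset.from_tensor_slices((texts, labels))
    return ds.shuffle(len(df)).batch(batch_size)

# model_creator is your existing function that returns a compiled Keras model.
est = Estimator.from_keras(model_creator=model_creator)
est.fit(data=train_data_creator, epochs=5, batch_size=64)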

Thanks,
Kai

AN-TRUONG Tran Phan

Nov 17, 2023, 4:31:27 AM
to huangka...@gmail.com, User Group for BigDL
Dear Kai and all,

My data is text from a CSV file, and it is not in the correct format for BigDL Orca on Spark.

I am trying to convert the data from the CSV file into a dataset format that BigDL accepts, but without success. Does anyone have another approach that works for data like mine?

Best regards,
Truong




--
Best regards,

An Trường.

huangka...@gmail.com

Nov 21, 2023, 9:30:42 PM
to User Group for BigDL
Hi Truong,

Sorry for the late reply.

Actually, the step here: https://bigdl.readthedocs.io/en/latest/doc/Orca/Howto/tf2keras-quickstart.html#step-3-define-the-dataset is not BigDL-specific; it is just a function that creates a dataset that TensorFlow accepts. Basically, you can reuse the same TensorFlow code you already use to handle the CSV file here as well.
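
For example, the same tf.data code you would write on a single machine can go inside the creator, something like this (untested sketch; the column names and the spam/ham mapping are assumptions):

import tensorflow as tf

def train_data_creator(config, batch_size):
    # Let TensorFlow parse the CSV directly; "Message" and "Category" are
    # assumed column names in mail_data.csv.
    ds = tf.data.experimental.make_csv_dataset(
        "mail_data.csv",
        batch_size=batch_size,
        select_columns=["Message", "Category"],
        label_name="Category",
        num_epochs=1,
        shuffle=True)
    # Keep the raw text as the feature and turn "spam"/"ham" into 1/0.
    return ds.map(lambda features, label:
                  (features["Message"], tf.cast(label == "spam", tf.int32)))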

Alternatively, you can use the SparkXShards API in BigDL to read CSV files. Example here: https://github.com/intel-analytics/BigDL/blob/main/python/orca/tutorial/NCF/process_xshards.py#L74
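
Very roughly (untested; the column names and label mapping are assumptions, and your Keras model would need to accept raw strings, e.g. via a TextVectorization layer):

from bigdl.orca.data.pandas import read_csv

# Read the CSV into a SparkXShards of pandas DataFrames.
shards = read_csv("mail_data.csv")

# Convert each partition into the {"x": features, "y": labels} dict format
# that Estimator.fit accepts; "Message"/"Category" are assumed column names.
def to_xy(df):
    return {"x": df["Message"].astype(str).values,
            "y": (df["Category"] == "spam").astype("int32").values}

shards = shards.transform_shard(to_xy)

# est is the Orca Estimator created as in the earlier sketch.
est.fit(data=shards, epochs=5, batch_size=64)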

Thanks,
Kai
