Re: pandas dataframe to tensorflow dataset with textual data

Paige Bailey

Apr 16, 2020, 3:49:41 PM

to Jeff Verdegan, Discuss

Hi, Jeff -

Thanks for the question! Does this tutorial help?

https://www.tensorflow.org/tutorials/load_data/pandas_dataframe

On Thu, Apr 16, 2020 at 12:45 PM Jeff Verdegan <jver...@youmail.com> wrote:

Hi there! I'm just getting started with TF and pandas, so if this has an obvious answer that I'm just not finding, please point me toward the appropriate docs.

I have a 2-column CSV file: match (Y/N) and text (a few sentences, some of which match my criteria and some of which don't).

I'm following the example at https://www.tensorflow.org/tutorials/structured_data/feature_columns, and it works with their sample data, which is all numerical (except for the popped label column).

However, when I try to use my data, I get
ValueError: Can't convert Python sequence with mixed types to Tensor.

As I mentioned, my input data is all text, but when I head() the dataframe, it seems pandas has injected a row number column.
TypeError: Could not build a TypeSpec for 10972 blah blah this is my text

I even removed all the digits from my sample data, so as far as I can tell, it's the pandas-injected row number that from_tensor_slices doesn't like.

Is there something I should use instead of from_tensor_slices, or ahead of it to prepare the data?
Or an easy way to tell it to treat the injected column numbers as text?
Or tell it to ignore it?

This is the example code I copied from the above link.
Thanks for any guidance you can give!
import tensorflow as tf  # needed for tf.data below

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels)) # FAILS HERE
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds
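
On the injected-column worry above: the number that head() prints on the left is the DataFrame's index, not a column, so dict(dataframe) does not hand it to from_tensor_slices as a feature. A minimal pandas-only sketch, assuming a hypothetical two-column frame with made-up values (the names target and text are illustrative, not the actual data from this thread):

```python
import pandas as pd

# Hypothetical stand-in for the real CSV: a label column and a text column.
df = pd.DataFrame({
    "target": ["Y", "N"],
    "text": ["first sample", "second sample"],
})

features = df.copy()
labels = features.pop("target")  # same step as in df_to_dataset

# dict() over a DataFrame yields one entry per column; the index is not a key.
feature_dict = dict(features)
print(list(feature_dict))     # column names only
print(list(features.index))   # the row labels that head() displays on the left
```

If the index ever does need to travel with the data, reset_index() turns it into an ordinary column; otherwise it can be left alone.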
--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/49b8e490-1c73-4090-9c5b-08b9da718b1e%40tensorflow.org.

--

Paige Bailey

Product Manager (TensorFlow)

@DynamicWebPaige

webp...@google.com

Jeff Verdegan

Apr 16, 2020, 3:54:06 PM

to Discuss

Hey, Paige,

Thanks for the quick response!

Unfortunately, that doesn't help. That's the tutorial I'm starting from. It works using their sample data, which is all numbers (except the "target" column, which gets stripped off for the classification labels).

My problem seems to be that my textual data, plus the row number that pandas seems to be injecting, makes it look like mixed types. So ultimately I think I need a way to tell it to ignore that injected column when turning the pandas dataframe into a TF Dataset. (Although, as a rank newbie, I could just be missing something obvious.)

Jeff Verdegan

Apr 16, 2020, 4:35:38 PM

to Discuss

Okay, so I'm totally wrong about the root problem.

I should've done this sooner, but I stripped my data down to just a few rows, and it worked fine.

So apparently there's something in the data itself, but only in certain rows, that's making it look like "mixed types."

So any help in tracking that down would be much appreciated.
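
One possible way to track this down, sketched under assumptions: scan the text column for values that are not Python strings. A missing field usually shows up in pandas as NaN (a float), which next to real strings can look like mixed types. The helper name find_non_string_rows and the sample frame are hypothetical, made up for illustration.

```python
import pandas as pd

def find_non_string_rows(df, column):
    """Return the rows of df whose value in `column` is not a Python str."""
    mask = [not isinstance(value, str) for value in df[column]]
    return df[mask]

# Hypothetical sample: the middle row has a missing text value (NaN).
sample = pd.DataFrame({
    "match": ["Y", "N", "Y"],
    "text": ["first sample", float("nan"), "third sample"],
})

bad_rows = find_non_string_rows(sample, "text")
print(bad_rows.index.tolist())  # index labels of the suspicious rows
```

Running the same check on every feature column before calling from_tensor_slices narrows the search to specific rows instead of bisecting the file by hand.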

Thanks!

Jeff Verdegan

Apr 16, 2020, 5:30:35 PM

to Discuss

It turned out to be the dumbest thing ever, and not related to TF or pandas at all.

At least one of my data rows had a newline embedded in the sample text, so it created an extra line with only one column.
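
For anyone hitting the same thing, here is a minimal sketch of the failure using only Python's standard csv module. The sample strings and the helper name find_short_rows are made up for illustration; the assumption is that the newline sat inside an unquoted field, which is what splits one record into two lines.

```python
import csv
import io

def find_short_rows(csv_text):
    """Return (line_number, row) for every row with fewer fields than the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    short = []
    for row in reader:
        if len(row) < len(header):
            # reader.line_num is the number of source lines read so far
            short.append((reader.line_num, row))
    return short

# Unquoted newline in the text field: the record spills onto an extra line.
broken = "match,text\nY,first sample\nN,second sample with a\nstray newline\nY,third sample\n"

# Same data with the field quoted: the newline stays inside one record.
quoted = 'match,text\nY,first sample\nN,"second sample with a\nstray newline"\nY,third sample\n'

print(find_short_rows(broken))
print(find_short_rows(quoted))
```

Quoting the text field when the CSV is written, or stripping newlines from the text beforehand, keeps each record on one logical row.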

Pardon the intrusion, you may continue with your social distancing exercises. :-)
