Error message when importing local csv Python


Ning

Apr 20, 2016, 1:09:38 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi there,

Very basic question, but I seem to be stuck. I have tried the following variations to import a CSV file into an H2O frame:

new = h2o.import_file(os.path.realpath("/Users/myname/Downloads/test.csv"))
new = h2o.import_file("/Users/myname/Downloads/test.csv")
new = h2o.import_file(path = "/Users/myname/Downloads/test.csv")

new = h2o.upload_file(os.path.realpath("/Users/myname/Downloads/test.csv"))
new = h2o.upload_file("/Users/myname/Downloads/test.csv")
new = h2o.upload_file(path = "/Users/myname/Downloads/test.csv")

All of the above give me the same error message:

"EnvironmentError: h2o-py got an unexpected HTTP status code:
 412 Precondition Failed (method = POST; url = http://localhost:54321/3/Parse). 
detailed error messages: 

ERROR MESSAGE:

Illegal argument for field: na_strings of schema: ParseV3: string and key arrays' values must be double quoted, but the client sent: None"

Any suggestions on how I can successfully import a csv file locally as easily as I can pull from an s3 url?

Thank you!

Ning

Lauren DiPerna

Apr 21, 2016, 6:14:05 PM
to Ning, H2O Open Source Scalable Machine Learning - h2ostream
Hi Ning,

What does your test.csv file look like? Can you send your log file?

Does

new = h2o.import_file('air_test.csv')   

work if you try to import the attached file? (Note: add your path as you did before, e.g. "/Users/[your_name]/Downloads/air_test.csv".)


air_test.csv

Ning

Apr 26, 2016, 3:33:32 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ning...@gmail.com
Hi there,

The air_test.csv file does work. And I figured out the issue: it looks like the CSV file needs to be double quoted. The CSV file I have locally was not double quoted. When I loaded the local file into pandas, this was handled automatically, but the H2O dataframe requires double-quoted values.
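
For anyone hitting the same thing, here is a minimal sketch of the workaround described above: rewriting a CSV so that every field is double quoted, using only Python's standard `csv` module. The function name `quote_all_fields` and the sample data are made up for illustration, and it assumes a plain comma-delimited file. It is not an H2O API, and whether quoting is really the root cause is not confirmed.

```python
import csv
import io


def quote_all_fields(src, dst):
    """Copy CSV rows from src to dst, wrapping every field in double quotes."""
    reader = csv.reader(src)
    # QUOTE_ALL forces double quotes around every field, including numbers.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Hypothetical sample data standing in for the unquoted test.csv.
unquoted = io.StringIO("id,name\n1,alice\n2,bob\n")
quoted = io.StringIO()
quote_all_fields(unquoted, quoted)
print(quoted.getvalue())
```

With real files, open both with `open(path, newline="")` and pass the file objects in, then import the rewritten file with h2o.import_file() as in the calls earlier in the thread.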

Mystery solved - thanks a lot for your help!

Cheers,
Ning

Lauren DiPerna

Apr 27, 2016, 2:55:27 PM
to Ning, H2O Open Source Scalable Machine Learning - h2ostream
Hi Ning,

What version of H2O are you using? In the latest version of h2o-3, h2o.import_file() should be able to take both single and double quotes. Thanks for bringing this up!

Ning

Apr 27, 2016, 3:25:56 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ning...@gmail.com
Hi Lauren, 

When I initiate H2O in my Jupyter notebook, it says the cluster version is 3.8.2.2. Here's the full printout:
H2O cluster uptime: 18 seconds 317 milliseconds
H2O cluster version: 3.8.2.2
H2O cluster name: H2O_started_from_python_myname_mbd114
H2O cluster total nodes: 1
H2O cluster total free memory: 1.76 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: True
H2O Connection ip: 127.0.0.1
H2O Connection port: 54321
H2O Connection proxy: None
Python Version: 2.7.11


However, I do also get a bunch of deprecation warnings that so far haven't affected performance. Here's the printout:

/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:92: DeprecationWarning: DisplayFormatter._ipython_display_formatter_default is deprecated: use @default decorator instead.
  def _ipython_display_formatter_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:98: DeprecationWarning: DisplayFormatter._formatters_default is deprecated: use @default decorator instead.
  def _formatters_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:677: DeprecationWarning: PlainTextFormatter._deferred_printers_default is deprecated: use @default decorator instead.
  def _deferred_printers_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:669: DeprecationWarning: PlainTextFormatter._singleton_printers_default is deprecated: use @default decorator instead.
  def _singleton_printers_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:672: DeprecationWarning: PlainTextFormatter._type_printers_default is deprecated: use @default decorator instead.
  def _type_printers_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:669: DeprecationWarning: PlainTextFormatter._singleton_printers_default is deprecated: use @default decorator instead.
  def _singleton_printers_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:672: DeprecationWarning: PlainTextFormatter._type_printers_default is deprecated: use @default decorator instead.
  def _type_printers_default(self):
/Users/myname/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:677: DeprecationWarning: PlainTextFormatter._deferred_printers_default is deprecated: use @default decorator instead.
  def _deferred_printers_default(self):

Is this what's causing the load to fail for non double-quoted csv files? Or should I upgrade my H2O to version 3.8.2.3?

Cheers,
Ning

jij...@gmail.com

May 1, 2016, 8:55:33 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Same error message as Ning. However, my CSV file has no quotes, just a comma-delimited header row and values. Any suggestions? I'm on version 3.6.
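
When describing what a CSV looks like, a quick structural summary of the first few lines can help. The sketch below uses only the standard library and reports, per line, how many fields it has and whether it contains any double quote characters. The helper name `describe_head` and the sample text are hypothetical, and it assumes a comma delimiter.

```python
import csv
import io


def describe_head(lines, limit=5):
    """Return (field_count, has_double_quote) for the first `limit` lines."""
    summary = []
    for i, raw in enumerate(lines):
        if i >= limit:
            break
        # Parse a single line with the csv module so quoted commas count correctly.
        fields = next(csv.reader([raw]))
        summary.append((len(fields), '"' in raw))
    return summary


# Hypothetical sample: an unquoted file whose last row is short by one field.
sample = io.StringIO("a,b,c\n1,2,3\n4,5\n")
head = describe_head(sample)
print(head)
```

For a real file, pass an open file object instead of the `io.StringIO` sample.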

Lauren DiPerna

May 2, 2016, 1:01:49 PM
to jij...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
Can you provide your log files? Is this the error you are getting:
Illegal argument for field: na_strings of schema: ParseV3: string and key arrays' values must be double quoted, but the client sent: None


I think Ning's issue was that import_file() only worked when she used double quotes around the file name she was importing, rather than single quotes (i.e., import_file('my_file.csv') didn't work while import_file("my_file.csv") did work).
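
One thing worth keeping in mind when testing this: in Python itself, single-quoted and double-quoted string literals produce identical string values, so the quote style in the source code should not by itself change what is passed to import_file(). The snippet below shows this with plain Python and no H2O involved; the file name is just the example from the message above.

```python
# Two literals that differ only in the quote style used in the source code.
single = 'my_file.csv'
double = "my_file.csv"

# Python compares the string contents; the quote characters are not part of the value.
same_value = single == double
print(same_value, repr(single), repr(double))
```

If the two calls really behave differently, the difference would have to come from somewhere other than the Python string literal, which is why the log files are useful here.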


