"The file name cannot be determined" error with dataset add file api


aussda....@gmail.com

Mar 12, 2018, 10:13:41 AM
to Dataverse Users Community
Hi Everybody,

We're in the final stage of our project to migrate Nesstar data to Dataverse and have hit a wall with file uploads (the metadata import works fine). We're using Dataverse 4.8.4 build 128-ef7dce7, and we've coded the migration project in PHP 7.2. We work in PHP so that we can build a friendly interface and share the migration workload with non-technical staff.

The issue we're having is that no matter how we pass the file path parameter in the curl command, we get a "The file name cannot be determined" error. When we run curl directly on our Linux host, the error only happens when the path or file name is incorrect. So for example this

curl -H "X-Dataverse-key:XXXXXXXXXX" -X POST -F 'file=@pdf/data.pdf' http://dev.vdc.ac/api/datasets/10810/add

works as a Linux command, but the PHP curl equivalent invariably fails. On Linux, the error only occurs if the path is incorrect. We've tried relative and absolute paths but nothing seems to work.
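
As a point of comparison, here is a minimal Java sketch of the part header that a curl option like -F 'file=@pdf/data.pdf' places in front of the file bytes in a multipart/form-data request. The class name, method name, boundary string, and generic content type are illustrative assumptions, not anything taken from curl or Dataverse. The detail worth noticing is the "filename" parameter: it carries only the base name of the file and is separate from the local path used to read the file.

```java
public class MultipartSketch {

    // Builds the header block of a single file part in a multipart/form-data body.
    // Hypothetical helper for illustration; the content type is a placeholder.
    static String filePartHeaders(String boundary, String fieldName, String path) {
        // Only the base name is sent to the server, not the local directory.
        String baseName = path.substring(path.lastIndexOf('/') + 1);
        return "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + baseName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        // Boundary value is arbitrary here; real clients generate a random one.
        System.out.print(filePartHeaders("XXBOUNDARY", "file", "pdf/data.pdf"));
    }
}
```

If a client library builds the part without a "filename" parameter, the server presumably has no file name to read, whether or not the local path was correct.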

So we're wondering what exactly triggers the "The file name cannot be determined" error. Is it always a bad filename or path? Could something else be wrong? Is the path relative to the Dataverse installation location, or possibly Glassfish?

Please forgive me if these questions sound silly - we're still getting our feet wet with our installation. Hoping there is someone out there with similar experience, even if not working in PHP.

Thanks very much for the help. Best, Frank

Philip Durbin

Mar 13, 2018, 7:32:59 AM
to dataverse...@googlegroups.com
"The file name cannot be determined" is the value stored for the key "file.addreplace.error.filename_undetermined" in Bundle.properties and is only used here: https://github.com/IQSS/dataverse/blob/v4.8.5/src/main/java/edu/harvard/iq/dataverse/datasetutility/AddReplaceFileHelper.java#L921

The code looks like this:

if (fileName == null){
    this.addErrorSevere(getBundleErr("filename_undetermined"));
    return false;
}

So that error only gets thrown when "fileName" is null.

To dig in a little more, in the "datasets" API (which calls the AddReplaceFileHelper class), "fileName" comes from an object called "FormDataContentDisposition" at https://github.com/IQSS/dataverse/blob/v4.8.5/src/main/java/edu/harvard/iq/dataverse/api/Datasets.java#L793

Here's the code:

String newFilename = contentDispositionHeader.getFileName();
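
To make the null case concrete, here is a small Java sketch that mimics, in a simplified way, pulling the file name out of a Content-Disposition header value. It is a stand-in written for illustration, not Jersey's actual FormDataContentDisposition implementation; the class name, method name, and regular expression are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentDispositionSketch {

    // Matches a quoted filename parameter such as: ; filename="data.pdf"
    private static final Pattern FILENAME =
            Pattern.compile("(?:^|;)\\s*filename=\"([^\"]*)\"");

    // Returns the filename parameter, or null when the header has none.
    static String fileNameOf(String headerValue) {
        Matcher m = FILENAME.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A part sent as a file upload carries a filename parameter.
        System.out.println(fileNameOf("form-data; name=\"file\"; filename=\"data.pdf\""));
        // A part sent as a plain form field does not, so the result is null.
        System.out.println(fileNameOf("form-data; name=\"file\""));
    }
}
```

Under this reading, a request whose "file" part arrives as an ordinary form field rather than as a file upload would produce a null file name even though the field itself is present.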

It sounds super frustrating that it's working from curl but not PHP. It's been a while since I've written any PHP but over at http://php.net/manual/en/features.file-upload.post-method.php I'm seeing "$_FILES['userfile']['name']" defined as "The original name of the file on the client machine" and I would guess maybe it's something like that. You might want to try asking on Stack Overflow.

Is this a PHP script that you can run from the command line? Do you plan to open source it? It sounds useful!

Phil


--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/f68ef768-44f0-4614-928c-1d3d8c614fa1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

aussda....@gmail.com

Mar 13, 2018, 11:23:58 AM
to Dataverse Users Community
Hi Phil,
Thanks as always for your wonderfully detailed reply. From what I understand of the Java code, it seems the file name is extracted from the file input stream submitted with the form data, which would indicate that the stream itself is not present? If so, this would suggest that the problem is definitely in my PHP script: somehow the curl command is not finding and attaching the file data properly. I'll keep banging my head on it until either the problem or my head cracks.

Regarding open sourcing everything, if there is demand then definitely. There are actually a couple of outputs from this project that might be useful to others. The most important for us was the revised ddi-dataverse crosswalk we used, which is basically an adaptation of what ADA assembled and generously shared in this forum last year. We corrected a few bugs, added the social sciences fields, plus some conditional language to dynamically interpret and assign ddi attribute values to dataverse textContent nodes. The php code just takes that extraction and reconstructs a reliable dataverse dataset api json string out of it according to the four basic metadata object types (simple value (text/vocab), simple value array, complex object, complex object array). Between the two stages lies a kind of basecamp layover (basically a big table in a mysql database), where the extracted data can be cleaned and normalized if necessary. In our case this was very necessary, especially for the required fields in dataverse. All of our Nesstar metadata records now live a second life in our dataverse development installation - but without the associated data and supplementary files! In any case, the entire process (not including the cleaning, of course) takes about 4.5 seconds per record.

Thanks again for the help. Best, Frank

Pete Meyer

Mar 13, 2018, 11:39:13 AM
to Dataverse Users Community
Hi Phil,

Sorry if this is a duplicate reply, but is it possible that the PHP code is running as a web server user that has different filesystem permissions than when running curl from the command line?

Best,
Pete

Philip Durbin

Mar 13, 2018, 10:43:10 PM
to dataverse...@googlegroups.com
Hi Frank,

Well, I don't know if the InputStream is null or not. That addFileToDataset method I mentioned takes (among other things) an InputStream and a FormDataContentDisposition like this...

@FormDataParam("file") InputStream fileInputStream,
@FormDataParam("file") FormDataContentDisposition contentDispositionHeader

... and the fileName is coming from that contentDispositionHeader object. If you are comfortable building a war file, you could try adding some println statements to the Dataverse code to give you more information about what's null where.
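
A minimal sketch of that kind of debugging output, assuming the two values have already been pulled out of the request. The describe helper and the class name are hypothetical and are not part of Dataverse; in the real method the file name would come from contentDispositionHeader.getFileName().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class UploadDebugSketch {

    // Summarizes which of the two upload inputs is missing.
    // Hypothetical helper; the stream is only checked for null, never read.
    static String describe(InputStream fileInputStream, String fileName) {
        return "fileInputStream is " + (fileInputStream == null ? "null" : "present")
                + ", fileName is " + (fileName == null ? "null" : "\"" + fileName + "\"");
    }

    public static void main(String[] args) {
        // Simulates a request where file bytes arrived but no file name did.
        InputStream stream = new ByteArrayInputStream(new byte[0]);
        System.out.println(describe(stream, null));
    }
}
```

Printing both values side by side would show whether the stream and the file name go missing together or independently.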

As for open sourcing any code, there's no pressure. Mostly I was just thinking I could create an empty dataverse-client-php repo under the IQSS organization and give you push access to it if you have any interest in working on a PHP library that could be mentioned next to the libraries for R, Python, and Java mentioned at http://guides.dataverse.org/en/4.8.5/api/client-libraries.html

Good luck. Please keep asking questions if these back-and-forths are helpful. Also note that Pete was wondering below about filesystem permissions. I'm still not sure what "php curl equivalent" means. From http://php.net/manual/en/curl.installation.php I assume that libcurl is called by the PHP code.

Phil
