Garbage temp files dropped by TupleSolrOutputFormat

Eric Palacios

Sep 3, 2013, 10:30:23 AM
to pangoo...@googlegroups.com
While unit testing, I've noticed that TupleSolrOutputFormat doesn't clean up some of its temp files.

For example, the last run left these behind:

/tmp/solr2311067538494006727zip
/tmp/d75e66bf-2c88-40d4-8ec3-c06b0f94ab46.solr.zip

Inspecting the code, I found that the first file, '/tmp/solr2311067538494006727zip', is created in TupleSolrOutputFormat (line 149) by:

File tmpZip = File.createTempFile("solr", "zip");

This local file is immediately copied to an HDFS file, so could it be safely removed afterwards?
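
A minimal sketch of the cleanup being proposed, using only the JDK. The `Uploader` interface and the `uploadAndCleanUp` method are hypothetical names standing in for the HDFS copy step, and the "upload" in `main` is just a local file copy, not the actual TupleSolrOutputFormat code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempZipCleanup {

    /** Hypothetical stand-in for the step that copies the zip to HDFS. */
    interface Uploader {
        void upload(Path localFile) throws IOException;
    }

    /**
     * Creates the temp file, hands it to the uploader, and deletes the
     * local copy afterwards even if the upload throws.
     * Returns the local path that was used.
     */
    static Path uploadAndCleanUp(Uploader uploader) throws IOException {
        // Same call as quoted above; the suffix "zip" has no leading dot,
        // which is why the file name ends in "...727zip".
        File tmpZip = File.createTempFile("solr", "zip");
        Path local = tmpZip.toPath();
        try {
            uploader.upload(local);
        } finally {
            Files.deleteIfExists(local);
        }
        return local;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local directory plays the role of HDFS here.
        Path target = Files.createTempDirectory("fake-hdfs").resolve("solr.zip");
        Path used = uploadAndCleanUp(local ->
                Files.copy(local, target, StandardCopyOption.REPLACE_EXISTING));
        System.out.println("uploaded copy exists: " + Files.exists(target));
        System.out.println("local temp file exists: " + Files.exists(used));
    }
}
```

If the upload goes through Hadoop's `FileSystem`, the `copyFromLocalFile` overload that takes a `delSrc` flag might achieve the same thing in one call; whether that fits the existing code has not been checked here.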

On the other hand, the file '/tmp/d75e66bf-2c88-40d4-8ec3-c06b0f94ab46.solr.zip' is handled by SolrRecordWriter, which accesses it via the DistributedCache (DC). It doesn't receive the full path, only the file name: d75e66bf-2c88-40d4-8ec3-c06b0f94ab46.solr.zip.

Should SolrRecordWriter be responsible for cleaning this up in SolrRecordWriter.close()?

If so, I think it would be safe to reconstruct the full path inside SolrRecordWriter by prepending /tmp to it.

What do you think, guys? I have a patch for this.

Regards





Iván de Prado

Sep 3, 2013, 12:31:47 PM
to pangoo...@googlegroups.com
My opinion is that the temporary file tmpZip can be safely removed once it has been uploaded to HDFS, although this is not an important issue: the generated files are small and the tmp folder is cleaned on each reboot of the machine.

But I think that SolrRecordWriter.close() SHOULD NOT remove the file that was uploaded to HDFS. It is unsafe, because one RecordWriter could finish before another RecordWriter, associated with a different task, has even started. In that case, the RecordWriter that has not started yet would fail, because the file could no longer be copied to the local DC.
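
The ordering problem can be illustrated with a small single-process simulation. This is only a sketch under assumptions: `startTask` and `secondTaskFails` are hypothetical names, plain local files stand in for the shared zip and for each task's local copy, and no Hadoop classes are involved:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class SharedZipRace {

    /** Hypothetical task start: copies the shared zip into the task's own directory. */
    static Path startTask(Path sharedZip, Path workDir) throws IOException {
        return Files.copy(sharedZip, workDir.resolve(sharedZip.getFileName()));
    }

    /**
     * Runs task A to completion before task B starts.
     * Returns true if task B could not start because the shared zip was gone.
     * (Temp files and directories are left behind for brevity.)
     */
    static boolean secondTaskFails(boolean firstTaskDeletesOnClose) throws IOException {
        Path shared = Files.createTempFile("shared", ".solr.zip");
        Path workA = Files.createTempDirectory("taskA");
        Path workB = Files.createTempDirectory("taskB");

        startTask(shared, workA);      // task A starts and gets its local copy
        if (firstTaskDeletesOnClose) {
            Files.delete(shared);      // task A's close() removes the shared file
        }
        try {
            startTask(shared, workB);  // task B starts only after A has finished
            return false;
        } catch (NoSuchFileException e) {
            return true;               // nothing left for task B to copy
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("delete on close, task B fails: " + secondTaskFails(true));
        System.out.println("no delete, task B fails: " + secondTaskFails(false));
    }
}
```

In a real job the tasks run on different machines and at unpredictable times, so the safe place to remove the shared file would be somewhere that runs after every task has finished, not inside any single RecordWriter.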

Iván






--
Iván de Prado
CEO & Co-founder