Concatenating multiple flat files without repeating header


Owen S.

Mar 16, 2017, 12:45:42 AM
to Nextflow
Say I have several files with headers, like CSV or Tab-sep TXT files, produced by a process.

I want to combine them into one file.  Like just "cat" them all together, but with the header line appearing just once, on the first line of the file.

This is a simple thing to do, and I can accomplish it okay using sed or awk, but I was thinking there might be a more elegant solution in pure Nextflow.  (Especially because it seems like it would be a common task.)

Thanks
Owen

Paolo Di Tommaso

Mar 16, 2017, 11:41:11 AM
to nextflow
To combine multiple files you can use the collectFile operator, though some extra coding would be required to remove the header line.

I agree that this could be a common use case. I've opened a feature request on GitHub to add it.



Cheers,
Paolo


--
You received this message because you are subscribed to the Google Groups "Nextflow" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nextflow+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/nextflow.
For more options, visit https://groups.google.com/d/optout.

Owen S.

Mar 16, 2017, 12:35:25 PM
to Nextflow
Great, thanks.

In the meantime, I am getting the desired behavior with awk, like:

awk 'FNR==1 && NR!=1{next;}{print}' ${fileList} > combined_output.txt

But it would be nice to have if Nextflow could do this without the callout to awk.
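
For anyone who wants to see the awk approach in isolation, here is a minimal sketch outside of Nextflow. The file names and contents are hypothetical sample data made up for illustration, and it assumes every input file starts with the same single header line.

```shell
# Hypothetical sample inputs (not from the original pipeline)
tmpdir=$(mktemp -d)
printf 'id\tvalue\nA\t1\n' > "$tmpdir/part1.txt"
printf 'id\tvalue\nB\t2\n' > "$tmpdir/part2.txt"

# NR counts lines across all inputs; FNR restarts at 1 for each file.
# FNR==1 && NR!=1 matches the first line of every file except the first,
# so those repeated header lines are skipped and everything else is printed.
awk 'FNR==1 && NR!=1 {next} {print}' \
    "$tmpdir/part1.txt" "$tmpdir/part2.txt" > "$tmpdir/combined_output.txt"

cat "$tmpdir/combined_output.txt"
```

Inside a process script block, `${fileList}` would take the place of the two explicit file paths.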

Thanks again Paolo
Owen


Paolo Di Tommaso

Mar 16, 2017, 12:47:23 PM
to nextflow
I've just uploaded a new snapshot including the `skip` option on the collectFile operator. 

If you are willing to give it a try, execute this command:

NXF_VER=0.24.0-SNAPSHOT nextflow info

Then run NF by using this command:

NXF_VER=0.24.0-SNAPSHOT nextflow run .. etc.


Cheers,
Paolo

