What to do when the arguments exceed the Linux command line length?


John Kern

Apr 6, 2022, 12:10:55 PM
to Nextflow

I have a Nextflow script whose script section runs a command with a wildcard argument, so it accepts an arbitrary number of files. With genomic data, the expanded argument list will eventually exceed Linux's command line length limit, which is the bug I would like to fix.
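To get a feel for when the limit is hit, here is a rough Python sketch. It assumes a 64-bit platform and only estimates the size of the argument strings and their pointers; the environment also counts against the same limit, so a `headroom` value (my own choice, not a kernel constant) is reserved for it. The function names are hypothetical.

```python
import os

POINTER_SIZE = 8  # assumption: 64-bit platform


def estimated_argv_bytes(args):
    """Rough size of an argument list: each string, its NUL terminator, and one pointer."""
    return sum(len(os.fsencode(a)) + 1 + POINTER_SIZE for a in args)


def fits_on_command_line(args, limit, headroom=4096):
    """True if the estimate plus some headroom for the environment stays within limit."""
    return estimated_argv_bytes(args) + headroom <= limit


if __name__ == "__main__":
    # SC_ARG_MAX is the limit reported by the system for arguments plus environment.
    limit = os.sysconf("SC_ARG_MAX")
    names = [f"sample_{i:06d}.tsv" for i in range(200_000)]
    print("limit:", limit)
    print("estimate:", estimated_argv_bytes(names))
    print("fits:", fits_on_command_line(names, limit))
```

The estimate is deliberately simple; the point is only that a few hundred thousand file names can plausibly outgrow the limit.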

Since Nextflow is written in Java, its own code has no such limit. What if I write the file names, one per line, into a file, which I'll call toc_file for table of contents? Something like...

   def tsv = ["foo.tsv", "bar.tsv", 'qaz.tsv']
   def File toc = new File("toc-files-to-merge.txt")
   toc.withWriter("UTF-8") { out ->
            tsv.each{out.writeLine it}
   }

Consider this nextflow script

    process passFileNames {
        def toc_file = "toc-files-to-merge.txt"
        def tsv = ["foo.tsv", "bar.tsv", 'qaz.tsv']
        def File toc = new File(toc_file)
        toc.withWriter("UTF-8") { out ->
                tsv.each{out.writeLine it}
        }
        """
        python /Users/jkern/wip/nxf-wildcards/acceptor.py ${toc_file} > file
        """
    }

The Python script then reads the file names from the toc_file. Here is acceptor.py

    import argparse
    import os

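    def read_toc(toc_file):
        """Print each file name listed in toc_file, one per line."""
        print(f"file: {toc_file}")
        if os.path.exists(toc_file):
            with open(toc_file) as fd:
                for row in fd:
                    # strip the trailing newline so print() does not emit blank lines
                    print(row.rstrip("\n"))
        else:
            print("file not found")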

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("toc_file")
        args = parser.parse_args()
        read_toc(args.toc_file)

The toc_file doesn't exist in the work directory as I had expected. Why? I do not see any error message. What is happening here?

-jk

John Kern

Apr 6, 2022, 1:16:52 PM
to Nextflow
I found the missing file. It turns out it is not written to Nextflow's work directory at all, as I reported previously. Instead it is written to the Linux current working directory from which Nextflow was started. That seems like an odd choice. Is that the design decision Nextflow intended?
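For what it's worth, this matches how relative paths normally behave: they resolve against the current working directory of the process that opens the file, and in Java a relative `File` resolves against the JVM's `user.dir`. A small Python analogue (not Nextflow itself; the function name is made up for illustration):

```python
import os


def where_would_it_land(relative_name):
    """Return the absolute path a relative file name resolves to in this process."""
    return os.path.abspath(relative_name)


# Resolves against the directory this interpreter was started in,
# no matter where any later child command will run.
print(where_would_it_land("toc-files-to-merge.txt"))
```

If the Groovy code in the process body runs inside the Nextflow JVM, the same rule would put the file in the launch directory.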

John Kern

Apr 6, 2022, 1:58:40 PM
to Nextflow

Nextflow defines a process as "the basic processing primitive to execute a user script". Since a process can be distributed, I thought the whole script section, including the Groovy code before the command string, would run exclusively on the computer where the bash script is executed. In fact, this isn't entirely correct. Consider this process

    process passFileNames {
        def toc_file = System.getProperty("user.dir") + "/" + "toc-files-to-merge66.txt"

        def tsv = ["foo.tsv", "bar.tsv", 'qaz.tsv']
        def File toc = new File(toc_file)
        toc.withWriter("UTF-8") { out ->
                tsv.each{out.writeLine it}
        }
        """
        python /Users/jkern/wip/nxf-wildcards/acceptor.py ${toc_file} > file
        """
    }

and run it in a distributed way.

    $  nextflow -C nextflow.config run testcase.nf -w s3://ag2-ephemeral/jkern/nxf-`date +%j`  

The file written by the Groovy code is stored in the current working directory of the client, while the bash script runs on the remote computer.
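One possible way around this, sketched in Python: build the table of contents inside the task itself, so the listing happens wherever the command runs. This assumes the input files are already present in the task's working directory; `write_toc` and its default arguments are hypothetical names, not part of Nextflow or acceptor.py. Because the glob is expanded inside Python rather than by the shell, the file names never pass through the command line.

```python
import glob
import os


def write_toc(directory=".", pattern="*.tsv", toc_name="toc-files-to-merge.txt"):
    """Write the names of matching files, one per line, to toc_name inside directory."""
    matches = glob.glob(os.path.join(glob.escape(directory), pattern))
    names = sorted(os.path.basename(path) for path in matches)
    toc_path = os.path.join(directory, toc_name)
    with open(toc_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")
    return toc_path
```

Called as `write_toc()` from a script started in the task directory, it would produce a toc file next to the inputs, which a reader like `read_toc` could then consume.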

-jk