PiGX ChIP-seq question

Pia.Raut...@mdc-berlin.de

unread,

Aug 6, 2020, 8:59:34 AM8/6/20

to pi...@googlegroups.com

Dear maintainers of the PiGX ChIP-seq pipeline,

I am a Ph.D. student from AG Ohler at BIMSB and currently trying to run the PiGX ChIP-seq pipeline. I get the following error (one out of many, all with the same problem):

```

/gnu/store/n8awazyldv9hbzb7pjcw76hiifmvrpvd-coreutils-8.32/bin/mv: cannot stat '/fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/MyoD_control/MyoD_control_trimmed.fq.gz': No such file or directory

[Thu Aug 6 13:42:35 2020]

Error in rule trim_galore_se:

jobid: 96

output: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/MyoD_control/MyoD_control_R.fastq.gz

log: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/MyoD_control/trim_galore_MyoD_control.log (check log file(s) for error message)

shell:

/gnu/store/fd4pwx2rh9389amqmb5jva3xin9g40hi-trim-galore-0.6.1/bin/trim_galore -o /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/MyoD_control /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/in/MyoD_control.fastq >> /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/MyoD_control/trim_galore_MyoD_control.log 2>&1 ; /gnu/store/n8awazyldv9hbzb7pjcw76hiifmvrpvd-coreutils-8.32/bin/mv /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/MyoD_control/MyoD_control_trimmed.fq.gz /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/MyoD_control/MyoD_control_R.fastq.gz

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

```

I checked whether this (these) files indeed don't exist, and it seems that the output of the trimming that the pipeline generates is not gzipped.

I see files

```.../out/Trimmed/Trim_Galore/sampleX/sampleXl_R.fastq```

instead of the required

```.../out/Trimmed/Trim_Galore/sampleX/sampleXl_R.fastq.gz```

I downloaded pigx-chipseq via guix.

Please let me know if you need more information.

Thanks for your help,

Pia

Alexand...@mdc-berlin.de

unread,

Aug 7, 2020, 8:47:47 AM8/7/20

to Pia.Raut...@mdc-berlin.de, Alexand...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

This error is actually quite cryptic, but thank you for investigating so much of it.

I peaked into your genome folder and saw that the input fasta file is actually a link to another file,

So I thought you could maybe try to change the path to the real file,

as right now the filetype detected with magic is 'inode/symlink’,

which is why you get the 'Unknow input genome file format’ error.

I’ll try to figure out in future wether following the symlink is possible somehow, or at least throw an early error.

I hope this helped, but let me know if there are more problems.

Best,

Alex

Alexander Blume (Gosdschan)

PhD Student

Berlin Institute for

Medical Systems Biology (BIMSB)

at Max Delbrueck Center (MDC)

Hannoversche Str. 28

10115 Berlin, Germany

Phone: +49 30 9406 1422

Email: alexand...@mdc-berlin.de

Web: https://www.mdc-berlin.de/bioinformatics

On 7. Aug 2020, at 13:06, Rautenstrauch, Pia <Pia.Raut...@mdc-berlin.de> wrote:

Hi Alex,

Thanks a lot for the fast reply! Manually gzipping the files and rerunning the pipeline works - the pipeline continues.

Unfortunately, I ran into a further weird bug in the "linking the genome fasta" step.

```

[Fri Aug 7 12:00:09 2020]

Job 99:

Linking genome fasta:

input : /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/genome/mm9_primary_assembly.fa

output: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Bowtie2_Index/Main/mm9_primary_assembly.fa

[33mJob counts:

count jobs

1 link_genome

1[0m

[32m[Fri Aug 7 12:00:12 2020][0m

[31mError in rule link_genome:[0m

[31m jobid: 0[0m

[31m output: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Bowtie2_Index/Main/mm9_primary_assembly.fa[0m

[31m log: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/link_Main_genome_mm9_primary_assembly.log (check log file(s) for error message)[0m

[31m[0m

[31mSystemExit in line 45 of /gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py:

Unknow input genome file format

File "/gnu/store/09a5iq080g9b641jyl363dr5jkkvnhcn-python-3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py", line 67, in __rule_link_genome

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py", line 45, in link_genome[0m

[31mExiting because a job execution failed. Look above for error message[0m

Shutting down, this might take some time.

Exiting because a job execution failed. Look above for error message

Complete log: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/.snakemake/log/2020-08-07T120005.091283.snakemake.log

```

I think I pinpointed the error after checking the code of pigx_chipseq/Rules/Mapping.py.

The problem is, that the script checks the filetype with magic, and my fasta files are identified as 'ASCII text’, however, only 'text/plain’ is accepted.

Before I looked into the source code, I thought a problem might be, that I initially tried with a FASTA file, where all genome information per header was just in a single line, I subsequently split this. For the very long FASTA lines magic identifies the file type 'ASCII text, with very long lines’. However, both does not work ;).

I checked also the original genome file that I downloaded from UCSC (mouse mm9) and magic also says it is 'ASCII text’, it is a quite old assembly, so I don’t know whether more recent ones are plain text instead of ASCII.

I tried to circumvent this problem by just gzipping my input genome, however, then, the function check_fasta_header (File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/scripts/Check_Config.py", line 219, in check_fasta_header) says I would have whitespaces in the fasta headers, which I checked and there are none in the uncompressed fasta file.

```

SystemExit in line 26 of /gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/scripts/Check_Config.py:

ERROR: Settings file is not properly formated:

Genome fasta headers contain whitespaces.

Please reformat the headers

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Snake_ChIPseq.py", line 44, in <module>

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/scripts/Check_Config.py", line 26, in validate_config

```

I copied just this function and checked both my gzipped genome and my ungzipped genome fasta file.

For the ungzipped I get no error message, however, for the gzipped one I do. Hence, the pipeline also terminates.

I hope this information is sufficient to understand my problem. I am happy to provide more information.

For a quick fix, I am also happy to change my input file format/transform etc.

Thanks a lot for your help,

Pia

On 6. Aug 2020, at 17:21, Blume, Alexander <Alexand...@mdc-berlin.de> wrote:

Hi Pia,

I am one of the developers responsible for the pigx-chipseq pipeline.

Sorry that you are having problems with the pipeline, but thank you for sharing with us.

I will have a closer look at your problem the next days, but it seems that the gzip argument is not forwarded to the trimming tool.

A fix will hopefully be available by next week, until then I suggest you try to trick the pipeline by gzipping the trimmed fastq files yourself and testing wether it continuous.

If you find any other bugs or weird things please let us know, we are very happy to receive feedback.

Best,

Alex

--
You received this message because you are subscribed to the Google Groups "pigx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pigx+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pigx/EAE5B5D5-D292-4465-A013-C08DBEF1483E%40mdc-berlin.de.

Alexand...@mdc-berlin.de

unread,

Aug 7, 2020, 10:12:47 AM8/7/20

to Pia.Raut...@mdc-berlin.de, Alexand...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

The perl warnings you can safely ignore, but the error points towards some formatting issues in the settings that you defined for the mapping step.

Could you please send me your settings file for the pipeline?

Best,

Alex

On 7. Aug 2020, at 15:59, Rautenstrauch, Pia <Pia.Raut...@mdc-berlin.de> wrote:

Hi Alex,

Thanks a lot for the fast and helpful reply! Changing the path to the actual file path instead of using a soft link solved the error.

I am sorry to bother you again, I hope there are not a lot more problems coming up.

In the "Mapping with bowtie2” step I get the following error (for all parallel jobs that got started):

```

[31m output: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Mapped/Bowtie/Main/Myf5_rep1/Myf5_rep1.bam[0m

[31m log: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/Myf5_rep1/Myf5_rep2.bowtie2_Main.log (check log file(s) for error message)[0m

[31mRuleException:

TypeError in line 174 of /gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/scripts/SnakeFunctions.py:

'NoneType' object is not iterable

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py", line 220, in __rule_bowtie2

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/scripts/SnakeFunctions.py", line 174, in join_params

File "/gnu/store/09a5iq080g9b641jyl363dr5jkkvnhcn-python-3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run[0m

```

And some perl warnings:

```

perl: warning: Setting locale failed.

perl: warning: Please check that your locale settings:

LANGUAGE = (unset),

LC_ALL = (unset),

LANG = "en_US.UTF-8"

are supported and installed on your system.

perl: warning: Falling back to the standard locale ("C”).

```

Thanks again and kind regards,

Pia

Alexand...@mdc-berlin.de

unread,

Aug 7, 2020, 11:58:27 AM8/7/20

to Pia.Raut...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

Thanks for sharing the settings file and your concerns.

So, the error at hand is occurring because in the settings file you commented out all settings nested in the bowtie2 paragraph,

however unfortunately we are still trying to access its contents as long as the paragraph itself ( ‘bowtie2:’) is not commented out and this leads to the ‘NoneType’ TypeError.

The easiest fix would be to leave the 'N: 0’ uncommented as this is the default anyway and should not change the results at all.

Concerning the unclear settings:

- ‘bam_filter’ applies filters to the aligned reads, where ‘mapq' filters reads based on their respective mapping quality (see here for details http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#mapping-quality-higher-more-unique) and deduplicate removes alignments that map to identical positions which should be done for ATAC seq analysis

- ‘extract_signal’ adjusts how the figures in the chipqc report are calculated, so ‘expand_peak' says to resize the peaks from whatever size to what is set there and ‘number_of_bins’ sets how bins are used to aggregate the coverage signal over the gene features in the report

Especially the ‘extract_signal’ settings are quite complicated and the defaults are choosen based on what works fast and looks good enough, but the ‘bam_filter’ settings you should adjust based on your analysis and required stringency.

Puh, this was a lot :)

Hope this helps you and explains unclear stuff.

Best,

Alex

On 7. Aug 2020, at 16:23, Rautenstrauch, Pia <Pia.Raut...@mdc-berlin.de> wrote:

Hi Alex,

Thanks for taking the time! Please find attached the settings file. Now that I am writing to you anyhow - for me it became not clear from the tutorial and manual which programs the bam_filter and extract_signal belong to (and hence which manual to check for the respective parameters/flags). I did not care too much for now, as I just wanted to check the pipeline in general..

Kind regards,

Pia

<settings.yaml>

Ricardo Wurmus

unread,

Aug 7, 2020, 12:44:22 PM8/7/20

to Alexand...@mdc-berlin.de, Pia.Raut...@mdc-berlin.de, pi...@googlegroups.com

Alexand...@mdc-berlin.de <Alexand...@mdc-berlin.de> writes:

> And some perl warnings:
> ```
> perl: warning: Setting locale failed.
> perl: warning: Please check that your locale settings:
> LANGUAGE = (unset),
> LC_ALL = (unset),
> LANG = "en_US.UTF-8"
> are supported and installed on your system.
> perl: warning: Falling back to the standard locale ("C”).
> ```

These indicate a problem with locales. Make sure that GUIX_LOCPATH is
set to a valid location and that glibc-locales is installed. On the
MDC Max cluster this *should* be the case by default, but if you’re
running it elsewhere you may need to check this by yourself.

--
Ricardo Wurmus

System administrator
BIMSB - Scientific Bioinformatics Platform
Max Delbrueck Center for Molecular Medicine

email: ricardo...@mdc-berlin.de
tel: +49 30 9406 1796

Alexand...@mdc-berlin.de

unread,

Aug 11, 2020, 8:11:01 AM8/11/20

to Pia.Raut...@mdc-berlin.de, Alexand...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

I just checked your fast file on the cluster and it seems to be a zip archive. This compression is usually not compatible with bioinformatics software, so you will need to archive the fastq files with the ‘gzip’ command.

You may run these two commands after each other the to achieve this:

‘''

unzip /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/*/*fastq.gz

gzip /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/*/*fastq

‘''

Hope this solves the issue.

Best,

Alex

On 11. Aug 2020, at 13:48, Rautenstrauch, Pia <Pia.Raut...@mdc-berlin.de> wrote:

Hi Alex,

Thank you very much for the fast and helpful replies and the explanation of the settings section!

I was on Holiday yesterday and only continued with the PiGX pipeline today. I left the N:0 option of bowtie2 uncommented which fixed the previously reported problem. However, I stumbled into another error:

```

[31mError in rule bowtie2:[0m

[31m jobid: 0[0m

[31m output: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Mapped/Bowtie/Main/Myf5_rep2/Myf5_rep2.bam[0m

[31m log: /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/Myf5_rep2/Myf5_rep2.bowtie2_Main.log (check log file(s) for error message)[0m

[31m[0m

[31mRuleException:

CalledProcessError in line 224 of /gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py:

Command 'set -euo pipefail; /gnu/store/7sicb2zgpqnav53gaiizmarjdn03ydp2-bowtie-2.3.4.3/bin/bowtie2 -p 2 -x /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Bowtie2_Index/Main/mm9 -U /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/Myf5_rep2/Myf5_rep2_R.fastq.gz -N 0 2> /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Log/Myf5_rep2/Myf5_rep2.bowtie2_Main.log | /gnu/store/jx7i0x6jzfy9vlwsv6hycdyvih2q16lm-samtools-1.9/bin/samtools view -bhS > /fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Mapped/Bowtie/Main/Myf5_rep2/Myf5_rep2.bam' returned non-zero exit status 1.

File "/gnu/store/nxk52pcmvizq6vlb4xmg2x0c3s74xprb-pigx-chipseq-0.0.42/libexec/pigx_chipseq/Rules/Mapping.py", line 224, in __rule_bowtie2

File "/gnu/store/09a5iq080g9b641jyl363dr5jkkvnhcn-python-3.8.2/lib/python3.8/concurrent/futures/thread.py", line 57, in run[0m

[31mExiting because a job execution failed. Look above for error message[0m

```

The log file contains the following information:

```

perl: warning: Setting locale failed.

perl: warning: Please check that your locale settings:

LANGUAGE = (unset),

LC_ALL = (unset),

LANG = "en_GB.UTF-8"

are supported and installed on your system.

perl: warning: Falling back to the standard locale ("C").

Error: reads file does not look like a FASTQ file

terminate called after throwing an instance of 'int'

(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

```

I checked the input FASTQ file (/fast/AG_Ohler/prauten/data/Schuelke_Dummy_ChiP-seq/pigx/mm9/out/Trimmed/Trim_Galore/Myf5_rep2/Myf5_rep2_R.fastq.gz ). To me, it looks fine.

I think the following thread could relate to the problem: https://sourceforge.net/p/bowtie-bio/bugs/163/. However, according to the website, the problem should be resolved.

I would appreciate your help and am sorry for running into so many problems.

Pia.Raut...@mdc-berlin.de

unread,

Sep 16, 2021, 9:32:50 AM9/16/21

to pi...@googlegroups.com

Dear maintainers of the PiGx pipelines,

I am planning to use the PiGx ChiP-seq pipeline and have a question regarding the installation with guix. I don't quite understand what ```guix environment -l guix.scm``` does and the explanation: "Assuming you have Guix installed, the following command spawns a sub-shell in which all dependencies are available" means in practice. Is this a "one time" shell that is opened? Can I only use this in interactive mode? In case I can use it also from within submitted job scripts, how can I connect to this shell? Is it possible to also do this for a named guix environment? And how would the command look like?

Ideally I would create a named guix environment with only the PiGx ChiP-seq pipeline and all its dependencies installed.

Thanks a lot and kind regards,
Pia

Ricardo Wurmus

unread,

Sep 16, 2021, 9:58:16 AM9/16/21

to Pia.Raut...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

> I am planning to use the PiGx ChiP-seq pipeline and have a
> question regarding the installation with guix. I don't quite
> understand what ```guix environment -l guix.scm``` does and the
> explanation: "Assuming you have Guix installed, the following
> command spawns a sub-shell in which all dependencies are
> available" means in practice.

This is only useful for development of the pipeline. This
sub-shell disappears when you exit it, and yes it is only used
interactively.

If you actually want to use the pipeline you can install it with
Guix:

guix install pigx-chipseq

This will install the package into your default profile at
~/.guix-profile and you will end up with an executable
~/.guix-profile/bin/pigx-chipseq.

> Ideally I would create a named guix environment with only the
> PiGx ChiP-seq pipeline and all its dependencies installed.

You can also install the pigx-chipseq package into a separate Guix
profile, e.g. with this command:

guix package --profile=/somewhere/else/.guix-profile
--install=pigx-chipseq

You needn’t install any dependencies anywhere. PiGx keeps exact
references to all dependencies under /gnu/store, so they don’t
need to be explicitly installed into the profile.

--
Ricardo

Pia.Raut...@mdc-berlin.de

unread,

Sep 16, 2021, 10:01:10 AM9/16/21

to Ricardo...@mdc-berlin.de, pi...@googlegroups.com

Hi Ricardo,

Thank you very much for the instructive explanation!

Reading the installation instructions, I got the impression I have to "manually" install the dependencies, hence the confusion.

Thanks again for the clarification!

Best,
Pia

Ricardo Wurmus

unread,

Sep 16, 2021, 10:27:38 AM9/16/21

to Rautenstrauch, Pia, pi...@googlegroups.com

Hi Pia,

> Reading the installation instructions, I got the impression I
> have to "manually" install the dependencies, hence the
> confusion.

Yeah, you are right. This is a bit confusing. The section starts
with

You can install this pipeline with all its dependencies using
GNU Guix:

guix install pigx-chipseq

but it then goes on to talk about *manual* compilation in the same
section. It may be a good idea to move all that talk about manual
compilation to a separate section.

Thanks for your feedback!

--
Ricardo

Pia.Raut...@mdc-berlin.de

unread,

Sep 19, 2021, 8:35:53 AM9/19/21

to pi...@googlegroups.com

Hi there,

I am trying to run the ChIP-seq PiGx pipeline and am having trouble with the reference genome and annotations.

I downloaded the most recent genome (fasta files) and annotation from NCBI (https://www.ncbi.nlm.nih.gov/genome/51?genome_assembly_id=582967).

Files look like this:

```

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.fna.gz

zcat GCF_000001405.39_GRCh38.p13_genomic.fna.gz | head

>NC_000001.11 Homo sapiens chromosome 1, GRCh38.p13 Primary Assembly

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.gff.gz

zcat GCF_000001405.39_GRCh38.p13_genomic.gff.gz | head

##gff-version 3

#!gff-spec-version 1.21

#!processor NCBI annotwriter

#!genome-build GRCh38.p13

#!genome-build-accession NCBI_Assembly:GCF_000001405.39

#!annotation-date 05/14/2021

#!annotation-source NCBI Homo sapiens Updated Annotation Release 109.20210514

##sequence-region NC_000001.11 1 248956422

##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606

NC_000001.11 RefSeq region 1 248956422 . + . ID=NC_000001.11:1..248956422;Dbxref=taxon:9606;Name=1;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA

```

The pipeline terminates with:

```

Commencing snakemake run submission locally

SystemExit in line 40 of /gnu/store/kzxackdy32cjghdy8d1508ggp2bzn4x3-pigx-chipseq-0.0.52/libexec/pigx_chipseq/scripts/Check_Config.py:

ERROR: Settings file is not properly formated:

Genome fasta headers contain whitespaces.

Please reformat the headers

File "/gnu/store/kzxackdy32cjghdy8d1508ggp2bzn4x3-pigx-chipseq-0.0.52/libexec/pigx_chipseq/Snake_ChIPseq.py", line 47, in <module>

File "/gnu/store/kzxackdy32cjghdy8d1508ggp2bzn4x3-pigx-chipseq-0.0.52/libexec/pigx_chipseq/scripts/Check_Config.py", line 40, in validate_config

```

Unfortunately just removing whitespaces from the fasta headers doesn’t work, as then the match of names between the fasta files and the off file breaks.

I was wondering whether there is a quick fix for this or whether you experienced similar problems before? I suprised to run into trouble with the most recent human genome version.

Help highly appreciated!

Thanks and best wishes,

Pia

P.s.: In case you want to look at my project, it lives here: ```/fast/AG_Ohler/prauten/projects/BeyondTheExome/202109_ChIP-seq_test/```

Alexander Blume

unread,

Sep 19, 2021, 8:40:21 AM9/19/21

to Pia.Raut...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

I'll refer to the answer that we have in the Troubleshooting FAQ (https://bioinformatics.mdc-berlin.de/pigx_docs/pigx-chip-seq.html#troubleshooting):

Our pipeline does not like if in the FASTA headers of the reference genome there is more information than the chromosome name, so please use this oneliner to format your genome-file:

cat refGenome.fa | awk '{ print $1}’ > refGenome_noWhiteSpace.fa

Best,

Alex

--
You received this message because you are subscribed to the Google Groups "pigx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pigx+uns...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/pigx/950F7D3C-4B9B-47AF-A5DE-BB67D93A3C3F%40mdc-berlin.de.

Pia.Raut...@mdc-berlin.de

unread,

Sep 21, 2021, 2:58:06 PM9/21/21

to Alexander Blume, pi...@googlegroups.com

Hi Alex,

Thanks a lot for the quick response and help! I am sorry for not having read the FAQ.. I don’t know whether this is of interest to you, but I also had to remove the header from the corresponding GTF annotation file for the pipeline to run (might be worth adding to the FAQ answer)
```sed '/^#/d' annotation_file.gtf > annotation_file_no_header.gtf```

I successfully ran the PiGx ChIP-seq pipeline on my samples, but unfortunately several of the figures in the report are empty.

Namely:
- 4 Reads Mapping to Mitochondrial Chromosome
- 6.1 GC Content Distribution Per Sample
- 8.1 Frequency of reads in peaks

Do you have an idea what might have gone wrong or how to fix this?

Thanks and best,
Pia

Alexander Blume

unread,

Sep 21, 2021, 4:50:49 PM9/21/21

to Pia.Raut...@mdc-berlin.de, pi...@googlegroups.com

Hi Pia,

Great you figured out the gtf issue!

Concerning the problem with the report, I have one good, but two bad answers.

First the bad ones: Right now, I do not have an idea why the 'GC Content Distribution Per Sample’ or the 'Frequency of reads in peaks’ did not work. You may want to inspect the chipqc log files for the samples to check if there are some obvious errors.

But for the plot showing the reads mapping to mitochondrial chromosome I might have a solution: In the settings file we have a setting called ‘chrM_givenName’ in the ‘chipqc’ part of the ‘general’ params section. You may want to put 'NC_012920.1’ in there, since the genome assembly that you are working with does not use the standard chromosome names in the fasta and gtf file, like chrM or MtDNA, but has these special refsec ids. However to let the report pick this change up, you may need to remove or rename the 'Analysis/RDS/ChIPQC’ folder and the ‘Analysis/Summarized_Data_For_Report.RDS’ file and rerun the pipeline to trigger re-execution of the chipqc rule.

I hope this helps you for now, but we can also try to figure out what is wrong with the other plots.

Best,
Alex

> On 21. Sep 2021, at 20:48, Pia.Raut...@mdc-berlin.de wrote:
>
> Hi Alex,
>
> Thanks a lot for the quick response and help! I am sorry for not having read the FAQ.. I don’t know whether this is of interest to you, but I also had to remove the header from the corresponding GTF annotation file for the pipeline to run (might be worth adding to the FAQ answer)
> ```sed '/^#/d' annotation_file.gtf > annotation_file_no_header.gtf```
>
> I successfully ran the PiGx ChIP-seq pipeline on my samples, but unfortunately several of the figures in the report are empty.
>
> Namely:
> - 4 Reads Mapping to Mitochondrial Chromosome
> - 6.1 GC Content Distribution Per Sample
> - 8.1 Frequency of reads in peaks
>
>

> I attached the html report.

>
> Do you have an idea what might have gone wrong or how to fix this?
>
> Thanks and best,
> Pia
>
>
>
> > On 19. Sep 2021, at 14:40, Alexander Blume <alex....@gmail.com> wrote:
> >

> <20210920_PiGx_ChIP_Seq_Report.html>

Pia.Raut...@mdc-berlin.de

unread,

Sep 22, 2021, 3:26:48 AM9/22/21

to Alexander Blume, pi...@googlegroups.com

Hi Alex,

Thanks again for the very quick and helpful response!

Regarding your tip for the mitochondrial genome - I had a similar thought, but since I wrote to you to ask about the chromosome naming issue, I have switched to a genome file from “GRCh38_major_release_seqs_for_alignment_pipelines”, where mitochondrial genome is called “chrM”. So the parameter used in the settings file for my analysis is correct and the plot still empty.

Maybe of interest to you: Last year (on a different sample I ran the PiGx ChIP-seq pipeline version 0.0.42, and in this reports all figures are displayed), now I am using version 0.0.52,

I found one thing that might point towards why "6.1 GC Content Distribution Per Sample” does not work:

From knit_report.log

```

There were 50 or more warnings (use warnings() to see the first 50)

Warning messages:

1: replacing previous import 'Biostrings::pattern' by 'grid::pattern' when loading 'genomation'

2: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

3: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

4: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

5: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

6: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

7: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

8: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.

Using compatibility `.name_repair`.

This warning is displayed once every 8 hours.

Call `lifecycle::last_warnings()` to see where this warning was generated.

9: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

10: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.

11: Transformation introduced infinite values in continuous y-axis

12: Removed 17148 rows containing non-finite values (stat_binhex).

13: Computation failed in `stat_binhex()`: