Beta plus IOError: [Errno 2] bed.5c.bed


Anthony Castanza

Mar 22, 2017, 8:26:06 PM
to Cistrome
Hi,
I'm trying to run BETA Plus on a dataset I generated. The dataset uses Ensembl IDs mapped to gene symbols, so I generated a reference file as described in this post: https://groups.google.com/d/msg/cistromebeta/DiIeeEeaINM/zVT9apk7EQAJ

I'm using the command:

BETA plus -p [peaks].bed -e [diffexpression].tabular -k CUF --info 1,3,6 -r [ensembl biomart generated reference].txt --df 0.05 -n HC --gs [ensembl ref genome].fa --gname2

IOError: [Errno 2] No such file or directory: '[peaks].bed.5c.bed'

Any ideas? 

Anthony Castanza

Mar 22, 2017, 8:27:12 PM
to Cistrome
Before the error:

[17:23:32] Differential Expression file format successful passed
[17:23:32] We don not provide a CTCF boundary file for None, the peak will be filtered only by the distance
Traceback (most recent call last):
  File "/usr/local/bin/BETA", line 4, in <module>
    __import__('pkg_resources').run_script('BETA-Package==1.0.7', 'BETA')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/BETA_Package-1.0.7-py2.7.egg/EGG-INFO/scripts/BETA", line 193, in <module>

  File "/usr/local/lib/python2.7/dist-packages/BETA_Package-1.0.7-py2.7.egg/EGG-INFO/scripts/BETA", line 186, in main

  File "build/bdist.linux-x86_64/egg/BETA/runbeta.py", line 100, in plusrun
  File "build/bdist.linux-x86_64/egg/BETA/PScore.py", line 134, in readfile

Anthony Castanza

Mar 23, 2017, 12:24:08 AM
to Cistrome
I should mention that my BED file has 5 columns, and I've also tried a tabular version of it. I called peaks with MACS2, then stripped the file to 5 columns using cat|awk from the terminal.
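For reference, the column-stripping step can be sketched in Python as below. This is only an illustration under assumptions: `strip_to_five_columns` is a hypothetical helper (not part of BETA or MACS2), it assumes the peak file is tab-separated, and it simply keeps the first five columns without checking which score BETA actually expects in the fifth.

```python
def strip_to_five_columns(src_path, dst_path):
    """Copy the first five tab-separated columns of each peak line.

    Hypothetical helper for illustration; returns the number of peak
    lines written to dst_path.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            # Skip blank lines and the header lines some peak files carry.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            if len(fields) < 5:
                raise ValueError(
                    "expected at least 5 tab-separated columns, got %d: %r"
                    % (len(fields), line)
                )
            dst.write("\t".join(fields[:5]) + "\n")
            written += 1
    return written
```

Raising on short lines, rather than silently skipping them, makes a space-separated or truncated file show up immediately instead of later inside BETA.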



Anthony Castanza

Mar 23, 2017, 2:24:32 AM
to Cistrome
I was able to get around this error by creating a copy of the peaks.bed file and naming the copy peaks.bed.5c.bed.
After that, the error becomes:

  File "build/bdist.linux-x86_64/egg/BETA/runbeta.py", line 107, in plusrun
  File "build/bdist.linux-x86_64/egg/BETA/expr_combine.py", line 203, in ChGS
  File "build/bdist.linux-x86_64/egg/BETA/Up_Down_score.py", line 296, in scorerun
  File "build/bdist.linux-x86_64/egg/BETA/Up_Down_score.py", line 97, in read_score_file
IndexError: list index out of range

But I think the real problem occurs earlier in the log:

[23:14:23] Differential Expression file format successful passed
[23:14:23] We don not provide a CTCF boundary file for None, the peak will be filtered only by the distance
[23:14:23] Read file <peaks.bed.5c.bed> OK! All <6136> peaks.
[23:14:23] Process <0> genes
[23:14:23] Finished! Preliminary results saved into temporary file: <HC.txt>
[11783, 12252]
[23:14:23] Genes were seprated to two parts: up regulated and down regulated.
cut: write error: Broken pipe
cut: write error: Broken pipe
[23:14:24] Prepare file for the Up/Down Test
Traceback (most recent call last):

BETA seems to have a problem parsing the gene expression file (note "Process <0> genes" above). That is odd, because it's standard tabular output that I've used before; the only difference is that we're working with Ensembl IDs, and I provided an Ensembl ID mapping file per the instructions. This thread: https://groups.google.com/d/msg/cistrome/6pF5wz3dSDQ/X0oP3S6lzY4J suggests the cause is non-tab separators in the expression file, but I don't believe that is the case here, since files generated the same way have been accepted by BETA before.
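One way to rule the separator question in or out is to check that every line of the expression file has enough tab-separated fields for the columns named in --info. A minimal sketch, assuming the --info indices (1,3,6 in the command above) are 1-based and that comment lines start with "#"; `check_tabular` is a hypothetical helper, not part of BETA.

```python
def check_tabular(path, required_columns=(1, 3, 6)):
    """Return (line_number, problem) tuples for lines with too few fields.

    required_columns mirrors the --info argument and is assumed to be
    1-based; an empty list means every line has enough tab-separated
    fields.
    """
    problems = []
    needed = max(required_columns)
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            # Ignore blank lines and comment lines.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < needed:
                # A single field usually means the line has no tabs at all.
                hint = "no tab found" if len(fields) == 1 else "too few columns"
                problems.append(
                    (number, "%s: %d field(s), need %d" % (hint, len(fields), needed))
                )
    return problems
```

An empty result does not prove the file is what BETA expects, but a non-empty one would be consistent with the "IndexError: list index out of range" in the traceback.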

Jean-Baptiste Alberge

Feb 10, 2018, 11:04:15 AM
to Cistrome
Hi Anthony,
I got the same error:

IOError: [Errno 2] No such file or directory: 'bed_summits.bed.5c.bed' 

It seems that macs2 used to prefix chromosome names with "chr". BETA's test data looks like:

chr1 1000131 1000212 MACS_peak_1 9.07
chr1 1009123 1009363 MACS_peak_2 16.73
chr1 1014839 1015028 MACS_peak_3 7.21
chr1 1015629 1015900 MACS_peak_4 16.89
chr1 1172443 1172641 MACS_peak_5 4.96
chr1 1266873 1267046 MACS_peak_6 27.77
chr1 1315941 1316152 MACS_peak_7 14.50

while my actual MACS2 output looks like:

1 778695 778696 HSF1_peak_1 7.27256
1 788980 788981 HSF1_peak_2 2.74776
1 791305 791306 HSF1_peak_3 12.17758
1 850031 850032 HSF1_peak_4 4.62308
1 851521 851522 HSF1_peak_5 41.74846
1 852109 852110 HSF1_peak_6 7.30818

I solved the problem by changing the chromosome names in my BED file (here in R):

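# Read the 5-column summits file (no header; columns become V1..V5)
df <- read.table("../macs2_summits_chr.bed", header = FALSE, stringsAsFactors = FALSE)
head(df)  # inspect the first rows before the change
# Prefix every chromosome name with "chr" (e.g. "1" becomes "chr1")
df$V1 <- paste0("chr", df$V1)
tail(df)  # inspect the last rows after the change
# Overwrite the file in place: tab-separated, no quotes, no header or row names
write.table(df, file = "../macs2_summits_chr.bed", quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)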
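The same fix can be sketched in Python for anyone not using R. This version writes to a separate output file and skips lines that already start with the prefix, so running it twice does not produce "chrchr1". `add_chr_prefix` is a hypothetical helper, and the sketch assumes the chromosome name is the first field on each line.

```python
def add_chr_prefix(src_path, dst_path, prefix="chr"):
    """Prefix chromosome names in a BED file; return the number of lines changed.

    Hypothetical helper for illustration. Lines that already start with
    the prefix, blank lines, and header lines are copied unchanged.
    """
    changed = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            is_header = line.startswith(("#", "track", "browser"))
            if line.strip() and not is_header and not line.startswith(prefix):
                # The chromosome is the first field, so prefixing the line is enough.
                # Caveat: Ensembl "MT" becomes "chrMT", while UCSC-style names use "chrM".
                line = prefix + line
                changed += 1
            dst.write(line)
    return changed
```

Whether the prefixed names match the reference passed with -r still has to be checked separately, since the peak file and the reference must use the same naming scheme.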

Meeta Mistry

Mar 21, 2018, 1:36:21 PM
to Cistrome
Hi Anthony,

Is there any chance you figured this out? I am having a problem that sounds identical to what you posted.

Meeta