BETA problem with format of differential gene expression data


sajjad khani

Aug 20, 2015, 3:50:59 AM
to Cistrome
I created my differential gene expression data with cuffdiff, and when I use it with BETA I get an error message: "BETA cannot recognize the refseq gene ID, status value(logFC) or FDR. Please give the exact column numbers of the refseq, logFC, and fdr like: 1,2,6 for LIMMA; 2,10,13 for Cufdiff; and 1,2,3 for BETA specific format." I think all the arguments I am using are correct, and the cuffdiff file header follows the requirements in the BETA paper exactly.

I then tested BETA with the sample data from the original paper, and it works for the first example:
BETA basic -p 3656_peaks.bed -e AR_diff_expr.xls -k LIM -g hg19 --da 500 -o basic
However, it does not work for the other two examples.
I would really appreciate it if you could help me.
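For the Cuffdiff case, the column triple 2,10,13 quoted in the error message lines up with gene_id, log2(fold_change) and q_value in Cuffdiff's standard gene_exp.diff output. One possible workaround, sketched here under that assumption, is to copy those three columns into BETA's 3-column specific format (columns 1,2,3) with the header line commented out by "#". The function name cuffdiff_to_beta and the file paths are hypothetical, dropping rows with non-finite values (Cuffdiff can report inf/nan fold changes) is an assumption of this sketch, and gene_id still has to contain RefSeq IDs for BETA to match the genes.

```python
import csv
import math


def cuffdiff_to_beta(diff_path, out_path):
    """Copy columns 2, 10 and 13 (gene_id, log2(fold_change), q_value) of a
    Cuffdiff gene_exp.diff file into a 3-column, tab-separated table whose
    header line is commented out with '#'. Returns the number of genes written.
    Hypothetical helper; the column layout is assumed, not checked."""
    kept = 0
    with open(diff_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        next(reader, None)  # skip Cuffdiff's own header line
        writer.writerow(["#gene_id", "logFC", "FDR"])
        for row in reader:
            if len(row) < 13:
                continue  # skip truncated or malformed lines
            gene_id, log_fc, q_value = row[1], row[9], row[12]
            try:
                fc, q = float(log_fc), float(q_value)
            except ValueError:
                continue  # skip lines with non-numeric values
            if not (math.isfinite(fc) and math.isfinite(q)):
                continue  # assumption: drop inf/nan values
            writer.writerow([gene_id, log_fc, q_value])
            kept += 1
    return kept
```

For example, cuffdiff_to_beta("gene_exp.diff", "beta_expr.txt") returns the number of rows written; the output file could then be given to BETA with -k BSF, the BETA-specific expression type.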

Jian Ma

Aug 20, 2015, 8:45:23 AM
Hi Sajjad,

If you are using the command-line tool of BETA, could you download version 1.0.7 again and see if it works for the example? The Aug 11 update fixed a bug, but the version number was not changed.



sajjad khani

Aug 20, 2015, 9:03:17 AM

Jian Ma,

Thanks, I will do that and let you know.

sajjad khani

Aug 25, 2015, 11:35:34 AM
Hi Jian,
I installed BETA and downloaded new test data, but it still does not work. Getting the test data to work is not really what matters; the real problem is that BETA does not work with my own data. I prepared my ChIP-seq and RNA-seq files exactly according to your recommendations, but I still have the problem.
I am attaching the error message again:
Screenshot from 2015-08-25 17:31:34.png

Jian Ma

Aug 26, 2015, 10:35:39 PM
to, 王苏
Hi Sajjad,

The test file format is not correct: you need to add a "#" mark before the header, or simply remove the first line (the column definitions). For example:

#ID     logFC   AveExpr t       P.Value adj.P.Val       B
NM_007725_at    1.952294076     9.5329625       26.21866663     2.08E-09        2.67E-05        11.53359293
NM_027384_at    2.058199487     11.1945861      24.27362632     3.97E-09        2.67E-05        11.08268077
XR_105574_at    1.711659679     11.60240637     23.27766182     5.63E-09        2.67E-05        10.82741307

Hi Su, could you take a look at this and make an update?
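The fix described here, marking the column-definition line with "#", is easy to script when several files need it. A minimal sketch, assuming the header is the very first line of the file; comment_out_header and the paths are hypothetical names, and deleting the line instead would work equally well according to the same advice.

```python
def comment_out_header(in_path, out_path):
    """Copy in_path to out_path, prefixing the first line with '#' unless it
    already starts with one. Returns True if the header was changed.
    Hypothetical helper; assumes the header is the first line."""
    with open(in_path, newline="") as src:
        lines = src.readlines()
    changed = bool(lines) and not lines[0].startswith("#")
    if changed:
        lines[0] = "#" + lines[0]
    with open(out_path, "w", newline="") as dst:
        dst.writelines(lines)  # newline="" keeps the original line endings
    return changed
```

Because it checks for an existing "#", running it twice on the same file is harmless.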

Roberto Ferrari

Apr 8, 2020, 12:10:19 PM
to Cistrome
Same problem here with BETA 1.0.7:


BETA basic -p high_accessibility.bed -e Beta_gene_expression.txt -k BSF -g hg38 --df 0.05 --gname2  
[17:52:54] BETA will use 'NA' as all the ouput files prefix name
mkdir: /Users/RobertoFerrari/Desktop/Progetti/HAPRBs/BETA_OUTPUT: File exists
[17:52:54] Argument List:
[17:52:54] Name = NA
[17:52:54] Peak File = high_accessibility.bed
[17:52:54] Top Peaks Number = 10000
[17:52:54] Distance = 100000 bp
[17:52:54] Genome = hg38
[17:52:54] Expression File = Beta_gene_expression.txt
[17:52:54] BETA specific Expression Type
[17:52:54] Number of differential expressed genes = 0.5
[17:52:54] Differential expressed gene FDR Threshold = 0.05
[17:52:54] Up/Down Prediction Cutoff = 0.001000
[17:52:54] Function prediction based on regulatory potential
CRITICAL:root:The input bed file high_accessibility.bed has a wrong format!(3 column checking active)
[17:52:54] Wrong Format:            chrX    139065101    139065200

[17:52:54] Right Format should look like:    chr1    567577    567578    MACS_peak_1    119.00
[17:52:54] Or the depreciate 3-column format like this:    chr1    567577    567578
[17:52:54] genesymbol    logFC    FDR
 is not the header of the expression file
[17:52:54] Checking the differential expression infomation...
[17:52:54] Take the first line with Differential Information as an example: genesymbol    logFC    FDR

[17:52:54] BETA cannot recognize the official gene symbol, status value(logFC) or FDR. Please give the exact column numbers of the genesymbol, logFC, and fdr like: 1,2,6 for LIMMA; 2,10,13 for Cufdiff; and 1,2,3 for BETA specific format.

BED file:

chr1    633851    634100
chr1    6158001    6158100
chr1    8070951    8071200
chr1    8120701    8120800
chr1    8169801    8170000
chr1    8182551    8182650
chr1    8190001    8190200
chr1    8259001    8259550
chr1    8412551    8412850
chr1    8947401    8947500
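The log above rejects a BED line that looks like a valid 3-column record and offers the 5-column layout (chrom, start, end, name, score) as the preferred format. The exact cause is not visible from the post; spaces instead of tabs or Windows "\r\n" line endings are common culprits, but that is a guess. The following hedged sketch rewrites the file as tab-separated 5-column records with Unix line endings. The helper bed3_to_bed5, the peak_N names and the placeholder score are assumptions; since the log reports a Top Peaks Number of 10000, which presumably ranks peaks by score, real peak scores should replace the placeholder if they are available.

```python
def bed3_to_bed5(in_path, out_path, placeholder_score="1.0"):
    """Rewrite a BED file as tab-separated 'chrom start end name score' lines
    with Unix line endings. Missing name/score columns get hypothetical
    placeholders (peak_N and placeholder_score). Returns the number of peaks."""
    n = 0
    with open(in_path) as src, open(out_path, "w", newline="") as dst:
        for lineno, line in enumerate(src, start=1):
            fields = line.split()  # any run of spaces/tabs; also drops '\r\n'
            if not fields or fields[0].startswith(("#", "track", "browser")):
                continue  # skip blank, comment and UCSC header lines
            if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
                raise ValueError(f"line {lineno} is not a BED record: {line!r}")
            n += 1
            name = fields[3] if len(fields) > 3 else f"peak_{n}"
            score = fields[4] if len(fields) > 4 else placeholder_score
            dst.write("\t".join([fields[0], fields[1], fields[2], name, score]) + "\n")
    return n
```

Existing name and score columns are kept as they are, so the helper can also be used just to normalize separators and line endings in a file that already has five columns.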

Gene expression file:

genesymbol    logFC    FDR
MARC1    -0.413980636    1.15E-06
MARCH1    -0.497991318    0.000460727
MARC2    -0.018785242    0.752000859
MARCH2    -0.149651168    0.174413916
MARCH3    -0.328417022    0.004716698
MARCH4    -0.079694889    0.383097078
MARCH5    -0.040061544    0.378623947
MARCH6    0.192787579    1.62E-06
MARCH7    0.038250526    0.424522174
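Before re-running BETA, a quick pre-flight check can point at the lines of a BETA-specific expression file (gene, logFC, FDR as columns 1,2,3) that do not parse. This is not BETA's own validation logic: find_bad_lines is a hypothetical helper, treating "#" lines as comments follows the advice given earlier in this thread, and requiring tab-separated fields is an assumption.

```python
def find_bad_lines(path):
    """Return (line_number, reason) pairs for lines that do not look like
    'gene<TAB>logFC<TAB>FDR'. Lines starting with '#' are treated as comments.
    Hypothetical check; not BETA's own parser."""
    problems = []
    with open(path) as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")  # assumption: tab-separated columns
            if len(fields) != 3:
                problems.append((lineno, f"expected 3 tab-separated fields, got {len(fields)}"))
                continue
            _, log_fc, fdr = fields
            try:
                float(log_fc)
                q = float(fdr)
            except ValueError:
                problems.append((lineno, "logFC or FDR is not a number"))
                continue
            if not 0.0 <= q <= 1.0:
                problems.append((lineno, "FDR outside [0, 1]"))
    return problems
```

Run on the posted file as-is, it would flag line 1, the uncommented genesymbol/logFC/FDR header, which is consistent with the "#" fix suggested earlier in the thread.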

I am using BETA 1.0.7, the latest version.