Hello,
I have data in the following format:
| CHR | TSS-25bp | TSS+25bp | count | tss | Ensemble transcript | refgene | strand |
| chr8 | 68141773 | 68141823 | 1 | 68141798 | ENSMUST00000152320 | 1-Mar | + |
| chr8 | 68141882 | 68141932 | 3 | 68141907 | ENSMUST00000110258 | 1-Mar | + |
| chr8 | 68141898 | 68141948 | 3 | 68141923 | ENSMUST00000110256 | 1-Mar | + |
| chr8 | 68141910 | 68141960 | 3 | 68141935 | ENSMUST00000155804 | 1-Mar | + |
| chr8 | 68141959 | 68142009 | 2 | 68141984 | ENSMUST00000110255 | 1-Mar | + |
| chr8 | 68910167 | 68910217 | 2 | 68910192 | ENSMUST00000039540 | 1-Mar | + |
| chr8 | 68910174 | 68910224 | 2 | 68910199 | ENSMUST00000110253 | 1-Mar | + |
| chr17 | 33822631 | 33822681 | 2 | 33822656 | ENSMUST00000066121 | 2-Mar | - |
| chr17 | 33828434 | 33828484 | 2 | 33828459 | ENSMUST00000172767 | 2-Mar | - |
| chr17 | 33828758 | 33828808 | 1 | 33828783 | ENSMUST00000173454 | 2-Mar | - |
| chr17 | 33840058 | 33840108 | 1 | 33840083 | ENSMUST00000173392 | 2-Mar | - |
| chr18 | 56963297 | 56963347 | 1 | 56963322 | ENSMUST00000153044 | 3-Mar | - |
| chr19 | 37282007 | 37282057 | 4 | 37282032 | ENSMUST00000024078 | 5-Mar | + |
| chr19 | 37282032 | 37282082 | 5 | 37282057 | ENSMUST00000112391 | 5-Mar | + |
| chr19 | 37282040 | 37282090 | 4 | 37282065 | ENSMUST00000148105 | 5-Mar | + |
| chr15 | 31385628 | 31385678 | 2 | 31385653 | ENSMUST00000090227 | 6-Mar | - |
| chr15 | 31387011 | 31387061 | 1 | 31387036 | ENSMUST00000043826 | 6-Mar | - |
For every refgene (column 7), I would like to keep one row per unique count (column 4), chosen on the basis of strand (column 8) and tss (column 5). For example, there are 7 rows for 1-March, with counts 1, 2 and 3. If several rows of a gene have the same count, I would like to keep the one with the highest tss; likewise, if the gene is on the negative strand, keep the one with the lowest tss (starting from 3'). For genes 1-March and 2-March, I would expect:
| chr8 | 68141773 | 68141823 | 1 | 68141798 | ENSMUST00000152320 | 1-Mar | + |
| chr8 | 68141910 | 68141960 | 3 | 68141935 | ENSMUST00000155804 | 1-Mar | + |
| chr8 | 68910174 | 68910224 | 2 | 68910199 | ENSMUST00000110253 | 1-Mar | + |
| chr17 | 33822631 | 33822681 | 2 | 33822656 | ENSMUST00000066121 | 2-Mar | - |
| chr17 | 33828758 | 33828808 | 1 | 33828783 | ENSMUST00000173454 | 2-Mar | - |
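To make the rule concrete, here is a minimal Python sketch of the selection I have in mind. It assumes the rows are pipe-delimited exactly as pasted here, with eight fields each; the names `parse_rows`, `pick_representatives` and `SAMPLE` are only illustrative, and the real file may well use tabs instead. It keeps one row per (refgene, count) pair: the highest tss on the + strand and the lowest tss on the - strand.

```python
from typing import Dict, List, Tuple

Row = List[str]

# Rows for the first two genes, copied from the table in this post.
SAMPLE = """
| chr8 | 68141773 | 68141823 | 1 | 68141798 | ENSMUST00000152320 | 1-Mar | + |
| chr8 | 68141882 | 68141932 | 3 | 68141907 | ENSMUST00000110258 | 1-Mar | + |
| chr8 | 68141898 | 68141948 | 3 | 68141923 | ENSMUST00000110256 | 1-Mar | + |
| chr8 | 68141910 | 68141960 | 3 | 68141935 | ENSMUST00000155804 | 1-Mar | + |
| chr8 | 68141959 | 68142009 | 2 | 68141984 | ENSMUST00000110255 | 1-Mar | + |
| chr8 | 68910167 | 68910217 | 2 | 68910192 | ENSMUST00000039540 | 1-Mar | + |
| chr8 | 68910174 | 68910224 | 2 | 68910199 | ENSMUST00000110253 | 1-Mar | + |
| chr17 | 33822631 | 33822681 | 2 | 33822656 | ENSMUST00000066121 | 2-Mar | - |
| chr17 | 33828434 | 33828484 | 2 | 33828459 | ENSMUST00000172767 | 2-Mar | - |
| chr17 | 33828758 | 33828808 | 1 | 33828783 | ENSMUST00000173454 | 2-Mar | - |
| chr17 | 33840058 | 33840108 | 1 | 33840083 | ENSMUST00000173392 | 2-Mar | - |
"""


def parse_rows(text: str) -> List[Row]:
    """Split pipe-delimited lines into trimmed fields, skipping non-data lines."""
    rows = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        # Keep only lines with 8 fields and a numeric tss (this drops a header row).
        if len(fields) == 8 and fields[4].isdigit():
            rows.append(fields)
    return rows


def pick_representatives(rows: List[Row]) -> List[Row]:
    """Keep one row per (refgene, count): highest tss on '+', lowest tss on '-'."""
    best: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        count, tss, gene, strand = row[3], int(row[4]), row[6], row[7]
        key = (gene, count)
        current = best.get(key)
        if current is None:
            best[key] = row
            continue
        current_tss = int(current[4])
        if (strand == "+" and tss > current_tss) or (strand == "-" and tss < current_tss):
            best[key] = row
    # Dicts preserve insertion order, so groups come out in first-seen order.
    return list(best.values())


if __name__ == "__main__":
    for row in pick_representatives(parse_rows(SAMPLE)):
        print("| " + " | ".join(row) + " |")
```

For a tab-delimited file, the same idea should apply with the split changed to `line.split("\t")`; I have only tried to match the rows shown above.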
Sorry for the typo, it should be genes March1 and March2.
Can you please give me some ideas on how to start writing a script for this in Unix?
Thanks for your time.