cat my_clusters.txt | sed 's/^/^>/g' | awk '{print $0"--"}' > my_clusters2.txt
Let us know how you go.
Cheers,
Nadia.
./fetchClusterSeqs.py -c testData/clusters.txt -t testData/significantClusters.txt -i testData/transcripts.fa -o foundTranscripts.fa
Thank you very much for your help.
Have any of you written a script for selecting the longest contig within a cluster? As I mentioned before, most of my clusters comprise multiple contigs, and I would like to select the longest contig in each cluster for annotation purposes. I tried writing one in R, but it takes ages to run and gives some warning messages too. Do you have something in Python or shell?
Regards,
Regina
I'm trying to run the fetchClusterSeqs.py script, but I get the following:
"Traceback (most recent call last):
  File "./fetchClusterSeqs.py", line 127, in <module>
    main(inFasta, targetClust, outFasta, clustMap)
  File "./fetchClusterSeqs.py", line 26, in main
    clustMem = clustDict(clustMap)
  File "./fetchClusterSeqs.py", line 86, in clustDict
    clustID=row[1]
IndexError: list index out of range"
My command line was: ./fetchClusterSeqs.py -c ./clusters.txt -t ./select_clusters.csv -i ~/proton/trinity_kmer3/Trinity.fasta -o ./Trinity_kmer3_fetchcorset.fa
Any guidance would be much appreciated!
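One possible cause of an IndexError at row[1] is a line in the cluster file that has fewer than two columns, for example a blank line or a line separated by spaces instead of tabs. The sketch below is only a diagnostic under that assumption, not part of fetchClusterSeqs.py: it assumes the file passed with -c is tab-separated with the transcript ID first and the cluster ID second, and find_short_rows is a hypothetical helper name.

```python
import csv
import io


def find_short_rows(handle, delimiter="\t", min_fields=2):
    """Return (line_number, row) pairs for rows with fewer than min_fields columns.

    Assumes a tab-separated cluster map (transcript ID, cluster ID);
    this layout is an assumption, not taken from the script itself.
    """
    bad = []
    for line_no, row in enumerate(csv.reader(handle, delimiter=delimiter), start=1):
        if len(row) < min_fields:
            bad.append((line_no, row))
    return bad


# Made-up sample: line 2 is blank and line 3 uses a space instead of a tab.
sample = io.StringIO("t1\tCluster-0.0\n\nt2 Cluster-0.1\n")
for line_no, row in find_short_rows(sample):
    print(f"line {line_no}: {row!r}")
```

To check a real file, open it with open("clusters.txt", newline="") and pass the handle to find_short_rows. If any lines are reported, fixing the delimiter or removing the blank lines may be enough to get past the error.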
./fetchClusterSeqs.py --inFasta transcripts.fa --targetClust significantClusters.txt --outFasta longestTranscriptsByCluster.fa --clustMap clusters.txt --longest
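For anyone who wants to see the idea of picking the longest contig per cluster spelled out, here is a minimal Python sketch. It is not the code behind fetchClusterSeqs.py. It assumes the cluster file is tab-separated (transcript ID, then cluster ID) and that the FASTA ID is the header text up to the first whitespace. The function names are made up for illustration.

```python
import io


def read_cluster_map(handle):
    """Map transcript ID -> cluster ID from a tab-separated two-column file (assumed layout)."""
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        mapping[fields[0]] = fields[1]
    return mapping


def read_fasta(handle):
    """Yield (id, sequence) pairs; the id is the header up to the first whitespace."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def longest_per_cluster(fasta_handle, cluster_map):
    """Return {cluster ID: (transcript ID, sequence)} keeping the longest sequence.

    Transcripts missing from cluster_map are ignored; ties keep the first one seen.
    """
    best = {}
    for seq_id, seq in read_fasta(fasta_handle):
        cluster = cluster_map.get(seq_id)
        if cluster is None:
            continue
        if cluster not in best or len(seq) > len(best[cluster][1]):
            best[cluster] = (seq_id, seq)
    return best


# Made-up example data, for illustration only.
clusters = read_cluster_map(io.StringIO("a\tC1\nb\tC1\nc\tC2\n"))
fasta = io.StringIO(">a\nACGT\n>b\nACGTAC\nGT\n>c\nAA\n")
for cluster, (seq_id, seq) in longest_per_cluster(fasta, clusters).items():
    print(cluster, seq_id, len(seq))
```

The FASTA file is read in a single pass and only one sequence per cluster is kept in memory, so it should run quickly even on a large Trinity assembly.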