Hi Julien,
Okay, the reason I'm digging into the source code of bio++ is that I'm trying to add a way to allow probabilities in the input, as per the thread named "bppsuite : a question about the input format" ... I believe you replied to that post as well. Hence, I take it you know the source quite well, so I'd like to get your opinion on the following, which is related to the design issue above, i.e., what sparked this current thread.
What I've done to allow probabilities in the input is to define a new type of sequence called ProbabilisticSequence, which inherits from Sequence. The idea of the ProbabilisticSequence is that instead of a sequence of characters ACG ... each position is a (discrete) PDF on the alphabet, i.e., the BasicSequence "ACG" would be the ProbabilisticSequence "<1,0,0,0> <0,1,0,0> <0,0,1,0>" on the alphabet {A,C,G,T}. So the sequence "content" of the ProbabilisticSequence is not a vector<int> or even a vector<string> at all, but rather a matrix, in fact a bpp::DataTable object.
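To make the representation concrete, here is a minimal sketch of the "ACG" example in plain C++. It does not use bpp::DataTable or any other bio++ class; the names ProbabilityMatrix and toOneHot are hypothetical and only illustrate the idea of one probability row per position.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the matrix content of a ProbabilisticSequence:
// one row per position, one column per alphabet state. Not a bio++ type.
using ProbabilityMatrix = std::vector<std::vector<double>>;

// Convert an ordinary character sequence into its degenerate probabilistic
// form, where each position puts probability 1 on the observed state.
ProbabilityMatrix toOneHot(const std::string& sequence,
                           const std::string& alphabet = "ACGT")
{
    ProbabilityMatrix matrix;
    matrix.reserve(sequence.size());
    for (char state : sequence) {
        const std::size_t column = alphabet.find(state);
        if (column == std::string::npos)
            throw std::invalid_argument("state is not in the alphabet");
        std::vector<double> row(alphabet.size(), 0.0);
        row[column] = 1.0;  // all probability mass on the observed state
        matrix.push_back(row);
    }
    assert(matrix.size() == sequence.size());
    return matrix;
}
```

With this, toOneHot("ACG") gives the three rows <1,0,0,0> <0,1,0,0> <0,0,1,0> from the example above; a genuinely uncertain position would simply have a row such as <0.7,0.1,0.1,0.1>.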
And so, in addition to suppressing the existence of virtual const std::vector<int> & getContent() const = 0 , I wonder if one should make it even more generalized, i.e., allow the content to be of any type? ... perhaps by creating a sequence "Content" class that inherits from Cloneable? This would be quite a major overhaul of bio++, since at the base all sequences contain some sort of content that can be represented as a vector<int>. So I would not vote for this, given that I don't know how widely ProbabilisticSequence will be used by bio++ users ... at the moment it's being added only for the specific use of our project.
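For the sake of discussion, the generalized-content idea could look roughly like the sketch below. Everything in it is hypothetical: SequenceContent, IntContent and ProbabilisticContent are made-up names, and the clone() method only mimics a clonable base class rather than using the one in bio++.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical abstract content type; a sequence would hold one of these
// instead of exposing a vector<int> directly.
class SequenceContent
{
public:
    virtual ~SequenceContent() = default;
    virtual std::unique_ptr<SequenceContent> clone() const = 0;
    virtual std::size_t size() const = 0;  // number of positions
};

// Content of an ordinary sequence: one integer state per position.
class IntContent : public SequenceContent
{
public:
    explicit IntContent(std::vector<int> states) : states_(std::move(states)) {}
    std::unique_ptr<SequenceContent> clone() const override
    {
        return std::make_unique<IntContent>(*this);
    }
    std::size_t size() const override { return states_.size(); }

private:
    std::vector<int> states_;
};

// Content of a probabilistic sequence: one probability row per position.
class ProbabilisticContent : public SequenceContent
{
public:
    explicit ProbabilisticContent(std::vector<std::vector<double>> rows)
        : rows_(std::move(rows)) {}
    std::unique_ptr<SequenceContent> clone() const override
    {
        return std::make_unique<ProbabilisticContent>(*this);
    }
    std::size_t size() const override { return rows_.size(); }

private:
    std::vector<std::vector<double>> rows_;
};
```

The catch is that every caller that currently relies on getting a vector<int> back would have to be revisited, which is where the overhaul comes from.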
Another way I thought of just now to deal with ProbabilisticSequence objects is to implement a canonical encoding of a DataTable into a vector<int> ... such an encoding certainly exists, so we can just pick one and implement it ... then the ProbabilisticSequence fits into this framework of sequence content being vector<int> objects. Of course, the individual elements of this vector<int> would be gobbledegook, because it's some complex encoding that only makes sense once it is translated back into a DataTable.
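As an illustration of what such an encoding might look like, here is one possible scheme, sketched on a plain matrix of doubles rather than on a real bpp::DataTable. The names encode, decode and ProbabilityMatrix are hypothetical, and the scheme (a two-int header followed by the raw bytes of each double split over two ints) is just one arbitrary choice. It assumes a double occupies exactly two ints, and the result is only meant to round-trip on the same machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the DataTable content; not a bio++ type.
using ProbabilityMatrix = std::vector<std::vector<double>>;

static_assert(sizeof(double) == 2 * sizeof(int),
              "this sketch assumes a double spans exactly two ints");

// Layout: [rows, cols, then two ints holding the raw bytes of each double].
std::vector<int> encode(const ProbabilityMatrix& matrix)
{
    const std::size_t rows = matrix.size();
    const std::size_t cols = rows == 0 ? 0 : matrix[0].size();
    std::vector<int> encoded;
    encoded.reserve(2 + 2 * rows * cols);
    encoded.push_back(static_cast<int>(rows));
    encoded.push_back(static_cast<int>(cols));
    for (const auto& row : matrix) {
        if (row.size() != cols)
            throw std::invalid_argument("all rows must have the same length");
        for (double p : row) {
            int words[2];
            std::memcpy(words, &p, sizeof(double));  // copy the raw bytes
            encoded.push_back(words[0]);
            encoded.push_back(words[1]);
        }
    }
    return encoded;
}

ProbabilityMatrix decode(const std::vector<int>& encoded)
{
    if (encoded.size() < 2 || encoded[0] < 0 || encoded[1] < 0)
        throw std::invalid_argument("missing or invalid header");
    const std::size_t rows = static_cast<std::size_t>(encoded[0]);
    const std::size_t cols = static_cast<std::size_t>(encoded[1]);
    if (encoded.size() != 2 + 2 * rows * cols)
        throw std::invalid_argument("length does not match header");
    ProbabilityMatrix matrix(rows, std::vector<double>(cols));
    std::size_t next = 2;  // index of the first payload int
    for (auto& row : matrix) {
        for (double& p : row) {
            std::memcpy(&p, &encoded[next], sizeof(double));
            next += 2;
        }
    }
    assert(next == encoded.size());
    return matrix;
}
```

Apart from the two header entries, the ints in the encoded vector mean nothing on their own, which is exactly the gobbledegook problem: any code that reads getContent() and interprets the entries as alphabet states would silently get nonsense.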
What do you think? I'm just trying to think of best practices right now. In any case, I've pulled the git repo of the development version of bio++ and I've added ProbabilisticSequence and a few interfaces for it, and it's fitting in fairly well so far, compiling and running. Eventually, when it's stable, I'd like to push it to the remote repo for everyone to use.
thanks and Cheers,
Murray