Dear Nachiket-
Sorry for the slow reply. I think this strange behavior results from the fact that your data violate the assumptions of the Brownian motion model. Basically, you are modeling a discrete character with a model for continuous data. Because you have huge regions of your tree with the same character state (1), the maximum likelihood estimate of the Brownian motion rate parameter basically goes to 0, and the likelihood keeps increasing as it does. It's like this: imagine you have a single branch on your tree, with state 1 at the ancestral node and state 1 at the descendant node. That is a phenotypic change of 0, and you could (in principle) make your likelihood arbitrarily large by letting the Brownian rate parameter on that branch go to zero.
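Just to illustrate, here's a toy calculation in R (brlen and sigma2 are made-up names here, not BAMM parameters). Under Brownian motion, the change along a branch of length brlen is Normal with mean 0 and variance sigma2 * brlen, so the density of a change of exactly 0 grows without bound as sigma2 shrinks:

brlen <- 1  # an arbitrary branch length
for (sigma2 in c(1, 0.1, 0.01, 0.001)) {
    print(dnorm(0, mean = 0, sd = sqrt(sigma2 * brlen)))  # density of a change of 0
}
# the printed density keeps growing as sigma2 -> 0, so the likelihood is unbounded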
Normally this doesn't happen, because with most continuous data there is enough variation in phenotypes across the tree that the rate parameter is forced to be large enough to accommodate the observed change.
In your case, you probably have 1000 or more branches with a phenotypic change of 0. So BAMM tries to put a rate of 0 on those branches, and the closer that rate gets to 0 (and the smaller the phenotypic change it implies), the higher the likelihood.
My impression is that BAMM is actually doing pretty well - it's finding the best model for your data, but your data are just pretty strange.
If I'm right, you should be able to improve performance by adding some random noise to each data point. Your data are integers. So, I suggest making them "quasi-continuous".
Take your data vector and add a small random number, drawn from a uniform distribution, to each element. If xx is a vector of your integer-valued data, do:
xx <- xx + runif(length(xx), min = -0.25, max = 0.25)
and you'll fuzz your data a bit. I expect BAMM will do much better after this.
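If it helps, here's one way to write the fuzzed values back out for BAMM. This is just a sketch: I'm assuming xx is named by tip label, and, if I remember the trait-file format right, it's two tab-separated columns with no header; "traits_fuzzed.txt" is just a placeholder name. You might also call set.seed() before the runif() line so the jitter is reproducible.

write.table(data.frame(names(xx), xx),
    file = "traits_fuzzed.txt", sep = "\t",
    quote = FALSE, row.names = FALSE, col.names = FALSE)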
If your data are unordered (e.g., 1 is no more similar to 2 than it is to 5), then everything I wrote above does not apply, and you can't use BAMM on these data (yet).
Not everyone is likely to agree with this approximation, though I think it can be a reasonable solution to an otherwise difficult problem, as long as you are explicit about what you've done. You can try different amounts of noise and see if it matters, but I expect that your qualitative results will not change too much.
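For example (again just a sketch, with placeholder names), you could write out a few versions at different noise levels and run BAMM on each. Here xx_int stands for your original integer-valued vector, and keeping the noise below 0.5 keeps adjacent integer states from overlapping:

for (noise in c(0.1, 0.25, 0.4)) {
    xx_fuzzed <- xx_int + runif(length(xx_int), min = -noise, max = noise)
    outfile <- paste0("traits_fuzzed_", noise, ".txt")  # placeholder file name
    write.table(data.frame(names(xx_int), xx_fuzzed), file = outfile,
        sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}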
~Dan