So far I have noticed the following trend: many books titled Bioinformatics with Perl/Python/Java/R etc. end up being introductions to the programming language in question, and often only minor code examples are related to bioinformatics.
I think many bioinformaticians of a "certain age" learned in this way: they are often former bench biologists who gave up lab work and taught themselves programming. These days there are undergraduate courses (!), so I imagine more people use textbooks. It's just that I don't know of any, nor have I ever needed to use one.
Most of my sparse experience with bioinformatics came from the need to extract some statistics from sequence data. So most books I can recommend deal with statistical and algorithmic approaches to biological data.
Jones and Pevzner are accomplished mathematicians and bioinformaticians. Their work with repeats is a must-have reference. Ewens's book will become a classic. He is already a foremost figure in population genetics, both in theory and experiment. Sankoff's book is still the most important reference in sequence alignment. Unfortunately, these books are somewhat mind-bending. They rely heavily on mathematical concepts. But, as far as I know, bioinformatics theory is indeed mathematically and algorithmically challenging.
I really like Biological Sequence Analysis by Durbin et al. and, although it is not really bioinformatics-specific, I found that Perl Medic by Peter J. Scott made a big difference to my newbie Perl code. For biology textbooks, I mainly relied on Lewin and Alberts for background during my undergrad.
A few have mentioned this book, but I would still like to emphasize it in a separate answer. This book covers a lot of topics, and on each topic it gives a very comprehensive and in-depth review. After 10 years, I still benefit from this book, finding meticulous but invaluable details I had overlooked. This is exceptional among general textbooks on bioinformatics. Some may argue the book is too old, but interestingly, when you read it, you will find that there have not been that many breakthroughs in bioinformatics in the past 12 years; many old techniques are still useful today.
For phylogenetics, I like Felsenstein's 'Inferring Phylogenies' much more than the Nei and Kumar book. It is more comprehensive and covers almost all aspects of phylogenetics in greater depth. And Li's 'Molecular Evolution' is a better book on this topic than Nei and Kumar.
Bioinformatics and Computational Biology Solutions Using R and Bioconductor is a good text for getting to grips with common data processing tasks for microarray and proteomics analysis. It covers QC, normalisation, one- and two-colour array data, and downstream data analysis. It needs an update, as some of the example code does not work with more modern Bioconductor releases, but it is still a useful resource.
Bioconductor Case Studies focuses less on the specifics of the packages and more on the workflows of common bioinformatics analyses, including GSEA, machine learning, pulling data from remote resources, statistical modelling and visualisation. It also benefits from being a more recent release than its counterpart above.
R Programming for Bioinformatics tells you more about R than you probably ever want (or care) to know. Whilst it is aimed at a bioinformatics audience, it does not skip its role as a text primarily meant to teach you how to program in R.
If you're looking instead for a tome that brings your statistics up to speed within the R framework, I have long had a copy of Introductory Statistics with R. It's not a long book by any means, but it will get you used to handling data and applying statistical tests in R.
k-li - unfortunately the Bioinformatics Knowledgeblog site was hacked. The Knowledgeblog team are working to bring it back right now. I notice that you also registered on my blog. I am very sorry, but I assumed your sign-up was bogus, as I had a spate of sign-ups today, and your account was deleted.
I agree that a book about the technical aspects of bioinformatics should exist, maybe even in two flavors, "applied data management" and "getting at the bioinformatics data you want". But would you call someone a statistician who happens to know how to write input files for libSVM without knowing what is going on? I think there is a difference between bioinformatics (the science) and informatics applied to biological data (the engineering problem), just like the difference between computer science and software engineering. So maybe we disagree just on the definition.
I could not disagree more. Bioinformatics needs books with theory and maths because it derives most of its algorithms from probability theory, statistics, random processes, machine learning, information theory, graph theory and formal language theory, not to speak of all those description logics and ontologies. No blog post will do that (nor will any single book).
Understanding how a program works helps us to choose appropriate tools and to avoid pitfalls. By "tools" here, I mean bioinformatics programs such as mappers, multialigners, SNP callers, tree builders and so on.
Marcin, I do understand your point of view. But for my part, I'm mostly interested in the technical aspect of a problem, not in a deep knowledge of an algorithm. For example, I don't really know what algorithm Lucene uses, but I know it's a good tool for indexing documents, and I found the best documentation for Lucene on the web.
I think you are spot on with your observation. For some reason most of the recent bioinformatics books, particularly the expensive hardcover ones from CRC and Springer, are written by non-practitioners. By non-practitioners I mean professors who teach statistics, biological science or computer science, as opposed to software developers working in the field of bioinformatics. The results read like a cross between stodgy textbooks and research articles, with little in the way of practical code or analysis strategy. Others, as you mention, are "mildly bio-flavored" introductions to a programming language. I love technical books, but with a couple of exceptions (Beginning Perl for Bioinformatics) I have never felt bioinformatics books were worth the money.
Let me preface this by saying that I have three big interests in my life: biology, computer science and sailing. The year was around 2000, and I had found the book The New New Thing: A Silicon Valley Story by Michael M. Lewis. It was about two of my interests: computer science and sailing.
It is the biography of Jim Clark, a technology entrepreneur who is about to create his third separate billion-dollar company: first Silicon Graphics, then Netscape, and now Healtheon, a startup which he hopes will turn the $1 trillion healthcare industry on its head. But after coming up with the basic idea for Healtheon, securing the initial seed money, and hiring the people to make it happen, Clark concentrated on the building of Hyperion, a sailboat with a 197-foot mast (at the time of her launch, she was the largest sloop ever built, with the tallest mast ever built), whose functions are controlled by 25 SGI workstations. As the title implies, Jim Clark is a restless man who was always looking for the new new thing, the next big breakthrough. Near the end of the book Michael Lewis tells of one of the new things on Jim Clark's radar, a newly emerging field called bioinformatics.
(The book with the ultimate triumvirate, where all three of my interests, biology, computer science and sailing, were combined, came later with the autobiography of Craig Venter, A Life Decoded, where he writes about the Global Ocean Sampling Expedition he undertook with his personal 95-foot sailboat, the Sorcerer II. The expedition sampled water from Halifax, Nova Scotia to the Eastern Tropical Pacific while undertaking a two-year circumnavigation. The micro-organisms in the water were sequenced and the results were published, more than doubling the amount of genetic sequences available up to that point.)
I've learnt pretty much everything from doing, i.e. programming, and rely heavily on online resources. There have been occasional programming books that I've used to bootstrap learning about a language (especially if it was a major leap, say from procedural to object-oriented languages, or from standalone application programming to web scripting). Of the bioinformatics books mentioned so far, Durbin et al., Biological Sequence Analysis was the book I got the most out of, especially the section on RNA secondary structure, which I was obsessed with for a time. Good description of the problem, algorithms clearly explained, and pseudocode. Great stuff.
Here's a different take on this question. My favorite book is the one that I could write, or the one that Ewan Birney or Lincoln Stein could write (not that I am in their company). In all seriousness, what I am getting at is a kind of interview that is not a digest of a career path but more like "this is what I have used and developed in the field of bioinformatics in response to these challenges (with details), and here is where I required assistance from colleagues who were expert in X or Y". Such a book would certainly sit well on my shelf next to the molecular biology, biochemistry and Perl/R/Java coding books.
My favourite bioinformatics book is a biology book: Lewin's Genes X. Of course it's not a bioinformatics book, but it is very good for getting a good understanding of the biology. Bioinformatics is an interdisciplinary field, and for me it is the fascination of the related genetics that motivates me to analyse it. I see computer science as a means to better understand genetics. This book can provide the insight into genetics required for good bioinformatics. I cannot read it from cover to cover, as it's just too much information, but it provides different levels of detail. Even when reading only the headlines, one can learn something new.
It is maybe not so well suited for absolute beginners in genetics, and some biologists say it is sometimes superficial. That may be so, but I cannot judge; I just found the parts I read easy to understand. There are of course lots of references (rather many to "Cell").