Bioinformatics Books

Crispina Blomker

Aug 5, 2024, 6:34:24 AM

to dissembcimu

I have been asked to recommend introductory books and resources for R and Bioconductor. My problem is that I never read a book to learn R or Bioconductor, so I have no experience with this and cannot recommend one. I am mainly interested in introductory books, possibly targeting various groups of readers (computer scientists, molecular biologists, (bio-)statisticians). Any recommendation is appreciated.

I co-wrote an O'Reilly Short Cut, "Data Mashups in R", that is designed to be a little more fun than some of the scientific stuff out there while still exploring data manipulation in R, using packages, XML, web services, rudimentary plotting, and even some statistics. It costs $5 if you are not a Safari subscriber.

I have R in a Nutshell on my desk and I use it at least weekly, if not daily. It's well indexed with a lot of great examples and saves me time I would otherwise spend searching online. The BioC chapter is decent, but quite short. I also bought Data Mashups in R, which I found to be a great purchase and a fun way to learn about GIS capabilities and the Yahoo API in addition to R.

I have to give a shout out for the "R Graph Gallery". When learning to make graphs/visualizations, this resource is indispensable. Sometimes when I'm trying to think of a way to visualize some data, I just browse through the gallery for inspiration. Then, if you find something you like, all the code to get you started is right there.

I just got a copy of Draghici's book and I found it to be FANTASTIC. I will possibly buy a copy soon and look for his upcoming title, "An Introduction to Statistics and Data Analysis for Bioinformatics using R", to be released on Sep 15, 2012.

A comment, not answering your question, but one which may help a lot of people using R: if you are very annoyed that it is impossible to do a Google search for R, use rseek.org. It is basically an R-specific Google.

This is really cool and helpful. Yes, I was annoyed searching for one-character language names like "C" and "R". Had the authors of the languages anticipated Google, they might have used something like R++ ;)

Introductory Statistics with R by Peter Dalgaard is a wonderful, brief introduction to basic statistical practice using R. If you want to know how to perform survival analysis, specify linear models, build plots, etc. it is a very clear guide. It is not bioinformatics-specific, and does not mention Bioconductor.

In general I've found the quality of the Springer books to be very high. If you're affiliated with a large university (e.g. UCSF), you may be able to browse their texts for free from your desk through SpringerLink.

I upvoted David Quigley's recommendation of the Dalgaard book. I'd also recommend Modern Applied Statistics with S by Venables & Ripley. Despite the title, it's totally relevant to R. From the first page of the Introduction, "An Open Source system called R has emerged that provides an independent implementation of the S language. It is similar enough that almost all the examples in this book can be run under R." This book is pretty much the standard reference for R in book form.

So far I have noticed the following trend: many books titled Bioinformatics with Perl/Python/Java/R etc. end up being introductions to the programming language in question, and often only minor code examples are related to bioinformatics.

I think many bioinformaticians of a "certain age" learned in this way: they are often former bench biologists who gave up lab work and taught themselves programming. These days there are undergraduate courses (!), so I imagine more people use textbooks. It's just that I don't know of any, nor have I ever needed to use one.

Most of my sparse experience with bioinformatics came with the necessity to extract some statistics from sequence data. So, most books I can recommend deal with statistical and algorithmic approaches to biological data.

Jones and Pavel are accomplished mathematicians and bioinformaticians. Their work with repeats is a must-have reference. Ewens's book will become a classic. He is already a foremost figure in population genetics, both in theory and experiment. Sankoff's book is still the most important reference in sequence alignment. Unfortunately, these books are somewhat mind-bending. They rely heavily on mathematical concepts. But, as far as I know, bioinformatics theory is indeed mathematically and algorithmically challenging.

I really like Biological Sequence Analysis, Durbin et al. and, although not really bioinformatics-specific, I found Perl Medic, Peter J. Scott made a big difference to my newbie Perl code. For biology text books, I mainly relied on Lewin and Alberts for background during my undergrad.

A few have mentioned this book, but I would still like to emphasize it more in a separate answer. This book covers a lot of topics and on each topic it gives very comprehensive and in-depth review. After 10 years, I still benefit from this book, finding meticulous but invaluable details I have overlooked. This is exceptional among general textbooks on bioinformatics. Some may argue the book is too old, but interestingly, when you read the book, you will find that there are not so many breakthroughs in Bioinformatics in the past 12 years -- many old techniques are still useful till now.

For phylogenetics, I like Felsenstein's 'Inferring Phylogenies' much more than the Nei and Kumar book. It's more comprehensive and covers almost all aspects of phylogenetics in greater breadth and depth. And Li's 'Molecular Evolution' is a better book on this topic than Nei and Kumar.

Bioinformatics and Computational Biology Solutions Using R and Bioconductor is a good text for getting to grips with common data processing tasks for microarray and proteomics analysis. It covers QC, normalisation, one- and two-colour array data, and downstream data analysis. It needs an update, as some of the example code does not work with more modern Bioconductor releases, but it is still a useful resource.

Bioconductor Case Studies focuses less on the specifics of the packages and more on the workflows of common bioinformatics analyses, including GSEA, machine learning, pulling data from remote resources, statistical modelling and visualisation. It also benefits from being a more recent release than its counterpart above.

R Programming for Bioinformatics tells you more about R than you probably ever want (or care) to know. Whilst it is aimed at a bioinformatics audience, it does not neglect its role as a text primarily meant to teach you how to program in R.

If you're looking instead for a tome that brings your statistics up to speed within the R framework, I have long had a copy of Introductory Statistics With R. It's not a long book by any means, but it will get you used to handling data and applying statistical tests in R.

k-li - unfortunately the Bioinformatics Knowledgeblog site was hacked. The Knowledgeblog team are working to bring it back right now. I notice that you also registered on my blog, I am very sorry but I assumed your sign up was bogus as I had a spate of sign ups today, and your account was deleted.

I agree that a book about the technical aspects of bioinformatics should exist, maybe even in two flavors, "applied data management" and "getting at the bioinformatics data you want". But would you call someone a statistician who happens to know how to write input files for libSVM without knowing what is going on? I think there is a difference between bioinformatics (the science) and informatics applied to biological data (the engineering problem), just like the difference between computer science and software engineering. So maybe we disagree just on the definition.

I cannot disagree more. Bioinformatics needs books with theory and maths because it derives most of its algorithms from probability theory / statistics / random processes / machine learning, information theory, graph theory, and formal language theory, not to speak of all those description logics and ontologies. No blog post will do that (nor will any single book).

Understanding how a program works helps us to choose appropriate tools and to avoid pitfalls. By "tools" here, I mean bioinformatics programs such as mappers, multialigners, SNP callers, tree builders and so on.

Marcin, I do understand your point of view. But for my part, I'm mostly interested in the technical aspect of a problem, not in deep knowledge of an algorithm. For example, I don't really know which algorithm Lucene uses, but I know it's a good tool for indexing a document, and I found the best documentation for Lucene on the web.

I think you are spot on with your observation. For some reason most of the recent bioinformatics books, particularly the expensive hardcover ones from CRC and Springer, are written by non-practitioners. By non-practitioners I mean professors who teach statistics, biological science or computer science, as opposed to software developers working in the field of bioinformatics. The result reads like a cross between stodgy textbooks and research articles, with little in the way of practical code or analysis strategy. Others, as you mention, are "mildly bio-flavored" introductions to a programming language. I love technical books, but with a couple of exceptions (Beginning Perl for Bioinformatics) I have never felt bioinformatics books were worth the money.

Let me preface this by saying that I have three big interests in my life: biology, computer science and sailing. The year was around 2000, and I had found the book The New New Thing: A Silicon Valley Story by Michael M. Lewis. It was about two of my interests: computer science and sailing.

It is the biography of Jim Clark, a technology entrepreneur who is about to create his third, separate, billion-dollar company: first Silicon Graphics, then Netscape, and now Healtheon, a startup which he hopes will turn the $1 trillion healthcare industry on its head. But after coming up with the basic idea for Healtheon, securing the initial seed money, and hiring the people to make it happen, Clark concentrated on the building of Hyperion, a sailboat with a 197-foot mast (at the time of her launch, she was the largest sloop ever built, with the tallest mast ever built), whose functions are controlled by 25 SGI workstations. As the title implies, Jim Clark is a restless man who was always looking for the new new thing, the next big breakthrough. Near the end of the book, Michael Lewis tells about one of the new things on Jim Clark's radar, a new emerging field called bioinformatics.