
The Future of Evo-Devo 2012

The University of Oregon’s NSF IGERT program in evolution, development, and genomics invites you to attend The Future of Evo-Devo 2012, a symposium about genomes in the context of systems, populations, and the environment.

The symposium is in Portland, OR, February 10-12, 2012.  Limited scholarships are available for students!  Speakers, details, and more information are here:

Oregon State University Genome Research and Biocomputing 2010 Conference Recap

This will be a quick post of some impressions from yesterday’s and today’s Center for Genome Research and Biocomputing (CGRB) Fall Conference at Oregon State University.  The speakers were uniformly high quality, with Peter and Rosemary Grant’s talk the consensus highlight of the program.

Jay Dunlap, “Genetic and Molecular Dissection of a Simple Circadian System”

Dunlap works on the circadian clock in Neurospora, a filamentous fungus. Like the circadian clock in mammals and insects, the basis of the Neurospora clock is a heterodimer that autoregulates the transcription of its own genes.  This autoregulation gives rise to the daily rhythmic expression characteristic of a circadian clock.

Dunlap’s lab has done some hard-core molecular work to dissect the basis of this autoregulation.  This includes chromatin immunoprecipitation assays demonstrating that methylation at the locus of a key circadian clock regulator is necessary for its proper function, a mutant screen that eventually demonstrated that phosphorylation of a protein heterodimer is necessary for proper autoregulation, and, finally, expression microarrays to look for peripheral elements of the circadian clock.  This is one of the better understood molecular genetic pathways, but Dunlap reminded us how much work remains.

My only criticism, which is really more of a difference in philosophies, is Dunlap’s reliance on mutagenesis.  There is natural variation in at least one Neurospora circadian phenotype; such genetic variation has an advantage over lab-induced mutations in that it is maintained by natural selection, whereas mutagenesis studies mostly produce phenotypes of large effect that would never survive in nature.  Natural variation, in a genetically tractable context, can thus be used to understand the molecular genetic basis of a phenotype and the environmental context which maintains it.

Richard Spinrad, “OSU Research Now, Next, and After Next.”

Spinrad discussed the present and future of academic research, with a focus on the need to better communicate science to the general public and to secure future sources of funding.  Most funding (about two thirds of OSU’s research budget by the look of his pie chart) comes from the federal government, while industry and non-profits make up another two percent each.  He discussed the need to make up for the expected federal research budget reductions.  One opportunity Spinrad mentioned is better partnerships with industry.  As corporations reduce in-house R & D, they may outsource it to universities.  There are significant issues with academic-corporate relationships that I won’t get into, but he’s basically right; basic research will need to find other sources of support and corporations will be one of those sources.

My take on his talk:  We’re all aware of the need to communicate our work more broadly and the likelihood of shrinking federal research budgets, but Spinrad didn’t have suggestions for how to address these problems besides attempting to give talks geared to a general audience when we’re traveling and finding ways to increase funding from industry and non-profit sources.  These ideas are both obvious enough to be useless without more detail.  I’m sure he’s been working on such ideas; I would like to have seen his talk include some.

Daniel Schafer, “Some Lessons from the Biometry of Evolution and the Evolution of Biometry.”

This was a, dare I say, entertaining talk about the history of statistics within evolutionary biology.  I only mention it for two reasons.  The first is that the speaker briefly mentioned the negative binomial-P distribution as a method for RNA-seq data analysis.  Who wants to host him for a potentially dry but very valuable seminar, such as this one?  The second is that he and the moderator kept referencing this video.

Peter and Rosemary Grant, “Evolution of Darwin’s Finches: The Roles of Genetics, Ecology and Behavior.”

The Grants’ long-term (and ongoing!) data set of Darwin’s finch evolution on Daphne Major is one of the most valuable scientific studies ever conducted.  They have documented evolution of beak size in response to environmental changes for several decades now (more detail available via Google, but a good source is here) and have recently described some of the genes responsible for this variation.  Data sets linking genetic changes to specific ecological changes are nearly impossible to produce.  This is one of the best.

The second part of the talk described incipient speciation driven by a stochastic event (the original PNAS paper is here).  The descendants of a hybrid male immigrant and a hybrid female bred exclusively with each other, producing a lineage with unique beak and body morphology.  The mechanism of this reproductive isolation was a new song, which appears to be the result of imperfect copying of the local species’ song by the initial hybrid male.  The reproductive isolation is maintained by his descendants’ learning this unique song from their fathers.  This accident of imperfect imitation leading to reproductive isolation illustrates how important such stochastic events can be to speciation.

As with the last time I saw them, the Grants ended with their thesis:  “To understand the diversity of species we see around us we need to understand the dynamic interactions between genetics, ecology and behavior.”  Their life’s work is the best evidence for this.

Embroidered data: needle & thread not required

We’re all familiar with examples of research misconduct (Marc Hauser being a prominent recent example), but there are plenty of other less deliberate and more insidious ways science can lie to itself.  These include publication bias, choosing a method of statistical analysis that gives the desired answer, etc.  Those are worth discussing, but this post will focus on an informative and (to me anyways) humorous example of embroidered data, which is when a series of misrepresentations of a data set build upon themselves, with the end result and the inferences drawn from it having little connection to reality.  I feel such embroidery happens easily as we filter the literature through our biases and limited abilities of retention.  Most examples may not be as egregious as the following, but they are still failures of science to regulate itself.

THE KAIBAB DEER (this figure is taken from Colinvaux’s 1973 textbook, “Introduction to Ecology.”)
The Kaibab plateau is an area bordering the Grand Canyon that had undergone a series of disturbances from fires, sheep and cattle grazing and finally, predator removal (which occurred after it was designated a park by Teddy Roosevelt).

A subsequent increase in the deer population (Figure 1a) was documented by Rasmussen’s 1941 monograph, “Biotic Communities of the Kaibab Plateau, Arizona.”  The apparent increase was attributed to the removal of top predators.  The solid circles represent the park supervisors’ estimates, the open circles represent those of visitors to the park.  A contemporary wildlife biologist would probably roll his or her eyes at either method, but common sense would suggest that supervisors, who spend far more time in the park, would give more accurate estimates.  At the least, both estimates are represented and their sources noted in Rasmussen’s original paper.

"The Kaibab deer herd fiction; a history of embroidered data. (A) Population estimate of the Kaibab deer herd, copied from Rasmussen (1941). Linked solid circles are the forest supervisor's estimates' circles give estimates of other persons, and the dashed line is Rasmussen's own estimate of the trend. (B) A copy of Leopold's (1943) interpretation of the trend. © A copy of trend given by David Davis and Golley (1963), after Allee et al. (1949), after Leopold (1943) from Rasmussen (1941). (After Caughley, 1970.)

Aldo Leopold (yes, that Aldo Leopold) started the real trouble by basing a publication figure on the curve drawn to fit the visitors’ estimates (Figure 1b).  Two problems should be apparent: 1) the second, and presumably more accurate, estimate is ignored and 2) he only reproduced the fitted curve drawn by Rasmussen, which is obviously not an actual best-fit curve as it is drawn to intersect the maximum.  Furthermore, the shape of the curve is altered:  the left-hand side of Leopold’s curve has a sigmoid shape suggestive of a population undergoing logistic growth.  This is what we would expect of a population released from a key restraint.  The right-hand side shows a sharp decrease, characteristic of a population that has exceeded its environment’s carrying capacity.  These alterations suggest that the ecological ideas Leopold wished to illustrate biased his interpretation and reproduction of the data.

Finally, Leopold’s modifications were codified in Allee’s 1949 ecology textbook, “Principles of Animal Ecology” (click here for the original figure) (Figure 1c).  His comments on the figure thus became accepted fact, while the data they were originally based on are completely obscured.
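For readers who haven’t seen it written down, here is a minimal sketch of the textbook logistic growth model that produces the sigmoid shape described above.  The starting size, growth rate and carrying capacity are hypothetical values picked for illustration; they are not the Kaibab estimates, and this is not anything Leopold or Rasmussen computed.

```python
def logistic_growth(n0, r, k, steps):
    """Discrete-time logistic growth: N[t+1] = N[t] + r * N[t] * (1 - N[t] / K)."""
    sizes = [float(n0)]
    for _ in range(steps):
        n = sizes[-1]
        sizes.append(n + r * n * (1.0 - n / k))
    return sizes


# Hypothetical parameters, for illustration only (not the Kaibab data).
trajectory = logistic_growth(n0=4000, r=0.3, k=100000, steps=40)

# Year-to-year increase: small at first, largest near K/2, then shrinking
# again as the population approaches carrying capacity (the sigmoid shape).
increments = [later - earlier for earlier, later in zip(trajectory, trajectory[1:])]
peak_step = increments.index(max(increments))

if __name__ == "__main__":
    for year, size in enumerate(trajectory):
        print(year, round(size))
```

Note that this simple model levels off at the carrying capacity; it does not by itself produce the crash on the right-hand side of Leopold’s curve, which requires additional assumptions such as the population degrading its own habitat.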

This data set is still considered a classic example of the control exerted by predators on prey abundance, as Wikipedia demonstrates.

So what can we take from this example of embroidery?  In a narrow sense, predators do not control prey abundance as closely as is commonly thought, as habitat recovery and mitigation of other anthropogenic disturbances probably had a larger effect in the case of the Kaibab deer.  (Here’s a badly scanned PDF of the chapter I took the figure from if you want more information.)

The larger point is obvious: “Look at the data,” to quote my adviser who first showed me this figure.  Science works best when methodology is transparent and a cautious, sound interpretation of the data is suggested.  It also means that we must read the original papers that are the basis of the theory or phenomenon that we’re investigating.  Just about every new grad student has had that point made to them, followed by enough demands to make such historical literature (as arcane and opaque as it often is) the first thing triaged, but it is necessary if science is to successfully regulate itself.

Sequence space and the ongoing expansion of the protein universe

Posted by Victor Hanson-Smith

Check out this paper by Inna S. Povolotskaya and Fyodor A. Kondrashov.  (It’s a closed-access Nature article; I’m sorry if you do not have a subscription!)

The premise of this paper begins with two claims.  First, protein-sequence space is finite.  Second, proteins have been evolving away from one another (“expanding in sequence space”) over the last 3.5 billion years.  Given these claims, the authors ask: is it possible that structurally and functionally conserved orthologous proteins from the last universal common ancestor (LUCA) have evolved over a long enough time period such that they reached the limit of their possible sequence divergence?  The authors say apparently not.  For details on how they reach this conclusion, read the paper.

Their result is interesting because it sheds light on the relationship between protein sequence conservation and protein function conservation.  This paper suggests that given enough time two orthologous proteins can evolve apart such that their sequences will contain almost no signal of shared ancestry, but their function will be essentially conserved.  However, this theoretical upper bound on sequence divergence has not (yet) been reached because proteins evolve slowly across the fitness landscape.

The authors capture this idea in one very compelling paragraph:

The following picture of the protein sequence space emerges from our analysis. Ridges of high fitness corresponding to specific ancient proteins occupy a tiny fraction of the entire volume of the sequence space. However, these ridges are long and thin and can be more accurately visualized as a wide-mesh net spanning a large part of sequence space, rather than as a small volume within the space. Such fitness ridges imply that [epistasis] and compensatory evolution in ancient proteins must be common. Our data show that >90% of the sites in any protein can eventually accept a substitution given the right combination of amino acids at other sites, although it is not clear whether such substitutions are predominantly neutral or beneficial. Regardless of the importance of positive selection in protein divergence, it seems that many sites are conserved because there has not been enough time to create the right combination of amino acids at other sites to allow them to evolve, which may take billions of years.

On a final note, I am not 100% comfortable with the idea that sequence space is finite.  If we momentarily assume that sequence length is finite, then, yes, I agree that sequence space must also be finite.  However, is there an upper bound on sequence length?  Comments and discussion are welcome.
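To make the finiteness argument concrete, here is a back-of-the-envelope sketch.  It assumes the standard 20-letter amino-acid alphabet and simply counts sequences; the function names are mine, and none of this comes from the paper.  For any fixed maximum length the count is finite (if astronomically large), which is why the question of an upper bound on length matters.

```python
AMINO_ACIDS = 20  # assumption: the standard 20-letter amino-acid alphabet


def sequences_of_length(length):
    """Number of distinct protein sequences of exactly this length."""
    return AMINO_ACIDS ** length


def sequences_up_to(max_length):
    """Number of distinct sequences with length 1 through max_length."""
    return sum(sequences_of_length(n) for n in range(1, max_length + 1))


if __name__ == "__main__":
    # Python integers are arbitrary precision, so these counts are exact.
    for length in (10, 100, 300):
        digits = len(str(sequences_of_length(length)))
        print(f"length {length}: a number with {digits} digits")
```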

Povolotskaya, I., & Kondrashov, F. (2010). Sequence space and the ongoing expansion of the protein universe Nature, 465 (7300), 922-926 DOI: 10.1038/nature09105

Hey Pharyngula visitors!

We’ve noticed an astounding increase in pageviews the past few days, all thanks to PZ’s blog entry. Please let us know what you think about the blog’s organization and content. This is a work in progress but our goal is to have a good set of interpretable paper summaries on a pertinent topic each quarter.

The wonders of statistics in gene expression experiments

Post by Bryn Gaertner


Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species

RK Bradley, XY Li, C Trapnell, S Davidson, L Pachter, HC Chu, LA Tonkin, MD Biggin, MB Eisen

PLoS Biology 8(3) 2010

Flies in the Drosophila genus all look about the same, and the early-development transcription factors that we all know and love (hunchback, Krüppel, bicoid, giant, knirps, etc.) are expressed in roughly the same patterns. However, there is about one SNP per 10 bp between these species, which strongly suggests that the TF binding targets are no longer conserved. How do the TFs still know where to go?
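A rough sketch of why that rate matters for binding sites: if we assume (and this is a simplification of mine, not a claim from the paper) that differences fall independently and uniformly at one per 10 bp, then more than half of all 8-10 bp binding sites would be expected to carry at least one difference between species.

```python
def prob_site_unchanged(site_length, diffs_per_bp=0.1):
    """Probability that no position in a site of this length differs,
    assuming independent differences at a uniform per-base rate."""
    return (1.0 - diffs_per_bp) ** site_length


def expected_differences(site_length, diffs_per_bp=0.1):
    """Expected number of differing positions in a site of this length."""
    return site_length * diffs_per_bp


if __name__ == "__main__":
    # Site lengths are illustrative; real binding sites vary in width.
    for length in (6, 8, 10, 12):
        unchanged = prob_site_unchanged(length)
        print(f"{length} bp site: P(no difference) = {unchanged:.2f}")
```

Real sequences are not this tidy, since functional sites are under selection and mutation rates vary along the genome, so treat the numbers as an intuition pump only.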


EVO-WIBO 2010 highlights

Posted by Victor Hanson-Smith.

Several authors on this blog (including myself) just returned from Evo-Wibo 2010, a gathering of evolutionary biologists from the Pacific Northwest.  The talks were high-quality and covered a broad range of topics, from the macro (population and ecology interactions) to the micro (protein evolution).  I won’t summarize all twenty-seven talks, but allow me to highlight a few favorites:

Michael Doebeli gave a talk titled “Complexity and Diversity,” which basically summarized his recent Science paper.  Michael’s main point is:

. . . if the ecological properties of an organism are determined by multiple traits with complex interactions, the conditions needed for frequency-dependent selection to generate diversity are relaxed to the point where they are easily satisfied in high-dimensional phenotype spaces.

Michael’s result is exciting because it sheds light on the origin of diversity.  Furthermore, the result seems obvious and leads me to wonder “why didn’t I think of that?”

Members of Bill Cresko’s lab (including Julian Catchen, Paul Hohenlohe, and Susan Bassham) gave a series of talks showcasing RAD tag sequencing [See here and here].  As a phylogeneticist, I am particularly interested in the potential to use RAD tags to identify sites that are polymorphic within a population; these sites can be culled from phylogenetic analysis, thus removing a significant amount of “noise” when inferring inter-species phylogenies.

My final highlight is David Pollock’s talk titled “Adaptation, Convergence, and Context-Dependent Evolution.”  David investigated why a very long phylogenetic branch leads to the snake clade.  One explanation is found in the large number of mitochondrial mutations allowing snakes to rapidly alter their metabolism in order to digest large meals.  I think David’s talk was interesting because it was the first (and only?) at this meeting to connect specific protein-level mutations to organism-level phenotypic changes.

Did you attend EVO-WIBO?  If so, I encourage your comments down below.  What presentations did you think were noteworthy?

Transcriptional Rewiring in Yeast

Posted by Victor Hanson-Smith.

Consider this 2006 Nature paper from Alexander Johnson’s lab. The story here is that transcriptional regulation of S. cerevisiae (i.e. yeast) mating genes has been handed off from activation by the MATa gene to repression by the MAT-alpha gene.  This is interesting because despite significant transcriptional rewiring, the logical output (the expression of mating genes) remained the same.

First, some background on yeast. . .

Yeast are either diploid or haploid.  Both haploid and diploid cells can reproduce by mitosis, but only haploid cells can reproduce sexually.  Haploid yeast are either type “a” or type “alpha.”  Type-a haploid cells can mate with type-alpha cells, and vice versa.  Haploid mating produces diploid children, which cannot themselves mate.  However, diploid children can undergo meiosis (typically in response to nutritional stress) to form four haploid spores: two type-a spores and two type-alpha spores.

Type-a and type-alpha yeast cells differ in their mating pheromones.  Type-a cells produce a-factor pheromone and respond to alpha-factor; type-alpha cells produce alpha-factor and respond to a-factor. In response to pheromone (of the opposite type) haploid yeast grow a projection called a “shmoo” towards the source of the opposite factor.

An illustration of yeast mating

Type-a cells respond to alpha-factor by using the cell surface receptor Ste2; type-alpha cells respond to a-factor pheromones using the cell surface receptor Ste3.  The interesting difference, and the focus of Tsong et al.’s paper, is that S. cerevisiae type-a mating genes are promoted by the Mcm1 transcription factor, whereas C. albicans type-a mating genes are promoted by cofactors Mcm1 and MAT-a2.  Given that S. cerevisiae and C. albicans are related species, this transcriptional difference points to a rewiring event in their shared evolutionary history.

The authors identify seven type-a specific mating genes and their corresponding regulatory sequences.  Using position-specific scoring matrices and homology modeling, the authors inferred the evolutionary events that led to the hand-off between transcriptional activation and repression.  For more details, read the publication.
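If you haven’t met position-specific scoring matrices before, here is a minimal sketch of the general idea: build log-odds scores from a set of aligned binding sites, then slide the matrix along a sequence to find the best-scoring window.  The example sites, the uniform background and the pseudocount are all assumptions of mine for illustration; this is not the authors’ pipeline or their data.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned,
    equal-length sites. Assumes a uniform background base frequency."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return pssm


def score(pssm, window):
    """Sum of per-position log-odds scores for a window as wide as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    hits = [
        (score(pssm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


# Hypothetical aligned binding sites, made up for this example.
example_sites = ["TTACCG", "TTACGG", "TAACCG", "TTACCC"]
example_pssm = build_pssm(example_sites)

if __name__ == "__main__":
    print(best_hit(example_pssm, "GGGTTACCGAAA"))
```

Scanning the upstream regions of orthologous genes in several species with a matrix like this is one way to ask whether a binding site is present in one lineage and absent in another.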

This paper raises several questions:

1. Did the hand-off from activation to repression incur a fitness cost?  The authors imply a binary fitness landscape: either a yeast expresses the correct mating genes or it doesn’t.  However, it seems like a more accurate fitness story would consider the energetic cost differences between the transcriptional systems used by S. cerevisiae and C. albicans.

2. The authors use C. albicans’ transcriptional phenotype as a proxy for the ancestral state.  Is this accurate?  (The answer is yes).  The alternative hypothesis, in which S. cerevisiae is the ancestral state, requires an outrageous number of gene gains and losses with respect to MAT-a2.

3. How often do these transcriptional rewiring events occur?  This question is somewhat rhetorical, because we don’t have enough information to answer it.  A naive interpretation of this paper is that the yeast MAT-a2 story is especially novel.  As we learn more about the entire transcriptional network of organisms, however, we might learn that these architectural rearrangements occur frequently.

Tsong, A., Tuch, B., Li, H., & Johnson, A. (2006). Evolution of alternative transcriptional circuits with identical logic Nature, 443 (7110), 415-420 DOI: 10.1038/nature05099

The Evolution of Transcription Factors and DNA Binding Sites

Posted by Victor Hanson-Smith.

For the next ten weeks, our conversation will focus on the evolution of transcription factors and their corresponding DNA binding sites.  A growing body of EvoDevo research shows that cis-regulatory mutations play a significant role in the evolution of morphological, physiological and behavioral phenotypes.  In a 2007 Nature Reviews Genetics article, Greg Wray highlighted twenty such cis-regulatory mutations with interesting phenotypic consequences.  A lot of new research has been published since then, and we want to know: what is the current knowledge on the evolution of transcriptional regulation?

Below is a reading list of (a few) relevant publications.  Over the next ten weeks, we’ll read and discuss the articles on this list.  Do you think we overlooked any papers?  Feel free to post your comments down below.

Elucidation of phenotypic adaptations: Molecular analyses of dim-light vision proteins in vertebrates

posted by Victor Hanson-Smith

In 2008, Shozo Yokoyama et al. published a compelling paper in which they reconstructed ancestral rhodopsin proteins in order to infer specific amino acid changes that explain phenotypic differences in vertebrate dim-light vision. In doing so, the authors shed light (pun intended) on the aquatic habitat of vertebrate ancestors. Sean Carroll commented (in his book “The Making of the Fittest”) that Yokoyama’s work is “the deepest body of knowledge [to date] linking differences in specific genes to differences in ecology and to the evolution of species.” Indeed, this paper is remarkable because the overall story unites evidence from disparate scales of macro- and micro-analysis: the authors integrated molecular biology with paleontology and ecology. This paper also offers insight into limitations of dN/dS tests for positive Darwinian selection.
