Sequence space and the ongoing expansion of the protein universe

Posted by Victor Hanson-Smith

Check out this paper by Inna S. Povolotskaya and Fyodor A. Kondrashov. (It’s a closed-access Nature article; I’m sorry if you do not have a subscription!)

The premise of this paper rests on two claims.  First, protein-sequence space is finite.  Second, proteins have been evolving away from one another (“expanding in sequence space”) over the last 3.5 billion years.  Given these claims, the authors ask: is it possible that structurally and functionally conserved orthologous proteins from the last universal common ancestor (LUCA) have evolved for long enough to reach the limit of their possible sequence divergence?  According to the authors, apparently not.  For details on how they reach this conclusion, read the paper.

Their result is interesting because it sheds light on the relationship between protein sequence conservation and protein function conservation.  This paper suggests that, given enough time, two orthologous proteins can evolve apart until their sequences contain almost no signal of shared ancestry, while their function remains essentially conserved.  However, this theoretical upper bound on sequence divergence has not (yet) been reached because proteins evolve slowly across the fitness landscape.
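To make the idea of a ceiling on divergence concrete, here is a minimal sketch using a textbook Jukes-Cantor-style model for proteins, assuming all 20 amino acids are equally frequent and replace one another at equal rates. This is not the model Povolotskaya and Kondrashov used; it only shows why identity between two lineages levels off near 1/20 = 5% rather than falling to zero, at which point little ancestral signal remains.

```python
import math

def expected_p_distance(d: float, n_states: int = 20) -> float:
    """Expected fraction of differing sites between two sequences separated by
    d substitutions per site, under an equal-rates (Jukes-Cantor-style) model
    with n_states equally frequent residues. Illustrative assumption only."""
    k = n_states
    return (k - 1) / k * (1.0 - math.exp(-k * d / (k - 1)))

# Identity (1 - p) falls quickly at first, then levels off near 1/20 = 0.05.
for d in (0.1, 1.0, 3.0, 10.0):
    identity = 1.0 - expected_p_distance(d)
    print(f"d = {d:4.1f} substitutions/site -> expected identity {identity:.3f}")
```

Real proteins have unequal amino acid frequencies and site-specific constraints, so under more realistic models the identity plateau sits somewhat above 5%.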

The authors capture this idea in one very compelling paragraph:

The following picture of the protein sequence space emerges from our analysis. Ridges of high fitness corresponding to specific ancient proteins occupy a tiny fraction of the entire volume of the sequence space. However, these ridges are long and thin and can be more accurately visualized as a wide-mesh net spanning a large part of sequence space, rather than as a small volume within the space. Such fitness ridges imply that [epistasis] and compensatory evolution in ancient proteins must be common. Our data show that >90% of the sites in any protein can eventually accept a substitution given the right combination of amino acids at other sites, although it is not clear whether such substitutions are predominantly neutral or beneficial. Regardless of the importance of positive selection in protein divergence, it seems that many sites are conserved because there has not been enough time to create the right combination of amino acids at other sites to allow them to evolve, which may take billions of years.

On a final note, I am not 100% comfortable with the idea that sequence space is finite.  If we momentarily assume that sequence length is finite, then—yes—I agree that sequence space must also be finite.  However, is there an upper-bound on sequence length?  Comments and discussion are welcome.
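The counting behind the finiteness claim is short enough to write out. Assuming only the 20 canonical amino acids, there are 20^L distinct sequences of length L, and summing over every length from 1 to L gives a geometric series. The sketch below (function names are illustrative) shows that the total is finite for any maximum length but grows without bound as that maximum increases, which is why the question of an upper bound on length matters.

```python
import math

N_AMINO_ACIDS = 20  # assumption: only the 20 canonical amino acids

def sequences_of_length(length: int) -> int:
    """Number of distinct protein sequences of exactly `length` residues."""
    return N_AMINO_ACIDS ** length

def sequences_up_to(max_length: int) -> int:
    """Number of distinct sequences of length 1..max_length.
    Geometric series: 20 + 20^2 + ... + 20^L = (20^(L+1) - 20) / 19."""
    return (N_AMINO_ACIDS ** (max_length + 1) - N_AMINO_ACIDS) // (N_AMINO_ACIDS - 1)

# Finite for every max_length, but with no cap on length the total is unbounded.
print(f"~10^{math.log10(sequences_of_length(300)):.1f} sequences of length 300")
```

For scale, a single length of 300 residues already gives roughly 10^390 possible sequences.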

Povolotskaya, I., & Kondrashov, F. (2010). Sequence space and the ongoing expansion of the protein universe. Nature, 465 (7300), 922-926. DOI: 10.1038/nature09105

Mark Pagel at University of Oregon HBES conference

Posted by Victor Hanson-Smith

Mark Pagel (MP) delivered a keynote lecture at the 22nd annual Human Behavior and Evolution Society conference, titled “The Rise of the Speaking Machine: Explorations in Language Evolution.”

MP has published several well-known papers on phylogenetic methods, speciation, and protein-protein networks, but his recent work investigates phylogeographic patterns of language expression [Pagel et al. 2007, Pagel 2008, Pagel et al. 2009].  This topic might seem eccentric for an EvoDevo blog, but I think the topic of language evolution is relevant to our interests for two reasons.  First, it reminds us that phylogenetic methods are useful for studying more than sequence data; rather, a phylogeny is useful for studying the evolution of any phenotype, including language.  Second, MP’s results strongly suggest that genetic evolution and linguistic evolution are governed by the same underlying patterns and processes; indeed, human language is simply a highly abstract phenotype.

MP’s hypothesis is that “language provides a digital regulatory mechanism for the newly emerged complex social phenotype of culture.”  In other words, human language arose to regulate and vary our individual expression of the social phenotype, much as gene expression regulates the phenotype of cells.  If you missed MP’s lecture, you can absorb most of the content by reading the 2007, 2008, and 2009 papers.

Comments are welcome.

Pagel, M. (2009). Human language as a culturally transmitted replicator. Nature Reviews Genetics. DOI: 10.1038/nrg2560

Hey Pharyngula visitors!

We’ve noticed an astounding increase in pageviews over the past few days, all thanks to PZ’s blog entry. Please let us know what you think about the blog’s organization and content. This is a work in progress, but our goal is to have a good set of interpretable paper summaries on a pertinent topic each quarter.

Regulatory divergence modifies limb length between mammals

Check out this paper by Cretekos et al.

The diversity of vertebrate limb morphology epitomizes our notion of natural selection and evolution by successive slight modifications of a conserved fundamental pattern. Differences in limb morphology among whale flippers, bat and bird wings, and human arms show how variation on a theme allows a species to exploit a niche. Take mouse and bat forelimbs, for example: the total length of the limb is drastically different between these animals. If we could find the evolutionary mechanism behind limb morphology divergence between animals, we’d gain serious insight into the mechanism of vertebrate evolution in general. Following the observation that bat forelimbs are relatively much longer than mouse forelimbs in late-stage fetuses (and ultimately in adults), these researchers set out to find the reason for this difference.


Gene Duplication and Escape from Adaptive Conflict

Gene duplication and the adaptive evolution of a classic genetic switch

Hittinger and Carroll, Nature (449), 2007

This paper provides two particularly striking examples of evolutionary issues. To set up the first, the authors offer the welcome reminder that the process of duplication, degeneration and complementation (DDC) is not in itself necessarily adaptive. This seems obvious, but although examples of DDC exist, the adaptive advantage is not always clear. One part of Hittinger and Carroll’s method that I found simple and quite impressive was their highly sensitive competition assay, which lets them pit any two genetically manipulated yeast strains against each other.
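Competition assays like this are commonly summarized as a selection coefficient: the per-generation change in the log ratio of the two strains’ abundances. The sketch below assumes that summary and uses hypothetical counts; Hittinger and Carroll’s exact measurement and calculation may differ.

```python
import math

def selection_coefficient(a0: float, b0: float, at: float, bt: float,
                          generations: float) -> float:
    """Per-generation selection coefficient of strain A relative to strain B,
    estimated from the change in ln(A/B) between the start and the end of
    the competition. Positive values mean A is outcompeting B."""
    return (math.log(at / bt) - math.log(a0 / b0)) / generations

# Hypothetical counts: a 50:50 start that shifts to 60:40 after 20 generations.
s = selection_coefficient(a0=500, b0=500, at=600, bt=400, generations=20)
print(f"s = {s:.4f}")
```

Working on the log ratio makes the estimate independent of how many cells are counted at each time point, which is what lets small fitness differences show up over many generations.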


The wonders of statistics in gene expression experiments

Post by Bryn Gaertner


Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species

RK Bradley, XY Li, C Trapnell, S Davidson, L Pachter, HC Chu, LA Tonkin, MD Biggin, MB Eisen

PLoS Biology 8(3) 2010

Flies in the Drosophila genus all look about the same, and the early-development transcription factors that we all know and love (Hunchback, Krüppel, Bicoid, Giant, Knirps, etc.) are expressed in roughly the same patterns. However, these species differ at about one nucleotide in every 10 bp, which strongly suggests that many TF binding sites are no longer conserved. How do the TFs still know where to go?
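A back-of-the-envelope calculation shows why this level of divergence matters for binding sites. Assuming, hypothetically, a 10-bp binding site and independent differences at a rate of one per 10 bp, only about a third of sites would be expected to survive completely unchanged. The sketch ignores selection, which would keep functional sites more conserved than neutral sequence.

```python
def fraction_sites_unchanged(per_bp_divergence: float, site_length: int) -> float:
    """Probability that a binding site of `site_length` bp carries no
    differences, assuming each position differs independently with
    probability `per_bp_divergence` (no selection on the site)."""
    return (1.0 - per_bp_divergence) ** site_length

# Assumptions: ~10-bp sites and ~1 difference per 10 bp (from the text above).
print(f"{fraction_sites_unchanged(0.1, 10):.2f}")  # prints 0.35
```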


EVO-WIBO 2010 highlights

Posted by Victor Hanson-Smith.

Several authors on this blog (including myself) just returned from Evo-Wibo 2010, a gathering of evolutionary biologists from the Pacific Northwest.  The talks were high-quality and covered a broad range of topics, from the macro (population and ecology interactions) to the micro (protein evolution).  I won’t summarize all twenty-seven talks, but allow me to highlight a few favorites:

Michael Doebeli gave a talk titled “Complexity and Diversity,” which summarized his recent Science paper.  Michael’s main point is:

. . . if the ecological properties of an organism are determined by multiple traits with complex interactions, the conditions needed for frequency-dependent selection to generate diversity are relaxed to the point where they are easily satisfied in high-dimensional phenotype spaces.

Michael’s result is exciting because it sheds light on the origin of diversity.  Furthermore, in hindsight the result seems obvious, which leads me to wonder, “Why didn’t I think of that?”

Members of Bill Cresko’s lab (including Julian Catchen, Paul Hohenlohe, and Susan Bassham) gave a series of talks showcasing RAD tag sequencing.  As a phylogeneticist, I am particularly interested in the potential to use RAD tags to identify sites that are polymorphic within a population; these sites can be culled from phylogenetic analysis, removing a significant amount of “noise” when inferring inter-species phylogenies.
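Here is a minimal sketch of that filtering step. It assumes the RAD-tag reads have already been genotyped and aligned into one sequence per individual, grouped by population; the data layout and function names are hypothetical and not part of any RAD-tag pipeline. Columns that vary within any population are dropped, while columns that differ only between populations are kept, since those carry the inter-species signal.

```python
from typing import Dict, List

def fixed_columns(populations: Dict[str, List[str]]) -> List[int]:
    """Indices of alignment columns that are monomorphic within every
    population. Each population is a list of aligned sequences."""
    lengths = {len(seq) for seqs in populations.values() for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_cols = lengths.pop()
    return [col for col in range(n_cols)
            if all(len({seq[col] for seq in seqs}) == 1
                   for seqs in populations.values())]

def consensus_alignment(populations: Dict[str, List[str]]) -> Dict[str, str]:
    """One sequence per population, restricted to within-population fixed
    sites. Any member works as the representative because kept columns
    are identical within each population."""
    cols = fixed_columns(populations)
    return {name: "".join(seqs[0][c] for c in cols)
            for name, seqs in populations.items()}

# Hypothetical data: the last column varies within species1, so it is dropped.
pops = {"species1": ["ACGTA", "ACGTT"], "species2": ["AGGTA", "AGGTA"]}
print(fixed_columns(pops))        # prints [0, 1, 2, 3]
print(consensus_alignment(pops))  # prints {'species1': 'ACGT', 'species2': 'AGGT'}
```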

My final highlight is David Pollock’s talk titled “Adaptation, Convergence, and Context-Dependent Evolution.”  David investigated why a very long phylogenetic branch leads to the snake clade.  One explanation is the large number of mitochondrial mutations that allowed snakes to rapidly alter their metabolism in order to digest large meals.  I found David’s talk interesting because it was the first (and only?) at this meeting to connect specific protein-level mutations to organism-level phenotypic changes.

Did you attend EVO-WIBO?  If so, I encourage your comments below.  Which presentations did you think were noteworthy?

Transcriptional Rewiring in Yeast

Posted by Victor Hanson-Smith.

Consider this 2006 Nature paper from Alexander Johnson’s lab. The story here is that, in the lineage leading to S. cerevisiae (i.e. baker’s yeast), transcriptional regulation of the type-a mating genes was handed off from activation by MAT-a2 to repression by MAT-alpha2.  This is interesting because despite significant transcriptional rewiring, the logical output (the expression of mating genes) remained the same.

First, some background on yeast. . .

Yeast are either diploid or haploid.  Both haploid and diploid cells can reproduce by mitosis, but only haploid cells can mate.  Haploid yeast are either type “a” or type “alpha.”  Type-a haploid cells can mate with type-alpha cells, and vice versa.  Mating between haploids produces diploid offspring, which cannot themselves mate.  However, diploid cells can undergo meiosis (typically in response to nutritional stress) to form four haploid spores: two type-a spores and two type-alpha spores.

Type-a and type-alpha yeast cells differ in their mating pheromones.  Type-a cells produce a-factor pheromone and respond to alpha-factor; type-alpha cells produce alpha-factor and respond to a-factor. In response to pheromone of the opposite type, haploid yeast grow a mating projection, called a “shmoo,” towards the source of the pheromone.

An illustration of yeast mating

Type-a cells respond to alpha-factor using the cell surface receptor Ste2; type-alpha cells respond to a-factor using the cell surface receptor Ste3.  The interesting difference, and the focus of Tsong et al.’s paper, is that S. cerevisiae type-a mating genes are activated by the Mcm1 transcription factor alone, whereas C. albicans type-a mating genes are activated by Mcm1 together with the cofactor MAT-a2.  Given that S. cerevisiae and C. albicans are related species, this transcriptional difference reveals a rewiring event since their lineages diverged.
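The “identical logic” of the two wirings can be shown with two boolean rules. This is a deliberately simplified sketch: it covers only haploid a and alpha cells, treats Mcm1 as always present, and ignores diploid a/alpha cells and all other regulators. The function and table names are illustrative.

```python
from typing import Dict

# Which regulators each haploid cell type expresses (simplified assumption;
# diploid a/alpha cells are deliberately left out).
CELL_TYPES: Dict[str, Dict[str, bool]] = {
    "a":     {"a2": True,  "alpha2": False},
    "alpha": {"a2": False, "alpha2": True},
}

def ancestral_circuit(cell: Dict[str, bool]) -> bool:
    """C. albicans-like wiring: type-a genes need Mcm1 plus the activator a2."""
    mcm1 = True  # assumed present in both cell types
    return mcm1 and cell["a2"]

def derived_circuit(cell: Dict[str, bool]) -> bool:
    """S. cerevisiae-like wiring: Mcm1 alone activates; alpha2 represses."""
    mcm1 = True
    return mcm1 and not cell["alpha2"]

# Different molecular wiring, same output: ON in a cells, OFF in alpha cells.
for name, cell in CELL_TYPES.items():
    print(name, ancestral_circuit(cell), derived_circuit(cell))
```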

The authors identify seven type-a specific mating genes and their corresponding regulatory sequences.  Using position-specific scoring matrices and homology modeling, the authors inferred the evolutionary events that led to the hand-off between transcriptional activation and repression.  For more details, read the publication.
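For readers unfamiliar with position-specific scoring matrices, the sketch below builds a log-odds matrix from a handful of aligned binding sites and scores every window of a target sequence. The training sites and target sequence are invented for illustration, the pseudocount and uniform background are assumptions, and this is only the generic technique, not Tsong et al.’s exact pipeline.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"

def build_pssm(sites: List[str], pseudocount: float = 0.5,
               background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds position-specific scoring matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    pssm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        # Smoothed base frequency at position i, compared with the background.
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                     for b in BASES})
    return pssm

def scan(sequence: str, pssm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of an uppercase ACGT sequence (forward strand only)."""
    w = len(pssm)
    return [(start, sum(pssm[i][sequence[start + i]] for i in range(w)))
            for start in range(len(sequence) - w + 1)]

# Hypothetical training sites and target (not data from the paper).
sites = ["CCTAATTAGG", "CCTAATTTGG", "CCAAATTAGG", "CCTAAATAGG"]
hits = scan("GATCCTAATTAGGTAC", build_pssm(sites))
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start, round(best_score, 2))
```

Comparing such scores between orthologous promoters is one way to ask whether a binding site was gained or lost along a lineage.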

This paper raises several questions:

1. Did the hand-off from activation to repression incur a fitness cost?  The authors imply a binary fitness landscape: either a yeast expresses the correct mating genes or it doesn’t.  However, it seems like a more accurate fitness story would consider the energetic cost differences between the transcriptional systems used by S. cerevisiae and C. albicans.

2. The authors use C. albicans’ transcriptional phenotype as a proxy for the ancestral state.  Is this accurate?  (The answer is yes.)  The alternative hypothesis, in which S. cerevisiae represents the ancestral state, requires an outrageous number of gene gains and losses with respect to MAT-a2.

3. How often do these transcriptional rewiring events occur?  This question is somewhat rhetorical, because we don’t have enough information to answer it.  A naive interpretation of this paper is that the yeast MAT-a2 story is especially novel.  As we learn more about the entire transcriptional network of organisms, however, we might learn that these architectural rearrangements occur frequently.
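The parsimony argument in question 2 can be illustrated with the Fitch algorithm, a textbook method for counting the minimum number of state changes on a tree. The five-taxon tree and leaf states below are hypothetical: two taxa repress type-a genes (the S. cerevisiae-like state) and three activate them (the C. albicans-like state). Fitch infers activation at the root with a single change; this is a generic illustration of the reasoning, not the authors’ actual analysis.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees

def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass on a rooted binary tree. Returns the
    candidate state set at the root and the minimum number of changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children agree: no extra change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical taxa A-E; not a real yeast phylogeny.
tree = ((("A", "B"), "C"), ("D", "E"))
states = {"A": "repression", "B": "repression",
          "C": "activation", "D": "activation", "E": "activation"}
root, cost = fitch(tree, states)
print(root, cost)
```

Forcing repression at the root of this toy tree would need two changes instead of one; with more taxa sharing the activating state, the gap grows, which is the spirit of the “outrageous number of gains and losses” argument.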

Tsong, A., Tuch, B., Li, H., & Johnson, A. (2006). Evolution of alternative transcriptional circuits with identical logic. Nature, 443 (7110), 415-420. DOI: 10.1038/nature05099

The Evolution of Transcription Factors and DNA Binding Sites

Posted by Victor Hanson-Smith.

For the next ten weeks, our conversation will focus on the evolution of transcription factors and their corresponding DNA binding sites.  A growing body of EvoDevo research shows that cis-regulatory mutations play a significant role in the evolution of morphological, physiological and behavioral phenotypes.  In a 2007 Nature Reviews Genetics article, Greg Wray highlighted twenty such cis-regulatory mutations with interesting phenotypic consequences.  A lot of new research has been published since then, and we want to know: what is the current knowledge on the evolution of transcriptional regulation?

Below is a reading list of (a few) relevant publications.  Over the next ten weeks, we’ll read and discuss the articles on this list.  Do you think we overlooked any papers?  Feel free to post your comments below.

The genetic basis of adaptive evolution

Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer.

Chan YF, Marks ME, Jones FC, Villarreal G Jr, Shapiro MD, Brady SD, Southwick AM, Absher DM, Grimwood J, Schmutz J, Myers RM, Petrov D, Jónsson B, Schluter D, Bell MA, Kingsley DM.

Science. 2010 Jan 15;327(5963):302-5.

This paper is the most recent in a series of papers over the past decade in which the Kingsley Lab has used stickleback fish as a model to investigate the genetic bases of adaptive natural variation. Marine populations of stickleback have a pelvic apparatus that consists of articulating spines along the fish’s lateral sides. Interestingly, several independently derived freshwater populations have lost this structure. Previous work had determined that a chromosome region containing the Pitx1 gene was responsible for pelvic structure loss in multiple populations, and that Pitx1 expression is lost in pelvic-reduced stickleback. These and other data suggested that cis-regulatory mutations at the Pitx1 locus were responsible for pelvic reduction. However, regulatory mutations are difficult to pinpoint, and the exact sequence changes controlling pelvic reduction had not been identified. In this paper, the authors identify the exact genetic changes responsible for this loss in multiple populations.
