
Ecology homework help


Biodiversity

Student’s Name

Affiliated Institution

Course

Instructor

Date

Understanding our environment and all the living organisms in it is very important, because it helps us recognize the role each organism plays. In an ecosystem, both plants and animals are important in many ways; for example, animals, including human beings, depend on the oxygen produced by plants to survive. However, some living organisms in our environment are becoming extinct. Biodiversity is an ecological term that describes the variety of living organisms, including plants and animals, at levels ranging from individual species to entire ecosystems (Haahtela, 2019). In this paper, I will discuss how biodiversity has been destroyed. Human activities have degraded biodiversity and threatened the existence of some species.

In this research, I will discuss how wild animals have been threatened and how their populations have declined. Some wild animals are facing extinction because of unfavorable environmental conditions and other pressures caused by human beings. I will investigate how wild animals such as elephants, rhinos, and zebras are being affected by human activities. The research is expected to take three weeks. It will involve visiting national parks to observe how wild animals live, assessing their environment, and identifying the threats around them.

The research will also involve interviews with the relevant authorities to obtain data on changes in the animal populations in each park. Data from several parks will be compared, and the factors that may have changed their ecosystems will be discussed. Finally, I will present qualitative and quantitative data on how the populations of wild animals have been changing. The factors behind these threats will be discussed, along with recommendations on what should be done. This information will help future ecologists understand how ecosystems can be preserved.
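As a minimal sketch of the quantitative comparison planned above, assuming counts of the same species are available from two surveys, the percentage change per species could be computed as follows. The function names and the counts in the example are hypothetical illustrations, not data from this study.

```python
def percent_change(before: float, after: float) -> float:
    """Percentage change between two population counts (negative means decline)."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


def summarize(counts: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each species to its percentage change between an earlier and a later survey."""
    return {species: percent_change(b, a) for species, (b, a) in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only: (earlier survey, later survey)
    example = {"elephant": (200, 150), "rhino": (40, 30), "zebra": (500, 550)}
    for species, change in summarize(example).items():
        print(f"{species}: {change:+.1f}%")
```

Reporting change relative to the earlier count makes parks with very different population sizes comparable.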

Reference

Haahtela, T. (2019). A biodiversity hypothesis. Allergy, 74(8), 1445–1456. https://onlinelibrary.wiley.com/doi/abs/10.1111/all.13763



MOLECULAR EVOLUTION

By (name)

Affiliated Institution

Professor (name)

Course

Date

MOLECULAR EVOLUTION

Lynch, M., Koskella, B., & Schaack, S. (2006). Mutation pressure and the evolution of organelle genomic architecture. Science, 311(5768), 1727–1730.

The nuclear genomes of multicellular plants and animals contain substantial amounts of noncoding DNA, the costs of which may be too small to be effectively opposed by selection in lineages with small effective population sizes. By contrast, despite similar effective population sizes, the organelle genomes of these two lineages have evolved to opposite extremes of the range of genomic complexity. This pattern, as well as other puzzling features of organelle evolution, appears to result from differences in organelle mutation rates. These findings support the idea that the basic features of genome evolution are largely determined by the relative strength of two nonadaptive forces: mutation pressure and random genetic drift.

Peterson, K. J., Lyons, J. B., Nowak, K. S., Takacs, C. M., Wargo, M. J., & McPeek, M. A. (2004). Estimating metazoan divergence times with a molecular clock. Proceedings of the National Academy of Sciences, 101(17), 6536–6541.

Understanding early animal evolution requires dating the origin of the earliest bilaterally symmetrical animals. Based on the fossil record, vertebrates are estimated to have diverged from dipterans (Drosophila) about 900 million years ago (Ma). Although vertebrates and dipterans diverged at the same time, comparative genomics indicates a considerable difference in their rates of molecular evolution: vertebrates evolve more slowly at the molecular level than the invertebrate taxa examined.

Charlesworth, B. (1994). The effect of background selection against deleterious mutations on weakly selected, linked variants. Genetics Research, 63(3), 213–227.

This study examines the impact of background selection against deleterious alleles on rates of evolution and levels of genetic variation at weakly selected, completely linked loci. Expected rates of gene substitution and levels of genetic diversity are computed as functions of the selection and dominance coefficients at the loci in question and of the frequency of gametes carrying no deleterious mutations at the background-selection loci. Most effects of background selection can be predicted by multiplying the population size by the frequency of mutation-free gametes. Background selection can sharply reduce genetic diversity, so that values for selected sites are similar to those for neutral variants under the same background-selection regime. Background selection increases the rate of fixation of deleterious mutations while decreasing the fixation of beneficial variants. Autosomal, sex-linked, and asexual populations are examined. The implications of these findings for studies of molecular variation and evolution are highlighted.
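The scaling rule in this summary can be made concrete with a short calculation. The sketch below is a simplification, assuming the frequency of mutation-free gametes, f0, is already known, and using the standard neutral expectation π = 4Neμ for a diploid population; the function names and parameter values are hypothetical.

```python
def effective_size_under_bgs(n_e: float, f0: float) -> float:
    """Approximate effective size at a linked neutral locus: N_e scaled by f0.

    f0 is the frequency of gametes carrying no deleterious mutations at the
    background-selection loci (0 < f0 <= 1).
    """
    if not 0 < f0 <= 1:
        raise ValueError("f0 must lie in (0, 1]")
    return n_e * f0


def neutral_diversity(n_e: float, mu: float) -> float:
    """Expected nucleotide diversity for a diploid population, pi = 4 * N_e * mu."""
    return 4 * n_e * mu


if __name__ == "__main__":
    n_e, mu, f0 = 1_000_000, 1e-9, 0.2  # hypothetical values
    pi_neutral = neutral_diversity(n_e, mu)
    pi_bgs = neutral_diversity(effective_size_under_bgs(n_e, f0), mu)
    print(f"pi without background selection: {pi_neutral:.2e}")
    print(f"pi with background selection:    {pi_bgs:.2e}")
```

With f0 = 0.2, expected neutral diversity falls to one-fifth of its value without background selection, which is the sense in which background selection acts like a reduction in population size.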

Voight, B. F., Kudaravalli, S., Wen, X., & Pritchard, J. K. (2006). A map of recent positive selection in the human genome. PLoS Biology, 4(3), e72.

Signals of recent positive selection provide information about the adaptation of modern humans to local environments. This report describes a genome-wide search for recent positive selection favoring variants that have not yet reached fixation. It proposes a new method for screening SNP data for signals of recent selection and applies it to data from the International HapMap Project. Recent positive selection is widespread in all three continental groups. Most signals are population-specific, although many are shared across groups. Rather than finding little recent selection among sub-Saharan Africans, the authors find that the strongest selection signals come from the Yoruba group. Because these signals indicate the presence of genetic variants with distinct fitnesses, they should also point to loci that generate considerable phenotypic variation. Although the relevant phenotypes are mostly unknown, such loci should be of interest in mapping studies of complex traits. The authors provide a list of SNPs that can be used to tag the 250 strongest recent selection signals in each group.

Ecology homework help

Facebook Marketplace is an online platform that allows users to sell a variety of products. It takes advantage of Facebook's billions of users to build a marketplace with effective sales and a ready pool of potential customers. However, the site suffers from a lack of customer trust because some sellers offer products that do not match their listings. Reviews are another problem: they are private, so not all users can view reviews of a seller and their products. The performance of Facebook Marketplace could therefore be improved by using AI to reduce sellers' cognitive load and frustration, for example by suggesting prices based on supply and demand. Facebook Marketplace can also be enhanced by introducing a measure of seller performance. Sellers should be able to see their total number of inquiries, how many resulted in a successful purchase, and how many came from potential customers asking for lower prices. In this way, sellers can improve their services and decide how best to deal with issues as they arise.
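The performance measure proposed above could be computed from a seller's inquiry log. The sketch below assumes a hypothetical record format with two flags per inquiry (whether it led to a purchase, and whether the buyer asked for a lower price); it does not use any real Facebook API, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Inquiry:
    """One buyer inquiry on a listing (hypothetical record format)."""
    purchased: bool
    asked_lower_price: bool


def seller_metrics(inquiries: list[Inquiry]) -> dict[str, float]:
    """Count inquiries, purchases and lower-price requests for one seller."""
    total = len(inquiries)
    purchases = sum(1 for q in inquiries if q.purchased)
    lower_price = sum(1 for q in inquiries if q.asked_lower_price)
    return {
        "total_inquiries": total,
        "purchases": purchases,
        "lower_price_requests": lower_price,
        # Share of inquiries that ended in a sale; 0.0 when there are no inquiries
        "conversion_rate": purchases / total if total else 0.0,
    }


if __name__ == "__main__":
    log = [Inquiry(True, False), Inquiry(False, True),
           Inquiry(False, True), Inquiry(True, True)]
    print(seller_metrics(log))
```

A high share of lower-price requests alongside a low conversion rate would suggest to a seller that the asking price is above what buyers expect.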


Elucidation of phenotypic adaptations: Molecular analyses of dim-light vision proteins in vertebrates
Shozo Yokoyama*†, Takashi Tada*, Huan Zhang‡, and Lyle Britt§

*Department of Biology, Emory University, Atlanta, GA 30322; ‡Department of Marine Sciences, University of Connecticut, Groton, CT 06340;
and §Alaska Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, WA 98195

Edited by Masatoshi Nei, Pennsylvania State University, University Park, PA, and approved July 14, 2008 (received for review March 12, 2008)

Vertebrate ancestors appeared in a uniform, shallow water environment, but modern species flourish in highly variable niches. A striking array of phenotypes exhibited by contemporary animals is assumed to have evolved by accumulating a series of selectively advantageous mutations. However, the experimental test of such adaptive events at the molecular level is remarkably difficult. One testable phenotype, dim-light vision, is mediated by rhodopsins. Here, we engineered 11 ancestral rhodopsins and show that those in early ancestors absorbed light maximally (λmax) at 500 nm, from which contemporary rhodopsins with variable λmaxs of 480–525 nm evolved on at least 18 separate occasions. These highly environment-specific adaptations seem to have occurred largely by amino acid replacements at 12 sites, and most of those at the remaining 191 (≈94%) sites have undergone neutral evolution. The comparison between these results and those inferred by commonly-used parsimony and Bayesian methods demonstrates that statistical tests of positive selection can be misleading without experimental support and that the molecular basis of spectral tuning in rhodopsins should be elucidated by mutagenesis analyses using ancestral pigments.

molecular adaptation | rhodopsin

The morphologies and lifestyles of animals in a wide range of environmental conditions have evolved to generate a striking array of forms and patterns. It is generally assumed that these variations have been driven by mutations, followed by positive Darwinian selection. However, it has been remarkably difficult not only to detect minute selective advantages caused by molecular changes (1), but also to find genetic systems in which evolutionary hypotheses can be tested experimentally (2). In the absence of proper experimental systems, molecular adaptation in higher eukaryotes has been inferred mostly by using statistical methods (for examples, see refs. 3–5). For several cases, however, ancestral molecules have been engineered, allowing studies of functional changes in the past (6). These analyses demonstrate that functional changes actually occurred, but they do not necessarily mean that the new characters were adaptive (7). To complicate the matter further, evolutionary changes are not always unidirectional and ancestral phenotypes may reappear during evolution (8, 9). One effective way of exploring the mechanisms of molecular adaptation is to engineer ancestral molecules at various stages of evolution and to recapitulate the changes in their phenotypes through time. To date, the molecular analyses of the origin and evolution of color vision produced arguably “the deepest body of knowledge linking differences in specific genes to differences in ecology and to the evolution of species” (10). The study of dim-light vision provides another opportunity to explore the adaptation of vertebrates to different environments.

Results
Rhodopsins. Dim-light vision in vertebrates is mediated by rhodopsins, which consist of a transmembrane protein, opsin, and a chromophore, 11-cis-retinal (11). By interacting with different opsins, the identical chromophores in different rhodopsins detect various wavelengths of light (reviewed in ref. 12). To explore the molecular basis of the spectral tuning in rhodopsins, in vitro assay-based mutagenesis experiments are necessary, in which the wavelengths of maximal absorption (λmaxs) can be measured in the dark (dark spectra) and/or by subtracting a spectrum measured after photobleaching from a spectrum evaluated before light exposure (difference spectra) (for example, see ref. 13). So far, the λmaxs of contemporary rhodopsins measured by using the in vitro assay vary between 482 and 505 nm (refs. 12 and 14 and references therein). By using another method, microspectrophotometry (MSP), the rhodopsin of a deep-sea fish, shining loosejaw (Aristostomias scintillans), has also been reported to have a λmax of 526 nm (15).
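The two measurements described above, a dark spectrum and a difference spectrum (before photobleaching minus after), can be illustrated numerically. The sketch below uses synthetic single-Gaussian absorbance bands, which is an assumption (real pigment spectra are broader and asymmetric): a pigment peaking at 500 nm and a hypothetical photoproduct absorbing near 380 nm. λmax is read off as the wavelength of the largest value on a 1-nm grid.

```python
import math


def gaussian(peak_nm: float, width_nm: float, height: float = 1.0):
    """Return a synthetic absorbance curve A(wavelength) with one Gaussian band."""
    def absorbance(wl: float) -> float:
        return height * math.exp(-((wl - peak_nm) ** 2) / (2 * width_nm ** 2))
    return absorbance


def lambda_max(spectrum, wavelengths):
    """Wavelength at which the spectrum is largest (simple grid search)."""
    return max(wavelengths, key=spectrum)


wavelengths = range(300, 701)  # 1-nm grid from 300 to 700 nm
dark = gaussian(500, 40)       # pigment measured in the dark
bleached = gaussian(380, 40)   # hypothetical photoproduct after bleaching


def difference(wl: float) -> float:
    # Spectrum before light exposure minus spectrum after photobleaching
    return dark(wl) - bleached(wl)


print("dark-spectrum lambda_max:", lambda_max(dark, wavelengths))
print("difference-spectrum lambda_max:", lambda_max(difference, wavelengths))
```

In this toy model the tail of the photoproduct band nudges the difference-spectrum peak about a nanometer away from the dark-spectrum peak; this is a property of the synthetic curves, not a claim about the authors' data.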

To examine whether these λmaxs represent the actual variation of the λmaxs of rhodopsins in nature, we isolated the rhodopsins of migratory fish [Japanese eel (Anguilla japonica) and its close relative Japanese conger (Conger myriaster)], deep-sea fish [Pacific blackdragon (Idiacanthus antrostomus), Northern lampfish (Stenobrachius leucopsarus), shining loosejaw (Aristostomias scintillans), scabbardfish (Lepidopus fitchi), and Pacific viperfish (Chauliodus macouni)], and freshwater bluefin killifish (Lucania goodei), which live in diverse light environments (www.fishbase.org) [see supporting information (SI) Methods and Fig. S1]. The eel has two paralogous rhodopsins (EEL-A and -B), as do conger (CONGER-A and -B) and scabbardfish (SCABBARD-A and -B), whereas the others use one type of rhodopsins (BLACKDRAGON, LAMPFISH, LOOSEJAW, VIPERFISH and BFN KILLIFISH) (16) (see also SI Result 1).

In the in vitro assay, the λmaxs of the dark spectra are more reliable than those of difference spectra (SI Result 2) and, therefore, unless otherwise specified, the λmaxs refer to the former values throughout the paper. The λmaxs were determined for EEL-A (500 nm), EEL-B (479 nm), CONGER-A (486 nm), CONGER-B (485 nm), SCABBARD-A (507 nm), SCABBARD-B (481 nm), and BFN KILLIFISH (504 nm) (SI Result 2). The λmaxs for LAMPFISH and VIPERFISH could not be evaluated, but those of their difference spectra were 492 and 489 nm, respectively. Neither dark spectra nor difference spectra were obtained for BLACKDRAGON and LOOSEJAW. However, a mutant pigment, which is modeled after LOOSEJAW, has a difference spectrum λmax of 526 nm (see Molecular Basis of Spectral Tuning). Hence, the range of ≈480–525 nm seems to represent the λmaxs of rhodopsins in vertebrates reasonably well.

Author contributions: S.Y. designed research; S.Y., T.T., H.Z., and L.B. performed research;
S.Y. analyzed data; and S.Y. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. EU407248–EU407253).

†To whom correspondence should be addressed at: Department of Biology, Rollins Research Center, Emory University, 1510 Clifton Road, Atlanta, GA 30322. E-mail: syokoya@emory.edu.

This article contains supporting information online at www.pnas.org/cgi/content/full/0802426105/DCSupplemental.

© 2008 by The National Academy of Sciences of the USA

13480–13485 • PNAS • September 9, 2008 • vol. 105 • no. 36 • www.pnas.org/cgi/doi/10.1073/pnas.0802426105


The Ecology of Dim-Light and Deep-Sea Vision. One of the critical times for the survival of animals in shallow water and on land is at twilight when the most abundant light falls between 400 and 500 nm (17). Many fish, amphibians, birds, and mammals that live in these environments use rhodopsins with λmaxs of ≈500 nm (12). In contrast, in deep water, the distribution of light is much narrower at ≈480 nm (18). Mature conger, mature eel, thornyhead, and coelacanth all live at the depths of 200–1,800 m (www.fishbase.org). Our data and that of others (19, 20) show that these fishes achieve their dim-light vision by using rhodopsins with λmaxs of ≈480 nm. Because of their λmaxs and specific light environments, the two groups of rhodopsins may be classified simply as “surface” and “deep-sea,” respectively. Despite being active at much deeper depths of 3,000–4,000 m (www.fishbase.org), however, Northern lampfish and Pacific viperfish use rhodopsins with λmaxs of ≈490 nm. The higher λmaxs can be explained by their upward migration at night, and LAMPFISH and VIPERFISH can be regarded as “intermediate” rhodopsins. Then, through the use of far-red (≈700 nm) bioluminescence to create an artificial light environment, shining loosejaw achieves dim-light vision with the “red-shifted” rhodopsins (15).

Japanese eel spawns in the deep sea, the young adults migrate into freshwater, and the mature fish return to the deep sea for reproduction. Similarly, Japanese conger spawns in the deep sea, their larvae hatch near the coast, but they live only in the sea (www.fishbase.org). For their dim-light vision, young and adult eels use EEL-A and EEL-B, respectively, whereas congers use only CONGER-B; CONGER-A is expressed in the pineal complex (16). The λmax of EEL-A (500 nm) reflects the shallow freshwater environment, whereas those of EEL-B and CONGER-B (480–485 nm) match their light environments in the deep sea. The λmax of CONGER-A (486 nm) is similar to those of 470–482 nm in the pineal gland-specific pigments of American chameleon, pigeon, and chicken (reviewed in ref. 12). Hence, EEL-A is a surface rhodopsin and EEL-B and CONGER-B are deep-sea rhodopsins. CONGER-A does not reflect the deep-sea environment directly; however, because of its λmax, CONGER-A may also be classified as a deep-sea rhodopsin.

Based on considerations of ecology, life history, and λmaxs of rhodopsins, dim-light vision can be classified into biologically meaningful deep-sea, intermediate, surface, and red-shifted vision (see SI Result 3 for detailed discussion of the classifications of other rhodopsins). The corresponding rhodopsins have λmaxs of 479–486, 491–496, 500–507, and 526 nm, respectively, establishing the units of possible selection. Consequently, it is possible that selective force may be able to differentiate even 4–5 nm of λmax differences of rhodopsins.
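The four-way classification can be expressed as a lookup on λmax. This is a minimal sketch using only the ranges quoted above (479–486, 491–496, 500–507, and 526 nm); values that fall in the gaps between ranges are returned as unclassified, because the paper assigns pigments using ecology and life history as well as λmax, and the helper names are hypothetical.

```python
from typing import Optional

# lambda_max ranges (nm) quoted in the text for each type of dim-light vision
RANGES = {
    "deep-sea": (479, 486),
    "intermediate": (491, 496),
    "surface": (500, 507),
    "red-shifted": (526, 526),
}


def classify(lambda_max_nm: float) -> Optional[str]:
    """Return the vision type whose quoted range contains lambda_max, else None."""
    for label, (low, high) in RANGES.items():
        if low <= lambda_max_nm <= high:
            return label
    return None


# Dark-spectrum lambda_max values reported in the Results
for name, value in {"EEL-A": 500, "EEL-B": 479, "CONGER-B": 485, "SCABBARD-A": 507}.items():
    print(name, classify(value))
```

Returning None for gap values keeps the sketch from silently assigning a pigment that the text would classify on other grounds.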

Ancestral Rhodopsins. Based on the composite phylogenetic tree
(Fig. 1) (see also SI Result 4 and Fig. S2) of the 11 newly
characterized rhodopsins and 27 others from a wide range of
vertebrate species, we inferred the amino acid sequences of
ancestral rhodopsins. Because most of their rhodopsin genes
have been sequenced partially (Fig. S3), squirrelfish (21) and
cichlid (14) rhodopsins were first excluded from this inference.
The amino acids inferred by using the Jones, Taylor, and
Thornton and Dayhoff models of amino acid replacements of the
PAML program (3) are highly reliable (SI Results 5 and 6, and
Tables S1 and S2).
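The ancestral sequences in the paper were inferred with likelihood-based models in PAML. As a simpler illustration of what ancestral-state inference does at a single amino acid site, the sketch below implements Fitch parsimony (its bottom-up pass) on a small hypothetical tree; this is a swapped-in technique for illustration, not the method the authors used, and the tree is invented.

```python
from typing import Union

# A tree is either a leaf (an amino acid such as "D") or a pair of subtrees.
Tree = Union[str, tuple]


def fitch(tree: Tree) -> tuple[set[str], int]:
    """Fitch's bottom-up pass for one site.

    Returns the set of most-parsimonious states at the root of `tree` and the
    minimum number of state changes required on the subtree.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one extra change is needed somewhere below this node
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical four-taxon tree for site 83: ((D, D), (N, D))
states, changes = fitch((("D", "D"), ("N", "D")))
print(states, changes)
```

Parsimony ignores branch lengths and substitution probabilities, which is one reason likelihood-based reconstruction, as used in the paper, is generally preferred.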

By introducing a total of 137 amino acid changes into various rhodopsins (Table S3), we then engineered pigments at nodes a–k (pigments a–k). The in vitro assays show that pigments a–d and f–h have λmaxs of 501–502 nm, whereas others have λmaxs of 496 nm (pigment i) and 482–486 nm (pigments e, j, and k), which are also highly reliable (SI Results 5 and 6).

Fig. 1. A composite tree topology of 38 representative rhodopsins in vertebrates. Numbers in ovals are λmaxs evaluated from MSP (*), dark spectra, and difference spectra (†). The numbers in white, blue, black, and red ovals indicate surface, intermediate, deep-sea, and red-shifted rhodopsins, respectively, whereas those in rectangles indicate the expected values based on the mutagenesis results. Because of their incomplete data, the amino acid sequences of pigments a–k have been inferred by excluding the squirrelfish, bluefin killifish, and cichlid rhodopsins. S-PUN and S-XAN are classified as intermediate rhodopsins because of their expected λmaxs, but currently available data are ambiguous. Red- and blue-colored amino acid replacements indicate the color of the shifts in the λmax. The λmax of avian ancestral pigment shows that of the ancestral Archosaur rhodopsin (39). ND, the λmax could not be determined.

Molecular Basis of Spectral Tuning. Because of the interactions between the 11-cis-retinal and various amino acids, the λmax shifts caused by mutations in the opposite directions are often nonsymmetrical (22–24). Hence, to understand the evolutionary mechanism that has generated the various λmaxs of rhodopsins in nature, we must analyze “forward” amino acid replacements that actually took place in specific lineages. The reconstruction of multiple ancestral pigments opens an unprecedented opportunity to study the effects of such forward amino acid replacements on the λmax shift in different lineages.

At present, certain amino acid changes at a total of 26 sites are known to have generated various λmaxs of rhodopsins and other paralogous visual pigments in vertebrates (25). The amino acid sequences of the 38 representative rhodopsins differ at 11 of the 26 sites (Fig. 2, second column). Among these, the specific amino acid replacements at 46, 49, 52, 93, 97, 116, and 164 are unlikely to have been involved in the spectral tuning (SI Result 7). These and other amino acid site numbers in this paper are standardized by those of the bovine rhodopsin (BOVINE). From the mutagenesis results (SI Result 8 and Table S4), four key observations of evolutionary significance emerge.

Positions (bovine numbering): 46 49 52 83 93 97 116 122 164 261 292 | 96 102 183 194 195 253 289 317

pigment a      FLFDTTFEAFA YYMLKMTM
pigment b      ........... ........
pigment c      .I......... ........
pigment d      .I......... ...PN...
pigment e      .I........S ........
pigment f      ........... ...RA...
pigment g      ........... ...RA...
pigment h      ........... ...RA...
pigment i      ........... VFLRA...
pigment j      ...N......S ...RA...
pigment k      ...N.S....S ...RA...
pigment m      ........... ........
CONGER-A       .I........S ..LRA...
EEL-A          .I......... ...PN...
CONGER-B       .I........S ..L.....
EEL-B          T..N......S ........
CAVEFISH       .......I.Y. ...RA...
GOLDFISH       .I......... ...RP...
ZEBRAFISH      .I......... ...RT...
N-SAM          .I...SSM.Y. ...RA...
N-ARG          .I...SSM.Y. ...RA...
S-PUN          .I...SSM... ...RA...
S-MIC          .I.....M... ...RA...
S-DIA          .I.....M... ...RA...
S-XAN          .I.....M... ...RA...
N-AUR          .I...S.M..S ...RA.A.
S-SPI          .I...S.M.YS ...RA.A.
S-TIE          .I...S.M.YS ...RA.A.
M-VIO          .I....SMG.. ...RA...
M-BER          .I....SMGG. ...RA...
BFN KILLIFISH  .I......... ...RA...
O-NIL          ........... ...RA...
X-CAU          ..........S ..LRA...
THORNYHEAD     L..N......S ...RA...
LAMPFISH       ......MQ... ...RA...
SCABBARD-A     .I.......Y. VFLRA...
SCABBARD-B     ...N......S VFLRA...
VIPERFISH      ...N.S....S ...RA.A.
BLACKDRAGON    ...N.S....S ...RA.A.
LOOSEJAW       ...N.....YI ..FRAL.I
COELACANTH     ....VS.Q..S ......G.
CLAWED FROG    L.LNV...... .......L
SALAMANDER     L..NVS..... ........
CHAMELEON      L..N....... ...PT...
PIGEON         M.......... ........
CHICKEN        M.......... ........
ZEBRA FINCH    M.......... ........
BOVINE         LM......... ...PH...
DOLPHIN        LV.N......S ...SR...
ELEPHANT       LV.N....... ........

Fig. 2. Amino acids at the 11 previously known (25) and newly found critical residues of rhodopsins. The numerical column headings specify the amino acid positions, and the third column describes the newly discovered critical residues. Shades indicate amino acid replacements that are unlikely to cause any λmax shifts (SI Result 7). Dots indicate the identity of the amino acids with those of pigment a. The ancestral amino acids that have a posterior probability of 95% or less are underlined.

First, the λmaxs of most contemporary rhodopsins can be explained largely by a total of 15 amino acid replacements at 12 sites. Namely, significant λmax shifts have been caused by 4 of the 11 currently known sites (D83N, E122M, E122Q, F261Y, A292S, and S292I) as well as newly discovered sites Y96V, Y102F, E122I, M183F, P194R, N195A, M253L, T289G, and M317I (Fig. 2, third column). Therefore, the functional differentiation of vertebrate rhodopsin was caused mostly by only ≈3% of 354 amino acid sites.
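The two proportions quoted in the paper (≈3% of 354 sites here, and ≈94% in the abstract) follow from simple arithmetic. The sketch assumes, from the abstract, that the 12 critical sites plus the "remaining 191" sites make up the 203 variable sites; the helper name is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


critical_sites = 12   # sites carrying the 15 critical replacements
total_sites = 354     # amino acid sites considered in the text
neutral_sites = 191   # remaining variable sites (abstract)

print(percent(critical_sites, total_sites))                   # 3.4
print(percent(neutral_sites, critical_sites + neutral_sites))  # 94.1
```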

Second, 4 of the 15 critical amino acid replacements occurred
multiple times during rhodopsin evolution: D83N (seven times),
A292S (nine times), F261Y (five times), E122Q (two times), and
D83N/A292S (five times) (Fig. 1). Such extensive parallel
changes strongly implicate the importance of these and other
amino acid replacements at the 12 sites in the functional
adaptation of vertebrate dim-light vision.

Third, we uncovered new types of amino acid interactions. A292S usually decreases the λmax of rhodopsin by ≈10 nm (12) (see also pigments c and g in Table S4). Much to our surprise, when A292S was introduced into pigment d (Fig. 1), it did not decrease the λmax at all. However, when the reverse mutation, S292A, was introduced into CONGER-A, a descendant of pigment d, the mutant rhodopsin increased the λmax by 12 nm, explaining the λmax of pigment d reasonably well. From the latter analysis alone, we may erroneously conclude that the λmax of CONGER-A was achieved by A292S; instead, it was achieved purely by the interaction of three amino acid replacements (P194R, N195A, and A292S) (SI Result 8). Moreover, F261Y in pigment b increases the λmax by 10 nm, and CAVEFISH, a descendant of pigment b, should have a λmax of ≈510 nm, but the observed value is 504 nm (12), where the effect of F261Y was countered by E122I (for more details, see SI Result 8).
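The CAVEFISH argument compares an additive expectation with an observed value. The sketch below assumes a starting λmax of 501 nm for pigment b (the text gives 501–502 nm for pigments a–d and f–h) and the +10 nm effect of F261Y; the 3-nm tolerance and the function names are hypothetical choices for illustration.

```python
def additive_prediction(ancestral_nm: float, shifts_nm: list[float]) -> float:
    """Predicted lambda_max if each replacement shifts it independently."""
    return ancestral_nm + sum(shifts_nm)


def departs_from_prediction(predicted_nm: float, observed_nm: float,
                            tolerance_nm: float = 3.0) -> bool:
    """True when observed and predicted lambda_max differ by more than the tolerance."""
    return abs(observed_nm - predicted_nm) > tolerance_nm


pigment_b = 501.0          # assumed ancestral lambda_max (nm)
f261y_shift = 10.0         # effect of F261Y measured in pigment b (nm)
cavefish_observed = 504.0  # reported lambda_max of CAVEFISH (nm)

expected = additive_prediction(pigment_b, [f261y_shift])
print(expected, departs_from_prediction(expected, cavefish_observed))
```

A flagged mismatch only says that the listed replacements are not enough; according to the text, the missing factor in this lineage is E122I, which counters F261Y.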

To study the molecular basis of spectral tuning, quantum chemists analyze the interactions between the 11-cis-retinal and amino acids that are located in the retinal binding pocket, within 4.5 Å of the 11-cis-retinal (26, 27). The residue 292 is ≈4.5 Å away from the 11-cis-retinal and is very close to the functionally critical hydrogen bonded network (28). However, the tertiary structure of the bovine rhodopsin (28) shows that the residues 194 and 195 are ≈20 Å away from residue 292 and are not even in the transmembrane segments. This magnitude of λmax shift and the distance of interacting amino acids are totally unexpected.

The fourth significant observation is that the λmax shifts of rhodopsins can be cyclic during vertebrate evolution. In particular, F261Y reversed the direction of λmax shift four times (Fig. 1). If F261Y preceded E122I in the CAVEFISH lineage, then the functional reversions occurred five times.

Paleontology, Ecology, and Habitats. The ancestors of bony fish most likely used rhodopsins with λmaxs of ≈500 nm (Fig. 1). What types of light environment did these ancestors have? The origin of many early vertebrate ancestors is controversial, but that of bony fish ancestors is clear (29). The fossil records from late Cambrian and early Ordovician, ≈500 Mya, show that the ancestors of bony fish lived in shallow, near-shore marine environments (30–32). Therefore, pigment a must have functioned as a surface rhodopsin and its λmax would be consistent with that role. Interpolating from the ancestral and contemporary rhodopsins, it is most likely that pigments b–d and f–h (λmax = 501–502 nm) were also surface rhodopsins, pigment i (496 nm) was an intermediate rhodopsin, and pigments e, j, and k (480–485 nm) were deep-sea rhodopsins (Fig. 1). From their predicted λmaxs, it is also likely that pigments q, r, s, and v were intermediate rhodopsins and pigment u was a deep-sea rhodopsin (Fig. 1).

Based on the four types of dim-light vision, vertebrates show six different evolutionary paths (Fig. 1). First, surface vision has been maintained in a wide range of species, from eels to mammals. Second, the transition of surface → intermediate vision also occurred in a wide range of species. Third, many deep-sea fish have achieved the directed transitions of surface → intermediate → deep-sea vision. The three additional changes are surface → intermediate → surface vision (some squirrelfish), surface → intermediate → deep-sea → intermediate vision (some squirrelfish and Pacific viperfish), and surface → intermediate → deep-sea → red-shifted vision (shining loosejaw), all showing that the evolution of dim-light vision is reversible.

Molecular Evolution. In vertebrate rhodopsins, several amino acid replacements occurred multiple times and, furthermore, the biologically significant λmax shifts occurred on at least 18 separate occasions (Fig. 1). These observations strongly suggest that the 15 amino acid changes have undergone positive selection. To search for positively selected amino acid sites, we applied the naive-empirical-Bayes (NEB) and Bayes-empirical-Bayes (BEB) approaches of maximum-likelihood-based Bayesian method (3, 33) and parsimony method (4) to four sets of rhodopsin genes: (i) 11 squirrelfish genes; (ii) the squirrelfish rhodopsins and the bluefin killifish, cichlid, and deep-sea fish genes, excluding the coelacanth gene; (iii) the coelacanth and 9 tetrapod genes; and (iv) all 38 genes (Fig. 1).

Table 1. Results from the NEB and BEB analyses

Rhodopsins | Model* | NEB | BEB
Squirrelfish | M2a | 50, 162, 213, 214 | 50, 162, 213, 214
Squirrelfish | M8 | 37, 50, 162, 213, 214 | 37, 50, 112, 162, 213, 214, 217
Squirrelfish and other fish | M2a | 162, 212 | 162, 212
Squirrelfish and other fish | M8 | 162, 212 | 162, 212
Coelacanth and tetrapods | M2a | None | None
Coelacanth and tetrapods | M8 | None | None
All | M2a | None | None
All | M8 | None | None

Sites with P < 0.01 levels are in bold.
*The null models and other parameters are given in Table S5.

Using the parsimony method, we could not find any positively selected amino acid sites. Using the Bayesian methods, however, a total of eight positively selected sites (positions 37, 50, 112, 162, 212, 213, 214, and 217) are predicted (Table 1). The Bayesian results reveal two characteristics. First, the positively selected amino acid sites are predicted only for relatively closely related genes, involving squirrelfish genes, but they disappear as more distantly related genes are considered together. It is also surprising that none of these predicted sites coincide with those detected by mutagenesis experiments. Second, different amino acids at these predicted sites do not seem to cause any λmax shifts (SI Result 9, Tables S5–S8, and Fig. S4).

Considering 17 closely related cichlid rhodopsin genes, 26
positively selected sites have also been predicted (34). Again, none of the different amino acids at these sites seem to cause any λmax shifts and, furthermore, when we add the other 23 genes (the eel, conger, cavefish, goldfish, zebrafish, deep-sea fish, and tetrapod rhodopsin genes in Fig. 1) in the Bayesian analyses, all positively selected sites disappear (SI Result 9)! Why can positively selected sites be inferred more often in closely related genes than in distantly related genes? When nucleotide changes occur at random, the proportions of nonsynonymous and synonymous mutations are roughly 70% and 30%, respectively. Hence, under neutral evolution, or even under some purifying selection, closely related molecules can initially accumulate more nonsynonymous changes than synonymous changes. However, as evolutionary time increases, synonymous mutations will accumulate more often than nonsynonymous mutations (35). The differential rates of synonymous and nonsynonymous nucleotide substitution during evolution may explain the prediction of false positives among the relatively closely related rhodopsins. Alternatively, such inferences may be affected by the statistical procedures (36).

As we saw earlier, D83N, Y96V, Y102F, E122I, E122M, E122Q, P194R, N195A, and A292S decreased the λmax, whereas M183F, M253L, F261Y, T289G, S292I, and M317I increased it. These changes are located within or near the transmembrane segments, but most of the amino acid replacements at the remaining 191 neutral sites are scattered all over the rhodopsin molecule (Fig. 3).

rhodopsins. According to their λmaxs and light environments, rhodopsins are classified into four groups: deep-sea (~480–485 nm), intermediate (~490–495 nm), surface (~500–507 nm), and red-shifted (~525 nm) rhodopsins. Our mutagenesis results establish five fundamental features of molecular evolution that cannot be learned from the standard statistical analyses of protein sequence data.

First, mutagenesis experiments can offer critical and decisive tests of whether or not candidate amino acid changes actually cause any functional changes. Second, the same amino acid replacements do not always produce the same functional change but can be affected by the amino acid composition of the molecule. Therefore, the likelihood of parallel amino acid replacements, which may or may not result in any functional change, can overestimate the actual probability of functional adaptations (SI Result 9).

Third, similar functional changes can be achieved by different amino acid replacements; for example, D83N/A292S, P194R/N195A/A292S, and E122Q decrease the λmax by 14–20 nm (Table S4). Thus, by simply looking for parallel replacements of specific amino acids, one can fail to discover other amino acid replacements that generate the same functional change, thereby underestimating the chance of finding functional adaptations.

Fig. 3. Secondary structure of BOVINE (26) with a total of 203 naturally occurring amino acid replacements in the 38 vertebrate rhodopsins, where seven transmembrane helices are indicated…
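The roughly 70%/30% split of random nucleotide changes into nonsynonymous and synonymous mutations can be checked by enumerating every single-base change from each of the 61 sense codons. The sketch below illustrates that check and is not the authors' calculation. It assumes the standard genetic code and equal rates for all nine possible changes per codon, with no transition/transversion bias. The function name `classify_point_mutations` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations():
    """Count every single-nucleotide change from every sense codon,
    assuming all changes are equally likely (no mutational bias)."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    counts["nonsense"] += 1
                elif new_aa == aa:
                    counts["synonymous"] += 1
                else:
                    counts["missense"] += 1
    return counts


counts = classify_point_mutations()
total = sum(counts.values())
for kind, n in counts.items():
    print(f"{kind:11s} {n:4d}  {n / total:.1%}")
```

Under these assumptions the enumeration gives 134 synonymous, 392 missense and 23 nonsense changes out of 549 (about 24%, 71% and 4%). This is close to the quoted proportions. An excess of transitions over transversions would be expected to push the synonymous share somewhat higher, because many third-position transitions are silent.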


Levels of naturally occurring DNA
polymorphism correlate with
recombination rates in
D. melanogaster
David J. Begun & Charles F. Aquadro

Section of Genetics and Development, Biotechnology Building,
Cornell University, Ithaca, New York 14853-2703, USA

Two genomic regions with unusually low recombination rates in Drosophila melanogaster have normal levels of divergence but greatly reduced nucleotide diversity¹,², apparently resulting from the fixation of advantageous mutations and the associated hitch-hiking effect³,⁴. Here we show that for 20 gene regions from across the genome, the amount of nucleotide diversity in natural populations of D. melanogaster is positively correlated with the regional rate of recombination. This cannot be explained by variation in mutation rates and/or functional constraint, because we observe no correlation between recombination rates and DNA sequence divergence between D. melanogaster and its sibling species, D. simulans. We suggest that the correlation may result from genetic hitch-hiking associated with the fixation of advantageous mutants. Hitch-hiking thus seems to occur over a large fraction of the Drosophila genome and may constitute a major constraint on levels of genetic variation in nature.

Table 1 summarizes levels of DNA variation and interspecific divergence (between D. melanogaster and D. simulans) where available. These estimates of DNA polymorphism are derived from restriction site surveys with one exception (cubitus interruptus) and therefore are estimates of average levels of variation over 13 to 65 kilobases (kb) from each gene region. To explore the relationship between levels of DNA sequence variation and recombination rates we compared estimates of nucleotide diversity⁵ (π) and the coefficient of exchange⁶, a measure of recombination rate per physical distance.

[Fig. 1: scatterplot of nucleotide diversity (π, y-axis) against coefficient of exchange (x-axis)]

FIG. 1 Scatterplot of nucleotide diversity (π) versus coefficient of exchange in D. melanogaster. Autosomal and X-linked genes are represented by hatched and closed circles, respectively. To make autosomal and X-linked genes directly comparable we made the simplifying assumption of equal numbers of males and females. Then, under neutrality, nucleotide heterozygosity is an estimate of 3Nμ for X-linked genes and 4Nμ for autosomal genes (N is the effective population size and μ is the neutral mutation rate). Therefore, before doing regression or correlation analyses, we multiplied estimates of π from X-linked gene regions by four-thirds. The recombination rates estimated by the coefficient of exchange are for females. An X-linked gene region spends two-thirds of the time in females (where it can recombine) and only one-third of the time in males (where it cannot recombine), whereas an autosome spends half its time in females and half in males. Therefore, we multiplied the coefficient of exchange for autosomal genes and X-linked regions by one-half and two-thirds, respectively. Regression line is indicated by a solid line.
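The two rescalings in the Fig. 1 legend can be written down directly. This is a minimal sketch that assumes per-region values are already in hand; the `GeneRegion` record and `adjusted` function are hypothetical names. It multiplies X-linked π by 4/3 and scales the female coefficient of exchange by 2/3 for X-linked regions or by 1/2 for autosomal ones, as the legend describes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GeneRegion:
    name: str
    pi: float             # nucleotide diversity (heterozygosity per site)
    coef_exchange: float  # coefficient of exchange measured in females
    x_linked: bool


def adjusted(region: GeneRegion) -> tuple[float, float]:
    """Return (pi, coefficient of exchange) rescaled as in the Fig. 1 legend."""
    if region.x_linked:
        # X-linked pi estimates 3N*mu rather than 4N*mu, so scale by 4/3;
        # an X spends two-thirds of its time in (recombining) females.
        return region.pi * 4 / 3, region.coef_exchange * 2 / 3
    # Autosomes spend half their time in females and do not recombine in males.
    return region.pi, region.coef_exchange / 2


# Two rows from Table 1: period (X-linked) and Adh (chromosome II).
for row in (GeneRegion("per", 0.001, 0.0520, True),
            GeneRegion("Adh", 0.006, 0.0647, False)):
    pi_adj, coef_adj = adjusted(row)
    print(f"{row.name}: pi = {pi_adj:.4f}, coefficient of exchange = {coef_adj:.4f}")
```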

NATURE · VOL 356 · 9 APRIL 1992

LETTERS TO NATURE

TABLE 1 Coefficients of exchange and nucleotide heterozygosities in D. melanogaster and divergence with D. simulans

Gene region                                         Coefficient of exchange   π       Divergence   Reference

Chromosome I (X)
yellow-achaete (y, ac)                              0.0045                    0.001   0.054        1
phosphogluconate dehydrogenase gene (Pgd)           0.0154                    0.003   0.029        1
zeste-tko (z, tko)                                  0.0222                    0.004   –            12
period (per)                                        0.0520                    0.001   0.050        1
white (w)                                           0.1400                    0.009   –            13
Notch (N)                                           0.1212                    0.005   –            14
vermilion (v)                                       0.0590                    0.001   0.047        D.J.B. and C.F.A., unpublished results
forked (f)                                          0.0455                    0.002   –            15
glucose-6-phosphate dehydrogenase gene (Zw)         0.0485                    0.001   –            16
suppressor of forked (su(f))                        0.0050                    0.000   –            15

Chromosome II
sn-glycerol-3-phosphate dehydrogenase gene (Gpdh)   0.0800                    0.008   –            17
alcohol dehydrogenase gene (Adh)                    0.0647                    0.006   0.045        18, 19
DOPA decarboxylase gene (Ddc)                       0.0184                    0.005   –            C.F.A. et al., unpublished results
amylase gene (Amy)                                  0.0435                    0.008   –            20
Punch (Pu)                                          0.0718                    0.004   –            17

Chromosome III
esterase-6 gene (Est-6)                             0.0604                    0.005   –            21
metallothionein-A gene (MtnA)                       0.0083                    0.001   0.072        22
heat-shock protein 70A gene (Hsp70A)                0.0069                    0.002   0.023        23, 24
rosy (ry)                                           0.0471                    0.003   0.050        25

Chromosome IV
cubitus interruptus Dominant (ciD)                  0*                        0.000   0.050        2

Nucleotide diversity (π) is the average pairwise difference for all pairs of sequences drawn at random from a population, and can be thought of as heterozygosity per nucleotide⁵. The coefficient of exchange for a gene region was calculated by selecting two genetically defined loci²⁶ that flank the region of interest, and dividing the distance in map units between the flanking loci by the number of polytene bands between the loci⁶. The number of polytene bands between loci was determined from Bridges' maps²⁷. An important assumption underlying the use of this metric as an index of recombination rate is that over large stretches of the genome (for example, 20 to 40 polytene bands), the average amount of DNA per polytene band is roughly similar between regions. Available data suggest that this is a reasonable assumption for at least much of the Drosophila genome⁶,²⁸,²⁹.

* The recombination rate on the fourth chromosome is effectively zero³⁰.
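Both quantities defined in the table notes reduce to short formulas. The sketch below is illustrative only. For π it assumes aligned DNA sequences with no gaps; the paper's estimates come from restriction-site surveys, which use a different estimator. The map distance and polytene band count are taken as given inputs. The function names and example values are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of pairwise differences per site over all pairs of
    aligned, gap-free sequences (pi as defined in the table notes)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def coefficient_of_exchange(map_distance: float, polytene_bands: int) -> float:
    """Map units between two flanking loci divided by the number of
    polytene bands separating them (recombination per physical distance)."""
    if polytene_bands <= 0:
        raise ValueError("need a positive number of bands")
    return map_distance / polytene_bands


# Hypothetical inputs, for illustration only.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(coefficient_of_exchange(1.2, 40))
```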

Figure 1 is a scatterplot of π versus the coefficient of exchange for 20 genes in D. melanogaster. Levels of nucleotide diversity clearly increase as rates of recombination increase. Variation in recombination rates explains a large fraction of the variation in nucleotide diversity, and the null hypothesis that the slope is zero is rejected with high probability (F = 16.8, P = 0.0007). The nonparametric Spearman and Kendall rank correlations are also significantly different from zero (Spearman's D = 544, P < 0.01; Kendall's τ = 0.437, P < 0.01). The same conclusion is reached when data from the white region (which has a particularly high level of variation in D. melanogaster) are excluded from the analysis (F = 5.8, P = 0.03; Spearman's D = 544, 0.02 < P < 0.05; Kendall's τ = 0.374, 0.02 < P < 0.05).
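The statistics quoted above are a regression F test, Spearman's D and Kendall's τ. Each can be computed with a few lines of standard-library Python. This is a sketch rather than the authors' procedure, and it has these limits:

- It returns the statistics but not their P values.
- Spearman's D uses average ranks for ties, so ρ = 1 − 6D/(n³ − n) is only approximate when ties occur.
- It computes Kendall's τ-a, which does not correct for ties.

All function names are hypothetical.

```python
def ols_f_test(x, y):
    """Regress y on x; return (slope, intercept, F) with
    F = (SSR / 1) / (SSE / (n - 2)) testing slope == 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = slope * sxy                      # regression sum of squares
    sse = sum((yi - my) ** 2 for yi in y) - ssr
    return slope, intercept, ssr / (sse / (n - 2))


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_d(x, y):
    """Spearman's D (sum of squared rank differences) and rho = 1 - 6D/(n^3 - n)."""
    rx, ry = average_ranks(x), average_ranks(y)
    d = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return d, 1 - 6 * d / (n ** 3 - n)


def kendall_tau_a(x, y):
    """(concordant - discordant) / (n choose 2); tied pairs count as neither."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


x = [1, 2, 3, 4]  # toy data, not taken from Table 1
y = [2, 4, 5, 8]
print(ols_f_test(x, y), spearman_d(x, y), kendall_tau_a(x, y))
```

To apply these functions to Table 1, pass the rescaled coefficients of exchange as `x` and the rescaled π values as `y`. The output of this sketch has not been checked against the published values.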

One hypothesis to explain this trend is that gene regions in areas of reduced recombination have lower neutral mutation rates. Perhaps recombination itself is mutagenic. If this were true, then under a neutral model these gene regions should also be less diverged between species than gene regions in areas of greater recombination rates⁷. In Fig. 2 we show a plot of divergence between D. melanogaster and D. simulans versus the coefficient of exchange, including all gene regions for which we have estimates of divergence. Clearly, estimates of divergence from a larger number of gene regions are desirable. Nevertheless, the lack of a significant positive regression coefficient (F = 0.001, P = 0.983) with the available data argues against the hypothesis that gene regions in areas of low recombination rates have, on average, lower substitution rates.

© 1992 Nature Publishing Group

[Fig. 2: scatterplot of divergence (y-axis) against coefficient of exchange (x-axis)]

Theoretical results show that at the time of fixation of a neutral variant, the amount of linked neutral variation is reduced, and that the magnitude of the reduction depends on the recombination rate⁸. But at a random time (which is any time a genomic region in a population is sampled), the average amount of neutral nucleotide polymorphism is unaffected by the recombination rate⁸,⁹. Therefore, we are unable to arrive at a satisfactory neutral explanation for the patterns seen in Figs 1 and 2.

We propose that the positive correlation between DNA variation and recombination rate results from the selective fixation of advantageous mutants over a significant portion of the genome. This correlation suggests that levels of neutral variation in many of the gene regions for which variation has been measured have been reduced by one or more hitch-hiking events. Provided that a new selectively favoured mutation goes to fixation before another advantageous mutation arises close to it, each fixation will be surrounded by a 'window' of reduced polymorphism, the relative size of which is inversely proportional to the rate of recombination for that region of the genome⁴. Thus, where recombination rates are very low, each fixation will cause a wide window of reduced polymorphism, whereas in regions of higher recombination, the window will be proportionately smaller. Moving along a chromosome towards regions of progressively lower recombination, the windows become closer, and may begin to overlap substantially. Thus, regions of low recombination are 'hit' by selective sweeps more often, keeping polymorphism at a lower average level. Mutations driven to fixation by meiotic drive or biased gene conversion would have similar evolutionary consequences. Hitch-hiking does not affect interspecific divergence¹⁰, consistent with the observed lack of a correlation between DNA sequence divergence and recombination rate.

McDonald and Kreitman¹¹ proposed that patterns of synonymous and non-synonymous variation at the Adh locus in and between three Drosophila species are incompatible with neutrality. They suggested that selective fixation of amino-acid polymorphisms at this locus is the best explanation for their data. Furthermore, they speculate that selective fixations occur at a large number of loci. Our results are consistent with this view. However, the number of selectively favoured nucleotides relative to the size of the genome could still be quite small⁴.

Clearly, hitch-hiking and recombination rates do not explain all of the heterogeneity in levels of variation across the D. melanogaster genome. Variation in mutation rate and functional constraint, as well as several different forms of selection, have roles in shaping local levels of DNA sequence variation. But the analysis presented here provides the first evidence that hitch-hiking driven by selective fixation of new mutations may constrain levels of nucleotide polymorphism over large portions of the D. melanogaster genome. Inference of effective population size from levels of DNA variation may be compromised by this phenomenon.

Much effort is being expended to assemble physical and genetic maps in several species. An unexpected benefit of these genome mapping projects is that it will be possible to examine whether correlations between recombination rates and levels of DNA variation are a general phenomenon in natural populations of other taxa, including humans.

FIG. 2 Scatterplot of sequence divergence between D. melanogaster and D. simulans versus coefficient of exchange in D. melanogaster. Autosomal and X-linked genes are represented by hatched and closed circles, respectively. Coefficients of exchange for X-linked and autosomal regions are modified as described in Fig. 1 legend. Regression line is indicated by a solid line.

Received 19 December 1991; accepted 7 February 1992.

1. Begun, D. J. & Aquadro, C. F. Genetics 129, 1147–1158 (1991).
2. Berry, A. J., Ajioka, J. W. & Kreitman, M. Genetics 129, 1111–1117 (1991).
3. Maynard Smith, J. & Haigh, J. Genet. Res. 23, 23–35 (1974).
4. Kaplan, N. L., Hudson, R. R. & Langley, C. H. Genetics 123, 887–899 (1989).
5. Nei, M. Molecular Evolutionary Genetics (Columbia Univ. Press, 1987).
6. Lindsley, D. L. & Sandler, L. Phil. Trans. R. Soc. B 277, 295–312 (1977).
7. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, 1983).
8. Tajima, F. Genetics 125, 447–454 (1990).
9. Hudson, R. R. Theor. Popul. Biol. 23, 183–201 (1983).
10. Birky, C. W. Jr & Walsh, J. B. Proc. natn. Acad. Sci. U.S.A. 85, 6414–6418 (1988).
11. McDonald, J. H. & Kreitman, M. Nature 351, 652–654 (1991).
12. Aguadé, M., Miyashita, N. & Langley, C. H. Molec. Biol. Evol. 6, 123–130 (1988).
13. Miyashita, N. & Langley, C. H. Genetics 120, 199–212 (1988).
14. Schaeffer, S. W., Aquadro, C. F. & Langley, C. H. Molec. Biol. Evol. 5, 30–40 (1988).
15. Langley, C. H. in Population Biology of Genes and Molecules (eds Takahata, N. & Crow, J. F.) 75–91 (Baifukan, Japan).
16. Eanes, W. F., Ajioka, J. W., Hey, J. & Wesley, C. Molec. Biol. Evol. 6, 384–397 (1989).
17. Takano, T. S., Kusakabe, S. & Mukai, T. Genetics 129, 753–761 (1991).
18. Langley, C. H., Montgomery, E. & Quattlebaum, W. F. Proc. natn. Acad. Sci. U.S.A. 79, 5631–5635 (1982).
19. Aquadro, C. F., Deese, S. F., Bland, M. M., Langley, C. H. & Laurie-Ahlberg, C. C. Genetics 114, 1165–1190 (1986).
20. Langley, C. H. et al. Genetics 119, 619–629 (1988).
21. Game, A. Y. & Oakeshott, J. G. Genetics 126, 1021–1031 (1990).
22. Lange, B. W., Langley, C. H. & Stephan, W. Genetics 126, 921–932 (1990).
23. Leigh Brown, A. J. Proc. natn. Acad. Sci. U.S.A. 80, 5350–5354 (1983).
24. Leigh Brown, A. J. & Ish-Horowicz, D. Nature 290, 677–682 (1981).
25. Aquadro, C. F., Lado, K. M. & Noon, W. A. Genetics 119, 875–888 (1988).
26. Ashburner, M. Drosophila Genetic Maps (Drosophila Information Service 69, 1991).
27. Lindsley, D. L. & Grell, E. H. Genetic Variations of Drosophila melanogaster (Carnegie Institute, Washington DC, 1967).
28. Sorsa, V. Chromosome Maps of Drosophila (CRC, Boca Raton, Florida, 1988).
29. Merriam, J., Ashburner, M., Hartl, D. L. & Kafatos, F. C. Science 254, 221–225 (1991).
30. Hochman, B. in Genetics and Biology of Drosophila Vol. 1b (eds Ashburner, M. & Novitski, E.) 903–928 (Academic, New York, 1976).

ACKNOWLEDGEMENTS. We thank M. Nachman for comments and all members of our laboratory for discussion. This work was supported by the NIH and NSF.
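The hitch-hiking 'window' argument in this letter can be made concrete with a single-sweep approximation. The sketch below rests on an assumption that is not a result given in the letter: the commonly used deterministic approximation π/π0 ≈ 1 − (2Ns)^(−2r/s). Here r is the recombination fraction between a neutral site and a site that has just fixed an advantageous allele with selection coefficient s. The function names and parameter values are hypothetical. Converting r into base pairs shows that the window's physical width scales as 1/(recombination rate per bp). This matches the argument that sweeps leave wider windows where recombination is rare.

```python
import math


def relative_diversity_after_sweep(r: float, s: float, n_e: int) -> float:
    """Approximate pi/pi0 at recombination fraction r from a just-fixed
    advantageous allele.  ASSUMPTION: pi/pi0 = 1 - (2*N*s)**(-2*r/s)."""
    alpha = 2 * n_e * s
    return 1 - alpha ** (-2 * r / s)


def half_window_bp(rec_per_bp: float, s: float, n_e: int, level: float = 0.5) -> float:
    """Distance in bp, on one side of the selected site, within which
    diversity is reduced below `level` of its pre-sweep value."""
    alpha = 2 * n_e * s
    # Solve 1 - alpha**(-2r/s) = level for r, then convert r to base pairs.
    r = -s * math.log(1 - level) / (2 * math.log(alpha))
    return r / rec_per_bp


# Hypothetical values: s = 0.01, N = 1e6; a tenfold lower recombination
# rate gives a tenfold wider window.
for rate in (1e-8, 1e-9):
    print(f"rate {rate:g}/bp: half-window ~ {half_window_bp(rate, 0.01, 10**6):,.0f} bp")
```

This treats one isolated sweep. It ignores recurrent sweeps and the overlap of windows that the letter describes for regions of very low recombination.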


Positive and Negative Selection in the DAZ Gene Family

Article in Molecular Biology and Evolution · May 2001

DOI: 10.1093/oxfordjournals.molbev.a003831 · Source: PubMed


Joseph P. Bielawski and Ziheng Yang
Department of Biology, Galton Laboratory, University College London, London, England

Because a microdeletion containing the DAZ gene is the most frequently observed deletion in infertile men, the
DAZ gene was considered a strong candidate for the azoospermia factor. A recent evolutionary analysis, however,
suggested that DAZ was free from functional constraints and consequently played little or no role in human sper-
matogenesis. The major evidence for this surprising conclusion is that the nonsynonymous substitution rate is similar
to the synonymous rate and to the rate in introns. In this study, we reexamined the evolution of the DAZ gene
family by using maximum-likelihood methods, which accommodate variable selective pressures among sites or
among branches. The results suggest that DAZ is not free from functional constraints. Most amino acids in DAZ
are under strong selective constraint, while a few sites are under diversifying selection with nonsynonymous/
synonymous rate ratios (dN/dS) well above 1. As a result, the average dN/dS ratio over sites is not a sensible measure
of selective pressure on the protein. Lineage-specific analysis indicated that human members of this gene family
were evolving by positive Darwinian selection, although the evidence was not strong.
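The abstract's point that an average dN/dS over sites can hide positive selection is easy to illustrate. The sketch assumes a two-class site model with hypothetical proportions and ω values. It treats the gene-wide ratio as the proportion-weighted mean of the class ratios, which simplifies how dN and dS are actually pooled over sites. The function name `mean_omega` is hypothetical.

```python
def mean_omega(classes):
    """Proportion-weighted mean omega over site classes.

    `classes` is a list of (proportion, omega) pairs whose proportions
    sum to 1.  This is a simplification of how dN and dS pool over sites."""
    total_p = sum(p for p, _ in classes)
    if abs(total_p - 1) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * w for p, w in classes)


# Hypothetical site classes: most codons strongly constrained, a few with omega > 1.
classes = [(0.9, 0.05), (0.1, 3.0)]
print(mean_omega(classes))
```

With 90% of sites at ω = 0.05 and 10% at ω = 3, the average is about 0.345. That is well below 1, even though one class of sites is under diversifying selection.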

Introduction

Azoospermia is the most common form of infertil-
ity in human males (Shinka and Nakahori 1996). A lo-
cus on the human Y chromosome, the azoospermic fac-
tor (AZF), is believed to contain a gene, or genes, cru-
cial to proper differentiation of male germ cells. The
observation that microdeletions at three different loci of
AZF occur in 5%–15% of infertile men supports this
hypothesis (Ferlin et al. 1999). One of these loci (AZFc)
encodes the Deleted in AZoospermia (DAZ) gene. Be-
cause AZFc is the most frequently observed deletion in
infertile men, it was considered a strong candidate for
the azoospermia factor (Ferlin et al. 1999). Genes from
a number of different pathways, however, are required
for normal spermatogenesis (Elliott and Cooke 1997).

DAZ is located on the Y chromosome, but it is
closely related to the autosomal gene DAZL1. While
DAZL1 is present in all vertebrates, DAZ is found only
in Old World Monkeys. Thus, DAZ [Yq11.23] is believed to have evolved via translocation of DAZL1
[3p24] to the Y chromosome (Saxena et al. 1996; Grom-
oll et al. 1999) some time after the divergence of Old
and New World monkeys; Kumar and Hedges (1998)
dated that divergence to about 40 MYA. After the translocation event, DAZ underwent a series of rearrangements and a modified copy was amplified, yielding a Y gene cluster.

DAZ and DAZL1 have a functional role in fertility. Both DAZ and DAZL1 are expressed exclusively in germ cells (Cooke et al. 1996; Ruggiu et al. 1997; Gromoll et al. 1999), and in humans DAZ expression is highest in spermatogonia (Menke, Mutter, and Page 1997). Experimental elimination of DAZL1 in mice results in termination of germ cell development beyond the spermatogonial stage (Ruggiu et al. 1997). Moreover, Y-encoded human DAZ can complement the sterile phenotype of DAZL1 null mice, yielding a partial recovery of spermatogenesis, which suggests the same or similar target mRNA for DAZ and DAZL1 during spermatogenesis (Slee et al. 1999). Although the specific functions of DAZ and DAZL1 are unknown, the presence of RNA recognition motifs suggests that these genes could be involved in controlling the cell cycle switch from mitotic to meiotic cell division (Gromoll et al. 1999); this cell cycle switch is controlled by RNA-binding proteins in yeast (Watanabe et al. 1997).

Key words: DAZ, DAZL1, gene family, maximum likelihood, codon model, positive selection.

Address for correspondence and reprints: J. P. Bielawski, Department of Biology, University College London, 4 Stephenson Way, London NW1 2HE, United Kingdom. E-mail: j.bielawski@ucl.ac.uk.

Mol. Biol. Evol. 18(4):523–529. 2001
© 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Surprisingly, a recent evolutionary analysis of the
DAZ family (DAZ and DAZL1 genes) indicated a lack
of functional constraints on DAZ. Agulnik et al. (1998)
found a high rate of nonsynonymous substitution, sim-
ilar rates between exons and introns, and similar rates
among the three codon positions. They concluded that
there were no functional constraints on evolution of
DAZ and that patterns of sequence divergence were due
to neutral drift. They hypothesized that Y-linked DAZ
played little role in human spermatogenesis.

The nonsynonymous-to-synonymous rate ratio (dN/
dS) provides a sensitive measure of selective pressure on
the protein. However, when selection pressure varies
among amino acid sites, the average dN/dS ratio might
not be very informative about the evolutionary process-
es affecting the gene. The objective of this study was to
investigate the role of both purifying and positive selec-
tion on the DAZ gene family by using maximum-like-
lihood methods that accommodate differences in selec-
tive pressures among sites (Nielsen and Yang 1998;
Yang et al. 2000). Our findings indicated that DAZ was
not free of functional constraints and that other expla-
nations for its rapid rate of nonsynonymous evolution
must be considered. There has been considerable debate
as to whether rapid evolution in gene families is caused
by positive Darwinian selection after gene duplication
(Ohta 1993) or by relaxation, but not complete loss, of
functional constraints in redundant genes (Kimura 1983;
Li 1985). In the latter case, a new function might evolve
when formerly neutral substitutions convey a selective
advantage in a novel environment or genetic background
(Dykhuizen and Hartl 1980). We also examined variable
selective pressures among lineages (Yang 1998; Yang
and Nielsen 1998), and our findings suggest that both models could have played a role in the evolution of the DAZ gene family.

[Fig. 1: tree of DAZL1 (Mus, Macaca, human) and DAZ (Macaca, human); branches a–g labeled with model D ω ratios: a 0.10, b 3.47, c 0.001, d 1.44, e 0.35, f 0.35, g 1.14]

FIG. 1.—Phylogeny for the DAZ gene family. This topology was recovered from a maximum-likelihood analysis of nucleotide sequences and also from least-squares analysis of synonymous divergence. Branch lengths are proportional to the mean number of nucleotide substitutions per codon as inferred under model D: free ratios (Yang 1998). All analyses were conducted using unrooted topologies; this topology is rooted for convenience. Numbers in parentheses are branch-specific ω ratios estimated under model D.

Materials and Methods
Sequence Data

Two data sets were compiled, reflecting a trade-off between more characters and more taxa. Data set 1 was composed of 618 bp of DNA sequence (after exclusion of gaps) from five representatives of the DAZ gene family (fig. 1). Sequences of DAZL1 were from Homo sapiens (GenBank accession number U066078), Macaca mulatta (AF053608), and Mus musculus (U046694), and sequences of DAZ were from H. sapiens (NM004081) and Macaca fascicularis (AJ012216). Sequences included exons 1–6, A7, C8, and 10. Macaca fascicularis DAZ (AJ012216) contains an intragenic duplication of exons 2–6 and multiple copies of exons 7 and 8. We sampled the 3′ copy of exons 2–6, which is 99% similar to the 5′ copy. The "A" copy of exon 7 and the "C" copy of exon 8 were sampled because each predates divergence of the human and Macaca lineages (Gromoll et al. 1999). Data set 1 was used to investigate variation in selective pressure among lineages. To study variation in selective pressure among sites, however, more sequences were needed. Hence, a second data set was compiled consisting of 11 members of the DAZ gene family (fig. 2), but only 291 bp of DNA sequence. Included in data set 2 were exons 3–5 and portions of exons 2 and 6. This data set was composed almost entirely of the RNA recognition domain, which spans exons 2–5. Data set 2 included DAZL1 from Cebus apella (AF053608), H. sapiens (U066078), M. mulatta (AF053608), and Papio hamadryas (AF053607); a single copy of DAZ from H. sapiens (NM004081), Pan troglodytes (AF072324), and M. fascicularis (AJ012216); and two DAZ clones (C1 and C2) from P. hamadryas (C1: AF07230; C2: AF07321) and M. mulatta (C1: AF072322; C2: AF072323). Clones from P. hamadryas and M. mulatta were divergent copies from a multicopy DAZ array on the Y chromosome (Agulnik et al. 1998). Saxena et al. (2000) recently reported that human DAZ genes occur as a four-copy array in the AZFc region of the Y chromosome. However, they found that the four copies differed by only a single silent transition in exon 7A, so only one copy was included in our analysis.

Data Analysis

Tree topologies were estimated using maximum
likelihood (ML) under the general time-reversible
(GTR) model with a discrete gamma model (dΓ) of rate
variation among sites (Yang 1994a, 1994b). Trees also
were estimated by least-squares from synonymous di-
vergences estimated by ML under a codon model of
evolution (Goldman and Yang 1994). The PAUP* com-
puter program (Swofford 2000) was used for conducting
tree searches.

We implemented four nested models of variable selective pressures among branches (Yang 1998; Yang and Nielsen 1998). Model A was the simplest and assumed the same ω ratio for all branches. Models B and C were based on the prediction that a gene family evolves under different selective pressures following gene duplication. Model B assumed two ω ratios: one for the branch predating the translocation to the Y chromosome (fig. 1; branch a), and a second for branches postdating the translocation (branches b–g). Model C assumed three ω ratios: one for branch a, one for DAZL1 branches postdating the translocation (branches b–d), and one for all DAZ branches (branches e–g). Model D (free ratios) assumed an independent ω ratio for each branch of a topology and was employed to evaluate the potential for positive selection in any one branch of the tree.
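Models A–D differ only in how the seven branches of Fig. 1 are grouped into ω classes. That grouping fixes both the parameter count p in Table 1 and the degrees of freedom for comparing nested models. The sketch below encodes the groupings described above. It does not fit the models; that was done with codeml. The names `MODELS`, `n_omega`, `lrt_df` and `is_nested` are hypothetical.

```python
# Branch labels a-g follow Fig. 1; each model maps every branch to an omega class.
BRANCHES = "abcdefg"

MODELS = {
    "A": {b: "w0" for b in BRANCHES},                        # one ratio
    "B": {"a": "w0", **{b: "w1" for b in "bcdefg"}},         # before/after translocation
    "C": {"a": "w0", **{b: "w1" for b in "bcd"},             # DAZL1 after translocation
          **{b: "w2" for b in "efg"}},                       # DAZ branches
    "D": {b: f"w{i}" for i, b in enumerate(BRANCHES)},       # free ratios
}


def n_omega(model: str) -> int:
    """Number of free branch-specific omega parameters (p in Table 1)."""
    return len(set(MODELS[model].values()))


def lrt_df(null: str, alt: str) -> int:
    """Degrees of freedom for a likelihood ratio test of nested models."""
    return n_omega(alt) - n_omega(null)


def is_nested(null: str, alt: str) -> bool:
    """True if `null` is obtained from `alt` by setting some omegas equal,
    i.e. every omega class of `alt` lies inside a single class of `null`."""
    groups = {}
    for b in BRANCHES:
        groups.setdefault(MODELS[alt][b], set()).add(MODELS[null][b])
    return all(len(v) == 1 for v in groups.values())


for null, alt in [("A", "B"), ("B", "C"), ("B", "D")]:
    print(null, alt, is_nested(null, alt), lrt_df(null, alt))
```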

ML models (Yang and Nielsen 2000; Yang et al. 2000) also permit testing and identification of selective pressures at individual codon sites. We implemented three such models: M3 (discrete), M7 (beta), and M8 (beta&ω). M3 assumed two site classes with the proportions f0 and f1 and ratios ω0 and ω1 estimated from the data. M7 assumed that ω ratios were distributed among sites according to a beta distribution. Depending on parameters p and q, the beta distribution can take a variety of shapes within the interval (0, 1). M8, an extension of M7, added an extra class of sites having an ω parameter freely estimated from the data. Positive selection was indicated when an ω parameter of M3 or M8 was >1. The likelihood ratio test was used to compare a one-ratio model (M0) with M3 and to compare M7 with M8. If there were sites with ω > 1, Bayesian methods were used to calculate the posterior probability that a site fell into each site class; sites with high probabilities for ω > 1 were likely to be under positive Darwinian selection (Yang et al. 2000).
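The posterior probability that a codon belongs to each site class follows from Bayes' theorem. The estimated class proportions serve as priors, and the per-class likelihoods of that site's data supply the evidence. The sketch below is a minimal version of this calculation under two assumptions: the per-class site log-likelihoods have already been computed by a codon model, and the ML estimates are plugged in as if they were known. The numbers in the example are hypothetical, and `site_class_posteriors` is a hypothetical name.

```python
import math


def site_class_posteriors(log_likelihoods, proportions):
    """P(class k | site data) = p_k f_k / sum_j p_j f_j, computed on the log
    scale (log-sum-exp) so that tiny site likelihoods do not underflow."""
    log_joint = [math.log(p) + ll for p, ll in zip(proportions, log_likelihoods)]
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical M3 fit: class 0 has omega0 < 1, class 1 has omega1 > 1.
proportions = [0.85, 0.15]
site_logl = [math.log(1e-12), math.log(4e-11)]  # per-class likelihoods of one site
post = site_class_posteriors(site_logl, proportions)
print(f"P(omega > 1 class | data) = {post[1]:.3f}")
```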

All ML analyses of codon models were performed
using the codeml program of the PAML package (Yang
1999). The models employed correction for transition/
transversion rate bias and codon usage bias, features of
DNA sequence evolution that have a significant effect
on the estimation of substitution rates (Yang and Nielsen
1998, 2000).


[Fig. 2 panels. A, maximum likelihood tree from GTR+dΓ; B, least-squares tree from synonymous distances. Taxa: DAZL1 from Cebus apella, Papio hamadryas, Homo sapiens and Macaca mulatta; DAZ from Macaca fascicularis, Papio hamadryas (C1, C2), Macaca mulatta (C1, C2), Pan troglodytes and Homo sapiens; the translocation is marked on the branch separating DAZL1 from DAZ.]


FIG. 2.—Candidate topologies for the DAZ gene family in primates. A, Tree topology recovered from a maximum-likelihood analysis under
the GTR substitution matrix combined with a gamma correction for among-sites rate variation. B, Tree topology recovered from least-squares
analysis of synonymous divergence. Branch lengths are proportional to the mean number of nucleotide substitutions per codon as inferred under
model M8: beta&ω (Yang et al. 2000). All analyses were conducted using unrooted topologies; these topologies are rooted for convenience.


Results
Variable Selection Pressure Among Lineages

Phylogenetic analyses of data set 1 by different tree reconstruction methods yielded the same tree topology (fig. 1), and this topology was used to analyze variable selective pressures among lineages. Four models (A–D) were fitted by ML to data set 1 (table 1). The estimate of ω for model A (ω = 0.295) represented an average over all codon sites and branches. Model A was then compared with model B, which assumed that selective constraints changed after translocation of DAZL1 to the Y chromosome. Twice the difference in their log-likelihood scores (2δ) was compared with a χ² distribution with degrees of freedom equal to the difference between models in number of parameters. This likelihood ratio test indicated that model B provided a significantly better fit to these data (2δ = 17.8, df = 1, P = 0.000025). The dN/dS ratio after the translocation, ω1 = 0.51, is significantly higher than that prior to the translocation, ω0 = 0.10.
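The likelihood ratio tests in this section can be recomputed from the rounded log-likelihoods in Table 1. The sketch below compares 2δ with a χ² distribution. It uses the exact closed-form upper-tail probabilities for integer degrees of freedom, so it needs only Python's standard library. The dictionaries transcribe Table 1, and the function names are hypothetical.

```python
import math

# Log-likelihoods and numbers of branch-specific omegas (p) from Table 1.
LNL = {"A": -1442.44, "B": -1433.52, "C": -1432.96, "D": -1426.40}
N_PARAMS = {"A": 1, "B": 2, "C": 3, "D": 7}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df,
    using the closed forms for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p


def lrt(null: str, alt: str) -> tuple:
    """Return (2*delta, df, P) for a null model nested within alt."""
    two_delta = 2 * (LNL[alt] - LNL[null])
    df = N_PARAMS[alt] - N_PARAMS[null]
    return two_delta, df, chi2_sf(two_delta, df)


for null, alt in [("A", "B"), ("B", "C"), ("B", "D")]:
    two_delta, df, p = lrt(null, alt)
    print(f"{null} vs {alt}: 2delta = {two_delta:.2f}, df = {df}, P = {p:.2g}")
```

The log-likelihoods in Table 1 are rounded to two decimals. The results should therefore be close to, but not necessarily identical with, the reported values: 2δ = 17.8 and P = 0.000025 for A vs B, 2δ = 1.12 and P = 0.29 for B vs C, and 2δ = 14.2 and P = 0.014 for B vs D.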

Model B assumed that both autosomal DAZL1 and the copy that was translocated to the Y chromosome (DAZ) experienced the same change in selective constraints after the translocation event. This simple model was compared with a more complex model (model C) in which changes in selective constraints after the translocation were allowed to differ between DAZL1 and DAZ (table 1). Estimates under model C indicate very different ω values for DAZ and DAZL1 after duplication (table 1). However, the likelihood of model C was not significantly better than that of model B (2δ = 1.12, df = 1, P = 0.29).

Because positive selection at any one point in the
phylogeny could have affected our results, we applied

526 Bielawski and Yang

Table 1
Log-Likelihood Scores and Parameter Estimates Under Models of Variable Selection Pressures Among Lineages

Model             p   Parameters for branches                        ℓ
A: One ratio      1   ω0 = 0.295 for all branches                    −1,442.44
B: Two ratios     2   ω0 = 0.102 for branch a                        −1,433.52
                      ω1 = 0.513 for branches b, c, d, e, f, and g
C: Three ratios   3   ω0 = 0.103 for branch a                        −1,432.96
                      ω1 = 0.290 for branches b, c, and d
                      ω2 = 0.574 for branches e, f, and g
D: Free ratios    7   ω0 = 0.100 for branch a                        −1,426.40
                      ω1 = 3.474 for branch b
                      ω2 = 0.001 for branch c
                      ω3 = 1.444 for branch d
                      ω4 = 0.350 for branch e
                      ω5 = 0.355 for branch f
                      ω6 = 1.144 for branch g

NOTE. Analyses were conducted using κ as a free parameter and the F61 model of equilibrium codon frequencies. p is the number of branch-specific ω parameters. ω ratios greater than 1 are in bold. Branches are defined in figure 1.

the free-ratios model (model D) to the same data. The likelihood score under model D was significantly better than that obtained for model B (2δ = 14.2, df = 5, P = 0.014). Branches b, d, and g exhibited ω values > 1 (table 1). Use of the simpler but less realistic F3×4 model, which calculates codon frequencies by using base composition at the three codon positions, produced similar results. Note that ω for branch g was slightly less than 1 under the F3×4 model (ω6 = 0.99), whereas it was greater than 1 under the F61 model (ω6 = 1.144).
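The F3×4 model mentioned above derives each sense codon's equilibrium frequency from nucleotide frequencies counted separately at the three codon positions (nine free parameters), whereas F61 uses the observed frequency of each of the 61 sense codons directly. A minimal sketch, assuming the universal genetic code and that the product of position-specific frequencies is renormalized over sense codons; `f3x4_frequencies` is our own hypothetical helper, not part of any codon-model package, and skipping stop and ambiguous codons when counting is our choice.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal genetic code
NUCLEOTIDES = "TCAG"


def f3x4_frequencies(sequences: list[str]) -> dict[str, float]:
    """Equilibrium codon frequencies under F3x4 from in-frame coding sequences."""
    # Count nucleotides separately at codon positions 1, 2 and 3.
    counts = [Counter(), Counter(), Counter()]
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS or set(codon) - set(NUCLEOTIDES):
                continue  # skip stop codons and ambiguous characters
            for pos, base in enumerate(codon):
                counts[pos][base] += 1
    pos_freqs = []
    for c in counts:
        total = sum(c[b] for b in NUCLEOTIDES)
        pos_freqs.append({b: c[b] / total for b in NUCLEOTIDES})
    # Product of position-specific frequencies, restricted to the 61 sense codons.
    raw = {}
    for b1, b2, b3 in product(NUCLEOTIDES, repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            raw[codon] = pos_freqs[0][b1] * pos_freqs[1][b2] * pos_freqs[2][b3]
    norm = sum(raw.values())
    return {codon: f / norm for codon, f in raw.items()}


freqs = f3x4_frequencies(["ATGGCTGCAAAGTTTCCC", "ATGGCGGCTAAATTCCCG"])
print(len(freqs), round(sum(freqs.values()), 6))
```

Only 3 × 3 = 9 position-specific frequencies are estimated, which is why this model remains usable on a short alignment where the 60 free codon frequencies of F61 cannot be estimated reliably.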

Variable Selection Pressure Among Sites

Phylogenetic analysis of data set 2 under the nucleotide model GTR+dΓ recovered a topology in which divergent copies of DAZ from the same species were not monophyletic, indicating that divergent copies of DAZ originated in an early amplification event and persisted in multiple lineages (fig. 2A). This result is similar to that obtained in a previous analysis of the DAZ gene family (Agulnik et al. 1998). We also inferred a tree topology from synonymous divergences (fig. 2B). This tree, although different from the tree obtained from the nucleotide analysis, also indicated that some copies of DAZ originated from an early amplification event and persisted to the present day. Both trees also indicate a clear bifurcation between all DAZL1 and DAZ sequences, supporting the hypothesis that a single translocation event gave rise to the Y-encoded DAZ. To investigate the impact of tree topology, models of variable ω values among sites were analyzed using both topologies in figure 2 (table 2). The small size of data set 2 (291 bp; 97 codons) prevented use of the parameter-rich model of empirical codon frequencies (the F61 model), and the F3×4 model was used instead.

The discrete model (M3), which allowed two site classes with independent ω ratios, provided a significant improvement over the one-ratio model (M0) regardless of the tree topology assumed (table 3), so selective pressure is not uniform among amino acid sites. Estimates of parameters under M3 suggest that most sites (95%–97%) are under selective constraint, with ω0 = 0.35–0.37, while a few sites (3%–5%) are evolving by positive selection, with ω1 close to 6. Both models, M3 and M8, which allowed for the presence of positively selected sites, indicated that some variation in selective pressure was due to positive selection (table 2). Likelihood ratio tests indicated that these models fit the data better than models in which positively selected sites were not allowed (table 3). It is also noteworthy that regardless of model or topology, ω values for sites not subject to positive Darwinian selection were well below 1 (table 2), indicating evolution by purifying selection.

Downloaded from http://mbe.oxfordjournals.org/ by guest on October 18, 2015

Table 2
Log-Likelihood Scores and Parameter Estimates for Four Models of Variable ω's Among Sites and Two Tree Topologies

Model            Parameter estimates              Positively selected sites   ℓ
Tree 1 (fig. 2A)
M0: One ratio    ω = 0.47                         None                        −747.19
M3: Discrete     ω0 = 0.37, f0 = 0.97             26, 28, 42                  −742.72
                 ω1 = 5.66 (f1 = 0.03)
M7: Beta         p = 0.63, q = 0.78               Not allowed                 −745.84
M8: Beta&ω       p = 2.2, q = 3.1, f0 = 0.98      28                          −742.70
                 ω1 = 12.47 (f1 = 0.02)
Tree 2 (fig. 2B)
M0: One ratio    ω = 0.47                         None                        −758.47
M3: Discrete     ω0 = 0.35, f0 = 0.95             26, 28, 42, 91              −748.90
                 ω1 = 5.96 (f1 = 0.05)
M7: Beta         p = 0.30, q = 0.37               Not allowed                 −754.99
M8: Beta&ω       p = 107, q = 197, f0 = 0.95      26, 28, 42, 91              −748.91
                 ω1 = 5.70 (f1 = 0.05)

NOTE. p and q are parameters of the beta distribution. f is the proportion of sites assigned to an individual ω category or to a beta distribution with shape parameters p and q. The proportion f1 (in parentheses) is not a free parameter. Positively selected sites are those with posterior probabilities (P) > 0.50, and those with P > 0.95 are in bold.

Selection in DAZ Gene Family 527

Table 3
Likelihood Ratio Statistic (2δ) for Comparing Models of Variable ω's Among Sites

                    M3 vs. M0   M8 vs. M7
Tree 1 (fig. 2A)    8.94*       6.82*
Tree 2 (fig. 2B)    19.14*      12.16*

NOTE. See table 2 for model parameters.
* Significant at the 5% level (χ²(5%) = 5.99, df = 2).
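Under a site model with discrete ω classes such as M3, the posterior probability that a codon site belongs to the positively selected class follows from Bayes' rule: each class proportion f_k is multiplied by the site's likelihood under ω_k, and the products are normalized. The sketch below assumes the per-site log-likelihoods under each class have already been computed by a codon-model program. The class proportions are the tree 1 M3 estimates from table 2, the 0.50 and 0.95 cut-offs come from the table 2 note, and the site log-likelihoods and helper names are hypothetical. The exact posterior procedure used in the study is not described in this excerpt.

```python
import math


def class_posteriors(proportions: list[float], site_lnls: list[float]) -> list[float]:
    """Posterior probability of each omega class at one site (Bayes' rule in log space)."""
    logs = [math.log(f) + lnl for f, lnl in zip(proportions, site_lnls)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


def classify_sites(proportions, per_site_lnls, positive_class=-1, cutoff=0.50, strong=0.95):
    """Return {site: (posterior, strongly_supported)} for sites above the cutoff."""
    selected = {}
    for site, lnls in per_site_lnls.items():
        post = class_posteriors(proportions, lnls)[positive_class]
        if post > cutoff:
            selected[site] = (round(post, 3), post > strong)
    return selected


# M3 proportions for tree 1 (table 2): f0 = 0.97 (omega0 = 0.37), f1 = 0.03 (omega1 = 5.66).
F_M3 = [0.97, 0.03]
# Hypothetical per-site log-likelihoods (class 0, class 1), for illustration only.
SITE_LNLS = {1: (-8.1, -11.3), 2: (-12.0, -7.5), 3: (-13.6, -7.0), 4: (-10.0, -6.0)}

print(classify_sites(F_M3, SITE_LNLS))
```

Because the prior weight on the positively selected class is small (3%), a site needs a clearly higher likelihood under ω1 before its posterior passes 0.50.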

Agulnik et al. (1998) hypothesized that there were no functional constraints on DAZ sequences. To test this hypothesis specifically, we reanalyzed only the DAZ sequences of data set 2. The results were consistent with the previous analysis of data set 2; ω values for those sites not subject to positive Darwinian selection were well below 1 (e.g., tree 1, discrete model: ω0 = 0.43, f0 = 0.93, ω1 = 3.4, f1 = 0.07; beta&ω model: p = 98, q = 122, f0 = 0.94, ω1 = 4.1, f1 = 0.06).

Discussion

Maximum-likelihood analysis of the DAZ gene family revealed significant variation in selective pressures among lineages and among sites. The majority of sites are clearly subject to purifying selection, with the nonsynonymous rate being well below the synonymous rate. A small fraction of sites exhibit nonsynonymous rates almost six times the synonymous rate, indicating the action of positive Darwinian selection. Lineage-specific analyses indicated that following the translocation of an autosomal copy of DAZL1 to the Y chromosome, both loci experienced increased rates of nonsynonymous substitution. In DAZL1 this was due, at least in part, to early evolution by positive Darwinian selection. Later, DAZL1 of M. fascicularis returned to evolution by purifying selection, whereas DAZL1 of humans continued to evolve by positive Darwinian selection. Although
there was also an increa


A Single Determinant Dominates the Rate of Yeast Protein Evolution

D. Allan Drummond,* Alpan Raval,†‡ and Claus O. Wilke§
*Program in Computation and Neural Systems, California Institute of Technology, Pasadena; †Keck Graduate Institute, Claremont; ‡School of Mathematical Sciences, Claremont Graduate University; and §Section of Integrative Biology and Center for Computational Biology and Bioinformatics, University of Texas at Austin

A gene's rate of sequence evolution is among the most fundamental evolutionary quantities in common use, but what determines evolutionary rates has remained unclear. Here, we carry out the first combined analysis of seven predictors (gene expression level, dispensability, protein abundance, codon adaptation index, gene length, number of protein-protein interactions, and the gene's centrality in the interaction network) previously reported to have independent influences on protein evolutionary rates. Strikingly, our analysis reveals a single dominant variable linked to the number of translation events which explains 40-fold more variation in evolutionary rate than any other, suggesting that protein evolutionary rate has a single major determinant among the seven predictors. The dominant variable explains nearly half the variation in the rate of synonymous and protein evolution. We show that the two most commonly used methods to disentangle the determinants of evolutionary rate, partial correlation analysis and ordinary multivariate regression, produce misleading or spurious results when applied to noisy biological data. We overcome these difficulties by employing principal component regression, a multivariate regression of evolutionary rate against the principal components of the predictor variables. Our results support the hypothesis that translational selection governs the rate of synonymous and protein sequence evolution in yeast.

Introduction

A protein’s evolutionary rate, commonly measured by
the number of nonsynonymous substitutions per site in its
encoding gene, is routinely used to characterize functional
importance, detect selection (Nei and Kumar 2000), create
phylogenetic trees (Kurtzman and Robnett 2003), identify
orthologous genes (Wall, Fraser, and Hirsh 2003), and infer
the time of major evolutionary events. However, what
determines a protein’s evolutionary rate has remained the
subject of active speculation and ongoing research (Pál,
Papp, and Hurst 2001; Akashi 2003; Rocha and Danchin
2004).

Recently, studies have found significant influences on evolutionary rate from many disparate variables: proteins have been reported to evolve slower if their encoding genes have a higher expression level in mRNA molecules per cell (Pál, Papp, and Hurst 2001), if they have a higher codon adaptation index (CAI) (Rocha and Danchin 2004; Wall et al. 2005), more protein-protein interactions (higher “degree”) (Fraser et al. 2002), shorter length (Marais and Duret 2001), a larger fitness effect upon gene knockout (lower “dispensability”) (Hirsh and Fraser 2001; Yang, Gu, and Li 2003; Zhang and He 2005), or a more central role in interaction networks (“betweenness centrality,” or simply “centrality”) (Hahn and Kern 2005).

Here, we first demonstrate that the analytical techniques widely used to establish independent roles for many effects, partial correlation and multiple regression, generate highly significant but entirely spurious effects given noisy data such as those available for evolutionary analyses. Then, using a technique which does not suffer from these problems, we carry out a comprehensive analysis designed to uncover the major independent correlates of evolutionary rate in the model eukaryote Saccharomyces cerevisiae. We determine the number of such correlates, their strength, and their relationship to the biological variables used in previous studies. Finally, we ask what these correlates reveal about the biological constraints on protein sequence evolution.

Key words: Saccharomyces cerevisiae, evolutionary rate, gene expression, protein-protein interactions, dispensability, translational selection.

E-mail: cwilke@mail.utexas.edu.

Mol. Biol. Evol. 23(2):327–337. 2006
doi:10.1093/molbev/msj038
Advance Access publication October 19, 2005

© The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: journals.permissions@oxfordjournals.org

Materials and Methods
Genomic Data

We obtained CAIs and evolutionary rates (nonsynonymous substitutions per site dN, synonymous substitutions per site dS, adjusted synonymous substitutions dS′ [Hirsh, Fraser, and Wall 2005], and ratios dN/dS and dN/dS′) from four-way yeast species alignments for 3,036 S. cerevisiae genes (Wall et al. 2005, supporting information, Table 4). Deletion-strain growth rate data were downloaded from http://chemogenomics.stanford.edu/supplements/01yfh/files/orfgenedata.txt; the average growth rates of the homozygous deletion strains were used as dispensability measurements in our analysis. The filtered yeast interactome data set (Han et al. 2004) provided interaction network hub types for 199 genes and the number of interactions for 1,379 yeast genes. The latter data set was used to compute betweenness-centrality values, which quantify the frequency with which a network node lies on the shortest path between other nodes, as described by Hahn and Kern (2005). Genomic data for Saccharomyces paradoxus and Kluyveromyces waltii were obtained exactly as described by Drummond et al. (2005). Genome sequences for Escherichia coli K12 and Salmonella typhimurium LT2 were obtained from the Institute for Genomic Research (Peterson et al. 2001), with orthologs identified and evolutionary rates computed exactly as described (Drummond et al. 2005). Gene expression levels for E. coli measured in mRNAs per cell in Luria-Bertani (LB) and M9 media were obtained from Bernstein et al. (2002).
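Betweenness centrality, as defined above, scores each node by the fraction of shortest paths between other pairs of nodes that pass through it. A minimal sketch of Brandes' algorithm for an unweighted, undirected interaction network, using only the standard library. It is not the code used by Hahn and Kern (2005); normalization conventions differ between implementations, and here each unordered pair of nodes is counted once with no further scaling. The five-node network is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph given as adjacency sets."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from source, recording shortest-path counts (sigma).
        order, preds = [], {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from source.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph each unordered pair was counted from both endpoints.
    return {v: s / 2.0 for v, s in score.items()}


network = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C", "E"}, "E": {"D"}}
print(betweenness_centrality(network))
```

In this toy network, B and D each lie on the only shortest path of three node pairs, while C, which sits on a triangle with B and D, lies on none.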

Statistical Analysis

We used R (Ihaka and Gentleman 1996) for statistical analyses and plotting. The package ‘pls’ was used to perform principal component regression. We log transformed all variables except dispensability. We decided whether or not to log transform a variable based on whether log transformation led to a higher R². For those variables that contained zeros, we added a small constant before the log transformation, as previously suggested (Wall et al. 2005). This constant was 0.001 for dN, dS′, and dN/dS′ and 10⁻⁷ for betweenness centrality. We scaled the predictor variables to zero mean and unit variance before carrying out the principal component analysis. In all regression analyses (both against the original predictors and against the principal components), we determined the statistical significance levels by starting with the full model and successively dropping the least significant predictor until only significant predictors (P < 0.01) remained.

Downloaded from https://academic.oup.com/mbe/article/23/2/327/1118974 by guest on 16 December 2021

328 Drummond et al.
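The preprocessing described under Statistical Analysis (a log transformation with a small additive constant for variables containing zeros, followed by scaling to zero mean and unit variance) can be sketched as follows. This is an illustration, not the authors' R code. We assume the log-or-not decision compares the squared Pearson correlation of the raw and the log-transformed variable with the response (the R² of a one-predictor regression); the text does not say which R² was used. The helper names and example data are hypothetical. The constants named in the text would be passed as `constant=0.001` for dN, dS′, and dN/dS′ and `constant=1e-7` for centrality. The natural log is used, and the base does not affect R².

```python
import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_transform(values: list[float], constant: float = 0.0) -> list[float]:
    """Natural log of (x + constant); the constant handles variables containing zeros."""
    return [math.log(v + constant) for v in values]


def choose_transform(values, response, constant=0.0):
    """Return (column, used_log): keep log(x + c) only if it raises R^2 against the response."""
    logged = log_transform(values, constant)
    if pearson_r(logged, response) ** 2 > pearson_r(values, response) ** 2:
        return logged, True
    return list(values), False


def standardize(values: list[float]) -> list[float]:
    """Scale to zero mean and unit sample variance, as done before the PCA."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical example: a skewed predictor whose log is nearly linear in the response.
expression = [0.5, 2.0, 8.0, 32.0, 128.0]
rate = [0.30, 0.22, 0.15, 0.09, 0.02]
column, used_log = choose_transform(expression, rate)
z = standardize(column)
print(used_log, [round(v, 3) for v in z])
```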

Results
Correlation and Partial Correlation Analysis

We used the yeast S. cerevisiae to examine the determinants of evolutionary rate because it has been the subject of many previous analyses (e.g., Pál, Papp, and Hurst 2001; Fraser 2005) and has an enormous amount of available genomic, proteomic, and functional data. We first examined the raw correlation of six previously assessed biological variables (expression, CAI, length, dispensability, degree, and centrality) with protein evolutionary rate, as measured by the number of nonsynonymous substitutions per site in the underlying gene. A seventh variable, the number of protein molecules per cell (“abundance”), was also considered. Table 1 shows that all variables except centrality correlated significantly with evolutionary rate, as previously reported.

Expression level strongly correlates with evolutionary rate, and more highly expressed genes have higher CAIs (Akashi 2001), are less dispensable (Gu et al. 2003), more abundant (Ghaemmaghami et al. 2003), and more likely to be found in protein-protein interaction experiments (Bloom and Adami 2003) than less highly expressed genes. No inverse relationships have been posited by which these variables alter the expression level. Thus, it is imperative to establish whether these variables play a role independent of expression level. Following previous analyses (Pál, Papp, and Hurst 2003; Lemos et al. 2005; Wall et al. 2005), we computed the partial correlation of our seven variables with evolutionary rate, controlling for expression level. Table 1 shows that CAI, dispensability, and degree all showed reduced but highly significant partial correlations, consistent with previous studies (Hirsh and Fraser 2003; Wall et al. 2005), as did abundance.

Table 1
Partial Correlation Analysis of Seven Putative Determinants of Evolutionary Rate

                                        Correlation   Partial correlation
Variable X                              r(X, dN)      r(X, dN | gene expression)   VIF
Gene expression                         −0.537***     0                            2.72
CAI                                     −0.565***     −0.338***                    2.46
Protein abundance                       −0.478***     −0.232***                    2.05
Gene length                             0.136***      0.010                        1.25
Gene dispensability                     0.265***      0.183***                     1.08
Degree (number of protein-protein       −0.246***     −0.127*                      1.70
  interactions)
Protein centrality (frequency on        −0.098#       −0.082                       1.64
  node-node shortest paths)

NOTE. # P < 0.01; * P < 10⁻³; *** P < 10⁻⁹.

Partial Correlations and Noisy Data

What can we conclude from highly significant partial correlations? Yeast expression-level measurements from multiple groups, even two using the same commercial oligonucleotide array, correlated with coefficients of only 0.39–0.68 (Coghlan and Wolfe 2000), demonstrating that expression-level measurements are inaccurate and/or simply reflect the variability of gene expression across growth conditions and strains. We refer to all such variability as noise, regardless of its source. Noisy data are the rule in genome-wide molecular studies, leading us to explore what effect noise has on partial correlation analyses. As a concrete example, CAI is so tightly bound to expression level that a recent analysis used CAI as its preferred expression-level measurement (Wall et al. 2005). Might CAI's significant partial correlation only reflect our inability to control for the true (i.e., evolutionarily relevant) underlying expression level? More generally, we can ask: what is the expected partial correlation of two variables, controlling for a third, when (1) the two variables relate only through dependence on the third “master” variable and (2) all measurements contain noise?

Given these conditions, we derive explicit formulas for the expected partial correlation, its statistical significance, and its behavior under various limiting cases in the Appendix. The expected partial correlation is, in general, larger than zero because the full correlation reflects the true underlying master variable's influence, while partial correlations can only remove the portion of this influence that is visible through a noisy measurement (box 1). We show that, surprisingly, if measurements of an underlying causal variable (e.g., expression level) are noisy, highly significant partial correlations of virtually any strength between the dependent predictors can be obtained.
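To see why the expected partial correlation is generally nonzero, consider a master variable X with variance σ² and three observed variables X′ = X + ε_X′, D = X + ε_D, and K = X + ε_K, where the noise terms are independent, have zero mean, and have variances σ²_X′, σ²_D, and σ²_K. The short derivation below is our own, made under these assumptions; it is not the Appendix's formula. Writing v_i = σ² + σ²_i, every pairwise covariance equals σ², so

```latex
v_i = \sigma^2 + \sigma_i^2, \qquad
r_{ij} = \frac{\sigma^2}{\sqrt{v_i v_j}} \quad (i \neq j;\; i, j \in \{X', D, K\}),

r_{DK\mid X'} = \frac{r_{DK} - r_{DX'}\, r_{X'K}}
                     {\sqrt{\bigl(1 - r_{DX'}^2\bigr)\bigl(1 - r_{X'K}^2\bigr)}},

r_{DK} - r_{DX'}\, r_{X'K}
  = \frac{\sigma^2}{\sqrt{v_D v_K}} - \frac{\sigma^4}{v_{X'}\sqrt{v_D v_K}}
  = \frac{\sigma^2\, \sigma_{X'}^2}{v_{X'}\sqrt{v_D v_K}} .
```

The numerator is positive whenever the expression measurement carries any noise (σ²_X′ > 0) and vanishes only for a noise-free measurement. This is the sense in which a partial correlation can remove only the part of the master variable's influence that is visible through X′. With σ = 0.25 and noise SDs of 0.3, 0.7, and 0.1 (the values used in the example that follows), the expression evaluates to about 0.23.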

As a case in point, dispensability's role has been vigorously debated (Hirsh and Fraser 2003; Pál, Papp, and Hurst 2003; Wall et al. 2005) with correlation and partial correlations acting as key analytical tools. Given a model in which expression level X and noise completely determine dispensability D and evolutionary rate K (see box 1), what is the observed partial correlation r(D,K | X′) if we fit variables to approximately match the observed correlations between X′, D, and K? As a concrete example, previous reports show that, using parametric Pearson's correlations, r(X′,K) ≈ −0.6 (Pál, Papp, and Hurst 2001; Wall et al. 2005), r(D,K) ≈ 0.25 (Wall et al. 2005), r(D,X′) ≈ 0.2 (Pál, Papp, and Hurst 2003), and r(D,K | X′) ≈ 0.24 (Wall et al. 2005). We can obtain roughly the reported full correlations and r(D,K | X′) ≈ 0.23 ± 0.02, P < 10⁻⁹ with 3,000 observations if the true expression level X is normally distributed with mean 0.5 and standard deviation (SD) 0.25, and the observable predictors X′, D, and K are equal to X plus zero-mean normally distributed noise with SDs of 0.3, 0.7, and 0.1, respectively. This highly significant partial correlation is entirely spurious: in this model, expression level and random noise completely determine dispensability. Thus, the observed statistical relationship between dispensability and evolutionary rate, established by correlation and partial correlation, would arise even if no actual relationship existed except mutual dependence on noisily measured expression level.

Table 2
Results of Principal Component Regression Analysis on Seven Predictors and Five Measures of Evolutionary Rate for 568 Saccharomyces cerevisiae Genes

                        Principal components
                        1          2        3        4       5         6       7       All
Percent variance explained in
  dN                    42.76***   0.05     0.50     0.19    0.14      0.47    0.48    44.60***
  dS                    50.77***   2.13**   0.88*    0.08    6.55***   0.37    1.14*   61.92***
  dN/dS                 24.82***   0.05     0.67     0.82    0.00      0.00    0.05    26.42***
  dS′                   6.70***    0.19     7.31***  0.26    0.06      0.14    1.25#   15.92***
  dN/dS′                42.34***   0.09     0.13     0.28    0.13      0.40    0.70#   44.07***
Percent contributions
  Expression            32.8       1.2      1.5      0.1     1.2       11.2    52.1
  CAI                   28.3       3.1      8.4      0.9     2.7       17.6    39.0
  Abundance             29.2       2.0      1.6      0.3     15.4      51.4    0.1
  Length                2.0        1.1      86.4     0.0     2.1       0.3     8.2
  Dispensability        1.8        13.0     0.0      84.0    0.0       0.9     0.3
  Degree                5.0        36.7     1.9      6.2     38.9      10.9    0.4
  Centrality            0.9        42.9     0.3      8.5     39.6      7.7     0.0

NOTE. # P < 0.01; * P < 10⁻³; ** P < 10⁻⁶; *** P < 10⁻⁹. Bold indicates that the indicated predictor contributes at least 20% to the indicated component.
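The toy model for dispensability is easy to check numerically. The sketch below draws 3,000 observations of X ~ N(0.5, 0.25²), sets X′, D, and K to X plus independent normal noise with SDs of 0.3, 0.7, and 0.1 (as stated in the text), and computes the first-order partial correlation r(D,K | X′) from the three pairwise Pearson correlations. It also evaluates the population value of the same quantity. The seed and helper names are arbitrary choices of ours. As in the text, the model reproduces the magnitudes of the reported correlations; all of its correlations are positive, so it does not reproduce the negative sign of r(X′,K).

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b, controlling for c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def expected_partial(sd_x, sd_xobs, sd_d, sd_k):
    """Population r(D,K | X') when X', D and K are X plus independent noise."""
    var_x = sd_x ** 2
    v = {"X'": var_x + sd_xobs ** 2, "D": var_x + sd_d ** 2, "K": var_x + sd_k ** 2}

    def r(i, j):
        return var_x / math.sqrt(v[i] * v[j])  # every covariance equals var(X)

    return partial_r(r("D", "K"), r("D", "X'"), r("K", "X'"))


def simulate(n=3000, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0.5, 0.25) for _ in range(n)]   # true expression level X
    x_obs = [xi + rng.gauss(0, 0.3) for xi in x]   # noisy measurement X'
    d = [xi + rng.gauss(0, 0.7) for xi in x]       # dispensability D
    k = [xi + rng.gauss(0, 0.1) for xi in x]       # evolutionary rate K
    r_dk, r_dx, r_kx = pearson_r(d, k), pearson_r(d, x_obs), pearson_r(k, x_obs)
    return r_dk, r_dx, r_kx, partial_r(r_dk, r_dx, r_kx)


print("expected r(D,K | X') =", round(expected_partial(0.25, 0.3, 0.7, 0.1), 3))
print("simulated (r_DK, r_DX', r_KX', partial) =", [round(v, 3) for v in simulate()])
```

D has no direct link to K in this model, yet the simulated partial correlation lands near the population value of about 0.23.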

Multivariate Regression Analysis

Because partial correlation analysis is not applicable to the problem at hand, what other methods can we use to determine the relative influence of different predictors on the rate of evolution? One obvious choice is multivariate regression analysis, a method with the added benefit that we can look at the influence of all potential predictor variables at the same time and can eliminate step by step those predictors that contribute the least to the regression model. Indeed, several authors have followed this route (Rocha and Danchin 2004; Agrafioti et al. 2005). Regressing dN simultaneously against the seven predictors we consider here, we find that all but centrality make a significant contribution to the regression and that the overall R² = 0.45.

Unfortunately, ordinary multivariate regression is not appropriate to analyze the influence of the various predictors on the evolutionary rate either (box 1). The problem is that the predictors intercorrelate, while multivariate regression implicitly assumes that the predictors are statistically independent. This problem is widely discussed in the statistical literature, mostly in the context of “collinear” or “nearly collinear” predictors (Gunst and Mason 1977a, 1977b; Mandel 1982; Næs and Martens 1988). The variance inflation factor (VIF) may be used to quantify the degree of predictor collinearity, and table 1 reports VIFs for our data. These VIFs indicate some collinearity but are not high enough to raise serious concerns. However, for our toy model (box 1) in which the two predictors reflect the same underlying variable plus noise, the VIFs are only 1.21 in both cases, yet the analysis demonstrates that multivariate regression and partial correlation break down anyway. Collinearity and noise work together to undermine these techniques.
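The variance inflation factor of predictor j is VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. Equivalently, it is the jth diagonal element of the inverse of the predictors' correlation matrix. A minimal standard-library sketch based on that identity, using Gauss-Jordan inversion (the helper names are ours). With only two predictors the formula reduces to 1/(1 − r²), so a VIF of 1.21 corresponds to a correlation of about 0.42 between the two predictors.

```python
import math


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfectly collinear predictors)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(corr: list[list[float]]) -> list[float]:
    """VIF_j = [R^-1]_jj for a predictor correlation matrix R."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# Two predictors whose correlation gives VIF = 1.21, as in the toy model.
r = math.sqrt(1 - 1 / 1.21)
print(round(r, 3), [round(v, 3) for v in variance_inflation_factors([[1.0, r], [r, 1.0]])])
```

A low VIF only says that the observed predictors are not strongly correlated with each other. It says nothing about how much measurement noise separates each observed predictor from the variable that actually drives the response.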

Principal Component Regression Analysis

An alternative approach is to first identify the independent sources of variation in the data, and then determine the contribution of each biological predictor to each source. The technique of principal component regression offers a standard way to carry out such an analysis.

In principal component regression (Mandel 1982), multiple linear predictors (e.g., expression level and dispensability) are scaled to zero mean and unit variance, inserted in a matrix, and rotated such that the new coordinate axes point in the directions of greatest predictor variation. The new axes define variables, called principal components, which are linear combinations of the original predictors. Subsequent linear regression of the response (e.g., dN) on the rotated predictor data yields several pieces of information per principal component: the proportion of the response's variance, R², explained by the component, the significance of this R², and the fractional contribution of each original predictor to the component. Because all principal components are orthogonal (mutually uncorrelated), the total proportion of response variance explained by the data is the sum of the component R² values. Principal component regression thus circumvents the debilitating problems of partial correlation and multivariate regression analyses (box 1) while yielding results which are, in some ways, easier to interpret.
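The procedure just described can be sketched in plain Python. This is an illustration, not the R package ‘pls’ used in the study. Predictors are standardized, and their correlation matrix is diagonalized with the Jacobi eigenvalue method. Each component's R² is taken as its squared correlation with the response, which is valid because component scores are mutually uncorrelated. Each predictor's percent contribution to a component is taken as 100 times its squared loading; that convention is our assumption, chosen because it makes each component's contributions sum to 100, as they do in tables 2 and 3. The synthetic data (three noisy readouts of one hidden variable plus one unrelated predictor) are hypothetical.

```python
import math
import random


def standardize(col):
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pcr(predictors, response):
    """Per component, largest eigenvalue first: (% response variance, % contributions)."""
    z = [standardize(col) for col in predictors]
    y = standardize(response)
    n, p = len(y), len(z)
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / (n - 1) for j in range(p)] for i in range(p)]
    eigvals, vecs = jacobi_eigen(corr)
    results = []
    for k in sorted(range(p), key=lambda idx: -eigvals[idx]):
        w = [vecs[j][k] for j in range(p)]
        scores = [sum(w[j] * z[j][t] for j in range(p)) for t in range(n)]
        ss = sum(s * s for s in scores)
        r = sum(s * yt for s, yt in zip(scores, y)) / math.sqrt(ss * (n - 1))
        results.append((100 * r * r, [100 * wj * wj for wj in w]))
    return results


rng = random.Random(7)
hidden = [rng.gauss(0, 1) for _ in range(500)]
preds = [[h + rng.gauss(0, 0.5) for h in hidden] for _ in range(3)]
preds.append([rng.gauss(0, 1) for _ in range(500)])  # unrelated predictor
rate = [-h + rng.gauss(0, 1) for h in hidden]
for r2, contrib in pcr(preds, rate):
    print(f"R2 = {r2:5.1f}%  contributions = {[round(c, 1) for c in contrib]}")
```

On data like these, one component built almost equally from the three correlated predictors carries nearly all of the explained variance, which is the qualitative pattern reported for expression, CAI, and abundance.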

We carried out principal component regression on the seven predictors analyzed above. Because the determination of principal components involves only the predictors and not the response (i.e., dN or dS), there is only one set of components and contributions from biological predictors. The regression analysis generates response-specific results, in particular, the proportions of variance in dN, dS, and so on, which each component explains. Table 2 shows numerical data from the analysis of dN and dS using the seven predictors of expression, CAI, abundance, length, dispensability, degree, and centrality; figures 1a and 2a show these data graphically.

[Plots for figures 1 and 2 (panels a and b): variance explained by each principal component; x axis, principal component; bar segments labeled Expression, CAI, and Abundance.]

FIG. 1. Principal components regression on the rate of protein evolution (dN) in 568 yeast genes reveals a single dominant underlying component. (a) Of the seven principal components only one (starred) explained a statistically significant proportion of the variation in dN. This component explained 43% of the variance, while no other component explained more than 1%. Expression level, CAI, and protein abundance determined most of this dominant component (labeled), while the remaining predictors (in order from top to bottom: length, dispensability, degree, and centrality) determined <10% of the component's R². See table 2 for numerical data. (b) A larger data set (1,939 genes) excluding protein-protein interaction predictors showed the same patterns as in (a).

Strikingly, for the rate of protein evolution, dN, one principal component explained 43% of the variance with high significance, while all other components explained less than 1% (fig. 1). The single dominant component was almost entirely (>90%) determined by roughly equal contributions from three predictors: expression level, abundance, and CAI.

While the causes of dN's variation have remained unclear, dS is constrained by translational selection: selection for preferred codons, which correspond to abundant tRNAs and are translated faster and more accurately (Akashi 1994, 2001), makes many synonymous changes unfavorable and thus reduces dS (Hirsh, Fraser, and Wall 2005). Figure 2 shows that the dS results mirror those using dN: the first component, which is determined almost entirely by expression, abundance, and CAI, is overwhelmingly dominant (50.8% of dS variation). A second highly significant component of modest size (6.6% of dS variation) appears but is 88.4% determined by abundance and CAI. Astonishingly, the seven biological predictors explain a cumulative 61.9% of the total variance in dS, with three predictors (expression, abundance, and CAI) contributing roughly equal amounts and accounting for 87% of the total variance explained. Because synonymous sites are thought to be under relatively weak selection, we would expect random fluctuations (noise) to contribute a large proportion of variation in dS, yet our analysis suggests that selective pressures, even those revealed using noisy data, account for almost two-thirds of the dS variation among these genes.

The size of the seven-predictor data set (568 genes) was severely limited by the requirement for genes having measures for all seven predictors. In particular, we used high-quality interaction measurements (Han et al. 2004) for degree and betweenness centrality; eliminating these measurements, which apparently contribute negligible amounts to evolutionary rate, more than triples the data set size to 1,939 genes. We performed the same analysis on this expanded set and obtained similar results (table 3, and figs. 1b and 2b).

FIG. 2. Principal components regression on the rate of synonymous site evolution (dS) in 568 yeast genes reveals a single dominant underlying component. (a) Seven predictor variables (see text) yielded seven principal components, of which six (starred) explained a statistically significant proportion of the variation in dS. The dominant component explained 51% of the variance, while no other component explained more than 7%. See figure 1 caption for the breakdown of predictor contributions. (b) A larger data set (1,939 genes) excluding protein-protein interaction predictors showed the same patterns as in (a).

To examine the possible effects of assuming a linear
model in the regression, we repeated our analyses using
only data ranks for the predictors and each response.
The results of this nonparametric analysis were virtually un-
changed from the parametric case (data not shown), indi-
cating that little information is contained in the relative
magnitudes of the variables.
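The rank-based check described above amounts to replacing every predictor and response value by its rank before running the same regression. A small sketch of a rank transform in which tied values receive their average rank. This is a common convention, but the text does not say how ties were handled, so that choice is an assumption; the dN values are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


dn = [0.12, 0.03, 0.07, 0.03, 0.25]
print(average_ranks(dn))
```

Feeding ranked columns into any Pearson-correlation-based analysis turns each correlation into Spearman's rank correlation, which depends only on the ordering of the values and not on their magnitudes.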

It is common practice to interpret dS as the rate of selectively neutral divergence and the ratio dN/dS as the deviation of protein evolutionary rate from neutral, putatively allowing detection of purifying selection or adaptive evolution. We analyzed dN/dS and found trends that were similar to those observed in dN and dS alone (tables 2 and 3). The dominant principal component explained only half the variation in dN/dS compared to dN or dS, but the reason seems obvious in light of our results: dN and dS appear to reflect the same underlying selective force, so dividing one by the other removes much of the shared influence. In yeast, as in many other organisms, dS does not reflect neutral divergence but rather divergence constrained by translational selection for preferred codons, as previous authors have noted (Hirsh, Fraser, and Wall 2005). These authors proposed an adjusted measure of dS, denoted dS′, from which the influence of codon preference has been extracted (Hirsh, Fraser, and Wall 2005). We thus analyzed dS′ and dN/dS′ (tables 2 and 3), and found that for dS′ the dominance of the first principal component was obliterated. While two components (component 1, mostly CAI, expression, and abundance; and component 2, mostly dispensability) appeared to make small but possibly meaningful contributions (R² > 6%) in the smaller seven-predictor data set, these contributions were effectively eliminated in the larger five-predictor data set (R² < 3%), even though the major contributing predictors were still present. This sample size dependence suggests that the contributions of components 1 and 2 are artifacts. Overall, our results are consistent with the previous claim that dS′ has been purged of the influence of selection on synonymous sites (Hirsh, Fraser, and Wall 2005). As additional support, the dN/dS′ regression was nearly indistinguishable from that of dN (tables 2 and 3).

Table 3
Results of Principal Component Regression Analysis on Five Predictors and Five Measures of Evolutionary Rate for 1,939 Saccharomyces cerevisiae Genes

                        Principal components
                        1          2        3        4        5        All
Percent variance explained in
  dN                    36.94***   0.05     0.03     0.22     0.60***  37.85***
  dS                    39.33***   0.73**   0.09     1.93***  1.92***  44.01***
  dN/dS                 22.39***   0.28     0.21     0.00     0.21     23.10***
  dS′                   1.26**     2.52***  2.58***  0.00     1.54**   7.91***
  dN/dS′                37.61***   0.28     0.00     0.14     1.16**   39.20***
Percent contributions
  Expression            33.2       1.7      0.1      24.2     40.8
  CAI                   31.4       1.0      9.4      9.0      49.2
  Abundance             31.3       0.6      0.4      65.8     1.9
  Length                2.0        61.0     29.6     0.4      7.0
  Dispensability        2.1        35.7     60.5     0.6      1.1

NOTE. # P < 0.01; ** P < 10⁻⁶; *** P < 10⁻⁹. Bold indicates that the indicated predictor contributes at least 20% to the indicated component.

To assess the importance of phylogenetic distance on
our results, we carried out principal component regression
on dN and dS values calculated using two relatives of
S. cerevisiae, S. paradoxus and K. waltii, which diverged
roughly 5 and 100 MYA, respectively (Drummond et al.
2005).

For S. paradoxus, we obtained almost identical results for dN as for the data of Hirsh, Fraser, and Wall (2005). However, dS showed a much weaker, though still dominant, first component that explained 15% of the dS variance including interaction data and 6% without these data, fivefold more than any other variable. We traced the weaker dS signal to differences in gene filtering (the smaller data set of Hirsh, Fraser, and Wall (2005) omits sequences whose gene-level phylogeny did not match the species-level pattern and sequences containing introns and potential frameshifts) and in codon frequency estimates. Controlling for gene filtering, the nine-free-parameter codon frequency model used by Hirsh, Fraser, and Wall (2005) produced a larger signal than the sixty-free-parameter model used by Drummond et al. (2005), indicating that analyses of dS may be sensitive to estimation methodologies (data not shown).
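The excerpt does not name the two codon frequency models, but the parameter counts match the standard F3x4 model (three codon positions times three free nucleotide frequencies) and the F61 model (61 sense codons minus one); treat that identification as an assumption. A minimal Python sketch of how each model turns observed codons into codon frequency estimates, using a made-up toy input:

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
SENSE_CODONS = [a + b + c for a, b, c in product("TCAG", repeat=3)
                if a + b + c not in STOP_CODONS]

# Free parameters: 3 codon positions x (4 - 1) nucleotide frequencies,
# versus 61 sense-codon frequencies constrained to sum to 1.
FREE_PARAMETERS = {"F3x4": 3 * (4 - 1), "F61": len(SENSE_CODONS) - 1}


def f3x4_frequencies(codons):
    """Codon frequencies as products of position-specific nucleotide
    frequencies, renormalized over the 61 sense codons."""
    pos_freq = []
    for pos in range(3):
        counts = {base: 0 for base in "TCAG"}
        for codon in codons:
            counts[codon[pos]] += 1
        total = sum(counts.values())
        pos_freq.append({base: n / total for base, n in counts.items()})
    raw = {c: pos_freq[0][c[0]] * pos_freq[1][c[1]] * pos_freq[2][c[2]]
           for c in SENSE_CODONS}
    norm = sum(raw.values())
    return {c: f / norm for c, f in raw.items()}


def f61_frequencies(codons):
    """Codon frequencies taken directly from observed sense-codon counts."""
    counts = {c: 0 for c in SENSE_CODONS}
    for codon in codons:
        counts[codon] += 1  # raises KeyError for stop codons or non-TCAG symbols
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


codons = ["ATG", "GCT", "GCA", "AAG", "CTG", "CTG", "TTA"]  # toy input
print(FREE_PARAMETERS)
print(round(f3x4_frequencies(codons)["CTG"], 3), round(f61_frequencies(codons)["CTG"], 3))
```

Even on this toy input the two models disagree about CTG (about 0.095 under F3x4 versus about 0.286 under F61), the kind of difference that can shift dS estimates.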

For the distant relative K. waltii, we again obtained nearly identical results for dN. For the 2,412 genes without (and 752 genes with) interaction data, one principal component determined by CAI, abundance, and expression explained 41% of the variance in dN, while all other components explained <2%. For dS, no dominant component emerged, and the best component (mostly expression and CAI) explained 1.7% of the variance. The lack of any predictive signal for dS is not surprising because the dS values relative to K. waltii average more than 14 substitutions per synonymous site, far beyond the range of reliable estimation. These high dS values may result from a combination of the large amount of time separating the species, changes in synonymous pressures, and difficulties in ortholog identification and alignment. The robust dN results lend weight to the first two explanations. We expect that as even more distant relatives are analyzed, the dN results will be attenuated by noise, alignment degradation, and phenotypic changes that must, in some cases, be linked to changes in relative gene expression levels.
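Why a mean of 14 substitutions per synonymous site is beyond reliable estimation can be seen with the simplest substitution model. The dS values in this study come from codon models with fitted codon frequencies, not from Jukes-Cantor; the sketch below swaps in the one-parameter Jukes-Cantor correction only to illustrate saturation. The function names are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor substitutions per site from the uncorrected proportion
    p of differing sites; undefined at or beyond p = 0.75."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the distance diverges at saturation")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def proportion_for_distance(d):
    """Inverse of jukes_cantor_distance: expected p for a true distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Near saturation, tiny changes in p produce huge changes in the estimate.
for p in (0.10, 0.40, 0.60, 0.70, 0.74, 0.749):
    print(f"p = {p:.3f} -> d = {jukes_cantor_distance(p):.2f}")

# How close to the 0.75 ceiling would p have to be for d = 14?
print(0.75 - proportion_for_distance(14.0))
```

Under this model a true distance of 14 leaves the observed proportion of differing sites less than 1e-8 below its 0.75 ceiling, so sampling noise in any real alignment swamps the signal.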

To assess whether the trends we identified for yeast extend to other species, we examined evolutionary rates in 2,605 E. coli genes relative to S. typhimurium. Lacking global protein abundance, interaction, and dispensability data for E. coli, we used as predictors length, two measures of expression level (reflecting growth in minimal M9 and rich LB media), and two measures of codon optimization, CAI and the frequency of optimal codons Fop (Ikemura 1985). Again, a dominant component emerged, which explained 36% of the dN variance (16-fold more than any other) and 25% of the dS variance (38-fold more than any other). Because most of the included predictors are translation-oriented in some way, our results offer no conclusion as to the possible influence of other predictors in E. coli. However, the remarkable similarity to the yeast results, including the large portion of variance explained, suggests that similar selective forces have shaped evolutionary rates in this prokaryotic organism.

Analysis of Binary Variables Using Analysis of Covariance

In all the above analyses, we found that protein-protein interactions and gene dispensability showed little or no apparent influence on the rate of protein evolution (dN) and synonymous site evolution (dS), contrary to previous reports (Hirsh and Fraser 2001; Fraser et al. 2002; Fraser and Hirsh 2004; Wall et al. 2005). Perhaps these measures, as continuous predictors de


Afternoon Simulation

Vincent Brody: COPD with Spontaneous Pneumothorax

The activities below must be completed before you arrive at the simulation. Completing them is your “ticket to enter” the simulation. Please have all prework completed by Monday at 2359. Anyone who does not submit the clinical prep work will receive a failure for the simulation experience. Submitting this completed clinical document before the simulation is important so that you are prepared for the clinical day. If this prep work is not completed, you will not be allowed to participate in the simulation. Please be advised that simulations are limited, so a make-up is not an option. If the simulation is not completed, you will not meet the course objectives and will not pass this course (both lecture and clinical).

Describe the pathophysiology of a pneumothorax. What are the causes of a pneumothorax? What are the different types of pneumothorax? (Include at least five sentences with in-text citations.) (30 minutes)

1. Complete the pathophysiology diagram below for COPD, using the ATI Med/Surg ebook or your Ignatavicius Med-Surg text located in your lecture course shell. (See Chapters 28 and 30, p. 539 and pp. 637-638, in the Ignatavicius Med-Surg book for more information.) (1 hr)


References:

COPD

(define)

Health Promotion and Disease Prevention:

[Text]

Risk Factors

Document two nursing diagnoses and two goals for your client:

[Text]

[Text]

Lab Tests/Diagnostics

[Text]

Nursing Interventions

[Text]

Client Education

Medications (list only)

[Text]

Multidisciplinary Care

Possible Complications

[Text]

[Text]

[Text]



Four Evolutionary Strata on the Human X Chromosome

Author(s): Bruce T. Lahn and David C. Page

Source: Science, Oct. 29, 1999, New Series, Vol. 286, No. 5441 (Oct. 29, 1999), pp. 964-967

Published by: American Association for the Advancement of Science

Stable URL: https://www.jstor.org/stable/2899501


This content downloaded from 71.57.134.161 on Wed, 29 Dec 2021 11:10:40 UTC

All use subject to https://about.jstor.org/terms

964

mann, J. R. Ecker, Ce// 72, 427 (1993)] (23). The
largest of 16 independent NPH3 cDNAs was se­
quenced (24) completely (GenBank accession num­
ber AF180390).

11. GenBank searches were accomplished with the
gapped BLAST program [S. F. Altschul et al., Nucleic
Acid Res. 25, 3389 (1997)].

12. The data are available at www.sciencemag.org/
feature/data/ 10423 58.shl

13. Single-letter abbreviations for the amino acid resi­
dues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F,
Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn;
P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and
Y, Tyr.

14. T. Patschinsky, T. Hunter, F. S. Esch, J. A. Cooper, B. M.
Sefton, Proc. Natl. Acad. Sci. U.S.A. 79, 973 (1982).

15. The BTB/POZ domain was identified with SMART [J.
Schultz, F. Milpetz, P. Bork, C. P. Ponting, Proc. Natl.
Acad. Sci. U.S.A. 95, 5857 (1998)]. The coiled-coil
structure was identified with COILS [A. Lupas, M. Van
Dyke, J. Stock, Science 252, 1162 (1991)].

16. 0. Albagli, P. Dhordain, C. DeWeindt, G. LeCocq, D.
LePince, Cell Growth Differ. 6, 1193 (1995); L. Ara­
vind and E. V. Koonin,J. Mo/. Biol. 285, 1353 (1999).

17. C. Cohen and D. A. D. Parry, Proteins 7, 1 (1990); A.
Lupas, Trends Biochem. Sci. 21, 375 (1996).

18. Structural analyses were performed with the Protean
program (DNASTAR, Madison, WI).

19. S. Fields, Methods 5, 116 (1993); S. Fields and R.
Sternglanz, Trends Genet. 10, 286 (1994).

REPORTS

20. M. Nagao and K. Tanaka, J. Biol. Chem. 267, 17925
(1992).

21. M. C. Faux and J. D. Scott, Cell 85, 9 (1996); T.
Pawson and J. D. Scott, Science 278, 2075 (1997);
E. A. Elion, Science 281, 1625 (1998).

22. S. D. Choi, R. Creelman, J. Mullet, R. A. Wing, Weeds
World 2, 17 (1995), http://genome-www.stanford.
edu/ Arabidopsis/ww/home.html.

23. J. Sambrook, E. F. Fritsch, T. Maniatis, Molecular Clon­
ing: A laboratory Manual (Cold Spring Harbor Labo­
ratory Press, Plainview, NY, 1989).

24. Sequencing templates were prepared by polymerase
chain reaction and sequenced with an ABl377 auto­
mated sequencer (Perkin-Elmer, Norwalk, CT).

25. Phototropism and hypocotyl growth was assayed as
described previously [E. L. Stowe-Evans, R. M. Harper,
A. V. Motchoulski, E. Liscum, Plant Physio/. 118, 1265
(1998)].

26. E. Liscum and R. P. Hangarter, Plant Cell 3, 685
(1991).

27. J. W. Reed, P. Nagpal, D. S. Poole, M. Furuya, J. Chory,
Plant Cell 5, 147 (1993).

28. C. Bell and J. R. Ecker, Genomics 19, 137 (1994).
29. H.-G. Nam et at., Plant Cell 1, 699 (1989).
30. Information about markers AM40 and AM80 is available

at http:/ /www.biosci.missouri.edu/liscum/newmarkers.
html

31. Y. Nakamura et al., DNA Res. 4, 401 (1997); http://
www.kazusa.or.jp/arabi/chr5/map/24-26Mb.html.

32. Soluble and total microsomal membrane fractions

Four Evolutionary Strata on the Human X Chromosome

Bruce T. Lahn* and David C. Page†

Human sex chromosomes evolved from autosomes. Nineteen ancestral autosomal genes persist as differentiated homologs on the X and Y chromosomes. The ages of individual X-Y gene pairs (measured by nucleotide divergence) and the locations of their X members on the X chromosome were found to be highly correlated. Age decreased in stepwise fashion from the distal long arm to the distal short arm in at least four “evolutionary strata.” Human sex chromosome evolution was probably punctuated by at least four events, each suppressing X-Y recombination in one stratum, without disturbing gene order on the X chromosome. The first event, which marked the beginnings of X-Y differentiation, occurred about 240 to 320 million years ago, shortly after divergence of the mammalian and avian lineages.

The human X and Y chromosomes, like those of other animals, are thought to have evolved from an ordinary pair of autosomes (1). The pseudoautosomal regions at the termini of the X and Y chromosomes still recombine during male meiosis, ensuring X-Y nucleotide sequence identity there. Elsewhere on the X and Y chromosomes, however, X-Y recombination has been suppressed. These nonrecombining regions of the X and Y chromosomes have become highly differentiated during evolution, and only a few X-Y sequence similarities persist within them. These modern X-Y gene pairs are the remaining “fossils” where extensive sequence identity between ancestral X and Y chromosomes once existed. The recent discovery of many X-Y genes has made it possible to examine the entire group to search for patterns of human sex chromosome evolution. Thus far, the human sex chromosomes, the best characterized mammalian sex chromosomes, have been found to contain 19 X-Y gene pairs (2).

Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, MA 02142, USA.

*Present address: Department of Human Genetics, University of Chicago, 924 East 57th Street, Chicago, IL 60637, USA.
†To whom correspondence should be addressed. E-mail: dcpage@wi.mit.edu

We first compared the locations of all 19 pairs of genes on the human X and Y chromosomes (Fig. 1). We determined the relative positions of the X-linked genes through radiation hybrid analysis, in many cases confirming previously published localizations (3). Map positions of the Y-linked homologs were obtained principally from the literature (4-6). On the X chromosome, most of the X-Y genes map to the short arm, where they are concentrated toward the distal end. By contrast, the X-Y genes are
found as singletons or small clusters throughout the euchromatic portion of the Y chromosome. In general, the map order of the X-linked genes corresponds poorly to that of the Y-linked homologs. Local exceptions to this rule are provided by three small gene clusters that are present on both X and Y chromosomes (Fig. 1).

We next measured, for each of the 19 X-Y gene pairs, synonymous nucleotide divergence between the X-linked and Y-linked coding regions (7). Because synonymous substitutions do not alter the encoded protein, they are generally assumed to be nearly neutral with respect to selection. The statistic Ks (the estimated mean number of synonymous substitutions per synonymous site) is often used to gauge evolutionary time (8). In the present context, Ks values provide a measure of the evolutionary time that has elapsed since the gene pairs started differentiating into distinct X and Y forms. The calculated Ks values are given in Table 1, where gene pairs are listed according to map order on the X chromosome.

We noted that the 19 Ks values appeared to cluster into approximately four groups (Fig. 2): 0.94 to 1.25 (group 1), 0.52 to 0.58 (group 2), 0.23 to 0.36 (group 3), and 0.05 to 0.12 (group 4). Each X-Y gene pair's Ks value differed significantly from those of all gene pairs in other groups (P ≤ 0.02). The most striking observation was that, on the X chromosome, the four Ks-defined groups of genes are arranged in an orderly sequence (Fig. 2). X-Y genes are stratified by age along the length of the X chromosome. By contrast, on the Y chromosome, the Ks-defined groups appear to be scrambled (compare Table 1 and Fig. 1).
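Because Table 1 lists the gene pairs in X-chromosome map order, the stratification claim can be checked mechanically: assign each pair to a group using the Ks ranges quoted above, then confirm that the group numbers never increase along the list. A small Python sketch; the data are copied from Table 1, and the helper names are hypothetical.

```python
# Ks values copied from Table 1, in the table's order (X-chromosome map order).
KS_IN_X_ORDER = [
    ("GYG2", 0.11), ("ARSD", 0.09), ("ARSE", 0.05), ("PRKX", 0.07),
    ("STS", 0.12), ("KAL1", 0.07), ("AMELX", 0.07),
    ("TB4X", 0.29), ("EIF1AX", 0.32), ("ZFX", 0.23), ("DFFRX", 0.33),
    ("DBX", 0.36), ("CASK", 0.24), ("UTX", 0.26),
    ("UBE1X", 0.58), ("SMCX", 0.52),
    ("RPS4X", 0.97), ("RBMX", 0.94), ("SOX3", 1.25),
]

# Ks ranges quoted in the text for the four groups.
GROUP_RANGES = {1: (0.94, 1.25), 2: (0.52, 0.58), 3: (0.23, 0.36), 4: (0.05, 0.12)}


def assign_group(ks):
    """Return the group whose quoted Ks range contains ks."""
    for group, (low, high) in GROUP_RANGES.items():
        if low <= ks <= high:
            return group
    raise ValueError(f"Ks = {ks} lies outside every quoted range")


def is_stratified(groups):
    """True if group numbers never increase along the map order."""
    return all(a >= b for a, b in zip(groups, groups[1:]))


groups = [assign_group(ks) for _, ks in KS_IN_X_ORDER]
print(groups, is_stratified(groups))
```

A list in scrambled order, such as a map order in which the groups are interleaved, would generally fail the same check.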

What might account for the orderly stratification of X-Y genes by age on the human X chromosome? We hypothesize that, during evolution, differentiation of the X from the Y chromosome was initiated one region, or stratum, at a time. Regions were recruited in the order of their physical position, with stratum 1 (containing the genes of group 1) having been the first to embark on X-Y differentiation, and stratum 4 having been the most recent. Genes in the same stratum began differentiating into X and Y homologs at about the same time, accounting for their similar Ks values.

X-Y differentiation would have occurred only after X-Y recombination ceased (9). Our findings suggest that during evolution, X-Y recombination was suppressed regionally, beginning with stratum 1 and subsequently expanding in discrete steps to include strata 2, 3, and 4. Chromosomal inversions, which are known to be capable of suppressing recombination across broad regions in mammals (10), would appear to be the most likely mechanism. These inversions must have occurred on the evolving Y chromosome, where the strata have been scrambled, but not on the X chromosome, where the order of strata apparently has been preserved (Figs. 1 and 2). [Had the strata on the human X chromosome been extensively shuffled during evolution, as may have occurred on the mouse X chromosome after divergence of the human and murine lineages (11), we would have observed no correlation between the age of X-Y gene pairs and the map positions of their X-chromosomal members.] In the modern human sex chromosomes, the proximal boundary of the pseudoautosomal region is spanned by a gene that is intact on the X chromosome, but grossly interrupted on the Y chromosome (12), consistent with disruption of an ancient pseudoautosomal region by a Y-chromosomal inversion. We speculate that this particular event was the most recent in a series of inversions, each of which enabled X-Y differentiation to begin in one stratum.

This model of staged, region-by-region initiation of X-Y differentiation also accounts for two global features of the X chromosome's gene content: (i) the concentration in strata 3 and 4 of genes with detectable Y homologs (Fig. 1) and (ii) the concentration on the short arm (strata 2, 3, and 4) of genes that escape X inactivation, some with and some without Y homologs (13). Evolutionary theory predicts that once X-Y recombination ceased within a stratum, the genes on the affected portion of the Y chromosome began to decay, with most of the Y-linked genes ultimately being obliterated (1). As an adaptive response, homologous genes on the X chromosome were up-regulated, and subsequently became subject to X inactivation, processes thought to have spread during evolution on a gene-by-gene or cluster-by-cluster basis (14). If decay of Y-linked genes and adaptation of X-linked homologs were gradual evolutionary processes, then one would expect the youngest X strata to exhibit the highest densities of (i) genes with detectable Y homologs and (ii) genes that escape inactivation. Both predictions are met (Fig. 1) (13).

A comparison of the youngest (group 4) gene pairs with the older (groups 1 through 3) gene pairs illustrates certain temporal features of X-Y differentiation. We measured both synonymous and nonsynonymous substitutions for each gene pair (Table 1). Nonsynonymous substitutions alter the encoded protein and are constrained by selection. Thus, their frequency (KA, the estimated mean number of nonsynonymous substitutions per nonsynonymous site) is a function of both evolutionary time and selective constraints on the encoded proteins. The degree of constraint can be reflected in the ratio Ks/KA; values greater than one indicate the presence of constraints on both homologs, and values in the vicinity of one are consistent with lack of constraint on at least one homolog (8, 15). In groups 1 through 3, 10 of 11 gene pairs exhibit Ks/KA ratios of 3 or higher (Table 1), suggesting that natural selection has preserved the Y copies of these genes. Without such selection, these X-Y homologies (especially those in groups 1 and 2) would no longer be visible. By contrast, the seven gene pairs in group 4 show Ks/KA ratios of 1 to 2, and in five of these pairs, the Y copy is known to be a pseudogene. Among the group 4 pairs, X-Y homology is readily apparent even in the absence of selective constraint, because there has been little time for erosion of sequence similarity. Thus, the Y-chromosomal genes of the older groups, and especially those of groups 1 and 2, are survivors of an early winnowing process that is still ongoing in group 4.

Table 1. Sequence divergence between homologous X- and Y-linked genes.

Gene pair      Ks     KA     Ks/KA   DNA          Protein      Sequence
                                     divergence   divergence   compared
                                     (%)          (%)          (nucleotides)
Group 4
GYG2/GYG2P*    0.11   0.06   1.8     7            12           525
ARSD/ARSDP*    0.09   0.07   1.3     7            13           846
ARSE/ARSEP*    0.05   0.04   1.2     4            9            615
PRKX/Y         0.07   0.03   2.3     5            8            1020
STS/STSP*      0.12   0.10   1.2     11           18           852
KAL1/KALP*     0.07   0.06   1.2     6            12           1302
AMELX/Y        0.07   0.07   1.0     7            12           576
Group 3
TB4X/Y         0.29   0.04   7.3     7            7            135
EIF1AX/Y       0.32   0.01   32      9            2            432
ZFX/Y          0.23   0.04   5.8     7            7            2394
DFFRX/Y        0.33   0.05   6.6     11           9            7671
DBX/Y          0.36   0.04   9.0     12           9            1932
CASK/CASKP*    0.24   0.22   1.1     15           32           156
UTX/Y          0.26   0.08   3.3     12           15           4068
Group 2
UBE1X/Y        0.58   0.07   8.3     16           13           693
SMCX/Y         0.52   0.08   6.5     17           15           4623
Group 1
RPS4X/Y        0.97   0.05   19      18           18           792
RBMX/Y         0.94   0.25   3.8     29           38           1188
SOX3/SRY       1.25   0.19   6.6     28           29           264

*Y copy is pseudogene. DNA and protein divergence refer to uncorrected nucleotide (coding region) and amino acid divergence (nonidentity).

Fig. 1. Map of homologous genes in nonrecombining regions of human X and Y chromosomes. Pseudoautosomal regions of X and Y are black; heterochromatic region of Y is gray. Radiation hybrid analysis (3) was used to map genes on the X chromosome, which is drawn on a centiRay scale. Ks-defined strata on the X chromosome are indicated. The boundary between strata 2 and 1 is somewhere between SMCX and RPS4X; here, it is arbitrarily shown at the centromere (white oval). Genes and pseudogenes on the Y chromosome were ordered previously by analysis of naturally occurring deletions (4, 5). UBE1X has a homolog on the squirrel monkey Y chromosome but not on the human Y chromosome (29). Brackets denote three small gene clusters (labeled a, b, c) that are present on both X and Y chromosomes.

Fig. 2. Plot of Ks (Table 1) versus X-chromosome map position (Fig. 1) for 19 X-Y gene pairs.

Fig. 3. Plot of X-Y divergence time (age) versus average Ks value for X-Y gene pairs (weight-averaged) in each stratum. The X chromosome schematic is adapted from Fig. 1. Maximum and minimum age estimates for strata 2, 3, and 4 are bracketed; these are not statistical confidence intervals. Theory predicts an approximately linear relationship between age and Ks value (8); the shaded area is calibrated with respect to stratum 2, whose age is 130 to 170 million years (21) and whose average Ks value is 0.53. By extrapolation, the age of stratum 1 is estimated between 240 and 320 million years.
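The Ks/KA reasoning above can be applied to every row of Table 1 at once. A short Python sketch, using the text's cutoff of 3; the ratios are recomputed from the two-decimal Ks and KA values, so they can differ slightly from the table's Ks/KA column (for example, 7.25 rather than 7.3 for TB4X/Y). The helper name and the verdict wording are illustrative, not the authors'.

```python
# (Ks, KA) for each X-Y pair in Table 1, grouped by Ks-defined group
# (pseudogene asterisks dropped from the names).
TABLE1 = {
    4: {"GYG2/GYG2P": (0.11, 0.06), "ARSD/ARSDP": (0.09, 0.07),
        "ARSE/ARSEP": (0.05, 0.04), "PRKX/Y": (0.07, 0.03),
        "STS/STSP": (0.12, 0.10), "KAL1/KALP": (0.07, 0.06),
        "AMELX/Y": (0.07, 0.07)},
    3: {"TB4X/Y": (0.29, 0.04), "EIF1AX/Y": (0.32, 0.01), "ZFX/Y": (0.23, 0.04),
        "DFFRX/Y": (0.33, 0.05), "DBX/Y": (0.36, 0.04),
        "CASK/CASKP": (0.24, 0.22), "UTX/Y": (0.26, 0.08)},
    2: {"UBE1X/Y": (0.58, 0.07), "SMCX/Y": (0.52, 0.08)},
    1: {"RPS4X/Y": (0.97, 0.05), "RBMX/Y": (0.94, 0.25), "SOX3/SRY": (1.25, 0.19)},
}

CUTOFF = 3.0  # "Ks/KA ratios of 3 or higher", as used in the text


def classify_pairs(table, cutoff=CUTOFF):
    """Map each gene pair to (Ks/KA, verdict), recomputing the ratio from Ks and KA."""
    result = {}
    for pairs in table.values():
        for name, (ks, ka) in pairs.items():
            ratio = ks / ka
            if ratio >= cutoff:
                verdict = "Y copy apparently preserved by selection"
            else:
                verdict = "no clear sign of constraint on both copies"
            result[name] = (ratio, verdict)
    return result


for name, (ratio, verdict) in classify_pairs(TABLE1).items():
    print(f"{name:<12} Ks/KA = {ratio:5.1f}  {verdict}")
```

Every group 4 pair falls below the cutoff, and among the older groups only CASK/CASKP, whose Y copy is a pseudogene, does as well.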

To determine the age of the Ks-defined strata, we used two methods. First, we considered published information on homologs of representative genes in diverse mammals. The maximum age of stratum 4, for example, was suggested by the prior observation that homologs of STS and KAL1 are pseudoautosomal or autosomal in prosimians (16-18). Assuming that suppression of X-Y recombination is an irreversible evolutionary step (14), this implies that X-Y differentiation in stratum 4 began less than 50 million years ago (Ma), when the simian and prosimian lineages diverged (19). Minimum ages of the strata could also be inferred. For example, STS and KAL1 have been shown to have X- and Y-specific homologs in both New and Old World monkeys (16, 17), suggesting that X-Y differentiation in stratum 4 began at least 30 Ma, when the New and Old World monkey lineages diverged (19, 20). Using similar logic, we inferred the ages of stratum 3 (80 to 130 million years), stratum 2 (130 to 170 million years), and stratum 1 (130 to 350 million years) from prior data on gene homologs in more-distantly related species, including nonprimate mammals, marsupials, monotremes, and birds (21).
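The bracketing logic of this paragraph reduces to two extremes: the oldest lineage split at which a stratum is already differentiated gives a lower bound, and the youngest split at which its genes are still autosomal or pseudoautosomal gives an upper bound. A minimal Python sketch, assuming (as the text does) that suppression of recombination is irreversible; `bracket_age` is a hypothetical helper, and the example uses the stratum 4 divergence times quoted above.

```python
def bracket_age(observations):
    """Bound the time (Ma) at which X-Y differentiation began in one stratum.

    observations: (divergence_time_ma, differentiated) pairs, one per lineage.
    differentiated=True: the stratum's genes have distinct X- and Y-specific
    homologs in that lineage, so differentiation predates that split (lower bound).
    differentiated=False: the homologs are autosomal or pseudoautosomal there,
    so differentiation postdates that split (upper bound), assuming the
    suppression of X-Y recombination is irreversible.
    """
    lower = max((t for t, diff in observations if diff), default=0.0)
    upper = min((t for t, diff in observations if not diff), default=float("inf"))
    if lower > upper:
        raise ValueError("observations are inconsistent with a single, irreversible event")
    return lower, upper


# Stratum 4 (STS, KAL1): X- and Y-specific homologs in New and Old World
# monkeys (split about 30 Ma); pseudoautosomal or autosomal in prosimians
# (simian-prosimian split about 50 Ma).
print(bracket_age([(30, True), (50, False)]))
```

Adding more lineages can only tighten the interval, which is why cross-species data on more-distantly related species were needed to bracket the older strata.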

These cross-species comparisons yielded reasonably precise estimates of age for strata 2, 3, and 4 (the younger strata) but only crude estimates of age for stratum 1. Because this oldest stratum might contain information about the origins of mammalian sex chromosomes, its age is of great interest. Here, we used a second dating method, based on Ks values for X-Y gene pairs. Theory predicts that among human X-Y gene pairs, Ks values should be roughly proportional to age (8). This expectation is met by the X-Y gene pairs of strata 2, 3, and 4 (Fig. 3). By extrapolation, we estimated that X-Y differentiation began 240 to 320 Ma in stratum 1 (Fig. 3). These findings suggest that X-Y divergence began shortly after the mammalian lineage arose, having diverged from the lineage of birds (with Z-W sex chromosomes) between 300 and 350 Ma (19). [Because the sex chromosomes of birds appear to be completely unrelated to the mammalian sex chromosomes, it is thought that they arose independently, from a different autosomal pair (22).] Interestingly, our Ks findings indicate that SOX3 and SRY (the primary sex-determining gene) are among the oldest known X-Y gene pairs in humans (Table 1). This finding strengthens a hypothesis by Foster and Graves, which states that an ordinary autosomal pair became sex chromosomes when mutations fashioned one allele of SOX3, originally an autosomal gene, into the male-determining factor SRY (23). Indeed, formal cluster analysis of the Ks values we report suggests that the X-Y genes of group 1 might actually comprise two distinct strata, with SRY/SOX3 perhaps being older than the two other X-Y gene pairs of group 1 (RPS4X/Y and RBMX/Y) (24). Although the difference in Ks values between SRY/SOX3 and the two other X-Y gene pairs is not statistically significant, the evidence is suggestive.
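The Ks-based extrapolation can be sketched in a few lines. The Fig. 3 caption says only that Ks values were "weight-averaged" within each stratum; this sketch assumes the weights are the nucleotide counts in the "Sequence compared" column of Table 1, an assumption that does reproduce the caption's average Ks of 0.53 for stratum 2. Age is then taken as proportional to Ks (a line through the origin), calibrated on stratum 2 at 130 to 170 million years. Function names are hypothetical.

```python
# (Ks, nucleotides compared) for each gene pair in Table 1, grouped by stratum.
STRATA = {
    1: [(0.97, 792), (0.94, 1188), (1.25, 264)],
    2: [(0.58, 693), (0.52, 4623)],
    3: [(0.29, 135), (0.32, 432), (0.23, 2394), (0.33, 7671),
        (0.36, 1932), (0.24, 156), (0.26, 4068)],
    4: [(0.11, 525), (0.09, 846), (0.05, 615), (0.07, 1020),
        (0.12, 852), (0.07, 1302), (0.07, 576)],
}


def weighted_mean_ks(pairs):
    """Ks averaged over gene pairs, weighted by nucleotides compared (assumed weights)."""
    total = sum(length for _, length in pairs)
    return sum(ks * length for ks, length in pairs) / total


def extrapolate_age(ks, ks_ref, age_ref):
    """Scale a reference age range by ks / ks_ref (age assumed proportional to Ks)."""
    low, high = age_ref
    return ks * low / ks_ref, ks * high / ks_ref


mean_ks = {stratum: weighted_mean_ks(pairs) for stratum, pairs in STRATA.items()}
age_low, age_high = extrapolate_age(mean_ks[1], mean_ks[2], (130, 170))
print({stratum: round(value, 2) for stratum, value in mean_ks.items()})
print(f"stratum 1: about {age_low:.0f} to {age_high:.0f} million years")
```

With these inputs the stratum 1 estimate comes out at roughly 243 to 318 million years, in line with the 240 to 320 Ma reported above.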

If future studies establish that the group 1 genes are divisible into two strata, these results would also help date the emergence of X inactivation during mammalian sex chromosome evolution. XIST, an X-specific gene which plays a pivotal role in X inactivation (25), is located near RPS4X and therefore would be in the younger of the two strata, not in the stratum where the nascent X and Y chromosomes first differentiated. This would controvert the hypothesis of Chandra, who speculated that X inactivation emerged contemporaneously with the chromosomal sex-determining mechanism (26).

[Fig. 4 panels, left to right: autosome (in birds); X and Y in monotremes; X and Y in marsupials; X and Y in non-simian placental mammals; X and Y in humans.]

Fig. 4. A proposed sequence of evolutionary events that generated four strata on the human X chromosome. Four inversions on the Y chromosome are postulated. Each inversion reduced the size of the pseudoautosomal (X-Y recombining) region (black; for simplicity, only one pseudoautosomal region is shown for each chromosome) and enlarged the portions of the X (yellow) and Y (blue) chromosomes that did not recombine during male meiosis. Ongoing decay and loss of Y genes offset these periodic expansions of the nonrecombining region of the Y chromosome. Points of divergence from the sex chromosomes of other mammals are indicated. This model does not preclude the occurrence of (i) additional inversions or other rearrangements within the nonrecombining portion of the evolving Y chromosome or (ii) similar rearrangements on the evolving X chromosome, so long as they do not disturb the fundamental order among the four strata.

Consistent with our evolutionary map, Graves and colleagues have postulated that the long arm and proximal short arm of the human X chromosome are at least 170 million years old (27, 28). They have referred to this portion of the X as the “XCR” (X conserved region). Graves's XCR corresponds approximately to our strata 1 and 2. They have also postulated that the distal short arm of the human X chromosome is younger. This “XAR” (X added region) was attributed to translocation of an autosome to the pseudoautosomal region of both X and Y after divergence of placental mammals from marsupials (27, 28). Our strata 3 and 4 are found within Graves's XAR.

In conclusion, we postulate that the evolution of human sex chromosomes was punctuated by at least four events, plausibly a series of inversions on the Y chromosome (Fig. 4). Each event suppressed X-Y recombination in one stratum and enabled X-Y differentiation to proceed there. The first of these events, which created stratum 1, was roughly contemporaneous with the birth of the mammalian sex chromosomes and the emergence of SRY as the primary sex determinant. This occurred about 240 to 320 Ma, shortly after the mammalian and avian lineages diverged. The pseudoautosomal region was expanded by translocation of autosomal material between the second and third events (which created strata 2 and 3, respectively). The fourth event occurred relatively recently, during primate evolution, creating stratum 4, where X-Y differentiation is still in its earliest stages.

References and Notes
1. J. J. Bull, Evolution of Sex Determining Mechanisms (Benjamin Cummings, Menlo Park, CA, 1983); J. A. Graves, Annu. Rev. Genet. 30, 233 (1996); B. Charlesworth, Curr. Biol. 6, 149 (1996); W. R. Rice, Bioscience 46, 331 (1996).

2. The 19 X-Y gene pairs studied include the following: GYG2/GYG2P [J. Mu, A. V. Skurat, P. J. Roach, J. Biol. Chem. 272, 27589 (1997); (6)], ARSD/ARSDP, ARSE/ARSEP [G. Meroni et al., Hum. Mol. Genet. 5, 423 (1996)], PRKX/Y [A. Klink et al., Hum. Mol. Genet. 4, 869 (1995); K. Schiebel et al., Hum. Mol. Genet. 6, 1985 (1997)], STS/STSP (16), KAL1/KALP [B. Franco et al., Nature 353, 529 (1991); R. Legouis et al., Cell 67, 423 (1991); (17)], AMELX/Y [Y. Nakahori, O. Takenaka, Y. Nakagome, Genomics 9, 264 (1991)], TB4X/Y [H. Gonda et al., J. Immunol. 139, 3840 (1987); (5)], ZFX/Y [D. C. Page et al., Cell 51, 1091 (1987); A. Schneider-Gadicke, P. Beer-Romero, L. G. Brown, R. Nussbaum, D. C. Page, Cell 57, 1247 (1989)], EIF1AX/Y [T. E. Dever et al., J. Biol. Chem. 269, 3212 (1994); (5)], DFFRX/Y [M. H. Jones et al., Hum. Mol. Genet. 5, 1695 (1996); (5)], DBX/Y (5), CASK/CASKP [A. R. Cohen et al., J. Cell Biol. 142, 129 (1998); (6)], UTX/Y (5), SMCX/Y [J. Wu et al., Hum. Mol. Genet. 3, 153 (1994); A. I. Agulnik et al., Hum. Mol. Genet. 3, 879 (1994)], RPS4X/Y [E. M. Fisher et al., Cell 63, 1205 (1990)], RBMX/Y [M. Soulard et al., Nucleic Acids Res. 21, 4210 (1993); K. Ma et al., Cell 75, 1287 (1993); M. L. Delbridge, P. A. Lingenfelter, C. M. Disteche, J. A. Graves, Nature Genet. 22, 223 (1999); S. Mazeyrat, N. Saut, M. G. Mattei, M. J. Mitchell, Nature Genet. 22, 224 (1999)], SOX3/SRY [M. Stevanovic, R. Lovell-Badge, J. Collignon, P. N. Goodfellow, Hum. Mol. Genet. 2, 2013 (1993); A. H. Sinclair et al., Nature 346, 240 (1990)]. One interspecies pair was also studied: human UBE1X [P. M. Handley, M. Mueckler, N. R. Siegel, A. Ciechanover, A. L. Schwartz, Proc. Natl. Acad. Sci. U.S.A. 88, 258 (1991)] and squirrel monkey UBE1Y (29). In humans, UBE1Y was deleted from the Y chromosome (29). We used squirrel monkey UBE1Y as a substitute.

3. Using polymerase chain reaction (PCR), we tested DNAs from the 93 hybrid cell lines of the GeneBridge 4 panel (Research Genetics) [G. Gyapay et al., Hum. Mol. Genet. 5, 339 (1996)] for the presence of each of the X-linked genes. PCR conditions and primer sequences have been deposited at GenBank, where accession numbers are as follows: GYG2, G49430; ARSD, G42687; ARSE, G42688; PRKX, G42689; STS, G42690; KAL1, G42691; AMELX, G42692; TB4X, G34979; EIF1AX, G34989; ZFX, G42693; DFFRX, G34982; DBX, G34988; CASK, G49441; UTX, G34976; UBE1X, G42694; SMCX, G42695; RPS4X, AF041428; RBMX, G42696; and SOX3, G42697. Analysis of the results positioned the genes with respect to the radiation hybrid map of the X chromosome constructed at the Whitehead/MIT Center for Genome Research [T. J. Hudson et al., Science 270, 1945 (1995); www-genome.wi.mit.edu/cgi-bin/contig/phys_map].

4. D. Vollrath et al., Science 258, 52 (1992).
5. B. T. Lahn and D. C. Page, Science 278, 675 (1997).
6. C. Sun et al., Nature Genet., in press.
7. Homologous X and Y DNA sequences were aligned by means of MegAlign software (DNASTAR, Madison, WI). For each X-Y gene pair, estimates of the mean numbers of synonymous substitutions per synonymous site (Ks) and of nonsynonymous substitutions per nonsynonymous site (KA), all corrected for multiple changes, were calculated using published algorithms (8) as implemented in GCG software (Genetics Computer Group, Madison, WI). Insertions and deletions were ignored in these calculations. In the case of SOX3 and SRY, sequence similarity is limited to, and our analysis was restricted to, the HMG box domain. Our analyses of other X-Y gene pairs employed all available coding sequences. Only a partial UBE1Y (squirrel monkey) coding sequence was available for comparison with its human X homolog. Sequences for all pseudogenes were extracted from genomic sequences: GYG2P, ARSDP, and ARSEP from BAC (bacterial artificial chromosome) clone 203M13 (GenBank AC002992); STSP from BAC clone NH0494J04 (GenBank AC006382); KALP from BAC clone NH0292P09 (GenBank AC006370); CASKP from BAC clone 47511 (GenBank AC004474). Sequences for all other genes were obtained from published cDNAs, whose GenBank accession numbers are as follows: GYG2, U94362; ARSD, X83572; ARSE, X83573; PRKX, X85545; PRKY, Y15801; STS, M16505; KAL1, M97252; AMELX, M86932; AMELY, M86933; TB4X, M17733; TB4Y, AF000989; ZFX, X59739; ZFY, M30607; EIF1AX, L18960; EIF1AY, AF000987; DFFRX, X98296; DFFRY, AF000986; DBX, AF000982; DBY, AF000984; CASK, AF032119; UTX, AF000992; UTY, AF000994; UBE1X, M58028; UBE1Y, AJ003105; SMCX, L25270; SMCY, U52191; RPS4X, M58458; RPS4Y, M58459; RBMX, Z23064; RBMY, X76059; SOX


Natural Selection and the Origin of jingwei, a Chimeric Processed Functional Gene in Drosophila

Author(s): Manyuan Long and Charles H. Langley

Source: Science, Apr. 2, 1993, New Series, Vol. 260, No. 5104 (Apr. 2, 1993), pp. 91-95

Published by: American Association for the Advancement of Science

Stable URL: https://www.jstor.org/stable/2881129


Phosphoinositides could serve as specific membrane targets that bind proteins required for the formation of transport vesicles, such as the polypeptides of the adaptor complex that link clathrin to the cytoplasmic tails of certain transmembrane receptor proteins (for example, the mannose-6-phosphate receptor) (28). Vesicles from rat adipocytes that contain the glucose transporter also contain PI 4-kinase, which may regulate the transport (fusion) of these vesicles with the plasma membrane in response to insulin (29). In addition to its potential role in signaling cell proliferation, PI 3-kinase associated with receptor protein tyrosine kinases at the plasma membrane may also take part in the endocytosis and down-regulation (lysosomal degradation) of these receptors. In this way, the duration and the magnitude of the growth signal might be modulated. The association of Vps34p with the membrane appears to be mediated by the product of another VPS gene, VPS15. The VPS15 gene encodes a membrane-associated protein kinase (Vps15) (30, 31) that can be chemically cross-linked to Vps34p (21). This raises the possibility that the Vps15 and Vps34 proteins may function together as components of a signal transduction complex that regulates intracellular protein sorting decisions.

REFERENCES AND NOTES

1. L. C. Cantley et al., Cell 64, 281 (1991).
2. J. M. Backer et al., EMBO J. 11, 3469 (1992).
3. S. Soltoff, S. Rabin, L. Cantley, D. Kaplan, J. Biol. Chem. 267, 17472 (1992).
4. M. J. Berridge and R. F. Irvine, Nature 341, 197 (1989).
5. M. Whitman et al., ibid. 332, 644 (1988).
6. D. L. Lips et al., J. Biol. Chem. 264, 8759 (1989).
7. L. A. Serunian et al., ibid., p. 17809.
8. C. Carpenter et al., ibid. 265, 19704 (1990).
9. F. Shibasaki, Y. Homma, T. Takenawa, ibid. 266, 8108 (1991).
10. M. Otsu et al., Cell 65, 91 (1991).
11. I. D. Hiles et al., ibid. 70, 419 (1992).
12. C. A. Koch, D. Anderson, M. F. Moran, C. Ellis, T. Pawson, Science 252, 668 (1991).
13. P. Hu et al., Mol. Cell. Biol. 12, 981 (1992).
14. C. J. McGlade et al., ibid., p. 991.
15. P. K. Herman and S. D. Emr, ibid. 10, 6742 (1990).
16. S. Kornfeld and I. Mellman, Annu. Rev. Cell Biol. 5, 483 (1989).
17. D. J. Klionsky, P. K. Herman, S. D. Emr, Microbiol. Rev. 54, 266 (1990).
18. J. E. Rothman and L. Orci, Nature 355, 409 (1992).
19. T. Stevens et al., Cell 30, 439 (1982).
20. T. R. Graham and S. D. Emr, J. Cell Biol. 114, 207 (1991).
21. J. H. Stack, P. K. Herman, P. V. Schu, S. D. Emr, EMBO J., in press.
22. K. R. Auger, C. L. Carpenter, L. C. Cantley, L. Varticovski, J. Biol. Chem. 264, 20181 (1989).
23. G. Endemann, S. N. Dunn, L. C. Cantley, Biochemistry 26, 6845 (1987).
24. Genetics Computer Group sequence analysis package; University of Wisconsin.
25. S. K. Hanks et al., Science 241, 42 (1988).
26. D. R. Knighton et al., ibid. 253, 407 (1991).
27. M. P. Sheetz and S. J. Singer, Proc. Natl. Acad. Sci. U.S.A. 71, 4457 (1974).
28. B. M. Pearse and M. S. Robinson, Annu. Rev. Cell Biol. 6, 151 (1990).
29. R. L. Del Vecchio and P. F. Pilch, J. Biol. Chem. 266, 13278 (1991).
30. P. K. Herman et al., Cell 64, 425 (1991).
31. P. K. Herman, J. H. Stack, S. D. Emr, EMBO J. 10, 4049 (1991).
32. T. Kunkel, Proc. Natl. Acad. Sci. U.S.A. 82, 5463 (1985).
33. F. Sherman, G. R. Fink, L. W. Lawrence, Methods in Yeast Genetics: A Laboratory Manual (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1979).
34. M. Whitman et al., Nature 315, 239 (1985).
35. J. P. Walsh, K. K. Caldwell, P. W. Majerus, Proc. Natl. Acad. Sci. U.S.A. 88, 9184 (1991).
36. L. J. Wickerham, J. Bacteriol. 52, 293 (1946).
37. K. R. Auger et al., Cell 57, 167 (1989).
38. We thank members of the Emr lab for discussions and G. Huyer for his help in optimizing the PI 3-kinase assay. Supported by the Howard Hughes Medical Institute, a grant from the NSF (to S.D.E.), and by a fellowship from the Deutsche Forschungsgemeinschaft (to P.V.S.). S.D.E. is an investigator of the Howard Hughes Medical Institute.

8 October 1992; accepted 13 January 1993

Natural Selection and the Origin of jingwei, a
Chimeric Processed Functional Gene in Drosophila

Manyuan Long* and Charles H. Langley

The origin of new genes includes both the initial molecular events and subsequent population dynamics. A processed Drosophila alcohol dehydrogenase (Adh) gene, previously thought to be a pseudogene, provided an opportunity to examine the two phases of the origin of a new gene. The sequence of the processed Adh messenger RNA became part of a new functional gene by capturing several upstream exons and introns of an unrelated gene. This novel chimeric gene, jingwei, differs from its parent Adh gene in both its pattern of expression and rate of molecular evolution. Natural selection participated in the origin and subsequent evolution of this gene.

How genes with novel functions evolve remains a fundamental and fascinating question. Gene duplications (1), exon shuffling, and processed genes (2) have been suggested as important sources of novel genes, but little is known about the evolutionary mechanisms or the participation of natural selection in their early history. Our analysis of the structure, expression, and evolution of a putative processed Adh pseudogene in Drosophila provided an opportunity to examine both the early molecular events and the evolutionary processes that created it. The jingwei (jgw) gene, as we named it (3), is a locus on chromosome 3 in the Drosophila sibling species D. teissieri and D. yakuba. A part of jgw was initially observed to hybridize to the Adh probe (4). Further analysis suggested that the Adh portion of this potential pseudogene was retrotransposed, in a single event in the ancestor of the two species, from an mRNA of the Adh locus on chromosome 2 (5). To understand the molecular population genetics of this potential pseudogene, we investigated its within-species DNA sequence variation. However, our results convince us that jgw is not a pseudogene and that the Adh-derived sequence is part of a novel gene.

The Adh-derived portion was sequenced from 10 jgw alleles of D. teissieri and 20 jgw alleles of D. yakuba collected from natural populations (6). Figure 1 shows the distribution of nucleotide polymorphisms within species and the variation between species in a 765-bp segment that corresponds to the protein coding region of the Adh gene. Only one possible ancestral polymorphism was apparent (site 782). A summary of the DNA sequence variation in this region is presented in Table 1.

Section of Genetics, Section of Evolution and Ecology, and Center for Population Biology, University of California at Davis, Davis, CA 95616.

*To whom correspondence should be addressed.

SCIENCE • VOL. 260 • 2 APRIL 1993

Fig. 1. Polymorphic and divergent differences in the Adh-derived portion of jgw. The nucleotide positions are indicated in the first column, numbered according to that reported from D. yakuba (5). The second and thirteenth columns show the consensus nucleotides. The dots indicate that the nucleotides are identical to those of the consensus. The first row contains the numbers of the sequenced alleles (10 from D. teissieri and 20 from D. yakuba).

[Fig. 1 alignment: D. teissieri alleles 1 to 10 and D. yakuba alleles 1 to 20; rows are polymorphic or divergent positions 219 to 977]

Unexpectedly, most polymorphisms were silent (8 of 10 in D. teissieri and 19 of 21 in D. yakuba). If jgw were a pseudogene in which mutations had no phenotypic effect, most changes would be at replacement sites (which make up most of the sites in a coding sequence), and there would be a reasonable frequency of stop codons (7). No new stop codons were present. Another prediction of the pseudogene hypothesis is that a pseudogene has a higher overall level of nucleotide variation. Estimates of DNA sequence polymorphism in jgw were similar to those found in the Adh gene (Table 1) and are typical of many functional genes in Drosophila (≈0.005) (8). A comparison of between-species divergence also revealed a significant bias toward silent versus replacement substitutions, although the degree of bias was smaller than for the within-species comparison. The bias was noted in the earlier study (5) but by itself was not large enough to convince the authors that this was not a pseudogene. In addition, insertions and deletions are abundant in the evolution of mammalian pseudogenes (7), whereas in jgw no length polymorphism or divergence was observed in the coding region (9). These molecular population genetic observations imply that the Adh-derived sequence is all or part of jgw, a new functional gene in D. teissieri and D. yakuba. This conclusion motivated our search for an RNA from jgw and a more detailed analysis of its structure.
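The argument above turns on whether each difference between alleles changes the encoded amino acid, creates a stop codon, or leaves the protein unchanged. The following is a minimal Python sketch of that classification, assuming aligned, in-frame coding sequences and the standard genetic code; the function names are hypothetical, and codons that differ at more than one position are simply skipped here rather than resolved along mutational pathways as published counting methods do.

```python
# Standard genetic code: amino acids for codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_difference(codon_a: str, codon_b: str) -> str:
    """Label a single-base codon difference as silent, replacement, or nonsense."""
    codon_a, codon_b = codon_a.upper(), codon_b.upper()
    if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
        raise ValueError("codons must differ at exactly one position")
    aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
    if aa_a == aa_b:
        return "silent"
    if "*" in (aa_a, aa_b):
        return "nonsense"  # the change creates or removes a stop codon
    return "replacement"


def count_changes(seq_a: str, seq_b: str) -> dict:
    """Tally single-difference codons between two aligned, in-frame sequences.

    Codons differing at two or three positions are skipped in this sketch.
    """
    counts = {"silent": 0, "replacement": 0, "nonsense": 0}
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if sum(x != y for x, y in zip(ca, cb)) == 1:
            counts[classify_difference(ca, cb)] += 1
    return counts


# Toy alleles: CTG->CTC (Leu->Leu), GAT->GAA (Asp->Glu), TGG unchanged.
print(count_changes("CTGGATTGG", "CTCGAATGG"))
# {'silent': 1, 'replacement': 1, 'nonsense': 0}
```

A pseudogene evolving without constraint would be expected to show mostly replacement and some nonsense counts in a tally like this; the jgw alleles instead show mostly silent changes and no new stops.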

Northern (RNA) analysis (10) of both total and polyadenylate [poly(A)]-selected RNAs from adults probed with jgw yielded only a single band that corresponded in size to that derived from the Adh gene (11). Because of the sequence similarity between the two genes and the abundance of the Adh mRNA, this result was not surprising. Two interpretations are possible. (i) The jgw mRNA may be detectably abundant but obscured by the Adh signal if its size is similar to that of Adh. (ii) The jgw mRNA may be so rare (or nonexistent) that it was undetectable by the Northern technique.

We used the polymerase chain reaction (PCR) in conjunction with reverse transcription (RT-PCR) (12) to identify RNAs with greater sensitivity and specificity than is possible with Northern analysis. Several regions can be found in which Adh and the Adh-derived portion of jgw differ by two or three contiguous substitutions. Under optimal PCR conditions, primers that contain these substitutions at their 3′ ends specifically amplified jgw RNA. A jgw-specific amplification product was obtained from total RNA extracted from both species. Sequencing these products confirmed the characteristic substitutions of the jgw gene in the 571-bp amplified fragment from D. teissieri and the 321-bp amplified fragment from D. yakuba (13). These results demonstrated the presence of jgw-derived RNA in both species.

Table 1. Summary of the variation within and between species. The numbers of segregating (or polymorphic) sites are classified as replacement (R) and silent (S). Estimates of θ and π are measures of within-population variation in terms of the parameter 4Nµ per nucleotide, assuming the neutral theory of molecular evolution (20), where N is the effective population size and µ is the mutation rate for selectively equivalent nucleotides. The values of divergence between species (average divergence per nucleotide) are the results of averaging pairwise comparisons between the alleles from the two species minus average within-species differences, subject to the multiple substitution correction according to the Jukes and Cantor model (28). The standard deviations of θ and divergence estimates were calculated as described (29). Adh polymorphism data (19 of all the samples) for D. yakuba and D. teissieri Adh sequences are from (5, 19).

                      Segregating sites (n)
Gene                  R        S        θ                π
Polymorphism within species
D. teissieri jgw      2        8        0.005 ± 0.003    0.005
D. yakuba jgw         2        19       0.008 ± 0.003    0.006
D. yakuba Adh         0        18       0.006 ± 0.003    0.006

Divergence between species
Gene                  R                  S
jgw                   0.041 ± 0.009      0.127 ± 0.027
Adh                   0.007 ± 0.004      0.059 ± 0.018
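The θ column of Table 1 can be checked against the segregating-site counts. The sketch below assumes θ is Watterson's estimator, S / (a_n × L) with a_n = 1 + 1/2 + ... + 1/(n − 1), and takes L = 765 bp (the segment described for Fig. 1); both are assumptions, not statements of the authors' exact procedure. Under them the counts round to the tabulated 0.005 and 0.008. The π column needs the allele sequences themselves, which are not reproduced here, so `nucleotide_diversity` (a hypothetical helper) is shown only on toy alleles.

```python
from itertools import combinations


def watterson_theta(segregating_sites: int, n_alleles: int, length: int) -> float:
    """Watterson's estimate of 4N*mu per nucleotide: S / (a_n * L),
    where a_n = 1 + 1/2 + ... + 1/(n - 1)."""
    a_n = sum(1.0 / i for i in range(1, n_alleles))
    return segregating_sites / (a_n * length)


def nucleotide_diversity(alleles: list) -> float:
    """pi: mean number of pairwise differences per site among aligned alleles."""
    length = len(alleles[0])
    pairs = list(combinations(alleles, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / (len(pairs) * length)


# Table 1 counts (R + S segregating sites), assuming L = 765 bp.
print(round(watterson_theta(2 + 8, 10, 765), 3))   # D. teissieri jgw: 0.005
print(round(watterson_theta(2 + 19, 20, 765), 3))  # D. yakuba jgw: 0.008
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # toy alleles only
```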

The single band in the 5′ rapid amplification of cDNA ends (RACE) product and the identity of the two sequenced clones from this product indicated that jgw RNA has a distinct 5′ end in D. teissieri (14). In both species, jgw appears to have captured more than 180 bp of additional exon (or exons) in the 5′ direction (Fig. 2). The length of the 3′ end of D. teissieri jgw RNA is similar to that of Adh RNA. Figure 2 also shows the sequence of these products. By extending the Adh reading frame in the 5′ direction, the D. teissieri jgw appears to have acquired a new start codon and potentially encodes a hybrid protein with an additional 77 amino acids added to the Adh-derived protein. A preliminary analysis of jgw in D. yakuba indicated that it has a similar structure.
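The 5′ extension of the reading frame described above can be illustrated with a simple scan: start from a codon already known to be in frame, walk upstream three bases at a time, and keep the most upstream ATG reached before an in-frame stop codon. This is a simplified sketch on a hypothetical toy transcript, not the jgw sequence; `extend_orf_upstream` is an illustrative name, and the scan ignores alternative start codons and any uncertainty in the spliced 5′ structure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_orf_upstream(seq: str, in_frame_pos: int):
    """Scan 5' from a codon known to be in frame.

    Returns the index of the most upstream in-frame ATG reached before an
    in-frame stop codon (or the start of the sequence), or None if no ATG
    is found upstream of in_frame_pos.
    """
    seq = seq.upper()
    best = None
    pos = in_frame_pos - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the open frame cannot extend past an in-frame stop
        if codon == "ATG":
            best = pos
        pos -= 3
    return best


# Toy transcript: stop, ATG, GCC, GCC, then the original (hypothetical) start at index 12.
toy = "TAA" + "ATG" + "GCC" + "GCC" + "ATG" + "AAA"
new_start = extend_orf_upstream(toy, 12)
added_codons = (12 - new_start) // 3 if new_start is not None else 0
print(new_start, added_codons)  # 3 3
```

Applied to a real transcript, the number of added codons from such a scan corresponds to the extra amino acids at the N terminus of the hybrid protein.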

Fig. 2. Structure of the jgw transcription unit (13). (A) The structure of jgw deduced from the comparison of the sequenced D. teissieri RNA and genomic DNA. The four boxes show the four exons in jgw, and the solid and hatched portions of the boxes are the putative protein coding regions, Adh-derived and recruited, respectively. The open boxes are the putative untranslated portions. The lines among boxes are the three introns. The exon structure of the original Adh gene is also given, based on (5). The gray portions in four Adh exons are fused into one exon in the jgw gene. (B) Sequences of the 5′ regions of jgw from D. teissieri and D. yakuba. Introns are shown in lowercase letters. The codon ATG underlined in the D. teissieri sequence is the hypothetical start codon. The 3′ region of D. teissieri jgw is the same as that of the published sequence (5). An apparent polyadenylation signal, AATAAA, is at positions 1123 to 1128 of D. teissieri.

[Fig. 2 panels: (A) exon maps of jingwei and Adh, with the captured region and the ATG start marked; (B) 5′ portion sequences of jgw from D. teissieri and D. yakuba, annotated exons 1 to 4 and introns 1 to 3]

To determine the structure of the genomic region (or regions) containing the exon (or exons) from which this new mRNA is derived, we carried out PCR-based genomic sequencing. Three introns (and three exons) were found 5′ to the Adh-derived exon in both species (Fig. 2). The sizes and positions of the three introns are similar in the two species, and the standard intron-splicing signal (GT-AG) was found in each case except for the D. teissieri second intron (GT-AT).
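Because Fig. 2B marks introns in lowercase, the splice-signal check described above can be sketched mechanically: split the annotated sequence into uppercase (exon) and lowercase (intron) runs, then read the first and last two bases of each intron. The toy sequence below is hypothetical and is not the jgw sequence; the function names are illustrative only.

```python
import re


def split_exons_introns(annotated: str) -> list:
    """Split a sequence written like Fig. 2B (exons uppercase, introns
    lowercase) into ordered (kind, segment) runs; other characters are ignored."""
    runs = re.findall(r"[A-Z]+|[a-z]+", annotated)
    return [("exon" if run[0].isupper() else "intron", run) for run in runs]


def splice_signal(intron: str) -> str:
    """First two and last two intron bases, e.g. 'GT-AG'."""
    intron = intron.upper()
    return f"{intron[:2]}-{intron[-2:]}"


# Hypothetical toy sequence with one canonical and one GT-AT intron.
toy = "ATGCAGgtaagtttcagGCCAAGgtgagtctcatTGGTAA"
for kind, segment in split_exons_introns(toy):
    if kind == "intron":
        print(segment, splice_signal(segment))
```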

Because the structure of jgw is similar in both species, it seems likely that the inserted exons of Adh recruited the observed three 5′ exons (and introns) rather quickly. This interpretation is supported by the observation that no silent substitution accumulated in the Adh-derived portion of jgw before speciation, although several replacement substitutions were common to jgw in both species. This suggests that there was relatively little time between the insertion event and the subsequent molecular evolutionary events: the recruitment of the additional 5′ components into a functional transcription unit, the substitution of several amino acids, and the speciation.

In spite of similarities of structure and early common ancestry, jgw appears to have evolved distinct patterns of expression in the two species. Figure 3 shows the pattern of jgw expression at different developmental stages, determined by a quantitative PCR technique (15). Quantitative PCR provides at least a relative comparison of abundance and confirmed that the amount of jgw RNA was generally smaller than the amount of Adh RNA. In D. teissieri, jgw showed sex-specific expression: the expected band was abundant in amplifications of RNA from adult males, whereas no band was present in reactions from RNA of other stages or of adult females. In D. yakuba, by contrast, reactions with RNA from all stages showed less expression than was seen with D. teissieri RNA; larval stages L1 and L2 and adult male RNA showed more expression than L3 and adult female RNA. Just as the protein sequences evolved rapidly, the sequences controlling the regulation of jgw have also evolved rapidly. Further analysis will determine whether the species-specific patterns of expression result from divergence of sequences in and around the jgw transcription unit or reflect divergence at more distant, trans-acting regulatory loci.

Although a distinct mRNA was expressed from jgw in each species, the conclusion that jgw is a functional gene also rests on the evolutionary interpretation of the polymorphism observed within each species. Extending this evolutionary analysis to incorporate between-species divergence reveals that the early history of jgw was dominated by positive natural selection instead of the neutral mutation and genetic drift that are expected to characterize the dynamics of pseudogenes (16).

Fig. 3. Developmental pattern of jgw expression. The same method as described (12) was used to amplify jgw RNA from total RNA extracted from the different developmental stages. The cDNA product from 300 ng of RNA was added as a template for amplification because, for the range from 100 ng to 1000 ng of total RNA, the yield of jgw RNA PCR products was observed to be linearly related to the template RNA concentration (11). PCR cycles (30) were conducted at 94°C (1 min) for denaturation, 60°C (2 min) for annealing, and 72°C (3 min) for polymerization. The products of the PCR reactions were electrophoresed on a 1% agarose gel and stained with ethidium bromide. More Adh RNA was amplified from the same cDNA preparations, which suggests that Adh is expressed in larger amounts than jgw (11). Molecular size markers are on the right in base pairs.

[Fig. 3 gel: lanes L1, L2, L3, adult female, adult male; D. teissieri panel labeled; size markers at 571 and 321 bp]

Fig. 4. Phylogenetic analysis of the evolution of the Adh-derived portion of jingwei. The topology reflects the assumed evolutionary relation between jingwei and Adh (5). The small (stippled) trees at the tips indicate that the detailed phylogenies of the alleles within species are unknown. Numbers presented as fractions are the numbers of (fixed or polymorphic) amino acid replacement differences over the numbers of (fixed or polymorphic) silent differences. The Adh outgroup includes the sequences of D. melanogaster, D. simulans, D. sechellia, D. mauritiana, D. erecta, and D. orena. These outgroup sequences were used to confirm that all functional Adh genes have the same DNA sequence for the eight codons that are distinct to jingwei (in both D. teissieri and D. yakuba), with one exception: at the eighth codon (nucleotide position 954) of D. orena Adh, there is a substitution that confers an amino acid replacement. The italic numbers are the average numbers of silent substitutions and the numbers of silent polymorphisms under the HKA test, which also incorporates the standard assumptions of the neutral theory.

[Fig. 4 tree: Adh outgroup species; jingwei alleles of D. teissieri and D. yakuba]

To explore the divergence of jgw in the period immediately after the Adh retrotransposition but before the split of D. teissieri and D. yakuba, we compared the Adh-derived portions of all jgw alleles with the available Adh alleles in D. teissieri and D. yakuba and with Adh alleles from various members of the melanogaster subgroup (Fig. 4). The fixed nucleotide substitutions that distinguished the Adh-derived portion of all jgw alleles from Adh in all alleles of D. teissieri and D. yakuba were assumed to have been fixed in the ancestral species after the insertion. Eight such sites exist, at positions 339, 576, 600, 791, 831, 861, 918, and 954. Surprisingly, all eight substitutions were amino acid replacements; no silent substitutions were found. By incorporating the available Adh sequences of the other six members of the melanogaster subgroup, we found that all eight substitutions were unique mutations that were fixed in the jgw lineage. Assuming a replacement substitution rate of 0.5 × 10⁻⁹ per nucleotide per year [calculated from melanogaster subgroup Adh data (17)], these eight substitutions (out of 572 possible replacement sites) yielded an estimate of about 30 million years for the time of the last common ancestor of jgw and Adh. This conflicts with the age of the melanogaster subgroup, which is estimated to be 17 to 20 million years (18). The lack of silent substitutions implies that the insertion of jgw occurred close to the time of the speciation of D. teissieri and D. yakuba. The excess of replacement substitutions in the jgw line of descent between the time of its insertion and the speciation of D. teissieri and D. yakuba is consistent with the model that jgw responded to positive natural selection and evolved a new function.
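One reading of the arithmetic behind the "about 30 million years" figure is to divide the fraction of replacement sites that changed (8 of 572) by the per-lineage replacement rate, with no factor of 2 because the eight substitutions are unique to the jgw lineage. The sketch below assumes that reading. The text does not say whether the Jukes-Cantor correction used for Table 1 was applied here, so both variants are shown; either way the estimate is roughly 28 million years, well above the 17 to 20 million years estimated for the melanogaster subgroup. The function names are illustrative.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def lineage_age(changes: int, sites: int, rate: float, correct: bool = False) -> float:
    """Years for ONE lineage to accumulate `changes` at `sites`, given `rate`
    substitutions per site per year (no factor of 2, since the changes are
    assumed to lie on the jgw branch only)."""
    k = changes / sites
    if correct:
        k = jukes_cantor(k)
    return k / rate


raw = lineage_age(8, 572, 0.5e-9)               # about 2.8e7 years
jc = lineage_age(8, 572, 0.5e-9, correct=True)  # slightly larger, also about 2.8e7 years
print(f"{raw / 1e6:.1f} My uncorrected, {jc / 1e6:.1f} My with Jukes-Cantor")
```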

This analysis of the early history of jgw leads us to examine its dynamics after the divergence of the two species. One method to explore the role of natural selection at a single locus is to compare the replacement and silent substitutions found between species with the replacement and silent polymorphisms found within species (19). The variation in the Adh-derived portion of jgw within and between D. teissieri and D. yakuba is summarized in Fig. 4. A relative excess of (fixed) replacement substitutions over (fixed) silent substitutions between species (21:16) is apparent when compared to the proportion of replacement polymorphisms over silent polymorphisms within a species (4:27). These results are inconsistent with the null hypothesis of selective neutrality in the test proposed in (19) (χ² = 13.6, P < 0.001). Therefore, adaptive protein evolution remained an important force in the evolutionary history of jgw after the separation of the two species. Further evidence supporting this view of the divergence of jgw in D. teissieri and D. yakuba comes from a comparison of the captured 5′ coding regions: 11 of the 15 differences in 153 alignable coding sites cause amino acid replacements.
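The comparison above, commonly known as the McDonald-Kreitman test, reduces to a 2 × 2 contingency table of fixed versus polymorphic and replacement versus silent changes. A minimal sketch, assuming a plain Pearson χ² without continuity correction (the text does not say which variant was used): with the counts 21:16 and 4:27 it gives about 13.95, a little above the reported 13.6, and both exceed 10.83, the 1-degree-of-freedom critical value for P = 0.001. The function names are illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square, without continuity correction, for the table
    [[a, b], [c, d]] (rows: fixed, polymorphic; columns: replacement, silent)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2: float) -> float:
    """Upper-tail P for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Fixed differences 21 replacement : 16 silent; polymorphisms 4 : 27 (counts from the text).
chi2 = chi_square_2x2(21, 16, 4, 27)
print(f"chi2 = {chi2:.2f}, P = {p_value_1df(chi2):.1e}")
```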

Neutral allele theory considers polymorphism to be a transient phase of molecular evolution (20). Under the assumptions of this theory, the level of between-species divergence at different loci is positively correlated with the level of within-species polymorphism. This idea is embodied in a simple two-by-two statistical test (the HKA test) (21) that calculates the expected amounts of polymorphism and divergence at two loci and can indicate whether the observed amounts are consistent with the neutral theory. Because we had found an excess of replacement substitutions, we directed our attention to the silent variation in jgw and Adh. Figure 4 shows the observed numbers of silent substitutions and polymorphisms in the Adh-derived portion of jgw. Whereas silent polymorphism within species is comparable in jgw and Adh, the between-species silent divergence of jgw is twofold greater than that of Adh (χ² = 7.12, P < 0.01).

The Adh gene in these two species has a very biased codon usage, with an 84 to 87% G+C content at third codon positions. Furthermore, the codon preference shown by the within-species polymorphism data in D. yakuba (19) also reflects a very high G+C content (85.7%). Nevertheless, the codon usage in the Adh-derived portion of jgw seems to be evolving toward a lower G+C content. The G+C contents at the third codon position, summed over all silent polymorphic sites, are 70.0% for D. teissieri and 62.4% for D. yakuba. The G+C content at the third codon position in the fixed replacement sites between the two species is 60.0%. This evolution of a moderate G+C content at third sites is consistent with the higher rate of silent divergence for jgw and with the general observation of a negative correlation between codon bias (and G+C content) and the rate of silent divergence (22).
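The third-position G+C figures above are simple proportions over selected codons. A minimal sketch, assuming an in-frame coding sequence; the optional `codon_indices` argument (a hypothetical parameter) restricts the count to chosen codons, for example only those carrying silent polymorphisms, which is closer to how the 70.0% and 62.4% values are described. The toy sequence is illustrative, not jgw or Adh.

```python
def gc3(coding_seq: str, codon_indices=None) -> float:
    """Fraction of G or C at third codon positions of an in-frame sequence.

    codon_indices optionally restricts the count to selected codons (0-based),
    e.g. only the codons that carry silent polymorphisms.
    """
    seq = coding_seq.upper()
    n_codons = len(seq) // 3
    chosen = range(n_codons) if codon_indices is None else codon_indices
    thirds = [seq[3 * k + 2] for k in chosen]
    if not thirds:
        raise ValueError("no codons to count")
    return sum(base in "GC" for base in thirds) / len(thirds)


toy = "GCCAAGCTGGAT"       # hypothetical: codons GCC AAG CTG GAT
print(gc3(toy))            # 0.75
print(gc3(toy, [1, 3]))    # 0.5 (AAG and GAT only)
```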

The results presented reveal the molecular, genetic, and evolutionary mechanisms that participated in the origin of the chimeric processed functional gene jgw. The chimeric nature of the expressed product of this novel gene is unique among processed genes, including those few that are functional, such as the second gene for rat insulin, the human PGK gene, and other examples that have been interpreted as outcomes of retropositions (2, 23). It has been proposed that retrotransposition can be a major cause of intron loss during evolution (24), as it was with jgw. The evolution of jgw also demonstrates that retrotransposition can be a source of new, intron-containing genes in eukaryotic evolution. It is also unique among instances of exon shuffling (25) in that it clearly involved the retrotransposition of an mRNA. Cross-hybridization experiments with a genomic Southern (DNA) blot and Northern analysis using a probe generated from the captured portion of jgw indicated that the source of the captured portion of jgw was a duplication of an unrelated gene (26). The analysis of the early molecular evolution of jgw indicated that jgw was functional from the beginning and experienced strong adaptive evolution for what must have been a novel function. Prevailing theories of the origin of new genes assume an initial relaxation of selection (1). Our results provide a contrary interpretation, in which natural selection is present throughout the origin of a new gene.

REFERENCES AND NOTES

1 . S. Ohno, Evolution by Gene Duplication (Spring­
er-Verlag, Berlin, 1970); T. Ohta, Genome 31, 304
(1989); J. B. S. Haldane, The Causes of Evolution
(Longmans, New York, 1932).

2. W. Gilbert, Nature 271, 501 (1978); G. F. Hollis, P.
A. Hieter, 0. W. McBride, D. Swan, P. Leder, ibid.
296, 321 (1982); J. R. McCarrey and K. Thomas,
ibid. 326, 501 (1987).

3. In an ancient legend from China (San Hai Jing),
Jingwei, a daughter of the Emperor Yande (first
Chinese emperor 3000 B.C.), tragically drowned
while swimming in the East China Sea. Jingwei
was then reincarnated as a beautiful bird that
drops stones and wood into the sea in an attempt
to fill it, thus preventing others from drowning. We
used the name “jingwei” because this gene
avoided the usual fate of processed gene (death)
and was “reincarnated” into a new structure with
novel function.

4. C. H. Langley et al., Proc. Natl. Acad. Sci. U.S.A.
79, 5631 (1982).

5. P. Jeffs and M. Ashburner, Proc. R. Soc. London,
Ser. B 244, 151 (1991).

6. The sequences were completely determined from
both strands by the dideoxyribonucleotide chain-terminating method [F. Sanger et al., Proc. Natl.
Acad. Sci. U.S.A. 74, 5463 (1977)] on single-str


AP4: Identifying design choices in a Broadway production (2022)

While you view the two videos of the musical you chose, identify examples of each design element, give a specific description of each, and explain how the design element adds to the audience’s understanding of the production. For example, if one actor is in a spotlight while the other actors are in the dark, you would write that lighting in that scene was used to focus the audience’s attention, which full lighting would not do as effectively.

Which production videos did you choose for this assignment?

Production Videos

The Lion King

War Horse

Wicked

PRODUCTION CHOSEN:

Design Elements


Describe specific examples used in the production and how those choices affect the audience’s connection to the scenes.



Scenic Design


Sound Design


Lighting Design


Prop Design

(Prop Choices)


Costume Design


Makeup & Hair Design


Project Score


Instructor’s Comments


SIM DAY 1- Morning Simulation

Jennifer Hoffman- Acute Severe Asthma

The activities below must be completed before you arrive at the simulation; completing them is your “ticket to enter” the simulation. Please submit all pre-work by 2359 on the Monday before the simulation day. If the pre-work is not completed, you will not be allowed to participate in the simulation. Please be advised that simulations are limited, so make-up is not an option; if the simulation is not completed for this course, you will fail to meet the course objectives and will not pass this course (both lecture and clinical).

Complete the pathophysiology diagram below on status asthmaticus, using the ATI Med/Surg ebook or your Ignatavicius Med-Surg text (chapter 30) located in your lecture course shell. (1 hr)


1. Complete the table below (1hr)

Complete the table below on medications using a Drug eBook.


Drug Name

Indications

Pharmacokinetics

Contraindications/Precautions

Nursing Implications

Implementation

Patient/family teaching

Evaluation

Albuterol 5 mg in 3 mL

Ipratropium bromide 0.5 mg in 2.5 mL normal saline via nebulization

Methylprednisolone 125 mg IV

2. Complete the table below (30 min.)

Complete the table below on lab values using the Lab and diagnostic ebook (example below).


Lab Name

Rationale

Normal Ranges

Indications

Nursing Implications

ABG Analysis (What are normal results?)

pH:

CO2:

HCO3:

What ABG findings would you expect with hyperventilation?

What ABG findings would you expect with hypoventilation?

References:

Status Asthmaticus

(define)

Health Promotion and Disease Prevention:

[Text]

Risk Factors

Document two Nursing Diagnoses and two Goals for your client:

[Text]

[Text]

Lab Tests/ Diagnostics

[Text]

Nursing Interventions

[Text]

Client Education

Medications (list only)

[Text]

Multidisciplinary Care

Possible Complications

[Text]

[Text]

[Text]


PLoS BIOLOGY

Population Genomics: Whole-Genome Analysis
of Polymorphism and Divergence
in Drosophila simulans

David J. Begun (1,2*), Alisha K. Holloway (1,2*), Kristian Stevens (1,2), LaDeana W. Hillier (3), Yu-Ping Poh (1,2,4,5), Matthew W. Hahn (6,7), Phillip M. Nista (6), Corbin D. Jones (8,9), Andrew D. Kern (1,2,10), Colin N. Dewey (11), Lior Pachter (12,13), Eugene Myers (13), Charles H. Langley (1,2*)

1 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
2 Center for Population Biology, University of California Davis, Davis, California, United States of America
3 Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri, United States of America
4 Institute of Molecular and Cellular Biology, National Tsing Hua University, Hsinchu, Taiwan Authority
5 Research Center for Biodiversity, Academia Sinica, Taipei, Taiwan Authority
6 Department of Biology, Indiana University, Bloomington, Indiana, United States of America
7 School of Informatics, Indiana University, Bloomington, Indiana, United States of America
8 Department of Biology, University of North Carolina, Chapel Hill, North Carolina, United States of America
9 Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
10 Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
11 Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin, United States of America
12 Department of Mathematics, University of California, Berkeley, California, United States of America
13 Department of Computer Science, University of California, Berkeley, California, United States of America

The population genetic perspective is that the processes shaping genomic variation can be revealed only through
simultaneous investigation of sequence polymorphism and divergence within and between closely related species.
Here we present a population genetic analysis of Drosophila simulans based on whole-genome shotgun sequencing of
multiple inbred lines and comparison of the resulting data to genome assemblies of the closely related species, D.
melanogaster and D. yakuba. We discovered previously unknown, large-scale fluctuations of polymorphism and
divergence along chromosome arms, and significantly less polymorphism and faster divergence on the X chromosome.
We generated a comprehensive list of functional elements in the D. simulans genome influenced by adaptive evolution.
Finally, we characterized genomic patterns of base composition for coding and noncoding sequence. These results
suggest several new hypotheses regarding the genetic and biological mechanisms controlling polymorphism and
divergence across the Drosophila genome, and provide a rich resource for the investigation of adaptive evolution and
functional variation in D. simulans.

Citation: Begun DJ, Holloway AK, Stevens K, Hillier LW, Poh YP, et al. (2007) Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila
simulans. PLoS Biol 5(11): e310. doi:10.1371/journal.pbio.0050310

Academic Editor: Mohamed A. F. Noor, Duke University, United States of America

Received March 19, 2007; Accepted September 26, 2007; Published November 6, 2007

Copyright: © 2007 Begun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abbreviations: CDS, coding sequence; GO, gene ontology; indel, insertion/deletion; MK test, McDonald and Kreitman test; UTR, untranslated region

* To whom correspondence should be addressed. E-mail: djbegun@ucdavis.edu (DJB); akholloway@ucdavis.edu (AKH); chlangley@ucdavis.edu (CHL)

PLoS Biology | www.plosbiology.org 2534 November 2007 | Volume 5 | Issue 11 | e310

Population Genomics of D. simulans

Author Summary

Population genomics, the study of genome-wide patterns of sequence variation within and between closely related species, can provide a comprehensive view of the relative importance of mutation, recombination, natural selection, and genetic drift in evolution. It can also provide fundamental insights into the biological attributes of organisms that are specifically shaped by adaptive evolution. One approach for generating population genomic datasets is to align DNA sequences from whole-genome shotgun projects to a standard reference sequence. We used this approach to carry out whole-genome analysis of polymorphism and divergence in Drosophila simulans, a close relative of the model system, D. melanogaster. We find that polymorphism and divergence fluctuate on a large scale across the genome and that these fluctuations are probably explained by natural selection rather than by variation in mutation rates. Our analysis suggests that adaptive protein evolution is common and is often related to biological processes that may be associated with gene expression, chromosome biology, and reproduction. The approaches presented here will have broad applicability to future analysis of population genomic variation in other systems, including humans.

Introduction

Given the long history of Drosophila as a central model system in evolutionary genetics beginning with the origins of empirical population genetics in the 1930s, it is unsurprising that Drosophila data have inspired the development of methods to test population genetic theories using DNA variation within and between closely related species [1–4]. These methods rest on the supposition of the neutral theory of molecular evolution that polymorphism and divergence are manifestations of mutation and genetic drift of neutral variants at different time scales [5]. Under neutrality, polymorphism is a “snapshot” of variation, some of which ultimately contributes to species divergence as a result of fixation by genetic drift. Natural selection, however, may cause functionally important variants to rapidly increase or decrease in frequency, resulting in patterns of polymorphism and divergence that deviate from neutral expectations [1,2,6]. A powerful aspect of inferring evolutionary mechanism in this population genetic context is that selection on sequence variants with minuscule fitness effects, which would be difficult or impossible to study in nature or in the laboratory but are evolutionarily important, may cause detectable deviations from neutral predictions. Another notable aspect of these population genetic approaches is that they facilitate inferences about recent selection (which may be manifest as reduced polymorphism or elevated linkage disequilibrium) or about selection that has occurred in the distant past (which may be manifest as unexpectedly high levels of divergence). The application of these conceptual advances to the study of variation in closely related species has resulted in several fundamental advances in our understanding of the relative contributions of mutation, genetic drift, recombination, and natural selection to sequence variation. However, it is also clear that our genomic understanding of population genetics has been hobbled by fragmentary and nonrandom population genetic sampling of genomes. Thus, the full value of genome annotation has not yet been applied to the study of population genetic mechanisms.
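The contrast between polymorphism (a within-species "snapshot") and divergence (between-species differences) can be made concrete with the two simplest summaries. The sketch below is illustrative only: it assumes short, gap-free toy alignments (the sequences and function names are hypothetical) and uses plain mean pairwise differences per site for π and the proportion of differing sites for divergence. The estimators actually used in this study, which must cope with variable coverage, are described in its Materials and Methods.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences (pi).

    Assumes equal-length sequences with no gaps or missing data.
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


def divergence(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ between two sequences."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical toy data: three alleles from one species and one allele from a relative.
sample = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACGTAC"]
relative = "TCGTACCTAC"
print(nucleotide_diversity(sample))     # within-species variation
print(divergence(sample[0], relative))  # between-species differences
```

Under neutrality, both quantities are driven by the same mutation and drift process, which is why their correlation across the genome is informative.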

Combining whole-genome studies of genetic variation within and between closely related species (i.e., population genomics) with high-quality genome annotation offers several major advantages. For example, we have known for more than a decade that regions of the genome experiencing reduced crossing over in Drosophila tend to show reduced levels of polymorphism yet normal levels of divergence between species [7–10]. This pattern can only result from natural selection reducing levels of polymorphism at linked neutral sites, because it violates the neutral theory prediction of a strong positive correlation between polymorphism and divergence [5]. However, we have no general genomic description of the physical scale of variation in polymorphism and divergence in Drosophila and how such variation might be related to variation in mutation rates, recombination rates, gene density, natural selection, or other factors. Similarly, although several Drosophila genes have been targets of molecular population genetic analysis, in many cases, these genes were not randomly chosen but were targeted because of their putative association with phenotypes thought to have a history of adaptive evolution [11,12]. Such biased data make it difficult to estimate the proportion of proteins diverging under adaptive evolution. In a similar vein, the unique power of molecular population genetic analysis, when used in concert with genome annotation, could fundamentally alter our notions about phenotypic divergence due to natural selection. This is because our current understanding of phenotypic divergence and its causes is based on a small and necessarily highly biased description of phenotypic variation. By contrast, a comprehensive genomic investigation of adaptive divergence could use genome annotations to reveal large numbers of new biological processes previously unsuspected of having diverged under selection. Here we present a population genomic analysis of D. simulans. D. simulans and D. melanogaster are closely related and split from the outgroup species, D. yakuba, several million years ago [13–15]. The vast majority of D. simulans and D. yakuba euchromatic DNA is readily aligned to D. melanogaster, which permits direct use of D. melanogaster annotation for investigation of polymorphism and divergence and allows reliable inference of D. simulans–D. melanogaster ancestral states over much of the genome. Our analysis uses a draft version of a D. yakuba genome assembly (aligned to the D. melanogaster reference sequence) and a set of light-coverage, whole-genome shotgun data from multiple inbred lines of D. simulans, which were syntenically aligned to the D. melanogaster reference sequence.

Results/Discussion

Genomes and Assemblies

Seven lines of D. simulans and one line of D. yakuba were sequenced at the Washington University Genome Sequencing Center (the white paper can be found at http://www.genome.gov/11008080). The D. simulans lines were selected to capture variation in populations from putatively ancestral geographic regions [16], recent cosmopolitan populations, and strains encompassing the three highly diverged mitochondrial haplotypes previously described for the species [17]. These strains have been deposited at the Tucson Drosophila Stock Center (http://stockcenter.arl.arizona.edu). A total of 2,424,141 D. simulans traces and 2,245,197 D. yakuba traces from this project have been deposited in the National Center for Biotechnology Information (NCBI) trace archive. D. simulans syntenic assemblies were created by aligning trimmed, uniquely mapped sequence traces from each D. simulans strain to the euchromatic D. melanogaster reference sequence (v4). Two strains from the same population, sim4 and sim6, were unintentionally mixed prior to library construction; reads from these strains were combined to generate a single, deeper, syntenic assembly (see Materials and Methods), which is referred to as SIM4/6. The other strains investigated are referred to as C167.4, MD106TS, MD199S, NC48S, and w501. Thus, six (rather than seven) D. simulans syntenic assemblies are the objects of analysis. Details on the fly strains and procedures used to create these assemblies, including the use of sequence quality scores, can be found in Materials and Methods. The coverages (in Mbp) for C167.4, MD106TS, MD199S, NC48S, SIM4/6, and w501 are 56.9, 56.3, 63.4, 42.6, 89.8, and 84.8, respectively. A D. yakuba strain Tai18E2 whole-genome shotgun assembly (v2.0; http://genome.wustl.edu/) generated by the Parallel Contig Assembly Program (PCAP) [18] was aligned to the D. melanogaster reference sequence (Materials and Methods). The main use of the D. yakuba assembly was to infer states of the D. simulans–D. melanogaster ancestor. For many analyses, we used divergence estimates for the D. simulans lineage or the D. melanogaster lineage (from the inferred D. simulans–D. melanogaster ancestor) rather than the pairwise (i.e., unpolarized) divergence between these species. These lineage-specific estimates are often referred to as “D. simulans divergence,” “D. melanogaster divergence,” or “polarized divergence.”
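Polarizing a D. simulans–D. melanogaster difference with the D. yakuba outgroup can be sketched with simple parsimony: if the outgroup base matches one ingroup base, the other ingroup base is treated as derived on its own lineage. This is an assumption-laden illustration, not the study's inference procedure (see Materials and Methods). The function names and toy sequences are hypothetical, and plain parsimony ignores multiple hits and ancestral polymorphism.

```python
from collections import Counter
from typing import Optional


def assign_lineage(sim: str, mel: str, yak: str) -> Optional[str]:
    """Assign a D. simulans/D. melanogaster difference to a lineage by parsimony.

    The D. yakuba base is taken as the ancestral state when it matches one ingroup base.
    """
    if sim == mel:
        return None  # no difference between the two ingroup species at this site
    if mel == yak:
        return "sim"  # derived state on the D. simulans lineage
    if sim == yak:
        return "mel"  # derived state on the D. melanogaster lineage
    return "unresolved"  # outgroup matches neither ingroup base


def polarized_counts(sim_seq: str, mel_seq: str, yak_seq: str) -> Counter:
    """Count lineage-assigned differences across three aligned sequences."""
    counts: Counter = Counter()
    for s, m, y in zip(sim_seq, mel_seq, yak_seq):
        lineage = assign_lineage(s, m, y)
        if lineage is not None:
            counts[lineage] += 1
    return counts


# Hypothetical aligned sites (simulans, melanogaster, yakuba), 7 sites long.
counts = polarized_counts("ACGTTAC", "ACCTGAG", "ACCTTGT")
sim_divergence = counts["sim"] / 7  # per-site D. simulans lineage divergence
```

Sites where only the outgroup differs (such as the sixth site here) are not counted, because they reflect change on the D. yakuba side of the tree.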
A total of 393,951,345 D. simulans base pairs and 102,574,197 D. yakuba base pairs were syntenically aligned to the D. melanogaster reference sequence. Several tens of kilobases of repeat-rich sequences near the telomeres and centromeres of each chromosome arm were excluded from our analyses (Materials and Methods). D. simulans genes were conservatively filtered for analysis based on conserved physical organization and reading frame with respect to the D. melanogaster reference sequence gene models (Materials and Methods). We took this conservative approach so as to retain only the highest quality D. simulans data for most inferences. The number of D. simulans genes remaining after filtering was 11,466. Ninety-eight percent of coding sequence (CDS) nucleotides from this gene set are covered by at least one D. simulans allele. The average number of lines sequenced per aligned D. simulans base was 3.90. For several analyses in which heterozygosity and divergence per site were estimated, we further filtered the data so as to retain only genes or functional elements (e.g., untranslated regions [UTRs]) for which the total number of bases sequenced across all lines exceeded an arbitrary threshold (see Materials and Methods). The numbers of genes for which we estimated coding region expected heterozygosity, unpolarized divergence, and polarized divergence were 11,403, 11,439, and 10,150, respectively. Coverage on the X chromosome was slightly lower than autosomal coverage, which is consistent with less X chromosome DNA than autosomal DNA in mixed-sex DNA preps. Variable coverage required analysis of individual coverage classes (n = 1–6) for a given region or feature, followed by estimation and inference weighted by coverage (Materials and Methods). The D. simulans syntenic alignments are available at http://www.dpgp.org/. An alternative D. simulans “mosaic” assembly, which is available at http://www.genome.wustl.edu/, was created independently of the D. melanogaster reference sequence.

Table 1. Autosome and X Chromosome Weighted Averages of Nucleotide Heterozygosity (π) and Lineage Divergence

Sequence type    Sites           Chromosome  π       Div_mel  Div_sim  Div_yak
Euchromatic      Nonsynonymous   X           0.0018  0.0067   0.0070   0.0253
                                 A           0.0026  0.0061   0.0057   0.0223
                 Synonymous      X           0.0199  0.0767   0.0519   0.2314
                                 A           0.0352  0.0695   0.0524   0.2187
                 Intron          X           0.0166  0.0248   0.0330   0.1175
                                 A           0.0212  0.0240   0.0281   0.1028
                 5′ UTR          X           0.0079  0.0233   0.0258   0.1018
                                 A           0.0108  0.0216   0.0203   0.0842
                 3′ UTR          X           0.0088  0.0199   0.0261   0.0957
                                 A           0.0113  0.0186   0.0192   0.0775
                 Intergenic      X           0.0153  0.0231   0.0299   0.1102
                                 A           0.0204  0.0225   0.0265   0.0957
Heterochromatic  Nonsynonymous   X           0.0014  0.0088   0.0089   0.0269
                                 A           0.0017  0.0083   0.0075   0.0354
                 Synonymous      X           0.0132  0.0664   0.0493   0.2385
                                 A           0.0136  0.0589   0.0523   0.2338

X, X chromosome; A, autosomes. Div_mel, D. melanogaster lineage divergence; Div_sim, D. simulans lineage divergence; Div_yak, D. yakuba lineage divergence (corresponds to divergence between D. yakuba and the D. simulans/D. melanogaster common ancestor); see Materials and Methods.
doi:10.1371/journal.pbio.0050310.t001

General Patterns of Polymorphism and Divergence

Nucleotide variation. We observed 2,965,987 polymorphic nucleotides, of which 43,878 altered the amino acid sequence; 77% of sampled D. simulans genes were segregating at least one amino acid polymorphism. The average expected nucleotide heterozygosity (hereafter, “heterozygosity” or “πnt”) for the X chromosome and autosomes was 0.0135 and 0.0180, respectively. X chromosome πnt was not significantly different from that of the autosomes (after multiplying X chromosome πnt by 4/3, to correct for X/autosome effective population size differences when there are equal numbers of males and females; see [19]). However, X chromosome divergence was greater than autosomal divergence in all three lineages (50-kb windows; Table 1, Table S1, Figure 1, Dataset S8). We will discuss this pattern in greater detail below.
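Two steps described above can be sketched briefly: estimating expected heterozygosity at a site from however many lines cover it, and combining estimates from different coverage classes. Weighting by number of sites, the function names, and the toy inputs are assumptions for illustration; the study's actual weighting scheme is given in its Materials and Methods. The last lines reproduce the 4/3 correction with the averages quoted in the text: 0.0135 × 4/3 = 0.0180, the autosomal value.

```python
def expected_heterozygosity(alleles: list[str]) -> float:
    """Unbiased expected heterozygosity at one site from n sampled alleles."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two alleles to estimate heterozygosity")
    freqs = [alleles.count(base) / n for base in set(alleles)]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def weighted_mean(estimates: list[tuple[float, int]]) -> float:
    """Average (per-site estimate, number of sites) pairs, weighting by sites."""
    total_sites = sum(sites for _, sites in estimates)
    return sum(value * sites for value, sites in estimates) / total_sites


# Scale X-linked diversity by 4/3 (assumes equal numbers of males and females).
X_TO_AUTOSOME_FACTOR = 4 / 3
x_pi, autosome_pi = 0.0135, 0.0180  # averages reported in the text
corrected_x_pi = x_pi * X_TO_AUTOSOME_FACTOR
```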
Not surprisingly, many patterns of molecular evolution identified from previously published datasets were confirmed in this genomic analysis. For example, synonymous sites and nonsynonymous sites were the fastest and slowest evolving site types, respectively [20–24]. Nonsynonymous divergence (dN) and synonymous divergence (dS) were positively, though weakly, correlated (r² = 0.052, p < 0.0001) [25–27], and dN was weakly, negatively correlated with CDS length (Spearman’s ρ = −0.03, p = 0.0005) [28,29]. More generally, longer functional elements showed smaller D. simulans divergence than did shorter elements (intron Spearman’s ρ = −0.33; intergenic Spearman’s ρ = −0.39; 3′ UTR Spearman’s ρ = −0.11; all p < 0.0001) [21,30].

Insertion/deletion (indel) variation. We investigated only small indels (≤10 bp), because they were inferred with high confidence (Materials and Methods). Variants were classified with respect to the D. melanogaster reference sequence; divergence estimates were unpolarized. An analysis of transposable element variation can be found in Text S1. Estimates of small-indel heterozygosity for the X chromosome and autosomes (Table S1) were lower than estimates of nucleotide heterozygosity [31]. Interestingly, variation in nucleotide and indel heterozygosity across chromosome arms was highly correlated ([32], Figures 1 and 2; Spearman’s ρ between 0.45 and 0.69, p < 10⁻⁴ for each arm). Deletion heterozygosity and divergence were consistently greater than insertion heterozygosity and divergence (Figures S1 and S2, Datasets S11–S15) for both the X chromosome and the autosomes, which supports and extends previous claims, based on analysis of repetitive sequences [33], of a general mutational bias for deletions in Drosophila.
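Spearman's ρ, used throughout this section, is the Pearson correlation of ranks. A minimal, dependency-free sketch follows, assuming tied values receive their average rank; it computes only ρ, not a p-value, and in practice a library routine such as SciPy's `scipy.stats.spearmanr` would normally be used. The element lengths and divergence values in the usage lines are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical element lengths (bp) and per-site divergence values.
lengths = [120, 340, 560, 800]
div = [0.031, 0.027, 0.029, 0.022]
rho = spearman_rho(lengths, div)  # negative: longer elements diverge less here
```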
D. simulans autosomal πnt and divergence are of similar magnitude. Mean polarized autosomal divergence (50-kb windows; 0.024) was only slightly greater than mean autosomal πnt (0.018), even with regions of severely reduced πnt near telomeres and centromeres included. Indeed, estimates of πnt for several genomic regions are roughly equal to the genomic average polarized divergence (Figure 1), suggesting the existence of large numbers of shared polymorphisms in D. simulans and D. melanogaster; such variants should be overrepresented in regions of higher nucleotide heterozygosity in D. simulans. These patterns suggest that the average time to the most recent common ancestor of D. simulans alleles is nearly as great as the average time to the most recent common ancestor of D. simulans and D. melanogaster. The similarity in scale of polymorphism and divergence in D. simulans also suggests that many of the neutral mutations that have fixed in D. simulans were polymorphic in the common ancestor of the two species. As we discuss below, this has implications for interpreting chromosomal patterns of polymorphism and divergence in this species.

As expected under the neutral model, and given the observation that much of the D. simulans lineage divergence is attributable to polymorphism, D. simulans πnt and divergence (50-kb windows) were highly, significantly correlated (autosome Spearman’s ρ = 0.56, p < 0.0001; X chromosome Spearman’s ρ = 0.48, p < 0.0001) [5]. Moreover, the genetic and population genetic processes shaping patterns of divergence along chromosome arms appear to operate in a similar manner in D. simulans and D. melanogaster, as polarized divergence (50-kb windows) for the two lineages was highly correlated (Spearman’s ρ = 0.74; p < 0.0001). Nevertheless, some regions of the genome showed highly significant increases in divergence in either the D. simulans or the D. melanogaster lineage (see below).

Figure 1. Patterns of Polymorphism and Divergence of Nucleotides along Chromosome Arms
Nucleotide π (blue) and div on the D. simulans lineage (red) in 150-kbp windows are plotted every 10 kbp. v[−log(p)] (olive), a measure of deviation (+ or −) in the proportion of polymorphic sites in 30-kbp windows, is plotted every 10 kbp (see Materials and Methods). C and T correspond to locations of centromeres and telomeres, respectively. Chromosome arm 3R coordinates correspond to D. simulans locations after accounting for fixed inversion on the D. melanogaster lineage.
doi:10.1371/journal.pbio.0050310.g001
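Figures 1 and 2 summarize variation in 150-kbp windows plotted every 10 kbp. A sketch of that kind of overlapping-window average is below, assuming per-site values (for example, 1 for a polymorphic site and 0 for a monomorphic one) at sorted, covered positions. The function name, the midpoint convention, and returning `None` for windows with no covered sites are assumptions, not the study's procedure.

```python
from bisect import bisect_left


def sliding_window_means(positions, values, chrom_length, window=150_000, step=10_000):
    """Mean of per-site values in overlapping windows along one chromosome arm.

    positions must be sorted ascending and aligned with values. Returns a list of
    (window midpoint, mean) pairs; the mean is None when a window has no covered sites.
    """
    # Prefix sums give each window total in constant time.
    prefix = [0.0]
    for value in values:
        prefix.append(prefix[-1] + value)
    results = []
    for start in range(0, chrom_length - window + 1, step):
        lo = bisect_left(positions, start)           # first site inside the window
        hi = bisect_left(positions, start + window)  # first site past the window
        n_sites = hi - lo
        mean = (prefix[hi] - prefix[lo]) / n_sites if n_sites else None
        results.append((start + window // 2, mean))
    return results
```

For a tiny check, ten covered sites with window 4 and step 2 give four overlapping windows.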
Variation near centromeres and telomeres. Figure 1 and Figure S1 support previous reports documenting severely reduced levels of polymorphism in the most proximal and distal euchromatic regions of Drosophila chromosome arms [7,10,34–36]. The fact that divergence in such regions (Materials and Methods) is only slightly lower (50-kb median = 0.0238) than that of the rest of the euchromatic genome (50-kb median = 0.0248) (Mann-Whitney U, p < 0.0001) supports the hypothesis that reduced πnt in these regions is due to selection at linked sites rather than reduced neutral mutation rates [1,3,6]. Genes that are located in repetitive regions of chromosomes near telomeres and centromeres (Materials and Methods), which we refer to as “heterochromatic,” showed moderately reduced nonsynonymous and synonymous heterozygosity compared with other genes (Table 1, Dataset S6) [37] and showed a substantially higher ratio of nonsynonymous-to-synonymous polymorphism and divergence relative to other genes (Table S2) [38].

Interestingly, the magnitude and physical extent of reduced πnt near telomeres and centromeres appear to vary among arms. Moreover, the physical scale over which divergence varied along the basal region of 3R appears to be much smaller than the scale for other arms, which is seen in Figure 1 as a more compressed, thick red line representing divergence. These heterogeneous patterns of sequence variation near centromeres and telomeres across chromosome arms may reflect real differences. For example, genetic data from D. melanogaster suggest that the centromere-associated effects of reduced crossing-over are greater for the autosomes than for the X chromosome and also suggest that the X chromosome telomere is associated with a stronger reduction in crossing-over compared with the autosomal telomeres [39]. Alternatively, some of the heterogeneity between chromosome arms in the centromere-proximal regions may reflect variation in the amount of repeat-rich sequence excluded from the analysis (Materials and Methods).

Figure 2. Patterns of Polymorphism for Nucleotides, Small Insertions, and Small Deletions along Chromosome Arms
π for nucleotides (blue), π for small (≤10 bp) insertions (orange), and π for small (≤10 bp) deletions (orchid) among the D. simulans lines in 150-kbp windows are plotted every 10 kbp (see Materials and Methods). C and T correspond to locations of centromeres and telomeres, respectively. Chromosome arm 3R coordinates correspond to D. simulans locations after accounting for fixed inversion on the D. melanogaster lineage.
doi:10.1371/journal.pbio.0050310.g002
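The heterochromatic pattern can be glimpsed directly in Table 1 by dividing nonsynonymous by synonymous values. This is only a rough check on weighted averages, assumed here to be comparable across rows; it is not the per-gene analysis summarized in Table S2. The dictionary layout and function name are illustrative choices.

```python
# Values transcribed from Table 1:
# (region, site type, chromosome) -> (pi, Div_mel, Div_sim, Div_yak)
TABLE1 = {
    ("euchromatic", "nonsynonymous", "X"): (0.0018, 0.0067, 0.0070, 0.0253),
    ("euchromatic", "nonsynonymous", "A"): (0.0026, 0.0061, 0.0057, 0.0223),
    ("euchromatic", "synonymous", "X"): (0.0199, 0.0767, 0.0519, 0.2314),
    ("euchromatic", "synonymous", "A"): (0.0352, 0.0695, 0.0524, 0.2187),
    ("heterochromatic", "nonsynonymous", "X"): (0.0014, 0.0088, 0.0089, 0.0269),
    ("heterochromatic", "nonsynonymous", "A"): (0.0017, 0.0083, 0.0075, 0.0354),
    ("heterochromatic", "synonymous", "X"): (0.0132, 0.0664, 0.0493, 0.2385),
    ("heterochromatic", "synonymous", "A"): (0.0136, 0.0589, 0.0523, 0.2338),
}
COLUMNS = ("pi", "Div_mel", "Div_sim", "Div_yak")


def n_to_s_ratio(region: str, chromosome: str, column: str) -> float:
    """Nonsynonymous / synonymous ratio for one region, chromosome, and column."""
    i = COLUMNS.index(column)
    nonsyn = TABLE1[(region, "nonsynonymous", chromosome)][i]
    syn = TABLE1[(region, "synonymous", chromosome)][i]
    return nonsyn / syn


for chrom in ("X", "A"):
    for col in COLUMNS:
        eu = n_to_s_ratio("euchromatic", chrom, col)
        het = n_to_s_ratio("heterochromatic", chrom, col)
        print(f"{chrom} {col}: euchromatic {eu:.3f}, heterochromatic {het:.3f}")
```

In every row and column of these averages, the heterochromatic ratio exceeds the euchromatic one, in line with the text.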

X versus Autosome Divergence

Faster-X divergence. The X chromosome differs from the autosomes in its genetics as well as in its population genetics [40,41]. These differences have motivated several attempts to compare patterns of polymorphism and divergence on these two classes of chromosomes and to use such comparisons to test theoretical population genetic models [19,41]. For example, several population genetic models (e.g., recessivity of beneficial mutations) predict faster evolution of X-linked versus autosomal genes [42]. Nevertheless, there is currently no statistical support for greater divergence of X-linked versus autosomal genes in Drosophila [19,43,44].

The genomic data presented here clearly show that the X is evolving faster than the autosomes. For example, median (standard error [SE]) X versus autosome divergence for 50-kb windows was 0.0274 (0.0003) versus 0.0242 (0.0001) for D. simulans, 0.0233 (0.0002) versus 0.0223 (0.0007) for D. melanogaster, and 0.1012 (0.0007) versus 0.0883 (0.0003) for D. yakuba. The X evolves significantly faster than the autosomes in D. simulans, D. melanogaster, and D. yakuba (Tables 1 and S1; 50-kb windows, Mann-Whitney U; z = 4.99, 12.92, and 14.68 for D. melanogaster, D. simulans, and D. yakuba, respectively; all p < 0.0001), although the faster-X effect appeared to be considerably smaller in D. melanogaster than in D. simulans or D. yakuba. Moreover, of the 18 lineage divergence estimates (six site types and three lineages), only one, D. simulans synonymous sites, failed to show faster-X evolution (Table 1). However, not all classes of site/lineages showed statistically significant faster-X evolution (Table S3). Thus, the faster-X effect is likely to be general for Drosophila but to vary in magnitude across lineages and site types. Mean X chromosome divergence in previous analyses of smaller datasets [19,43,44] was higher (though not significantly so) than autosome divergence, in agreement with these genomic results. Finally, indel divergence also showed a faster-X effect (Mann-Whitney U, p < 0.0001 for both insertions and deletions).

Interestingly, the lengths of coding regions, introns, intergenic regions, and 5′ and 3′ UTRs were significantly longer (Mann-Whitney U, all five have p < 0.0001) for the X chromosome than for the autosomes in D. melanogaster [45]. Longer introns, intergenic sequences, and genes tend to evolve more slowly than shorter functional elements (above and [45]), suggesting that the faster-X inference is conservative. Perhaps the X chromosome requires additional sequences for proper regulation through dosage compensation (e.g., [46–48]) or proper large-scale organization in the nucleus [49]. Alternatively, if directional selection were more common on the X chromosome, then Hill-Robertson effects [50] could favor insertions, because selection is expected to be more effective when there is more recombination between selected sites. However, the fact that X-linked deletion divergence is much greater than insertion divergence, at least for small indels (see below), does not support this idea. Further analysis of larger indels could clarify this matter. Finally, under the premise that ancestral polymorphism makes a considerable contribution to D. simulans divergence, lower X chromosome polymorphism (relative to ancestral autosome polymorphism) would also make the faster-X inference conservative.
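The statement that only one of the 18 lineage divergence estimates fails to show faster-X evolution can be checked against the euchromatic rows of Table 1 with a few lines of code. The same values also give the D. simulans X/autosome divergence ratio for each site class. This compares point estimates only (significance tests are reported in Table S3), and the data layout and function names are illustrative assumptions.

```python
# Euchromatic lineage divergence from Table 1:
# site type -> chromosome -> (Div_mel, Div_sim, Div_yak)
DIVERGENCE = {
    "nonsynonymous": {"X": (0.0067, 0.0070, 0.0253), "A": (0.0061, 0.0057, 0.0223)},
    "synonymous": {"X": (0.0767, 0.0519, 0.2314), "A": (0.0695, 0.0524, 0.2187)},
    "intron": {"X": (0.0248, 0.0330, 0.1175), "A": (0.0240, 0.0281, 0.1028)},
    "5' UTR": {"X": (0.0233, 0.0258, 0.1018), "A": (0.0216, 0.0203, 0.0842)},
    "3' UTR": {"X": (0.0199, 0.0261, 0.0957), "A": (0.0186, 0.0192, 0.0775)},
    "intergenic": {"X": (0.0231, 0.0299, 0.1102), "A": (0.0225, 0.0265, 0.0957)},
}
LINEAGES = ("mel", "sim", "yak")


def faster_x_failures() -> list[tuple[str, str]]:
    """(site type, lineage) pairs where X divergence does not exceed autosomal divergence."""
    failures = []
    for site, by_chrom in DIVERGENCE.items():
        for i, lineage in enumerate(LINEAGES):
            if by_chrom["X"][i] <= by_chrom["A"][i]:
                failures.append((site, lineage))
    return failures


def top_x_to_a_ratios(lineage: str = "sim", k: int = 3) -> list[str]:
    """Site types with the largest X/autosome divergence ratio for one lineage."""
    i = LINEAGES.index(lineage)
    ratios = {site: c["X"][i] / c["A"][i] for site, c in DIVERGENCE.items()}
    return sorted(ratios, key=ratios.get, reverse=True)[:k]


print(faster_x_failures())
print(top_x_to_a_ratios())
```

The ranking step lists 3′ UTR, 5′ UTR, and nonsynonymous sites as the top three D. simulans site classes, matching the classes singled out in the discussion of selection on the X.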

As noted above, faster-X evolution has several possible
explanations, including recessivity of beneficial mutations,
underdominance, more frequent directional selection on
males than on females, higher mutation rates in females than
in males, or higher mutation rates on the X chromosome
versus the autosomes [19,40–42]. The fact that faster-X
evolution is observed across most site types is consistent with
the hypothesis that X chromosome mutation rates are greater
than autosomal mutation rates. The X chromosome is distinct
from the autosomes in that it is dosage compensated in males
through hypertranscription of X-linked genes [51–53]. Dosage
compensation of the Drosophila male germline [52] could
result in higher X-linked mutation rates if chromatin
conformation associated with hypertranscription increases
mutation rates. Indeed, cytological and biochemical studies of
the male Drosophila polytene chromosomes suggest that the X
has a fundamentally different chromatin organization than
the autosomes [54]. Alternatively, DNA repair in the heterogametic male could have different properties than repair in
females. In addition to the possible contribution of elevated
X-linked mutation rates to faster-X evolution, some aspects of
the data support a role for selection in elevating X
chromosome substitution rates. For example, the three site
classes that showed the greatest X/autosome divergence ratio
in D. simulans (nonsynonymous, 5′ UTR, and 3′ UTR) also
showed the strongest evidence for adaptive divergence in
contrasts of polymorphic and fixed variants in D. simulans (see
below). Furthermore, the observation of a significantly higher
frequency of derived polymorphic variants on the X relative
to the autosomes [55] (Table S4) is consistent with more
adaptive evolution on the X chromosome [56,57]. However,
there is no obvious enrichment of genes showing a history of
recurrent adaptive protein evolution on the X chromosome
(see below).

In addition to the overall faster rate of X chromosome
evolution, relative rate tests (Materials and Methods) revealed
that the deviations of observed numbers of substitutions from
neutral expectations are significantly greater for the X
chromosome than for autosomes in both D. simulans and D.
melanogaster (Mann-Whitney U, $p = 1.3 \times 10^{-13}$ and $1.4 \times 10^{-4}$
for D. simulans and D. melanogaster, respectively). The
magnitude of the deviations of D. simulans substitutions from
expected numbers (Materials and Methods) varied along
chromosome arms (Table S5 and Figure S3), with the X
chromosome showing a particularly strong physical clustering
of unusual regions. Though these patterns could be


Crystal Structure of an Ancient Protein: Evolution
by Conformational Epistasis
Eric A. Ortlund, et al.
Science 317, 1544 (2007);
DOI: 10.1126/science.1142819

The following resources related to this article are available online at
www.sciencemag.org (this information is current as of December 11, 2007):

Updated information and services, including high-resolution figures, can be found in the online
version of this article at:
http://www.sciencemag.org/cgi/content/full/317/5844/1544

Supporting Online Material can be found at:
http://www.sciencemag.org/cgi/content/full/1142819/DC1

A list of selected additional articles on the Science Web sites related to this article can be
found at:
http://www.sciencemag.org/cgi/content/full/317/5844/1544#related-content

This article cites 26 articles, 9 of which can be accessed for free:
http://www.sciencemag.org/cgi/content/full/317/5844/1544#otherarticles

This article has been cited by 1 article(s) on the ISI Web of Science.

This article has been cited by 1 article hosted by HighWire Press; see:
http://www.sciencemag.org/cgi/content/full/317/5844/1544#otherarticles

This article appears in the following subject collections:
Evolution
http://www.sciencemag.org/cgi/collection/evolution

Information about obtaining reprints of this article or about obtaining permission to reproduce
this article in whole or in part can be found at:
http://www.sciencemag.org/about/permissions.dtl


Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the
American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright
2007 by the American Association for the Advancement of Science; all rights reserved. The title Science is a
registered trademark of AAAS.

REPORTS

proaches have been and are being considered. For example, in Singapore, where 84% of the population lives in public housing (35), regulations that explicitly recognize the role of spatial segregation in sectarianism specify the percentage of each ethnic group that may occupy a housing block (36). This legally compels ethnic mixing at a scale finer than that which our study finds likely to lead to violence. Given the natural tendency toward social separation, maintaining such mixing requires a level of authoritarianism that might not be entertained in other locations. Still, despite social tensions (37), the current absence of violence provides some support for our analysis. The alternative approach (aiding in the separation process by establishing clear boundaries between cultural groups to prevent violence) has also gained recent attention (38, 39). Although further studies are needed, there exist assessments (39) of the impact of historical partitions in Ireland, Cyprus, the Indian subcontinent, and the Middle East that may be consistent with the understanding of type separation and a critical scale of mixing or separation presented here.

The insight provided by this study may help inform policy debates by guiding our understanding of the consequences of policy alternatives. The purpose of this paper does not include promoting specific policy options. Although our work reinforces suggestions to consider separation, we are not diminishing the relevance of concerns about the desirability of separation or its process. Even where separation may be indicated as a way of preventing violence, caution is warranted to ensure that the goal of preventing violence does not become a justification for violence. Moreover, even a peaceful process of separation is likely to be objectionable. There may be ways to positively motivate separation using incentives, as well as to mitigate negative aspects of separation that often include displacement of populations and mobility barriers.

Our results for the range of filter diameters that provide good statistical agreement between reported and predicted violence in the former Yugoslavia and India suggest that regions of width less than 10 km or greater than 100 km may provide sufficient mixing or isolation to reduce the chance of violence. These bounds may be affected by a variety of secondary factors, including social and economic conditions; the simulation resolution may limit the accuracy of the lower limit; and boundaries such as rivers, other physical barriers, or political divisions will surely play a role. Still, this may provide initial guidance for strategic planning. Identifying the nature of boundaries to be established and the means for ensuring their stability, however, must reflect local issues.
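As a rough illustration of how the reported bounds partition region widths, the sketch below labels a region width against the 10 km and 100 km limits quoted above. The helper name and the category labels are hypothetical; only the two thresholds come from the text, and, as the text cautions, the real bounds depend on social and economic conditions, simulation resolution, and physical or political boundaries, none of which this sketch models.

```python
# Bounds quoted in the text, treated here as fixed constants for illustration only.
MIXING_LIMIT_KM = 10.0      # regions narrower than this may be mixed enough
ISOLATION_LIMIT_KM = 100.0  # regions wider than this may be isolated enough


def classify_region_width(width_km: float) -> str:
    """Hypothetical helper: label a region width relative to the reported bounds."""
    if width_km <= 0:
        raise ValueError("width must be positive")
    if width_km < MIXING_LIMIT_KM:
        return "mixed"         # below the lower bound
    if width_km > ISOLATION_LIMIT_KM:
        return "isolated"      # above the upper bound
    return "intermediate"      # neither bound suggests reduced risk here


for w in (5.0, 50.0, 250.0):
    print(w, classify_region_width(w))
```

Widths exactly at 10 km or 100 km fall into the intermediate band here, because the text states strict inequalities ("less than" and "greater than").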

Our approach does not consider the relative merits of cultures, individual acts, or immediate causes of violence, but rather the conditions that may promote violence. It is worth considering whether, in places where cultural differentiation is taking place, conflict might be prevented or minimized by political acts that create appropriate boundaries suited to the current geocultural regions rather than the existing historically based state boundaries. Such boundaries need not inhibit trade and commerce and need not mark the boundaries of states, but should allow each cultural group to adopt independent behaviors in separate domains. Peaceful coexistence need not require complete integration.

References and Notes
1. M. White, Deaths by Mass Unpleasantness: Estimated Total for the Entire 20th Century, http://users.erols.com/mwhite28/warstat8.htm (September 2005).
2. D. L. Horowitz, Ethnic Groups in Conflict (Univ. of California Press, Berkeley and Los Angeles, ed. 2, 2000).
3. B. Harff, T. R. Gurr, Ethnic Conflict in World Politics (Westview, Boulder, ed. 2, 2004).
4. S. Huntington, The Clash of Civilizations and the Remaking of World Order (Simon & Schuster, New York, 1996).
5. D. Chirot, M. E. P. Seligman, Eds., Ethnopolitical Warfare: Causes, Consequences, and Possible Solutions (American Psychological Association, Washington, DC, 2001).
6. M. Reynal-Querol, J. Conflict Resolut. 46, 29 (2002).
7. T. R. Gulden, Politics Life Sciences 21, 26 (2002).
8. H. Buhaug, S. Gates, J. Peace Res. 39, 417 (2002).
9. A. Varshney, Ethnic Conflict and Civic Life: Hindus and Muslims in India (Yale Univ. Press, New Haven, CT, 2003).
10. M. D. Toft, The Geography of Ethnic Violence: Identity, Interests, and the Indivisibility of Territory (Princeton Univ. Press, Princeton, NJ, 2003).
11. J. Fox, Religion, Civilization, and Civil War: 1945 through the New Millennium (Lexington Books, Lanham, MD, 2004).
12. M. Mann, The Dark Side of Democracy: Explaining Ethnic Cleansing (Cambridge Univ. Press, New York, 2004).
13. I. Lustick, Am. Polit. Sci. Rev. 98, 209 (2004).
14. Materials and methods are available as supporting material on Science Online.
15. T. C. Schelling, J. Math. Sociol. 1, 143 (1971).
16. J. Mimkes, J. Therm. Anal. 43, 521 (1995).
17. H. P. Young, Individual Strategy and Social Structure (Princeton Univ. Press, Princeton, NJ, 1998).
18. R. Van Kempen, A. S. Ozuekren, Urban Stud. 35, 1631 (1998).
19. Y. Bar-Yam, in Dynamics of Complex Systems (Perseus Press, Cambridge, MA, 1997), chap. 7.
20. A. J. Bray, Adv. Phys. 43, 357 (1994).
21. I. M. Lifshitz, V. V. Slyozov, J. Phys. Chem. Solids 19, 35 (1961).
22. D. A. Huse, Phys. Rev. B 34, 7845 (1986).
23. W. Easterly, R. Levine, Q. J. Econ. 112, 1203 (1997).
24. P. Collier, A. Hoeffler, Oxf. Econ. Pap. 50, 563 (1998).
25. R. H. Bates, Am. Econ. Rev. 90, 131 (2000).
26. J. D. Fearon, D. D. Laitin, Am. Pol. Sci. Rev. 97, 75 (2003).
27. D. N. Posner, Am. J. Pol. Sci. 48, 849 (2004).
28. I. Daubechies, Ten Lectures on Wavelets (SIAM, Philadelphia, 1992).
29. A. Arneodo, E. Bacry, P. V. Graves, J. F. Muzy, Phys. Rev. Lett. 74, 3293 (1995).
30. P. Ch. Ivanov et al., Nature 383, 323 (1996).
31. Map of Yugoslavia, Courtesy of the University of Texas Libraries, www.lib.utexas.edu/maps/europe/yugoslav.jpg.
32. R. Petrovic, Yugosl. Surv. 33, 3 (1992).
33. K. Chaudhuri, Frontline 18 (no. 2), www.hinduonnet.com/fline/fl1802/18020330.htm.
34. Final Report, Carnegie Commission on Preventing Deadly Conflict, www.wilsoncenter.org/subsites/ccpdc/pubs/rept97/finfr.htm.
35. A Brief Background, Housing and Development Board, Singapore Government, www.hdb.gov.sg/fi10/fi10296p.nsf/WPDis/About%20UsA%20Brief%20Background%20-%20HDB’s%20Beginnings.
36. Ethnic Integration Policy, Housing and Development Board, Singapore Government, www.hdb.gov.sg/fi10/fi10201p.nsf/WPDis/Buying%20A%20Resale%20FlatEthnic%20Group%20Eligibility.
37. D. Murphy, Christian Science Monitor, 5 February 2002, www.csmonitor.com/2002/0205/p07s01-woap.html.
38. J. Tullberg, B. S. Tullberg, Politics Life Sciences 16, 237 (1997).
39. C. Kaufmann, Int. Secur. 23, 120 (1998).
40. We thank G. Wolfe, M. Woolsey, and L. Burlingame for editing the manuscript; B. Wang for assistance with figures; M. Nguyen and Z. Bar-Yam for assistance with identifying data; and I. Epstein, S. Pimm, F. Schwartz, E. Downs, and S. Frey for helpful comments. We acknowledge internal support by the New England Complex Systems Institute and the U.S. government for support of preliminary results.

Supporting Online Material
www.sciencemag.org/cgi/content/full/317/5844/1540/DC1
Methods
Figs. S1.1 to S4.3
SOM Text
Table S1
References
Bibliography

30 November 2006; accepted 13 August 2007
10.1126/science.1142734

Crystal Structure of an Ancient
Protein: Evolution by
Conformational Epistasis
Eric A. Ortlund,1* Jamie T. Bridgham,2* Matthew R. Redinbo,1 Joseph W. Thornton2†

The structural mechanisms by which proteins have evolved new functions are known only indirectly.
We report x-ray crystal structures of a resurrected ancestral protein—the ~450 million-year-old
precursor of vertebrate glucocorticoid (GR) and mineralocorticoid (MR) receptors. Using structural,
phylogenetic, and functional analysis, we identify the specific set of historical mutations that
recapitulate the evolution of GR’s hormone specificity from an MR-like ancestor. These
substitutions repositioned crucial residues to create new receptor-ligand and intraprotein contacts.
Strong epistatic interactions occur because one substitution changes the conformational position
of another site. “Permissive” mutations—substitutions of no immediate consequence, which
stabilize specific elements of the protein and allow it to tolerate subsequent function-switching
changes—played a major role in determining GR’s evolutionary trajectory.


A central goal in molecular evolution is to understand the mechanisms and dynamics by which changes in gene sequence generate shifts in function and therefore phenotype (1, 2). A complete understanding of this process requires analysis of how changes in protein structure mediate the effects of mutations on function. Comparative analyses of extant proteins have provided indirect insights into the diversification of protein structure (3–6), and protein engineering studies have elucidated structure-function relations that shape the evolutionary process (7–11). To directly identify the mechanisms by which historical mutations generated new functions, however, it is necessary to compare proteins through evolutionary time.

1544 14 SEPTEMBER 2007 VOL 317 SCIENCE www.sciencemag.org

[Figure 1 (graphics not reproduced). Panel A: phylogeny of TetrapodGR, TeleostGR, ElasmobranchGR, and MRs, with resurrected nodes AncCR (~470 Ma), AncGR1 (~440 Ma), and AncGR2 (~420 Ma), branch labels 25aa and 36aa + 1Δ, and dose-response plots (fold activation vs. hormone, log M) for HomoGR, RajaGR, HomoMR, and the ancestral receptors. Panel C: aldosterone, cortisol, and DOC, with positions C11, C17, and C18 marked.]

Here we report the empirical structures of an ancient protein, which we “resurrected” (12) by phylogenetically determining its maximum likelihood sequence from a large database of extant sequences, biochemically synthesizing a gene coding for the inferred ancestral protein, expressing it in cultured cells, and determining the protein’s structure by x-ray crystallography. Specifically, we investigated the mechanistic basis for the functional evolution of the glucocorticoid receptor (GR), a hormone-regulated transcription factor present in all jawed vertebrates (13). GR and its sister gene, the mineralocorticoid receptor (MR), descend from the duplication of a single ancient gene, the ancestral corticoid receptor (AncCR), deep in the vertebrate lineage ~450 million years ago (Ma) (Fig. 1A) (13). GR is activated by the adrenal steroid cortisol and regulates stress response, glucose homeostasis, and other functions (14). MR is activated by aldosterone in tetrapods and by deoxycorticosterone (DOC) in teleosts to control electrolyte homeostasis, kidney and colon function, and other processes (14). MR is also sensitive to cortisol, though considerably less so than to aldosterone and DOC (13, 15). Previously, AncCR was resurrected and found to have MR-like sensitivity to aldosterone, DOC, and cortisol, indicating that GR’s cortisol specificity is evolutionarily derived (13).

1Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599, USA. 2Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR 97403, USA.

*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: joet@uoregon.edu

Fig. 1. (A) Functional evolution

To identify the structural mechanisms by
which GR evolved this new function, we used
x-ray crystallography to determine the structures
of the resurrected AncCR ligand-binding domain
(LBD) in complex with aldosterone, DOC, and
cortisol (16) at 1.9, 2.0, and 2.4 Å resolution,
respectively (table S1). All structures adopt the
classic active conformation for nuclear receptors
(17), with unambiguous electron density for each
hormone (Fig. 1B and figs. S1 and S2). AncCR’s
structure is extremely similar to the human MR
[root mean square deviation (RMSD) = 0.9 Å for
all backbone atoms] and, to a lesser extent, to the
human GR (RMSD = 1.2 Å). The network of
hydrogen bonds supporting activation in the
human MR (18) is present in AncCR, indicating
that MR’s structural mode of action has been
conserved for >400 million years (fig. S3).
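RMSD values such as the 0.9 Å and 1.2 Å quoted above summarize how far corresponding backbone atoms lie from one another. The sketch below computes RMSD for two coordinate lists that are assumed to be already paired atom by atom and optimally superposed; structural comparisons in practice first perform that superposition (for example with the Kabsch algorithm), which this sketch deliberately omits. The coordinates are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, pre-superposed atoms (in Å)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    # Sum of squared Euclidean distances between corresponding atoms.
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


# Toy example: every atom shifted by 1 Å along x gives an RMSD of exactly 1 Å.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0)]
moved = [(x + 1.0, y, z) for x, y, z in ref]
print(rmsd(ref, moved))  # → 1.0
```

Because no superposition step is included, a rigid translation like the one in the toy example shows up as a nonzero RMSD; after optimal superposition it would be zero.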

Because aldosterone evolved only in the tetrapods, tens of millions of years after AncCR, that receptor’s sensitivity to aldosterone was surprising (13). The AncCR-ligand structures indicate that the receptor’s ancient response to aldosterone was a structural by-product of its sensitivity to DOC, the likely ancestral ligand, which it binds almost identically (Fig. 1C). Key contacts for binding DOC involve conserved surfaces among the hormones, and no obligate contacts are made with moieties at C11, C17, and C18, the only variable positions among the three hormones. These inferences are robust to uncertainty in the sequence reconstruction: We modeled each plausible alternate reconstruction [posterior probability (PP) > 0.20] into the AncCR crystal structures and found that none significantly affected the backbone conformation or ligand interactions. The receptor, therefore, had the structural potential to be fortuitously activated by aldosterone when that hormone evolved tens of millions of years later, providing the mechanism for evolution of the MR-aldosterone partnership by molecular exploitation, as described (13).
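The robustness check above, and the reconstruction statistics reported in the next paragraph, rest on per-site posterior probabilities. A minimal sketch, assuming each site's posterior distribution over amino acids is available as a dictionary (a hypothetical input format, not the output of any particular program): take the most probable state at each site, list every other state with PP > 0.20 as a plausible alternate, and report the mean PP of the chosen states. The toy posteriors are invented for illustration.

```python
from typing import Dict, List, Tuple

Posterior = Dict[str, float]  # amino acid -> posterior probability at one site


def summarize_reconstruction(
    sites: List[Posterior], alt_cutoff: float = 0.20
) -> Tuple[str, float, List[Tuple[int, str]]]:
    """Return (most probable sequence, mean PP of chosen states, plausible alternates)."""
    sequence: List[str] = []
    chosen_pps: List[float] = []
    alternates: List[Tuple[int, str]] = []
    for i, post in enumerate(sites):
        best = max(post, key=post.get)  # most probable state at this site
        sequence.append(best)
        chosen_pps.append(post[best])
        # Any other state above the cutoff is flagged as a plausible alternate.
        for aa, pp in post.items():
            if aa != best and pp > alt_cutoff:
                alternates.append((i, aa))
    mean_pp = sum(chosen_pps) / len(chosen_pps)
    return "".join(sequence), mean_pp, alternates


toy_sites = [
    {"S": 0.98, "T": 0.02},
    {"L": 0.70, "M": 0.25, "I": 0.05},
    {"Q": 1.00},
]
seq, mean_pp, alts = summarize_reconstruction(toy_sites)
print(seq, round(mean_pp, 3), alts)  # → SLQ 0.893 [(1, 'M')]
```

Each flagged alternate is what would then be modeled into the structure or introduced by mutagenesis to test whether the inference changes.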

To determine how GR’s preference for cortisol evolved, we identified substitutions that occurred during the same period as the shift in GR function. We used maximum likelihood phylogenetics to determine the sequences of ancestral receptors along the GR lineage (16). The reconstructions had strong support, with mean PP > 0.93 and the vast majority of sites with PP > 0.90 (tables S2 and S3). We synthesized a cDNA for each reconstructed LBD, expressed it in cultured cells, and experimentally characterized its hormone sensitivity in a reporter gene transcription assay (16). GR from the common ancestor of all jawed vertebrates (AncGR1 in Fig. 1A) retained AncCR’s sensitivity to aldosterone, DOC, and cortisol. At the next node, however, GR from the common ancestor of bony vertebrates (AncGR2) had a phenotype like that of modern GRs, responding only to cortisol. This inference is robust to reconstruction uncertainty: We introduced plausible alternative states by mutagenesis, but none changed function (fig. S4). GR’s specificity therefore evolved during the interval between these two speciation events, ~420 to 440 Ma (19, 20).

of corticosteroid receptors. Dose-response curves show transcription of a luciferase reporter gene by extant and resurrected ancestral receptors with varying doses (in log M) of aldosterone (green), DOC (orange), and cortisol (purple). Black box indicates evolution of cortisol specificity. The number of sequence changes on each branch is shown (aa, replacement; Δ, deletion). Scale bars, SEM of three replicates. Node dates from the fossil record (19, 20). For complete phylogeny and sequences, see fig. S10 and table S5. (B) Crystal structure of the AncCR LBD with bound aldosterone (green, with red oxygens). Helices are labeled. (C) AncCR’s ligand-binding pocket. Side chains (<4.2 Å from bound ligand) are superimposed from crystal structures of AncCR with aldosterone (green), DOC (orange), and cortisol (purple). Oxygen and nitrogen atoms are red and blue, respectively; dashed lines indicate hydrogen bonds. Arrows show C11, C17, and C18 positions, which differ among the hormones.

[Figure 2A panels (graphics not reproduced): dose-response curves (fold activation vs. hormone, log M) for AncGR1, AncGR1 + S106P, AncGR1 + L111Q, and AncGR1 + S106P, L111Q.]

During this interval, there were 36 substitutions and one single-codon deletion (figs. S5 and S6). Four substitutions and the deletion are conserved in one state in all GRs that descend from AncGR2 and in another state in all receptors with the ancestral function. Two of these, S106P and L111Q (21), were previously identified as increasing cortisol specificity when introduced into AncCR (13). We introduced these substitutions into AncGR1 and found that they recapitulate a large portion of the functional shift from AncGR1 to AncGR2, radically reducing aldosterone and DOC response while maintaining moderate sensitivity to cortisol (Fig. 2A); the concentrations required for half-maximal activation (EC50) by aldosterone and DOC increased by 169- and 57-fold, respectively, whereas that for cortisol increased only twofold. A strong epistatic interaction between substitutions was apparent: L111Q alone had little effect on sensitivity to any hormone, but S106P dramatically reduced activation by all ligands. Only the combination switched receptor preference from aldosterone and DOC to cortisol. Introducing these historical substitutions into the human MR yielded a completely nonfunctional receptor, as did reversing them in the human GR (fig. S7). These results emphasize the importance of having the ancestral sequence to reveal the functional impacts of historical substitutions.
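The EC50 shifts quoted above can be read through a standard dose-response model. The sketch below assumes a simple Hill curve with Hill coefficient 1 for reporter activation; that is a modeling choice for illustration, not the authors' stated fitting procedure. It uses only the fold changes given in the text to show that preference for cortisol over aldosterone shifts by 169 / 2 = 84.5-fold, and how a 169-fold EC50 increase lowers the fractional response at a fixed dose. The starting EC50 of 1 nM is hypothetical.

```python
def hill_response(conc: float, ec50: float, n: float = 1.0) -> float:
    """Fraction of maximal activation at a given ligand concentration (Hill equation)."""
    return conc**n / (ec50**n + conc**n)


# Fold increases in EC50 after adding S106P + L111Q to AncGR1, as reported in the text.
fold_change = {"aldosterone": 169.0, "DOC": 57.0, "cortisol": 2.0}

# Preference for cortisol over another hormone changes by the ratio of the two folds.
for hormone in ("aldosterone", "DOC"):
    shift = fold_change[hormone] / fold_change["cortisol"]
    print(f"cortisol preference vs {hormone}: {shift:.1f}-fold")

# Hypothetical starting EC50 of 1 nM, probed at that same 1 nM dose.
ec50_before = 1e-9
dose = 1e-9
before = hill_response(dose, ec50_before)
after = hill_response(dose, ec50_before * fold_change["aldosterone"])
print(f"aldosterone response at 1 nM: {before:.3f} -> {after:.4f} of max")
```

Only the fold ratios carry over from the text; the absolute response values depend entirely on the assumed starting EC50 and Hill coefficient.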

To determine the mechanism by which these two substitutions shift function, we compared the structures of AncGR1 and AncGR2, which were generated by homology modeling and energy minimization based on the AncCR and human GR crystal structures, respectively (16). These structures are robust to uncertainty in the reconstruction: Modeling plausible alternate states did not significantly alter backbone conformation, interactions with ligand, or intraprotein interactions. The major structural difference between AncGR1 and AncGR2 involves helix 7 and the loop preceding it, which contain S106P and L111Q and form part of the ligand pocket (Fig. 2B and fig. S8). In AncGR1 and AncCR, the loop’s position is stabilized by a hydrogen bond between Ser106 and the backbone carbonyl of Met103. Replacing Ser106 with proline in the derived GRs breaks this bond and introduces a sharp kink into the backbone, which pulls the loop downward, repositioning and partially unwinding helix 7. By destabilizing this crucial region of the receptor, S106P impairs activation by all ligands. The movement of helix 7, however, also dramatically repositions site 111, bringing it close to the ligand. In this conformational background, L111Q generates a hydrogen bond with cortisol’s C17-hydroxyl, stabilizing the receptor-hormone complex. Aldosterone and DOC lack this hydroxyl, so the new bond is cortisol-specific. The net effect of these two substitutions is to destabilize the receptor complex with aldosterone or DOC and restore stability in a cortisol-specific fashion, switching AncGR2’s preference to that hormone. We call this mode of structural evolution conformational epistasis, because one substitution remodels the protein backbone and repositions a second site, changing the functional effect of substitution at the latter.

Fig. 2. Mechanism for switching AncGR1’s ligand preference from aldosterone to cortisol. (A) Effect of substitutions S106P and L111Q on the resurrected AncGR1’s response to hormones. Dashed lines indicate sensitivity to aldosterone (green), cortisol (purple), and DOC (orange) as the EC50 for reporter gene activation. Green arrow shows probable pathway through a functional intermediate; red arrow, intermediate with radically reduced sensitivity to all hormones. (B) Structural change conferring new ligand specificity. Backbones of helices 6 and 7 from AncGR1 (green) and AncGR2 (yellow) in complex with cortisol are superimposed. Substitution S106P induces a kink in the interhelical loop of AncGR2, repositioning sites 106 and 111 (arrows). In this background, L111Q forms a new hydrogen bond with cortisol’s unique C17-hydroxyl (dotted red line).

Although S106P and L111Q (“group X” for convenience) recapitulate the evolutionary switch in preference from aldosterone to cortisol, the receptor retains some sensitivity to MR’s ligands, unlike AncGR2 and extant GRs. We hypothesized that the other three strictly conserved changes that occurred between AncGR1 and AncGR2 (L29M, F98I, and deletion S212Δ) would complete the functional switch. Surprisingly, introducing these “group Y” changes into the AncGR1 and AncGR1 + X backgrounds produced completely nonfunctional receptors that cannot activate transcription, even in the presence of high ligand concentrations (Fig. 3A). Additional epistatic substitutions must therefore have modulated the effect of group Y, providing a permissive background for its evolution that was not yet present in AncGR1.

The AncCR crystal structure allowed us to identify these permissive mutations by analyzing the effects of group Y substitutions (Fig. 3B). In all steroid receptors, transcriptional activity depends on the stability of an activation-function helix (AF-H), which is repositioned when the ligand binds, generating the interface for transcriptional coactivators. The stability of this orientation is determined by a network of interactions among three structural elements: the loop preceding AF-H, the ligand, and helix 3 (17). Group Y substitutions compromise activation because they disrupt this network. S212Δ eliminates a hydrogen bond that directly stabilizes the AF-H loop, and L29M on helix 3 creates a steric clash and unfavorable interactions with the D-ring of the hormone. F98I opens up space between helix 3, helix 7, and the ligand; the resulting instability is transmitted indirectly to AF-H, impairing activation by all ligands (Fig. 3B). If the protein could tolerate group Y, however, the structures predict that these mutations would enhance cortisol specificity: L29M forms a hydrogen bond with cortisol’s unique C17-hydroxyl, and the additional space created by F98I relieves a steric clash between the repositioned loop and Met108, stabilizing the key interaction between Q111 and the C17-hydroxyl (Fig. 3B).

We hypothesized that historical substitutions that added stability to the regions destabilized by group Y might have permitted the evolving protein to tolerate group Y mutations and to complete the GR phenotype. Structural analysis suggested two candidates (group Z): N26T generates a new hydrogen bond between helix 3 and the AF-H loop, and Q105L allows helix 7 to pack more tightly against helix 3, stabilizing the latter and, indirectly, AF-H (Fig. 3B). As predicted, introducing group Z into the nonfunctional AncGR1 + X + Y receptor restored transcriptional activity, indicating that Z is permissive for Y (Fig. 3A). Further, AncGR1 + X + Y + Z displays a fully GR-like phenotype that is unresponsive to aldosterone and DOC and maintains moderate cortisol sensitivity. Both N26T and Q105L are required for this effect (table S4). Strong epistasis is again apparent: Adding group Z substitutions in the absence of Y has little or no effect on ligand-activated transcription, presumably because the receptor has not yet been destabilized (Fig. 3A). Evolutionary trajectories that pass through functional intermediates are more likely than those involving nonfunctional steps (22), so the only historically likely pathways to AncGR2 are those in which the permissive substitutions of group Z and the large-effect mutations of group X occurred before group Y was complete (Fig. 3C).
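The path argument in this paragraph can be made concrete by treating X, Y, and Z as single steps. This is a simplification, since the text notes that group Y itself arrived over several substitutions. A minimal sketch, assuming the functional rule stated in the Fig. 3 legend (group Y abolishes activity unless groups X and Z are both present, while X or Z without Y leaves the receptor functional): enumerate every order of adding the three groups to AncGR1 and keep only those whose intermediates all remain functional. The function names are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple


def is_functional(state: FrozenSet[str]) -> bool:
    """Functional rule from the Fig. 3 legend, with each group treated as one unit."""
    # Y abolishes activity unless both X and Z are already present.
    return "Y" not in state or {"X", "Z"} <= state


def viable_orders(groups: Tuple[str, ...] = ("X", "Y", "Z")) -> List[Tuple[str, ...]]:
    """Orders of adding groups to AncGR1 in which every intermediate stays functional."""
    viable = []
    for order in permutations(groups):
        state: FrozenSet[str] = frozenset()
        ok = True
        for group in order:
            state = state | {group}  # add the next group of substitutions
            if not is_functional(state):
                ok = False
                break
        if ok:
            viable.append(order)
    return viable


print(viable_orders())  # → [('X', 'Z', 'Y'), ('Z', 'X', 'Y')]
```

Only the two orders in which Y comes last survive, matching the conclusion that X and Z had to precede the completion of Y; these correspond to the permitted edges of the cube in Fig. 3C.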

Fig. 3. Permissive substitutions in the evolution of receptor specificity. (A) Effects of various combinations of historical substitutions on AncGR1’s transcriptional activity and hormone sensitivity in a reporter gene assay. Group Y (L29M, F98I, and S212Δ) abolishes receptor activity unless groups X (S106P, L111Q) and Z (N26T and Q105L) are present; the XYZ combination yields complete cortisol specificity. The 95% confidence interval for each EC50 is in parentheses. Dash, no activation. (B) Structural prediction of permissive substitutions. Models of AncGR1 (green) and AncGR2 (yellow) are shown with cortisol. Group X and Y substitutions (circles and rectangles) yield new interactions with the C17-hydroxyl of cortisol (purple) but destabilize receptor regions required for activation. Group Z (underlined) imparts additional stability to the destabilized regions. (C) Restricted evolutionary paths through sequence space. The corners of the cube represent states for residue sets X, Y, and Z. Edges represent pathways from the ancestral sequence (AncGR1) to the cortisol-specific combi-

Our discovery of permissive substitutions in the
AncGR1-AncGR2 interval suggested that other
permissive mutations might have evolved even
earlier. We used the structures to pred


MOLECULAR EVOLUTION

By (name)

Affiliated institution

Professor

Course

Date

MOLECULAR EVOLUTION

Kimura, M. (1968). Evolutionary rate at the molecular level. Nature, 217(5129), 624-626.

Kimura’s paper concerns the rate of molecular evolution. Comparisons of hemoglobin molecules from different animal species suggest that, during the course of mammalian evolution, amino-acid substitution occurred at a rate of roughly one amino-acid change per 10⁷ years for a chain of some 140 amino acids. Kimura argues that this rate, once expressed in terms of nucleotide substitutions across the genome, is so high that many of the mutations involved must be neutral. Because neutral mutations accumulate at a roughly steady rate, they can serve as a kind of “molecular clock” or “yardstick” for describing genetic evolution. Kimura’s work is critical to this study because, if neutral or nearly neutral mutations arise in each generation at a faster rate than previously thought, we must acknowledge the critical role of random genetic drift: chance sampling among the limited number of individuals in a population changes that population’s genetic makeup.
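Kimura (1968) reports roughly one amino-acid change per 10⁷ years for a hemoglobin chain of some 140 amino acids, and that figure can be turned into a per-site rate with simple arithmetic. The sketch below does that conversion and then, as a standard molecular-clock illustration, estimates a divergence time from a fraction of differing sites. The function names are hypothetical, the 10% difference is an invented input, and the divergence estimate assumes a constant rate on both lineages (hence the factor of 2) with no correction for multiple substitutions at the same site.

```python
def per_site_rate(changes: float, years: float, chain_length: int) -> float:
    """Substitutions per amino-acid site per year."""
    return changes / (years * chain_length)


def divergence_time(subs_per_site: float, rate: float) -> float:
    """Years since two lineages split, assuming the same constant rate on both.

    Differences accumulate along both branches, so K = 2 * rate * T.
    """
    return subs_per_site / (2.0 * rate)


# Kimura's hemoglobin figure: about one change per 1e7 years for a ~140-residue chain.
r = per_site_rate(changes=1.0, years=1e7, chain_length=140)
print(f"{r:.2e}")  # → 7.14e-10

# Hypothetical input: 10% of sites differ between two sequences.
print(f"{divergence_time(0.10, r):.2e}")  # → 7.00e+07
```

Under the neutral theory this clock-like behavior is expected because the substitution rate of neutral mutations equals the neutral mutation rate, independent of population size.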

King, J. L., & Jukes, T. H. (1969). Non-Darwinian Evolution: Most evolutionary change in proteins may be due to neutral mutations and genetic drift. Science, 164(3881), 788-798.

“Non-Darwinian Evolution” is a 1969 scientific paper co-authored by Thomas H. Jukes and Jack Lester King. Together with Motoo Kimura’s 1968 publication “Evolutionary Rate at the Molecular Level,” it is credited with proposing what became known as “the neutral hypothesis of molecular evolution.” The paper marshals a wealth of evidence, ranging from protein sequence analyses and studies of the Treffers mutator gene in E. coli to analyses of the genetic code and comparative immunology, to argue that most protein evolution results from neutral mutations and genetic drift. “Non-Darwinian Evolution” is relevant to my study because it offers an in-depth explanation of the neutral theory of molecular evolution.


letters to nature

5. Charlesworth, B. The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63, 213–227 (1994).
6. Fay, J. C., Wyckoff, G. J. & Wu, C.-I. Positive and negative selection on the human genome. Genetics 158, 1227–1234 (2001).
7. McDonald, J. H. & Kreitman, M. Adaptive evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).
8. Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral molecular variation. Genetics 134, 1289–1303 (1993).
9. Maynard Smith, J. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974).
10. Begun, D. J. & Aquadro, C. F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356, 519–520 (1992).
11. Begun, D. The frequency distribution of nucleotide variation in Drosophila simulans. Mol. Biol. Evol. 18, 1343–1352 (2001).
12. Kliman, R. Recent selection on synonymous codon usage in Drosophila. J. Mol. Evol. 49, 343–351 (1999).
13. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
14. Powell, J. R. & DeSalle, R. Drosophila molecular phylogenies and their uses. Evol. Biol. 28, 87–138 (1995).
15. Haldane, J. B. S. The cost of natural selection. J. Genet. 55, 511–524 (1957).
16. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968).
17. Thompson, J. D., Higgins, D. G. & Gibson, T. J. ClustalW: improving the sensitivity of progressive multiple alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22, 4673–4680 (1994).
18. Xia, X. Data Analysis in Molecular Biology and Evolution (Kluwer Academic, London, 2000).
19. Rozas, J. & Rozas, R. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15, 174–175 (1999).
20. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13, 555–556 (1997).

Supplementary Information accompanies the paper on Nature’s website
(http://www.nature.com).

Acknowledgements
We thank B. Charlesworth, C.-I. Wu, S. Otto, M. Whitlock, T. Johnson, P. Awadalla,
J. Gillespie, G. McVean and P. Keightley for helpful discussions, and E. Moriyama for help
with data collection. N.G.C.S. was funded by the Biotechnology and Biological Sciences
Research Council (BBSRC) and A.E.-W. is funded by the Royal Society and the BBSRC.

Competing interests statement
The authors declare that they have no competing financial interests.

Correspondence and requests for materials should be addressed to A.E.-W.
(e-mail: a.c.eyre-walker@sussex.ac.uk).

Testing the neutral theory of molecular evolution with genomic data from Drosophila
Justin C. Fay*†, Gerald J. Wyckoff*† & Chung-I Wu*‡

* Committee on Genetics, University of Chicago, Chicago, Illinois 60637, USA
‡ Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA


Although positive selection has been detected in many genes, its overall contribution to protein evolution is debatable1. If the bulk of molecular evolution is neutral, then the ratio of amino-acid (A) to synonymous (S) polymorphism should, on average, equal that of divergence2. A comparison of the A/S ratio of polymorphism in Drosophila melanogaster with that of divergence from Drosophila simulans shows that the A/S ratio of divergence is twice as high, a difference that is often attributed to positive selection. But an increase in selective constraint owing to an increase in effective population size could also explain this observation, and, if so, all genes should be affected similarly. Here we show that the difference between polymorphism and divergence is limited to only a fraction of the genes, which are also evolving more rapidly, and this implies that positive selection is responsible. A higher A/S ratio of divergence than of polymorphism is also observed in other species, which suggests a rate of adaptive evolution that is far higher than permitted by the neutral theory of molecular evolution.

† Present addresses: Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (J.C.F.); Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA (G.J.W.).
The neutral theory holds that the bulk of DNA divergence between species is driven by mutation and drift, rather than by positive darwinian selection3. But because the effect of positive selection is often masked by negative selection4, detecting positive selection is a challenging task. A rate of amino-acid substitution greater than that of synonymous substitution can be explained only by positive selection5, but such a criterion is very stringent as negative selection lowers the rate of amino-acid substitution. A high rate of amino-acid substitution is limited mostly to genes that are involved in resistance to disease or in sexual reproduction, where there is continual room for improvement6,7.
The McDonald–Kreitman test can detect positive selection even in the presence of negative selection through a ratio of amino-acid divergence to synonymous divergence greater than that of polymorphism2. The A/S ratio of divergence is inflated above polymorphism by advantageous amino-acid mutations, which quickly sweep through a population but have a cumulative effect on divergence. The McDonald–Kreitman test has been applied to many genes individually, but only a few have yielded a significant excess of amino-acid divergence (Drosophila genes are reviewed in refs 8, 9). This may in part be caused by a lack of power in detecting positive selection in individual genes unless a large number of adaptive substitutions have occurred.
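The McDonald–Kreitman comparison reduces to a 2×2 table: amino-acid and synonymous counts within a species (polymorphism) and between species (divergence). The sketch below tests that table with a two-sided Fisher's exact test, written with only the standard library. Fisher's exact test is my choice here, since the letter does not say which test produced its P value. The function names are hypothetical. The example uses the pooled counts reported in the letter for common polymorphism (65/224) and divergence (598/950).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    probs = (prob(x) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def mcdonald_kreitman(pn: int, ps: int, dn: int, ds: int) -> dict:
    """Compare amino-acid (n) and synonymous (s) counts for polymorphism (p)
    and divergence (d); neutrality predicts equal A/S ratios."""
    return {
        "A/S polymorphism": pn / ps,
        "A/S divergence": dn / ds,
        "p_value": fisher_exact_two_sided(pn, ps, dn, ds),
    }


print(mcdonald_kreitman(pn=65, ps=224, dn=598, ds=950))
```

An A/S ratio of divergence well above that of polymorphism, with a small P value, is the pattern the test reads as positive selection. The test by itself cannot tell this apart from a change in constraint.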
For those genes that have yielded a significant McDonald–Kreitman test result, the A/S ratio of divergence is more than twice as great as polymorphism10–12. The effects of positive selection may also be obscured by slightly deleterious amino-acid mutations that inflate the A/S ratio of polymorphism but not divergence. The effects of slightly deleterious mutations can be removed by comparing common polymorphism with divergence, because deleterious amino-acid mutations are kept at low frequency in the population4. This can only be done when the data from a large number of genes are combined; individual genes rarely contain more than a few common amino-acid polymorphisms.
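Comparing common rather than total polymorphism means splitting segregating sites by minor-allele frequency. The sketch below uses the 12.5% cutoff stated in the Methods. With a sample of eight chromosomes, 12.5% is exactly one copy, so singletons fall in the rare class. The tuple layout and the example sites are hypothetical.

```python
def classify_site(minor_count: int, sample_size: int, cutoff: float = 0.125) -> str:
    """Label a segregating site 'rare' if its minor-allele frequency is <= cutoff,
    otherwise 'common' (12.5% is the cutoff stated in the Methods)."""
    if sample_size <= 0 or not 0 < minor_count <= sample_size / 2:
        raise ValueError("minor allele count must lie in (0, n/2]")
    return "rare" if minor_count / sample_size <= cutoff else "common"


def as_ratio_by_class(sites):
    """A/S ratio for each frequency class.

    `sites` is an iterable of (kind, minor_count, sample_size) tuples, with kind
    'A' for an amino-acid polymorphism or 'S' for a synonymous one.
    """
    counts = {"rare": {"A": 0, "S": 0}, "common": {"A": 0, "S": 0}}
    for kind, minor_count, sample_size in sites:
        counts[classify_site(minor_count, sample_size)][kind] += 1
    return {
        cls: (c["A"] / c["S"] if c["S"] else float("nan"))
        for cls, c in counts.items()
    }


# Hypothetical sites in a sample of 8 chromosomes: a singleton (1/8) is 'rare'.
example = [("A", 1, 8), ("A", 1, 8), ("A", 3, 8), ("S", 1, 8), ("S", 2, 8), ("S", 4, 8)]
print(as_ratio_by_class(example))
```

A higher A/S ratio in the rare class than in the common class is the signature of slightly deleterious amino-acid variants held at low frequency.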
An important but rarely appreciated assumption of the McDonald–Kreitman test is that the selective constraint on a gene remains constant over time. The selective constraint on a gene is determined by the proportion of amino-acid mutations that are deleterious3, that is, those with $2Ns < -1$, so both a change in the selection coefficient (s) and a change in effective population size (N) can result in a change in selective constraint. Although it is well known that selective constraint is not static across phylogenetic lineages13,14, this assumption is rarely justified in applications of the McDonald–Kreitman test. Whereas the strength of selection on each gene might fluctuate over time depending on the genetic or environmental background, a genome-wide change in constraint, such as that caused by a change in effective population size, should produce a consistent increase or decrease in the A/S ratio across all genes. Alternatively, under positive selection each gene might be affected to a different degree and some genes might not be affected at all.
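The criterion $2Ns < -1$ explains why a change in N alters constraint on every gene at once. The same mutations get reclassified as deleterious when N grows. The sketch below counts the constrained fraction for two population sizes. The selection coefficients are hypothetical, and each mutation's s is treated as fixed, which is a simplifying assumption.

```python
def fraction_constrained(selection_coeffs, pop_size):
    """Proportion of amino-acid mutations counted as deleterious under the
    criterion 2Ns < -1 for effective population size N."""
    if not selection_coeffs:
        raise ValueError("need at least one selection coefficient")
    hits = sum(1 for s in selection_coeffs if 2 * pop_size * s < -1)
    return hits / len(selection_coeffs)


# Hypothetical selection coefficients for five amino-acid mutations.
s_values = [-1e-3, -1e-4, -1e-5, -1e-6, 0.0]
print(fraction_constrained(s_values, 10_000))   # 2Ns = -20, -2, -0.2, ... -> 0.4
print(fraction_constrained(s_values, 100_000))  # 2Ns = -200, -20, -2, ... -> 0.6
```

A tenfold larger N shifts the same distribution of s toward constraint. This is why a demographic change would be expected to lower the A/S ratio across all genes rather than in a subset.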
To compare genomic patterns of amino-acid and synonymous site evolution, we tabulated polymorphism in D. melanogaster and divergence from D. simulans from 45 gene surveys (Methods). If all amino-acid and synonymous variation is neutral, then the A/S ratio of polymorphism and divergence should be constant. The A/S ratio of divergence (598/950 = 0.63) is significantly greater than that of common polymorphism (65/224 = 0.29; $P < 10^{-6}$). We compared divergence with the common rather than the total polymorphism because deleterious mutations at low frequency inflate the A/S ratio of polymorphism. For the 36 genes with sample sizes of eight or greater, there is a significant excess of rare over common amino-acid variation in autosomal genes (P = 0.022; Table 1), as is observed in humans4. The absence of a difference in X-linked genes suggests that the deleterious mutations are partially recessive and are more readily eliminated from the X chromosome.

Table 1 Polymorphisms in D. melanogaster and divergence from D. simulans

Gene*       Class             Amino-acid polymorphism, A   Synonymous polymorphism, S   A/S
X-linked    Rare (≤12.5%)     4                            67                           0.06
            Common (>12.5%)   6                            46                           0.13
            Divergence        42                           189                          0.22
Autosomal   Rare              79                           126                          0.63
            Common            44                           118                          0.37
            Divergence        421                          521                          0.81

* There are 5 X-linked and 31 autosomal genes with a sample size of eight or greater (see text for the data from all 45 genes).

Table 2 African and non-African common polymorphism and divergence

Class          Population    Amino-acid polymorphism, A   Synonymous polymorphism, S   A/S
Polymorphism   Non-African   48                           124                          0.39
               African       40                           159                          0.25
Divergence                   413                          663                          0.62

1024 © 2002 Macmillan Magazines Ltd NATURE | VOL 415 | 28 FEBRUARY 2002 | www.nature.com
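The A/S column of Table 1 is simply A divided by S for each row. Recomputing it from the transcribed counts shows the pattern the text relies on. In autosomes, the rare class has a higher A/S ratio than the common class. X-linked genes show no such excess. The dictionary layout is my own, and this is only a consistency check on the table, not the significance test behind P = 0.022.

```python
# Counts (A, S) transcribed from Table 1.
TABLE_1 = {
    "X-linked": {"Rare": (4, 67), "Common": (6, 46), "Divergence": (42, 189)},
    "Autosomal": {"Rare": (79, 126), "Common": (44, 118), "Divergence": (421, 521)},
}


def as_ratios(table):
    """A/S ratio for every gene class and row, rounded to two decimals as in the table."""
    return {
        gene: {cls: round(a / s, 2) for cls, (a, s) in rows.items()}
        for gene, rows in table.items()
    }


for gene, row in as_ratios(TABLE_1).items():
    # Slightly deleterious amino-acid variants should inflate the rare class.
    print(gene, row, "rare > common:", row["Rare"] > row["Common"])
```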
Both positive selection and an increase in selective constraint on amino-acid changes can produce a higher A/S ratio of divergence than of polymorphism. But only under certain restrictive conditions is a genome-wide change in constraint possible. One such condition is an increase in effective population size that is neither too distant nor too recent in the evolutionary past. If this possibility can be ruled out, positive selection may be the only viable explanation for the high rate of amino-acid divergence.
If an increase in selective constraint resulted from a population size increase associated with the spread of D. melanogaster outside Africa15, it might be more appropriate to compare the A/S ratio of the African population with that of divergence. Table 2, which includes the 32 genes for which both African and non-African populations were surveyed, shows that there is a significantly larger A/S ratio of divergence than of polymorphism in either population. If a recent increase in effective population size increased constraint on amino-acid polymorphism in both African and non-African populations, then patterns of synonymous polymorphism might be skewed towards rare variants. Neither African nor non-African populations show this pattern16. Finally, if there has been a decrease in effective population size along the D. melanogaster lineage17,18, the A/S ratio of polymorphism should be greater than that of divergence between the two species.

[Figure 1 plot: number of genes (0 to 12) against excess of amino-acid divergence (-12 to >10); bars distinguish genes with ka < 0.02 from genes with ka > 0.02.]

Figure 1 The distribution of the excess of amino-acid divergence contributed by each gene. For reference, fast and slowly evolving genes are denoted by a rate of amino-acid substitution (ka) greater than (filled bars) or less than (open bars) 2%.

Table 3 Polymorphism and divergence in neutral and fast genes

Genes*    Class        Amino-acid polymorphism, A   Synonymous polymorphism, S   A/S
Neutral   Rare         31                           90                           0.34
          Common       16                           69                           0.23
          Divergence   65                           247                          0.26
Fast      Rare         48                           36                           1.33
          Common       28                           49                           0.57
          Divergence   356                          274                          1.30

* X-linked genes are excluded.

If an increase in effective population size has produced a genome-wide increase in selective constraint, the A/S ratio of all genes should be affected. In Fig. 1, the distribution of each gene’s contribution to the excess of amino-acid divergence suggests that there are two classes of gene: neutral and rapidly evolving. The neutral class comprises 34 genes that deviate by less than 10 amino-acid substitutions from that expected on the basis of the A/S ratio of all common polymorphism. The remaining 11 genes all have a higher A/S ratio of divergence than of polymorphism, and account for the whole difference in the A/S ratio of polymorphism and divergence. These genes are Acp26Aa, Acp29Ab, anon1A3, anon1E9, anon1G5, ci, est-6, Ref2P, Rel, tra and Zw. As expected under positive selection, which increases the rate of protein evolution, these 11 genes have a high rate of amino-acid substitution (Fig. 1).
Can the pattern in Fig. 1 be explained by selection or demography? Table 3 shows that, in the rapidly evolving genes, the A/S ratios of divergence and of rare polymorphism are much higher than the A/S ratio of the common polymorphism. This is expected if the genes are under positive selection. Although a large increase in population size in the recent past could account for the difference between the A/S ratio of divergence and that of common polymorphism, this explanation is incompatible with the very small difference found in the 26 neutral genes. Because both the neutral and rapidly evolving genes have a higher A/S ratio of rare polymorphism than of common polymorphism, both should have been affected by an increase in effective population size.
If positive selection is common, other species should also have an A/S ratio of divergence greater than that of polymorphism. In addition, any demographic scheme is not likely to be shared by several species. In a study of eight genes in D. simulans, Drosophila mauritiana and Drosophila sechellia, the A/S ratio of polymorphism (A/S = 32/183) is 34% that of divergence (28/55)19. In a study of 42 genes with polymorphism in both D. melanogaster and D. simulans, the A/S ratio of polymorphism is 65% that of divergence (N. G. C. Smith and A. Eyre-Walker, personal communication). In another study of 23 genes, the A/S ratio of polymorphism (45/305) is 30% that of divergence along the D. simulans lineage (65/133)20. In humans, the A/S ratio of common polymorphism (70/122) found in 181 genes is 65% that of divergence (3,660/4,151) found in a different set of 182 human and Old World monkey genes4.
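Each percentage in the paragraph above is the polymorphism A/S ratio divided by the divergence A/S ratio. The sketch below recomputes them from the counts quoted in the text. The 42-gene study is omitted because no counts are given for it, and the study labels are my own shorthand.

```python
def relative_as(poly_a, poly_s, div_a, div_s):
    """A/S ratio of polymorphism expressed as a fraction of the A/S ratio of divergence."""
    return (poly_a / poly_s) / (div_a / div_s)


# (polymorphism A, S, divergence A, S) as quoted in the text.
STUDIES = {
    "D. simulans complex, 8 genes": (32, 183, 28, 55),
    "D. simulans lineage, 23 genes": (45, 305, 65, 133),
    "Humans vs Old World monkeys": (70, 122, 3660, 4151),
}

for name, counts in STUDIES.items():
    print(f"{name}: {relative_as(*counts):.0%}")
```

Values well below 100% mean that amino-acid changes are over-represented among fixed differences relative to standing variation. This is the same direction as the D. melanogaster result.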
Although these genomic patterns of variation are not explained easily by the neutral theory, slightly deleterious mutations must clearly be accounted for in attempting to measure positive selection. In humans, 38% of amino-acid polymorphism was estimated to be slightly deleterious4, and in D. melanogaster the estimate is 26%, calculated as $(0.63 - 0.37) \times 126/123$ from the combined neutral and rapidly evolving genes (Table 3). These slightly deleterious mutations, which are emphasized by the nearly neutral theory21, could become effectively neutral and fixed during a population bottleneck of sufficient severity, providing a burst of amino-acid substitutions and an increase in the A/S ratio of divergence. We control for the impact of these slightly deleterious mutations by comparing the rapidly evolving class of gene to the neutral class (Fig. 1, Table 3). Additional genomic data from other species will be needed to estimate the general impact of these slightly deleterious mutations on protein evolution.
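The 26% estimate can be reproduced from Table 3. In the sketch below, the difference between the rare and common A/S ratios, multiplied by the number of rare synonymous polymorphisms, estimates the surplus of rare amino-acid variants. Dividing that surplus by all amino-acid polymorphisms gives the fraction. This reading of the expression $(0.63 - 0.37) \times 126/123$ is my interpretation, and the function name is hypothetical.

```python
def slightly_deleterious_fraction(rare_a, rare_s, common_a, common_s):
    """Fraction of amino-acid polymorphisms inferred to be slightly deleterious.

    The rare class's excess A/S over the common class, scaled by the number
    of rare synonymous polymorphisms, estimates the surplus rare amino-acid
    variants; dividing by all amino-acid polymorphisms gives the fraction.
    """
    surplus = (rare_a / rare_s - common_a / common_s) * rare_s
    return surplus / (rare_a + common_a)


# Neutral + fast genes from Table 3: rare 31+48 and 90+36, common 16+28 and 69+49.
print(round(slightly_deleterious_fraction(79, 126, 44, 118), 2))  # 0.26
```

Using the unrounded ratios gives about 0.260. The rounded 0.63 and 0.37 in the text give about 0.27. Both agree with the reported 26% to the precision stated.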



Methods
Data
A literature search yielded 45 genes for which polymorphism had been surveyed in D. melanogaster and for which an outgroup sequence was available. Of these, 36 had a sample size of eight or greater, 32 had been surveyed in at least two African and two non-African individuals, and 10 were X-linked. The 45 genes and their references are listed in Supplementary Information.

Analysis
Polymorphism data were tabulated by hand or from GenBank accession numbers using SITES22 or DnaSP23. For each polymorphic site, the minor allele was classified as rare (≤12.5%) or common (>12.5%). The cutoff of 12.5% was chosen to exclude deleterious mutations from the common frequency class and to include those genes with samples of eight or more in the analysis of rare compared to common polymorphism. Cutoffs of 10 and 15% produce similar results. We treated three alleles segregating at a single nucleotide as two segregating sites and excluded complex variations. Divergence data were obtained by comparing a randomly chosen sequence of D. melanogaster with that of D. simulans or, if unavailable, either D. mauritiana or D. sechellia. The number of amino-acid and synonymous substitutions between species was estimated using Kimura’s two-parameter model to correct for multiple hits.
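Kimura's two-parameter model corrects an observed proportion of differing sites for multiple hits by treating transitions (P) and transversions (Q) separately: $d = -\tfrac{1}{2}\ln(1 - 2P - Q) - \tfrac{1}{4}\ln(1 - 2Q)$. The sketch below shows only this nucleotide-level correction. How the letter counted amino-acid and synonymous sites before applying it is not described here, and the example proportions are hypothetical.

```python
import math


def kimura_2p_distance(transitions: float, transversions: float) -> float:
    """Kimura two-parameter distance (substitutions per site).

    `transitions` (P) and `transversions` (Q) are observed proportions of
    aligned sites that differ by each type of change:
    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    """
    p, q = transitions, transversions
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("too divergent: distance is undefined (saturation)")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(kimura_2p_distance(0.10, 0.05), 4))  # 0.1702, larger than the raw 0.15
```

The corrected distance always exceeds the raw proportion P + Q. The gap grows as divergence increases, because more sites have been hit more than once.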

The contribution of each gene to the excess number of amino-acid substitutions was calculated as the excess number of amino-acid substitutions minus the excess number of amino-acid polymorphisms found in each gene. The excess for polymorphism and divergence is $A - S \times (65/224)$, where A and S are the number of amino-acid and synonymous substitutions, respectively, and 65/224 is the total number of amino-acid polymorphisms divided by synonymous polymorphisms. (Ideally, the excess of amino-acid divergence in each gene should be calculated using only polymorphism and divergence in that gene, but there is rarely sufficient polymorphism in a single gene for comparison with divergence.) We also calculated the contribution to the excess separately for three groups of genes sorted by their rate of amino-acid divergence. The two methods produced a similar distribution, so the simpler method using a single group of genes was used.
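The per-gene contribution follows directly from the formula $A - S \times (65/224)$. The sketch below computes it and then applies the 10-substitution threshold the text uses to separate the neutral class from rapidly evolving genes. Applying the threshold to the absolute deviation is my reading of "deviate by less than 10". The gene names and counts are hypothetical, not data from the letter.

```python
POOLED_RATIO = 65 / 224  # pooled A/S ratio of common polymorphism, from the text


def excess(a: float, s: float, ratio: float = POOLED_RATIO) -> float:
    """Amino-acid changes beyond those expected from synonymous changes: A - S * ratio."""
    return a - s * ratio


def gene_contribution(div_a, div_s, poly_a, poly_s):
    """Excess amino-acid divergence minus excess amino-acid polymorphism for one gene."""
    return excess(div_a, div_s) - excess(poly_a, poly_s)


def classify(contribution: float, threshold: float = 10.0) -> str:
    """'neutral' if within +/- threshold substitutions of expectation, else 'rapid'."""
    return "neutral" if abs(contribution) < threshold else "rapid"


# Hypothetical per-gene counts (divergence A, S, polymorphism A, S).
genes = {"geneX": (10, 30, 2, 8), "geneY": (40, 25, 3, 9)}
for name, counts in genes.items():
    c = gene_contribution(*counts)
    print(name, round(c, 1), classify(c))
```

Because the same pooled ratio is used for every gene, the pooled common polymorphism has zero excess by construction. Each gene's score therefore measures its departure from the genome-wide expectation.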

Received 27 June; accepted 4 December 2001.

1. Nei, M. Molecular Evolutionary Genetics (Columbia Univ. Press, New York, 1987).
2. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).
3. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983).
4. Fay, J. C., Wyckoff, G. J. & Wu, C.-I. Positive and negative selection on the human genome. Genetics 158, 1227–1234 (2001).
5. Kimura, M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267, 275–276 (1977).
6. Yang, Z. & Bielawski, J. P. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503 (2000).
7. Wyckoff, G. J., Wang, W. & Wu, C.-I. Rapid evolution of male reproductive genes in the descent of man. Nature 403, 304–309 (2000).
8. Weinreich, D. M. & Rand, D. M. Contrasting patterns of nonneutral evolution in proteins encoded in nuclear and mitochondrial genomes. Genetics 156, 385–399 (2000).
9. Moriyama, E. N. & Powell, J. R. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13, 261–277 (1996).
10. Eanes, W. F., Kirchner, M. & Yoon, J. Evidence for adaptive evolution of the G6pd gene in the Drosophila melanogaster and Drosophila simulans lineages. Proc. Natl Acad. Sci. USA 90, 7475–7479 (1993).
11. Begun, D. J. & Whitley, P. Adaptive evolution of relish, a Drosophila NF-κB/IκB protein. Genetics 154, 1231–1238 (2000).
12. Tsaur, S. C., Ting, C. T. & Wu, C. I. Positive selection driving the evolution of a gene of male reproduction, Acp26Aa, of Drosophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 15, 1040–1046 (1998).
13. Langley, C. H. & Fitch, W. M. An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3, 161–177 (1974).
14. Ohta, T. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40, 56–63 (1995).
15. Lachaise, D. M., Cariou, M.-L., David, J. R., Lemeunier, F. & Tsacas, L. The origin and dispersal of the Drosophila melanogaster subgroup: a speculative paleogeographic essay. Evol. Biol. 22, 159–225 (1988).
16. Andolfatto, P. Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18, 279–290 (2001).
17. Akashi, H. Codon bias evolution in Drosophila: Population genetics of mutation-selection drift. Gene 205, 269–278 (1997).
18. McVean, G. A. & Vieira, J. Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 157, 245–257 (2001).
19. Kliman, R. M. et al. The population genetics of the origin and divergence of the Drosophila simulans complex species. Genetics 156, 1913–1931 (2000).
20. Begun, D. J. The frequency distribution of nucleotide variation in Drosophila simulans. Mol. Biol. Evol. 18, 1343–1352 (2001).
21. Ohta, T. Slightly deleterious mutant substitutions during evolution. Nature 246, 96–98 (1973).
22. Hey, J. & Wakeley, J. A coalescent estimator of the population recombination rate. Genetics 145, 833–846 (1997).
23. Rozas, J. & Rozas, R. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15, 174–175 (1999).

Supplementary Information accompanies the paper on Nature’s website
(http://www.nature.com).

Acknowledgements
This work was supported by grants from the NIH and NSF to C.-I.W. and a Genetics
Training Grant and a Department of Education PhD fellowship to J.C.F.

Competing interests statement
The authors declare that they have no competing financial interests.

Correspondence and requests for materials should be addressed to J.C.F.
(e-mail: jcfay@lbl.gov).

Brain potential and functional MRI evidence for how to handle two languages with one brain
Antoni Rodriguez-Fornells*, Michael Rotte†, Hans-Jochen Heinze†, Tömme Nösselt† & Thomas F. Münte*

* Department of Neuropsychology, Otto von Guericke University, Universitätsplatz 2, Gebäude 24, 39106 Magdeburg, Germany
† Klinik für Neurologie 2, Otto von Guericke University, Leipzigerstrasse 44, 39120 Magdeburg, Germany


Bilingual individuals need effective mechanisms to prevent interference from one language while processing material in the other1. Here we show, using event-related brain potentials and functional magnetic resonance imaging (fMRI), that words from the non-target language are rejected at an early stage before semantic analysis in bilinguals. Bilingual Spanish/Catalan and monolingual Spanish subjects were instructed to press a button when presented with words in one language, while ignoring words in the other language and pseudowords. The brain potentials of bilingual subjects in response to words of the non-target language were not sensitive to word frequency, indicating that the meaning of non-target words was not accessed in bilinguals. The fMRI activation patterns of bilinguals included a number of areas previously implicated in phonological and pseudoword processing2–5, suggesting that bilinguals use an indirect phonological access route to the lexicon of the target language to avoid interference6.
High-proficiency bilingual subjects manage to understand and speak one of their languages without apparent interference from the other. This is a remarkable ability in the face of the fact that neuroimaging studies have revealed, at least for high-proficiency bilinguals, that neuro-anatomical representations of both languages are

[Figure 1 plot: LRP waveforms for monolinguals and bilinguals to Spanish words, Catalan words and pseudowords; amplitude scale -2 μV; time axis 400 to 1,200 ms.]

Figure 1 Lateralized readiness potentials (LRPs) from the main experiment indicating the preparation of motor responses. The onset latency of the LRP to Spanish words, estimated by the time at which the amplitude was significantly different from zero for at least 4 consecutive time points (sequential t-tests)14, was 408 ms in the monolingual and 520 ms in the bilingual group. No LRP activity is observed for Catalan words, indicating an effective blocking of 'word' (go) responses in the bilingual group.
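The caption defines LRP onset as the first time at which the amplitude differs significantly from zero for at least 4 consecutive time points. The sketch below implements that run-detection rule. It assumes the pointwise t-test p-values are already computed, and it uses alpha = 0.05 and a 4 ms sampling interval. Both of those values are hypothetical, and this is not the authors' analysis code.

```python
def onset_index(p_values, alpha=0.05, run_length=4):
    """Index of the first time point that starts a run of at least `run_length`
    consecutive points with p < alpha; None if no such run exists."""
    run_start, run = None, 0
    for i, p in enumerate(p_values):
        if p < alpha:
            if run == 0:
                run_start = i
            run += 1
            if run >= run_length:
                return run_start
        else:
            run = 0
    return None


# Hypothetical p-values from pointwise t-tests against zero, one per sample.
p_vals = [0.6, 0.03, 0.2, 0.04, 0.01, 0.02, 0.001, 0.0005]
sample_ms = 4  # hypothetical sampling interval in ms
idx = onset_index(p_vals)
print(idx, idx * sample_ms)  # 3 12
```

The isolated significant point at index 1 is ignored. Requiring a run guards against single chance crossings among many pointwise tests.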



Copyright © 1998 by the Genetics Society of America

Evidence for Genetic Hitchhiking Effect Associated With Insecticide Resistance in Aedes aegypti

Guiyun Yan,* Dave D. Chadee† and David W. Severson*,1

* Department of Animal Health and Biomedical Sciences, University of Wisconsin, Madison, Wisconsin 53706 and
† Insect Vector Control Division, Ministry of Health, St. Joseph, Trinidad and Tobago, West Indies

Manuscript received April 3, 1997
Accepted for publication October 29, 1997

ABSTRACT
Information on genetic variation within and between populations is critical for understanding the evolutionary history of mosquito populations and disease epidemiology. Previous studies with Drosophila suggest that genetic variation of selectively neutral loci in a large fraction of the genome may be constrained by the hitchhiking effect associated with the fixation of advantageous mutations. This study examined restriction fragment length polymorphisms at 16 loci in four natural Aedes aegypti mosquito populations from Trinidad and Tobago. These populations have been subjected to organophosphate (OP) insecticide treatments for more than two decades, while dichlorodiphenyltrichloroethane (DDT) was the insecticide of choice prior to this period. We predicted that genes closely linked to the OP target loci would exhibit reduced genetic variation as a result of the hitchhiking effect associated with intensive OP insecticide selection. We also predicted that genetic variability of the genes conferring resistance to DDT and loci near the target site would be similar to other unlinked loci. As predicted, reduced genetic variation was found for loci in the general chromosomal region of a putative OP target site, and these loci generally exhibited larger FST values than other random loci. In contrast, the gene conferring resistance to DDT and its linked loci show polymorphisms and genetic differentiation similar to other random loci. The reduced genetic variability and apparent gene deletion in some regions of chromosome 1 likely reflect the hitchhiking effect associated with OP insecticide selection.
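The abstract compares loci by their genetic variation and by FST. The sketch below shows one common way to compute both: Nei's gene diversity $H = 1 - \sum p_i^2$ and $F_{ST} = (H_T - H_S)/H_T$, with populations weighted equally. The estimator actually used in the paper is not stated in this excerpt, so treat this as an illustration rather than the authors' method. The allele frequencies are hypothetical.

```python
def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity): H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """F_ST = (H_T - H_S) / H_T for equally weighted populations.

    pop_freqs holds one allele-frequency list per population, with alleles in
    the same order. H_S is the mean within-population diversity and H_T is
    the diversity of the pooled (averaged) frequencies.
    """
    k = len(pop_freqs)
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k
    pooled = [sum(f[i] for f in pop_freqs) / k for i in range(len(pop_freqs[0]))]
    h_t = gene_diversity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical two-allele locus in two populations.
print(round(fst([[0.9, 0.1], [0.6, 0.4]]), 3))  # 0.12
```

A locus swept by hitchhiking loses within-population diversity (low $H_S$). Populations then tend to differ more, which raises FST, consistent with the pattern the abstract reports near the putative OP target site.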

Mosquitoes are important vectors for several human pathogens because of their close association with humans. Mosquito habitats often change rapidly as a result of vector control efforts; therefore, successful adaptation to varying human habitats is essential for mosquito reproduction. The adaptive ability of an organism depends on its genetic variability. Information on genetic variation within and between populations is critical for understanding the evolutionary history of mosquito populations and disease epidemiology (Tabachnick and Black 1996). Protein electrophoresis and DNA sequence analyses have revealed remarkable variation in many genes in natural populations of Drosophila and other species, but the genetic variability seems to differ substantially for genes in different genome regions (Aquadro 1992). Distribution patterns of genetic variants in natural populations are the joint effects of various evolutionary forces and demographic factors, including random genetic drift, selection, recombination, mutation, gene flow, mating system and life history (e.g., colonization, range expansions or contractions; Slatkin 1985). Population life history and mating structure influence all loci equally, but selection affects only the target loci (Kreitman and Akashi 1995). Variation of selectively neutral loci may also be constrained by the hitchhiking effect, particularly in genome regions with low recombination rates and under extensive selection (Maynard Smith and Haigh 1974). Recent studies with D. melanogaster suggest that the hitchhiking effect may have occurred over a large fraction of the insect genome (Begun and Aquadro 1992). In this study we analyzed restriction site variation of 16 loci in natural populations of the yellow fever mosquito, Aedes aegypti, and provide evidence that the hitchhiking effect may have reduced genetic variation in the genome regions around a putative insecticide resistance locus.

Corresponding author: Guiyun Yan, Department of Biological Sciences, State University of New York, 109 Cooke Hall, Buffalo, NY 14260. E-mail: gyan@calshp.cals.wisc.edu

1 Present address: Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556.

A. aegypti is an important vector of yellow fever and dengue fever viruses in many tropical countries, including Trinidad and Tobago, West Indies. Control efforts for A. aegypti have focused primarily on habitat reduction and chemical treatment, which is based on the destruction of breeding sites and the use of insecticides, including dichlorodiphenyltrichloroethane (DDT) in the 1950s and several organophosphates (OP) since the 1960s. The wide use of insecticides has been a powerful selection agent, and rapid development of resistance to DDT and OPs is well documented (Gilkes et al. 1956; Rawlins and Wan 1995). The genetic mechanisms of insect resistance to various insecticides have been well characterized. For example, a point mutation in the para sodium channel gene confers one form of resistance to DDT (Williamson et al. 1996), and esterase (EST) gene amplification is associated with resistance to OPs in Culex mosquitoes (Mouchès et al. 1990). A genetic linkage map, based largely on random cDNA sequences, has been constructed for A. aegypti (Severson et al. 1993), and several insecticide resistance genes have been mapped (Severson et al. 1997). In this study, we used cDNA markers distributed across the mosquito genome to examine DNA polymorphism and population genetic differentiation, and to examine mosquito genome structural changes associated with strong selection imposed by insecticides. Several studies have investigated the genetic variation of various genera of mosquito populations with isozyme, RAPD-DNA, microsatellite and mitochondrial DNA markers (Powell et al. 1980; Tabachnick and Wallis 1985; Conn et al. 1993; Chevillon et al. 1995; Apostol et al. 1996). Restriction fragment length polymorphism (RFLP) markers are particularly suitable for population genetic studies, because they are presumably neutral, highly polymorphic, segregate as codominant markers, and can be used for studies of other mosquito species (Severson et al. 1994a). We chose Trinidad and Tobago populations because the population history is known and surveillance programs have been well established there. Population historical information is important for the interpretation of genetic data. Because the mosquito populations have been under selection by OP insecticides, genetic variation of loci closely linked to an esterase locus conferring resistance would be reduced if the hitchhiking effect has occurred. The hitchhiking effect would be prominent in genome regions with strong linkage disequilibrium and intense selection (e.g., insecticides). In contrast, gene diversity at the para locus and other neighboring loci is expected to be similar to unlinked loci in the genome, because selection pressure has been removed at the para locus since DDT was abandoned more than two decades ago.

Genetics 148: 793–800 (February, 1998)

MATERIALS AND METHODS

Natural history of A. aegypti in Trinidad and Tobago: It is generally believed that domestic A. aegypti originated from an African sylvan ancestor, and was introduced to the New World from West Africa via transoceanic trade during the fifteenth to seventeenth centuries (Tabachnick 1991). Caribbean populations probably represent the initial introduction of the mosquito species into the New World in the course of New World colonization. The first outbreak of yellow fever in Trinidad was recorded in 1796, and in 1820 in Tobago.

In the 1950s, intensive vector control programs aimed toward mosquito eradication were adopted, primarily by the widespread usage of DDT. In the early 1960s, Trinidad was considered free of A. aegypti, but was reinfested in 1962. Overall, mosquito populations in Tobago have been exposed to fewer insecticides than Trinidad populations. A. aegypti is now widely distributed in Trinidad and Tobago, despite continued intensive vector control efforts through the use of OP insecticides. Insecticide applications not only impose strong selection on the target loci, but also lead to recurrent reductions of population sizes.

Collection of samples: In conjunction with the A. aegypti surveillance program, in April 1995, we collected three geographically distinct samples from Trinidad and one sample from Tobago (Figure 1). These four villages share similar climates, including temperature and annual rainfall. For each village, 100 ovitraps were distributed, covering about half of the village's residential area (approximately two traps every five houses). Each ovitrap consisted of a black plastic container roughly half-filled with water, into which a rectangular masonite strip was placed upright. Female A. aegypti mosquitoes readily oviposit on the masonite strip, near the water interface. After 2–3 days, the masonite strips were removed and transported to the laboratory, where the attached eggs were allowed to hatch and were reared to adults. All adults were identified as A. aegypti by microscopic examination and were frozen for subsequent DNA analysis. Previous studies with A. aegypti in Puerto Rico suggest that the mean number of families represented per ovitrap was 4.7 (i.e., several female mosquitoes frequently oviposit in the same container; Apostol et al. 1994). Therefore, it is unlikely that siblings within a subpopulation would be sampled.

RFLP and probe selection: We genotyped a total of 870 mosquitoes from four populations (n = 150 for Curepe, 262 for Couva, 258 for San Fernando, and 200 for Tobago). DNA extraction from individual mosquitoes, digestion with EcoRI, Southern blotting, and hybridization were as previously described (Severson et al. 1993). Fifteen mapped RFLP markers were selected to provide broad coverage of the A. aegypti genome with an average resolution of 10.6 cM (Figure 2). All clones used were random cDNA clones with the exception of the para and Mal I clones. Mal I is a gene specifically expressed in the salivary glands, and its putative function is related to sugar metabolism (James et al. 1989). The mosquito genome consists of single- or low-copy DNA sequences and repetitive DNA with short-period interspersion (Black and Rai 1988). In this study, we focused on allelic variation of single- or low-copy cDNA sequences.

Figure 1.—Map of Trinidad and Tobago. Three samples were collected from Trinidad: Curepe (10°38.62'N, 61°24.23'W), Couva (10°26.12'N, 61°28.19'W), and San Fernando (10°18.11'N, 61°28.21'W). One sample was collected from Tobago (11°11.23'N, 60°44.21'W).

795 Hitchhiking Effect in Mosquitoes

Figure 2.—Relative map positions of the 16 Aedes aegypti (2N = 6) RFLP loci used in the study. Chromosome numbers are in italics. Map distances are in Kosambi centimorgans. Underlined loci were not used in the study. Esterase gene amplification is involved with resistance to organophosphate insecticides in mosquitoes (Mouchès et al. 1990). Three esterase loci were mapped to the general chromosomal locations as shown in the figure (Munstermann 1990). LF250 represents duplicated loci. Markers in italics are genes with known functions; other markers are random cDNA.

Data analysis: DNA polymorphism and Hardy-Weinberg equilibrium (HWE) tests: Molecular weights of fragments detected by each clone were estimated by comparison with lambda-HindIII digest standards included on each gel, using the Eagle Sight image capture and analysis software (Stratagene, La Jolla, CA). DNA polymorphism can be measured by the proportion of polymorphic loci, the number of alleles, and heterozygosity. Conformance with HWE was tested using the probability test for each locus and each population, as implemented in the GENEPOP computer program (Raymond and Rousset 1995). Because this test is robust to allele frequencies, rare alleles were not pooled. We further tested whether departures from HWE resulted from deficient or excessive heterozygosity, using the FIS statistic (Weir 1990; Rousset and Raymond 1995), defined as

$$F_{IS} = 1 - \frac{H_{obs}}{H_{exp}}$$

where H_obs is the observed heterozygosity and H_exp is the heterozygosity expected under HWE. Because FIS estimates at individual loci may be unduly influenced by rare alleles, we tested the significance of the average FIS over all loci using the method of Robertson and Hill (1984). Variation in heterozygosity among populations was analyzed following the method of Weir (1990), using analysis of variance (ANOVA) with subpopulations, individuals, loci, and locus × individual interactions as factors. All factors except loci were treated as random effects.
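As a minimal sketch of the FIS definition above, the following Python computes observed heterozygosity, the HWE-expected heterozygosity 1 − Σp², and FIS for a single locus from a list of diploid genotypes. It uses the simple plug-in estimate of expected heterozygosity, not the Weir (1990) estimators applied through GENEPOP, so values on real data would differ slightly; the function name `heterozygosity_and_fis` is hypothetical.

```python
from collections import Counter


def heterozygosity_and_fis(genotypes):
    """Return (H_obs, H_exp, F_IS) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    H_exp is the plug-in HWE expectation 1 - sum(p_i**2), a simplification
    of the estimators used in the published analysis (an assumption here).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies are taken over the 2n sampled gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if h_exp == 0:
        raise ValueError("monomorphic locus: F_IS is undefined")
    # Positive F_IS = heterozygote deficit; negative = heterozygote excess.
    f_is = 1.0 - h_obs / h_exp
    return h_obs, h_exp, f_is


# 30 AA, 40 AB, 30 BB: allele frequencies are 0.5, so H_exp = 0.5 but H_obs = 0.4.
sample = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "B")] * 30
print(heterozygosity_and_fis(sample))
```

In this made-up sample FIS comes out at about 0.2, the kind of heterozygote deficit that Table 1 reports as a positive FIS.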

Population genetic structure, gene flow and genetic distance: Population genetic structure was examined with Wright's F-statistics, based on the procedure of Weir and Cockerham (1984) and using the FSTAT computer program (Goudet 1995). Standard deviations (SD) of F-statistics were obtained for each locus by a jackknife procedure over the alleles, and were used to test the significance of the F-statistics. We first tested whether the three populations from Trinidad were significantly substructured, then included the Tobago population data in the analysis.

Gene flow (Nm) was estimated from the standardized among-population genetic variance (FST) estimate of each locus using the relationship

$$Nm = \frac{1}{4}\left(\frac{1}{F_{ST}} - 1\right)$$

where N is the effective population size of a deme and m is the rate of gene flow (Wright 1943). This equation assumes the infinite-island model of population structure and gene flow. Few populations probably conform to this assumption, but it provides a useful approximation of the relative magnitude of gene flow. Gene flow was also estimated using the private-alleles method for the appropriate loci (Slatkin and Barton 1989). Private alleles are alleles unique to a given deme. Nei's unbiased genetic distance for all pairs of populations was calculated from population allele frequencies at all loci (Nei 1987).
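The two quantities defined above can be sketched in a few lines of Python. `nm_from_fst` applies Wright's island-model relationship. `nei_unbiased_distance` follows Nei's unbiased estimator D = −ln(Jxy / √(Jx·Jy)), where Jx and Jy are per-locus homozygosities corrected for sample size as (2nΣx² − 1)/(2n − 1) and all J terms are averaged over loci. Both function names are hypothetical, and the sketch assumes that each population's sample size n (diploid individuals) is the same at every locus and that allele frequencies are listed in the same order for both populations.

```python
import math


def nm_from_fst(fst):
    """Island-model gene flow: Nm = (1/FST - 1) / 4 (Wright 1943)."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


def nei_unbiased_distance(freqs_x, freqs_y, n_x, n_y):
    """Nei's unbiased genetic distance between populations X and Y.

    freqs_x, freqs_y: one list of allele frequencies per locus, alleles in the
    same order in both populations (use 0.0 for an allele absent in one).
    n_x, n_y: number of diploid individuals sampled from each population.
    """
    jx_sum = jy_sum = jxy_sum = 0.0
    for px, py in zip(freqs_x, freqs_y):
        sx = sum(p * p for p in px)
        sy = sum(q * q for q in py)
        # Sample-size-corrected probability that two genes within a population are identical.
        jx_sum += (2 * n_x * sx - 1) / (2 * n_x - 1)
        jy_sum += (2 * n_y * sy - 1) / (2 * n_y - 1)
        # Probability that one gene from X and one from Y are identical.
        jxy_sum += sum(p * q for p, q in zip(px, py))
    loci = len(freqs_x)
    jx, jy, jxy = jx_sum / loci, jy_sum / loci, jxy_sum / loci
    return -math.log(jxy / math.sqrt(jx * jy))


# Four-population FST values for three loci, taken from Table 2.
for locus, fst in [("LF90", 0.107), ("para", 0.035), ("MalI", 0.020)]:
    print(locus, round(nm_from_fst(fst), 2))
```

Applied to the per-locus FST values in Table 2, `nm_from_fst` reproduces the FST-based Nm column up to rounding, and converting the 16-locus average FST of 0.056 gives about 4.2 migrants per generation. Note that the unbiased distance can be slightly negative for two samples with identical allele frequencies.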

RESULTS

DNA polymorphisms and HWE tests: All fifteen cDNA markers examined in this study were polymorphic. The RFLP patterns of one marker (LF250) indicate that this marker represents a gene duplication (data not shown); therefore, the 15 markers represented a total of 16 loci. A total of 91 unique alleles were identified, of which 68 (74.7%) were common to all four populations. The average number of alleles was about five per locus (Table 1). Six loci (LF198, ARC1, LF250a, para, LF168, and Mal I) exhibited private alleles, and five private alleles were found in the Tobago population. An excess of rare alleles was found: 16 alleles (18%) had a frequency below 0.05. Under the infinite-alleles model (equation 8.24; Kimura 1983), we expected to find only four alleles in this frequency class with our sample size (n = 870). The mean sizes of restriction fragments detected by the cDNA clones, weighted by their frequencies, ranged from 0.73 kb at locus LF250b to 12.90 kb at locus LF352, with an overall mean of 5.13 kb (95% confidence interval: 3.61–6.64 kb).

In general, high heterozygosity was observed in all four mosquito populations, except at the LF90 locus, which showed significantly lower heterozygosity than the other 15 loci examined (Table 1; ANOVA, t = 8.37, d.f. = 1, P < 0.0001). The most heterozygous loci were LF178 on chromosome 1 and LF282 on chromosome 2. The high heterozygosity at the LF178 locus does not seem to be a result of sex linkage (see Figure 2), because males and females showed similar heterozygosity (data not shown). Population average heterozygosity over all 16 loci varied little among populations (from 0.582 for Couva to 0.627 for Tobago), and these differences were not statistically significant (Table 1; ANOVA, F = 1.08, d.f. = 3, 49, P > 0.05). Heterozygosity was not correlated with the frequency-weighted mean size of restriction fragments at a locus (r = 0.22, d.f. = 15, P > 0.05), but it appeared to correlate with the number of observed alleles (r = 0.49, d.f. = 15, P = 0.052).

The genotype frequencies at several loci did not conform to HWE. Loci on chromosome 2 generally exhibited a heterozygote excess, whereas loci on chromosome 3 that showed HWE distortion exhibited a heterozygote deficit (Table 1). FIS values varied greatly among loci, suggesting that no systematic inbreeding occurred in these populations. The average FIS over all loci was not significantly different from 0 in any population (Table 1). Departure from HWE probably reflects either the effect of insecticide selection on some loci linked to the resistance loci, or simply sampling error.

796 G. Yan et al.

TABLE 1

RFLP polymorphisms of four Aedes aegypti populations from Trinidad and Tobago, measured by observed heterozygosity and the number of alleles

                  Curepe                Couva                 San Fernando          Tobago
Chr  Locus        n    Hobs   FIS(a)    n    Hobs   FIS       n    Hobs   FIS       n    Hobs   FIS
1    LF90         5    0.122  −0.036    3    0.159  −0.076    5    0.256  0.160*    3    0.391  0.244***
     LF230(b)     3    0.250  0.600***  3    0.069  0.880***  3    0.129  0.774***  3    0.296  0.535***
     LF198        6    0.655  0.096     6    0.504  0.100**   6    0.775  0.029     7    0.738  −0.120**
     LF178        6    0.761  0.012     6    0.849  −0.075*   6    0.851  −0.100*** 6    0.828  −0.120**
     TY7          5    0.503  0.160*    5    0.546  0.151*    5    0.543  −0.001    5    0.656  0.060*
     Chr 1 mean   5.5  0.510  0.059     5.0  0.515  0.025     5.5  0.606  0.022     5.3  0.654  0.016
2    ARC1         5    0.750  −0.046    5    0.724  0.015     5    0.702  −0.038    6    0.725  0.010
     LF138        4    0.729  −0.083    4    0.714  −0.201**  4    0.492  −0.075    4    0.487  0.145**
     LF282        9    0.838  −0.009    7    0.795  0.025     8    0.864  −0.060*   7    0.793  −0.102*
     LF98         5    0.642  0.092     5    0.654  −0.035    6    0.682  −0.073    6    0.878  −0.068*
     LF250a       4    0.615  0.060     3    0.596  −0.032    4    0.800  −0.154*   3    0.562  −0.183***
     LF250b       3    0.644  −0.009    3    0.687  −0.038    3    0.662  −0.238*** 3    0.557  −0.160*
     LF115        5    0.514  −0.031    5    0.366  −0.016    5    0.566  −0.101    4    0.562  −0.077
     Chr 2 mean   5.0  0.676  −0.003    4.6  0.648  −0.040    5.0  0.681  −0.105    4.7  0.652  −0.062
3    LF352        6    0.548  0.268**   6    0.727  0.113**   6    0.441  0.292***  6    0.550  0.263***
     LF261        4    0.483  0.035     4    0.279  −0.061    4    0.476  −0.058    3    0.592  0.008
     para         4    0.469  0.152     4    0.563  0.024     5    0.598  0.081     6    0.555  0.103
     LF168        6    0.667  0.042**   7    0.516  0.145**   7    0.667  0.109     6    0.582  −0.043
     MalI         4    0.627  0.103**   4    0.637  0.009     3    0.640  0.034     4    0.577  0.137*
     Chr 3 mean   4.8  0.559  0.119     5.0  0.544  0.046     5.0  0.564  0.092     5.0  0.571  0.093
     All loci     5.1  0.598  0.050     4.8  0.582  0.003     5.1  0.626  −0.012    4.9  0.627  0.006

*P < 0.05, **P < 0.01, ***P < 0.001. Hobs, observed heterozygosity; n, number of alleles.
(a) A significant FIS also indicates departure from HWE. Positive FIS indicates a heterozygote deficit relative to the HWE expectation; negative FIS indicates an excess of heterozygotes.
(b) A large proportion of individuals show an apparent gene deletion at the LF230 locus; therefore, this locus was not used in calculating chromosomal average heterozygosity. Heterozygosity and FIS were based on individuals without deletions. The percentage of individuals showing the gene deletion at the LF230 locus was 41.4 for the Curepe population, 58.6 for Couva, 53.7 for San Fernando, and 46.0 for Tobago.

Population genetic structure, gene flow and genetic distance: Analysis of F-statistics for the three Trinidad populations found small but statistically significant FST estimates for all loci (Table 2), suggesting that these populations are genetically differentiated. FST estimates showed a six-fold difference among loci, with an average FST over all loci of 0.043. When the Tobago population was included in the analysis, the basic pattern of FST estimation among the loci was not altered (Table 2). As expected, slightly larger FST values were obtained for most loci, and the average FST over all loci was 0.056. The para locus exhibited polymorphism and genetic differentiation similar to those of other random loci.

Assuming that the populations are at equilibrium between migration and random drift, the average number of migrants exchanged per generation can be calculated. Average gene flow (Nm) among the four populations, based on the FST method, was 4.2 migrants per generation (95% confidence interval: 3.2–5.7). This estimate was similar to the estimate based on the average frequency of six private alleles present in the populations (Nm = 4.5). Table 3 shows genetic distances and gene flow between each pair of populations, calculated from the pairwise average FST. Large gene flow between the Tobago and Trinidad populations was detected. There was no significant correlation between genetic distance and geographic distance (r² = 0.5, d.f. = 5, P > 0.05).


TABLE 2

FST statistics and Nm estimates of four Aedes aegypti populations of Trinidad and Tobago

               FST                                            Nm estimates of all populations(a)
Chr  Locus     Trinidad              Trinidad and Tobago      Based on   Based on the
               populations(b)        populations(a)           FST        private-alleles method
1    LF90      0.053 ± 0.022         0.107 ± 0.102            2.1        n/a(c)
     LF198     0.121 ± 0.062         0.109 ± 0.046            2.0        3.5
     LF178     0.011 ± 0.006         0.025 ± 0.013            9.8        n/a(c)
     TY7       0.033 ± 0.022         0.034 ± 0.011            7.1        n/a(c)
2    ARC1      0.064 ± 0.046         0.049 ± 0.034            4.9        177.6
     LF138     0.044 ± 0.032         0.030 ± 0.022            8.1        n/a(c)
     LF282     0.019 ± 0.011         0.041 ± 0.027            5.9        n/a(c)
     LF98      0.042 ± 0.012         0.061 ± 0.021            3.9        n/a(c)
     LF250a    0.081 ± 0.053         0.099 ± 0.045            2.3        0.8
     LF250b    0.106 ± 0.067         0.110 ± 0.041            2.0        n/a(c)
     LF115     0.021 ± 0.012         0.055 ± 0.039            4.3        n/a(c)
3    LF352     0.119 ± 0.059         0.079 ± 0.037            2.9        n/a(c)
     LF261     0.038 ± 0.034         0.102 ± 0.060            2.2        n/a(c)
     para      0.039 ± 0.025         0.035 ± 0.017            6.9        70.9
     LF168     0.020 ± 0.013         0.040 ± 0.021            6.0        1.7
     MalI      0.034 ± 0.019         0.020 ± 0.015            12.3       9.2
Summary over 16 loci
               0.043 ± 0.007         0.056 ± 0.008            4.2        4.5

Values are ± SD. All FST values were significantly larger than 0 at P < 0.001. The test was performed using a jackknifing procedure over samples.
(a) n = 4 populations.
(b) n = 3 populations.
(c) The estimate was not available because no private alleles existed for the locus.

Hitchhiking effect on DNA polymorphisms: Genetic hitchhiking occurs when a (neutral) mutation changes in frequency through genetic linkage to a mutation that is under selection, resulting in reduced genetic variation surrounding the target site of selection. Low DNA polymorphism at the LF90 locus suggests that hitchhiking has probably occurred in the genome region of the EST-4 locus. We collected additional evidence to test this hypothesis by examining genetic polymorphism at the LF230 locus, which is also in the general genomic region of EST-4 (Figure 2). An apparent chromosomal deletion occurred around this locus in 42–59% of the individuals (Figure 3), and low heterozygosity (0.07–0.29) was observed among individuals without the apparent deletion (Table 1). However, no substantial reduction of heterozygosity was observed for the para locus or for other loci in the vicinity of para (Table 1).

DISCUSSION

DNA polymorphisms of four A. aegypti mosquito populations were examined using RFLP markers. The Trinidad populations have been exposed to OPs every 3–4 months for about two decades. These populations have therefore experienced intense selection by insecticides, which probably resulted in periodic population bottlenecks. A population bottleneck has a long-term effect on population heterozygosity, even for species with a large intrinsic rate of growth such as A. aegypti (Nei et al. 1975). Thus, genetic polymorphism is expected to decline rapidly during insecticide use at any locus in the mosquito genome. Loci conferring OP resistance are expected to maintain lower genetic variability than other random loci in the genome, and genetic variation at other closely linked neutral loci may be reduced through genetic linkage.

If recurrent population bottlenecks have occurred in the mosquito populations, low polymorphism would be expected at all loci in the genome. Contrary to this expectation, we observed high polymorphism at most loci. For the same loci, average heterozygosities of the populations studied here are substantially higher than those of a laboratory population that has not been exposed to insecticides for more than 20 years and has not experienced a population bottleneck (Yan et al. 1997). The highest heterozygosity was observed for loci in two chromosomal regions (LF198–LF178 on chromosome 1, and LF282–LF98 on chromosome 2). Coincidentally, these two chromosomal regions in A. aegypti harbor genes determining vector competence for filarial worms and malaria parasites (Severson et al. 1994b, 1995). The high levels of heterozygosity observed may be explained by two mechanisms. The first is that the effective size of population bottlenecks has never been small, because heterogeneous habitats may provide effective shelter for the field populations. The second is that genetic polymorphisms are introduced and maintained by large gene flow among populations. The gene flow estimates seem to support the second hypothesis.

TABLE 3

Nm matrix based on pairwise FST estimates and Nei's unbiased genetic distance matrix

                Curepe    Couva     San Fernando   Tobago
Curepe          -         0.041     0.057          0.113
Couva           11.4      -         0.112          0.166
San Fernando    8.6       3.7       -              0.124
Tobago          4.3       2.6       3.7            -

Numbers below the diagonal are Nm estimates based on pairwise FST; numbers above the diagonal are Nei's unbiased genetic distances.

Figure 3.—Southern blot analysis of natural Aedes aegypti populations probed with cDNA clones LF230 (top) and LF90 (bottom). The mosquito genomic DNA was digested with EcoRI. Each lane is for a single mosquito. (Top) Apparent gene deletion around the LF230 locus in 55% of individuals (11 out of 20). (Bottom) Probe LF90 was used as a control to demonstrate that absence of hybridization of mosquito genomic DNA to LF230 was not due to incomplete DNA digestion or to poor probe conditions. DNA hybridization was observed for the same individuals with all other markers tested. See Table 1 for heterozygosity and percentage of gene deletion at the LF230 locus.

The LF90 locus consistently exhibited lower heterozygosity than other loci in the genome in all four populations used in this study. The heterozygosity of an RFLP locus may be influenced by several factors, including the size of the probes, the size of the regions being probed, reduced mutation or recombination rates in these genome regions, natural or artificial selection on a particular locus, and hitchhiking (selective sweep) of a selectively neutral locus by selectively favored substitutions at linked loci. We argue that the polymorphism pattern of the LF90 locus likely reflects a hitchhiking effect. First, the LF90 clone putatively codes for ribosomal protein S14 (Severson and Zhang 1996), and thus the RFLP fragments of LF90 themselves are presumably neutral. Second, LF90 is located in the general chromosomal region of EST-4, and gene amplification at an esterase locus is a common mechanism of OP resistance; our populations have been under selective pressure from OP insecticides for decades. Third, the low heterozygosity of the LF90 locus is likely not related to the size of the genome region probed by the LF90 marker, because we found no significant correlation between heterozygosity and RFLP fragment size. Fourth, the observed heterozygosity of the LF90 locus in laboratory colonies of A. aegypti that have not been exposed to insecticide selection was similar to that of other random loci across the genome (Yan et al. 1997).

Our argument for a hitchhiking effect is strengthened by the RFLP data for the LF230 locus, which is linked to LF90 and is also in the general genomic region of EST-4. We found very low DNA polymorphism and an apparent gene deletion in many individuals at this locus, a phenomenon that has not been observed in other laboratory colonies of A. aegypti (Yan et al. 1997). Gene deletions may result from unequal recombination in this chromosomal region, associated with esterase gene amplification. For example, OP-resistant Culex mosquitoes possess 250–500 copies of a 30-kb esterase B1 gene, compared with a single copy in susceptible individuals (Mouchès et al. 1990). Given a genome size of about 320 Mb for A. aegypti (Zaitlin and Severson, unpublished results), if the magnitude of esterase gene amplification in Aedes mosquitoes is similar to that in Culex mosquitoes, then the hitchhiking effect associated with OP insecticide selection could affect meiotic pairing across a large genome region of chromosome 1, and could lead to chromosomal abnormalities (i.e., deletions or duplications) within this region. In contrast, a hitchhiking effect associated with DDT usage is not evident for the para locus, as indicated by the fact that genetic heterozygosity at the para locus and at loci closely linked to para was similar to that at other random loci in the genome. This result is consistent with the hypothesis that in the years since DDT was abandoned, the populations have had time to re-equilibrate.
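The scale implied by the amplification argument can be made explicit with a rough calculation. It assumes, as the paragraph does only hypothetically, that an amplified Aedes esterase array matches the Culex figures of 250–500 copies of a 30-kb unit, and uses the unpublished 320-Mb genome size estimate:

```latex
\begin{aligned}
250 \times 30\,\text{kb} &= 7.5\,\text{Mb}, & \frac{7.5\,\text{Mb}}{320\,\text{Mb}} &\approx 2.3\% \\
500 \times 30\,\text{kb} &= 15\,\text{Mb},  & \frac{15\,\text{Mb}}{320\,\text{Mb}}  &\approx 4.7\%
\end{aligned}
```

Under those assumptions, the amplified array alone would amount to roughly 2–5% of the genome, which is the sense in which a large region of chromosome 1 could be involved.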

Ideally, the hitchhiking effect should be demonstrated at the level of nucleotide diversity (Kaplan et al. 1989). Unfortunately, nucleotide diversity cannot be appropriately estimated from the present data, because our RFLP data are based on a single restriction enzyme. To statistically rule out the possibility that low heterozygosity is caused by reduced mutation rates and/or increased functional constraints in the LF90 gene region, one needs to examine intraspecific variation and interspecific divergence over several gene regions in closely related species. This method has been elegantly applied in Drosophila studies (Begun and Aquadro 1991, 1992). The rationale is that, if a reduced mutation rate in a gene region leads to low intraspecific variation, then interspecific divergence is expected to be smaller than in other gene regions. However, the hitchhiking effect only redu


Open access, freely available online PLoS BIOLOGY

Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

Paramvir Dehal¹, Jeffrey L. Boore¹,²

¹ Evolutionary Genomics Department, Department of Energy Joint Genome Institute and Lawrence Berkeley National Laboratory, Walnut Creek, California, United States of America
² Department of Integrative Biology, University of California, Berkeley, California, United States of America

The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish–tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution, indicated by clear patterns of four-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

Citation: Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3(10): e314.

Introduction

It has long been hypothesized that the increased complexity and genome size of vertebrates has resulted from two rounds (2R) of whole genome duplication (WGD) occurring in early vertebrate evolution, thus providing the requisite raw materials [1]. This seemed to be supported by the long-standing speculation that humans have about 100,000 genes, roughly four times the number expected for invertebrates' genomes, but this is now known to be incorrect, with the actual human gene count being closer to 30,000 [2,3]. Conflicting analyses have now made this very controversial, with some studies supporting 2R (e.g., [4–8]), others seeing only a single round of WGD (e.g., [9–11]), and still others refuting WGD altogether by concluding that nothing greater than limited segmental duplications have occurred (e.g., [12,13]).

The 2R hypothesis had been bolstered by observations that a few gene families, e.g., Hox clusters [14], follow a “4:1 rule” in the numbers of vertebrate to invertebrate genes. However, comparison of the complete genome sequences of human [2,3] and Drosophila [15] revealed that less than 5% of homologous gene families follow the 4:1 rule [12]. Further, although two sequential duplications are expected to generate the evolutionary topology (AB)(CD) for the descendant genes, rather than (A)(BCD), in fact the relationships of vertebrate multigene families do not generally show this pattern, as indicated by early studies using only a few genes [16] and confirmed as complete genome sequences became available [2,13]. (However, for a different view, see [17].) Several studies have incorporated data from sparse sampling of genes from taxa thought to have branched near these purported duplications, including lamprey [18], hagfish, amphioxus [17,19–22], and Ciona [23]; although these results are useful for timing duplications, the conclusions could never be viewed as definitively resolving this issue, because these products could alternatively have been generated by duplications of individual genes or short gene segments rather than by WGDs. Even duplicating all of the genes in a genome individually is quite different from a whole genome duplicating simultaneously.
There are several reasons why this has been a difficult issue to resolve. After duplication, only a minority of gene pairs will adopt a new function (“neofunctionalization”) or partition old functions (“subfunctionalization”) quickly enough to escape disabling mutations that would lead to their eradication [24]; therefore, rampant gene loss rapidly erases this signal of genome duplication. Further, four-member gene families, even those with the (AB)(CD) topology, can be generated by two rounds of duplication of individual genes or of segments much smaller than the entire genome, generating a condition that cannot be distinguished on this basis from 2R followed by many gene losses. This alternative scenario seems especially plausible because recent analyses have shown that gene duplications occur much more frequently than had been thought, with the typical rate being sufficient to duplicate an entire genome equivalent every 100 million years (MY) [25,26]. Until recently, no complete genome sequence had been available from an outgroup closely related to vertebrates, and all methods of phylogenetic reconstruction are less accurate with more distant relatives such as Drosophila and yeast [20]. Lastly, there has not been to date a method to accurately and comprehensively cluster genes into homologous families, because methods that rely on sequence similarity alone are highly subject to artifactual association of slowly evolving paralogs and to erroneous exclusion of the more rapidly evolving genes.

Received September 7, 2004; Accepted July 8, 2005; Published September 6, 2005
DOI: 10.1371/journal.pbio.0030314

Copyright: © 2005 Dehal and Boore. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviations: 2R, two rounds of whole genome duplication; MY, million years; WGD, whole genome duplication

Academic Editor: Peter Holland, University of Oxford, United Kingdom

E-mail: PSDehal@LBL.gov (PD), JLBoore@LBL.gov (JLB)

PLoS Biology | www.plosbiology.org 1700 October 2005 | Volume 3 | Issue 10 | e314

Vertebrate Genome Duplication

Fortunately, as has been shown convincingly for the yeast genome and for Arabidopsis [27–30], evidence of an ancient genome duplication can be seen in the large-scale pattern of the physical locations of homologous genes, even when the great majority of the duplicated genes have been lost. Studies have shown that the human genome also has multiple regions of colinear paralogous gene copies [4,21,22,31–37], but these studies considered too few genes and genomic regions to be comprehensive. This approach is now available for a large-scale evaluation of the vertebrate 2R hypothesis, because complete (i.e., at least draft-quality) genome sequences are available for the tunicate Ciona intestinalis [38] (a basal chordate outgroup) and the vertebrates Takifugu rubripes [39] (a pufferfish, “fugu”), Mus musculus [40] (mouse), and human [2,3]. Figure 1 illustrates how the signal of two rounds of genome duplication could be retained by the large-scale pattern in the location of duplicated genes, in which many tracks of paralogous duplicates (which may not contain identical subsets of genes) each occur at exactly four positions in the genome, i.e., “tetra-paralogons.” No similar signal would be generated by repeated duplications of genes or even large gene segments; only WGDs would result in such global organization of paralogous genes.

Figure 1. Pattern Predicted for the Relative Locations of Paralogous Genes from Two Genome Duplications
(A) Representation of a hypothetical genome that has 22 genes, shown as colored squares.
(B) A genome duplication generates a complete set of paralogs in identical order.
(C) Many paralogous genes suffer disabling mutations, become pseudogenes, and are then lost. One could imagine this condition being evidence of a single round of genome duplication followed by significant gene losses.
(D) A second genome duplication recreates another set of paralogs in identical order, with multigene families that retained two copies now present in four, and those that had lost a member now present in two copies.
(E) Again, many paralogous genes suffer disabling mutations, become pseudogenes, and are then lost. Of course, unrelated gene duplications and transpositions can occur. Even though this leaves only a few four-member gene families, the patterns of 2- and 3-fold gene families unite, in various combinations, all four genomic segments, revealing that the sequential duplications had been of very large regions, in this case all or nearly all of this hypothetical genome.
DOI: 10.1371/journal.pbio.0030314.g001

Results

Gene Clustering and Duplication Timing
A graph-based method was used with the complete gene sets of the four chordates (98,517 total genes; see Table 1 for details of each step in the analysis) to generate clusters such that each contains all, and only, those genes that descended from a single gene in their common ancestor (Figure 2). A multiple sequence alignment and a maximum likelihood evolutionary tree were constructed for each cluster, and a Web browser interface was built so that each can be viewed individually. (For more details and updates that include more taxa, see the “PhIGs” [Phylogenetically Inferred Groups] Web site at http://phigs.org/.) We could then easily determine when each gene duplicated relative to lineage splitting by comparing these gene trees with the known evolutionary relationships of the animals. For example, a gene duplication that is specific to only one animal's lineage is seen as two genes from the same genome clustered together. A gene that duplicated once in the unique common ancestor of mouse and human would generate a tree that groups gene copy 1 of human and mouse and, separately, gene copy 2 of human and mouse. Put more generally, gene duplications that are shared by more than one species are seen as a replication of the phylogeny of the descendant organisms for each gene copy. Of course, gene losses and various combinations of these processes are seen as well. Figure 3 shows all possible gene topologies along with how each would be interpreted.
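The comparison of gene trees with the species tree described above can be illustrated with the standard LCA-mapping (gene tree/species tree reconciliation) technique, shown here as a minimal Python sketch. It is a textbook method swapped in for illustration, not necessarily the procedure implemented in PhIGs. The species tree, the `species:copy` leaf-label format, and all function names are assumptions, and the sketch presumes correctly rooted, fully bifurcating gene trees written as nested tuples.

```python
# Assumed species tree: ((human, mouse), fugu) are the vertebrates; ciona is the outgroup.
SPECIES_TREE = ((("human", "mouse"), "fugu"), "ciona")


def leaves(tree):
    """Set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Every clade (as a leaf set) of a nested-tuple tree."""
    if isinstance(tree, str):
        return [leaves(tree)]
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


SPECIES_CLADES = clades(SPECIES_TREE)


def lca(species):
    """Smallest species-tree clade containing every species in `species`."""
    return min((c for c in SPECIES_CLADES if species <= c), key=len)


def species_of(gene):
    """Hypothetical label format: 'human:copy1' -> 'human'."""
    return gene.split(":", 1)[0]


def find_duplications(gene_tree):
    """LCA reconciliation: an internal gene-tree node is a duplication when it
    maps to the same species clade as one of its children. Returns, for each
    duplication, the species clade (sorted tuple) whose ancestor it occurred in."""
    duplications = []

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        if here in child_maps:
            duplications.append(tuple(sorted(here)))
        return here

    visit(gene_tree)
    return duplications


# Each copy repeats the ((human, mouse), fugu) phylogeny: duplication before the fish-tetrapod split.
before_split = (((("human:a1", "mouse:a1"), "fugu:a1"),
                 (("human:a2", "mouse:a2"), "fugu:a2")), "ciona:a")
print(find_duplications(before_split))
```

The example tree corresponds to a duplication at the base of the vertebrates; a tree such as `(("human:b1", "mouse:b1"), ("human:b2", "mouse:b2"))` would instead be placed in the common ancestor of mouse and human, matching the case described in the text.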

This analysis reveals that 46.6% of the ancestral chordate genes appear in duplicate in one or more of the vertebrate lineages. Of these, 34.5% have at least one duplication before the divergence of fish from tetrapods and 23.5% have at least one duplication afterward. (Some genes are counted twice, having had duplications both before and after the fish–tetrapod split.) This means that 3,753 gene duplications are placed at the base of Vertebrata. That number is remarkable because the ancestral genome can reasonably be estimated to have had fewer than 20,000 genes, as is the case for the tunicate and other invertebrate outgroups. However, as can be seen in Figure 4, gene duplications occur in large numbers on every branch of the tree, so it is unclear whether this count in itself indicates a significant acceleration in the duplication rate. Additionally, of the gene clusters with duplications basal to the fish–tetrapod split, 20.6% have had one duplication event, 10.8% have had two, and 5.1% have had more than two. This runs counter to the expectation from 2R and casts further doubt on the significance of this count for the 2R hypothesis.
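The double counting mentioned in the parenthetical can be made explicit by inclusion–exclusion. This assumes (our assumption, not stated in the text) that all three percentages share the same denominator of ancestral chordate genes. On that basis, the share of genes with duplications both before and after the fish–tetrapod split follows directly:

$$P(\text{before} \cup \text{after}) = P(\text{before}) + P(\text{after}) - P(\text{before} \cap \text{after})$$

$$P(\text{before} \cap \text{after}) = 34.5\% + 23.5\% - 46.6\% = 11.4\%$$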

PLoS Biology | www.plosbiology.org 1701 October 2005 | Volume 3 | Issue 10 | e314

Vertebrate Genome Duplication

Table 1. Overview of the Process for Analyzing the Complete Gene Sets with Number of Genes Included at Each Step

Step  Process step                                    Gene counts                          Clusters
                                                      Ciona    Fugu     Mouse    Human
1     Retrieve sequences                              15,852   37,241   22,444   22,980   —
2     Run BLASTP                                      12,448   27,090   20,918   20,718   —
3     Make seeds                                      —        —        —        —        —
4     Generate clusters                               7,438    11,339   10,069   10,290   6,641
5     With duplication in vertebrates                 3,623    8,394    7,131    7,235    3,096
6     With at least one fugu and one tetrapod gene,   3,402    7,885    6,907    7,015    2,951
      <100 copies in each taxon
7     Create multiple sequence alignment              —        —        —        —        —
8     Trim gaps, alignability                         2,565    5,618    4,924    4,987    2,340
9     Perform phylogenetic analysis                   —        —        —        —        —
10    Determine strictly bifurcating nodes            1,776    3,770    3,118    3,190    1,621

DOI: 10.1371/journal.pbio.0030314.t001

Gene Family Membership
An early observation in support of 2R was that several gene families have expanded from a single member in invertebrates to four members in some vertebrates. Previous studies, confirmed in this analysis, have shown that this is not generally true for vertebrate multigene families [12]. As can be seen in Figure S1, there is no peak at four in gene family membership for any vertebrate. In fact, even gene duplications do not predominate. For each vertebrate species considered individually, one member per cluster is the largest category, accounting for 55%, 57%, and 59% of the fugu, mouse, and human genes, respectively, and 53.4% of the gene clusters have no duplication events whatsoever. Thus, no signal of 2R remains in gene family membership, despite early anecdotal observations to the contrary.
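The per-species percentages above count genes, not clusters. A gene sitting in a cluster where its species has k members contributes to category k.

Below is a minimal Python sketch of that tally. It assumes (hypothetically) that clusters are given as lists of gene names whose first letter encodes the taxon, as in Figure 2. The toy clusters are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def family_size_distribution(clusters: List[List[str]], taxon: str) -> Dict[int, float]:
    """Fraction of `taxon`'s genes found in clusters holding k genes from that taxon."""
    genes_in_size_k: Counter = Counter()
    for cluster in clusters:
        k = sum(1 for gene in cluster if gene.startswith(taxon))
        if k:
            genes_in_size_k[k] += k  # all k genes fall into the size-k category
    total = sum(genes_in_size_k.values())
    if total == 0:
        return {}
    return {k: n / total for k, n in sorted(genes_in_size_k.items())}


def has_peak_at_four(distribution: Dict[int, float]) -> bool:
    """True if four members per cluster is the most common category (the naive 2R expectation)."""
    if not distribution:
        return False
    return max(distribution, key=distribution.get) == 4


# Hypothetical toy clusters, each seeded by one Ciona gene.
clusters = [["C1", "H1"], ["C2", "H2"], ["C3", "H3", "H4"], ["C4", "H5"], ["C5", "M1", "H6"]]
human = family_size_distribution(clusters, "H")
print(human, has_peak_at_four(human))
```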

Figure 2. Overview of the Building of a Gene Cluster and Phylogenetic
Tree Shown by a Hypothetical Example
(A) Each circle represents a gene, labeled with the source genome according to the first letter of each taxon (C, M, H, and F for Ciona, mouse, human, and fugu, respectively) and further differentiated by numeral. BLASTP was first used to search all vertebrate genes for the one most similar to Ciona’s C1 gene, in this case the mouse gene M1. Other genes are then recruited to the cluster if they have a higher similarity score to M1 than that between C1 and M1, indicated here by the red lines. The six genes shown on the right side of the diagram have some sequence similarity to those in the cluster, but less than that between C1 and M1, and so are not included. Because the vertebrates are more closely related to each other than any is to Ciona, each cluster will include those genes descended from a single gene in the common chordate ancestor, having arisen by either lineage splitting or gene duplication specific to one or more vertebrates. (See Materials and Methods for more details.)
(B) The evolutionary tree of the genes in this cluster shows separate duplications for fugu and for human. Because the maximum likelihood method does not rely solely on sequence similarity, there is no significance to the mouse gene being most similar to C1. The mouse genome simply contained the most slowly evolving vertebrate gene in this multigene family; this can be from any vertebrate taxon with approximately equal likelihood.
DOI: 10.1371/journal.pbio.0030314.g002
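The recruitment rule in panel (A) can be written as a single pass over precomputed similarity scores. This Python sketch follows the caption literally: genes are recruited by similarity to the seed M1 only.

The following are assumptions of the sketch:

* Symmetric scores are stored per unordered gene pair.
* Ties for the seed are broken alphabetically.
* The score table is invented for illustration.

The full graph-based method may recruit genes differently (see Materials and Methods).

```python
from typing import Dict, FrozenSet, Set

# Symmetric similarity scores (e.g. BLASTP scores) keyed by unordered gene pair.
Scores = Dict[FrozenSet[str], float]


def similarity(scores: Scores, a: str, b: str) -> float:
    return scores.get(frozenset((a, b)), 0.0)


def build_cluster(ciona_gene: str, vertebrate_genes: Set[str], scores: Scores) -> Set[str]:
    """Seed with the best vertebrate hit of the Ciona gene, then recruit every
    vertebrate gene that is more similar to the seed than the Ciona gene is."""
    seed = max(sorted(vertebrate_genes), key=lambda g: similarity(scores, ciona_gene, g))
    threshold = similarity(scores, ciona_gene, seed)
    cluster = {ciona_gene, seed}
    for gene in vertebrate_genes - {seed}:
        if similarity(scores, seed, gene) > threshold:
            cluster.add(gene)
    return cluster


# Hypothetical scores mirroring Figure 2A: H9 is related, but less than C1 is to M1.
scores = {frozenset(pair): s for pair, s in [
    (("C1", "M1"), 100), (("C1", "H1"), 90), (("C1", "F1"), 80), (("C1", "F2"), 70),
    (("C1", "H9"), 20), (("M1", "H1"), 300), (("M1", "F1"), 200), (("M1", "F2"), 150),
    (("M1", "H9"), 50),
]}
print(sorted(build_cluster("C1", {"M1", "H1", "F1", "F2", "H9"}, scores)))
```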

Figure 3. Hypothetical Phylogenetic Tree Showing All Possible Types of
Gene Relationships and How They Are Most Parsimoniously Interpreted
Interior nodes are designated in lower case for those that simply result
from lineage splitting and in upper case for gene duplications within a
lineage. Although not shown, nodes are still scored if there is gene loss.
Phylogenetic trees for each gene family can be viewed at http://phigs.org/, which also provides a valuable tool for improving the inference of gene function. DBFTS, duplication before fish–tetrapod split; DBPRS, duplication before primate–rodent split; FD, fugu duplication; fts, fish–tetrapod split; HD, human duplication; MD, mouse duplication; prs, primate–rodent split.
DOI: 10.1371/journal.pbio.0030314.g003


Figure 4. Phylogenetic Analysis of the Four Chordates with Drosophila as
an Outgroup
This phylogenetic tree is based on 766 concatenated single-copy protein sequences totaling 313,797 amino acid positions, with branch lengths proportional to the amount of change. Bold numerals above the branches indicate the number of gene duplications occurring in each lineage; numerals below indicate branch lengths.
DOI: 10.1371/journal.pbio.0030314.g004

Determination of Concordantly Duplicated Regions
To test the extent to which the 3,753 early duplication events timed to the base of Vertebrata were generated as part of larger-scale, multigene duplications, we examined the relative positions of the resulting paralogs in the human genome (currently the best assembled and annotated vertebrate genome). The results are shown in Figure 5 (and more comprehensively in Figure S2). In these figures, the linear array of genes on each chromosome is used to query for paralogs generated by any duplication event prior to the fish–tetrapod split. It is apparent from these figures that there is a large-scale pattern of genome segments that are concordant in having similar arrangements of paralogous genes.

We quantified this pattern in two steps. First, we identified all cases in which two or more different early-duplicating genes lie within a 100-gene window. Then, for each such case, we queried all other places in the genome with a sliding window, counting the cases in which their respective paralogs lie within both 50 genes upstream and 50 genes downstream of that point. A distinct pattern emerges, with multiple chromosomes matching along long linear stretches of paralogous genes. This indicates that these duplications occurred in very large segments, consistent with the hypothesis of WGD(s). Matches to three other chromosomal segments form the dominant category, as can be seen from the darker coloring in Figures 5 and S2 and from the histogram in Figure 6. In these patterns, each genomic region corresponds in gene arrangement to sets of paralogs in three other genomic segments, which is strong support for the 2R hypothesis.

Although the 4-fold category (i.e., including the query segment) is the most prevalent, it accounts for only 25% of the genome. Nonetheless, it is striking that this remains the largest category despite approximately 450 MY of evolution. This constitutes a strong signal of 2R, and it could not reasonably have been generated by a series of smaller duplication events. For smaller events to have generated this pattern, the same region (or its resulting duplicates) would have had to be duplicated three times, and this would have had to happen for many regions throughout the genome. Independent, random duplications would instead be expected to follow a Poisson distribution. This contrasting pattern is indeed seen when the same analysis is done with all human gene paralogs generated by duplication after the split of fish and tetrapods (not shown). Consider also the alternative of a single WGD followed by independent duplications of large segments. Under that alternative, it would be difficult to explain why these later duplications were predominantly 2-fold for previously duplicated regions. The most parsimonious explanation for the observed pattern is therefore 2R.
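To see why independent duplications rarely produce a dominant 4-fold category, consider the Poisson model mentioned above. The Python sketch below assumes (hypothetically) that each region gains extra copies independently at a mean rate `lam`.

Under this model, the most probable number of matching segments is ⌊λ⌋ for non-integer λ. A peak at three matches therefore requires 3 ≤ λ < 4. In that range, the 2- and 4-match categories are each at least three-quarters as common as the 3-match category, so the distribution stays broad.

```python
import math
from typing import List


def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson-distributed count with mean `lam`."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def matching_segment_probabilities(lam: float, max_k: int = 6) -> List[float]:
    """Probability that a region has k = 0..max_k additional copies (i.e. matches to
    k other segments) if copies arise from independent duplications at mean rate `lam`."""
    return [poisson_pmf(k, lam) for k in range(max_k + 1)]


def most_likely_matches(lam: float) -> int:
    probs = matching_segment_probabilities(lam)
    return max(range(len(probs)), key=probs.__getitem__)


for lam in (0.5, 3.5):
    probs = matching_segment_probabilities(lam)
    print(lam, most_likely_matches(lam), [round(p, 3) for p in probs])
```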

Tetra-Paralogon Detection
To further establish 2R, we evaluated whether these sets of paralogs showing 4-fold matching fall into tetra-paralogons, as illustrated in Figure 1. We formalized this by first identifying paralogons (paralogous genomic segments) containing the same set of at least two duplicated gene pairs, while allowing a maximum of 100 unduplicated genes in between (similar to the approach in [10]). (The allowance of 100 genes is arbitrary, but the results are not critically dependent on this number, which is used only to find the blocks of paralogous genes.) We infer that the duplicated genes in paralogons are likely to have arisen from a single duplication involving all of the contained, duplicated genes. We further infer that the unique, intervening genes have resulted from differential gene deletions and subsequent genome rearrangements.
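The paralogon search can be sketched as chaining duplicated gene pairs that connect the same two chromosomes. A new pair joins the current chain when at most 100 genes separate it from the previous pair on both segments.

This Python sketch is a greedy simplification under stated assumptions, not the method of [10] or of the authors:

* It counts all intervening genes, not only unduplicated ones.
* It tolerates inversions via the absolute distance on the second segment.
* It does not yet group paralogons into tetra-paralogons.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, gene index in map order)
Paralogon = Tuple[Tuple[str, str], List[Tuple[int, int]]]


def find_paralogons(pairs: List[Tuple[Position, Position]],
                    max_gap: int = 100, min_pairs: int = 2) -> List[Paralogon]:
    """Chain duplicated gene pairs into paralogous segment pairs (paralogons)."""
    by_segment_pair: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for a, b in pairs:
        if a > b:  # orient each pair consistently so (chrA, chrB) keys are canonical
            a, b = b, a
        by_segment_pair[(a[0], b[0])].append((a[1], b[1]))

    paralogons: List[Paralogon] = []
    for chroms, points in by_segment_pair.items():
        points.sort()
        chain = [points[0]]
        for x, y in points[1:]:
            last_x, last_y = chain[-1]
            # Genes strictly between consecutive pairs on each segment.
            if x - last_x - 1 <= max_gap and abs(y - last_y) - 1 <= max_gap:
                chain.append((x, y))
            else:
                if len(chain) >= min_pairs:
                    paralogons.append((chroms, chain))
                chain = [(x, y)]
        if len(chain) >= min_pairs:
            paralogons.append((chroms, chain))
    return paralogons


# Hypothetical pairs: two close pairs on chr1/chr7 form a paralogon; the rest do not.
pairs = [(("chr1", 10), ("chr7", 200)), (("chr7", 260), ("chr1", 30)),
         (("chr1", 500), ("chr7", 900)), (("chr2", 5), ("chr9", 5))]
print(find_paralogons(pairs))
```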
We identified 2,953 paralogous human gene pairs that are inferred to have resulted from 1,912 genes that duplicated prior to the divergence of the fish and tetrapod lineages (with some gene losses as well). Of these paralogous genes, 32.4% are still in 386 detectable paralogons comprising 772 individual genomic segments, each containing from two to 42 gene pairs (Table S1). Of these 772 genomic segments, 454 form tetra-paralogons (Figures 7A and S3, Table 2), as shown hypothetically in Figure 1, in which overlapping sets of paralogs fall into 4-fold groups. (Unfortunately, we could not evaluate the hypothesis of an additional genome duplication unique to ray-finned fish [41,42] because of the generally poor contiguity of the fugu draft assembly.)
In contrast, of the gene pairs that arose from a duplication event after the divergence of the fish and mammal lineages (see Figure 4), only 11% are detected in paralogons in the human genome. This indicates that these later duplications have less commonly involved large segments of the genome (Figure 7B). The result is especially interesting because the relative recency of these duplications should make any large duplications more likely to remain detectable, which reinforces the contrast with the large-scale structure of the earlier duplications. We defined tandemly duplicated genes as paralogs on the same chromosome separated by fewer than 10 intervening genes. By this definition, 50% of these later human gene pairs arose from tandem duplication, compared with 6% of the human gene pairs that arose before the divergence of the fish and tetrapod lineages.
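The tandem criterion is simple enough to state as code. This Python sketch assumes (hypothetically) that each paralog is given as a (chromosome, gene index) position in map order. It counts “intervening genes” as the genes strictly between the two positions.

```python
from typing import List, Tuple

Position = Tuple[str, int]  # (chromosome, gene index in map order)


def is_tandem(a: Position, b: Position, max_intervening: int = 10) -> bool:
    """Paralogs on the same chromosome with fewer than `max_intervening` genes between them."""
    (chrom_a, i), (chrom_b, j) = a, b
    return chrom_a == chrom_b and abs(i - j) - 1 < max_intervening


def tandem_fraction(pairs: List[Tuple[Position, Position]]) -> float:
    """Share of paralog pairs classified as tandem duplicates."""
    if not pairs:
        raise ValueError("need at least one paralog pair")
    return sum(is_tandem(a, b) for a, b in pairs) / len(pairs)


# Hypothetical pairs: 9 intervening genes (tandem), 10 (not tandem), different chromosomes, adjacent.
pairs = [(("chr3", 100), ("chr3", 110)), (("chr3", 100), ("chr3", 111)),
         (("chr3", 1), ("chr4", 2)), (("chr5", 7), ("chr5", 8))]
print(tandem_fraction(pairs))
```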

Discussion

No detectable signal of WGD exists in the analysis of gene family membership. There is no peak at four genes per family for any of the vertebrates (Figure S1), as might result from 2R; presumably, a great number of subsequent gene losses have erased this signal. Likewise, the phylogenetic timing of the duplication events is inconclusive, because duplications are common on every branch (see Figure 4). Although a somewhat greater number of duplications is assigned to the base of vertebrates, there is no reliable way to evaluate the significance of this. In fact, even if this larger number were found to be statistically significant, it might simply indicate something other than WGD. It could reflect a period of accelerated duplication of individual genes or multigene segments, or a reduction in the rate of gene loss.

Figure 5. Plot of the Genomic Positions of Paralogous Pairs of Human Genes that Arose from Duplications Predating the Fish–Tetrapod Split
The queries shown here use Chromosomes 2, 4, 5, and 10, as indicated for the four panels. (The complete set can be seen in Figure S2.) On the x-axis is each query chromosome, arranged from the p to the q telomere. On the y-axis is each of the 22 human autosomes plus the X and Y chromosomes. For each query gene on the x-axis, a “hit” is scored if the subject chromosome contains a paralog generated by a gene duplication prior to the fish–tetrapod split. The lower portion of each panel plots the n-fold redundancy along the query chromosome, as defined by pairs of paralogs detected in a sliding window analysis. See the Materials and Methods section for details. Briefly, for every human query gene, a window of 50 genes to the left and 50 genes to the right was considered, and a “hit” was obtained for the subject chromosome if it includes the early-duplicated paralogs of genes on each side of the query. Four-fold (i.e., including the query) matching, as expected under the 2R hypothesis, is highlighted in a darker shade of blue.
DOI: 10.1371/journal.pbio.0030314.g005

Figure 6. Histogram Showing the Lower Bound Estimate of N-fold Redundancy Using the Analysis Reported in Figure 5
This histogram is generated by counting the depth of paralogon redundancy across all human chromosomes, as shown in the lower part of Figure S2 (and subsampled for Figure 5). The peak at 4-fold coverage is consistent with the 2R hypothesis. It constitutes a lower-bound estimate, because the sliding window examines only a small span of flanking genes and is highly subject to the effects of local gene rearrangements.
DOI: 10.1371/journal.pbio.0030314.g006

Conclusive evidence for 2R is seen only when data from gene families, phylogenetic trees, and genomic map position are all taken together, as has been advocated by others [21,32,43]. We examined the genomic map positions of only those human genes that trace their ancestry back to a duplication event at the base of vertebrates. In those genes, a clear pattern of tetra-paralogons emerges, indicating that 2R occurred at the base of vertebrates. This signal remains clearest in the 25% of the human genome that forms the largest category in the analysis shown in Figures 5 and 6. However, we also find that 72% of all human genes are included in the total extent of all of the paralogons that overlap with these regions. This figure is the least constrained estimate of the portion of the human genome still retaining structure from 2R. It is an upper estimate, because some portion could instead have resulted from segmental duplications of regions previously established by WGD. This contrasts with the pattern seen for the many other gene duplications, which generated paralogs that are predominantly arranged in tandem.

This is particularly compelling considering that this signal
has survived more than 450 MY of genome rearrangements
and the loss of many genes. We can imagine the effect that
duplications, translocations, inversions, and deletions (and
combinations thereof) would have had on this analysis: (1)
Duplications would cause an increase beyond the 4-fold
category; (2) translocations would decrease the 4-fold
category if they are pervasive enough to clear large regions
of paralogs; (3) inversions can either decrease the number of chromosomes hit, by moving paralogous genes beyond the reach of the sliding window analysis, or increase it, by spreading some paralogous genes across boundaries into adjacent segments, and both effects can be exacerbated by gene translocations that blur the edges of the corresponding regions; and (4) deletions would generally
increase the 3-fold chromosome category at the expense of
the 4-fold category, and a deletion that occurred between the
two WGDs would increase the 2-fold chromosome category.
Additionally, in some cases, a few individual gene deletions or
translocations may have eliminated the links between pairs of

Figure 7. An Arbitrarily Selected Subset of the Human Genome Showing the Physical Relationships Among Paralogous Genes
(A) This is an example of the tetra-paralogous relationships of a subset of human genes that are all inferred,


nature Vol 444 | 23 November 2006 | doi:10.1038/nature05329

ARTICLES

Global variation in copy number in the
human genome
Richard Redon1, Shumpei Ishikawa2,3, Karen R. Fitch4, Lars Feuk5,6, George H. Perry7, T. Daniel Andrews1, Heike Fiegler1, Michael H. Shapero4, Andrew R. Carson5,6, Wenwei Chen4, Eun Kyung Cho7, Stephanie Dallaire7, Jennifer L. Freeman7, Juan R. González8, Mònica Gratacòs8, Jing Huang4, Dimitrios Kalaitzopoulos1, Daisuke Komura3, Jeffrey R. MacDonald5, Christian R. Marshall5,6, Rui Mei4, Lyndal Montgomery1, Kunihiro Nishimura2, Kohji Okamura5,6, Fan Shen4, Martin J. Somerville9, Joelle Tchinda7, Armand Valsesia1, Cara Woodwark1, Fengtang Yang1, Junjun Zhang5, Tatiana Zerjal1, Jane Zhang4, Lluis Armengol8, Donald F. Conrad10, Xavier Estivill8,11, Chris Tyler-Smith1, Nigel P. Carter1, Hiroyuki Aburatani2,12, Charles Lee7,13, Keith W. Jones4, Stephen W. Scherer5,6 & Matthew E. Hurles1

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have
constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations
with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two
complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative
genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or
adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs
contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs
encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and
evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy
number among populations. We also demonstrate the utility of this resource for genetic disease studies.

Genetic variation in the human genome takes many forms, ranging from large, microscopically visible chromosome anomalies to single-nucleotide changes. Recently, multiple studies have discovered an abundance of submicroscopic copy number variation of DNA segments ranging from kilobases (kb) to megabases (Mb) in size1–8. Deletions, insertions, duplications and complex multi-site variants9, collectively termed copy number variations (CNVs) or copy number polymorphisms (CNPs), are found in all humans10 and other mammals examined11. We defined a CNV as a DNA segment that is 1 kb or larger and present at variable copy number in comparison with a reference genome10. A CNV can be simple in structure, such as tandem duplication, or may involve complex gains or losses of homologous sequences at multiple sites in the genome (Supplementary Fig. 1).

An early association of CNV with a phenotype was described 70 yr ago, with the duplication of the Bar gene in Drosophila melanogaster being shown to cause the Bar eye phenotype12. CNVs influence gene expression, phenotypic variation and adaptation by disrupting genes and altering gene dosage7,13–15, and can cause disease, as in microdeletion or microduplication disorders16–18, or confer risk to complex disease traits such as HIV-1 infection and glomerulonephritis19,20. CNVs often represent an appreciable minority of causative alleles at genes at which other types of mutation are strongly associated with specific diseases: CHARGE syndrome21 and Parkinson’s and Alzheimer’s disease22,23. Furthermore, CNVs can influence gene expression indirectly through position effects, predispose to deleterious genetic changes, or provide substrates for chromosomal change in evolution10,11,17,24.

In this study, we investigated genome-wide characteristics of CNV in four populations with different ancestry, and classified CNVs into different types according to their complexity and whether copies have been gained or lost (Supplementary Fig. 1). To maximize the utility of these data and the potential for integration of CNVs with SNPs for genetic studies, we performed experiments with the International HapMap DNA and cell-line collection25 derived from apparently healthy individuals. The result is the first comprehensive map of copy number variation in the human genome, which provides an important resource for studies of genome structure and human disease.

Two platforms for assessing genome-wide CNV
The HapMap collection comprises four populations: 30 parent–offspring trios of the Yoruba from Nigeria (YRI), 30 parent–offspring trios of European descent from Utah, USA (CEU), 45 unrelated

1 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
2 Genome Science, and 3 Dependable and High Performance Computing, Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba Meguro, Tokyo 153-8904, Japan.
4 Affymetrix, Inc., Santa Clara, California 95051, USA.
5 The Centre for Applied Genomics and Program in Genetics and Genomic Biology, The Hospital for Sick Children, MaRS Centre–East Tower, 101 College Street, Room 14-701, Toronto, Ontario M5G 1L7, Canada.
6 Department of Molecular and Medical Genetics, Faculty of Medicine, University of Toronto M5S 1A8, Canada.
7 Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA.
8 Genes and Disease Program, Center for Genomic Regulation, Charles Darwin s/n, Barcelona Biomedical Research Park, 08003 Barcelona, Catalonia, Spain.
9 Departments of Medical Genetics and Pediatrics, University of Alberta, Edmonton, Alberta T6G 2H7, Canada.
10 Department of Human Genetics, University of Chicago, 920 East 58th Street, Chicago, Illinois 60637, USA.
11 Pompeu Fabra University, Charles Darwin s/n, and National Genotyping Centre (CeGen), Passeig Marítim 37-49, Barcelona Biomedical Research Park, 08003 Barcelona, Catalonia, Spain.
12 Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan.
13 Harvard Medical School, Boston, Massachusetts 02115, USA.

©2006 Nature Publishing Group

Figure 1 (graphic): comparative genome hybridization on the Whole Genome TilePath array and comparative intensity analysis on the Affymetrix 500K early access SNP chip (NspI and StyI), each comparing test and reference DNA; log2(test/reference) is plotted as a genome profile, a chromosome 8 profile and a 10 Mb window.


Japanese from Tokyo, Japan (JPT) and 45 unrelated Han Chinese
from Beijing, China (CHB). Genomic DNA from Epstein–Barr-
virus-transformed lymphoblastoid cell-lines was used.

Two technology platforms were used to assess CNV (Fig. 1): (1)
comparative analysis of hybridization intensities on Affymetrix
GeneChip Human Mapping 500K early access arrays (500K EA), in
which 474,642 SNPs were analysed; and (2) comparative genomic
hybridization with a Whole Genome TilePath (WGTP) array that
comprises 26,574 large-insert clones representing 93.7% of the
euchromatic portion of the human genome26.

Stringent quality control criteria were set for each platform and
experiments were repeated for 82 individuals on the WGTP and 15
individuals on the 500K EA platforms. The quality of the final data
sets was assessed by the standard deviation among log2 ratios of
autosomal probes (after normalization and filtering for cell-line arte-
facts), which for the WGTP platform was 0.047 (Supplementary
Fig. 2) and for the 500K EA platform was 0.220, both of which are
improvements on published data8,27.
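The quality metric above is a spread statistic over per-probe log2 ratios. Below is a minimal Python sketch of it.

The sketch assumes raw per-probe test and reference intensities. The normalization and cell-line artefact filtering described in the text are not reproduced, and the use of the sample standard deviation is our assumption.

```python
import math
from statistics import stdev
from typing import Sequence


def log2_ratio_sd(test: Sequence[float], reference: Sequence[float]) -> float:
    """Sample standard deviation of per-probe log2(test/reference) ratios."""
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need matched intensities for at least two probes")
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    return stdev(ratios)


# Hypothetical intensities for four autosomal probes.
print(log2_ratio_sd([1.02, 0.97, 1.05, 0.99], [1.0, 1.0, 1.0, 1.0]))
```

A lower value means tighter log2 ratios around the copy-neutral baseline, which is why the text treats smaller values as better data.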

The different nature of the two data sets required the development of distinct algorithms to identify CNVs. In essence, these algorithms segment a continuous distribution of intensity ratios into discrete regions of CNV. To train the threshold parameters, we attempted to validate experimentally 203 CNVs that had been defined with varying degrees of confidence in two well-characterized genomes4,5,7 (NA10851 and NA15510). By performing technical replicate experiments on both platforms, we assessed the proportion of CNV calls that were false positives for different algorithm parameters across a set of experiments representing the spectrum of data quality. The threshold parameters for both algorithms were set to achieve an average false-positive rate per experiment beneath 5% (Methods; see also Supplementary Methods, Supplementary Tables 1–4 and refs 26, 28).
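The threshold training can be pictured as a parameter sweep against false-positive rates derived from replicate experiments. The Python sketch below is hypothetical.

It collapses the text's several threshold parameters into one scalar per algorithm, where a higher value means more stringent. It also assumes a table of per-experiment false-positive proportions measured at each candidate value. The numbers are invented.

```python
from statistics import mean
from typing import Dict, List, Optional


def choose_threshold(fp_rates: Dict[float, List[float]],
                     max_mean_fp: float = 0.05) -> Optional[float]:
    """Least stringent threshold whose mean per-experiment false-positive rate
    stays below `max_mean_fp`; None if no candidate qualifies."""
    acceptable = [t for t, rates in fp_rates.items() if rates and mean(rates) < max_mean_fp]
    # Taking the least stringent acceptable value keeps sensitivity as high as possible.
    return min(acceptable) if acceptable else None


# Hypothetical sweep: mean false-positive rates of 9%, 4.5% and 1.5%.
print(choose_threshold({0.2: [0.10, 0.08], 0.3: [0.06, 0.03], 0.4: [0.02, 0.01]}))
```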

Because all DNAs were derived from lymphoblastoid cell lines, we
differentiated somatic artefacts (such as culture-induced rearrange-
ments and aneuploidies) from germline CNVs. We karyotyped all
available 268 HapMap cell lines (Supplementary Table 5) and sought
evidence for chromosomal abnormalities in the WGTP and 500K EA
intensity data. We identified 30 cell lines with unusual chromosomal
constitutions (Supplementary Table 5 and Supplementary Fig. 3),
and removed the aberrant chromosomes from further analyses.
Chromosomes 9, 12 and X seemed to be particularly prone to tris-
omy. For a cell line with mosaic trisomy of chromosome 12, we
confirmed by array comparative genomic hybridization that this
trisomy was not apparent in blood DNA from the same individual
(Supplementary Fig. 4). Furthermore, we sought signals of somatic
deletions within the SNP genotypes of HapMap trios. A somatic
deletion in a parental genome manifests as a cluster of SNPs at which
alleles present in the offspring are not found in either parent5. We
assessed all of our preliminary CNV calls in 120 trio parents and
found that 17 (of 4,758) fell in genomic regions that harbour highly
significant clusters of HapMap Phase II SNP genotypes compatible
with a somatic deletion in a parental genome (Supplementary Table
5A, Supplementary Fig. 5 and Supplementary Note). These putative
cell-line artefacts were removed from further analyses. Extrapolating this analysis to the entire HapMap collection suggests that less than 0.5% of the deletions we observed were likely to have been somatic artefacts.

Figure 1 | Protocol outline for two CNV detection platforms. The experimental procedures for comparative genome hybridization on the WGTP array and comparative intensity analysis on the 500K EA platform are shown schematically (see Supplementary Methods for details), for a comparison of two male genomes (NA10851 and NA19007). The genome profile shows the log2 ratio of copy number in these two genomes chromosome-by-chromosome. The 500K EA data are smoothed over a five-probe window. Below the genome profiles are expanded plots of chromosome 8, and a 10-Mb window containing a large duplication in NA19007 identified on both platforms (indicated by the red bracket).

The quality of the resulting CNV calls was assessed in additional ways26,28. Technical replicate experiments (triplicates for ten individuals) demonstrated that CNV calls are highly replicable (Supplementary Table 6). They also showed that noisier experiments are characterized by higher false-negative rates rather than higher false-positive rates (Supplementary Fig. 2). Heritability of CNVs within trios was investigated at 67 biallelic CNVs at which CNV genotypes could be inferred (Fig. 2; see also Supplementary Table 7). Of 12,060 biallelic CNV genotypes, <0.2% exhibited mendelian discordance, which probably reflects the genotyping error rate rather than the rate of de novo events at these loci. Additional locus-specific experimental validation was performed on subsets of CNVs (Supplementary Table 4).

CNVs called in only a single individual (singleton CNVs) are more likely to be false positives than CNVs identified in several individuals. We attempted to validate 50 singleton CNVs called on only one platform (25 from each platform) and 14 singleton CNVs called on both platforms. All 14 singleton CNVs replicated by both platforms were verified as true positives, whereas 38 of the 50 CNVs called by only one platform were confirmed (a false-positive rate of 24%). Extrapolating these validation rates across the entire data set suggests that only 8% (24% multiplied by the frequency of singleton CNVs called on only one platform) of the CNV regions we identify (see below) are likely to be false positives.
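A mendelian consistency check for a biallelic CNV reduces to one question: can the child's genotype be built from one transmissible allele per parent?

The Python sketch below encodes a genotype (hypothetically) as the number of higher-copy-number alleles (0, 1 or 2), following the two-allele model of Figure 2. It reports a per-trio discordance rate. The figure quoted above is per genotype, so the denominators differ.

```python
from typing import List, Set, Tuple

Trio = Tuple[int, int, int]  # (father, mother, child); genotype = number of higher-copy alleles


def transmissible(genotype: int) -> Set[int]:
    """Alleles a parent can pass on: 0 = lower-copy allele, 1 = higher-copy allele."""
    if genotype not in (0, 1, 2):
        raise ValueError("a biallelic genotype must be 0, 1 or 2")
    alleles: Set[int] = set()
    if genotype < 2:
        alleles.add(0)
    if genotype > 0:
        alleles.add(1)
    return alleles


def is_mendelian(father: int, mother: int, child: int) -> bool:
    return any(a + b == child for a in transmissible(father) for b in transmissible(mother))


def discordance_rate(trios: List[Trio]) -> float:
    if not trios:
        raise ValueError("need at least one trio")
    return sum(not is_mendelian(*trio) for trio in trios) / len(trios)


# Hypothetical genotype calls for four trios; the last child is inconsistent with its parents.
print(discordance_rate([(0, 0, 0), (1, 1, 2), (2, 0, 1), (0, 0, 1)]))
```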

Figure 2 (graphic): a, histograms of frequency against log2 ratio for WGTP clones Chr8tp-17E9, Chr1tp-31C8, Chr5tp-22E4, Chr6tp-5C12 and Chr6tp-11A11; b, genotypes in four HapMap trios (NA12144, NA12145, NA10846; NA06994, NA07000, NA07029; NA18504, NA18505, NA18503; NA18501, NA18502, NA18500).

A genome-wide map of copy number variation

The average number of CNVs detected per experiment was 70 and 24
for the WGTP and 500K EA platforms, respectively (Supplementary
Tables 8–10). Owing to the nature of the comparative analysis, each
WGTP experiment detects CNVs in both test and reference genomes,
whereas each 500K EA experiment detects CNV in a single genome.
The median size of CNVs from the two platforms was 228 kb
(WGTP) and 81 kb (500K EA), and the mean size was 341 kb and
206 kb, respectively. Consequently, the average length of the genome
shown to be copy number variable in a single experiment is 24 Mb
and 5 Mb on the WGTP and 500K EA platforms, respectively. The
larger median size of the WGTP CNVs partially reflects inevitable
overestimation of CNV boundaries on a platform comprising large-insert
clones, as a CNV encompassing only a fraction of a clone can be
detected but will be reported as if the whole clone were involved.
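The per-experiment figures follow from multiplying the average number of CNVs per experiment by the mean CNV size. A quick check with the numbers quoted above gives values consistent with the stated 24 Mb and 5 Mb after rounding (the helper name is ours):

```python
def variable_length_mb(cnvs_per_experiment: float, mean_size_kb: float) -> float:
    """Expected copy-number-variable length per experiment, in Mb."""
    return cnvs_per_experiment * mean_size_kb / 1000.0

platforms = {
    "WGTP": (70, 341),     # (CNVs per experiment, mean CNV size in kb)
    "500K EA": (24, 206),
}
for name, (n, size_kb) in platforms.items():
    print(f"{name}: {variable_length_mb(n, size_kb):.1f} Mb")
# WGTP: 23.9 Mb
# 500K EA: 4.9 Mb
```

The mean rather than the median size is the right input here, because the expected total length of a set of CNVs is the count times the mean.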

By merging overlapping CNVs identified in each individual, we
delineated a minimal set of discrete copy number variable regions
(CNVRs) among the 270 samples (Fig. 3; see also Supplementary
Table 11). We identified 913 CNVRs on the WGTP platform and
980 CNVRs on the 500K EA platform and mapped their genomic
distribution (Fig. 4). Approximately half of these CNVRs were called
in more than one individual and 43% of all CNVs identified on one
platform were replicated on the other. Combining the data resulted
in a total of 1,447 discrete CNVRs, covering 12% (~360 Mb) of the
human genome. Using locus-specific quantitative assays on a subset
of regions we validated 173 (12%) of these CNVRs (Supplementary
Tables 4 and 12). A minority (30%) of these 1,447 CNVRs overlapped

Figure 2 | Heritability of five CNVs in four
HapMap trios. a, The distribution of WGTP log2
ratios at five CNVs with genotype information.
Each histogram of log2 ratios in 270 HapMap
individuals exhibits three clusters, each
corresponding to a genotype of a biallelic CNV,
with the two alleles depicted by broken and
complete bars, representing lower and higher
copy number alleles, respectively. Red lines above
each histogram denote log2 ratios in the 12
individuals represented in b. b, Mendelian
inheritance of five CNVs in four parent–offspring
trios. The individual CNVs were genotyped from
WGTP clones: green, Chr8tp-17E9; yellow,
Chr1tp-31C8; blue, Chr5tp-22E4; red, Chr6tp-
5C12; black, Chr6tp-11A11.

©2006 Nature Publishing Group

NATURE | Vol 444 | 23 November 2006 ARTICLES

[Figure 3 schematic: overlapping CNVs called in Individuals A–E at four loci, illustrating the cases "both overlaps < threshold", "one overlap > threshold" (two loci) and "both overlaps > threshold". Merging thresholds: WGTP, 40% of length; 500K EA, 30% of SNPs. Key: CNV regions (CNVR); CNVs; CNV ends enriched for breakpoints.]

[Figure 4 ideograms: chromosomes 1–22, X and Y, with keys for CNVR length and for CNVRs associated with segmental duplications.]
Figure 3 | Defining CNVRs, CNVs and CNV ends. Overlapping CNVs called
in five individuals are shown schematically for four loci (in blue); dashed
lines indicate overlap. Copy number variable regions (CNVRs) represent the
union of overlapping CNVs (in green). Independent juxtaposed CNVs (in
black) are identified by requiring that only individual-specific CNVs that
overlap by more than a threshold proportion be merged. Intervals

those identified in previous studies1–3,5–8,29. Combining different
classes of experimental replication revealed that 957 (66%) of the
1,447 CNVRs detected here have been replicated on both WGTP
and 500K EA platforms, or with a locus-specific assay, or in another
individual, or in a previous study (Supplementary Table 12). Whole-
genome views of CNV show that although common, large-scale CNV
is distributed in a heterogeneous manner throughout the genome
(Supplementary Fig. 6), no large stretches of the genome are exempt
from CNV (Fig. 4), and the proportion of any given chromosome
susceptible to CNV varies from 6% to 19% (Supplementary Fig. 7).
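Merging overlapping CNV calls into CNVRs is, in essence, an interval-union problem. The sketch below pools calls from all individuals on one chromosome, sorts them by start coordinate and merges any that overlap. The half-open base-pair coordinates, the decision not to merge intervals that merely abut, and the function name are assumptions for illustration; the platform-specific details are in the paper's Supplementary Methods.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp, single chromosome

def merge_cnvrs(calls: List[Interval]) -> List[Interval]:
    """Union of overlapping CNV calls (pooled across individuals) into CNVRs."""
    merged: List[Interval] = []
    for start, end in sorted(calls):
        if merged and start < merged[-1][1]:
            # Overlaps the current region: extend its end if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new region.
            merged.append((start, end))
    return merged

calls = [(100, 500), (450, 900), (2000, 2100), (880, 1000)]
print(merge_cnvrs(calls))  # [(100, 1000), (2000, 2100)]
```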

Gaps within the reference human genome assembly have an extre-
mely high likelihood of being associated with CNVs; out of the 345
gaps in the build 35 assembly, 48% (164 out of 345) are flanked or
overlapped by CNVRs. This finding highlights the complexity in
generating a reference sequence in regions of structural dynamism


encompassing CNV breakpoints (in red) are defined using platform-
dependent criteria (Supplementary Methods), and contain a significant
paucity of recombination hotspots76,77 (Supplementary Table 13), which
results from the enrichment of segmental duplications within which fewer
inferred recombination hotspots reside.

and emphasizes the need for ongoing characterization of these geno-
mic regions.

Comparing the CNVRs identified on the two platforms reveals
that the WGTP and 500K EA platforms largely complement one
another. The 500K EA platform is better at detecting smaller CNVs
(Supplementary Fig. 8), whereas the WGTP platform has more power
to detect CNVs in duplicated genomic regions (Supplementary Table
13) where 500K EA coverage is poorer30.

Some CNVRs encompass two or more independent juxtaposed
CNVs. For example, a small deletion found in one individual over-
lapping a much larger duplication in another individual was merged
into a single CNVR, despite these representing distinct events. To
delineate independent CNVs (CNV events) we applied more strin-
gent merging criteria to separate juxtaposed CNVs (Fig. 3), and
identified 1,116 and 1,203 CNVs on the WGTP and 500K EA

Figure 4 | Genomic distribution of CNVRs. The chromosomal locations of
1,447 CNVRs are indicated by lines to either side of ideograms. Green lines
denote CNVRs associated with segmental duplications; blue lines denote
CNVRs not associated with segmental duplications. The length of right-hand
side lines represents the size of each CNVR. The length of left-hand side
lines indicates the frequency that a CNVR is detected (minor call frequency
among 270 HapMap samples). When both platforms identify a CNVR, the
maximum call frequency of the two is shown. For clarity, the dynamic range
of length and frequency is log transformed (see scale bars). All data can be
viewed at the Database of Genomic Variants (http://projects.tcag.ca/variation/).


[Figure 5 panels: for each CNV class, an example histogram of WGTP log2 ratios (frequency versus log2 ratio), with the number of CNVs of that class on each platform and, in parentheses, the percentage associated with segmental duplications.]

Class | WGTP (% SegDup associated) | 500K EA (% SegDup associated)
Deletion | 445 (23.6) | 676 (14.9)
Duplication | 423 (41.4) | 406 (37.2)
Deletion & duplication | 98 (81.6) | 65 (66.2)
Multi-allelic | 19 (94.7) | 12 (91.7)
Complex | 131 (70.2) | 44 (79.5)
Total | 1,116 | 1,203


platforms, respectively (Fig. 5; see also Supplementary Table 11). We
classified these CNVs into five types: (1) deletions; (2) duplications;
(3) deletions and duplications at the same locus; (4) multi-allelic loci;
and (5) complex loci whose precise nature was difficult to discern.
Owing to the inherently relative nature of these comparative data, it
was impossible to determine unambiguously the ancestral state for
most CNVs, and hence whether they are deletions or duplications.
Here we adopted the convention of assuming that the minor allele is
the derived allele31; thus, deletions have a minor allele of lower copy
number and duplications have a minor allele of higher copy number.
Approximately equal numbers of deletions and duplications were
identified on the WGTP platform, whereas deletions outnumbered
duplications by approximately 2:1 on the 500K EA platform. In addition,
33 homozygous deletions (relative to the reference sequence)
identified on the 500K EA platform were experimentally validated
with locus-specific assays (Supplementary Table 14). Most (27 out of
33) of these have not been observed in a previous genome-wide
survey of deletions7.
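Figure 3 describes a stricter rule for separating juxtaposed CNVs: individual-specific CNVs are merged only when they overlap by more than a threshold proportion (40% of length for WGTP; 30% of SNPs for 500K EA). The sketch below reads the Figure 3 key as requiring that both overlaps exceed the threshold, which is our interpretation rather than something confirmed in this excerpt. It covers only the length-based WGTP variant, ignores whether calls are deletions or duplications, and the single-linkage grouping via union-find is an assumed implementation choice.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp, single chromosome

def overlap_len(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def reciprocal_overlap(a: Interval, b: Interval, threshold: float) -> bool:
    """True if the shared length exceeds `threshold` of BOTH intervals."""
    shared = overlap_len(a, b)
    return shared > threshold * (a[1] - a[0]) and shared > threshold * (b[1] - b[0])

def group_cnv_events(calls: List[Interval], threshold: float = 0.40) -> List[List[Interval]]:
    """Single-linkage grouping (union-find) of calls that pass the reciprocal test.
    The default 0.40 mirrors the WGTP length threshold quoted in Fig. 3."""
    parent = list(range(len(calls)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if reciprocal_overlap(calls[i], calls[j], threshold):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Interval]] = {}
    for i, call in enumerate(calls):
        groups.setdefault(find(i), []).append(call)
    return sorted(groups.values())

# A small call nested inside a much larger one stays a separate event,
# as in the text's deletion-within-duplication example.
calls = [(0, 10_000), (500, 10_500), (1_000, 1_200)]
print(group_cnv_events(calls))  # [[(0, 10000), (500, 10500)], [(1000, 1200)]]
```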

Figure 5 | Classes of CNVs. CNVs identified from WGTP and 500K EA
platforms can be classified from the population distribution of log2 ratios
(exemplified with WGTP data) into five different types (see text). Biallelic
CNVs (deletions and duplications) can be genotyped if the clusters
representing different genotypes are sufficiently distinct. The numbers of
each class of CNV identified on WGTP and 500K EA platforms are given,
along with the proportion of those CNVs that overlap segmental
duplications. The overall proportion of CNVRs overlapping segmental
duplications was 20% and 34% on the 500K EA and WGTP platforms,
respectively.

To investigate mechanisms of CNV formation, we studied the
sequence context of sites of CNV. Non-allelic homologous recombination
can generate rearrangements as a result of recombination
between highly similar duplicated sequences32,33. Segmental duplications
are defined as sequences in the reference genome assembly
sharing >90% sequence similarity over >1 kb with another genomic
location34,35. We found that 24% of the 1,447 CNVRs were associated
with segmental duplications, a significant enrichment (P < 0.05).
This association results from two factors: (1) rearrangements gener-
ated by non-allelic homologous recombination; and (2) not all anno-
tated segmental duplications are fixed in humans, but are, in fact,
CNVs. This latter point highlights the essentially arbitrary nature of
defining segmental duplications on the basis of a single genome
sequence (albeit derived from several individuals).

The likelihood of a CNV being associated with segmental duplica-
tions depended on its length and its classification: multi-allelic
CNVs, complex CNVs and loci at which both deletions and duplica-
tions occurred were markedly enriched for segmental duplications
(Fig. 5; see also Supplementary Fig. 9). This is not surprising given
the role that non-allelic homologous recombination has been shown
to have in generating complex structural variation36, arrays of tandem
duplications that vary in size37, and reciprocal deletions and
duplications38.

The likelihood of a segmental duplication being associated with a
CNV was greater for intrachromosomal duplications than for inter-
chromosomal duplications, and was highly correlated with increas-
ing sequence similarity to its duplicated copy (Supplementary Fig.
10). Non-allelic homologous recombination is known to operate
mainly on intrachromosomal segmental duplications and to require
97–100% sequence similarity between duplicated copies33,39.
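The sequence criteria in the last two paragraphs can be written as simple predicates: a segmental duplication shares >90% identity over >1 kb with another location, and the copies most likely to mediate non-allelic homologous recombination are intrachromosomal with 97–100% identity. The record type and its field names are hypothetical; real annotations would have to be parsed into this shape.

```python
from dataclasses import dataclass

@dataclass
class DuplicatedPair:
    """Hypothetical record describing two copies of a duplicated sequence."""
    chrom_a: str
    chrom_b: str
    aligned_length_bp: int
    identity: float  # fraction between 0 and 1

def is_segmental_duplication(p: DuplicatedPair) -> bool:
    # Definition quoted in the text: >90% similarity over >1 kb.
    return p.identity > 0.90 and p.aligned_length_bp > 1_000

def nahr_prone(p: DuplicatedPair) -> bool:
    # NAHR acts mainly on intrachromosomal copies with 97-100% similarity.
    return is_segmental_duplication(p) and p.chrom_a == p.chrom_b and p.identity >= 0.97

pairs = [
    DuplicatedPair("chr7", "chr7", 25_000, 0.985),   # intrachromosomal, high identity
    DuplicatedPair("chr7", "chr12", 25_000, 0.985),  # interchromosomal
    DuplicatedPair("chr1", "chr1", 800, 0.99),       # too short to qualify
    DuplicatedPair("chr3", "chr3", 5_000, 0.93),     # below the NAHR identity range
]
for p in pairs:
    print(p.chrom_a, p.chrom_b, is_segmental_duplication(p), nahr_prone(p))
```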

This role for non-allelic homologous recombination in generating
CNVs in duplicated regions of the genome is supported by the
enrichment of segmental duplications within intervals that probably
contain the breakpoints of the CNV (Fig. 3). We identified 88 CNVs
from the 500K EA platform and 53 CNVs from the WGTP platform
that contain a pair of segmental duplications, one at either end. These
pairs of segmental duplications were biased towards high (>97%)
sequence similarity, and were more frequently associated with the
longest CNVs (Supplementary Fig. 11). In addition to segmental
duplications, there are other types of sequence homologies that can
promote non-allelic homologous recombination, for example,
dispersed repetitive elements, such as Alu elements40. We performed an
exhaustive search for sequence homology of all kinds41 and identified
121 CNVs from the 500K EA platform and 223 on the WGTP platform
that contain lengths of perfect sequence identity longer than
100 bp between either end of the CNV.
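Looking for "perfect sequence identity longer than 100 bp between either end of the CNV" amounts to finding the longest exact match shared by the two breakpoint-flanking sequences. The homology search cited as ref. 41 is not described here, so the dynamic-programming longest-common-substring routine below is a substitute technique. It runs in O(len(a) × len(b)) time, which is practical for flanks of a few kilobases; the toy sequences are invented.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest exact match shared by a and b, via a rolling DP row."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def has_flanking_identity(left_flank: str, right_flank: str, min_len: int = 100) -> bool:
    """True if the two CNV-end sequences share an exact match longer than min_len bp."""
    return len(longest_common_substring(left_flank.upper(), right_flank.upper())) > min_len

core = "ACGTTGCA" * 15               # toy 120-bp sequence shared by both ends
left = "A" * 40 + core + "C" * 40
right = "T" * 40 + core + "G" * 40
print(len(longest_common_substring(left, right)), has_flanking_identity(left, right))
```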

Genomic impact of CNV

Deletions are known to be biased away from genes5, as a result of
selection. In contrast, the selective pressures on duplications are
poorly understood; the existence of gene families bears testament to
positive selection acting on some gene duplications over longer-term
evolution42. We identified the different classes of functional sequence
that fell within CNVRs, and tested whether they were significantly
enriched or impoverished within these CNVRs compared to the
entire genome (Table 1; see also Supplementary Table 13 and
Supplementary Methods).

Table 1 | Functional sequences within CNVRs

Functional sequence | WGTP CNVRs | 500K EA CNVRs | Merged CNVRs
RefSeq genes | 2,561 | 1,139† | 2,908†
OMIM genes | 251 | 112† | 285
Ultra-conserved elements | 48† | 16† | 50†
Conserved non-coding elements | 116,678* | 55,937* | 130,353*
Non-coding RNAs | 57 | 29† | 67

Statistical significance of the enrichment or paucity of functional sequences within CNVRs was
assessed by randomly permuting the genomic location of autosomal CNVRs (Supplementary
Methods). Significant observations are marked with * or †. Note that both conserved non-coding
elements75 and CNVRs are biased away from genes, so an enrichment of conserved non-coding
elements in CNVRs is not unexpected.
* Significant (P < 0.05) enrichment.
† Significant (P < 0.05) paucity.
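The significance test behind Table 1 permutes the genomic locations of CNVRs and asks how often a random placement overlaps as many (or as few) functional elements as observed. The sketch shows the idea on a single toy chromosome. Uniform placement, no masking of assembly gaps, point-like features (for example, element midpoints), one-sided empirical P values with a +1 correction, and all coordinates are our assumptions, not details of the paper's Supplementary Methods.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp

def count_hits(regions: List[Interval], features: List[int]) -> int:
    """Number of point features falling inside any region."""
    return sum(any(s <= f < e for s, e in regions) for f in features)

def permutation_p(regions: List[Interval], features: List[int], chrom_len: int,
                  n_perm: int = 1000, seed: int = 0) -> Tuple[int, float, float]:
    """Return (observed hits, P for enrichment, P for paucity)."""
    rng = random.Random(seed)
    observed = count_hits(regions, features)
    null = []
    for _ in range(n_perm):
        # Re-place every region uniformly at random, preserving its length.
        shuffled = []
        for s, e in regions:
            length = e - s
            new_start = rng.randrange(0, chrom_len - length + 1)
            shuffled.append((new_start, new_start + length))
        null.append(count_hits(shuffled, features))
    p_enrich = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    p_paucity = (1 + sum(n <= observed for n in null)) / (n_perm + 1)
    return observed, p_enrich, p_paucity

# Toy data: features only in the first half, regions only in the second.
features = list(range(0, 500_000, 5_000))
regions = [(600_000 + 40_000 * i, 620_000 + 40_000 * i) for i in range(10)]
observed, p_enrich, p_paucity = permutation_p(regions, features, chrom_len=1_000_000, seed=1)
print(observed, p_enrich, p_paucity)
```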


It is not possible to define precisely the breakpoints of CNVRs;
therefore, some of these functional sequences might flank rather than
be encompassed by CNVRs. We observed a significant paucity of all
functional sequences (with the exception of conserved non-coding
sequences43) in CNVRs detected on the 500K EA platform, which
provided the highest resolution breakpoint mapping (Table 1). Thus,
CNVs are preferentially located outside of genes and ultra-conserved
elements in the human genome44. We attempted to validate experi-
mentally 11 CNVs containing 12 ultra-conserved elements. Although
all but two of the CNVs validated, only two ultra-conserved elements
actually fell within these CNVs (Supplementary Table 13B), so the
selection against CNV at ultra-conserved elements is likely to be even
stronger than this analysis would suggest. Nevertheless, thousands of
putatively functional sequences, including known disease-related
genes, flank or fall within these CNVs: over half (58%) of the 1,447
CNVRs overlap known RefSeq genes, and more than 99% overlap
conserved non-coding sequences43.

We examined whether deletions or duplications are equally likely
to encompass these different classes of functional sequences. We
observed that a significantly lower proportion of deletions than
duplications (identified on the 500K EA platform) overlap with
the Online Mendelian Inheritance in Man (OMIM) database of
disease-related genes (P = 0.017, chi-squared) and RefSeq genes
(P = 1.7 × 10⁻⁹). Thus, deletions are biased away from genes with
respect to duplications. The same trend was observed with
ultra-conserved elements but their number is too small to provide
statistical significance.
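The deletion-versus-duplication comparison is a 2 × 2 test of independence: CNV class against whether the CNV overlaps an OMIM (or RefSeq) gene. The per-class counts are not given in this excerpt, so the numbers below are invented purely to exercise the function. Whether the authors applied a continuity correction is also not stated; this sketch applies none.

```python
import math
from typing import Tuple

def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (1 d.f., no continuity correction) for
        [[a, b],   deletions:    overlapping a gene, not overlapping
         [c, d]]   duplications: overlapping a gene, not overlapping
    Returns (statistic, P value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-squared with 1 d.f.: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Invented counts for illustration only (not the paper's data).
chi2, p = chi2_2x2(10, 20, 30, 40)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```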

If deletions are under stronger purifying selection (which removes
deleterious variants from the population) than duplications8,45, then
deletions should, on average, be both less frequent and smaller than
duplications. Although, on average, deletions were almost threefold
shorter than duplications (43 kb versus 120 kb from 500K EA), we
detected no significant difference in the frequencies with which
deletions and duplications were called (P > 0.05 using G-test for
independence46 on WGTP data). We note that our length analysis could
be confounded if long duplications are

© 2001 Nature Publishing Group http://genetics.nature.com

letter

An abundance of X-linked genes expressed in
spermatogonia

P. Jeremy Wang1, John R. McCarrey2, Fang Yang1 & David C. Page1


Spermatogonia are the self-renewing, mitotic germ cells of the
testis from which sperm arise by means of the differentiation
pathway known as spermatogenesis1. By contrast with
hematopoietic and other mammalian stem-cell populations,
which have been subjects of intense molecular genetic investi-
gation, spermatogonia have remained largely unexplored at the
molecular level. Here we describe a systematic search for genes
expressed in mouse spermatogonia, but not in somatic tissues.
We identified 25 genes (19 of which are novel) that are
expressed only in male germ cells. Of the 25 genes, 3 are Y-linked
and 10 are X-linked. If these genes had been distributed
randomly in the genome, one would have expected zero to two
of the genes to be X-linked. Our findings indicate that the X
chromosome has a predominant role in pre-meiotic stages of
mammalian spermatogenesis. We hypothesize that the X chro-
mosome acquired this prominent role in male germ-cell develop-
ment as it evolved from an ordinary, unspecialized autosome.
We identified genes specific to germ cells through ‘cDNA sub-
traction’2,3, whereby a pool of transcripts present in one cell type
(‘tracer’) is depleted of transcripts shared with other cell types
(‘driver’). In our subtraction, tracer cDNA was generated from
purified mouse spermatogonia4, whereas driver cDNA was gen-
erated from a combination of 11 somatic tissues (heart, brain,
lung, liver, skeletal muscle, kidney, spleen, stomach, thymus, skin
and germ-cell-depleted KitW-v/W-v testis5).

To validate our cDNA subtraction experiments, we tested
whether we had recovered previously identified genes that were
known to be expressed in spermatogonia but not in somatic tis-
sues. Eight such genes (Mage, Ube1y, Usp9y, Rbmy, Stra8, Ott,
Ddx4 and Dazl; Table 1) had been identified during the past
decade through the efforts of several laboratories. The extent to
which we recovered the eight known genes would provide a mea-
sure of our protocol’s adequacy in capturing spermatogonially
expressed, germ-cell–specific genes. We determined the
nucleotide sequence of 2,235 fragments chosen at random from
the cDNA subtraction product. We expected that this collection of
sequence fragments would constitute a redundant sampling of a
much smaller set of genes. Nucleotide sequence analysis revealed
that 409 fragments corresponded to 13 known germ-cell–specific
genes, including all 8 genes shown to be expressed in spermatogo-
nia in previous studies (Table 1). We recovered five other known
germ-cell–specific genes (Table 1) that were not previously
reported to be expressed in spermatogonia. We tested and con-
firmed their expression in purified spermatogonia by RT–PCR
(data not shown; primitive type A and mature type A and B sper-
matogonia prepared from prepubertal testes). We recovered no
known genes specific to meiotic or post-meiotic germ cells. These
results indicated that our spermatogonial cDNA subtraction
would provide an efficient and sensitive route to the identification
of germ-cell–specific genes expressed before meiosis.

Through further analysis of the remaining subtraction product
sequences, we identified 23 novel germ-cell–specific genes. We
first identified sequence fragments that were present at least twice
among the 2,235 subtraction product sequences and that did not
correspond to known genes. By testing these sequence fragments
for expression in diverse mouse tissues, we identified novel frag-
ments that seemed to be expressed in germ cells, but not in
somatic cells of the testis or other organs (Fig. 1). Nucleotide
sequencing of cDNA clones, and rescreening of libraries as neces-
sary, resulted in full-length cDNA sequences for 23 novel germ-
cell–specific genes (Table 2).
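The first screening step above, keeping sequence fragments that occur at least twice among the 2,235 subtraction products and do not correspond to known genes, is a count-and-filter operation. In this hypothetical sketch each fragment has already been assigned a cluster label (for example, by sequence similarity) and an optional known-gene match; both the labels and the data layout are assumptions, and the subsequent RT–PCR expression screen is not modelled.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

def candidate_novel_clusters(fragments: Iterable[Tuple[str, Optional[str]]],
                             min_count: int = 2) -> List[str]:
    """fragments: (cluster_id, known_gene or None) per sequenced fragment.
    Returns clusters seen at least min_count times with no known-gene match."""
    fragments = list(fragments)
    counts = Counter(cluster for cluster, _ in fragments)
    known = {cluster for cluster, gene in fragments if gene is not None}
    return sorted(c for c, n in counts.items() if n >= min_count and c not in known)

frags = [("c1", None), ("c1", None), ("c2", "Dazl"), ("c2", "Dazl"),
         ("c3", None), ("c4", None), ("c4", None), ("c4", None)]
print(candidate_novel_clusters(frags))  # ['c1', 'c4']
```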

Virtual translation of these novel cDNA sequences and com-
parison with the previously reported genes indicate that many
spermatogonially expressed, germ-cell–specific proteins are
involved in transcriptional or post-transcriptional regulation of
gene expression. Similarities to well-characterized proteins sug-
gest that these proteins include a component of RNA polymerase
II transcription initiation complexes (the product of Taf2q; Table
2), a nuclear RNA export factor (Nxf2), a ribonuclease inhibitor
(Rnh2), a ring-finger protein (Rnf17), an RNA helicase
(Mov10l1), and four proteins with RNA-binding domains (RRM
domains in the Dazl (refs. 6,7) and Rbmy products8; tudor
domains in the Stk31 and Tdrd1 products). These findings, and
particularly the large number of putative RNA regulators, are
reminiscent of the large role played by post-transcriptional gene
regulation in pre-meiotic germ-cell development in Drosophila
melanogaster and Caenorhabditis elegans9,10. Our studies suggest
that the same is true of pre-meiotic germ-cell development in
mammals.

We examined the sex specificity of all 36 spermatogonially
expressed, germ-cell–specific genes—in particular, whether they
are expressed in the ovary, the site of female germ cells. Eleven
genes (four novel genes and seven previously reported genes) are

Table 1 • Known mouse genes expressed in spermatogonia
but not in somatic tissues

Gene symbol* | Expression | Chromosome
Mage | testis | X
Ube1y | testis | Y
Usp9y | testis | Y
Rbmy | testis | Y
Tuba3/Tuba7 | testis | 6
Stra8 | testis | 6
Ott | testis and ovary | X
Sycp2 | testis and ovary | 2
Sycp1 | testis and ovary | 3
Figla | testis and ovary | 6
Sycp3 | testis and ovary | 10
Ddx4 | testis and ovary | 13
Dazl | testis and ovary | 17

*For references, see Web Table A.

1Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
2Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas, USA. Correspondence should be addressed to D.C.P.
(e-mail: dcpage@wi.mit.edu).

422 nature genetics • volume 27 • april 2001

[Figure 1 gel panels: RT–PCR products for the controls Gapd, Fshr, Dazl and Rbmy and for the 23 novel genes (Fthl17, Usp26, Tktl1, Tex11, Tex16, Taf2q, Pramel3, Nxf2, Tex13, Pramel1, Tex17, Stk31, Rnh2, Tex12, Tex18, Tex14, Rnf17, Piwil2, Mov10l1, Tex20, Tex15, Tex19 and Tdrd1) across mouse tissues.]


expressed in ovary and in male germ cells (Tables 1 and 2, and
Fig. 1). By contrast, the other 25 genes (19 of which are novel)
seem to be male-specific (Table 1 and Fig. 1).

We then discovered a strong and unexpected correlation
between the sex specificities of the genes and their genomic loca-
tions. We ascertained the chromosomal locations of all 36 sper-
matogonially expressed, germ-cell–specific genes, 8 of which had
been mapped previously (Fig. 2). As expected, the 11 genes that
are expressed in both testes and ovaries seemed to be scattered
randomly throughout the genome, with 1 such gene mapping to
the X chromosome and the other 10 genes distributed among 9
autosomes. By contrast, 13 of 25 male-specific genes mapped to
sex chromosomes, with 3 genes (all as previously reported) local-
izing to the Y chromosome and 10 genes (9 of which are novel)
mapping to the X chromosome. If the 22 non-Y-chromosomal,
male-specific genes had been distributed randomly throughout
the genome, one would have expected 0–2 such genes to be X-
linked. Our finding of 10 X-linked genes is highly unlikely to have
occurred by chance (P < 10⁻⁸), and it indicates a roughly 15-fold
enrichment on the X chromosome for male germ-cell–specific,
spermatogonially expressed genes. Our mapping and expression
studies indicate that, in mammals, the X chromosome has a role
in the mitotic stages of spermatogenesis.
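The chance calculation can be reproduced with a binomial tail. The paper's exact null model is not described in this excerpt. As an assumption, the sketch derives the per-gene probability of X linkage from the stated 15-fold enrichment, that is, roughly 10/15 expected X-linked genes among the 22 non-Y-linked male-specific genes.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_genes = 22                     # non-Y-linked, male-specific genes
observed_x = 10                  # of those, X-linked
expected_x = observed_x / 15     # assumption: back-calculated from "15-fold"
p_x = expected_x / n_genes       # about 0.03 chance per gene of X linkage

tail = binom_tail(observed_x, n_genes, p_x)
print(f"expected {expected_x:.2f} X-linked genes; P(X >= {observed_x}) = {tail:.1e}")
```

Under this assumed null the expected count falls inside the text's 0–2 range, and the tail probability comes out well below 10⁻⁸.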

Why should the mammalian X chromosome have such a spe-
cialization in spermatogonial function? The mammalian X and Y
chromosomes evolved from an ordinary pair of autosomes, a
gradual process that began approximately 240–320 million years
ago with the emergence of SRY, the sex-determining factor, on
one member of that ancestral autosomal pair11,12. Apart from


Fig. 1 Expression of 23 novel germ-cell-specific genes in mouse tissues assayed
by RT–PCR. RNAs were prepared from spermatogonia of 8-d CD-1 males, testes
of 8-d C57BL/6J males, ovaries of adult C57BL/6J females, germ-cell–depleted
testes of adult KitW-v/W-v C57BL/6J males5, and other tissues of 8-d C57BL/6J
males. Gapd, Fshr, Dazl and Rbmy served as controls. Gapd is expressed
ubiquitously. Fshr is expressed in somatic cells of testis and ovary. The Fshr sig-
nal in the spermatogonial lane likely reflects the fact that there is 15% contam-
ination (with testicular somatic cells) in the spermatogonial preparation. Rbmy
and Dazl are expressed only in germ cells, with Rbmy expressed only in testis8

and Dazl expressed in both testis and ovary6,7. Germ cells are reduced in num-
ber, but are not entirely absent, in KitW-v/W-v testes. For genes expressed in sper-
matogonia, one expects to see a reduced RT–PCR signal (or none) in KitW-v/W-v

testes as compared with wild-type testes. This was observed for all genes
except the somatically expressed controls Gapd and Fshr. For Tex20 and Tex15,
faint RT–PCR signals are visible in some somatic tissues. Additional RT–PCR data
not shown: all 36 germ-cell-specific genes under study (including the 13 genes
listed in Table 1) were found to be expressed in primitive type A spermatogo-
nia prepared from 6-d CD-1 mice.

SRY, the ancestral autosome was unlikely to have had an outsized
role in testicular development or function. At issue then are the
adaptive forces that caused the X chromosome, as it differentiated
from the Y chromosome, to accumulate so many genes expressed
in early stages of spermatogenesis. No explanation is provided by
traditional, prevailing models of mammalian X-chromosome
evolution, as these have focused on issues of gene dosage (the
emergence of X inactivation) without envisioning or predicting
any functional specialization13,14. We will outline two possible
explanations, both previously debated in evolutionary biology:
sex-chromosome meiotic drive15–17 and sexual antagonism18–20.

Meiotic drive, which has been documented in diverse species,
including mice21, refers to mechanisms that result in preferential
transmission of certain chromosomes at the expense of their
homologs. X and Y chromosomes are considered much more
likely to become subject to meiotic drive during evolution than
are autosomes15,16. Sex-chromosome meiotic drive skews the
transmission of X versus Y chromosomes to gametes, and thus
the critical drive genes should be expressed during spermatogen-
esis. Perhaps some of the X-linked genes that we report are dri-
vers of X transmission or suppressors of Y transmission.

The theory of sexually antagonistic genes, which has been pos-
tulated to explain the wealth of spermatogenesis factors on mam-
malian Y chromosomes18,22, might also account for our findings
on the X chromosome. Studies in Drosophila and other systems
have demonstrated the existence of sexually antagonistic genes,
which enhance reproductive fitness in one sex but diminish fit-
ness in the other sex23. Empirical and theoretical studies indicate
that, during evolution, sexually antagonistic genes should accu-
mulate preferentially on sex chromosomes18,19. Here, conditions
favor the emergence of genes that benefit the heterogametic sex
(for example, XY), even when those genes are detrimental to the
homogametic sex (XX). Sexual antagonism provides a powerful
explanation for the enrichment of dominant genes that benefit
males on Y chromosomes, including male-ornamentation genes
in guppies and spermatogenesis genes in mammals18,22,24.

Focusing on recessive mutations that enhance reproductive fit-
ness in males but diminish it in females, it has been argued that
natural selection should favor the emergence of sexually antago-
nistic alleles on X chromosomes19. The evolutionary dynamics of
such male-benefit mutations were considered when they first
appear as rare alleles on X chromosomes as opposed to auto-
somes. When they are rare, autosomal recessive alleles would be
of no advantage to (heterozygous) males and thus would be
unlikely to spread widely in the population. By contrast, X-linked
recessive alleles would immediately benefit hemizygous males,
greatly increasing the alleles’ likelihood of permeating the popu-
lation. Eventually, as an allele’s frequency increased in the popu-
lation, female fitness would be diminished by the detrimental


Table 2 • New spermatogonially expressed, germ-cell–specific genes in mouse, and their human orthologs


Mouse gene symbol | Gene name | Expression | Chr | Comments* | Human ortholog | Chr
Fthl17 | ferritin heavy polypeptide-like 17 | testis | X | ferritin, functioning in iron metabolism, consists of 24 heavy and light chains | FTHL17 | X
Usp26 | ubiquitin specific protease 26 | testis | X | predicted protein contains His and Cys domains conserved among deubiquitinating enzymes | USP26 | X
Tktl1 | transketolase-like 1 | testis | X | homologous to human transketolase TKTL1 | |
Tex11 | testis expressed gene 11 | testis | X | novel 947-residue protein | TEX11 | X
Tex16 | testis expressed gene 16 | testis | X | novel 1,139-residue protein; rich in serine | |
Taf2q | TBP-associated factor, RNA polymerase II, Q | testis | X | human autosomal homolog TAF2F encodes a component of TFIID | TAF2Q | X
Pramel3 | PRAME (human)-like 3 | testis | X | homologous to human PRAME, encoding a melanoma antigen recognized by cytotoxic T cells | |
Nxf2 | nuclear RNA export factor 2 | testis | X | homologous to Mex67p, NXF1 and NXF2, encoding nuclear RNA export factors | NXF2 | X
Tex13 | testis expressed gene 13 | testis | X | novel 186-residue protein; 2 closely related homologs on human X chromosome | TEX13A; TEX13B | X; X
Pramel1 | PRAME (human)-like 1 | testis | 4 | homologous to human PRAME | |
Tex17 | testis expressed gene 17 | testis | 4 | novel 120-residue protein; calculated pI 9.9 | |
Stk31 | serine/threonine kinase 31 | testis | 6 | putative protein kinase with tudor domain (found in RNA-interacting proteins) and coiled coil region | STK31 | 7
Rnh2 | ribonuclease inhibitor 2 | testis | 7 | predicted protein contains 6 leucine-rich repeats | |
Tex12 | testis expressed gene 12 | testis | 9 | novel 123-residue protein with coiled coil region | TEX12 | 11
Tex18 | testis expressed gene 18 | testis | 10 | novel 80-residue protein | |
Tex14 | testis expressed gene 14 | testis | 11 | predicted protein contains 2 protein kinase domains | TEX14 | 17
Rnf17 | ring finger protein 17 | testis | 14 | a RING finger-containing protein | RNF17 | 13
Piwil2 | piwi (Drosophila)-like 2 | testis | 14 | homologous to Drosophila piwi, involved in germline stem cell renewal and meiotic drive | |
Mov10l1 | Mov10 (mouse)-like 1 | testis | 15 | putative RNA helicase | MOV10L1 | 22
Tex20 | testis expressed gene 20 | testis and ovary | 2 | novel 188-residue protein; calculated pI 10.2 | |
Tex15 | testis expressed gene 15 | testis and ovary | 8 | novel 2785-residue protein | TEX15 | 8
Tex19 | testis expressed gene 19 | testis and ovary | 11 | novel 351-residue protein with coiled coil region | |
Tdrd1 | tudor domain protein 1 | testis and ovary | 19 | predicted protein contains 4 tudor domains | TDRD1 | 10

*For references, see Web Table B.

effects of homozygosity. This would generate adaptive pressure to
limit the gene’s expression to males, through additional muta-
tions. Based on this theoretical scenario, it was postulated that X
chromosomes should evolve to carry a disproportionate share of
male-specific genes functioning in male differentiation19. Our
findings are in accord with this prediction.
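The argument summarized above (ref. 19) can be illustrated with a deterministic one-locus model. This is our own simplified version, not the cited author's model. A rare allele is fully recessive, raises fitness by s in males that express it, and lowers fitness by t in homozygous females; mating is random and generations are discrete. Starting from the same low frequency, the X-linked allele spreads because hemizygous males express it at once, whereas the autosomal copy, hidden in heterozygous males, barely moves. The parameter values are arbitrary.

```python
from typing import Callable, Tuple

Freqs = Tuple[float, float]  # (female, male) allele frequency after selection

def x_linked_step(q_f: float, q_m: float, s: float, t: float) -> Freqs:
    """One generation for an X-linked, fully recessive, male-beneficial allele."""
    # Sons inherit their single X from the mother; hemizygous carriers have fitness 1 + s.
    q_m_next = q_f * (1 + s) / (1 + s * q_f)
    # Daughters inherit one X from each parent; only homozygotes pay 1 - t.
    hom = q_f * q_m
    het = q_f * (1 - q_m) + q_m * (1 - q_f)
    q_f_next = (hom * (1 - t) + het / 2) / (1 - t * hom)
    return q_f_next, q_m_next

def autosomal_step(q_f: float, q_m: float, s: float, t: float) -> Freqs:
    """One generation for the same allele on an autosome (recessive in both sexes)."""
    hom = q_f * q_m
    het = q_f * (1 - q_m) + q_m * (1 - q_f)
    q_m_next = (hom * (1 + s) + het / 2) / (1 + s * hom)
    q_f_next = (hom * (1 - t) + het / 2) / (1 - t * hom)
    return q_f_next, q_m_next

def simulate(step: Callable[[float, float, float, float], Freqs],
             q0: float = 0.001, s: float = 0.1, t: float = 0.1,
             generations: int = 300) -> Freqs:
    q_f = q_m = q0
    for _ in range(generations):
        q_f, q_m = step(q_f, q_m, s, t)
    return q_f, q_m

for label, step in [("X-linked", x_linked_step), ("autosomal", autosomal_step)]:
    q_f, q_m = simulate(step)
    print(f"{label}: female {q_f:.4f}, male {q_m:.4f}")
```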

[Figure 2 ideograms: mouse chromosomes 1–19, X and Y (CEN, centromere), showing the map positions of the 36 genes.]

Fig. 2 Chromosomal locations of 36 spermatogonially expressed,
germ-cell–specific genes in mouse. Genes that seem to be expressed only in
testis are shown in blue; genes expressed in both testis and ovary are shown
in red; novel genes are boxed. Eight of the genes (Sycp3, Sycp1, Dazl, Rbmy,
Ube1y, Usp9y, Mage, Ott) were mapped previously (Table 1); all other genes
were mapped by radiation hybrid analysis. In the case of gene families
residing on a single chromosome, only one family member is shown (for
example, Magea5 is a representative of the X-linked Mage family). The Y
chromosome is shown in proportion to its estimated physical length30; all
other chromosomes are drawn on a centiray scale29.

Our hypothesis that mammalian X chromosomes have preeminent roles in
early stages of spermatogenesis can now be tested through targeted
disruption of the many X-linked mouse genes reported here, and through
genetic studies in humans. To facilitate studies in humans, we identified
orthologous, full-length human cDNA sequences for 13 of 23 novel mouse
genes reported here (Table 2). In all 13 cases, the orthologous human
genes are expressed exclusively in testes (or testes and ovaries),
presumably in germ cells (Fig. 3), and map to chromosomal regions of
known conserved synteny between the mouse
and human genomes (Table 2). In particular, we have identified
testis-specific, X-linked human orthologs of six of the novel
testis-specific, X-linked mouse genes reported here. In the cDNA
subtraction experiments reported here, we recovered the mouse
homologs of USP9Y, RBMY and DAZ, the three human Y chro-
mosomal genes that have been most strongly implicated in male
infertility25–27 (Table 1). Perhaps some of the novel X-linked
genes will also prove to be sites of mutation in human spermato-
genic failure. The stage is set for systematic examination, in both

424 nature genetics • volume 27 • april 2001

FTH1
FTHL17
USP26
TEX11
TAF2O
NXF2

TEX13A
TEX138
STK31
TEX12
TEX14
RNF17
MOV10L1
TEX15
TDRD1

ti

Q)
Q)

Cl) N >, C C (,) Q) ::, Cl) ~ ~ C 0 Q) E Cl) ~ “‘ Q) _Q .>< a. >, e > .~ 0 ::, Cl) ;; 0. J!l 0 (,) ~
——–

© 2001 Nature Publishing Group http://genetics.nature.com

letter
©

2
0
0
1
N

a
tu

re
P

u
b

li
s
h

in
g

G
ro

u
p

h

tt
p

:/
/g

e
n

e
ti

c
s
.n

a
tu

re
.c

o
m

Fig. 3 Expression of ortholo-
gous human genes assayed
by RT–PCR. FTH1 (encoding
ferritin heavy chain) is
expressed ubiquitously and
served as a control. All of the
novel human genes seem to
be expressed in testis; TEX15
is also expressed in ovary, as
is its mouse ortholog (Fig. 1).
We also tested eight addi-
tional tissues (heart, brain,
placenta, lung, liver, skeletal
muscle, kidney, pancreas)
and detected no expression
of the novel genes there
(data not shown).

mouse and human, of the postulated role of the X chromosome
in early stages of spermatogenesis.

Note: supplementary information is available on the Nature Genetics web site (http://genetics.nature.com/supplementary_info/).

Methods
Isolation of mouse spermatogonia. We isolated spermatogonia by the Staput method of sedimentation velocity at unit gravity4. Primitive type A spermatogonia were prepared from testes of 6-d CD-1 mice (Charles River Laboratories). Mature type A and type B spermatogonia were isolated from 8-d CD-1 mice. By microscopic examination, at least 85% of the cells in the resulting preparations were spermatogonia, with no more than 15% somatic cell (Sertoli cell) contamination. The spermatogonial preparations contained no spermatocytes, as spermatocytes are not present in the testes of 6-d or 8-d CD-1 mice4.

cDNA subtraction. We carried out three independent subtraction experiments, using cDNAs from primitive type A, type A or type B spermatogonia as tracer. In all cases, tracer and driver cDNAs were derived from oligo(dT)-selected RNAs. Germ-cell-depleted testes were from KitW-v/W-v animals. Before subtraction, tracer and driver cDNAs were digested to completion with RsaI. In each of the three experiments, we carried out one round of subtraction using the PCR-select protocol2 (Clontech). To more thoroughly subtract ubiquitous cDNAs, four additional rounds of subtraction were performed using a modified procedure (D. Menke, pers. comm.) as described3. Between rounds of subtraction, we monitored enrichment of Dazl cDNA (germ-cell-specific) and disappearance of Gapd cDNA (ubiquitous). Three plasmid libraries (one for each of the three independent experiments) were prepared from the resulting pools of subtracted cDNA fragments. We sequenced (one read only) 800 randomly selected clones from each of the three libraries. Of the 2,400 sequences generated, 165 were of poor quality or derived from the cloning vector, leaving 2,235 sequences for further analysis.

Sequence analysis. Of the 2,235 sequence fragments, 409 corresponded to 13 previously reported germ-cell-specific genes (142 to Mage, 11 to Ube1y, 2 to Usp9y, 44 to Rbmy, 10 to Tuba3/Tuba7, 2 to Stra8, 45 to Ott, 16 to Sycp2, 3 to Sycp1, 3 to Figla, 8 to Sycp3, 21 to Ddx4 and 102 to Dazl). Among the remaining 1,826 sequence fragments, we searched electronically for redundancies and identities to known genes. We found 98 unique, novel sequence fragments that were each recovered at least twice. We tested each of these 98 sequences for germ-cell specificity by RT–PCR on the 14 tissues shown in Fig. 1. Of the 98 sequences, 45 were found to be expressed in spermatogonia and wild-type testis, but not in somatic tissues including KitW-v/W-v testis, indicating that they are germ-cell specific. After full-length cDNA sequences were assembled, these 45 sequence fragments were found to derive from a total of 23 different genes. Of the original set of 2,235 sequence fragments, 546 corresponded to these 23 novel genes (8 to Fthl17; 29 to Usp26; 38 to Tktl1; 66 to Tex11; 2 to Tex16; 132 to Taf2q; 57 to Pramel3; 13 to Nxf2; 5 to Tex13; 4 to Pramel1; 3 to Tex17; 2 to Stk31; 6 to Rnh2; 29 to Tex12; 4 to Tex18; 2 to Tex14; 8 to Rnf17; 16 to Piwil2; 36 to Mov10l1; 7 to Tex20; 71 to Tex15; 6 to Tex19; 2 to Tdrd1).
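The read bookkeeping in the cDNA subtraction and sequence analysis paragraphs can be re-derived from the per-gene counts. The sketch below is only an arithmetic check, not the authors' analysis pipeline; the dictionary names `known` and `novel` are illustrative, and the counts are transcribed from the text (Tuba3/Tuba7 is treated as one entry, as in the count of 13 genes).

```python
# Per-gene read counts transcribed from the sequence analysis paragraph.
known = {
    "Mage": 142, "Ube1y": 11, "Usp9y": 2, "Rbmy": 44, "Tuba3/Tuba7": 10,
    "Stra8": 2, "Ott": 45, "Sycp2": 16, "Sycp1": 3, "Figla": 3,
    "Sycp3": 8, "Ddx4": 21, "Dazl": 102,
}
novel = {
    "Fthl17": 8, "Usp26": 29, "Tktl1": 38, "Tex11": 66, "Tex16": 2,
    "Taf2q": 132, "Pramel3": 57, "Nxf2": 13, "Tex13": 5, "Pramel1": 4,
    "Tex17": 3, "Stk31": 2, "Rnh2": 6, "Tex12": 29, "Tex18": 4,
    "Tex14": 2, "Rnf17": 8, "Piwil2": 16, "Mov10l1": 36, "Tex20": 7,
    "Tex15": 71, "Tex19": 6, "Tdrd1": 2,
}

reads = 3 * 800                          # three libraries, 800 clones each
usable = reads - 165                     # drop poor-quality and vector reads
remaining = usable - sum(known.values()) # reads left for the novel-gene search

print(f"usable reads: {usable}")
print(f"known genes: {len(known)} genes, {sum(known.values())} reads")
print(f"left for novel-gene search: {remaining}")
print(f"novel genes: {len(novel)} genes, {sum(novel.values())} reads")
print(f"novel share of usable reads: {sum(novel.values()) / usable:.1%}")
```

Running it reproduces the figures quoted in the text: 2,235 usable reads, 409 reads from 13 known genes, 1,826 remaining, and 546 reads from 23 novel genes (about 24.4% of usable reads).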

cDNA cloning. Full-length mouse cDNA sequences were composites derived from subtracted cDNA clones, 5´- and 3´-RACE products, and clones isolated from conventional cDNA libraries that were prepared from adult testes (Clontech, Stratagene and one library of our own construction). We identified orthologous human sequences by searching GenBank using mouse cDNA sequences. We obtained full-length human cDNA sequences by screening a cDNA library prepared from adult testes (Clontech).

Radiation hybrid mapping. Using PCR, we tested genomic DNAs from the 93 cell lines of the mouse T31 radiation hybrid panel (Research Genetics) for the presence of each gene28. PCR conditions and primer sequences have been deposited at GenBank. Analysis of the results positioned the genes with respect to the radiation hybrid map of the mouse genome constructed at the Whitehead/MIT Center for Genome Research29 (http://www-genome.wi.mit.edu/mouse_rh/index.html). Chromosomal mapping data of human genes were retrieved from GenBank and confirmed by RH mapping using the GeneBridge 4 panel (Research Genetics; data not shown).

Expression analysis. Total RNAs were prepared using TRIzol reagent (Gibco BRL); poly(A)+ RNAs were subsequently isolated using a QuickPrep Micro mRNA purification kit (Amersham Pharmacia Biotech). For each of the 14 tissues shown in Fig. 1, reverse transcription primed with either random hexamers or oligo(dT)18 was carried out in bulk, using poly(A)+ RNA (70 ng) from spermatogonia and poly(A)+ RNA (200 ng) from each of the other tissues. RT products were diluted to a final volume of 200 µl, 5 µl of which was used in each PCR amplification. PCR conditions and primer sequences have been deposited at GenBank.
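The dilution scheme implies how much starting RNA each PCR represents. The sketch below is a back-of-the-envelope calculation, assuming the diluted RT product is well mixed so that each 5 µl aliquot carries a proportional share of the input; "RNA equivalent" is an illustrative quantity, not a measured cDNA yield, and the function name is hypothetical.

```python
def rna_equivalent_per_pcr(input_ng, rt_volume_ul=200.0, aliquot_ul=5.0):
    """Nanograms of starting poly(A)+ RNA represented in one PCR, assuming
    the diluted RT product is homogeneous (illustrative assumption)."""
    return input_ng * aliquot_ul / rt_volume_ul

reactions_per_rt = 200.0 / 5.0                   # PCRs obtainable from one bulk RT
spermatogonia_ng = rna_equivalent_per_pcr(70.0)  # 70 ng input from spermatogonia
other_tissue_ng = rna_equivalent_per_pcr(200.0)  # 200 ng input from other tissues
print(reactions_per_rt, spermatogonia_ng, other_tissue_ng)
```

This prints `40.0 1.75 5.0`: each bulk RT supports 40 PCRs, and a spermatogonia reaction represents roughly 2.9-fold less starting RNA than a reaction from the other tissues.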

GenBank accession numbers. cDNA sequences for mouse genes: Fthl17,
AF285569; Mov10l1, AF285587; Nxf2, AF285575; Piwil2, AF285586;
Pramel1, AF285578; Pramel3, AY004873; Rnf17, AF285585; Rnh2,
AF285581; Stk31, AF285580; Taf2q, AF285574; Tdrd1, AF285591; Tex11,
AF285572; Tex12, AF285582; Tex13, AF285576; Tex14, AF285584; Tex15,
AF285589; Tex16, AF285573; Tex17, AF285579; Tex18, AF285583; Tex19,
AF285590; Tex20, AF285588; Tktl1, AF285571; and Usp26, AF285570.

cDNA sequences for human genes: FTHL17, AF285592; MOV10L1,
AF285604; NXF2, AF285596; RNF17, AF285602 and AF285603; STK31,
AF285599; TAF2Q, AF285595; TDRD1, AF285606; TEX11, AF285594;
TEX12, AF285600; TEX13A, AF285597; TEX13B, AF285598; TEX14,
AF285601; TEX15, AF285605; and USP26, AF285593.

Primer sequences and PCR conditions for mouse RH mapping: Figla,
G65193; Magea5, G65194; Ddx4, G65195; Ott, G65196; Sycp2, G65197;
Sycp3, G65198; Stra8, G65199; Tuba3, G65200; Tuba7, G65201; Fthl17,
G65202; Mov10l1, G65203; Nxf2, G65204; Piwil2, G65205; Pramel1,
G65206; Pramel3, G65331; Rnf17, G65207; Rnh2, G65208; Stk31, G65210;
Taf2q, G65211; Tdrd1, G65212; Tex11, G65213; Tex12, G65214; Tex13,
G65215; Tex14, G65216; Tex15, G65217; Tex16, G65218; Tex17, G65219;
Tex18, G65220; Tex19, G65221; Tex20, G65222; Tktl1, G65223; and Usp26,
G65224.

Primer sequences and RT–PCR conditions for mouse genes: Gapd,
G65758; Fshr, G65759; Dazl, G65760; Rbmy, G65761; Fthl17, G65778;
Mov10l1, G65779; Nxf2, G65780; Piwil2, G65781; Pramel1, G65762;
Pramel3, G65782; Rnf17, G65763; Rnh2, G65783; Stk31, G65784; Taf2q,
G65785; Tdrd1, G65786; Tex11, G65787; Tex12, G65788; Tex13, G65789;
Tex14, G65790; Tex15, G65791; Tex16, G65792; Tex17, G65793; Tex18,
G65794; Tex19, G65795; Tex20, G65796; Tktl1, G65797; Usp26, G65798.

Primer sequences and RT–PCR conditions for human genes: FTH1,
G65764; FTHL17, G65765; MOV10L1, G65766; NXF2, G65767; RNF17,
G65799; STK31, G65768; TAF2Q, G65769; TDRD1, G65770; TEX11,
G65771; TEX12, G65772; TEX13A, G65773; TEX13B, G65774; TEX14,
G65775; TEX15, G65776; USP26, G65777.

Acknowledgments
We thank D. Menke for developing the subtraction protocol; H. Skaletsky for statistical advice and bioinformatics support; and A. Arango, D. Berry, A. Bortvin, D. Charlesworth, B. Charlesworth, A. Chess, A. Clark, C. Disteche, L. Goldmakher, D. Haig, M. Handel, R. Jaenisch, T. Kawaguchi, F. Lewitter, B. Lahn, A. Lin, D. Menke, T. Rasmussen, W. Rice, S. Rozen and S. Silber for comments on the manuscript. Supported by National Institutes of Health. P.J.W. was the recipient of a Lalor Foundation fellowship.

Received 27 December 2000; accepted 7 March 2001.

1. de Rooij, D.G. & Grootegoed, J.A. Spermatogonial stem cells. Curr. Opin. Cell Biol. 10, 694–701 (1998).

2. Diatchenko, L. et al. Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc. Natl. Acad. Sci. USA 93, 6025–6030 (1996).

3. Lavery, D.J., Lopez-Molina, L., Fleury-Olela, F. & Schibler, U. Selective amplification via biotin- and restriction-mediated enrichment (SABRE), a novel selective amplification procedure for detection of differentially expressed mRNAs. Proc. Natl. Acad. Sci. USA 94, 6831–6836 (1997).

4. Bellve, A.R. Purification, culture, and fractionation of spermatogenic cells. Met


Copyright © 1999 by the Genetics Society of America

Preservation of Duplicate Genes by Complementary, Degenerative Mutations

Allan Force,* Michael Lynch,* F. Bryan Pickett,† Angel Amores,*
Yi-lin Yan* and John Postlethwait*

*Department of Biology, University of Oregon, Eugene, Oregon 97403 and †Department of Biology,
Loyola University of Chicago, Chicago, Illinois 60626

Manuscript received March 17, 1998
Accepted for publication December 28, 1998

ABSTRACT
The origin of organismal complexity is generally thought to be tightly coupled to the evolution of new gene functions arising subsequent to gene duplication. Under the classical model for the evolution of duplicate genes, one member of the duplicated pair usually degenerates within a few million years by accumulating deleterious mutations, while the other duplicate retains the original function. This model further predicts that on rare occasions, one duplicate may acquire a new adaptive function, resulting in the preservation of both members of the pair, one with the new function and the other retaining the old. However, empirical data suggest that a much greater proportion of gene duplicates is preserved than predicted by the classical model. Here we present a new conceptual framework for understanding the evolution of duplicate genes that may help explain this conundrum. Focusing on the regulatory complexity of eukaryotic genes, we show how complementary degenerative mutations in different regulatory elements of duplicated genes can facilitate the preservation of both duplicates, thereby increasing long-term opportunities for the evolution of new gene functions. The duplication-degeneration-complementation (DDC) model predicts that (1) degenerative mutations in regulatory elements can increase rather than reduce the probability of duplicate gene preservation and (2) the usual mechanism of duplicate gene preservation is the partitioning of ancestral functions rather than the evolution of new functions. We present several examples (including analysis of a new engrailed gene in zebrafish) that appear to be consistent with the DDC model, and we suggest several analytical and experimental approaches for determining whether the complementary loss of gene subfunctions or the acquisition of novel functions are likely to be the primary mechanisms for the preservation of gene duplicates.

For a newly duplicated paralog, survival depends on the outcome of the race between entropic
decay and chance acquisition of an advantageous regulatory mutation.

Sidow (1996, p. 717)

On one hand, it may fix an advantageous allele giving it a slightly different, and selectable, function
from its original copy. This initial fixation provides substantial protection against future fixation of
null mutations, allowing additional mutations to accumulate that refine functional differentiation.
Alternatively, a duplicate locus can instead first fix a null allele, becoming a pseudogene.

Walsh (1995, p. 426)

Duplicated genes persist only if mutations create new and essential protein functions, an event that
is predicted to occur rarely.

Nadeau and Sankoff (1997, p. 1259)

Thus overall, with complex metazoans, the major mechanism for retention of ancient gene duplicates
would appear to have been the acquisition of novel expression sites for developmental genes,
with its accompanying opportunity for new gene roles underlying the progressive extension of
development itself.

Cooke et al. (1997, p. 362)

THE genomes of most organisms contain multiple copies of genes that are closely related in structure and function. Such gene families can arise from tandem duplications, as in the case of the HOX, hemoglobin, and keratin clusters in animals, or from polyploidization events such as those presumed to have preceded the origin of vertebrates (Ohno 1970; Morizot et al. 1991; Lundin 1993; Holland et al. 1994; Amores et al. 1998; Pébusque et al. 1998), brewer’s yeast (Wolfe and Shields 1997; Seoighe and Wolfe 1998), and many plant species (Lewis 1979). The mechanism that preserves a large proportion of duplicate genes for long time periods, however, is unclear. The classical model predicts that duplicate genes initially have fully overlapping, redundant functions, such that one copy may shield the second copy from natural selection, if gene dosage is not critical. Because deleterious mutations occur much more frequently than beneficial mutations (Lynch and Walsh 1998), the classical model predicts that the most common fate for the duplicate pair should be the fixation of a null allele that prevents normal transcription, translation, and/or protein function, i.e., the formation of a pseudogene at one of the duplicate loci (Haldane 1933; Nei and Roychoudhury 1973; Bailey et al. 1978; Takahata and Maruyama 1979; Li 1980; Watterson 1983). Under this model, first elucidated by Ohno (1970), the only mechanism for the permanent preservation of duplicate genes is the fixation of rare beneficial mutations endowing one of the copies with a novel function, while the second copy maintains the original function. The introductory quotations illustrate the extent to which this model is currently the central paradigm in the theory of duplicate gene evolution.

Corresponding author: Allan Force, Department of Biology, University of Oregon, Eugene, OR 97403. E-mail: force@oregon.uoregon.edu

Genetics 151: 1531–1545 (April 1999)

1532 A. Force et al.

Here we discuss difficulties in the ability of the classical model to explain the preservation of gene duplicates in evolution and then propose a new model that can explain duplicate gene preservation by the fixation of degenerative mutations rather than by the fixation of new beneficial mutations. Next, we present several examples, including original data from the zebrafish engrailed genes, consistent with the new model. Finally, we suggest a series of experimental approaches for testing the new model.

Problems with the classical model for the preservation of gene duplicates: Under the simplest model for the fate of duplicate genes (the double-recessive model), the rate at which nonfunctional genes (genes that do not make a functional protein product) become fixed in populations is largely determined by random genetic drift and the null mutation rate ($u$), provided the product of the effective population size and $u$ is $< 0.01$. Under these conditions, the frequency of individuals homozygous null at both duplicate loci is negligible, and null mutations behave essentially as neutral alleles. The probability that one copy will become nonfunctional is then $\approx 1 - e^{-2ut}$, where $t$ is the number of generations since the two loci have been functionally diploid with respect to meiosis (Nei and Roychoudhury 1973; Takahata and Maruyama 1979; Li 1980; Watterson 1983). This result suggests that most gene duplicates should become nonfunctional with high probability in a relatively short period of time. For example, if $u$ is $10^{-6}$ per generation, then the mean time to nonfunctionalization is on the order of a few million generations or less.
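The classical-model formula can be evaluated directly. The sketch below assumes the neutral conditions stated above and reads $1 - e^{-2ut}$ as an exponential waiting time with rate $2u$ (rate $u$ at each of the two loci), so the mean time to loss is $1/(2u)$; that reading is an interpretation for illustration, and the function names are hypothetical.

```python
import math

def prob_one_copy_lost(u, t):
    """Approximate probability that one of the two duplicates has become
    nonfunctional after t generations: 1 - exp(-2ut)."""
    return 1.0 - math.exp(-2.0 * u * t)

def mean_time_to_loss(u):
    """Mean waiting time in generations, treating 1 - exp(-2ut) as an
    exponential distribution with rate 2u (assumed interpretation)."""
    return 1.0 / (2.0 * u)

u = 1e-6  # null mutation rate per generation, the value used in the text
print(f"mean time: {mean_time_to_loss(u):,.0f} generations")
for t in (1e5, 1e6, 5e6):
    print(f"t = {t:,.0f}: P(one copy lost) = {prob_one_copy_lost(u, t):.3f}")
```

With $u = 10^{-6}$ the mean is 500,000 generations, and the loss probability rises from about 0.181 at 100,000 generations to 0.865 at one million and effectively 1.000 by five million, in line with "a few million generations or less".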

Three general observations involving species derived from polyploidization events appear to contradict the rapid demise of gene duplicates predicted by the classical model. First, in numerous cases, the fraction of genes preserved is higher than predicted by the classic model. For example, in tetraploid fish lineages, 30–75% of the duplicate protein-coding genes have avoided nonfunctionalization for time spans on the order of 50 to 100 million yr (Allendorf et al. 1975; Ferris and Whitt 1979); in Xenopus laevis, about half of all duplicate genes have been preserved for 30 million yr (Bisbee et al. 1977; Graf and Kobel 1991; Hughes and Hughes 1993); and for the allopolyploidization event in maize, an annual plant, 72% have avoided nonfunctionalization for 11 million yr (Whitkus et al. 1992; Ahn and Tanksley 1993; White and Doebley 1998). The fact that most loci observed in these lineages appear to have a nonfunctional member in some related tetraploid species argues against the idea that both duplicate genes are retained due to constraints imposed by gene dosage requirements, at least for the enzyme loci investigated. Although the highest levels of duplicate gene retention in some fish and plant lineages may be due to the incomplete transition to diploid inheritance, similar estimates of duplicate gene preservation have emerged for more ancient polyploidization events for which disomic inheritance has clearly been reestablished, such as duplications that preceded the origin of tetrapods (33%; Nadeau and Sankoff 1997). Second, in X. laevis, which became tetraploid ~30 mya, nucleotide substitution patterns are consistent with the action of purifying selection on both copies of the duplicated genes (Hughes and Hughes 1993). Third, for loci that have avoided nonfunctionalization in both duplicate copies, there seems to be a relative paucity of null alleles segregating in extant populations (Ferris and Whitt 1977). Such observations are unexpected for loci involved in an ongoing degenerative process, and suggest the possibility that the duplicate loci are stabilized in populations.

Several attempts have been made to explain the high rate of duplicate gene preservation found by empirical observation. First, surviving duplicate loci in these taxa may have been preserved because new gene functions evolve at a much higher rate than predicted. We are not aware, however, of any convincing evidence that the majority of duplicate copies have acquired new functions that did not already exist in the ancestral genes (Ferris and Whitt 1979). A second possible explanation is that long-term effective population sizes may have been larger than expected, in fact large enough so that selection against double homozygotes prevents the fixation of null alleles at either locus (Takahata and Maruyama 1979; Li 1980; Walsh 1995). This appears not to account for the case of X. laevis (Hughes and Hughes 1993). The population size requirements for the preservation of gene duplicates by selection against double nulls over hundreds of millions of years may be prohibitively extreme. A third possible explanation for the discrepancy between theory and observation is that the rate of gene loss has been slowed by some form of natural selection against heterozygous carriers of null alleles (Bailey et al. 1978; Takahata and Maruyama 1979; Li 1980; Hughes and Hughes 1993; Clark 1994; Nowak et al. 1997).

Preservation of Duplicate Genes 1533

Gene structure and duplicate gene preservation: An alternative reason for the failure of the classical model to explain the fates of most duplicate loci may be an overly simplistic view of gene structure. Although a general assumption of the classical model is that the properties of a gene may be adequately subsumed under a single function, genes often have several functions, each of which may be controlled by different DNA regulatory elements (see the following reviews for a number of examples: Piatigorsky and Wistow 1991; Hughes 1994; Kirchhamer et al. 1996; Arnone and Davidson 1997). A case in point is the cut locus in Drosophila melanogaster (Jack 1985; Liu et al. 1991; Jack and DeLotto 1995). Genetic and molecular analyses demonstrate that a 120-kb region of DNA upstream of the cut promoter drives tissue-specific expression, and that many spontaneous recessive mutant alleles result from insertions of transposable elements into this region. The regulatory mutation alleles fall into five complementation classes, with varying effects on tissue-specific expression (in Malpighian tubules, spiracles, central nervous system, specific portions of wing and leg imaginal discs, and embryonic and adult external sensory organs), as well as on morphology and viability. Similar complementation groups involving regulatory-region mutations are known for other developmental genes in D. melanogaster, including cubitus interruptus (Slusarski et al. 1995) and Ultrabithorax (Bender et al. 1983).

The widespread existence of complementation classes within eukaryotic gene loci indicates that gene expression patterns are typically controlled by multiple (and often modular and independent) regulatory regions associated with distinct protein-coding domains (Arnone and Davidson 1997). With the explicit assumption that these principles involving complementation between alleles at the same locus can be extended to complementation between two duplicate loci, we suggest that the regulatory complexity inherent in many gene classes is an essential, but previously missing, component of models for the evolutionary fate of duplicate genes. Further justification for this argument derives from substantial evidence showing spatial and temporal partitioning of expression patterns for gene duplicates in a wide variety of species (Ferris and Whitt 1979; Hughes and Hughes 1993; Ekker et al. 1995; Lee et al. 1996; Raff 1996; Gerhart and Kirschner 1997). To formally incorporate the issue of expression pattern complexity into models of gene duplication, we focus here on subfunctions that affect different gene expression domains during development. Here we adopt an operational definition of a subfunction as a specific subset of a gene’s function that, when mutated, establishes a distinct complementation group, as in the cut example above (Liu et al. 1991; Jack and DeLotto 1995). A subfunction might involve the expression of a gene in a specific tissue, cell lineage, or developmental stage, or individual functional domains within the polypeptide coding portion of the gene.

The model presented below outlines how degenerative mutations in regulatory subfunctions can facilitate the preservation of duplicate genes, in the absence of any positive selection for beneficial mutations, by partitioning the repertoire of gene expression patterns of ancestral alleles. This model is quite distinct from the classical model, under which degenerative mutations can only lead to gene loss and beneficial mutations are the only route to gene preservation.

GENE PRESERVATION BY COMPLEMENTARY
DEGENERATIVE MUTATIONS
(SUBFUNCTIONALIZATION)

Following a polyploidization event, genomic redundancies exist at several levels: duplicate chromosomes, duplicate genes, and duplicate regulatory regions driving gene expression. Each level of redundancy is subject to processes of mutation and random genetic drift, which can lead to loss of function by chromosome loss, gene inactivation, or loss of individual regulatory elements. If duplicate chromosomes lose different genes, then for the organism to remain viable, the two chromosomes must complement each other by jointly retaining functional copies of all genes present on the original ancestral chromosome. Likewise, if duplicate genes lose different regulatory subfunctions, then they must complement each other by jointly retaining the full set of subfunctions present in the original ancestral gene. We refer to the complementary loss of duplicate genetic elements by degenerative mutation as the duplication-degeneration-complementation (DDC) process. The unique feature that distinguishes the DDC process from the classical model is that degenerative mutations facilitate rather than hinder the preservation of duplicate functional genes. In the following discussion, we focus on duplications of entire chromosomes or genomes rather than tandem gene duplications because we wish to exclude for now complications caused by uncertainty about the extent of the original duplication and local homogenization events caused by unequal crossing over or gene conversions (Zhou and Li 1996).

Under the general DDC model, the process of duplicate gene evolution occurs in two phases (Figure 1). During phase I, genes may experience one of three alternative fates, the first two of which correspond to outcomes under the classical model. First, one copy may incur a null mutation in the coding region, which subsequently drifts to fixation, leading to gene loss (nonfunctionalization). Nonfunctionalization can also occur if all of the regulatory regions of one duplicate are destroyed. Second, one copy may acquire a mutation conferring a new function, which becomes fixed through positive Darwinian selection (neofunctionalization). It is now thought that such mutations may often involve changes in regulatory regions (Grenier et al. 1997; Shubin et al. 1997; Palopoli and Patel 1998). Assuming this new function results in the loss of an essential ancestral function, neofunctionalization insures the preservation of the nonmutated copy. [In principle, neofunctionalization can also occur if one or both copies acquire a new regulatory region without altering existing subfunctions, as pointed out by Sidow (1996)]. Third, each duplicate may experience loss or reduction of expression for different subfunctions by degenerative mutations. In such a case, the combined action of both gene copies is necessary to fulfill the requirements of the ancestral locus (subfunctionalization). If this happens, then complementation of subfunctions between duplicate genes will preserve both partially degenerated copies. In phase II of the DDC model, duplicate genes preserved either by neofunctionalization or subfunctionalization undergo random resolution of persisting redundant subfunctions, as the accumulation of degenerative mutations eliminates each subfunction in one or the other copy.

Figure 1.—Three potential fates of duplicate gene pairs with multiple regulatory regions. The small boxes denote regulatory elements with unique functions, and the large boxes denote transcribed regions. Solid boxes denote intact regions of a gene, while open boxes denote null mutations, and triangles denote the evolution of a new function. Because the model focuses on mutations fixed in populations, the diagram shows the state of a single gamete. In the first two steps, one of the copies acquires null mutations in each of two regulatory regions. On the left, the next fixed mutation results in the absence of a functional protein product from the upper copy. Because this gene is now a nonfunctional pseudogene, the remaining regulatory regions associated with this copy eventually accumulate degenerative mutations. On the right, the lower copy acquires a null mutation in a regulatory region that is intact in the upper copy. Because both copies are now essential for complete gene expression, this third mutational event permanently preserves both members of the gene pair from future nonfunctionalization. The fourth regulatory region, however, may still eventually acquire a null mutation in one copy or the other. In the center, a regulatory region acquires a new function that preserves that copy. If the beneficial mutation occurs at the expense of an otherwise essential function, then the duplicate copy is preserved because it retains the original function.

Subfunctionalization can occur by two different routes: qualitative or quantitative. Under qualitative subfunctionalization, which we model below and illustrate in Figure 1, one duplicate copy goes to fixation for a complete loss-of-subfunction mutation, and the second locus subsequently acquires a null mutation for a different subfunction. In contrast, quantitative subfunctionalization results from the fixation of reduction-of-expression mutations in both duplicates. In this case, once the total regulatory efficiency of a subfunction in both copies has been reduced to a threshold level determined by organismal requirements, any further degradation of the subfunction from either copy may be opposed by purifying selection.

Mutations that cause subfunctions to degrade may occur by several mechanisms, including nucleotide substitutions, deletions, inversions, insertions of transposable elements, slippage/replication errors, and unequal crossing over between repeated transcription-factor binding sites. Transposable elements may generate many subfunctional alleles. For example, P, copia, and gypsy elements are known to be mutagenic when they insert into 5′ regions of Drosophila genes (Kidwell and Lisch 1997). Species with a recent history of polyploidization, for example, maize, appear to have such insertions commonly in the 5′ and 3′ regions of genes, whereas in species lacking a recent polyploidization event, such insertions are infrequent (White et al. 1994; Wessler et al. 1995). Such transposable element insertions, presumably in regulatory DNA, may be tolerated in recently evolved polyploid species because of the redundancy of their regulatory elements.

The probability of subfunctionalization: The arguments presented above suggest that the DDC process could make both gene duplicates essential, but can it account for the high levels of duplicate gene preservation observed in polyploid lineages? Here we consider a simple model that suggests that, with reasonable parameter values, the DDC process can account for a significant fraction of preserved duplicate genes.

Preservation of Duplicate Genes 1535

Consider the situation in which both members of a recently duplicated gene have $z$ independently mutable subfunctions, all of which are essential, at least in single copy, and all of which mutate at identical rates ($u_r$) to alleles lacking the relevant subfunction. Letting $u_c$ be the rate at which null mutations arise in the coding region, the null mutation rate for the locus is then $u_c + z u_r$ per gene copy. We assume that conditions are such that one functional allele (of four possible allele copies) of a given duplicated gene pair is sufficient for wild-type function (the double recessive model), and that beneficial mutations are rare relative to degenerative mutations. Provided the product of population size and genic mutation rate is $< 0.01$ (Li 1980), the frequency of double null homozygotes will be sufficiently low that all allele frequencies will evolve in an effectively neutral manner. The rate of fixation of a mutation in a population will then be approximately equal to the rate of mutation at the level of the gene (Kimura 1983).

Now imagine that one of the duplicate gene copies experiences a fixation event. Assuming there is more than one subfunction, the probability that the gene survives this event (and does not become a pseudogene) is the total regulatory-region mutation rate divided by the total mutation rate for the two copies:

$$\Pr(\text{survival of first fixation event}) = \frac{z u_r}{u_c + z u_r}. \qquad (1)$$

Following the elimination of one of the $z$ subfunctions from the first gene copy, the second copy must maintain this subfunction, because complete loss of an essential subfunction from both duplicates would be lethal. Thus, the permissible mutation rate for the second copy becomes $(z - 1)u_r$. Additional null mutations can occur in the remaining $(z - 1)$ regulatory subfunctions or in the coding region in the partially degraded first copy. Therefore, the total rate (summed over both copies) for the second mutational event is $[u_c + 2(z - 1)u_r]$. The probability of subfunctionalization upon this second event, $P_{S,2}$, is equal to the probability that the coding regions have survived the first hit multiplied by the probability that the second mutation occurs in a complementary subfunction in the second copy,

$$P_{S,2} = \left(\frac{z u_r}{u_c + z u_r}\right)\left(\frac{(z - 1)u_r}{u_c + 2(z - 1)u_r}\right). \qquad (2)$$
Following this logic, it can be seen that $(z - 1)$ distinct series of mutational events can lead to duplicate-gene preservation by subfunctionalization: the first two null mutations in regulatory regions may occur on different gene copies, two may initially occur on the same copy followed by a third on the second copy, three may initially occur on the same copy followed by a fourth on the second copy, and so on. The probability of each of these additional pathways to subfunctionalization, i.e., $(i - 1)$ consecutive regulatory-region null mutations on one copy followed by one on the other, is given by the generalization of Equation 2,

$$P_{S,i} = \left(\frac{z u_r}{u_c + z u_r}\right)\prod_{j=0}^{i-2}\left(\frac{(z - j - 1)u_r}{u_c + 2(z - j - 1)u_r}\right). \qquad (3)$$

The total probability of gene preservation by subfunctionalization, $P_S$, is obtained by summing this quantity over $i = 2$ to $z$,

$$P_S = \sum_{i=2}^{z} P_{S,i}, \qquad (4)$$

and the probability of nonfunctionalization is equal to $1 - P_S$. From this expression, we see that the probability of duplicate-gene preservation increases with the number of regulatory regions and with the mutation rate per regulatory region (Figure 2). More regulatory regions provide more targets for subfunctionalization that can be hit without penalty, and an increasing mutation rate per subfunction reduces the relative probability of fixation of a null mutation in the coding region before complementation.

Figure 2. Combinations of relative null mutation rates to regulatory and coding regions ($u_r/u_c$) and number of subfunctions ($z$) that yield various probabilities ($P_S$) of duplicate gene preservation by the DDC process. The probability of duplicate gene preservation increases with the number of regulatory elements.
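Equations 1 to 4 are simple enough to evaluate directly. The sketch below is one straightforward Python transcription, assuming only the definitions above ($z$, $u_r$, $u_c$); the function name is illustrative. Because every factor is a ratio of rates, only $u_r/u_c$ affects the result.

```python
def subfunctionalization_probability(z: int, u_r: float, u_c: float) -> float:
    """Total probability P_S of preservation by subfunctionalization (Eqs. 3 and 4).

    z:   number of independently mutable, essential subfunctions per copy
    u_r: null-mutation rate per regulatory subfunction
    u_c: null-mutation rate of the coding region
    """
    # Eq. 1: the first fixed null mutation lands in a regulatory region.
    first_hit_regulatory = z * u_r / (u_c + z * u_r)
    total = 0.0
    running_product = 1.0
    # Pathway i: (i - 1) regulatory hits on one copy, then one hit on the other copy.
    for i in range(2, z + 1):
        j = i - 2
        shared = z - j - 1  # subfunctions still intact in the first-hit copy
        # Factor j of the product in Eq. 3. It has the same form whether the next
        # hit lands on the first-hit copy or on the complementary copy.
        running_product *= shared * u_r / (u_c + 2 * shared * u_r)
        total += first_hit_regulatory * running_product  # adds P_{S,i}
    return total


if __name__ == "__main__":
    for ratio in (0.1, 0.3):
        p_s = subfunctionalization_probability(5, u_r=ratio, u_c=1.0)
        print(f"z = 5, u_r/u_c = {ratio}: P_S = {p_s:.3f}")
```

With $z = 5$ this prints 0.090 for $u_r/u_c = 0.1$ and 0.302 for $u_r/u_c = 0.3$, consistent with the worked example that follows in the text. For $z = 1$ the sum in Eq. 4 is empty and the function returns 0.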

The DDC process leads to subfunctionalization with high probability given reasonable parameter values. For example, if there are five subfunctions and the mutation rate per subfunction is 10% of the coding-region null rate, then the probability of subfunctionalization is about 0.1, and if the mutation rate per subfunction is 30% of the null rate, then the probability of subfunctionalization is about 30% (Figure 2). Generally, if the total rate of subfunctional mutations ($z u_r$) exceeds the null rate in the coding region by more than approximately fourfold, then the probability of gene preservation by subfunctionalization exceeds 50%. The complexity and size of regulatory regions of eukaryotic genes (Kirchhamer et al. 1996; Arnone and Davidson 1997) suggest that these conditions may be met frequently.
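As an independent check on the bookkeeping behind Eqs. 1 to 4, the mutational process can also be simulated directly. This is a hedged sketch, not code from the original study. It assumes, as in the text, that fixation is effectively neutral, so the next fixed null mutation hits a permissible target with probability proportional to that target's mutation rate; a mutation that would be lethal (losing a subfunction or coding region that the other copy no longer supplies) is never fixed. The names `simulate_pair` and `estimate_ps` are hypothetical.

```python
import random


def simulate_pair(z, u_r, u_c, rng):
    """Follow one duplicate pair: True if it ends subfunctionalized, False if a copy is lost."""
    lost = (set(), set())  # subfunctions destroyed on copy 0 and on copy 1
    while True:
        targets, weights = [], []
        for copy_id in (0, 1):
            other = lost[1 - copy_id]
            # A coding-region null is tolerated only while the other copy still
            # carries every subfunction.
            if not other:
                targets.append((copy_id, None))
                weights.append(u_c)
            for s in range(z):
                # Subfunction s can be lost here only if this copy still has it
                # and the other copy has not already lost it.
                if s not in lost[copy_id] and s not in other:
                    targets.append((copy_id, s))
                    weights.append(u_r)
        copy_id, s = rng.choices(targets, weights=weights)[0]
        if s is None:
            return False  # coding-region null fixed: nonfunctionalization
        lost[copy_id].add(s)
        if lost[0] and lost[1]:
            return True  # complementary losses: both copies are now essential
        if len(lost[copy_id]) == z:
            return False  # every subfunction gone from this copy: it is dead


def estimate_ps(z, u_r, u_c, trials=20_000, seed=1):
    """Monte Carlo estimate of P_S from `trials` independent duplicate pairs."""
    rng = random.Random(seed)
    return sum(simulate_pair(z, u_r, u_c, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(estimate_ps(5, u_r=0.3, u_c=1.0))
```

With 20,000 trials the estimate for $z = 5$, $u_r/u_c = 0.3$ should fall within roughly ±0.01 of the analytic value of about 0.30. The simulation tracks only which mutation fixes next, not when, so it says nothing about the time scales discussed below.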

1536 A. Force et al.

Time scales for subfunctionalization and resolution: Using the model presented above, the mean time to gene preservation conditional on its actual occurrence can be obtained by treating the times to mutational events as geometrically distributed variables. The rate of occurrence of an initial regulatory-region null mutation is $2 z u_r$, because each of the two copies contains $z$ mutational targets. As noted above, subsequent to this initial event, zero to $(z - 2)$ additional degenerative mutations may be incurred by the first-hit copy before the first mutation on the opposite copy. The mean time to subfunctionalization conditional on the occurrence of $(i - 1)$ consecutive regulatory-region null mutations on one copy followed by one on the other is then

$$t_{S,i} = \frac{1}{u_r}\left(\frac{1}{2z} + \sum_{j=1}^{i-1}\frac{1}{z - j}\right). \qquad (5a)$$

The mean time to subfunctionalization is then

$$t_S = \sum_{i=2}^{z}\frac{t_{S,i}\,P_{S,i}}{P_S}. \qquad (5b)$$
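Eqs. 5a, 5b and 6, together with the decay law $P_0(t) = e^{-2tu_r}$ given later in this section, can be evaluated numerically with a short sketch. It reuses the pathway probabilities of Eq. 3; the function names are hypothetical, and the code simply transcribes the formulas, so it inherits the model's assumptions rather than testing them.

```python
import math


def pathway_probabilities(z, u_r, u_c):
    """P_{S,i} for i = 2..z (Eq. 3), keyed by i."""
    if z < 2:
        raise ValueError("subfunctionalization needs at least two subfunctions")
    first = z * u_r / (u_c + z * u_r)
    probs, product = {}, 1.0
    for i in range(2, z + 1):
        shared = z - (i - 2) - 1
        product *= shared * u_r / (u_c + 2 * shared * u_r)
        probs[i] = first * product
    return probs


def pathway_time(i, z, u_r):
    """t_{S,i} from Eq. 5a."""
    return (1.0 / (2 * z) + sum(1.0 / (z - j) for j in range(1, i))) / u_r


def mean_time_to_subfunctionalization(z, u_r, u_c):
    """t_S from Eq. 5b: pathway times weighted by P_{S,i} / P_S."""
    probs = pathway_probabilities(z, u_r, u_c)
    p_s = sum(probs.values())
    return sum(pathway_time(i, z, u_r) * p for i, p in probs.items()) / p_s


def fraction_resolved(z, u_r, u_c):
    """Pr(0) from Eq. 6: pathway i has resolved i of the z subfunctions."""
    probs = pathway_probabilities(z, u_r, u_c)
    p_s = sum(probs.values())
    return sum(i * p for i, p in probs.items()) / (z * p_s)


def unresolved_count_distribution(n_sites, t, u_r):
    """Binomial distribution of sites still unresolved after t further time units.

    Each unresolved site survives with P_0(t) = exp(-2 t u_r), because a hit on
    either copy resolves it.
    """
    p0 = math.exp(-2.0 * t * u_r)
    return [math.comb(n_sites, k) * p0**k * (1.0 - p0) ** (n_sites - k)
            for k in range(n_sites + 1)]


if __name__ == "__main__":
    u_r = 1e-7  # per year, the value used in the text
    print(mean_time_to_subfunctionalization(2, u_r, u_c=1e-6))
    print(fraction_resolved(6, u_r=0.1, u_c=1.0))
```

For $z = 2$ only pathway $i = 2$ exists, so $t_S = (1/4 + 1)/u_r$, i.e. 12.5 million yr at $u_r = 10^{-7}$/yr, which is the upper bound quoted in the text. Likewise Eq. 6 gives exactly 1 for $z = 2$, since both subfunctions are resolved as soon as the pair is preserved.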

As in the classical model, these expressions indicate that the fates of duplicate genes are generally determined in a relatively short period (on an evolutionary time scale; Figure 3A). For example, if $u_r = 10^{-7}$/yr, $t_S$ is on the order of 4 million yr or less provided the number of regulatory regions is greater than five, and even with $z < 5$ it does not exceed 12.5 million yr. Thus, under the DDC model, most duplicate genes that are destined to be preserved by subfunctionalization are expected to become so within a few million years. With a regulatory-region mutation rate $x$ times that in the figure, the mean time to subfunctionalization would be divided by $x$.

Figure 3. (A) The mean time to subfunctionalization as a function of the number of subfunctions ($z$), for the situation in which $u_r = 10^{-7}$/yr. (B) The fates of gene pairs are determined in a short time, on an evolutionary scale, and the ratio of the mutation rate in regulatory and coding regions is a weak determinant of the expected degree of resolution at the time of subfunctionalization.

Unless there are only two initial regulatory regions, some regulatory regions (as many as $z - 2$) will likely remain to be resolved over evolutionary time after the initial subfunctionalization event. The fraction of regulatory regions that is expected to be resolved at the time of gene preservation by subfunctionalization is

$$\Pr(0) = \sum_{i=2}^{z}\frac{i\,P_{S,i}}{z\,P_S}. \qquad (6)$$

This fraction depends only weakly on the ratio of coding-region to regulatory-region mutation rates, and is $< 0.5$ if the number of regulatory regions exceeds five (Figure 3B). Thus, we anticipate that after the preservation of duplicate genes by the DDC process, a substantial fraction of regulatory subfunctions will typically remain to be resolved in phase II. Assuming that the occurrence of mutations that destroy regulatory regions is a Poisson process, for any site that is unresolved at the time of gene preservation, the probability that it is still unresolved after $t$ further time units is simply $P_0(t) = e^{-2 t u_r}$. The number of unresolved sites at time $t$ then follows a binomial distribution with parameter $P_0(t)$.

Figure 4. Overlapping and embedded regulatory elements. All transcription-factor binding sites shown are assumed to be essential for each expression domain. (A) Independent regulatory regions with independent transcription-factor binding sites. (B) Overlapping regulatory regions with independent sites. (C) Overlapping and embedded regulatory regions with independent sites. (D) Overlapping regulatory regions with shared sites. (E) Embedded regulatory regions with shared sites. (F) Resolution of overlapping regulatory regions with independent sites (derived from C) leading to quantitative resolution of regulatory region 2 after 1 and 3 are destroyed on paralogous copies. (G) Resolution of embedded regulatory regions with shared sites leading to true redundancy for regulatory region 2 after 1 and 3 are destroyed on paralogous copies.

The molecular nature of subfunctions and the preservation of genetic redundancy: The preceding theory assumes that individual regulatory subfunctions are independently mutable, with single mutations being sufficient to eliminate a subfunction. Under this simple scenario, the various subfunctions within duplicate genes preserved by the DDC process are expected to be resolved randomly, with each copy retaining about half of its subfunctions within the limits of binomial sampling. However, while we define subfunctions by their mutational properties such that they are members of distinct complementation classes, this definition does not describe how such subfunctions are arranged on the DNA molecule. Regulatory regions for different subfunctions are often partially overlapping or embedded, leading to the situation where the number of expression domains exceeds the number of complementation groups (Jack and DeLotto 1995; Kirchhamer et al. 1996). Some of the central issues are illustrated in Figure 4. The situation modeled above is equivalent to a setting in which the spatial arrangement of transcription-factor binding sites allows the independent resolution of the subfunctions ($z = 3$ in Figure 4A). Within overlapping (Figure 4, B and D) or embedded (Figure 4, C and E) regulatory regions, the transcription-factor binding sites may be either interdigitated and acting independently (Figure 4, B and C) or shared, with the same DNA binding site(s) bei


nature Vol 450 | 8 November 2007 | doi:10.1038/nature06340

ARTICLES

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
Alexander Stark1,2*, Michael F. Lin1,2*, Pouya Kheradpour2*, Jakob S. Pedersen3,4*, Leopold Parts5,6, Joseph W. Carlson7, Madeline A. Crosby8, Matthew D. Rasmussen2, Sushmita Roy9, Ameya N. Deoras2, J. Graham Ruby10,11, Julius Brennecke12, Harvard FlyBase curators†, Berkeley Drosophila Genome Project†, Emily Hodges12, Angie S. Hinrichs4, Anat Caspi13, Benedict Paten4,5,14, Seung-Won Park15, Mira V. Han16, Morgan L. Maeder17, Benjamin J. Polansky17, Bryanne E. Robson17, Stein Aerts18,19, Jacques van Helden20, Bassem Hassan18,19, Donald G. Gilbert21, Deborah A. Eastman17, Michael Rice22, Michael Weir23, Matthew W. Hahn16, Yongkyu Park15, Colin N. Dewey24, Lior Pachter25,26, W. James Kent4, David Haussler4, Eric C. Lai27, David P. Bartel10,11, Gregory J. Hannon12, Thomas C. Kaufman21, Michael B. Eisen28,29, Andrew G. Clark30, Douglas Smith31, Susan E. Celniker7, William M. Gelbart8,32 & Manolis Kellis1,2

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the
systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of
functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary
signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and
exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon
readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We
provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several
classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We
also study how discovery power scales with the divergence and number of species compared, and we provide general
guidelines for comparative studies.

The sequencing of the human genome and the genomes of dozens of other metazoan species has intensified the need for systematic methods to extract biological information directly from DNA sequence. Comparative genomics has emerged as a powerful methodology for this endeavour1,2. Comparison of few (two–four) closely related genomes has proven successful for the discovery of protein-coding genes3–5, RNA genes6,7, miRNA genes8–11 and catalogues of regulatory elements3,4,12–14. The resolution and discovery power of these studies should increase with the number of genomes15–20, in principle enabling the systematic discovery of all conserved functional elements.

The fruitfly Drosophila melanogaster is an ideal system for developing and evaluating comparative genomics methodologies. Over the past century, Drosophila has been a pioneering model in which many of the basic principles governing animal development and population biology were established21. In the past decade, the genome sequence of D. melanogaster provided one of the first systematic views

1The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA.
2Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA.
3The Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark.
4Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA.
5Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
6Institute of Computer Science, University of Tartu, Estonia.
7BDGP, LBNL, 1 Cyclotron Road MS 64-0119, Berkeley, California 94720, USA.
8FlyBase, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA.
9Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, USA.
10Department of Biology, MIT, Cambridge, Massachusetts 02139, USA.
11Whitehead Institute, Cambridge, Massachusetts 02142, USA.
12Cold Spring Harbor Laboratory, Watson School of Biological Sciences, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA.
13University of California, San Francisco/University of California, Berkeley Joint Graduate Group in Bioengineering, Berkeley, California 97210, USA.
14EMBL Nucleotide Sequence Submissions, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
15Department of Cell Biology and Molecular Medicine, G-629, MSB, 185 South Orange Avenue, UMDNJ-New Jersey Medical School, Newark, New Jersey 07103, USA.
16Department of Biology and School of Informatics, Indiana University, Indiana 47405, USA.
17Department of Biology, Connecticut College, New London, Connecticut 06320, USA.
18Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, VIB, 3000 Leuven, Belgium.
19Department of Human Genetics, K. U. Leuven School of Medicine, 3000 Leuven, Belgium.
20Department de Biologie Moleculaire, Universite Libre de Bruxelles, 1050 Brussels, Belgium.
21Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.
22Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, USA.
23Biology Department, Wesleyan University, Middletown, Connecticut 06459, USA.
24Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
25Department of Mathematics, University of California at Berkeley, Berkeley, California 94720, USA.
26Department of Computer Science, University of California at Berkeley, Berkeley, California 94720, USA.
27Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.
28Graduate Group in Biophysics, Department of Molecular and Cell Biology, and Center for Integrative Genomics, University of California, Berkeley, California 94720, USA.
29Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, California 94720, USA.
30Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA.
31Agencourt Bioscience Corporation, 500 Cummings Center, Suite 2450, Beverly, Massachusetts 01915, USA.
32The Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
*These authors contributed equally to this work.
†Lists of participants and affiliations appear at the end of the paper.

©2007 Nature Publishing Group
219


of a metazoan genome22, and the ongoing effort by the FlyBase and Berkeley Drosophila Genome Project (BDGP) groups established a systematic high-quality genome annotation23–25. Moreover, the fruitfly benefits from extensive experimental resources26–28, which enable novel functional elements to be systematically tested and used in the evaluation of genetic screens29,30.

The fly research community has sequenced, assembled and annotated the genomes of 12 Drosophila species22,31,32 at a range of evolutionary distances from D. melanogaster (Fig. 1a, b). The analysis of these genomes was organized around two complementary aims. The first, described in an accompanying paper32, was to understand the evolution of genes and chromosomes on the Drosophila phylogeny, and how it relates to speciation and adaptation. The second goal, described here, was to develop general comparative methodologies to discover and refine functional elements in D. melanogaster using the 12 genomes, and to investigate the scaling of discovery power and its implications for studies in vertebrates (Fig. 1c).

Here, we report genome-wide alignments of the 12 species (Supplementary Information 1), and the systematic discovery of euchromatic functional elements in the D. melanogaster genome. We predict and refine thousands of protein-coding exons, RNA genes and structures, miRNAs, pre- and post-transcriptional regulatory motifs and regulatory targets. We validate many of these elements using complementary DNA (cDNA) sequencing, human curation, small RNA sequencing, and correlation with experimentally supported transcription factor and miRNA targets. In addition, our analysis leads to several specific biological findings, listed below.
• We predict 123 novel polycistronic transcripts, 149 genes with apparent stop-codon readthrough and several candidate programmed frameshifts, with potential roles in regulation, localization and function of the corresponding protein products.
• We make available the first systematic prediction of general RNA genes and structures (non-coding RNAs (ncRNAs)) in Drosophila, including several structures probably involved in translational regulation and adenosine-to-inosine RNA editing (A-to-I editing).
• We present comparative and experimental evidence that some miRNA loci yield multiple functional products, from both hairpin arms or from both DNA strands, thereby increasing the versatility and complexity of miRNA-mediated regulation.
• We provide further comparative evidence for miRNA targeting in protein-coding exons.
• We report an initial network of pre- and post-transcriptional regulatory targets in Drosophila on the basis of individual high-confidence motif occurrences.

[Figure 1a, b: species and abbreviations D. melanogaster (D.mel), D. simulans (D.sim), D. sechellia (D.sec), D. yakuba (D.yak), D. erecta (D.ere), D. ananassae (D.ana), D. pseudoobscura (D.pse), D. persimilis (D.per), D. willistoni (D.wil), D. mojavensis (D.moj), D. virilis (D.vir) and D. grimshawi (D.gri); clade labels melanogaster subgroup, melanogaster group, subgenus Sophophora and subgenus Drosophila; scale bar, 0.1 substitutions per site.]
Comparative genomics and evolutionary signatures. Although multiple closely related genomes provide sufficient neutral divergence for recognition of functional regions in stretches of highly conserved nucleotides16,17,33, measures of nucleotide conservation alone do not distinguish between different types of functional elements. Moreover, functional elements that tolerate abundant ‘silent’ mutations, such as protein-coding exons and many regulatory motifs, might not be detected when searching on the basis of strong nucleotide conservation.
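To make the point about silent mutations concrete, the sketch below compares nucleotide identity with encoded amino-acid identity for two 72-nt rows (D. melanogaster and D. pseudoobscura) of the protein-coding exon alignment in Figure 2a. It uses the standard genetic code and plain ungapped identity. It is only an illustration, not the paper's codon-substitution scoring, and the helper names are hypothetical.

```python
# Standard genetic code: codons enumerated with each position running over "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame, ungapped coding sequence ('*' marks a stop codon)."""
    if len(cds) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[p:p + 3]] for p in range(0, len(cds), 3))


def identity(x: str, y: str) -> float:
    """Fraction of aligned positions that are identical (equal-length, ungapped)."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


# D.mel and D.pse rows of the Figure 2a exon alignment.
D_MEL = "GGAAGTGCTGCCACAATCTACTACGAATCTATGCCAGCCTCCGCCTCCACAGGCGTTCTATCATTGACTACG"
D_PSE = "GGCAGCTCTGCCACAATCTACTACGAATCGATGCCCGCCTCGGCCTCCACGGGCGTCCTCTCGCTGACCACA"

if __name__ == "__main__":
    print(f"nucleotide identity: {identity(D_MEL, D_PSE):.2f}")
    print(f"amino-acid identity: {identity(translate(D_MEL), translate(D_PSE)):.2f}")
```

This prints 0.82 and 0.96: 13 of the 72 nucleotides differ, but 12 of those changes are synonymous, so only one of the 24 residues changes (Ala to Ser at the third codon). A search keyed to strong nucleotide conservation would rate this exon as only moderately conserved.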

Across many genomes spanning larger evolutionary distances, the information in the patterns of sequence change reveals evolutionary signatures (Fig. 2) that can be used for systematic genome annotation. Protein-coding regions show highly constrained codon substitution frequencies34 and insertions and deletions that are heavily


Figure 1 | Phylogeny and alignment of 12 Drosophila species. a, Phylogenetic tree relating the 12 Drosophila species, estimated from fourfold degenerate sites (Supplementary Methods 1). The 12 species span a total branch length of 4.13 substitutions per neutral site. b, Gene order conservation for a 0.45-Mb region of chromosome 2L centred on CG4495, for which we predict a new exon (Fig. 3a), and spanning 35 genes. Colour represents the direction of transcription. Boxes represent full gene models. Individual exons and introns are not shown. c, Comparison of evolutionary distances spanned by fly and vertebrate trees. Pairwise and multi-species distances (in substitutions per fourfold degenerate site) are shown from D. melanogaster and from human as reference genomes. Note that species with longer branches (for example, mouse) show higher pairwise distances, not always reflecting the order of divergence. Multi-species distances include all species within a phylogenetic clade.


biased to be multiples of three3 (Fig. 2a). RNA genes and structures tolerate substitutions that preserve base pairing35,36 (Fig. 2b). MicroRNA hairpins show a characteristic conservation profile with high conservation in the stem and mutations in loop regions10,11 (Fig. 2c). Finally, regulatory motifs are marked by high levels of genome-wide conservation3,4,12–14, and post-transcriptional motifs show strand-biased conservation12 (Fig. 2d, e).

We find that these signatures can be much more precise for genome annotation than the overall level of nucleotide conservation (for example, Fig. 3a).
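The base-pairing signature can be illustrated with a simplified pairwise check. The sketch below parses a dot-bracket secondary structure and labels each base pair as it appears in an informant sequence, with category names loosely following the Figure 2b legend. The 12-nt hairpin and its variants are hypothetical toy inputs, not data from the paper, and this is not the paper's RNA-scoring method; G•U wobble pairs are counted as compatible.

```python
from typing import List, Tuple

# Pairs that can close a helix: Watson-Crick plus G•U wobble.
COMPATIBLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs from a dot-bracket string, sorted by i."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at {stack[-1]}")
    return sorted(pairs)


def classify_pairs(structure: str, reference: str, informant: str) -> List[str]:
    """Label each base pair of the reference structure as seen in the informant."""
    labels = []
    for i, j in base_pairs(structure):
        ref_pair = (reference[i], reference[j])
        inf_pair = (informant[i], informant[j])
        changes = sum(a != b for a, b in zip(ref_pair, inf_pair))
        if inf_pair not in COMPATIBLE:
            labels.append("disruptive")  # pairing lost (gaps also land here)
        elif changes == 0:
            labels.append("conserved")
        elif changes == 2:
            labels.append("silent double")  # compensatory change, pairing kept
        else:
            labels.append("silent single")  # one side changed, still pairs (e.g. G•U)
    return labels


if __name__ == "__main__":
    structure, reference = "((((....))))", "GGACUUCGGUCC"
    for informant in ("GGGCUUCGGCCC", "GGAUUUCGGUCC", "GGACUUCGGACC"):
        print(informant, classify_pairs(structure, reference, informant))
```

In the toy run, the first informant turns the A-U pair into G-C (a silent double substitution), the second turns C-G into U-G (a silent G•U change), and the third turns A-U into A-A, which breaks the pair. A conserved-structure signature favours alignments dominated by the first two kinds of change.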

Revisiting the protein-coding gene catalogue

The annotation of protein-coding genes remains difficult in meta-
zoan genomes owing to short exons and complex gene structures

G S A A T I Y Y E S M P A S A S T G V L S L T Ta
AACCGCCTTCCCCCTGGACTCGTCCCACTCTCTGCTCCTTCTCCACCAGCGATGCAAACTTTGCGAATCACT Characteristic protein-preserving events
AGCCGCCTTCCCCCCGGACTCGTCCCACTACCTGCTCCTTCTCCACCAGCGATGCAAACTTTGCGAATCACT
AGCCGCCTTCCCCCCGGACTCGCCCCACTACCTGCTCCTTCTCCACCAGCGATGCAAACTTTGCGAATCACT

AGCCGCCTTCCCTCTG———— 14 CATGCTCCTTCTCCTCCAGCGATGCAAACTTTGCGAATCACT
AGCCGCCTTCCCCCTGGACTCGTCCCACTACCTGCTCCTGCTCCTCCAACGATGCAAACTTTGCGAATCACT
GGCCATCCTCCTCCTGGCAGC-CCCAACTGCCTCCGTTTTGTCTGTGTGTGTTGGTAACTTTGCAAATCACT
GTTCACGTCCTTTGTGGCCAGTTCTCCTCTCCTTTTCTCTCTCGGTGCGTGTTGGAAACTTTGCAAATCACT
GTTCACGTCCTTTGTGGCCAGTTCTCCTCTCCTTTTCTCTCTCGGTGCGTGTTGGAAACTTTGCAAATCACT

CTT——ACTCGCCAGCTTTGTGGCCAG—3 TAGTTCTCTGCT 7 GTGTGTTGGAAAACTTGCAAATCACT
AGCTTACGTCCAAGTGAGCGTGTGCGTATACCTGTTGTGTTGGCTTGCCTGTTGAAAATTTTTCCCAACACT
AGCTAACGTCCAAGTGTGCATGTGCATGTACGTGTGGTGTTTGTATGTCTGTTGAAAATTTTGCCCAACACT
AGCTAACGTTCAGCTGTG————— 17 TGTGTGTGTGTGTTCGTTGAAAATTTTGCCAAACACT

D.mel GGAAGTGCTGCCACAATCTACTACGAATCTATGCCAGCCTCCGCCTCCACAGGCGTTCTATCATTGACTACG D.sim
GGAAGTGCTGCCACAATCTACTACGAATCTATGCCAGCCTCCGCCTCCACAGGCGTTCTATCATTGACTACG

D.sec GGAAGTGCTGCCACAATCTACTACGAATCTATGCCAGCCTCCGCCTCCACAGGCGTTCTATCATTGACTACT
D.yak GGAAGTGCTGCCACAATCTACTAC TCTATGCCAGCCTCCGCCTCCACGGGCGTTCTATCATTGACTACG
D.ere GGAAGTGCTGCCACAATCTACTAC TCTATGCCAGCCTCCGCCTCC GGCGTTCTATCATTGACTACG
D. GAG

GAA

GAG ACA
ACTana GGTAGTGCAGCTACGATCTACTAC TCAATGCCGGCATCCTCGTCC GGCGTACTCTCGTTGACCACC

D.pse GGCAGCTCTGCCACAATCTACTACGAATCGATGCCCGCCTCGGCCTCCACGGGCGTCCTCTCGCTGACCACA
D.per GGCAGCTCTGCCACAATCTACTACGAATCGATGCCCGCCTCGGCCTCCACGGGCGTCCTCTCGCTGACCACA
D.wil GGTGGAGCTGCCACCATTTATTATGAA GGAGTCCTCTCGCTGACCACC
D.

TCT
TCC

ATG
ATGCCA

CCGG
GCA
C-

TCT
—-6

GCC
-C

TCA
TCA

ACT
ACG

moj GGCAGCTCAG–3 -CCATCTACTATGAA GGCGTTCTATCGCTGACCACC
D.vir GGCAGCTCGG—CC GC——CTCGACGGGGGTGCTCTCGCTGACCACC
D.

ATC
ATC

TAC
TATTAC

TAT
GAG
GAG

TCG
TCC

ATG
ATG

CCG
CCG

gri GGCAGCTCGG—CC GC——GTCGACGGGCGTCCTCTCACTGACGACG

Codon substitution typical of protein-coding regions
L Frame-preserving gap (length L a multiple of 3)

Characteristic non-coding region events
Triplet substitution typical of non-coding regions

Nonsense mutation introducing a stop codon

Frame-shifting gap (length L not a multiple of 3)
** * * * * * * ** ** ** ** ** ***** ** ** * ** ** ** ** ** ** **** ** * *

Protein-coding exon Non-coding region

b
AA A C U

C G

G
U G

29 38

C G 1 0 1 0 2 92 8 3 74 7 5 7 6
A U

U
G

A
C

G C
G C
U A
U G

20 U A 47
G C

A A
A U
C G
U AC
U G

A

G
C G

C
10 A U 57

G G C
G C
U A
U A
U A
G
A U

U

1 G
C G

C 67
5′ 3′

D.mel GCGAUUUGGAGCUCUCAAGUUUGGGUCACUUAAAC-GGGUGACCCAGACAUGAAGGCUGCCAAAUUGC
D.sim GCGAUUUGGAGCUCUCAAGUUUGGGUCACUUAAAG-GGGUGACCCAGACAUGAAAGCUGCCAAAUCGC
D.sec GCGAUUUGGAGCUCUCAAGUUUGGGUCACUUAAAG-GGGUGACCCAGACAUGAAGGCUGCCAAAUUGC
D.yak GCGAUUUGGAGCCCUUAAGUUUGGGUCAUUUAAAG-GGGUGACCCAGACAUGAGGGCUGCCAAGUUGC
D.ere GCGAUUUGGAGCCAUUAAGUUUGGGUCAUUUAAAG-GGGUGACCCAGACAUGAGGGCUGCCAAGUUGC
D.ana GCGAUUUGGAGCCCUCAAGUUUGGGUCACUUUAAC-GCGUGUCCCAGACAUGAUGGCUGCCAAAUUGC
D.pse GCGAUUUGGAGCCCUCAAGUUUGGGUCACUUAAAU-GGGUGACCCAGACAUGAUGGCUACUAGAUC–
D.per GCGAUUUGGAGCCCUCAAGUUUGGGUCACUUAAAU-GGGUGACCCAGACAUGAUGGCUACUAGAUC–
D.wil GCAAUUUCGAACUAUUAAGUUUGGAUCACUUAAAGCACGUGAUCCAGACAUAAUAGAUCUGAGAUUUU
D.moj AACAUUUGG-CCUGUCAAGUCUGCGCCAUUUAAAU-GCGUGGCCCAGACAUGACAAGCUACAAAUGUU
D.vir AGCAUUUGG-UUUGCCAAGUCUGUGGCAUUUGAAU-GUAUGUCGCAGACAUGACAAUC-GCAAAUGCU
D.gri AGCAUUUGG-UUUGUUAAGUCUGCGUCAUUUCAAU-GUGUGCCGCAGACAUGACAAAUUCCAAAUGUU

((((((((.((((.(((.(((((((((((…… ..))))))))))).))).))))..))))))))
abcdefgh iklm nop qrstuvwxyzA Azyxwvutsrq pon mlki hgfedcba

RNA

U U C c U A miRNAU U
U A

* ***** * ** *

*

*

L

**

No change

Conserved paired nucleotide
Conserved unpaired nucleotide

Silent changes characteristic of RNA evolution
Silent G•U substitution
Silent substitution in unpaired base Silent
base-preserving double substitution

Changes disruptive of RNA structures
Disruptive double substitution
Disruptive single substitution

Disruptive insertion or deletion

miRNA*
A

G C

U A
C G
A U

G G
U A

U G

U

G U
G C

m
iR

N
A

D.mel GGGGATGTGGGGAAGGATGCTCTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACA-ACGAAGA—-GAGCGGAGCT
D.sim GGGGATGTGGGGAAGGATGCTCTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACA-ACGAAGA—-GAGCGCAGCT
D.sec GGGGATGTGGGGAAGGATGCTCTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACA-ACGAAGA—-GAGCGGAGCT
D.yak GGGGATGTGGGGAAGGATGCTCTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACT-ACGAAGA—-GAG—–CT
D.ere GGAGAAGTGGGGAAGGATGCTCTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACT-ACGAAGA—-GAG—–CT
D.ana GAAAAGG—-ATTTGGGGTCTTTTTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTGT-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACA-CCAAAGAGTCGGATAGTGGAG
D.pse TCTGATCCGGCAGCGTTTGCTCTTCTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTTG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAACACA-ACGAACCGAAAGAGCAGAGCA

U A
U U
U U

D.per TCTGATCCGGCAGCGTTTGCTCTTCTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTTG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAACACA-ACGAACCGAAAGAGCAGAGCA
D.wil GAGTCCTTTCTATGTGGCAGCGTCTCTTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTTTGTTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAGCACA-ACAAGAG–CGCAGCGGAGAG

A U
D.moj ATTTCTTTT—–TTTTGCTCTTCTCTGACTCTATTTTGTCGGCGAACATGGATCTAGTGCACGGTTG-TTCATGATTAAGTTCGTGACTAGATTTCATGCTCGTCTATTAAGTTGGGTCAATACACACA-GCGAAAACATGGCCAAGG U

C G D.vir
U A

[Figure 2 graphics not recoverable from the extracted text. Surviving labels: sequence alignments across the 12 species (D.mel, D.sim, D.sec, D.yak, D.ere, D.ana, D.pse, D.per, D.wil, D.moj, D.vir, D.gri) with base-paired RNA and miRNA secondary structures (5′ and 3′ arms) and asterisk conservation tracks; Mef2 regulatory motif (YTAWWWWTAR) instances at BLS = 0.25 and BLS = 0.83; plot of number of conserved instances and confidence level against BLS (% of tree) for known and random motifs.]

with abundant alternative splicing. Comparative information has
improved computational gene predictors5, but their accuracy still
falls far short of well-studied gene catalogues such as the FlyBase
annotation, which combines computational gene prediction37,
high-throughput experimental data38–42 and extensive manual
curation23. Recognizing this, we set out not only to produce an
independent computational annotation of protein-coding genes in
the fly genome, but also to assess and refine its already high-quality
annotations43.

Our analyses of D. melanogaster coding genes are based on two
independent evolutionary signatures unique to protein-coding
regions (Fig. 2a): (1) reading frame conservation (RFC)3, which
observes the tendency of nucleotide insertions and deletions to
preserve the codon reading frame; and (2) codon substitution frequencies

Figure 2 | Distinct evolutionary signatures for diverse classes of functional
elements. a, Protein-coding genes tolerate mutations that preserve the
amino-acid translation, leading to abundant conservative codon
substitutions (green). Insertions and deletions are largely constrained to be a
multiple of three (grey). In contrast, non-coding regions show abundant
non-conservative triplet substitutions (red), nonsense mutations (blue)
and frame-shifting insertions and deletions (orange). b, RNA genes
tolerate mutations that preserve the secondary structure (for example,
single substitutions involving G·U base pairs and compensatory changes)
and exclude structure-disrupting mutations. Matching parentheses and
matching letters of the alphabet indicate paired bases. c, MicroRNA genes, in
contrast, generally do not show changes in stem regions, but tolerate
substitutions in loop regions and flanking unpaired regions, leading to a
distinctive conservation profile. Asterisks denote the number of informant
species matching the melanogaster sequence at each position. d, Regulatory
motifs tolerate local movement and nucleotide substitutions consistent with
their degeneracy patterns, and show increased conservation across the
phylogenetic tree, measured as the branch length score (BLS;
Supplementary Methods 5a). e, Increasing BLS thresholds select for
instances of known motifs (black) at increasing confidence (red), as the
number of conserved instances of control motifs (grey) drops significantly
faster.
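The branch length score is defined in Supplementary Methods 5a, which is not part of this excerpt. As a rough illustration only, the sketch below assumes a simplified definition: the total length of the branches in the smallest subtree connecting the species in which a motif instance is conserved, divided by the total branch length of the tree. The four-species tree `TOY_TREE` and its branch lengths are hypothetical, not the 12-species Drosophila phylogeny.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy tree ((A:1,B:1):2,(C:1,D:1):2), stored as child -> (parent, branch length).
TOY_TREE: Dict[str, Tuple[str, float]] = {
    "A": ("AB", 1.0), "B": ("AB", 1.0), "AB": ("root", 2.0),
    "C": ("CD", 1.0), "D": ("CD", 1.0), "CD": ("root", 2.0),
}


def leaves_below(tree: Dict[str, Tuple[str, float]]) -> Dict[str, Set[str]]:
    """Map every non-root node to the set of leaf species in its subtree."""
    children: Dict[str, list] = {}
    for child, (parent, _) in tree.items():
        children.setdefault(parent, []).append(child)
    cache: Dict[str, Set[str]] = {}

    def collect(node: str) -> Set[str]:
        if node not in cache:
            kids = children.get(node, [])
            # A node without children is a leaf (a species).
            cache[node] = {node} if not kids else set().union(*(collect(k) for k in kids))
        return cache[node]

    for node in tree:
        collect(node)
    return cache


def branch_length_score(tree: Dict[str, Tuple[str, float]], conserved: Set[str]) -> float:
    """Fraction of total branch length in the minimal subtree linking `conserved` species."""
    total = sum(length for _, length in tree.values())
    below = leaves_below(tree)
    spanned = 0.0
    for node, (_, length) in tree.items():
        k = len(below[node] & conserved)
        # The branch above `node` lies on a path between two conserved species
        # exactly when some, but not all, of them sit below it.
        if 0 < k < len(conserved):
            spanned += length
    return spanned / total


print(branch_length_score(TOY_TREE, {"A", "B"}))  # 0.25: sister species only
print(branch_length_score(TOY_TREE, {"A", "C"}))  # 0.75: spans both clades
```

Raising a BLS threshold keeps only instances conserved over a larger share of the tree, which is the filtering that panel e plots.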

©2007 Nature Publishing Group


[Figure 3 graphics not recoverable from the extracted text. Surviving labels: a, Chr 2L, 7,183K–7,188K: CG4495 and CG4496 with predicted exons, FlyBase genes and the protein-coding evolutionary signal; one region shows low conservation but a high protein-coding signal, another high conservation but no protein-coding signal; a new exon is marked (see panel c). b, FlyBase curation outcomes: of 414 rejected genes, 222 were removed from FlyBase protein-coding genes, 73 were flagged as uncertain and 119 received no action (71% and 29%); of 928 predicted new exons, 562 modified an existing annotation, 192 defined a new gene and 174 received no action (81% and 19%). c, CG4495: conservation, known splice form LD46238, inverse PCR primers and cDNA validation IP17639. d, CG8092: protein-coding evolution continuing past a stop codon, then non-coding evolution after a second stop. e, CG14047: a +1 frameshift between protein-coding evolution in frame 1 and in frame 2. Legend: conservative substitution, disruptive substitution, frame-preserving gap (multiple of 3), frame-shifting gap (not a multiple of 3), exon boundaries, stop codon.]

ARTICLES NATURE | Vol 450 | 8 November 2007

Figure 3 | Revisiting the protein-coding gene catalogue and revealing
unusual gene structures. a, Protein-coding evolutionary signatures
correlate with annotated protein-coding exons more precisely than the
overall conservation level (phastCons track33), for example excluding highly
conserved yet non-coding elements. Asterisk denotes new predicted exon,
which we validate with cDNA sequencing (see panel c). The height of the
black tracks indicates protein-coding potential according to evolutionary
signatures (top) and overall sequence conservation (bottom). Blue and
green boxes indicate predicted coding exons (top) and the current FlyBase
an


Evolutionary Constraint and Adaptation in the Metabolic Network of Drosophila

Anthony J. Greenberg, Sarah R. Stockwell, and Andrew G. Clark
Department of Molecular Biology and Genetics, Cornell University

Organisms must carefully control their metabolism in order to survive. On the other hand, enzymes must adapt in
response to evolutionary pressures on the pathways in which they are embedded. Taking advantage of the newly available
whole-genome sequences of 12 Drosophila species, we examined how protein function and metabolic network
architecture influence rates of enzyme evolution. We found that despite high overall constraint, there were significant
differences in rates of amino acid substitution among functional classes of enzymes. This heterogeneity arises because
proteins involved in the metabolism of foreign compounds evolve relatively rapidly, whereas enzymes that act in “core”
metabolism exhibit much slower rates of amino acid replacement, suggesting strong selective constraint. Network
architecture also influences enzymes’ rates of amino acid replacement. In particular, enzymes that share metabolites with
many other enzymes are relatively constrained, although apparently not because they are more likely to be essential. Our
analyses suggest that this pattern is driven by strong constraint of enzymes acting at branch points in metabolic pathways.
We conclude that metabolic network architecture and enzyme function separately affect enzyme evolution rates.

Introduction

Metabolism lies at the core of organismal survival. It is required for fundamental and highly conserved processes, such as energy generation. Yet, changes in metabolic function accompany adaptation to novel environments (e.g., Berenbaum et al. 1996; Jones 2005). The genome sequences of 12 Drosophila species provide an ideal opportunity to tease apart the opposing forces of conservation and adaptability in metabolic evolution.

Enzymes do not act in isolation. They are organized into a network, where enzymes are connected by the compounds they metabolize (Jeong et al. 2000; Wagner and Fell 2001; Stelling et al. 2002; Tanaka 2005). This network consists of modules, which loosely correspond to the traditionally recognized metabolic pathways (Schuster et al. 2000, 2002; Ravasz et al. 2002; Holme et al. 2003). An enzyme’s position in the network helps determine how dramatically a change in its activity will affect flux (rate of metabolite production) through pathways (Kacser and Burns 1973; Stephanopoulos et al. 1998; Stelling et al. 2002). This influence is measured by control coefficients, which are high for enzymes with strong influence over flux (Kacser and Burns 1973). Metabolic control theory predicts that enzymes at branch points have higher control coefficients than those in linear pathways and that enzymes catalyzing irreversible reactions will have more influence on flux than reversible reaction enzymes (Kacser and Burns 1973; Heinrich and Rapoport 1974).
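Flux control coefficients are conventionally written as scaled sensitivities, C_i = ∂ln J / ∂ln e_i: the relative change in pathway flux J per relative change in the activity e_i of enzyme i. The sketch below estimates them numerically for a deliberately simple, hypothetical linear pathway in which each enzyme behaves like a resistance 1/e_i in series, so that J = S / Σ(1/e_i). This toy rate law is an assumption for illustration, not a model used by the authors. In it the coefficients sum to 1, as the summation theorem of metabolic control analysis requires.

```python
import math
from typing import List


def toy_flux(enzyme_levels: List[float], substrate: float = 1.0) -> float:
    """Hypothetical linear pathway: enzymes act like resistances 1/e_i in series,
    so steady-state flux J = substrate / sum(1/e_i)."""
    return substrate / sum(1.0 / e for e in enzyme_levels)


def control_coefficients(enzyme_levels: List[float], rel_step: float = 1e-6) -> List[float]:
    """Estimate C_i = d ln J / d ln e_i by nudging one enzyme level at a time."""
    log_base = math.log(toy_flux(enzyme_levels))
    coeffs = []
    for i, level in enumerate(enzyme_levels):
        perturbed = list(enzyme_levels)
        perturbed[i] = level * (1.0 + rel_step)
        coeffs.append((math.log(toy_flux(perturbed)) - log_base) / math.log1p(rel_step))
    return coeffs


levels = [1.0, 10.0, 10.0]  # one low-activity enzyme, two abundant ones
coeffs = control_coefficients(levels)
print([round(c, 3) for c in coeffs])  # [0.833, 0.083, 0.083]
print(round(sum(coeffs), 4))          # 1.0 (summation theorem)
```

In this toy the least active enzyme carries most of the control, so changes in the two abundant enzymes have comparatively small effects on flux, which is the situation the next paragraph describes as nearly neutral.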

Changes in the activity of enzymes with low control coefficients should be nearly neutral, especially in pathways that perform nonessential functions. Thus, if purifying selection is the dominant force, genes coding for such enzymes should tolerate more amino acid-changing mutations over the course of evolution than genes coding for enzymes with high control coefficients (Wilson et al. 1977; Rausher et al. 1999). Consistent with this prediction, highly connected enzymes evolve slowly in yeast (Vitkup et al. 2006), although perhaps not in bacteria (Hahn et al. 2004). The relative importance of positive and purifying selection is still unclear, however. Recent estimates for Drosophila suggest that at least a third, and possibly most, amino acid differences between closely related species are the result of the action of positive selection (Smith and Eyre-Walker 2002; Sawyer et al. 2003; Shapiro et al. 2007). Any analysis of the influence of metabolic network architecture on enzyme evolution patterns must therefore discriminate between genes that show signs of adaptive evolution and those that do not.

Key words: metabolic network, molecular evolution, codon substitution.

E-mail: ajg67@cornell.edu.

Mol. Biol. Evol. 25(12):2537–2546. 2008
doi:10.1093/molbev/msn205
Advance Access publication September 17, 2008

© The Author 2008. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Other aspects of network architecture may also modulate an enzyme’s influence over metabolic function. The number of connections an enzyme has to other enzymes via shared metabolites (“degree”), the number of pathways it participates in, the number of reactions it catalyzes, and the extent to which it serves as a conduit of information between modules/pathways (“betweenness”) may all affect how sensitive an organism’s physiology is to changes in a given enzyme’s activity.

We set out to test comprehensively the effects of enzyme function and metabolic network architecture on enzyme evolution using the newly available genomic sequence from 12 Drosophila species (Drosophila 12 Genomes Consortium 2007). We found that enzymes involved in metabolizing xenobiotic (foreign) compounds evolve significantly faster than average at the amino acid level. Moreover, almost all enzymes involved in this process also participate in other pathways and significantly affect median evolution rates for those pathways. Of the network architecture parameters, only enzyme degree is significantly correlated with rates of protein evolution: highly connected enzymes are relatively constrained, regardless of whether the connections are between or within pathways. Such enzymes are not more likely to be essential, however, and have average rates of adaptive evolution. We conclude that metabolic network architecture has a measurable impact on enzyme evolution rates that is independent of the influence of enzyme function.

Methods

An expanded version of this section is available as Supplementary Material online.

Downloaded from https://academic.oup.com/mbe/article/25/12/2537/1110609 by guest on 24 December 2021

2538 Greenberg et al.

Genome Sequence and Evolutionary Rate Estimation

We used maximum likelihood estimates of rates of amino acid (dN) and silent (dS) substitutions as well as the rate of amino acid change corrected for silent site divergence (ω = dN/dS) for each gene. These estimates were calculated for Drosophila melanogaster and the five species most closely related to it (Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, and Drosophila ananassae) by Larracuente et al. (2008), based on the genome assemblies and gene models provided by the 12 genomes sequencing group (Drosophila 12 Genomes Consortium 2007). In addition to parameter estimation, Larracuente et al. (2008) performed tests of positive selection (model M8 against model M7). We classified genes as evolving under positive selection if they satisfied the q value (Storey and Tibshirani 2003) cutoff of 0.1 (i.e., the expected fraction of true positives is 90%). For statistical tests, we coded positively selected genes as “1” and the rest as “0.” To estimate rates of gene duplication, we took advantage of the fuzzy reciprocal BLAST of all D. melanogaster genes against all the remaining 11 genomes (the five mentioned above plus Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, and Drosophila grimshawi; Drosophila 12 Genomes Consortium 2007). Some of the genes had homologous clusters within species that could not be assigned an unambiguous ortholog in other species (for details, see Drosophila 12 Genomes Consortium 2007). If an enzyme was coded by any such genes, it was marked as having evidence of duplication and was coded as “1” for statistical tests. Otherwise, it was coded as “0.” Unless otherwise indicated, we considered a gene to be duplicated if there was evidence of multiple copies in any of the 12 species.
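The 1/0 coding by q value can be illustrated with a small sketch. It assumes Storey's estimate of the proportion of true null hypotheses, π0, is set to 1. In that case q values coincide with Benjamini–Hochberg adjusted P values; Storey and Tibshirani's method normally estimates π0 from the data, giving q values that are smaller or equal. The input P values and the inclusive `<=` comparison are illustrative assumptions, not data from Larracuente et al. (2008).

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (= Storey q values when pi0 = 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the q values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def code_positive_selection(pvalues: List[float], cutoff: float = 0.1) -> List[int]:
    """Code a gene 1 if its q value passes the cutoff, else 0."""
    return [1 if q <= cutoff else 0 for q in bh_qvalues(pvalues)]


# Hypothetical M7-vs-M8 likelihood-ratio-test P values for four genes.
example_p = [0.01, 0.02, 0.03, 0.5]
print(code_positive_selection(example_p))  # [1, 1, 1, 0]
```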

Metabolic Network Data

We downloaded data on the D. melanogaster metabolic
network and pathway assignments from the KEGG database
(Kanehisa and Goto 2000). We extracted information on
D. melanogaster genes coding for enzymes and related it
to gene information from FlyBase (http://flybase.net) and
the data from the 12 genomes project via FBgn numbers.
We were able to calculate evolutionary rates for 447 genes
(between 23 and 128 genes in each “pathway group”) that
coded for enzymes found in the KEGG database.

We assume the network is undirected. Although a majority (73%) of the reactions in our data set are effectively irreversible, mechanisms such as feedback regulation let the information travel both ways (Wagner and Fell 2001).

We also collected information on the phenotypes of mutations from FlyBase. Details of the classification of genes based on alleles listed in FlyBase are presented in Larracuente et al. (2008) and online. For partial correlation analysis, we only considered the essential (coded “1”) and viable (coded “0”) genes.

Statistical Tests

All statistical tests were performed using R (R Development Core Team 2006). To estimate the effect of each network parameter on rates of protein evolution, we used partial correlations, which are defined as correlations between pairs of variables calculated conditional on all other parameters (Whittaker 1990). To estimate the partial correlations, we calculated the pseudoinverse of correlation matrices, as implemented in the R package corpcor (Schäfer et al. 2006). Distributions of all the variables are highly irregular. Therefore, we used nonparametric and permutation tests to estimate P values (Davison and Hinkley 1997). For permutations, we used the boot package (Canty and Ripley 2006). All P values were two-tailed, except for the analyses of variance (ANOVAs) and Kruskal–Wallis tests, where they were right-tailed. We corrected for multiple tests by controlling the false discovery rate (FDR; Benjamini and Hochberg 1995) at 5% unless noted otherwise.
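Partial correlations can be read off the inverse of the correlation matrix: if P = R⁻¹, the partial correlation between variables i and j given all the others is −P_ij / √(P_ii P_jj). The sketch below applies this to rank-transformed data, giving the Spearman-type partial correlations reported in the Results. It is written in Python rather than R and uses an ordinary Gauss–Jordan inverse, which equals the pseudoinverse only when the correlation matrix has full rank. It is not corpcor's implementation, and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank (as in Spearman's rho)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting; assumes a full-rank matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def spearman_partial_correlations(columns: List[Sequence[float]]) -> List[List[float]]:
    """Partial rank correlation of every pair of variables given all the others."""
    ranked = [ranks(c) for c in columns]
    k = len(ranked)
    corr = [[pearson(ranked[i], ranked[j]) for j in range(k)] for i in range(k)]
    prec = invert(corr)  # equals the pseudoinverse when corr has full rank
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(k)] for i in range(k)]


# Hypothetical per-enzyme values: omega, degree, number of pathways.
omega = [0.02, 0.05, 0.01, 0.08, 0.04, 0.03, 0.07, 0.06]
degree = [12, 3, 15, 2, 6, 9, 4, 5]
n_pathways = [2, 1, 3, 1, 2, 2, 1, 3]
pc = spearman_partial_correlations([omega, degree, n_pathways])
print(round(pc[0][1], 3))  # partial rho between omega and degree, given n_pathways
```

With only two variables there is nothing to condition on, and the formula reduces to the ordinary rank correlation.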

A number of enzymes belong to several pathway groups and thus statistics calculated for each category are not independent. To calculate the P values in such cases (e.g., for the deviation of median ω for each functional group from the data set-wide median), we modified our permutation test as follows. We randomly permuted the ω values among genes. We then recalculated the deviation for each group, keeping the relationship between genes and pathway groups constant for every permutation. Similarly, some enzymes are encoded by multiple genes either because they are composed of multiple subunits or due to gene duplication. Because some variables (such as ω) are assigned to genes, whereas others (such as degree) pertain to enzymes, we had to modify the standard permutation tests to account for this (for details, see supplementary methods, Supplementary Material online). We did 9,999 permutations for all tests.
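A minimal sketch of the group-preserving permutation scheme described above, assuming hypothetical inputs: a dict mapping gene IDs to ω and a dict mapping each pathway group to its member genes, where a gene may belong to several groups. The ω values are shuffled across genes while group membership stays fixed, so the overlap between groups is preserved in every permutation. The P value uses the common (count + 1)/(N + 1) convention; the paper's gene-versus-enzyme adjustments are described in its supplementary methods and are not reproduced here.

```python
import random
import statistics
from typing import Dict, List


def group_deviation_pvalues(omega: Dict[str, float], groups: Dict[str, List[str]],
                            n_perm: int = 9999, seed: int = 1) -> Dict[str, float]:
    """Two-tailed permutation P values for each group's median-omega deviation."""
    genes = list(omega)

    def deviations(values: Dict[str, float]) -> Dict[str, float]:
        overall = statistics.median(values.values())
        return {name: statistics.median(values[g] for g in members) - overall
                for name, members in groups.items()}

    observed = deviations(omega)
    exceed = {name: 0 for name in groups}
    rng = random.Random(seed)
    shuffled = [omega[g] for g in genes]
    for _ in range(n_perm):
        # Reassign omega values to genes; gene-to-group membership never changes.
        rng.shuffle(shuffled)
        permuted = dict(zip(genes, shuffled))
        for name, dev in deviations(permuted).items():
            if abs(dev) >= abs(observed[name]):
                exceed[name] += 1
    return {name: (exceed[name] + 1) / (n_perm + 1) for name in groups}


# Hypothetical data: 40 genes and two groups (groups may also share genes).
omega = {f"g{i}": float(i) for i in range(40)}
groups = {"fast": [f"g{i}" for i in range(35, 40)],
          "typical": [f"g{i}" for i in range(17, 22)]}
p = group_deviation_pvalues(omega, groups)
print(p["fast"] < 0.05, p["typical"])  # True 1.0
```

Because a gene shared by two groups keeps both memberships under every shuffle, the dependence between group statistics is the same in the observed and permuted data.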

Results
Genes Coding for Enzymes Evolve Slowly

Metabolic function is essential for survival, and many metabolic processes are highly conserved from bacteria to mammals. We thus wanted to test whether genes coding for enzymes are more constrained than other genes. To accomplish this, we compared the distributions of the number of amino acid changes per silent substitution (ω = dN/dS, estimated for the whole phylogeny of six species of the D. melanogaster family, see Methods; Larracuente et al. 2008) for the two groups of genes. Low values of ω indicate that few amino acid-altering mutations have been fixed during evolution, compared with the number of silent (non–amino acid changing and thus nearly neutral) mutations fixed in the same gene. A gene with a low value of ω has been constrained from changing its amino acid sequence and thus codes for a protein that is not free to evolve.

Enzymes are indeed relatively constrained: the median ω of enzymes is 0.045, whereas it is 0.066 for nonenzymes (Wilcoxon test P = 5.7 × 10⁻²⁴). However, genes involved in metabolic function are also slightly more likely to be essential (17% vs. 12% for the rest; Fisher’s exact test P = 0.0157). We wanted to know whether enzymes are constrained simply because they are more essential. We divided genes into four classes: “essential,” “viable,” “no information,” and “no alleles” (for details, see Methods and Larracuente et al. [2008]). We then compared enzyme-coding to nonenzyme-coding genes within each class. We found that metabolic genes were still more constrained than other genes (fig. 1). The “enzyme” effect was highly significant (two-way ANOVA: F = 67.0, degrees of freedom [df] = 1, P = 3.1 × 10⁻¹⁶; permutation P ≤ 0.0001), whereas no interaction between the “enzyme” and “essentiality” terms was detectable (F = 1.3, df = 3, P = 0.27). Furthermore, the difference between enzymes and nonenzymes within each class was highly significant (Wilcoxon test P values ranging from 5.4 × 10⁻⁹ to 7.5 × 10⁻⁵).

Evolution in the Metabolic Network 2539

FIG. 1.—Metabolic genes are more constrained than average.

Smaller ω can result from either a decrease in amino acid changes (dN) or an increase in dS. In this case, it is due to a decrease in dN: median dN for enzymes is 0.077 versus 0.114 for nonenzymes (Wilcoxon test P = 3.5 × 10⁻²⁰), whereas the values for dS were 1.796 and 1.777 (Wilcoxon test P = 0.36).

The decision to classify a gene as coding for a particular enzyme is at least partially based on its homology to enzyme-coding genes in other species. Apparent high constraint of enzymes could thus be due to ascertainment bias. To control for this, we repeated the analysis with the subset of Drosophila genes that have a human homolog (BLAST E value ≤ 10⁻¹⁰), thereby eliminating from the nonenzyme group the fast-evolving genes that could inflate the group’s score. We still found that enzymes are more constrained than nonenzymes. Median ω for enzyme-encoding genes was 0.043, in contrast with 0.052 for the rest of the genes (Wilcoxon test P = 9.9 × 10⁻⁵; two-way ANOVA enzyme effect F = 15.4, df = 1, P = 9.0 × 10⁻⁵; permutation P = 0.0005). We conclude that ascertainment bias is unlikely to fully explain our observations.

Xenobiotic-Detoxifying Enzymes Evolve Relatively Quickly

Despite overall high levels of constraint, enzyme-encoding genes vary considerably in their ability to accommodate amino acid substitutions (ω ranges from 0.0001 to 0.2833). To test if enzyme function affects evolutionary rates, we grouped enzymes into 11 functional categories (“pathway groups”) according to the KEGG classification (Kanehisa and Goto 2000). Each category encompasses a number of pathways with similar functions. Because some pathways contain few enzymes and assignment of enzymes to functionally related pathways is prone to error, grouping pathways should increase statistical power and reduce the impact of annotation errors.

We compared median rates of amino acid change of genes in different pathway groups. As expected (fig. 2), median ω for enzymes involved in metabolism of xenobiotic compounds is the highest of all the groups. This is true for the phylogeny-wide estimate of dN/dS as well as for individual species (see supplementary fig. S1, Supplementary Material online).

FIG. 2.—Rates of amino acid substitution for metabolic pathway groups. Pathway groups with nominally significant deviations of median ω from the overall median (dashed line) are colored in gray. Pathway groups are as follows: 1, protein synthesis; 2, amino acid metabolism; 3, glycan biosynthesis; 4, nucleotide metabolism; 5, carbohydrate metabolism; 6, metabolism of cofactors and vitamins; 7, energy metabolism; 8, metabolism of other amino acids; 9, lipid metabolism; 10, secondary metabolites; and 11, metabolism of xenobiotics.

Because about one-third of all enzymes belong to more than one category, the distributions of ω for each group are not independent, and standard statistical tests, such as Kruskal–Wallis, are not applicable. We therefore developed a permutation test that accounts for such nonindependence (see Methods). Using this approach, we determined that amino acid substitution rate heterogeneity among pathway groups is indeed significant (Kruskal–Wallis χ² = 22.3, permutation P = 0.0078). This is at least partly due to relatively fast evolution of xenobiotic detoxification genes (median ω = 0.05 for the xenobiotic group vs. ω = 0.04 overall, two-tailed permutation P = 0.0110), and these results are robust to moderate levels of random annotation error (see Methods; supplementary fig. S2, Supplementary Material online). This prompted us to further investigate the effect of this group.

Almost all the proteins in the fast-evolving xenobiotics pathway group act in other processes. How strongly are other groups’ median amino acid substitution rates influenced by genes that detoxify foreign compounds? We found that there is a strong correlation between the median ω for a pathway group and the fraction of the category’s genes that it shares with the xenobiotics group (Spearman’s ρ = 0.76, P = 0.0108). When we excluded the xenobiotics group from consideration but retained its genes in other pathways, we still saw significant heterogeneity in ω among pathway groups (Kruskal–Wallis χ² = 17.1654, permutation P = 0.0045). In contrast, when we eliminated all the genes that belong to this category, we found no significant differences among groups (Kruskal–Wallis χ² = 11.0, permutation P = 0.2167). Heterogeneity in amino acid substitution rates among pathway groups is thus entirely due to the genes that detoxify foreign compounds.

These observations suggest that interdependence among pathways may hinder optimization of metabolic function achievable through amino acid change. One way to alleviate this problem would be through gene duplication (Lynch and Force 2000). We see some evidence that this occurs. No pathway group significantly deviates from the data set average in the fraction of its enzymes coded for by genes duplicated in at least one of the 12 Drosophila genomes we analyzed (for details, see Methods; fig. 3B). Nevertheless, there is a significant positive correlation between the fraction of enzymes in a pathway group that belong to more than one category and the fraction encoded by duplicated genes (Spearman’s ρ = 0.673, P = 0.0268). This result suggests that although some paralogs may act only in one pathway, all members of a cluster are annotated as performing each function.

Network Parameters and Constraint

The results presented so far suggest that enzyme function affects how quickly enzyme-coding genes evolve. We wished to know if enzyme evolution is also influenced by metabolic network architecture. To investigate this question comprehensively, we examined several characteristics of the metabolic network topology. In this model of the network, each node represents an enzyme, and the nodes are connected by the metabolites with which the enzymes interact (fig. 4B). This “enzyme-centered” representation ensures that each enzyme appears only once in the network, although metabolites may appear multiple times. We calculated partial correlation coefficients to estimate the associations between each pair of network parameters, while controlling for all others. Although this approach has some shortcomings (Drummond et al. 2006), it is the only method that allows us to determine relationships among all the variables (for discussion, see supplementary methods, Supplementary Material online). We used a permutation test to estimate all P values (see supplementary methods, Supplementary Material online) and corrected for multiple tests by controlling the FDR (Benjamini and Hochberg 1995) at 5%.

FIG. 3.—Rates of nonindependence and gene duplication by pathway groups. Significant deviations (after FDR correction) from data set-wide fractions (dashed lines) are marked with gray. (A) Fraction of enzymes that act in multiple pathway groups. (B) Fraction of enzymes that are encoded by duplicated genes. Duplicated genes are those that were not resolved by the fuzzy reciprocal BLAST into orthologous sets.

Reversibility of Reactions

Metabolic control theory (Kacser and Burns 1973; Heinrich and Rapoport 1974) suggests that enzymes catalyzing reversible reactions should exert little control over flux. We might therefore expect genes coding for such enzymes to be less essential and hence more permissive of amino acid changes. In our correlation framework, this would produce a negative relationship between essentiality and reversibility and a positive one between ω and reversibility. We do indeed see a negative correlation between essentiality and reversibility (Spearman’s partial ρ = −0.212, permutation P = 0.0104; fig. 5B). However, this does not translate into a relationship between ω and reversibility (Spearman’s partial ρ = −0.077, permutation P = 0.2704).

FIG. 4.—Illustration of the network representations we used in this study. (A) A connectivity pattern showing both enzymes and metabolites. (B) The enzyme-centered network derived from (A). To analyze the enzymes’ topological parameters and relate them to ω, we redraw the network so that nodes represent enzymes. An edge between enzymes indicates that they share a metabolite in (A). Each enzyme appears only once. Note that high betweenness does not necessarily imply high degree or vice versa. Enzyme E4’s role as the sole conduit between the branching parts of the network gives it high betweenness, but its degree is only 2. (C) The metabolite-centered network derived from (A). To measure the mean metabolite degree of each enzyme, we transform the network in (A) so that nodes represent metabolites. Each metabolite appears only once in this version of the network, and its degree is the number of other metabolites from which it is only one reaction away. (D and E) Enzymes with the same mean metabolite degree but different distributions of connections.

Measures of Pleiotropy: Number of Pathways and Reactions Catalyzed

Enzymes that participate in many pathways or catalyze many reactions may be under increased constraint because any change could affect several metabolic functions. We first examined the number of pathways in which a gene’s product participates and found no correlation with ω (Spearman’s partial ρ = 0.018, permutation P = 0.7840). Xenobiotic detoxification enzymes could obscure a correlation because they are especially likely to appear in many pathways and also tend to have high values of ω. To test this, we repeated our analysis without these enzymes but obtained the same result (Spearman’s partial ρ = 0.026, permutation P = 0.7360).

We then examined the effect of the number of reactions an enzyme catalyzes. We saw a slight positive correlation between the number of reactions catalyzed and ω, but it was not quite significant even without a multiple test correction (Spearman’s partial ρ = 0.126, permutation P = 0.0560; fig. 5A). Our measures of pleiotropy thus do not appear to influence levels of evolutionary constraint.

Network Topology Measures: Betweenness and Degree

Engineering principles suggest that evolving modules would change internally but maintain consistent interfaces with the rest of the network so that communication between modules would be uninterrupted (Csete and Doyle 2002). If this is the case for the metabolic network, we should expect enzymes that form between-module bridges to be under increased constraint. Such nodes are considered to have high betweenness (Freeman 1977; Girvan and Newman 2002). The shortest-path betweenness (Newman and Girvan 2004) of an enzyme i is defined as the number of the shortest paths between all other pairs of enzymes in the network that pass through i. High-betweenness enzymes play an important role in the flow of biomass through the network because they often act as “bottlenecks” (Girvan and Newman 2002; Wunderlich and Mirny 2006; Liu et al. 2007; Yu et al. 2007).

FIG. 5.—Graphs of partial correlations among network parameters and ω = dN/dS (A), essentiality (B), and duplication rate (C). Blue lines indicate positive correlations; red lines, negative correlations. Bold lines denote relationships significant at 5% FDR; thin lines, those with nominal significance at 5% but not after multiple test correction; dashed lines, correlations with 5% < P ≤ 10%. Numbers over the lines show the partial correlation coefficients with permutation P values in parentheses. Partial correlations between ω and either enzyme or mean compound degree shown in (A) are only significant when one and not the other of these variables is present in the analysis. This is indicated by the double arrow.
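A minimal sketch of shortest-path betweenness on an enzyme-centered network, using Brandes' algorithm for unweighted, undirected graphs. When a pair of enzymes is joined by several equally short paths, this standard form credits each path fractionally. The seven-enzyme network is hypothetical, loosely modeled on the situation the FIG. 4 caption describes: E4 is the only link between two clusters, so it has the highest betweenness despite having degree 2.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Shortest-path betweenness (Brandes' algorithm), unweighted and undirected."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds: Dict[str, list] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: value / 2 for v, value in bc.items()}


# Hypothetical network: two triangles joined only through E4.
adj = build_adjacency([
    ("E1", "E2"), ("E1", "E3"), ("E2", "E3"), ("E3", "E4"),
    ("E4", "E5"), ("E5", "E6"), ("E5", "E7"), ("E6", "E7"),
])
bc = betweenness(adj)
print(bc["E4"], len(adj["E4"]))  # 9.0 2
```

E3 and E5 have a higher degree (3) but lower betweenness (8.0) than E4, illustrating that the two measures can diverge.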

Enzymes that connect pathways do have relatively high betweenness (mean betweenness for pathway-connecting enzymes is 1027.1, whereas for enzymes with no between-pathway connections, it is 290.2; Wilcoxon test P = 1.2 × 10⁻⁷), supporting the claims that the traditionally recognized pathways appear to correspond to modules in the metabolic network (Schuster et al. 2000, 2002; Ravasz et al. 2002; Holme et al. 2003). However, we see no correlation between ω and betweenness (Spearman's partial ρ = 0.028, permutation P = 0.6974; fig. 5A).
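To make the betweenness definition concrete, here is a minimal Python sketch using Brandes' algorithm, a standard way to compute shortest-path betweenness on an unweighted, undirected graph. We do not know which implementation the authors used. When several shortest paths join a pair of enzymes, each path contributes fractionally, which is the usual convention. The toy network of two hypothetical three-enzyme pathways joined by one shared metabolite is an assumption for illustration.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness for an unweighted, undirected graph (Brandes' algorithm).

    adj maps each node to an iterable of its neighbors.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search: count shortest paths (sigma) and record predecessors
        dist = {v: -1 for v in adj}
        sigma = {v: 0 for v in adj}
        preds = {v: [] for v in adj}
        dist[source], sigma[source] = 0, 1
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward the source
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each end
    return {v: s / 2.0 for v, s in score.items()}


# Two hypothetical pathways (A-B-C and D-E-F) bridged by a shared metabolite (C-D edge)
toy = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
print(betweenness(toy))  # C and D score 6.0, all others 0.0
```

Only the two bridging enzymes lie on shortest paths between the pathways, which mirrors the observation that pathway-connecting enzymes have higher betweenness.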

Another indication of an enzyme's importance in the network topology is its degree, defined as the number of other enzymes that share an edge with it. In the enzyme-centered network used here, where edges represent metabolites, an enzyme's degree is the number of other enzymes to which it is connected through common metabolites (fig. 4B). Note that this is not necessarily equal to the number of reactions catalyzed (see supplementary methods, Supplementary Material online). We found a significant negative correlation between ω and enzyme degree (fig. 5A) after correcting for multiple tests (Spearman's partial ρ = −0.240, permutation P = 0.0020). This result appears to be consistent across multiple Drosophila species and is robust to alternate statistical methods (see supplementary fig. S1 and methods, Supplementary Material online), and the correlation persists if we exclude xenobiotic-detoxifying enzymes from the data set (Spearman's partial ρ = −0.212, permutation P = 0.0070).
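The enzyme-centered construction can be sketched as follows. The sketch assumes each enzyme is described by the set of metabolites it uses and that any shared metabolite creates an edge. Many metabolic-network analyses exclude currency metabolites such as ATP or water, a detail this section does not specify. The enzyme and metabolite labels are hypothetical. Every enzyme here catalyzes exactly one reaction, yet the degrees differ, which is why degree and number of reactions catalyzed are distinct measures.

```python
from itertools import combinations


def enzyme_centered_network(metabolites_of):
    """Connect two enzymes when they share at least one metabolite."""
    adj = {enzyme: set() for enzyme in metabolites_of}
    for e1, e2 in combinations(metabolites_of, 2):
        if metabolites_of[e1] & metabolites_of[e2]:
            adj[e1].add(e2)
            adj[e2].add(e1)
    return adj


# Hypothetical enzymes, each catalyzing a single reaction
metabolites_of = {
    "E1": {"glucose", "G6P"},
    "E2": {"G6P", "F6P"},
    "E3": {"G6P", "6PG"},
    "E4": {"F6P", "FBP"},
}
degree = {e: len(nbrs) for e, nbrs in enzyme_centered_network(metabolites_of).items()}
print(degree)  # {'E1': 2, 'E2': 3, 'E3': 2, 'E4': 1}
```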

A potential confounding factor in interpreting this result is gene expression. Highly expressed genes are relatively constrained, and relationships between ω and various variables often disappear once expression levels are accounted for (Drummond et al. 2006 and Larracuente et al. 2008). As a measure of gene expression, we used a principal component that includes whole-fly mRNA levels from FlyAtlas (Chintapalli et al. 2007) and a measure of codon bias (frequency of preferred codons; for details, see Larracuente et al. [2008]). The association between degree and ω remained unchanged (Spearman's partial ρ = −0.210, permutation P = 0.0100).
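With only two standardized inputs, the first principal component has a closed form, which makes the expression measure easy to sketch. The Python below assumes the inputs are per-gene mRNA levels and frequency of preferred codons (Fop), standardizes both, and uses the fact that a 2 × 2 correlation matrix with off-diagonal r has eigenvalues 1 + r and 1 − r. Any transformation the authors applied before the analysis (for example, log-scaling mRNA levels) is not described here and is left out. The sign of a principal component is arbitrary.

```python
import math


def zscores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def expression_pc1(mrna, fop):
    """Scores on the first principal component of two standardized variables.

    For the correlation matrix [[1, r], [r, 1]] the eigenvalues are 1 + r and 1 - r,
    so PC1 is (z1 + z2) / sqrt(2) when r >= 0 and (z1 - z2) / sqrt(2) when r < 0.
    Returns the scores and the fraction of total variance PC1 explains.
    """
    z1, z2 = zscores(mrna), zscores(fop)
    r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / math.sqrt(2) for a, b in zip(z1, z2)]
    return scores, (1 + abs(r)) / 2


# Hypothetical per-gene mRNA levels and Fop values
scores, explained = expression_pc1([12.0, 40.0, 7.5, 88.0], [0.41, 0.55, 0.38, 0.62])
```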

The relationship between ω and degree is due to a deficit of amino acid changes in high-degree enzymes: the correlation of dN with degree is significant (Spearman's partial ρ = −0.235, permutation P = 0.0034), whereas that for dS is not (ρ = −0.041, permutation P = 0.5542). This result also strongly indicates that selection on silent sites does not compromise the estimates of ω enough to affect our results.

In protein–protein interaction networks, highly connected proteins are relatively constrained only if the connections are inside modules (Fraser 2005). If the same pattern holds for metabolic networks, we would expect excess constraint only for enzymes with many connections within their own pathways. We therefore partitioned the connections into within-pathway and between-pathway links and repeated the partial correlation analysis to assess their independent association with ω. We found that both kinds of connections constrain enzyme evolution: Spearman's partial ρ = −0.182 (permutation P = 0.0090) for connections within pathways and ρ = −0.184 (permutation P = 0.0100) for links between them.
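Partitioning an enzyme's links into within-pathway and between-pathway counts can be sketched as below. The sketch assumes each enzyme maps to a set of pathway labels and treats a link as within-pathway if the two enzymes share any pathway. How the authors handled enzymes assigned to several pathways is not stated in this section. The network and labels are hypothetical.

```python
def split_degree(adj, pathways_of):
    """Split each enzyme's degree into within-pathway and between-pathway links.

    A neighbor counts as within-pathway if the two enzymes share any pathway label.
    """
    within, between = {}, {}
    for enzyme, neighbors in adj.items():
        same = sum(1 for n in neighbors if pathways_of[n] & pathways_of[enzyme])
        within[enzyme] = same
        between[enzyme] = len(neighbors) - same
    return within, between


# Hypothetical network: pathway P1 (A, B, C) linked to pathway P2 (D, E, F) via C-D
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
pathways_of = {**{e: {"P1"} for e in "ABC"}, **{e: {"P2"} for e in "DEF"}}
within, between = split_degree(adj, pathways_of)
print(within["C"], between["C"])  # 2 1
```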

Why are highly connected enzymes relatively constrained? It is not because of their central position in the network, because betweenness is a measure of centrality (Freeman 1977) and we find that it has no measurable effect on constraint. Perhaps high-degree enzymes interact with a few metabolites that are each involved in many reactions (branch point metabolites), providing many neighbors in the enzyme-centered network. Alternatively, a high-degree enzyme may interact with many metabolites that participate in a few reactions each. To distinguish between these topologies, consider a network representation where nodes are metabolites instead of enzymes (and edges represent enzymes/reactions) (fig. 4C). Here, nodes with many edges (high “metabolite degree”) are branch point metabolites, and the enzymes on those edges are branch point enzymes. We can calculate the degree of each metabolite in this “metabolite-centered” network and average the degrees of all substrates and prod


Inferring Nonneutral Evolution from Human-Chimp-Mouse Orthologous Gene Trios

Author(s): Andrew G. Clark, Stephen Glanowski, Rasmus Nielsen, Paul D. Thomas, Anish
Kejariwal, Melissa A. Todd, David M. Tanenbaum, Daniel Civello, Fu Lu, Brian Murphy,
Steve Ferriera, Gary Wang, Xianqgun Zheng, Thomas J. White, John J. Sninsky, Mark
D. Adams and Michele Cargill

Source: Science, New Series, Vol. 302, No. 5652 (Dec. 12, 2003), pp. 1960-1963

Published by: American Association for the Advancement of Science

Stable URL: https://www.jstor.org/stable/3835731


This content downloaded from 71.57.134.161 on Thu, 16 Dec 2021 12:14:02 UTC

All use subject to https://about.jstor.org/terms

REPORTS

and LED 8 (541 observed, 456 expected, P < 0.0006), providing evidence in plants for a link between genome organization and gene regulation.

Together these data provide an organ expression map, revealing putative localized hormone-response domains and a complex pattern of regulatory genes that could mediate primary developmental cues. These data should help identify candidate genes involved in pattern formation and cell specificity in the root, which is a model for organogenesis. The expression map will also facilitate both computational and experimental methods aimed at decoding regulatory mechanisms in the root. Thus, these results can now be used to explore how the hundreds of different expression patterns they reveal are established and interpreted at the cellular level to generate a complex organ.

Supporting Online Material
www.sciencemag.org/cgi/content/full/302/5652/1956/DC1
Materials and Methods
Figs. S1 to S3
Tables S1 to S4

4 August 2003; accepted 15 October 2003

Inferring Nonneutral Evolution from Human-Chimp-Mouse Orthologous Gene Trios

Andrew G. Clark,1 Stephen Glanowski,3 Rasmus Nielsen,2 Paul D. Thomas,4 Anish Kejariwal,4 Melissa A. Todd,2 David M. Tanenbaum,5 Daniel Civello,6 Fu Lu,5 Brian Murphy,3 Steve Ferriera,3 Gary Wang,3 Xianqgun Zheng,5 Thomas J. White,6 John J. Sninsky,6 Mark D. Adams,5* Michele Cargill6†

Even though human and chimpanzee gene sequences are nearly 99% identical, sequence comparisons can nevertheless be highly informative in identifying biologically important changes that have occurred since our ancestral lineages diverged. We analyzed alignments of 7645 chimpanzee gene sequences to their human and mouse orthologs. These three-species sequence alignments allowed us to identify genes undergoing natural selection along the human and chimp lineage by fitting models that include parameters specifying rates of synonymous and nonsynonymous nucleotide substitution. This evolutionary approach revealed an informative set of genes with significantly different patterns of substitution on the human lineage compared with the chimpanzee and mouse lineages. Partitions of genes into inferred biological classes identified accelerated evolution in several functional classes, including olfaction and nuclear transport. In addition to suggesting adaptive physiological differences between chimps and humans, human-accelerated genes are significantly more likely to underlie major known Mendelian disorders.

Although the human genome project will allow us to compare our genome to that of other primates and discover features that are uniquely human, there is no guarantee that such features are responsible for any of our unique biological attributes. To identify genes and biological processes that have been most altered by our recent evolutionary divergence from other primates, we need to fit the data to models of sequence divergence that allow us to distinguish between divergence caused by random drift and divergence driven by natural selection. Early observations of unexpectedly low levels of protein divergence between humans and chimpanzees led to the hypothesis that most of the evolutionary changes must have occurred at the level of gene regulation (1). Recently, much more extensive efforts at DNA sequencing in nonhuman primates have confirmed the very close evolutionary relationship between humans and chimps (2), with an average nucleotide divergence of just 1.2% (3-5). The role of protein divergence in causing morphological, physiological, and behavioral differences between these two species, however, remains unknown.

Here we apply evolutionary tests to identify genes and pathways from a new collection of more than 200,000 chimpanzee exonic sequences that show patterns of divergence consistent with natural selection along the human and chimpanzee lineages.

1Molecular Biology and Genetics, 2Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA. 3Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850, USA. 4Protein Informatics, Celera Genomics, 850 Lincoln Centre Drive, Foster City, CA 94404, USA. 5Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA. 6Celera Diagnostics, 1401 Harbor Bay Parkway, Alameda, CA 94502, USA.

*Present address: Department of Genetics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA.
†To whom correspondence should be addressed. E-mail: michele_cargill@celeradiagnostics.com

1960 12 DECEMBER 2003 VOL 302 SCIENCE www.sciencemag.org

To construct the human-chimp-mouse alignments, we sequenced PCR amplifications using primers designed to essentially all human exons from one male chimpanzee, resulting in more than 20,000 human-chimp gene alignments spanning 18.5 Mb (6-8). To identify changes that are specific to the divergence in the human lineage, we compared the human-chimp aligned genes to their mouse ortholog. Inference of orthology involved a combination of reciprocal best matches and syntenic evidence between human and mouse gene annotations (9, 10). This genome-wide set of orthologs underwent a series of filtering steps to remove ambiguities, orthologs with little sequence data, and genes with suspect annotation (6). The filtered ortholog set was compared to other public sets and found to be highly consistent (11) (table S1). We used the most conservative set of 7645 genes for which we had the highest confidence in orthology and sequence annotation (12) (Database S1).
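The reciprocal-best-match part of the orthology inference can be illustrated with a short Python sketch. It assumes precomputed similarity scores in both search directions (for example, BLAST bit scores) keyed by (query, subject) pairs, and it omits the syntenic evidence the authors also used. Gene identifiers and scores are hypothetical, and ties are resolved by keeping the first hit seen.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject; scores is {(query, subject): score}."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
    """Pairs (h, m) where m is h's best hit and h is m's best hit."""
    h_best = best_hits(human_vs_mouse)
    m_best = best_hits(mouse_vs_human)
    return sorted((h, m) for h, m in h_best.items() if m_best.get(m) == h)


human_vs_mouse = {("hA", "mA"): 95, ("hA", "mB"): 60, ("hB", "mB"): 80, ("hB", "mC"): 85}
mouse_vs_human = {("mA", "hA"): 94, ("mB", "hB"): 79, ("mB", "hA"): 61,
                  ("mC", "hB"): 84, ("mC", "hC"): 90}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hA', 'mA')]
```

Here hB's best mouse hit is mC, but mC's best human hit is hC, so that pair is rejected as non-reciprocal.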

To identify genes that have undergone adaptive protein evolution, we applied two formal statistical tests that fit models of molecular evolution at the codon level. Both tests fit models of the nucleotide-substitution process by maximum likelihood (ML) (13), and both include parameters specifying rates of synonymous and nonsynonymous substitution (14-16). In the first (Model 1), we performed a classic test of the null hypothesis of dN/dS = 1 in the human lineage (17, 18). The second model is a modification of the method described by Yang and Nielsen (16), which allows variation in the dN/dS ratio among lineages and among sites at the same time. In this method (Model 2), a likelihood ratio test of the hypothesis of no positive selection is performed by comparing the likelihood values for two hypotheses. Under the null hypothesis, it is assumed that all sites are either neutral (dN/dS = 1) or evolve under negative selection (dN/dS < 1). Under the alternative hypothesis, some of the sites are allowed to evolve with dN > dS in the human lineage only (Fig. 1). We refer to this as Model 2, and to the P-value of neutrality as P2 (6). The test based on Model 2 is not as conservative as the test based on Model 1 and may tend to detect genes with accelerated amino acid substitution rates in humans even if the average dN/dS rate is not larger than 1.
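The likelihood ratio test in Model 2 compares twice the log-likelihood difference between the alternative and null hypotheses with a chi-square distribution. A minimal Python sketch follows, assuming one degree of freedom. Some branch-site analyses instead use a 50:50 mixture of a point mass at zero and χ²₁, which halves the P value, and this section does not say which reference distribution was used. The log-likelihood values are made up for illustration.

```python
import math


def lrt_chi2_df1(lnl_null, lnl_alt):
    """Likelihood ratio statistic 2*(lnL_alt - lnL_null) and its chi-square (1 df) P value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


stat, p = lrt_chi2_df1(lnl_null=-1000.0, lnl_alt=-998.0)
print(round(stat, 2), round(p, 4))  # 4.0 0.0455
```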

There were 1547 human genes and 1534 chimp genes that met the criteria for positive selection (with dN/dS > 1). The neutral null hypothesis of Model 1 was rejected for 72 genes (0.94% of the tests) at P < 0.001, 414 genes (5.4%) at P < 0.01, and 1216 genes (15.9%) at P < 0.05 (12). There were six human genes for which the neutral null hypothesis of Model 1 was rejected at P < 0.05 and dN/dS was greater than 1 (12). The neutral null hypothesis of Model 2 was rejected for 28 genes (0.38%) at P2 < 0.001, 178 genes (2.3%) at P2 < 0.01, and 667 genes (8.7%) at P2 < 0.05. The relatively low overlap of these sets reflects the different nature of the tests. Of the 1547 human genes that exhibited dN/dS > 1, only 125 also fell into the class of 178 human genes with P2 < 0.01. Similarly, Model 2 can detect cases where a protein has a domain undergoing positive selection but the overall dN/dS may not be elevated, and thus would be missed by Model 1. For this reason, the remainder of the analysis considers only the Model 2 test results.

Before attempting any biological inference from the results of the statistical tests, it is important to consider whether attributes like GC content, repeat density, local recombination rate, and segmental duplications might affect the rates and patterns of substitution (19, 20). In principle, the ML estimation procedure corrects for variation in base composition; however, if the true substitution rate differs across the genome in a manner that is correlated with GC content, then we should be able to detect this by simple correlation (6, 12) (Database S2). The synonymous substitution rate was significantly correlated with the following attributes: GC content (0.164, P < 0.0001), local recombination rate in cM/Mb (21) (0.100, P < 0.001), and LINE (long interspersed nuclear element) density (-0.091, P < 0.0001). None of these factors was significantly correlated with either nonsynonymous substitution rate or P2-value; however, genes associated with some biological processes, such as olfaction, do show nonrandom associations with genomic location [P < 10⁻⁴, Kolmogorov-Smirnov (K-S) test] and GC content (P < 10⁻⁹, K-S test). We also verified that segmental duplications were not responsible for distortions in the patterns of substitution seen in our tests, mostly because genes with close duplicates were underrepresented in our set because of the requirement for strict human-mouse orthology. Interestingly, the genes with P2-values < 0.05 are overrepresented in the Online Mendelian Inheritance in Man (OMIM) catalog of genes associated with genetic disease (P = 0.009), demonstrating the relevance of interspecific comparisons (ftp.ncbi.nih.gov/repository/OMIM/morbidmap).
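The two-sample Kolmogorov-Smirnov statistic behind these comparisons is the largest vertical gap between two empirical cumulative distribution functions. The Python sketch below computes only that statistic D. Turning D into a P value requires the Kolmogorov distribution (for example, via `scipy.stats.ks_2samp`), which is omitted to keep the example dependency-free. The inputs are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: max |F_a(x) - F_b(x)| over the pooled sample."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only jump at observed values, so checking those suffices
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```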

Many of the 7645 genes have been classified into inferred functional categories based on the Panther classification system (6, 22). We asked, for the subset of genes in each functional category, whether the distribution of P2 values for those genes differed significantly from the P2 distribution for the full set of 7645 genes (6) (tables S2 and S3). In this way, we can gain insight into higher-order biological processes and molecular functions that may be under selective pressure in a given lineage (Tables 1 and 2). The statistical tests of significance are valid as formal inferences, and these lead immediately to tentative biological hypotheses, only some of which we describe here.

[Fig. 1 diagram: the three-species tree under H0 and Ha, with site classes p0 (dN/dS < 1), p1 (dN = dS), and, on the human branch under Ha, p2 (dN > dS).]
Fig. 1. Graphical representation of the test of positive selection (Model 2). The null hypothesis (H0) assumes all three branches have two classes of amino acid residues: those that are neutrally evolving (p1: dN = dS) and those that are under constraint (p0: dN/dS < 1). The alternative hypothesis (Ha) allows the human lineage to have a subset of sites (p2) with accelerated amino acid substitution (dN > dS).

In the human lineage, genes involved in olfaction show a significant tendency to be under positive selection (PMW < 0.005) (Table 1 and Fig. 2). Nearly all the genes classified to olfaction are olfactory receptors (ORs). It seems likely that the different lifestyles of chimps and humans might have led to divergent selection pressure on these receptors. There has been a rapid acceleration of pseudogene formation in human ORs (23), and the acceleration of apparent amino acid substitution in pseudogenes could potentially lead to a spurious inference of selection. However, we verified that most of the OR genes in our set are bona fide genes (http://bioinformatics.weizmann.ac.il/HORDE/), indicating that these genes are either undergoing positive selection or are in the process of pseudogenization (24).
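The subscript in PMW suggests a Mann-Whitney (Wilcoxon rank-sum) test of whether a category's P2 values are shifted toward small values relative to all genes. That reading is an assumption, as are the one-sided alternative and the normal approximation without tie correction used in this Python sketch. The direct pair count takes O(n1 × n2) time, which is adequate for a sketch; a rank-based version scales better.

```python
import math


def mann_whitney_lower(sample, reference):
    """One-sided Mann-Whitney test that `sample` tends to be smaller than `reference`.

    U counts (sample, reference) pairs with the sample value lower (ties count 1/2).
    The P value uses the normal approximation without a tie correction.
    """
    n1, n2 = len(sample), len(reference)
    u = 0.0
    for x in sample:
        for y in reference:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
    return u, p


# Hypothetical P2 values for one category and for the background set
u, p = mann_whitney_lower([0.001, 0.002, 0.003], [0.5, 0.6, 0.7, 0.8])
print(u, round(p, 3))
```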

[Fig. 2 plot: cumulative fraction of genes (0.00 to 1.00) versus Model 2 P-value (0.00 to 0.25) for the Overall, Olfaction, Developmental Processes, Amino Acid Catabolism, and OMIM gene sets.]
Fig. 2. P2-value distributions of selected groups of genes. The plot shows the cumulative fraction of selected biological processes showing the excess of cases of significant positive selection in genes for olfaction, amino acid catabolism, and Mendelian disease genes (OMIM) relative to the overall distribution of genes. The distribution of developmental genes that do not show a significant excess is shown for comparison.
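The curves in Fig. 2 are empirical cumulative distributions of P2 values. A minimal Python sketch of the y-values at chosen P2 thresholds follows; the P2 values are hypothetical.

```python
from bisect import bisect_right


def cumulative_fraction(p_values, thresholds):
    """Fraction of genes with P2 <= t for each threshold t."""
    ordered = sorted(p_values)
    return [bisect_right(ordered, t) / len(ordered) for t in thresholds]


print(cumulative_fraction([0.01, 0.04, 0.2, 0.5], [0.05, 0.25]))  # [0.5, 0.75]
```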


Several other classes of genes (amino acid catabolism, developmental processes, reproduction, neurogenesis, and hearing) show many genes with low P2 values, although these classes do not show significant PMW values or contain fewer than 20 genes (table S1 and Fig. 2). It is possible that individual genes within these categories account disproportionately for specific phenotypic effects. For example, 7 (GSTZ1, HGD, PAH, ALDH6A1, BCKDHA, PCCB, and HAL) of the 16 genes in the amino acid catabolism category have P2 values less than 0.05. A speculative suggestion is that this signal of positive selection may arise from different dietary habits or pressures in the two lineages. For example, branched-chain amino acid catabolism, which involves the ALDH6A1, BCKDHA, and PCCB genes, is the primary pathway for energy production from muscle protein under starvation conditions (25). For all seven genes, mutations have been found that result in human metabolic disorders, consistent with the idea that natural selection shifted these genes in a manner that is relevant to reproductive fitness.

Table 1. Biological processes showing the strongest evidence for positive selection. The top panel includes the categories showing the greatest acceleration in the human lineage, and the bottom panel includes categories with the greatest acceleration in the chimp lineage.

Biological process | Number of genes* | PMW (human/Model 2)* | PMW (chimp/Model 2)*
Categories showing the greatest acceleration in human lineage
Olfaction | 48 | 0 | 0.9184
Sensory perception | 146 (98) | 0 (0.026) | 0.9691 (0.9079)
Cell surface receptor-mediated signal transduction | 505 (464) | 0 (0.0386) | 0.199 (0.0864)
Chemosensory perception | 54 (6) | 0 (0.1157) | 0.9365 (0.7289)
Nuclear transport | 26 | 0.0003 | 0.2001
G-protein-mediated signaling | 252 (211) | 0.0003 (0.1205) | 0.2526 (0.0773)
Signal transduction | 1030 (989) | 0.0004 (0.0255) | 0.0276 (0.0092)
Cell adhesion | 132 | 0.0136 | 0.3718
Ion transport | 237 | 0.0247 | 0.8025
Intracellular protein traffic | 278 | 0.0257 | 0.8099
Transport | 391 | 0.0326 | 0.7199
Metabolism of cyclic nucleotides | 20 | 0.0408 | 0.1324
Amino acid metabolism | 78 | 0.0454 | 0.0075
Cation transport | 179 | 0.0458 | 0.8486
Developmental processes | 542 | 0.0493 | 0.2322
Hearing | 21 | 0.0494 | 0.9634
Categories with the greatest acceleration in the chimp lineage
Signal transduction | 1030 (989) | 0.0004 (0.0255) | 0.0276 (0.0092)
Amino acid metabolism | 78 | 0.0454 | 0.0075
Amino acid transport | 23 | 0.1015 | 0.0102
Cell proliferation and differentiation | 82 | 0.3116 | 0.0182
Cell structure | 174 | 0.2633 | 0.0233
Oncogenesis | 201 | 0.3132 | 0.0267
Cell structure and motility | 239 | 0.2208 | 0.0299
Purine metabolism | 35 | 0.9127 | 0.0423
Skeletal development | 44 | 0.2876 | 0.0438
Mesoderm development | 168 | 0.5813 | 0.0439
Other oncogenesis | 39 | 0.2777 | 0.0469
DNA repair | 49 | 0.9363 | 0.0477

*The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.

Table 2. Molecular functions showing the strongest evidence for positive selection. The table includes only human-accelerated categories, because the only categories accelerated in the chimp lineage are chaperones (P = 0.0124), cell adhesion molecules (P = 0.0220), and extracellular matrix (P = 0.0333).

Molecular function | Number of genes* | PMW (human/Model 2)* | PMW (chimp/Model 2)*
G protein coupled receptor | 199 (153) | 0 (0.2533) | 0.8689 (0.6776)
G protein modulator | 62 | 0.0008 | 0.3776
Receptor | 448 | 0.0030 | 0.9798
Ion channel | 134 | 0.0043 | 0.8993
Extracellular matrix | 97 (95) | 0.0120 (0.0178) | 0.1482 (0.1593)
Other G protein modulator | 32 | 0.0149 | 0.4441
Extracellular matrix glycoprotein | 44 (42) | 0.0178 (0.0269) | 0.1579 (0.1765)
Voltage-gated ion channel | 62 | 0.0219 | 0.6692
Other hydrolase | 95 | 0.0260 | 0.4823
Oxygenase | 46 | 0.0303 | 0.4792
Protein kinase receptor | 37 | 0.0314 | 0.6911
Transporter | 214 | 0.0338 | 0.1836
Ligand-gated ion channel | 45 | 0.0405 | 0.9503
Microtubule binding motor protein | 22 | 0.0421 | 0.6385
Microtubule family cytoskeletal protein | 54 | 0.0467 | 0.2815

*The number of genes and the PMW values excluding olfactory receptor genes are shown in parentheses.

Most of the human developmental genes with low P2 values fall into two main categories: skeletal development (TLL2, ALPL, BMP4, SDC2, MMP20, and MGP) and neurogenesis (NLGN3, SEMA3B, PLXNC1, NTF3, WNT2, WIF1, EPHB6, NEUROG1, and SIM2). In addition, several of the genes with low P2 values are homeotic transcription factor genes (CDX4, HOXA5, HOXD4, MEOX2, POU2F3, MIXL1, and PHTF), which play key roles in early development. Several genes associated with pregnancy, such as the progesterone receptor (PGR), GNRHR, MTNR1A, and PAPPA, appear to exhibit nonneutral divergence between humans and chimps. PGR is not only involved in maintenance of the uterus but is also expressed on the cell membrane of sperm, where it may play a role in the acrosome reaction (26), so the physiological basis for the adaptive evolution remains unclear.

Speech is considered to be a defining characteristic of humans. The forkhead-box P2 transcription factor (P2 = 0.0027) has been implicated in speech development, and has previously been identified as undergoing an unusual human-specific pattern of substitution (27). Several genes involved in the development of hearing also appear to have undergone adaptive evolution in the human lineage, and we speculate that understanding spoken language may have required tuning of hearing acuity. The gene with the most significant pattern of human-specific positive selection is alpha tectorin, whose protein product plays a vital role in the tectorial membrane of the inner ear. Single-amino acid polymorphisms are associated with familial high-frequency hearing loss (28), and knockout mice are deaf. These results strongly motivate a detailed assessment of the nature of hearing differences between humans and chimpanzees. Other genes involved in hearing that appear to be under human-specific selection include DIAPH1, FOXI1, EYA4, EYA1, and OTOR.

The inference of lineage-specific evolutionary acceleration requires a phylogenetic tree. By simply adding mouse to our alignments, we went from a directionless pairwise comparison of human and chimp to having reasonable ability to infer the common ancestral state and lineage-specific changes. These approaches will gain in both statistical and biological power as additional primate or other mammalian genomes are sequenced, enabling identification of genes that exhibited accelerated amino acid substitution since our most recent common ancestor. Although it is tempting to conclude that this will constitute a list of genes that “make us human,” one has to take a step back to see the gulf that exists between understanding at this narrowly focused molecular level and at the organismal level. A large number of human genes, when transformed into mutant yeast or Drosophila, can rescue the mutant phenotype, but this does not make these genetically modified organisms any more human. This study has focused only on protein-coding genes, and it will require examination of regulatory sequences to determine the contribution of regulation of gene expression to the evolutionary divergence between humans and chimps.

Perhaps the best way to understand the relation between DNA sequence divergence and the differences between human and chimpanzee physiology and morphology is to compare these differences to the variability among humans. Human-chimp DNA sequence divergence is roughly 10 times the divergence between random pairs of humans. Contrasts that are under way to place human polymorphism in the context of human-specific divergence further empower these models to identify molecular targets of natural selection. Evolutionary analysis will be extended to include comparison of the X chromosome and autosomes, the impact of local recombination rates and GC content, codon-usage patterns, and divergence in regulatory sequences. Additional insight will be gained by examining sequence divergence in the context of gene-expression differences. The informativeness of all these approaches will increase by inclusion of additional mammalian genome sequences, and realization of the goal to ascribe functional significance to the complex landscape of our own genome will most effectively be made in the context of our close relatives.

References and Notes
1. M. C. King. A. C. Wilson, Science 188, 107 (1975).
2. Y. Satta, J. Klein, B. Takahata, Mot. Phylogenet. Evol.

14, 259 {2000).
3. F. C. Chen, W.-H. Li, Am. J. Hum. Genet. 68, 444

(2001).
4. I. Ebersberger, D. Metzler, C. Schwarz, S. Paabo, Am.}.

Hum. Genet. 70, 1490 (2002).
5. R. Sakate et al., Genome Res. 13, 1022 (2003).
6. Detailed materials and methods are available as sup­

porting material on Science Online.
7. A total of 201,805 primer pairs were successfully designed

to 23.363 human coding sequences (27.6 Mb).
8. Primer pairs were amplified in 39 female human

individuals (19 African-Americans, 20 Caucasians)
and 1 male chimpanzee (4X0033, Southwest Nation­
al Primate Research Center) by a standard PCR and
sequencing protocols. Trimmed chimp sequences
were BLASTed against human exon sequence (9) to
create virtual transcripts.

9. J. C. Venter et al., Science 291, 1304 (2001 ).
10. R. J. Mural et al., Science 296, 1661 (2002).
11. Mouse-human orthologs were downloaded from Na­

tional Center for Biotechnology Information {NCBI)
HomoloGene; NCBI Homol_seq_pairs; NCBI Homol­
ogy Map; and Mouse Genome Database, Mouse Ge­
nome Informatics Web Site, The Jackson Laboratory
(Bar Harbor, ME).

12. All 7645 alignments in Phylip format (13) and a
flatfile of genes and their associated statistics are
available at http://panther.celera.com/appleraHCM_
alignments/index.jsp. Sequences have been deposited

in GenBank under accession codes AY398769-
AY421703.

13. J. Felsenstein, J. Mot. Evol. 17, 368 (1981).
14. N. Goldman, Z. Yang, Mot. Biol. Evol. 11, 725

(1994).
15. S. V. Muse, B. S. Gaut, Mot. Biol. Evol. 11, 715 (1994).
16. Z. Yang. R. Nielsen, Mot. Biol. Evol. 19, 908 (2002).
17. Z. Yang. R. Nielsen, J. Mot. Evol. 46, 409 (1998).
18. Z. Yang. R. Nielsen, Mot. Biol. Evol. 17, 32 (2000).
19. I. Hellmann et al., Genome Res. 13, 831 (2003).
20. J. A. Bailey et al., Science 297, 1003 (2002).
21. A. Kong et al., Nature Genet. 31,241 (2003).
22. P. D. Thomas et al., Nucleic Acids Res. 31,334 (2003).
23. Y. Gilad, 0. Man, S. Paabo, D. Lancet, Proc. Natl.

Acad. Sci. U.S.A. 100, 3324 (2003).
24. Y. Gilad, C. D. Bustamante, D. Lancet, S. Paabo, Am. J.

Hum. Genet. 73, 489 (2003).
25. H. R. Freund, M. Hanani, Nutrition 18, 287 (2002).
26. S. Gadkar et al., Biol. Reprod. 67, 1327 (2003).
27. W. Enard et al., Nature 418, 869 (2002).


28. S. Naz et al., J. Med. Genet. 40, 360 (2003).
29. The data in this paper were obtained from more than 18 million sequencing reads generated at the Celera Genomics sequencing center in Rockville, MD. We thank J. Duff, C. Gire, M. A. Rydland, C. Forbes, and B. Small for development and maintenance of software systems, laboratory information management systems, and analysis programs. S. Hannenhalli and S. Levy provided particularly helpful discussions. C. Aquadro, B. Lazzaro, K. Montooth, T. Schlenke, and P. Wittkopp provided helpful comments on the manuscript.

Supporting Online Material
www.sciencemag.org/cgi/content/full/302/5652/1960/DC1
Materials and Methods
Tables S1 to S3
Databases S1 and S2

7 July 2003; accepted 24 October 2003

The Proteasome of Mycobacterium tuberculosis Is Required for Resistance to Nitric Oxide

K. Heran Darwin,1 Sabine Ehrt,1 Jose-Carlos Gutierrez-Ramos,2 Nadine Weich,2 Carl F. Nathan1,3*

The production of nitric oxide and other reactive nitrogen intermediates (RNI) by macrophages helps to control infection by Mycobacterium tuberculosis (Mtb). However, the protection is imperfect and infection persists. To identify genes that Mtb requires to resist RNI, we screened 10,100 Mtb transposon mutants for hypersusceptibility to acidified nitrite. We found 12 mutants with insertions in seven genes representing six pathways, including the repair of DNA (uvrB) and the synthesis of a flavin cofactor (fbiC). Five mutants had insertions in proteasome-associated genes. An Mtb mutant deficient in a presumptive proteasomal adenosine triphosphatase was attenuated in mice, and exposure to proteasomal protease inhibitors markedly sensitized wild-type Mtb to RNI. Thus, the mycobacterial proteasome serves as a defense against oxidative or nitrosative stress.

Mtb persistently infects about two billion people. The identification of pathways used by the microbe to resist elimination by the host immune response may suggest new targets for prevention or treatment of tuberculosis. During latent infection, the primary residence of Mtb is the macrophage. The antimicrobial arsenal of the activated macropha


Preliminary Written Report


For this week’s assignment, you are to complete and submit a preliminary (rough) draft of the written portion of your Project. In your preliminary report, remember to include the following sections:

• introduction,
• symptoms,
• diagnosis,
• cure,
• prevention,
• timeline, and
• a brief closing summary.


CHAPTER 3

GLOBALIZATION AND THE HEALTHCARE WORKFORCE

Leah E. Masselink

Learning Objectives

After completing this chapter, the reader should be able to

• describe the history and current trends in international migration of
physicians and nurses;

• enumerate the factors that motivate physicians and nurses to migrate to
other countries;

• discuss the implications of physician and nurse migration for sending and
receiving countries;

• understand the policy context and policy interventions that attempt to
manage physician and nurse migration; and

• explain the issues of ethical recruitment, visa regulation, credentialing,
and adaptation for managers of foreign-born and -trained physicians and
nurses.

Introduction

In an increasingly interconnected world, the movement of people and information across international borders has become a phenomenon that is often taken for granted. As skilled healthcare providers, physicians and nurses have had opportunities to seek employment internationally for several decades, and foreign-trained professionals are important parts of the healthcare systems in many countries. In the United States alone, about 25 percent of physicians are foreign born and educated and about 4 percent of nurses were educated overseas (Cooper and Aiken 2006; Aiken et al. 2004).

The implications of international migration of physicians and nurses are complex and have become a source of increasing debate in recent years. While physicians and nurses who migrate to other countries can benefit from better working conditions or salaries in their destinations, their movement can exacerbate inequalities in the worldwide distribution of healthcare workers. Migration of healthcare workers from developing countries has particularly far-reaching implications. These developing countries not only lose their investments in education and training, income tax revenue, and potential for national growth, but they also see adverse health effects on their populations. In nations where healthcare workforce shortages are already severe, the need to replace healthcare professionals who have left for other countries only further depletes the health system’s resources, funds that would otherwise go toward fighting diseases and promoting public health. In addition, the lack of highly skilled care providers prevents these countries from meeting their own needs for healthcare innovation and problem solving. These factors exacerbate the existing inequalities in healthcare between developed and developing countries.

Copyright 2008. Health Administration Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under U.S. or applicable copyright law.

EBSCO Publishing : eBook Academic Collection (EBSCOhost) – printed on 2/1/2022 4:15 PM via WESTERN KENTUCKY UNIVERSITY
AN: 237620 ; Fottler, Myron D., Fried, Bruce.; Human Resources in Healthcare : Managing for Success
Account: s8993066.main.ehost

Given that foreign-trained physicians and nurses play an important role
in many healthcare organizations in the United States, healthcare managers in
this country must understand several issues related to the globalization of the
healthcare workforce:

• In what areas does international migration of physicians and nurses occur? What can explain these patterns?

• What factors motivate the international migration of physicians and
nurses?

• What are the ethical and logistical implications of physician and nurse
migration for sending and receiving countries?

International migration of physicians and nurses is inherently difficult to manage because policies designed to direct and oversee it must balance two often competing objectives: (1) to protect the inherent right of people to migrate and (2) to ensure that quality healthcare services are available to all. This chapter describes past and current migration trends, causes, policy context, and responses. It also explores several international migration issues, such as ethical recruitment, visa regulation, credentialing, and adaptation. All of these topics are essential knowledge for U.S. healthcare managers.

History and Current Trends

Anecdotal accounts of international migration of physicians and nurses began to circulate in the 1960s. Initial reports mostly documented migration between developed countries, such as from Canada to the United States (BMJ 1968). In the 1970s, the World Health Organization (WHO) commissioned The Multinational Study of the International Migration of Physicians. This notable study found that, at the time, significant numbers of international medical graduates (IMGs) were practicing in the United States (about one in every five physicians), the United Kingdom (more than one in every four physicians), and Canada (one in every three physicians). Germany also had substantial numbers of migrant physicians, including many from Iran and the Middle East (Mejía 1978). In addition, the study reported that significant numbers of international nursing graduates (INGs) worked in the United States, European countries, and other developed nations. Sending countries (the countries from which healthcare professionals migrate) with particularly high proportions of nurses who go abroad to work include Haiti, Suriname, Hong Kong, Jordan, and the Philippines. In absolute numbers, more Filipino nurses were registered in the United States and Canada than in the Philippines in 1970 (Mejía 1978).

The characteristics of healthcare workforce migration have shifted since the WHO study was conducted in the 1970s. New sending countries have become significant sources of migrant physicians, including Egypt, Cuba, and nations in the Caribbean; sub-Saharan Africa; and the former Soviet Union. New receiving countries (the destinations of migrant healthcare professionals), such as the Persian Gulf states, have begun to draw physicians and nurses from all over the world, including Europe and India. Migration between the European Union and African countries has also increased (Martineau, Decker, and Bundred 2004). Some countries—particularly South Africa—have emerged as “holding grounds” for migrant workers who stay temporarily on their way to their final destination country (Vujicic et al. 2004).

According to Mullan (2005), the countries that send the largest numbers of physicians abroad are India, the Philippines, and Pakistan, while the countries that receive the greatest numbers of IMGs are the United States, the United Kingdom, Canada, and Australia. IMGs compose approximately 25 percent of the physician workforce in the United States, 28 percent in the United Kingdom, 23 percent in Canada, and 27 percent in Australia (Mullan 2005). In the United States, the three largest sending countries or regions for INGs are the Philippines, Canada, and Africa (especially South Africa and Nigeria). Between 1997 and 2000, 33 percent of foreign-born nursing-licensure applicants were Filipino, 22 percent were Canadian, and 7 percent were African (Buchan, Parkin, and Sochalski 2003).

Migration streams, particularly between English-speaking countries, appear to be well established: While IMGs make up more than 20 percent of the total physician workforces in the United States, the United Kingdom, Australia, and Canada, they represent only a tiny proportion of the physician workforces in France (3 percent) and Japan (1 percent) (Mullan 2005). In sub-Saharan Africa, rates of nurse migration are also markedly higher in Anglophone countries than in French- and Portuguese-speaking countries (Dovlo 2007). Many sending countries tend to have historical relationships with English-speaking receiving countries. For example, physicians from India and Pakistan make up the largest and third-largest groups, respectively, of IMGs in the United Kingdom, and doctors from the Philippines are the second-largest group of noncitizen IMGs in the United States (Mullan 2005).1


Many policymakers in both sending and receiving countries have expressed concern that the largest receiving countries draw significant proportions of their IMG workforces from lower-income countries. More than 75 percent of the IMGs in the United Kingdom come from lower-income countries, and other receiving countries have substantial proportions as well: Sixty percent of IMGs in the United States and about 40 percent of those in Canada and Australia are from developing nations (Mullan 2005).

Causes of International Migration

Determinants of physician and nurse migration are often discussed in terms of “push” and “pull” factors. Push factors motivate physicians and nurses to leave their home countries, while pull factors cause them to choose particular receiving countries. These determinants are chiefly analyzed within an economic framework that considers a variety of potential factors, including per capita gross domestic product, physician coverage, manpower production rates, rural/urban distribution of physicians and nurses, and workforce imbalances.

Push factors cited by the majority of studies include low pay, poor working conditions, political instability and insecurity, inadequate housing and social services, and lack of educational opportunities and professional development. Job dissatisfaction, lack of motivation, and weak professional leadership are also mentioned as contributing factors (Saravia and Miranda 2004). Pull factors, on the other hand, include opportunities for professional training, better job opportunities, and higher wages (Forcier, Simoens, and Giuffrida 2004). Other pull factors relate to workforce-supply issues that have created an imbalance between the demand for services and the supply of workers in receiving countries, such as aging of both the general population and the nursing workforce and the slowdowns in enrollment in training programs (Buchan and Sochalski 2004). The nursing workforces in receiving countries are vulnerable to such shortages, particularly with the opening of male-dominated careers to women (Marchal and Kegels 2003). IMGs and INGs are particularly needed in some receiving countries where domestically trained providers are reluctant to serve in certain capacities, such as in remote areas or in nursing homes.

Sending Country/Region Trends

Physician and nurse migration can be managed to varying degrees by sending countries. Some regions (such as sub-Saharan Africa and the Caribbean) continue to lose workers in the face of severe shortages, while other nations (such as Cuba, India, and the Philippines) purposely train surplus physicians and nurses for overseas employment. Still other countries (particularly China) are currently looking to shift into a training-for-export mode. This section sheds light on the diverse situations faced by sending countries and describes in detail the factors that contribute to each situation.

Brain Drain: Sub-Saharan Africa and the Caribbean

The situation in sub-Saharan Africa and the Caribbean is often referred to as
brain drain—the widespread, uncontrolled departure of physicians and
nurses from countries that already suffer healthcare worker shortages.

In sub-Saharan Africa, the largest sending countries are South Africa and
Nigeria. In 2005, nearly 7,000 South African physicians and more than 4,000
Nigerian physicians were practicing in the United States, the United Kingdom,
Canada, and Australia (Mullan 2005). Ghana has also experienced high rates
of physician and nurse emigration: In 2000, that country lost more practicing
nurses than the number of nursing graduates it produced (Dovlo 2007). As a
relatively wealthy sub-Saharan African state, South Africa is unique in that it
acts as both a sending and a receiving country for migrant physicians and
nurses, many of whom come from other countries in the region.

In Africa, among the factors that influence health professionals’ decisions to leave are low quality of life, high crime rates, conflict, political repression, and lack of educational opportunities for children. The HIV/AIDS epidemic has seriously depleted the healthcare workforce through death and attrition, and caring for growing numbers of patients with HIV/AIDS has overburdened the remaining providers. Nurses in this region are poorly paid, and this lack of adequate compensation also contributes to the workforce shortage. Sub-Saharan Africa suffers from a serious maldistribution of healthcare workers, with uneven supply between the public and private sectors, urban and rural areas, and tertiary and primary levels of care (Padarath et al. 2004).

A lack of higher education and career-development opportunities is another major push factor in this region. This dearth reflects a pattern of underinvestment in higher education by governments and outside donors. Health-professional education and training not only subsist on very limited material resources but are also plagued by a shortage of qualified teachers.

Similarly, countries in the Caribbean are overwhelmed by extremely high rates of HIV infection that are second only to the epidemic in sub-Saharan Africa. This region has also experienced crippling losses of nurses in recent years: 42 percent of all nursing positions across the Caribbean are vacant, and the lack of nursing educational capacity serves only to perpetuate the massive losses of nursing educators and experienced nurses. Jamaica is particularly affected, with a 58 percent average nursing vacancy rate in 2003. Many Jamaican nurses left to work in the United States, the United Kingdom, and Canada, and Jamaican healthcare leaders have begun to recruit from other countries in the Caribbean to make up for losses (Salmon et al. 2007).


Strategic Deployment: Cuba, the Philippines, and India

Some developing countries train surplus physicians and nurses for overseas
employment, and both state and business interests promote and manage
this practice. Cuba has a long-standing program of physician deployment,
and the Philippine government has worked to manage nurse migration for
many years. Recently, strategic deployment programs have also arisen in
India.

For several decades now, Cuba has made the provision of healthcare workers to developing countries a part of its foreign policy, sending physicians to developing countries as participants in a Peace Corps–style international medical-aid program (Feinsilver 1989). These efforts are part of a broader campaign by the Cuban government to promote its political agenda and to position itself as a “world medical power.” Dozens of countries have received Cuban physicians over the years, including Algeria, South Africa (Lee 1996), and more recently Venezuela (Muntaner et al. 2006). Cuban physicians who participate in the program often provide services in isolated rural areas and are often involved in training their host countries’ indigenous healthcare workers (Feinsilver 1989).

The Philippine government has been particularly active in establishing policies that aim to make the country the niche producer of nurses in the global economy (Ball 1996). The Philippines produces about 20,000 new nurses every year (Lorenzo et al. 2007), and the vast majority of these graduates eventually find work overseas: In 2004, 85 percent of all Filipino nurses practiced abroad (Aiken et al. 2004). A government agency regulates recruitment of Filipino overseas workers and processes documents for those bound to work in other countries. The emergence of nursing as a pathway to migration has led to unprecedented demand for nursing education in the Philippines. The number of nursing schools has grown exponentially in the past few decades, from 40 schools in the 1970s to 460 schools in 2006 (Lorenzo et al. 2007). This growth has led to concerns about the quality of nursing education, as schools compete with each other for faculty and hospital training space (Lorenzo et al. 2007).

Historically, India has been one of the largest sending countries of physicians to developed nations, including the United States and the United Kingdom (Mullan 2006). In recent years, it has also become a popular source country for nurses. Since the 1990s, it has moved from sixth to second position (after the Philippines) among countries that send nurses to the United States. Like the Philippines, India has a huge overall labor surplus, although it also has a very low nurse-to-population ratio. It has also become the site of increasing commercial activity around nursing education and migration. Indian hospitals have become involved in recruiting and training nurses for overseas markets, and local recruitment agencies that partner with U.S.-based recruiters have appeared in many urban areas. In recent years, some state governments have also begun to engage in international deployment of nurses (Khadria 2007).

The most frequently cited reason for the strategic deployment of physicians and nurses is the remittance income that migrant workers send to their home countries. Remittances can be a substantial source of revenue for sending countries. For example, Filipino migrant workers remitted $10.7 billion in 2005 (Lorenzo et al. 2007). Remittance income is often considered a potentially positive outcome of emigration. However, while such income may offset sending countries’ financial losses, it may not make up for the staffing issues and poor outcomes associated with workforce migration.

Up-and-Coming Player: China

China is a relative newcomer to the global nursing market. It has sent nurses abroad for about 15 years, beginning when the government deployed groups of English-speaking nurses to Singapore and Saudi Arabia under temporary government-arranged contracts (Fang 2007). Since the early 2000s, this migration has shifted to countries such as Australia and the United Kingdom, where it is usually arranged by private agencies. U.S. healthcare organizations have begun to express interest in recruiting nurses from China.

For some Chinese nurses, the desire to seek employment abroad is influenced by several domestic factors. First, China has not invested enough in healthcare to employ all of its trained and educated nurses. Like the Philippines, China has a surplus of nurses relative to the number of budgeted positions. Many nurses are unable to find work, or they are forced to retire early to make room for new graduates who are entering the workforce. Also, China has more physicians than nurses, contrary to recommendations by the WHO. In this context, overseas markets are becoming a desirable alternative for some Chinese nurses.

Consequences for Receiving Countries

The presence of IMGs and INGs has several important consequences for receiving countries. Some of the consequences of physician and nurse migration relate to larger issues of recruitment and retention. International recruitment has been described as a quick fix for recruitment and retention problems in receiving countries, one that allows their health systems to avoid developing domestic solutions to unmet needs. International migration may help receiving countries to fill positions in areas that are not as attractive to domestic workers. This leads to concerns that foreign-trained professionals may be subject to exploitation or may be forced to work in positions that are below their expertise—a phenomenon referred to as “brain waste” (Marchal and Kegels 2003).


The effects of having immigrant physicians and nurses on accessibility and quality of care are unclear: Some suggest that the quality and safety of care provided by internationally trained providers may be cause for concern, while others argue that the presence of these professionals may improve access to care, lower prices, and induce competition and higher quality (Forcier, Simoens, and Giuffrida 2004). The “safety net” use of immigrant healthcare workers has been demonstrated to be a real phenomenon (Forcier, Simoens, and Giuffrida 2004). The presence of immigrant health workers may prevent receiving healthcare systems from solving their own training and staffing problems. For example, while U.S. hospitals hire thousands of IMGs each year, thousands of domestic medical-school applicants are turned away (Martineau, Decker, and Bundred 2004).

The Policy Context

International migration occurs in the context of several important trade agreements. One such agreement that could affect future migration dynamics is the General Agreement on Trade in Services (GATS), which was implemented in 1995. GATS is an international treaty that governs the trade of services, including health services, among member countries of the World Trade Organization. GATS has three main objectives: (1) to liberalize trade in services, (2) to encourage economic growth through liberalizing trade in services, and (3) to increase the participation of developing countries in the world trade in services. The four modes of trade governed by GATS are (1) cross-border supply (services provided by workers in one country for organizations in another country), (2) consumption abroad (including medical tourism and education of foreign students), (3) commercial presence (investment of capital from one country into another), and (4) movement of natural persons (temporary cross-border migration of workers to provide services in another country [Kingma 2006]). While the GATS provision for temporary migration has caused concern that it would encourage further migration of health workers from developing countries to developed countries, this element is still being negotiated, and its final effects remain unclear (Kingma 2007).

Another agreement that particularly affects migration patterns in the United States is the North American Free Trade Agreement (NAFTA), which was implemented in 1994. NAFTA provides for the movement of workers between Canada, the United States, and Mexico, including special visa categories and mutual recognition of nurse licensure in the United States and Canada. This agreement has raised Canada’s profile as a sending country of nurses to the United States, but movement between the two countries has been mostly unidirectional: About 15,000 Canadian nurses have moved to the United States under NAFTA, but relatively few U.S.-trained nurses have moved to Canada (Kingma 2006; Mautino 2003).


Policy Responses

A broad variety of policy initiatives have been proposed and implemented by sending and receiving countries to manage international migration of physicians and nurses. These include programs instituted by worldwide bodies such as the WHO and the International Council of Nurses (ICN), domestic policy changes in sending and receiving countries, government-to-government bilateral agreements, and proposed compensation schemes. Some countries or regions have adopted unique policies to manage the effects of physician and nurse migration: The Caribbean, as a sending region, has adopted a program called Managed Migration, and the United Kingdom, as a receiving country, has established the “Code of Practice on International Recruitment.”

WHO Activities

The WHO (2007) has developed a variety of initiatives to manage the migration of healthcare workers. It is working with the Global Health Workforce Alliance Task Force to support efforts to scale up health-worker education, particularly in countries faced by workforce crises. It also provides technical support to countries and assists regional human resources for health observatories. Additionally, the WHO supports the Treat, Train, Retain (TTR) initiative, begun in 2006 to curb the effects of HIV/AIDS on the healthcare workforce and health systems in low- and middle-income countries. The goals of TTR are threefold: (1) to provide treatment, prevention, and support to health workers affected by HIV/AIDS; (2) to train providers (including community health workers) to maximize existing capacity to treat HIV/AIDS; and (3) to retain health workers in rural areas and the public sector in underresourced countries. The WHO will provide assistance to participating countries in developing TTR plans and budgeting for proposed changes, but TTR’s implementation and financing will be managed by individual countries.

ICN Statement

The ICN—the federation of national nurses associations (e.g., American Nurses Association, Philippine Nurses Association)—has developed a position statement on ethical recruitment of nurses to guide the recruitment efforts between its member countries. While acknowledging nurses’ inherent right to migrate, the statement also calls for receiving countries to work toward building self-sustainable, domestically trained nursing workforces. The statement also aims to protect migrant nurses, calling for several measures such as good-faith contracting, freedom of employment and association, and fair pay and working conditions (ICN 2007).

Domestic Policies in Sending Countries

Some sending countries have implemented domestic policy changes to reduce the effects of push factors that motivate physicians and nurses to seek overseas employment. These changes include improvement in pay, career opportunities, and working conditions; provision of incentives to induce overseas workers to return home; and the development of private-sector opportunities. Other measures focus more specifically on medical education, including pre-education screening of candidates likely to stay in-country, shortening of domestic training programs, and adaptations of curriculum to local conditions.

Still other policies aim to use financial disincentives to keep workers in-country, requiring emigrants to pay fees upon departure. For example, Eritrea has a bond program in which departing physicians are required to make upfront payments that guarantee their return from studies in South Africa (Marchal and Kegels 2003). This type of system could be particularly useful if revenues generated were used to fund human resources development in sending countries (Saravia and Miranda 2004).

Some receiving countries have adopted domestic policy changes to address
the underlying human resources imbalances that contribute to the demand
for foreign-trained workers. In many developed countries, nursing short-
ages are exacerbated by difficulties in retaining domestically trained
nurses—difficulties that are often related to poor working conditions and
low salaries (Janiszewski Goodin 2003). Turnover rates for nurses in U.S.
hospitals were estimated at between 10 percent and 30 percent in 2000
(HSM Group 2002). To improve domestic retention, receiving countries,
such as the United Kingdom and Australia, have implemented programs to
recruit and retain domestic healthcare workers (Martineau, Decker, and
Bundred 2004). Other countries have begun recruiting nonconventional
workers, such as firefighters, to the healthcare field (Marchal and Kegels
2003).

In 2002, the U.S. Congress passed the Nurse Reinvestment Act, a
piece of legislation that uses a combination of expanded eligibility for loan
repayment, education vouchers, and other measures to improve retention of
nurses (Andrews 2004). While this legislation represents a positive step in
improving retention of domestically trained nurses, its funding stream has been
subject to frequent cuts in the past few years, so its overall impact is unclear
(Janiszewski Goodin 2003).

Some sending and receiving countries have attempted to regulate the migration
of healthcare workers between them by signing government-to-government
bilateral agreements. Under such an agreement, a receiving country pledges to
underwrite the costs of training additional staff; to recruit staff for a fixed
period (often providing training before staff return to the sending country); or
to recruit surplus staff from a sending country (Buchan 2007). For example,
the United Kingdom has bilateral agreements with the Philippines and Spain
that allow the United Kingdom to recruit nurses from these two countries
for temporary work in the National Health Service (Kline 2003). Bilateral
agreements can help to manage the flow of physicians or nurses between

Human Resources in Healthcare

Domestic Policies in Receiving Countries

Government-to-Government Bilateral Agreements



sending and receiving countries by mandating short-term rather than permanent
migration.

Another policy intervention that has been proposed requires receiving countries
to compensate sending countries for the financial losses associated with
worker migration. Various versions of this plan call for remuneration of the
costs of educating migrant workers, for assistance with human resources
development in sending countries, and for additional compensation for sending
countries’ lost tax revenue. Although well intended, these measures are difficult
to implement because administrative costs would likely be high and because
determining payment amounts, procedures, and enforcement would
present further challenges to sending and receiving countries (Marchal and
Kegels 2003).

The Managed M

Ecology homework help

1. Title of the paper (include the link if it is one you chose rather than one I assigned).

2. List 3 observations from the paper that inspired the authors to conduct the experiment or write this paper. DO NOT tell me what they did. Tell me WHY they are doing the study: 3 reasons that should be based on previous research and are usually addressed in the introduction.

3. What was/is their overarching question?

4. What is their hypothesis?

5. What was their independent variable?

6. What was their dependent variable?

7. EXPLAIN what they did in their experiment in your own words. This will require some detail; show me you understood everything they did.

8. What was their control? Did they have one?

9. Summarize the results of their data.

10. Summarize why this study matters. What does it tell us? Why should we care?

Ecology homework help

IMGs and INGs, ethical considerations

Read the corresponding chapter and both current topic articles for week 3, then respond to the following discussion question:

In your opinion, what are the ethical issues that healthcare leaders and managers must consider when recruiting IMGs and INGs? Why are these issues important? What role, if any, do empathy and cultural awareness training have?

· You must cite your source at the end of your post. Use APA format.

Ecology homework help

The ecology and evolution of seed predation by Darwin’s finches on
Tribulus cistoides on the Galápagos Islands

SOFÍA CARVAJAL-ENDARA,1,10 ANDREW P. HENDRY,1,2 NANCY C. EMERY,3 COREY P. NEU,4

DIEGO CARMONA ,5 KIYOKO M. GOTANDA ,6 T. JONATHAN DAVIES ,1,7 JAIME A. CHAVES ,8 AND
MARC T. J. JOHNSON 9

1Department of Biology, McGill University, 1205 Avenue Docteur Penfield, Montréal, Quebec H3A 1B1 Canada
2Redpath Museum, McGill University, 859 Sherbrooke Street West, Montréal, Quebec H3A 0C4 Canada
3Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado 80309-0334 USA
4Department of Mechanical Engineering, University of Colorado Boulder, Boulder, Colorado 80309-0427 USA
5Departamento de Ecología Tropical, Campus de Ciencias Biológicas y Agropecuarias, Universidad Autónoma de Yucatán, Mérida,
Yucatán, México
6Department of Zoology, University of Cambridge, Cambridge, CB2 3EJ United Kingdom
7Biodiversity Research Centre, Departments of Botany, Forest and Conservation Sciences, University of British Columbia, 2212 Main
Mall, Vancouver, British Columbia V6T 1Z4 Canada
8Colegio de Ciencias Biológicas y Ambientales – Extensión Galápagos, Universidad San Francisco de Quito, Campus Cumbayá, Casilla
Postal 17-1200-841, Quito, Ecuador
9Department of Biology, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6 Canada

Citation: Carvajal-Endara, S., A. P. Hendry, N. C. Emery, C. P. Neu, D. Carmona, K. M. Gotanda,
T. J. Davies, J. A. Chaves, and M. T. J. Johnson. 2020. The ecology and evolution of seed predation by
Darwin’s finches on Tribulus cistoides on the Galápagos Islands. Ecological Monographs 90(1):e01392.
10.1002/ecm.1392

Abstract. Predator–prey interactions play a key role in the evolution of species traits
through antagonistic coevolutionary arms races. The evolution of beak morphology in the
Darwin’s finches in response to competition for seed resources is a classic example of
evolution by natural selection. The seeds of Tribulus cistoides are an important food source for
the largest ground finch species (Geospiza fortis, G. magnirostris, and G. conirostris) in dry
months, and the hard spiny morphology of the fruits is a potent agent of selection that
drives contemporary evolutionary change in finch beak morphology. Although the effects of
these interactions on finches are well known, how seed predation affects the ecology and
evolution of the plants is poorly understood. Here we examine whether seed predation by
Darwin’s finches affects the ecology and evolution of T. cistoides. We ask whether the
intensity of seed predation and the strength of natural selection by finches on fruit defense traits
vary among populations, islands, years, or with varying finch community composition (i.e.,
the presence/absence of the largest beaked species, which feed on T. cistoides most easily).
We then further test whether T. cistoides fruit defenses have diverged among islands in
response to spatial variation in finch communities. We addressed these questions by
examining seed predation by finches in 30 populations of T. cistoides over 3 yr. Our study reveals
three key results. First, Darwin’s finches strongly influence T. cistoides seed survival,
whereby seed predation varies with differences in finch community composition among
islands and in response to interannual fluctuations in precipitation. Second, finches impose
phenotypic selection on T. cistoides fruit morphology, whereby smaller and harder fruits
with longer or more spines exhibited higher seed survival. Variation in finch community
composition and precipitation also explains variation in phenotypic selection on fruit
defense traits. Third, variation in the number of spines on fruits among islands is consistent
with divergent phenotypic selection imposed by variation in finch community composition
among islands. These results suggest that Darwin’s finches and T. cistoides are experiencing
an ongoing coevolutionary arms race, and that the strength of this coevolution varies in
space and time.

Key words: adaptive divergence; coevolutionary arms race; geographic mosaic; phenotypic selection;
plant defense; trophic interactions.

INTRODUCTION

Antagonistic interactions play a major role in the evolutionary
diversification of traits that mediate species
interactions (Thompson 1999, Vamosi 2005, Paterson

Manuscript received 20 December 2018; revised 8 May 2019;
accepted 9 July 2019. Corresponding Editor: Todd M. Palmer

10 E-mail: sofia.carvajalendara@mail.mcgill.ca

Article e01392; page 1

Ecological Monographs, 90(1), 2020, e01392
© 2019 by the Ecological Society of America

et al. 2010). Plant–herbivore interactions have long been
used as a model to understand the evolution and ecology
of antagonistic interactions (Ehrlich and Raven 1964,
Fritz and Simms 1992, Agrawal 2011). Plants employ a
wide diversity of mechanical and chemical defense
strategies to avoid the negative effects of herbivores,
including seed predators (Crawley 1983, Carmona et al.
2011). In turn, herbivores and predators use a variety of
strategies to counteract plant defenses, including
behavioral, morphological, and physiological offensive traits
(Karban and Agrawal 2002). Selection that favors traits
that better protect plants against herbivores and
predators can lead to contemporary evolutionary changes in
plant defense traits (Agrawal et al. 2012, Züst et al.
2012, Didiano et al. 2014). Here, we study the effect of
seed predation by Darwin’s finches on plant ecology,
and its potential role in the evolution of seed defense
traits by natural selection.
The interaction between Darwin’s finches and their food
plants on the Galápagos Islands is a famous and well-studied
example of contemporary evolution (Grant and Grant
2014). Previous studies of a group of Darwin’s finches known
as ground finches show that evolutionary changes in beak
size and shape are driven by the availability and distribution
of seeds (Lack 1947, Grant 1986, Grant and Grant 1995).
Ground finches are primarily seed predators and poor seed
dispersers; they usually crush the seeds before ingesting them,
and their feces and gut samples rarely contain viable seeds
(Buddenhagen and Jewell 2006, Guerrero and Tye 2009). In
general, ground finches are opportunistic feeders that eat a
large variety of seed species, but when resources are limited
following droughts, finches become dependent on the seeds
of a smaller number of plant species that are often harder
and more difficult to open (Grant and Grant 1995, De León
et al. 2014). The ability to exploit those seeds is largely
influenced by beak size and shape (Lack 1947, Grant and Grant
1995, De León et al. 2011). Because seeds are a major part of
their diet, and because ground finches exhibit preferences for
certain seeds, finches are expected to have an important
effect on the ecology and evolution of plants on the
Galápagos Islands. However, despite the well-developed
literature on the interactions between Darwin’s finches and
plants (Boag and Grant 1981, Schluter and Grant 1984, Price
1987, Grant and Grant 1999, De León et al. 2014), the
ecological and evolutionary consequences of seed predation by
finches on plants remain largely unexplored.
The effects of seed predation by finches on plants on
the Galápagos Islands are expected to be mediated by
both climate and the strength of species interactions.
Predation pressure by finches on seeds during periods with
high precipitation might be negligible owing to the high
production of seeds and the increased availability of other
food resources such as insects (Grant and Boag 1980,
Boag and Grant 1984, Price 1985, Gibbs and Grant
1987). However, during extended droughts, when seed
production is reduced, selective seed predation by finches
(Grant 1986, De León et al. 2014, Grant and Grant
2014) could greatly influence seed survival, plant
distributions, and the evolution of seed defense traits.
Selection imposed by finches on seed defense traits is
expected to play the most important role for plant species
that are commonly exploited by finches. Caltrop (Tribulus
cistoides) is one of the main food sources for some species
of ground finches during dry periods, and it is credited
with driving the evolution of beak morphology in the
Medium Ground Finch (Geospiza fortis) during periods
of drought (Grant and Grant 2006, 2014). The fruits of
T. cistoides possess morphological features thought to
provide defenses against predation, including multiple
long spines and a hard protective tissue (Grant 1981;
Fig. 1). Grant (1981) showed that, within a T. cistoides
population on Daphne Major island, fruits with two
spines were eaten more frequently than fruits with four
spines, suggesting that finches impose selection on
T. cistoides fruit morphology. However, selection on
T. cistoides fruits has not been assessed across years or in
populations on other islands, and the association between
fruit morphology and seed survival in response to finch
predation across the archipelago remains unclear.
An additional factor that might influence the effects of
seed predation by finches on plants on the Galápagos
Islands is variation in the composition of finch
communities. Ground finches are broadly distributed within the
archipelago, and most of the islands harbor several species
that differ in beak size and shape. Among ground finches,
only the Large Ground Finch (G. magnirostris), the Large
Cactus Finch (G. conirostris), and the Medium Ground
Finch (G. fortis) are able to exploit T. cistoides seeds (Grant
1981, Grant and Grant 1982). These species, however, are
not uniformly distributed across the islands. The contemporary
faunas of some major islands, such as Santa Cruz and Isabela,
include one of the large-beaked species (G. magnirostris or
G. conirostris) as well as the small-beaked G. fortis
(Fig. 2), whereas others, such as Floreana and San Cristóbal,
lack the large-beaked species. This spatial variation in the
finch community could have large ecological and evolutionary
consequences because G. magnirostris is superior at
feeding on T. cistoides seeds relative to G. fortis (Grant
1981), which could lead to divergent patterns of predation
and selection imposed on fruit morphology across the
Galápagos Islands.
Our study focuses on understanding the effects of seed
predation by Darwin’s finches on the ecology and
evolution of T. cistoides. We asked the following three
questions: (1) Does seed predation by finches vary among
populations, islands, finch community composition, and
years? We expected seed predation to vary among years
due to variation in annual precipitation, and also in
association with finch community composition (small-beaked
finches are expected to eat fewer seeds of T. cistoides during
wetter conditions). (2) Do finches impose selection on
T. cistoides fruit morphology, and does selection vary
among populations, islands, years, and with finch
community composition? We expected the strength of
selection on fruit morphology to vary over time in
correspondence with precipitation, and spatially among

Article e01392; page 2 SOFÍA CARVAJAL-ENDARA ET AL. Ecological Monographs
Vol. 90, No. 1

islands in association with finch community composition:
communities with large-beaked finch species, which eat seeds
more readily, likely impose different selection on fruit morphology
compared to communities with only small-beaked
finches. (3) Does T. cistoides fruit morphology differ
among islands with contrasting finch community
composition (i.e., the presence/absence of large-beaked finches)?
We expected spatial variation in fruit morphology to

FIG. 1. (a) Tribulus cistoides fruits (schizocarps), from left to right: a green immature fruit, a mature dry fruit, and a fruit
attached to a maternal plant. (b) Two sets of dry mericarps, corresponding to two fruits of different plants, showing variation in size
and number of spines. Mericarps in the upper set are larger and have four spines while mericarps in the lower set are smaller and
have only two spines. (c) Opened mericarp to expose seed compartments, one empty compartment and three compartments with
seeds inside. (d) Geospiza fortis (Medium Ground Finch) holding a T. cistoides mericarp. Mericarps showing marks observed
(e) when seeds are eaten by finches, (f) when seeds are eaten by insects, and (g) when seeds germinate. Photo credits: Marc T. J.
Johnson (a [left and middle], c, and f), Andrew P. Hendry (b), Kiyoko M. Gotanda (d and e), and Sofía Carvajal-Endara (a [right] and g).

February 2020 DARWIN’S FINCHES AS AGENTS OF SELECTION Article e01392; page 3

reflect spatial variation in finch community composition,
which would be consistent with adaptive responses to
divergent selective pressure. To address these questions,
we examined variation in T. cistoides fruit morphology
and patterns of seed predation in 30 natural populations
across seven islands of the Galápagos archipelago over
3 yr, and performed a seed predation experiment in a
population on one of the islands. Our study is one of the first
to address the potential effect of seed predation by
Darwin’s finches on the evolution of Galápagos plants. We
consider the importance of these results for understanding
the potential coevolutionary interactions between
Darwin’s finches and the plants whose seeds they consume.

METHODS

Study site and system

The Galápagos archipelago is located in the Pacific
Ocean approximately 1,000 km west of the Ecuadorian
coast in South America, and it comprises 14 major
islands and many small islets (Geist 1996). We restricted
our study to seven islands that vary in finch community
composition (Fig. 2), and that harbor at least one of the
three finch species that consume T. cistoides seeds:
G. fortis, G. conirostris, and G. magnirostris. The diet of
these three finch species varies according to the size and
shape of their beaks, as well as the spatial and temporal
availability of seeds (Schluter and Grant 1984, Grant
and Grant 1999, De León et al. 2014). During dry
periods, especially the droughts that accompany La Niña
events, preferred foods are limited and, hence,
T. cistoides seeds become a main food source for these finch
species (Grant and Grant 2014).
Tribulus cistoides (Zygophyllaceae) is a perennial prostrate
herb native to subtropical and tropical Africa and
now widespread in tropical and subtropical arid
coastal habitats around the world (Porter 1972). Broadly
distributed across the Galápagos archipelago, it is
usually found in arid lowlands and coastal regions, where it
grows in discrete patches close to roads, trails, and
shorelines (Porter 1971). Tribulus cistoides plants can
flower at any time of year on the Galápagos Islands, but
most of their vegetative growth occurs during the wet
season (from January to May). They produce fruits called
schizocarps (Fig. 1a), which contain five individual
segments referred to as mericarps that typically separate
from one another as the fruit dries (Fig. 1b) (Wiggins
and Porter 1971). Each T. cistoides mericarp is a hard
fibrous structure that includes from one to seven seeds
contained within individual compartments (Fig. 1c).
Mericarps typically have four spines (two upper and two
lower sharp protuberances), but the size and position of
spines vary greatly among individual plants, and some
mericarps completely lack some or all spines (Fig. 1b).
The spiny mericarps are also a means of seed dispersal
(Porter 1972); fruits adhere easily to animals, such as the
feet of seabirds (Wiggins and Porter 1971). Ocean
currents and humans are considered important vectors of
long-distance dispersal, with fruits carried long distances
attached to shoes and rubber tires
(Holm et al. 1977).
To extract the seeds, finches pick up mericarps from
the ground after they have dropped from the plant. The
finches often hold the mericarp laterally between their
mandibles and apply pressure by closing their beak,
moving the upper and lower mandibles sideways relative to each
other to crack the mericarp wall, sometimes stabilizing
the mericarp against a rock or the ground (Fig. 1d, see
Video S1). The mericarps are very durable and long-lived,
and this, combined with the very distinct damage left by
finch predation, makes it possible to determine which
mericarps have been depredated even months after a
predation event. Specifically, finches remove the ventral
surface of the hard mericarp tissue protecting the seeds,
exposing the empty seed compartments from which
seeds are removed (Fig. 1c), often one compartment at a
time (Video S1) (Grant 1981). Mericarps depredated by
finches (Fig. 1e) are easily distinguished from mericarps

FIG. 2. Map showing the seven islands of the Galápagos
archipelago where Tribulus cistoides fruits were sampled. Black
and blue identify the islands where large-beaked ground finches
are present: the Large Ground Finch (Geospiza magnirostris) is
present on Isabela and Santa Cruz and the Large Cactus Finch
(G. conirostris) is found on Española. Orange identifies the
islands where these large-beaked finches are absent. The Medium
Ground Finch (G. fortis) is present in all visited islands
except Española.


consumed by insects, which make smaller circular “drill”
holes (Fig. 1f), and from mericarps from which seeds
have germinated, which appear as empty seed compartments
that are still partially enclosed by the mericarp
wall (Fig. 1g), without the rough damage characteristic
of seed predation by finches (Fig. 1e). Other than
finches and insects, no other common predators of
T. cistoides seeds are found on the Galápagos Islands.
Unopened mericarps of T. cistoides were found in the
gizzard contents of a Galápagos dove (Zenaida galapagoensis);
however, T. cistoides fruits are not a typical
part of the diet of this species (Grant and Grant 1979).

Population sampling and experimental design

To explore impacts of seed predation by finches, we
sampled nearly 7,000 mericarps from 30 T. cistoides
populations across seven islands of the archipelago over
3 yr (2015–2017). Considering only ground finch species
that consume T. cistoides seeds, finch seed-predator
communities on three of the selected islands (Santa
Cruz, Isabela, and Española) include large-beaked finch
species (G. magnirostris or G. conirostris), whereas finch
communities on the other four islands (San Cristóbal,
Floreana, Baltra, and Seymour Norte) lack large-beaked
finch species (Fig. 2). The medium-beaked species,
G. fortis, is present on all sampled islands except
Española (Fig. 2). Sampling was performed between the
months of February and March, corresponding to the
end of the dry season and beginning of the wet season
(Fig. 3a), which is when the finches’ preferred food is
expected to be most scarce and their consumption of
T. cistoides seeds becomes highest. On four of the islands
(Santa Cruz, Isabela, San Cristóbal, and Floreana), we
repeated sampling annually from 2015 to 2017. During
this period, the archipelago experienced strong climatic
variation, including an El Niño event that occurred in
2015 (Stramma et al. 2016) and resulted in higher
precipitation relative to the preceding and subsequent years
(Fig. 3b).
The number of T. cistoides populations sampled varied
among islands (one to eight populations) due to spatial
variation in the abundance of plants, with a
“population” considered to be a discrete patch of
T. cistoides plants separated by at least 500 m from any other
patch. Information about the populations sampled each
year (island, geographic coordinates) is provided in
Appendix S1: Table S1. From each population, we
collected approximately 100 mericarps chosen haphazardly
across the area; we made every effort to select mericarps
“blindly” to avoid biases, so that mericarps represented a
random subset of the morphological traits present in the
population as much as possible. Most mericarps are
expected to be from the previous season, but it is
possible that some mericarps were >1 yr old. A total of 6,391
mericarps were collected across all islands, populations,
and years. For each mericarp, we used digital calipers to
measure mericarp length (mm), width (mm), and the
distance between the tips of the upper spines (upper
spine size, mm) located toward the distal end of the
mericarp, and we noted the presence or absence of lower
spines and the number of seeds removed by finches
(Fig. 4a). To estimate the total number of seeds
originally produced in each mericarp, we opened and counted
the number of seeds in 752 mericarps, collected from five
populations on Santa Cruz island in 2015. We evaluated
the relationship between the number of seeds per
mericarp and mericarp morphology by fitting the following
allometric equation: number of seeds = log(length) +
log(width) + log(length) × log(width). We then used this
model to predict the total number of seeds per mericarp
(R² = 0.48).
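The seed-count model above can be sketched in Python. This is a minimal illustration, not the authors' R code. It assumes the equation is read in R formula style as an ordinary linear model with an intercept, uses natural logarithms, and fits the model by ordinary least squares through the normal equations. The function names and the calibration data in the usage block are hypothetical; the study's R² of 0.48 cannot be reproduced without the original 752 measurements.

```python
import math


def design_row(length_mm, width_mm):
    """Predictors for one mericarp: intercept, log(L), log(W), log(L) * log(W)."""
    log_l, log_w = math.log(length_mm), math.log(width_mm)
    return [1.0, log_l, log_w, log_l * log_w]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_seed_model(lengths, widths, seed_counts):
    """OLS fit of seeds ~ log(L) + log(W) + log(L):log(W) via (X'X) beta = X'y."""
    rows = [design_row(length, width) for length, width in zip(lengths, widths)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, seed_counts)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_seeds(beta, length_mm, width_mm):
    """Predicted number of seeds for a mericarp of the given length and width."""
    return sum(b * x for b, x in zip(beta, design_row(length_mm, width_mm)))


def r_squared(beta, lengths, widths, seed_counts):
    """Coefficient of determination of the fitted model on the calibration sample."""
    preds = [predict_seeds(beta, length, width) for length, width in zip(lengths, widths)]
    mean_y = sum(seed_counts) / len(seed_counts)
    ss_res = sum((y - p) ** 2 for y, p in zip(seed_counts, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in seed_counts)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical calibration data (length mm, width mm, seed count); not the study's data.
    lengths = [6.1, 7.4, 8.0, 9.2, 10.3, 6.8, 8.7, 7.9]
    widths = [4.2, 5.0, 4.6, 5.9, 5.6, 4.9, 4.3, 6.1]
    seeds = [2, 3, 3, 5, 5, 3, 3, 4]
    beta = fit_seed_model(lengths, widths, seeds)
    print(predict_seeds(beta, 8.0, 5.0), r_squared(beta, lengths, widths, seeds))
```

Because the model has an intercept and each predictor is only rescaled when the log base changes, the fitted values do not depend on the log base; only the coefficients do. The normal equations are adequate for four parameters, but a QR-based solver is numerically more robust.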
To test whether there was variation in fruit morphology
among individual plants for selection to act upon,
we sampled mericarps from two T. cistoides populations
(AB and EG) on Santa Cruz island during February
2015 (see geographic information in Appendix S1:
Table S1). From each population, we sampled 15
individual plants, from each of which we collected four
complete (i.e., uneaten) and mature fruits (schizocarps), with
each schizocarp having four to five mericarps. In total,

FIG. 3. Variation in (a) monthly and (b) annual precipitation
(mm) from 2014 to 2017 on Santa Cruz island. Precipitation
data were obtained from a meteorological station at the
Charles Darwin Research Station (CDRS).


we sampled 583 mericarps for measurement of morphological
traits including length, width, upper spine size,
presence/absence of lower spines, and mericarp mass (to
the nearest milligram using a digital balance GEM20;
Smart Weigh, Jintan, China).
To experimentally test whether finches impose selection
on mericarp morphology, we performed a seed
predation experiment during March 2016. First, we
collected 600 mature and intact mericarps from a
T. cistoides population (EG) located on Santa Cruz island
(see geographic information in Appendix S1: Table S1).
We measured four traits from each mericarp (length,
width, upper spine size, and presence/absence of lower
spines), and gave each mericarp a unique mark with
indelible ink so mericarps could be individually
identified. We also applied an experimental removal of spines
from a haphazard subset of the 400 mericarps by
clipping either one or both of the upper spines, which
allowed us to experimentally test the functional role of
spines in defense. The marked mericarps were then
exposed to natural finch predation on 40 circular plastic
trays (~15 cm in diameter). The trays were placed across
the area where the mericarps were collected, at least
30 cm apart from each other, and were monitored every
three days. The mericarps were recovered after 30 d.
Finally, to evaluate the relationship between mericarp
morphology and hardness, we used 102 mericarps
collected in 2017 from three populations on Isabela island
and seven populations on Santa Cruz island
(Appendix S1: Table S1). For each mericarp, we
measured hardness (0–100 value on a Shore D scale;
Pampush et al. 2011) using a handheld durometer (Asker,
Super Ex, Type D, Kyoto, Japan). As the structure of the
mericarp wall varies over its surface (Fig. 4b), we
measured hardness at six locations on each mericarp (see
detailed information in Appendix S2: Fig. S1). In
addition, on each mericarp, we measured six morphological
traits (length, width, depth, upper spine size, longest
spine length, and spine position; Fig. 4a).

Statistical analyses

All statistical analyses were performed using R v. 3.4.2
(R Development Core Team 2008).

Does seed predation by finches vary among populations,
islands, finch community composition, or years?
We used logistic linear mixed-effects models with the
function glmer in the lme4 package v. 1.1-14 (Bates et al. 2015)
to model the proportion of seed predation per
population (proportion of mericarps with one or more seeds
removed by finches). This model was fit as follows:
predation per population = year + finch community
composition + year × finch community composition +
island + error. Year, finch community composition, and
their interaction were treated as fixed effects, whereas
island was included as a random effect. Finch community
composition was categorized as 0 on islands where
large-beaked finch species (G. magnirostris and
G. conirostris) were absent (Floreana, San Cristóbal,
Baltra, and Seymour Norte), and 1 on islands where
large-beaked finch species were present (Isabela, Santa
Cruz, and Española). To examine the association of
precipitation with seed predation during our study, we fit a
similar model in which we replaced the fixed factor year
with the total annual precipitation (mm) registered
during the year that preceded each sampling. Precipitation
measurements, obtained from a meteorological station
placed on Santa Cruz island at the Charles Darwin
Research Station (0°44′37.6″ S, 90°50′21.9″ W), were
log10-transformed. We also fit the following model where
the response variable was the proportion of seeds
removed per mericarp, and mericarp was the unit of
replication: proportion of seeds removed = year + finch
community composition + year × finch community
composition + island + population(island) + error. In
this analysis, the proportion of seeds consumed per mericarp
was calculated as the ratio
between the number of seeds removed from the mericarp
FIG. 4. (a) Mericarp traits and morphological measurements. (b) Micro-computed tomography (µCT) image showing mericarp
wall variation over its surface.


and the number of seeds predicted based on the traits of
the mericarp. We included year and finch community
composition as fixed effects, whereas island and
population were included as nested random effects, with the
parentheses denoting nested factors. Significance of fixed
effects was assessed using a type II Wald chi-squared
test, and the significance of random effects was assessed
with likelihood-ratio tests. P values were divided by two
because tests of the significance of random effects are
one-tailed given that variance > 0 (Littell et al. 1996).
Finally, to evaluate more directly the effect of the finch
community on seed predation per year (at the level of
population and mericarps), we fit the logistic
mixed-effects models separately for each year. We performed the
analyses described above including all islands and
excluding data from the three islands that were sampled only in
2016 (Española, Baltra, and Seymour Norte).
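Several quantities in this analysis are simple arithmetic, and a Python sketch can illustrate them:

- the per-population predation proportion (share of mericarps with at least one seed removed);
- the per-mericarp proportion of seeds removed (seeds removed divided by the seed count predicted from mericarp traits);
- the 0/1 finch community code;
- the halved likelihood-ratio P value for a random effect.

This is an illustration, not the authors' R workflow, and the function names are hypothetical. It assumes that the test of a single variance component compares 2(ℓfull − ℓreduced) with a chi-squared distribution with one degree of freedom before halving. The mixed models themselves, fitted in the study with glmer from lme4, are not reproduced here.

```python
import math

# Islands coded 1 in the study (large-beaked finches present); other sampled islands are coded 0.
LARGE_BEAKED_ISLANDS = {"Isabela", "Santa Cruz", "Española"}


def finch_community_code(island):
    """Return 1 if large-beaked finches are present on the island, else 0."""
    return 1 if island in LARGE_BEAKED_ISLANDS else 0


def predation_per_population(seeds_removed_per_mericarp):
    """Proportion of sampled mericarps with one or more seeds removed by finches."""
    eaten = sum(1 for n in seeds_removed_per_mericarp if n >= 1)
    return eaten / len(seeds_removed_per_mericarp)


def proportion_seeds_removed(seeds_removed, seeds_predicted):
    """Seeds removed from a mericarp divided by its predicted seed count.

    The denominator is a model prediction, so the ratio is not guaranteed to be <= 1.
    """
    return seeds_removed / seeds_predicted


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def random_effect_lrt(loglik_full, loglik_reduced):
    """Likelihood-ratio test for one variance component, P value halved (variance > 0)."""
    # Clamp tiny negative statistics that can arise from optimizer noise.
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, chi2_sf_df1(stat) / 2.0


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(finch_community_code("Floreana"))           # 0
    print(predation_per_population([0, 2, 0, 1, 3]))  # 0.6
    print(random_effect_lrt(-120.4, -123.1))
```

The identity P(χ²₁ > x) = erfc(√(x/2)) avoids a dependency on a statistics library for the one-degree-of-freedom case.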

Do finches impose selection on T. cistoides fruit morphology,
and does selection vary among populations, islands,
years, or with finch community composition?
We first confirmed that most mericarp traits examined (length,
upper spine size, presence/absence of lower spines, and
mass) exhibit substantial variation among individual
plants, with the exception of mericarp width
(Appendix S3: Table S1). Next, we measured phenotypic
selection (sensu Lande and Arnold 1983) on mericarps
sampled from natural populations using logistic
mixed-effects models in the R package lme4 v. 1.1-14 (Bates
et al. 2015) to examine the relationship between
T. cistoides fitness (seed survival) and fruit morphology
(Janzen and Stern 1998). Estimates of T. cistoides seed
survival included two variables: (1) a binary response,
where 0 corresponded to a mericarp that had at least
one seed removed and 1 to a mericarp that had no seeds
removed, and (2) the proportion of seeds that survived
finch predation per mericarp, calculated based on the
estimated number of seeds per mericarp. Each of these
response variables was

Ecology homework help

Labour Market Integration of Refugee Health Professionals in Germany: Challenges and Strategies

Sidra Khan-Gökkaya* and Mike Mösko*

ABSTRACT

Refugee health professionals are a vulnerable group in a host country's labour market, as they experience several barriers on their path to labour market integration. This study aims to identify the challenges that refugee health professionals and their supervisors experience at their workplaces and the strategies they have developed to overcome these barriers. Semi-structured interviews were conducted with refugee health professionals, who had been living in Germany for an average of four years, and with their supervisors (n = 24). The interviews were analysed using qualitative content analysis. Nine themes were identified: (1) recognition of qualifications, (2) language competencies, (3) differing healthcare systems, (4) working culture, (5) challenges with patients, (6) challenges with team members, (7) emotional challenges, (8) discrimination and (9) exploitation. Results indicate the need to implement structural changes in order to improve the labour market experiences of refugee health professionals.

BACKGROUND

The global healthcare workforce is facing a shortage of skilled labour. The World Health Organization (WHO) estimates a global shortage of 14.5 million health professionals by 2030 (World Health Organization, 2006). The European Commission estimates a shortfall of 1 million health workers in Europe by 2020 (European Commission, 2012), and employment agencies in Germany predict a nationwide lack of health professionals (Bundesagentur für Arbeit, 2018). To address this shortage, nearly all European countries depend on the recruitment of foreign-trained health professionals (Organisation for Economic Co-operation and Development (OECD), 2017). Another strategy that the German government has implemented to address this shortage is the so-called "activation of domestic potential" (Bundesregierung, 2018). With this strategy, the German government aims to reach groups that have difficult access to the labour market, such as refugees, in order to improve their employability and use them to fill shortages (Bundesregierung, 2018). As the number of refugees in Germany has increased since 2015, the German government has recognized the need to address their labour market integration (Bundesregierung, 2016). However, refugees are a particularly vulnerable group in the labour market, facing unemployment or underemployment (Tanay et al., 2016).

University Medical Center Hamburg-Eppendorf, Hamburg,
This paper is part of a special issue on the “Labour Market Integration of Highly Skilled Refugees in Sweden, Germany and the Netherlands”

doi: 10.1111/imig.12752

© 2020 The Authors. International Migration published by John Wiley & Sons Ltd on behalf of International Organization for Migration

International Migration

ISSN 0020-7985

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

The barriers and difficulties that refugees face in the context of their labour market integration are multidimensional and manifold. First, their access to the labour market in Germany is restricted and depends on their legal status and the likelihood of obtaining a residency permit, which in turn depends on their country of origin (Bundesministerium für Arbeit und Soziales, 2019). In Germany, all refugees are banned from employment within the first three months. After three months, their access to the labour market depends on their individual residency status. From the fourth month onwards, refugees need work permission from the foreign authority office and the local employment agency in order to work (Bundesministerium für Arbeit und Soziales, 2019). Their access to language courses also depends on their legal status and the likelihood of receiving a residence permit (Bundesministerium für Arbeit und Soziales, 2019). Moreover, participating in job-related language courses is described as challenging because of long waiting times or limited course availability (United Nations High Commissioner for Refugees-Organisation for Economic Co-operation and Development (UNHCR-OECD), 2016). Second, refugee health professionals need to go through a difficult and long recognition process (Körtek, 2015; Desiderio, 2016), which has been described as the starting point for permanent downward mobility (Hawthorne, 2002). Moreover, because of their flight, refugees may not be able to provide identity documents (Bucken-Knapp et al., 2019) or official documents about their education (Bloch, 2008), which impedes the recognition process. Third, a lack of information about career pathways (Cohn et al., 2006), such as knowledge about job search strategies (Willott and Stevenson, 2013), and unfamiliarity with the healthcare system of the host country (Ong et al., 2004) are reported barriers. Fourth, because of their flight, they may have had a break in their professional career and/or experienced the loss of their professional status (Willott and Stevenson, 2013), which is related to the loss of professional identity (Peisker and Tilbury, 2003). It may also result in deskilling (Stewart, 2003), loss of self-confidence (Willott and Stevenson, 2013), high levels of frustration (Mozetic, 2018) and negative psychological impacts (Cohn et al., 2006). Additionally, the lack of recognition of their previously gained experience leads to a feeling of being disadvantaged compared with locally trained team members (Mozetic, 2018), which might be intensified by the experience of multiple forms of discrimination (Jirovsky et al., 2015) and exclusion (Bloch, 2008).

Studies in Germany have also focused on the working experiences of migrant physicians and international nurses from within the European Union as well as from non-European countries. They report barriers similar to those mentioned above. A study on migrant physicians (Klingler and Marckmann, 2016) describes difficulties in three fields. The first field refers to the organization of healthcare institutions and other institutional difficulties, such as insufficient support or being assigned to tasks below one's level of expertise. Difficulties in career advancement and unfair treatment of migrant physicians were also mentioned as institutional difficulties. The second field relates to difficulties with one's own competencies, such as language competencies and knowledge about the healthcare system. The third field relates to difficulties in interpersonal relations and interactions, such as inadequate treatment of patients and co-workers. In this context, a study on the workplace integration of internationally recruited nurses in Germany points out that conflicts often arise between migrated nurses and locally trained team members. These conflicts arise because locally trained team members either hold back or do not comprehensively share key information needed to organize their work. Thus, the incorporation of migrated nurses into the daily work routine is impeded and the potential for conflict in everyday work is increased (Pütz et al., 2019). These studies illustrate that international healthcare professionals and refugee healthcare professionals experience similar barriers at their workplaces. However, refugees were forced to flee by the circumstances in their home countries (Yarris and Castañeda, 2015), whereas internationally recruited health professionals may be considered voluntary migrants. This distinction between refugees and voluntary migrants affects the barriers they experience. While voluntary migrants were most likely able to prepare for their migration, refugees had to flee under extreme conditions (Jackson et al., 2004). Stressors of the flight, the loss of family members, traumatic experiences and uncertainty about their residency permit (Carlsson and Sonne, 2018) may also influence their prerequisites for work. Furthermore, compared with other highly qualified migrants, highly qualified refugees are more likely to stay in jobs they are overqualified for, which mainly relates to the fact that documentation of their education is missing (Tanay et al., 2016). Moreover, other barriers, such as housing, health, absence of networks or childcare, may indirectly influence employment outcomes (OECD/UNHCR, 2018).

The European Parliament recommends qualification programmes to prepare refugees for work and strengthen their employability (Konle-Seidl, 2016). These recommendations comprise programmes individually tailored to the specific needs of refugees. Among other things, it is recommended to provide (occupation-specific) language courses combined with working opportunities, skills assessment, mentoring and career advice. For highly skilled refugees, it is especially recommended to increase the availability of on-the-job training, recognize existing qualifications and offer vocational training. However, in order to implement tailored programmes that match the host country's legal and social requirements, it is essential to identify and analyse the barriers refugee health professionals face when entering the labour market. While the legal situation of refugees and their access to the labour market in Germany are documented in policy papers (European Commission, 2012; Platonova and Urso, 2012; Konle-Seidl, 2016; Tanay et al., 2016; UNHCR-OECD, 2016; OECD, 2017; United Nations Department of Economic and Social Affairs Population Division, 2017; UNHCR, 2017; Bundesministerium für Arbeit und Soziales, 2019), little attention has been paid to the challenges they face in everyday working life and to their own perspectives and strategies. Thus, in this study, refugee health professionals and their supervisors across Germany were interviewed about the challenges they faced at their workplaces, as workplaces are a “key site of sociocultural incorporation” (van Riemsdijk et al., 2016). Moreover, this paper advances the field by giving recommendations for healthcare providers and organizations, based on the experiences of refugee health professionals and their supervisors, in order to implement structural changes and improve the working environment. These changes refer to establishing supporting structures as well as measures of diversity management and anti-discrimination.

METHODS

The reporting of methods is in accordance with the consolidated criteria for reporting qualitative
research (COREQ) guidelines (Tong et al., 2007).

Researcher characteristics

Qualitative research depends on the personal qualities of the researcher and the theoretical sensitivity that the researcher brings to the research (Strauss and Corbin, 1990). Thus, it is important to reflect on the researcher's characteristics and their impact on the interview situation. All interviews were conducted in person by the first author, a female person of color and PhD student in the Department of Medical Psychology at the University Medical Center Hamburg-Eppendorf. The first author is trained in cultural studies, international migration and intercultural studies and has several years of training in conducting qualitative studies. For transparency, participants were informed that the study was part of a PhD project.

Recruitment

Major educational organizations and projects for the labour market integration of refugee health professionals (RHPs) across Germany were identified through internet research. The organizations (n = 15) were contacted and informed about the study. Their consent was obtained. Three of the major organizations agreed to participate in the study. Participants were divided into RHPs and supervisors, as the refugees' self-perception of their experiences might differ from the supervisors' perception. Since the group of RHPs comprises different professions, we decided to subdivide the stratum of RHPs into two groups: physicians and other health professions. In terms of data saturation, it is recommended to conduct six to twelve interviews per stratum (Guest et al., 2006). Thus, 24 interviews were conducted in three major cities in Germany (Hamburg, Hannover and Frankfurt). All three organizations provided persons who matched the inclusion criteria with information on this study and either arranged appointments or provided participants with the researchers' contact information. Inclusion criteria for participants referred to the following aspects:
Target group¹:

• Refugees (regardless of their residency status and form of protection) who have obtained a qualification in a health profession in their home country or a country other than Germany;

• Supervisors who were responsible for the integration, supervision or support of refugee health professionals

Language competencies:

• Required minimum level of German language competencies: A2-B1² on the Common European Framework of Reference (CEFR)

Working experiences in Germany:

• RHPs must have had contact with the German healthcare system for a minimum of one month, be it a steady job, an internship or job shadowing

• Supervisors had to work in jobs with close contact with refugee health professionals, regardless of their hierarchical status. They must have supervised RHPs on their ward or as an external supervisor

Context:

• RHPs and supervisors in all healthcare institutions comprising primary, secondary and tertiary care were included

Providers were informed about the inclusion criteria and selected suitable participants. All interviews were conducted in German. In one case, the inclusion criteria were not met, as the participant was a student of the educational organization without sufficient working experience. Participants who matched the inclusion criteria were approached by phone, followed by an invitation to an in-person interview. Participants received two consent forms: one for their participation in the study and one for their consent to audio recording. The consent form and the study information were explained orally prior to the interview.

Data collection

The interview guide was developed based on literature focused on the daily work experiences of refugee health professionals, using the SPSS³ approach by Helfferich (2009). The interview guide was sent to experts in the field of migration research for critical review. Based on this review, the authors discussed and adapted the interview guide. Finally, the interview guide was piloted with two migrant nurses, which resulted in the specification of some questions. The interview guide was structured into six main themes:

(1) General experiences while working in a hospital
(2) Experiences with team members and supervisors
(3) Experiences with patients
(4) Experiences with the working culture
(5) Experiences with the healthcare system
(6) Suggestions for improvement

The same semi-structured guide was used in each interview. After the interview was finished and the audio recorder was switched off, demographic data were collected. The interviews lasted between 18 and 55 minutes (median 40 minutes). Some interviews (n = 4) were transcribed by a student researcher, but the majority (n = 20) were transcribed verbatim by a professional agency. All transcripts were proofread by the first author.

Data analysis

The interviews were analysed using content analysis (Mayring, 2015). The first author coded all interviews by means of a computer-based coding programme (MAXQDA, version 10). Deductive codes were derived from the interview guide, but as an exploratory approach was preferred, further inductive categories were derived from the material. Code memos were created for all codes, including a description of the code and typical quotes. For quality assurance, a research assistant coded a random selection of one-quarter of all interviews. Differences in coding were discussed until a consensus was reached, which led to the creation of some new subcodes and a revision of the category system. Results were presented and discussed with other experts in an interdisciplinary research colloquium to ensure comprehensibility and intersubjective reproducibility. The revised system was then crosschecked by the first author in a second round of coding covering all interviews.

Description of sample

Sixteen RHPs and eight supervisors participated in the study. Two interviews were conducted via telephone due to reduced mobility of the participants. The sample is described in Table 1.

RESULTS

In general, nine major challenges were identified that either RHPs or supervisors described as relevant: (1) the recognition of professional qualifications, (2) language competencies, (3) different healthcare systems, (4) working culture, (5) challenges with patients, (6) challenges with team members, (7) emotional challenges, (8) discrimination and (9) exploitation. Table 2 provides an overview of the identified fields and their specifications.

Recognition⁴ of professional qualifications

Both supervisors and RHPs pointed out the challenges they faced with regard to the recognition process of their professional qualifications. Supervisors especially emphasized the difficulties regarding the recognition process. They criticized the long waiting times for the recognition process (B2-B4, B8) and noted that the bureaucratic procedures for recognition in Germany were unclear and prolonged the recognition process (B4, B7, B8). RHPs also criticized the length and complexity of the recognition process (A4, A7, A8, A11, A12). Two supervisors (B4, B8) criticized the fact that former positions of RHPs, such as leadership titles, were not recognized in Germany. They also criticized the fact that RHPs' specialist medical training or their internships in Germany were not counted as working experience for recognition. Furthermore, in one case there was confusion about the legal foundations of the responsible authorities:

One colleague receives a temporary work permit [from the recognition authority] but federal medical council law and health insurance company's law contradict each other which inhibits him from working as a physician unless he has a full licence to practise medicine. But he can only acquire the full licence after taking an exam. Taking that exam is on hold because the [recognition] authorities are understaffed. (B3)⁵

RHPs (A1, A11, A13, A15) also expressed anxiety about the licensing examinations, fearing that the examinations would be too difficult.

TABLE 1

SAMPLE DESCRIPTION (REFUGEE HEALTH PROFESSIONALS AND SUPERVISORS)

Refugee health professionals (RHPs)

Participant | Sex | Age | Country of birth | Occupation | Working experience in Germany | Working experience in birth country
A1 | m | 26 | Iran | Nurse | 1 month | 6 years as a nurse
A2 | m | 23 | Iraq | Physician | 3 months | 2 years as a general physician and 3 years as a surgeon
A3 | m | 28 | Syria | Physiotherapist | 2 years | 4 years as a physiotherapist
A4 | m | 28 | Syria | Physician (specialized in anaesthesia) | 8 months | 2.5 years as a medical assistant in surgery
A6 | m | 33 | Syria | Physician | 5 months | 5 years as a physician
A7 | m | 38 | Afghanistan | Physician | 1 year | 1 year as a medical assistant, 3 years in public health department
A8 | f | 29 | Syria | Physician | 1.5 years | 1 year as a physician
A9 | m | 30 | Afghanistan | Physician | 3 months | 1 year as a medical assistant
A10 | m | 44 | Syria | Physician (specialized in anaesthesia) | 3.5 years | 4 years as a medical assistant, 2 years as a senior physician, 9 years as a chief physician
A11 | f | 52 | Afghanistan | Physician (specialized in gynaecology) | 6 months | 23 years as a gynaecologist (also as a chief gynaecologist)
A12 | m | 39 | Yemen | Physician | 4 months | 10 years as a physician
A13 | m | 45 | Afghanistan | Physician | 2 years | 2.5 years as a physician
A14 | m | 51 | Syria | Dentist | 3 months | 21 years as a dentist
A15 | m | 39 | Afghanistan | Physician (specialized in otorhinolaryngology) | 6 weeks | 3 years as an ear-nose-throat (ENT) specialist
A16 | f | 33 | Senegal | Midwife and nurse | 3 months | 11 months as a midwife, 15 years as a nurse
A17 | f | 36 | Azerbaijan | Nurse | 3 months | 2 years as a nurse

Supervisors

Participant | Sex | Age | Country of birth | Education | Current job | Experience
B1 | m | 34 | Germany | Physiotherapist | Part-time physiotherapist, part-time supervisor for RHPs and migrants | 5 years as a physiotherapist, 1 year as a supervisor
B2 | m | 64 | Germany | Librarian and editor | Commissioner for refugees at the medical association in Lower Saxony | 2.5 years as a commissioner
B3 | m | 64 | Germany | Physician | Physician and supervisor for RHPs | 34 years as a physician, 1 year as a supervisor
B4 | m | 73 | Germany | Physician | Supervisor for RHPs/retired | 47 years as a physician, 2 years as a supervisor
B5 | f | 50 | Germany | Nurse and professional advisor | Professional advisor | 15 years as an advisor
B6 | f | 54 | Germany | Nurse | Nurse and supervisor | 37 years as nurse and supervisor
B7 | f | 38 | Germany | Nurse and psychologist | Psychologist | 7 years as a psychologist
B8 | m | 52 | Germany | Physician, medical journalist | Managing director of refugee and migrant education centre | 2 years as managing director

m = male; f = female.

Language competencies

Supervisors and RHPs considered acquiring German language proficiency, including German technical and medical language, a major issue. Supervisors especially emphasized the need to learn technical language. Three supervisors (B1, B5, B8) described how RHPs were afraid to admit that there were parts they did not understand and kept saying "yes" in order to maintain the flow of conversation, which often led to misunderstandings. RHPs described difficulties with both everyday and technical language. Some (A1, A2, A4) found it difficult to understand handover reports from physicians, to keep up in meetings and to handle written documentation. Some (A1, A3, A7, A12) were also afraid of not understanding what was said, which influenced their behaviour:

I am afraid if [a patient] someone rings the bell. [. . .] Because my language is not [well] enough
and I am afraid of understanding something wrong or not being able to answer [the patient’s ques-
tion]. That’s why I remain seated and others [colleagues] keep asking me “why are you always sit-
ting?” (A1)

One of them also expressed the fear of being considered incompetent because of their language competencies: "They think I have learned it wrong in Iran. But in fact I couldn't understand what they were asking me" (A1). Moreover, RHPs (A1, A3, A12) felt that their language competencies held them back, as they were reluctant to share their opinions: "If we discuss a patient's case and someone has a contradicting opinion on that patient's case I am afraid to discuss our opinions as I fear they will say 'I can't express myself'" (A3).

Different healthcare systems

Supervisors and RHPs described challenges that derived from differing standards in the healthcare systems of the home and host countries. All supervisors stated that RHPs would have to familiarize themselves with, and catch up on, the healthcare system in Germany. Eleven RHPs (A1, A2, A8, A9, A11-A17) emphasized the differences in medical equipment, names of medication and working habits, and felt the need to familiarize themselves with these differences. In this context, supervisors referred especially to the differing professional role of nurses in Germany:

They mostly come from countries where nursing care is much higher regarded as a profession, it gets a very high recognition. And here they have to understand this in such a way that the job description or the professional role is not so highly regarded. (B6)

RHPs (A1-A4, A7, A9, A10, A12-A15) criticized bureaucratic procedures in German hospitals, as it was challenging to keep up with all of them. Some (A4, A16, A17) did not know about occupational law and were also unsure about their rights and obligations in their professional duties. During internships or work, RHPs (A2, A3, A8, A9, A10, A13-A17) felt held back, as some of them were not allowed to work, either because of their status as interns or because they did not yet have their licence:

Yes, the situation was unpleasant that I could not do anything alone. And if I wanted to do something, someone had to stay with me, a senior physician or chief physician. That was a bit uncomfortable for me because I already graduated from university and I also worked as an assistant physician in my home country for a year. But I didn't have a solution. I had to come to Germany and here, the rule is if someone doesn't have a license he has to cooperate with a chief physician or with a senior physician. (A9)

TABLE 2

CHALLENGES EXPERIENCED BY REFUGEE HEALTH PROFESSIONALS

Field | Specifications
Recognition of professional qualifications | Difficulties in the context of the recognition process; Non-recognition of former experiences; Examinations for recognition
Language competencies | Knowledge of everyday language; Knowledge of technical language; Feelings and consequences of lacking language competencies
Different healthcare systems | Unfamiliarity with and differences between the healthcare systems; Unfamiliarity with bureaucratic procedures within the healthcare system; Consequences of differences and unfamiliarity
Working culture | Adaptation to formal aspects of work; Adaptation to cultural aspects of work; Intercultural and interpersonal differences
Difficulties with patients | Language difficulties; Difficulties in delivering bad news; Distrust from patients
Difficulties with team members | Difficulties during internships; Interpersonal and interprofessional difficulties
General emotional difficulties | Discouragement; Negative feelings of RHPs in the context of labour market integration
Discrimination | Discrimination by patients; Discrimination by team members
Exploitation | Financial exploitation of RHPs in the context of work; Professional exploitation of RHPs in the context of work

Working culture

Supervisors described two facets of working culture that they found important in the context of their experiences with RHPs: formal and cultural aspects of work, as well as RHPs' adaptation to these aspects. They emphasized formal aspects such as being punctual, submitting holiday applications correctly, calling in sick, and being polite and committed to work. Some supervisors (B1, B2, B3, B6, B7, B8) perceived deficits in how RHPs handled some of these aspects. With regard to cultural aspects, supervisors mentioned that RHPs held different values that sometimes inhibited their integration, such as examining patients of the other sex (B1, B6-B8), taking off headscarves for several reasons (B1, B8), dealing with homosexuality (B1) or accepting female superiors (B1-B4, B7). These values were often attributed to cultural differences, although they may result from context-specific causes, as one supervisor who had a mediating role described:

The [female] colleague shouted at him [the RHP] in front of the patients [. . .] Luckily, we heard
about it and picked it up [. . .] she said he was a macho and suggested women were worth less than
men. The trigger was a basic nursing situation which is difficult for our participants as they haven’t
learned it in their home countries. And she gave instructions that were too brief, for example
“wash” and he didn’t know what to do with that instruction. […] And that caused the escalation
spiral. (B7)

RHPs were also asked about their experiences in the context of working culture. They pointed out that formal aspects of work, such as being punctual and committed, were universal. However, some (A1, A3, A8, A13, A16) experienced differences at the intercultural and interpersonal level, such as the value placed on family and the treatment of patients of the other sex, and developed several strategies to adapt to them:

I was born in an Islamic country. I am not Muslim but born there and I grew up there. And sometimes I think, maybe the [female] patient is embarrassed. Or I ask may I look, may I do. Because maybe the other colleague does not say anything at all but for me it is a bit ok – maybe she has problem with men and so on, so I ask. (A1)

Challenges with patients

RHPs experienced difficulties with patients especially if patients did not speak clearly due to their
illness, their age or their way of speaking:

The problem was that I couldn’t understand. For example, the patient said “bring me this and that”.
And the problem was that they spoke very unclearly and for German people it [is] also difficult to
understand and for me of course [it is] especially difficult. (A1)

Some described talking to patients' relatives as a new and challenging experience, especially when the relatives were furious (A8) or when bad news had to be passed on to them (A7). Another challenge was associated with distrust from patients: "Maybe they don't trust the foreign physicians as much but that's general [generally the case]. All patients are like that, almost all of them. […] You can tell, they're a little scared or something" (A4).

Challenges with team members

Almost all supervisors (B2, B4-B8) mentioned the important role of internships in the context of team integration. However, one supervisor reported that finding internship placements had become increasingly difficult due to the hospitals' lower capacities (B4). During some internships, participants were not given appropriate

Consider the public health issue you selected for your course project using the framework of Chapter 8 in Public Health Ethics. As you think about your issue from a global perspective, what aspects of it have the most influence beyond your community and have implications for the worldwide community? Think in terms of lifestyle, population dynamics, disease, climate change, and disaster response.

· With a focus on the entire planet, why does your topic matter?

· If you were to select a global public health issue to focus on, rather than a community public health issue, which one would you be most interested in and why?

· What do you think is the most pressing global public health issue we will be facing 50 years from today, and why?

350-450 words excluding references, APA format and a minimum of 3 references

ORIGINAL PAPER

Competitive exclusion of Cyanobacterial species in the Great Salt Lake

Hillary C. Roney · Gary M. Booth · Paul Alan Cox

Received: 25 July 2008 / Accepted: 16 December 2008 / Published online: 8 January 2009

© Springer 2009

Abstract The Great Salt Lake is separated into different salinity regimes by rail and vehicular causeways. Cyanobacterial distributions map salinity, with Aphanothece halophytica proliferating in the highly saline northern arm (27% saline), while Nodularia spumigena occurs in the less saline south (6–10%). We sought to test if cyanobacterial species abundant in the north are competitively excluded from the south, and if southern species are excluded by the high salinity of the north. Autoclaved samples from the north and south sides of each causeway were inoculated with water from each area. Aphanothece, Oscillatoria, Phormidium, and Nodularia were identified in the culture flasks using comparative differential interference contrast, fluorescence, and scanning electron microscopy. Aphanothece halophytica occurred in all inocula, but is suppressed in the presence of Nodularia spumigena. N. spumigena was found only in inocula from the less saline waters in the south, and apparently cannot survive the extremely hypersaline waters of the northern arm. These data suggest that both biotic and abiotic factors influence cyanobacterial distributions in the Great Salt Lake.

Keywords Competitive exclusion · Halophilic bacteria · Aphanothece · Nodularia · Oscillatoria · Phormidium · Gause’s principle

Introduction

Cyanobacteria are well adapted for living in harsh conditions, including photosynthetic areas beneath Antarctic ice, hot springs and geysers in Yellowstone, and hypersaline lakes, including the Great Salt Lake (Dyer 2003). Two common cyanobacterial species that have been identified from the Great Salt Lake are Aphanothece halophytica and Nodularia spumigena (Brock 1976; Felix 1978; Felix and Rushforth 1980), with N. spumigena episodically blooming in Farmington Bay (Marcarelli et al. 2006). Since nitrogen is the limiting nutrient in the Great Salt Lake (Oren 2002), it is interesting to note that both A. halophytica and N. spumigena can fix nitrogen.

The Great Salt Lake is a hypersaline remnant of the Pleistocene Lake Bonneville (Oren 2002), which was 557 km long and 233 km wide with an area of 51,800 km² in what is now Utah, Idaho, and Nevada (Utah Geological Survey 1990). As Lake Bonneville retreated, the lake lost all outlets, so salinity increased.

After completion of the transcontinental railway near Promontory Point, Utah, in the mid-19th century, trains had to traverse many additional rail kilometers around the northern end of the Great Salt Lake. To reduce this distance, the Union Pacific Railroad constructed a 19 km rail causeway across the Great Salt Lake in 1959 (Fig. 1), replacing an earlier trestle built in 1902 by the Southern Pacific Railroad. Unlike the former wooden trestle, which did not impact water flow, the 1959 causeway, built with rock fill, hydrologically divided the waters of the Great Salt Lake into two portions: a northern arm with negligible freshwater inputs, and a southern arm with more than 90% of the freshwater flow (Butts 1980; Oren 2000; Stephens and Gillespie 1976; Sturm 1980). Three major rivers, the Bear, Weber, and Jordan, all flow into the Great Salt Lake south of the railway causeway (Gwynn 2002). The overall salinity of the Great Salt Lake, which to that point had been linked solely to changing water levels, quickly adjusted, with the northern arm becoming even more hypersaline, moving from an average of 15% in the 1870s to as high as 28% salinity in the 1960s (Sturm 1980). In 1970, the northern arm held approximately 330–350 g salts per liter, while the south arm held 120–130 g salts per liter (Oren 2002). Na is the major cation in the water, but Mg, K, and Ca (in decreasing order of abundance) are also important, as is the anion SO₄²⁻ (Sturm 1980).

Communicated by L. Huang.

H. C. Roney · P. A. Cox (corresponding author)
Institute for Ethnomedicine, Box 3464, Jackson Hole, WY 83001, USA
e-mail: paul@ethnomedicine.org

G. M. Booth
Department of Plant and Wildlife Science, Brigham Young University, Provo, UT 84602, USA

Extremophiles (2009) 13:355–361
DOI 10.1007/s00792-008-0223-1

These habitat changes were later partially replicated with construction of a second barrier to lake water flow. A causeway for vehicular use has been periodically constructed from Syracuse, Utah, to Antelope Island, and was most recently rebuilt in 1992 (Gwynn 2002). These vehicular and railway causeways resulted in the partitioning of the Great Salt Lake into three different salinity regimes: the northern arm, with an average of 27% salinity; the middle arm, with average salinity of 10–16%; and the southern arm, with average salinity of 6% or less (Utah Geological Survey 1990). These three different salinity regimes, any one of which would be considered hypersaline, allowed species to sort according to ecological tolerances. The results are striking: each large area of the lake has different colored water, resulting in part from different concentrations of Artemia franciscana brine shrimp cysts and microscopic green algae such as Dunaliella salina and D. viridis, as well as species of the archaeal genus Halobacterium (Post 1981), but also perhaps due to different concentrations and species compositions of cyanobacteria.

A study was designed to determine whether cyanobacterial distributions in the Great Salt Lake are influenced by abiotic factors, biotic factors, or both. To explore this question, experiments were designed to examine two hypotheses. Hypothesis 1: cyanobacterial species abundant north of the railway causeway are competitively excluded from the south by other species. Hypothesis 2: cyanobacterial species that thrive and bloom south of the Antelope causeway cannot grow in high-salinity waters from the north.

Materials and methods

Experimental cultures

A total of 28 water samples from both the north and south sides of the Antelope Island causeway and the north and south sides of the railway causeway (seven from each site) were collected in December 2007. Water temperatures and GPS coordinates were recorded at each site. To avoid pseudoreplication, six water samples from each side of the vehicular and railway causeways, approximately four liters in volume, were used as inocula. The seventh jar from each side was approximately eight liters in volume and was used for media after filtering and autoclaving. Approximately 30 ml of the filtered media water was placed in autoclavable, sterile 50 ml Nalgene plastic flasks. In total, 96 flasks were inoculated with 10 ml of unsterilized water from either the north or the south of the railway and the vehicular causeways, giving six replicate flasks for each combination of medium and inoculum. No nutrients or other growth media were added to the water (Dyer 2003). In addition, 21 control flasks were prepared using autoclaved media and autoclaved inocula. A random number table was used to decide from which sample jars to draw the inoculum. One further control flask consisted of autoclaved distilled water inoculated into autoclaved distilled water medium. After all 129 flasks were inoculated, they were placed in a heated greenhouse with constant 8.5 h/day illumination. Each flask was gently shaken by hand periodically for aeration. These liquid cultures were incubated for 7 weeks.
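The factorial layout described above (4 media × 4 inocula × 6 replicates = 96 experimental flasks) can be sketched in Python. This is a minimal illustration, not the authors' procedure: the site labels, the `build_design` helper, and the use of a seeded pseudo-random generator in place of a printed random number table are assumptions, and the sketch draws a source jar independently for each flask.

```python
import itertools
import random

# Four water sources, used both as autoclaved media and as live inocula.
SITES = ["railway north", "railway south", "Antelope north", "Antelope south"]
REPLICATES = 6     # six replicate flasks per medium x inoculum combination
JARS_PER_SITE = 6  # six ~4 L inoculum jars collected at each site

def build_design(seed=2007):
    """Return one record per experimental flask (hypothetical layout)."""
    rng = random.Random(seed)  # stands in for the random number table (assumption)
    flasks = []
    for medium, inoculum in itertools.product(SITES, SITES):
        for rep in range(1, REPLICATES + 1):
            flasks.append({
                "medium": medium,
                "inoculum": inoculum,
                "replicate": rep,
                # which jar from the inoculum site supplies the 10 ml inoculum
                "jar": rng.randint(1, JARS_PER_SITE),
            })
    return flasks

design = build_design()
print(len(design))  # 96
```

Controls are deliberately left out of the sketch, since their layout is described only as counts.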

Cyanobacterial identification

For aquatic cyanobacteria, light microscopy (phase and/or interference contrast) and scanning electron microscopy (SEM) are the preferred identification methods (Cronberg and Annadotter 2006). Identification and abundance counts of cyanobacteria from the culture flasks were performed using differential interference contrast (DIC) and epifluorescence imaging, with SEM for verification.

Fig. 1 Great Salt Lake Rail Causeway with hypersaline water in the north (left) and less saline water on the south (right)

Data analysis

To ensure random sampling for cyanobacterial counts, microtransects of water cultured from each flask were conducted on two microscope slides. Each microtransect was replicated twice for each slide, with data entered on a six-cell mechanical lab counter. When mass colonies of cyanobacteria were encountered that precluded individual counts, the colony was assessed as “large” or “very large”, with medians and nonparametric analyses used to analyze these qualitative data. In each of the 16 possible combinations of 4 types of media (railway north, railway south, Antelope north, and Antelope south) and 4 types of inocula (railway north, railway south, Antelope north, and Antelope south), the median counts of the 4 major cyanobacterial species (A. halophytica, Oscillatoria sp., Phormidium tenue, and N. spumigena) were ranked.
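The per-cell ranking of medians can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and scoring “large” and “very large” colonies as 500 and 1,000 borrows the coding the text uses for its ANOVA, whereas the ranking itself may have treated colony labels purely ordinally. Taxa with equal medians are reported as tied.

```python
from statistics import median

# Assumed numeric coding for mass colonies (taken from the ANOVA scoring).
COLONY_SCORES = {"large": 500, "very large": 1000}

def to_number(value):
    """Convert a raw tally or a colony label to a number."""
    return COLONY_SCORES[value] if isinstance(value, str) else value

def rank_by_median(counts):
    """counts maps taxon -> list of per-transect tallies for one medium/inoculum cell.

    Returns (medians, tiers): tiers lists taxa from most to least abundant,
    and taxa with equal medians share a tier (a tie in rank).
    """
    medians = {taxon: median(to_number(v) for v in values)
               for taxon, values in counts.items()}
    tiers = [sorted(t for t, m in medians.items() if m == level)
             for level in sorted(set(medians.values()), reverse=True)]
    return medians, tiers

# Hypothetical tallies for a single flask combination, not data from the paper.
cell = {
    "Aphanothece": [3, "large", 10],
    "Oscillatoria": [0, 0, 1],
    "Nodularia": [0, 0, 0],
    "Phormidium": [0, 0, 0],
}
medians, tiers = rank_by_median(cell)
print(tiers)  # [['Aphanothece'], ['Nodularia', 'Oscillatoria', 'Phormidium']]
```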

A two-way analysis of variance (ANOVA) was calculated for abundance of A. halophytica in the culture flasks, with “large” colonies scored as 500 and “very large” colonies scored as 1,000 for this purpose. To reduce the impact of outliers and ensure consistency of distribution across the observed range, all data were square-root transformed prior to analysis, as is standard for count data. F statistics for the transformed data were calculated to test three different pairs of hypotheses, with the null hypothesis to be rejected at the P < 0.05 level:

Hypothesis pair #1
H0: no variation in cyanobacterial counts exists due to differences in media.
H1: variation in cyanobacterial counts exists due to differences in media.

Hypothesis pair #2
H0: no variation in cyanobacterial counts exists due to differences in inocula.
H1: variation in cyanobacterial counts exists due to differences in inocula.

Hypothesis pair #3
H0: no variation in cyanobacterial counts exists due to media and inocula interactions.
H1: variation in cyanobacterial counts exists due to media and inocula interactions.
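The square-root transformation and the three F statistics can be sketched for a balanced two-way layout. This is a hedged, from-scratch illustration (the `two_way_anova` helper and its data layout are hypothetical), not the SAS code the authors used. It assumes every medium/inoculum cell holds the same number of flasks, which holds for the 4 × 4 × 6 design, and it returns sums of squares, degrees of freedom, and F ratios but not P values.

```python
import math

# Colony scores used for the ANOVA in the text.
COLONY_SCORES = {"large": 500, "very large": 1000}

def transform(value):
    """Square-root transform a tally, scoring colony labels first."""
    raw = COLONY_SCORES[value] if isinstance(value, str) else value
    return math.sqrt(raw)

def two_way_anova(data):
    """data maps (medium, inoculum) -> list of raw counts, one per flask.

    Assumes a complete, balanced factorial design. Returns (ss, df, f)
    dictionaries keyed by source of variation.
    """
    y = {cell: [transform(v) for v in values] for cell, values in data.items()}
    media = sorted({m for m, _ in y})
    inocula = sorted({i for _, i in y})
    n = len(next(iter(y.values())))
    if len(y) != len(media) * len(inocula) or any(len(v) != n for v in y.values()):
        raise ValueError("expected a complete, balanced design")
    a, b = len(media), len(inocula)

    # In a balanced design, marginal means are averages of cell means.
    cell_mean = {cell: sum(v) / n for cell, v in y.items()}
    grand = sum(cell_mean.values()) / (a * b)
    media_mean = {m: sum(cell_mean[(m, i)] for i in inocula) / b for m in media}
    inoc_mean = {i: sum(cell_mean[(m, i)] for m in media) / a for i in inocula}

    ss = {
        "media": b * n * sum((x - grand) ** 2 for x in media_mean.values()),
        "inocula": a * n * sum((x - grand) ** 2 for x in inoc_mean.values()),
        "interaction": n * sum(
            (cell_mean[(m, i)] - media_mean[m] - inoc_mean[i] + grand) ** 2
            for m in media for i in inocula
        ),
        "error": sum((v - cell_mean[c]) ** 2 for c, vals in y.items() for v in vals),
    }
    df = {"media": a - 1, "inocula": b - 1,
          "interaction": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_error = ss["error"] / df["error"]
    f = {src: (ss[src] / df[src]) / ms_error
         for src in ("media", "inocula", "interaction")}
    return ss, df, f
```

With four media, four inocula, and six flasks per cell, the degrees of freedom come out as 3, 3, 9, and 80, matching the layout of the ANOVA table reported in the Results.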

For cyanobacterial taxa that proved to be of rare occurrence in the culture flasks, exact logistic tests, rather than an ANOVA, were calculated.
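The authors' exact logistic tests model presence/absence against medium and inoculum jointly. For the simplest case of one two-level factor (for example, Antelope south inoculum versus all other inocula), an exact conditional test on a 2 × 2 table essentially reduces to Fisher's exact test. The sketch below illustrates only that reduced case; it is not a reimplementation of SAS PROC LOGISTIC, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two factor levels; columns are taxon present / absent.
    Conditioning on the margins, the top-left cell is hypergeometric;
    the P value sums the probabilities of all tables no more likely
    than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps floating-point ties from being dropped.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))
```

As a check against a textbook case rather than data from this study, Fisher's tea-tasting table `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70, about 0.4857.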

Results

Experimental cultures

When sampling for the experimental cultures of the Great Salt Lake, profoundly different colors on either side of the railway causeway were observed from the air (Fig. 1). These differences were also apparent in water samples taken from deep water on either side of the causeway rather than from evaporative ponds (Fig. 2). Salinity at the sample sites had previously been measured with hydrometers: south side of Antelope causeway 20.6 ppt, north side of Antelope causeway 75 ppt, south side of railway causeway 155 ppt, and north side of railway causeway 195 ppt (Roney 2007). Salinity values of the flasks were altered slightly by addition of inocula, except when the inoculum and medium came from the same site.

At the time of sampling in December 2007, ambient air temperature was −2.2 °C on the railway causeway (41°13′16″N, 112°32′30.34″W) and −2.8 °C on the vehicular causeway (41°4′44″N, 112°12′57″W), with water temperatures north of the railway causeway at 2.0 °C and south of the railway causeway at 4.3 °C. Water temperature north of the vehicular causeway was 3.3 °C, while the water temperature south of the causeway was 2.5 °C. Analysis by light microscopy showed no growth in any of the 21 control flasks, which were found to be sterile.

Cyanobacterial identification

Four genera of cyanobacteria, Aphanothece, Oscillatoria, Phormidium, and Nodularia, were identified (Figs. 3, 4, 5, 6), with identifications confirmed by Dr. James Metcalf (University of Dundee, Scotland). In addition, a fifth cyanobacterial genus, Spirulina, was observed, but was not found in any transect through any of the microscope slides. Because of its trichomes and its affinity for saline waters, this species is referable to Spirulina labyrinthiformis (Fig. 7), although Nübel et al. (2000) have placed a similar salt-tolerant species into the new genus Halospirulina.

Comparisons between differential interference contrast microscopy, fluorescence microscopy, and scanning electron microscopy allowed observations of cyanobacterial morphology and size from the three methods to be compared for taxonomic identification.

Fig. 2 Water samples taken on north (left) and south (right) side of railway causeway. Color differences primarily due to Dunaliella distributions although the cyanobacterium Aphanothece flourishes in the hypersaline waters in the north

Data analysis

Comparative medians of the four cyanobacterial genera for each inoculum type in the four media are shown in Fig. 8, which demonstrates that A. halophytica appears throughout all four types of media and inocula, but that Nodularia spumigena is abundant only in inocula from Antelope south waters. Since there are 24 different permutations of the ordered ranks of Aphanothece (A), Oscillatoria (O), Nodularia (N), and Phormidium (P), plus an additional 16 permutations of three- and two-way ties, as well as one possible case of a four-way tie in rank, there are 41 different possible rankings of the four cyanobacterial species. These different rankings can perhaps most easily be portrayed as different colors (Fig. 8). A. halophytica was dominant in all culture flasks, except those in which N. spumigena and Oscillatoria sp. occurred.

Fig. 3 Aphanothece halophytica: differential interference contrast (left); fluorescence (middle); scanning electron microscopy (right)

Fig. 4 Oscillatoria sp.: differential interference contrast (left); fluorescence (middle); scanning electron microscopy (right)

Fig. 5 Phormidium tenue: differential interference contrast (left); fluorescence (middle); scanning electron microscopy (right)

Fig. 6 Nodularia spumigena: differential interference contrast (left); fluorescence (middle); scanning electron microscopy (right)

Fig. 7 Spirulina cf. labyrinthiformis: differential interference contrast (left); fluorescence (middle); scanning electron microscopy (right)

A two-way ANOVA for the distributions of A. halophytica was performed as indicated in Table 1. The F statistics for the ANOVA allow each of the null hypotheses in the three pairs of hypotheses to be rejected at the P < 0.05 level. Therefore, it can be concluded that media and inocula, as well as the interaction between media and inocula, significantly affected the growth of A. halophytica in the culture flasks. Exact tests were calculated for N. spumigena and Oscillatoria spp. using the exact option of PROC LOGISTIC in SAS. In these analyses, counts were ignored, and presence/absence data were used instead. For Oscillatoria, the P value for the exact test of the medium was 0.0993; thus the effect due to media differences was not significant. However, the P value for the exact test of inoculum was 0.0016; hence there was a significant inoculum effect in the distribution of Oscillatoria. The exact test for the inocula/media interaction was not significant, with a P value of 0.1963. Leaving the interaction out of the model, the additive model (with additive effects of inoculum and medium) showed that the odds of a positive response for inocula from rail south, rail north, or Antelope north were just 6.3% of those for inoculum from Antelope south (95% confidence interval: 0.6–64.2%). Thus, Antelope south inocula had a significant positive effect on the presence of Oscillatoria in the culture flasks.

A similar analysis was conducted for the presence or absence of N. spumigena in the culture flasks. The exact test for the inocula/media interaction had a P value of 0.0032; hence the interaction was significant. The odds of a positive response were extremely high for the combination of Antelope south inoculum with Antelope south medium. For all other combinations, the odds of a positive response were extremely low. For future studies of algal–cyanobacterial interactions, counts were also made of the green algae Dunaliella salina and D. viridis in the flasks; an ANOVA of square-root-transformed count data for Dunaliella showed significant differences in distributions similar to Aphanothece distributions; these data will be reported elsewhere.

Discussion

Both the ranking of median abundances in the culture flasks and the results of the two-way ANOVA support the first overall hypothesis: cyanobacterial species abundant north of the railway causeway (e.g. A. halophytica) are competitively excluded from the south by other species, in this case N. spumigena and Oscillatoria spp. It appears that the cyanobacterium A. halophytica can grow in less saline waters as well as in the extremely saline waters north of the railway causeway, since it is found in all inocula, but its growth appears to be suppressed in the south by the presence of N. spumigena, which periodically blooms in the Great Salt Lake.

In previous years, we have noted large N. spumigena blooms in the low-salinity regime of Farmington Bay, as well as in water samples collected south of the railway causeway (Roney 2007), particularly when winds have concentrated blooms near the causeway. Rushforth and Felix (1982) recorded N. spumigena as rare in the south arm; perhaps they took their samples in a dormant season, as N. spumigena blooms episodically. The absence of N. spumigena from the southern arm of the Great Salt Lake may have influenced the ability of A. halophytica to migrate to and prosper in the fresher water of the south arm, rather than thriving only in the hypersaline north arm. In all of our samples south of the Antelope causeway (Farmington Bay) since 2004, N. spumigena was present in the water column.

The second overall hypothesis, that cyanobacterial species that thrive and bloom south of the Antelope causeway cannot grow in the high salinity of the north, is also supported by these experimental data. N. spumigena was found only in inocula from the less saline waters south of the Antelope Island causeway, and apparently cannot survive the highly saline waters north of the railway causeway.

Fig. 8 Rankings of cyanobacterial dominance by medians. Media/Inocula in the upper left corner of the chart are extremely hypersaline, while those in the lower right corner are far less saline

Table 1 ANOVA of Aphanothece halophytica distributions in autoclaved media from the Great Salt Lake

Source       Sum of squares  Degrees of freedom  Mean square  F statistic  Significance
Media        64.6            3                   21.5         11.4         P < 0.01
Inocula      26.1            3                   8.7          4.6          P < 0.01
Interaction  105.6           9                   11.7         6.2          P < 0.01
Subtotal     196.3           15
Error        151.1           80                  1.9
Total        347.4           95
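The mean squares and F statistics in Table 1 follow directly from its sums of squares and degrees of freedom (MS = SS/df and F = MS/MS_error). The short check below recomputes them from the tabulated values. The helper name is hypothetical, and P values are not recomputed because that needs an F-distribution routine that is not part of the Python standard library.

```python
# Sums of squares and degrees of freedom copied from Table 1.
EFFECTS = [("Media", 64.6, 3), ("Inocula", 26.1, 3), ("Interaction", 105.6, 9)]
ERROR_SS, ERROR_DF = 151.1, 80

def anova_table_check(effects, error_ss, error_df):
    """Return [(name, mean square, F)] for each effect, plus the error mean square."""
    ms_error = error_ss / error_df
    rows = [(name, ss / df, (ss / df) / ms_error) for name, ss, df in effects]
    return rows, ms_error

rows, ms_error = anova_table_check(EFFECTS, ERROR_SS, ERROR_DF)
for name, ms, f in rows:
    print(f"{name}: MS = {ms:.1f}, F = {f:.1f}")
print(f"Error: MS = {ms_error:.1f}")
# Media: MS = 21.5, F = 11.4
# Inocula: MS = 8.7, F = 4.6
# Interaction: MS = 11.7, F = 6.2
# Error: MS = 1.9
```

The rounded output reproduces the tabulated mean squares and F statistics. The sums of squares also add up (64.6 + 26.1 + 105.6 = 196.3 and 196.3 + 151.1 = 347.4), as do the degrees of freedom (3 + 3 + 9 + 80 = 95).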

Experimental support for these two general hypotheses helps shed light on our original question: are cyanobacterial distributions in the Great Salt Lake influenced by abiotic factors, biotic factors, or both? From these experiments, it appears that both abiotic (salinity) and biotic (interspecies competition) factors affect distributions of cyanobacterial species. N. spumigena distributions seem to be primarily influenced by salinity, since it can only grow in fresher waters. By contrast, A. halophytica distributions seem to be primarily influenced by competition from N. spumigena and Oscillatoria sp. There are, of course, other geochemical processes that we did not measure but that may affect distributions. We are also interested in the relationship between the green alga Dunaliella and cyanobacteria. Our initial analysis suggests a commensalism, with Dunaliella benefitting from the presence of nitrogen-fixing Aphanothece: in our microscopic analysis we often observed Dunaliella cells clustered around mass colonies of Aphanothece. It would be interesting to learn whether nitrogen fixed by Aphanothece in hypersaline environments contributes to the exceptional salt tolerance of Dunaliella (Zamir et al. 2004).

These experimental results are consistent with Gause’s principle, which predicts that no two species can indefinitely occupy the same niche (Gause 1969; Hardin 1960), since there is a clear niche partitioning between A. halophytica and N. spumigena in the Great Salt Lake. These two species cannot occupy the same hypersaline habitat north of the railway causeway, since N. spumigena cannot tolerate hypersaline conditions, and A. halophytica is suppressed in the presence of N. spumigena in the less saline southern waters.

However, this leaves unanswered the question of why A. halophytica is not totally excluded from the south, since it occurs in all samples of inocula, regardless of salinity. Perhaps A. halophytica is periodically excluded from southern waters by N. spumigena blooms, but during intervals between blooms, the extremely small A. halophytica persists, albeit at lower levels. Thus, Gause’s principle should perhaps include a clarification: two species cannot indefinitely occupy the same niche, except when that niche is temporally partitioned, as occurs with episodic blooms of Nodularia. This structuring of the cyanobacterial regimes by salinity (Williams 1998) is consistent with the intermediate salinity hypothesis of David Herbst (1999): “Abundance of salt-tolerant organisms is limited by physiological stress at high salinities, and by ecological factors, such as predation and competition, in more diverse communities at low salinities”. Since N. spumigena cannot survive the high salinity stress of waters from the north arm of the Great Salt Lake, and Aphanothece halophytica is competitively excluded by N. spumigena at lower salinities, the intermediate salinity hypothesis may apply.

The precise set of conditions that trigger episodic N. spumigena blooms is unknown. Our data suggest that these episodic blooms play a major role in excluding A. halophytica from vast areas of the Great Salt Lake. Being able to predict the occurrence of Nodularia blooms would not only be of theoretical importance; it might also lead to a better understanding of cyanobacterial blooms and cyanotoxin impacts on wildlife and human health (Cox et al. 2005; Metcalf et al. 2008).

Acknowledgments We thank J. Metcalf for assistance in cyanobacterial identification, J. Gardner for assistance in scanning electron microscopy, and B. Schaalje for assistance with biostatistics. We are grateful to the Wood Family Foundation for the Mus Views DIC/Fluorescent Microscopy Facility at the Institute for Ethnomedicine, and A. Fransiscana and R. Smithson for inspiration in our studies of the Great Salt Lake.

References

Brock TD (1976) Halophilic blue-green algae. Arch Microbiol 107:109–111
Butts DS (1980) Factors affecting the concentration of Great Salt Lake brines. In: Gwynn JW (ed) Great Salt Lake: a scientific, historical, and economic overview. Utah Geological and Mineral Survey, Salt Lake, pp 163–167
Cox PA, Banack SA, Murch SJ, Rasmussen U, Tien G, Bidigare RR, Metcalf JS, Morrison LF, Codd GA (2005) Diverse taxa of cyanobacteria produce β-N-methylamino-L-alanine, a neurotoxic amino acid. Proc Natl Acad Sci USA 102(14):5074–5078
Cronberg G, Annadotter H (2006) Manual of aquatic cyanobacteria: a photo guide and a synopsis of their toxicology. International Society for the Study of Harmful Algae, Copenhagen, pp 1–106
Dyer BD (2003) A field guide to bacteria. Cornell University Press, New York, pp 1–355
Felix EA (1978) The algal flora of the Great Salt Lake. MS thesis, Department of Botany and Range Science, Brigham Young University, Provo, pp 1–37
Felix EA, Rushforth SR (1980) Biology of the South arm of the Great Salt Lake, Utah. In: Gwynn JW (ed) Great Salt Lake: a scientific, historical, and economic overview. Utah Geological and Mineral Survey, Salt Lake, pp 305–312
Gause GF (1969) The struggle for existence. Hafner, New York, pp 1–163
Gwynn JW (2002) Great Salt Lake: an overview of change. Utah Dept. Natural Resources, Salt Lake, pp 1–584
Hardin G (1960) The competitive exclusion principle. Science 131:1292–1297
Herbst DB (1999) Biogeography and physiological adaptations of the brine fly genus Ephydra (Diptera: Ephydridae) in saline waters of the Great Basin. Great Basin Nat 59:127–135
Marcarelli AM, Wurtsbaugh WA, Griset O (2006) Salinity controls phytoplankton response to nutrient enrichment in the Great Salt Lake, Utah, USA. Can J Fish Aquat Sci 63:2236–2248
Metcalf JS, Banack SA, Lindsay J, Morrison LF, Cox PA, Codd GA (2008) Co-occurrence of β-N-methylamino-L-alanine, a neurotoxic amino acid with other cyanobacterial toxins in British waterbodies, 1990–2004. Environ Microbiol 10(3):702–708
Nübel U, Garcia-Pichel F, Muyzer G (2000) The halotolerance and phylogeny of cyanobacteria with tightly coiled trichomes (Spirulina Turpin) and the description of Halospirulina tapeticola gen. nov., sp. nov. Int J Syst Evol Microbiol 50:1265–1277
Oren A (2000) Salts and brines. In: Whitton BA, Potts M (eds) The ecology of cyanobacteria. Kluwer, Dordrecht, pp 281–306
Oren A (2002) Halophilic microorganisms and their environments. Kluwer, Dordrecht, pp 1–575
Post FJ (1981) Microbiology of the Great Salt Lake north arm. Hydrobiologia 81–82:59–69
Roney H (2007) Competitive exclusion of cyanobacteria in the Great Salt Lake. Honors thesis, Brigham Young University, Provo, pp 1–36
Rushforth SR, Felix EA (1982) Biotic adjustments to changing salinities in the Great Salt Lake, Utah, USA. Microb Ecol 8:157–161
Stephens DW, Gillespie DM (1976) Phytoplankton production in the Great Salt Lake, Utah, and a laboratory study of algal response to enrichment. Limnol Oceanogr 21:74–87
Sturm PA (1980) The Great Salt Lake brine system. In: Gwynn JW (ed) Great Salt Lake: a scientific, historical, and economic overview. Utah Geological and Mineral Survey, Salt Lake, pp 147–162
Utah Geological and Mineral Survey (1990) The Great Salt Lake information sheet. Utah Geol Surv Pub Inf Ser 8:1–2
Williams WD (1998) Salinity as a determinant of the structure of biological communities in salt lakes. Hydrobiologia 381:191–201
Zamir A, Azachi M, Bageshwar U, Fisher M, Gokhman I, Premkumar L, Sadka A, Savchenko T (2004) Molecular and functional adaptations underlying the exceptional salt tolerance of the alga Dunaliella salina. In: Ventosa A (ed) Halophilic microorganisms. Springer, Heidelberg, pp 165–176


Vol 437 | 20 October 2005 | doi:10.1038/nature04107

LETTERS

Adaptive evolution of non-coding DNA in Drosophila
Peter Andolfatto

A large fraction of eukaryotic genomes consists of DNA that is not translated into protein sequence, and little is known about its functional significance. Here I show that several classes of non-coding DNA in Drosophila are evolving considerably slower than synonymous sites, and yet show an excess of between-species divergence relative to polymorphism when compared with synonymous sites. The former is a hallmark of selective constraint, but the latter is a signature of adaptive evolution, resembling general patterns of protein evolution in Drosophila1,2. I estimate that about 40–70% of nucleotides in intergenic regions, untranslated portions of mature mRNAs (UTRs) and most intronic DNA are evolutionarily constrained relative to synonymous sites. However, I also use an extension to the McDonald–Kreitman test3 to show that a substantial fraction of the nucleotide divergence in these regions was driven to fixation by positive selection (about 20% for most intronic and intergenic DNA, and 60% for UTRs). On the basis of these observations, I suggest that a large fraction of the non-translated genome is functionally important and subject to both purifying selection and adaptive evolution. These results imply that, although positive selection is clearly an important facet of protein evolution, adaptive changes to non-coding DNA might have been considerably more common in the evolution of D. melanogaster.
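The McDonald–Kreitman reasoning summarized above can be illustrated with the widely used estimator of the fraction of adaptive substitutions, α = 1 − (D_s·P_x)/(D_x·P_s), where x is the class under test and synonymous sites serve as the neutral reference. This is a hedged sketch: the letter describes its own extension of the McDonald–Kreitman test, which may differ from this simple ratio (for example in how sites are counted or divergence is corrected). The function name and the counts in the usage line are hypothetical.

```python
def fraction_adaptive(div_test, poly_test, div_neutral, poly_neutral):
    """Estimate alpha, the fraction of divergence in a test class fixed by positive selection.

    Uses the common McDonald-Kreitman-style ratio
        alpha = 1 - (D_neutral * P_test) / (D_test * P_neutral),
    with synonymous sites as the neutral reference. Raw counts or per-site
    rates of fixed differences (D) and polymorphisms (P) both work, because
    each class's site number cancels in the ratio.
    """
    if div_test == 0 or poly_neutral == 0:
        raise ValueError("need nonzero test-class divergence and neutral polymorphism")
    return 1 - (div_neutral * poly_test) / (div_test * poly_neutral)

# Hypothetical counts, not data from the letter.
print(fraction_adaptive(div_test=20, poly_test=5, div_neutral=10, poly_neutral=10))  # 0.75
```

A value near zero means the test class shows the same divergence-to-polymorphism ratio as synonymous sites. Positive values reflect the excess of divergence that the abstract attributes to positive selection.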

The high degree of protein sequence similarity between phenotypically diverged species has led some to propose that regulatory evolution may be of considerably more importance than protein evolution4,5. Although most of the typical eukaryotic genome consists of non-coding DNA, comparatively little is known about the evolutionary forces acting on it. Some unknown fraction of the non-translated genome is presumed to be crucial for the regulation of gene expression. Most of our direct knowledge regarding the evolution of regulatory elements comes from a handful of direct functional studies5,6. A second, indirect approach is based on comparative genomics7. The rationale for this second approach is that if newly arising mutations are typically detrimental to gene function, functionally important parts of the genome are expected to evolve more slowly than those lacking function8–11.

There are some limitations to the comparative genomics
approach. First, a given genomic region might be conserved owing
simply to a lower mutation rate12. Second, known regulatory
elements do not seem to be particularly well conserved as a class,
at least in Drosophila10. This fnding suggests that taking an approach
based on sequence conservation alone may lead to a biased view of
regulatory evolution. Functionality of DNA sequences implies that
they can be subject to both negative and positive selection. If a
signifcant fraction of divergence between species observed in non-
coding DNA is positively selected rather than selectively neutral or
constrained, this could lead to underestimates of the functional
importance of non-coding DNA and cause researchers to overlook
the contribution of arguably the most interesting class of mutations
in genome evolution: those reflecting adaptive differences between
populations and species.

These limitations can be overcome by combining comparative
genomic analyses with population-level variability data1–3,13. To
assess the mode of selection acting on non-coding DNA, I have
analysed new and previously published polymorphism data for 35
coding fragments (average length 667 base pairs (bp)) and 153 non-coding
fragments (average length 426 bp) scattered across the
X chromosome of D. melanogaster (see Supplementary Materials 1).
To estimate levels of between-species divergence, I have compared
D. melanogaster with its closely related sibling species, D. simulans.

On the basis of the current Drosophila genome annotation (release
4), I separated the surveyed fragments into several categories that are
likely to differ in the intensity and mode of selection acting on them
(see Table 1). It is apparent that most non-coding DNA evolves
considerably more slowly than synonymous sites (that is, sites in protein-coding
sequences at which mutations do not result in amino acid
substitutions; Table 1). This is the case for introns and UTRs (see also
refs 14–16), as well as intergenic DNA, much of which is far from the
closest known gene (see Supplementary Materials 1). I estimate levels
of constraint in Drosophila non-coding DNA to be 40% for introns,
50% for intergenic regions (IGRs), and 60% for UTRs (Table 2).
These are all considerably higher than previous estimates from a
variety of species comparisons11,15–18. The non-coding DNA
surveyed is also generally less polymorphic than synonymous sites
in D. melanogaster (Table 1; p < 10^-10, Wilcoxon two-sample test for
UTRs and introns+IGRs versus synonymous sites). Thus, both
polymorphism and divergence in non-coding DNA are significantly
reduced relative to synonymous sites in D. melanogaster.
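The Wilcoxon two-sample test (the rank-sum test, equivalent to the Mann–Whitney U test) compares two sets of values without assuming normality. Below is a minimal sketch in plain Python. It assumes average ranks for ties and the large-sample normal approximation without a tie correction in the variance. The paper does not say which variant was used, so treat this as illustrative only. The function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (U, z, two-sided p) via the normal approximation, no tie correction."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(x + y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p


# Hypothetical per-locus values (e.g. Tajima's D) for two site classes.
print(wilcoxon_rank_sum([-0.9, -1.2, -0.4, -1.5], [0.1, -0.2, 0.3, -0.1, 0.5]))
```

For small samples with many ties, an exact permutation version would be the safer choice.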

Reduced levels of polymorphism and divergence in non-coding
DNA resemble general patterns of protein evolution19 and suggest
that non-coding DNA is either functionally constrained or is subject
to a lower mutation rate than synonymous sites. One way to
distinguish between these two models is to consider the distribution
of polymorphism frequencies. Negative selection acting on polymorphic
variants will keep them at lower frequencies in a population
than expected if they were neutral20. Consistent with this prediction,
the distribution of polymorphism frequencies at both non-coding
DNA and amino acid sites is skewed towards rare frequencies relative
to synonymous polymorphisms (as indicated by a more negative
Tajima’s D value20, Fig. 1). The distribution of Tajima’s D values for
non-synonymous sites among loci is negatively skewed relative to
synonymous sites, suggesting that amino acid polymorphisms are
subject to purifying selection (Fig. 1; p = 0.002, Wilcoxon two-sample
test versus synonymous sites). Here I show that this
same pattern extends to polymorphisms in non-coding DNA
(Fig. 1; Wilcoxon test versus synonymous sites: pooled non-coding,
p = 0.0001; UTRs, p < 0.0001; introns, p = 0.001; IGRs, p = 0.005).
This finding, together with the observed reduction in polymorphism
and divergence, implies that mutations in non-coding DNA are
subject, on average, to stronger negative selection than synonymous
sites (see also Supplementary Materials 2).
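Tajima's D compares two estimates of diversity:

- π, the average number of pairwise differences.
- Watterson's estimator, S/a₁, where S is the number of segregating sites.

An excess of rare variants lowers π relative to S/a₁, which makes D negative. That is the signature of purifying selection discussed above.

Below is a minimal sketch of the standard formula from Tajima (ref. 20). It is not the author's code. It assumes π is supplied as a count of differences per pair of sequences rather than per site. The function name and the example locus are hypothetical.

```python
import math


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D for a sample of n sequences.

    segregating_sites: number of polymorphic sites S in the sample.
    pi: average number of pairwise differences (a count, not per site).
    """
    if n < 4:
        raise ValueError("Tajima's D has zero variance for n < 4")
    if segregating_sites == 0:
        return float("nan")  # D is undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = segregating_sites
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical locus: 12 alleles (the sample size used here), 10 segregating sites.
print(tajimas_d(12, 10, 2.0))  # pi below S/a1, so D is negative
```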

¹Section of Ecology, Behavior and Evolution, Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, USA.

© 2005 Nature Publishing Group

LETTERS NATURE|Vol 437|20 October 2005

Table 1 | Polymorphism and divergence in coding and non-coding DNA of D. melanogaster

Mutation class | No. of regions | Mean π* | Mean Dxy† | D‡ | P§ | p‖ | P′¶ | p#
Synonymous | 35 | 2.87 | 13.59 | 604 | 502 | – | 323 | –
Non-synonymous | 35 | 0.18 | 1.72 | 260 | 115 | <10^-6 | 52 | <10^-9
Non-coding | 153 | 1.06 | 5.94 | 3,168 | 2,386 | 0.14 | 1,295 | <10^-3
UTRs | 31 | 0.54 | 4.54 | 471 | 246 | <10^-5 | 107 | <10^-11
5′ UTRs | 18 | 0.61 | 5.41 | 328 | 160 | <10^-5 | 71 | <10^-9
3′ UTRs | 13 | 0.45 | 3.35 | 143 | 86 | 0.034 | 36 | <10^-4
Introns | 72 | 1.25 | 6.71 | 1,564 | 1,221 | 0.39 | 675 | 0.010
IGRs | 50 | 1.11 | 5.72 | 1,133 | 919 | >0.5 | 513 | 0.059
pIGRs | 20 | 1.29 | 6.58 | 500 | 400 | >0.5 | 237 | 0.25
dIGRs | 30 | 0.99 | 5.18 | 633 | 519 | >0.5 | 276 | 0.041
Introns+IGR | 122 | 1.19 | 6.25 | 2,697 | 2,140 | 0.50 | 1,188 | 0.013

Mutation classes: synonymous sites, non-synonymous sites, untranslated transcribed regions (UTRs), intergenic regions within 2 kb of a gene (pIGRs), intergenic regions more than 4 kb away from a gene (dIGRs).
*π is the weighted average within-species pairwise diversity per 100 sites.
†Dxy is the weighted average pairwise divergence per 100 sites between D. melanogaster and D. simulans, corrected for multiple hits (Jukes–Cantor). Dxy at fourfold degenerate synonymous sites is 12.0%.
‡D is the estimated number of fixed differences between species using a Jukes–Cantor correction for multiple hits (see Methods).
§P is the number of intraspecific polymorphisms.
‖McDonald–Kreitman test probability using all polymorphisms.
¶P′ is the number of intraspecific polymorphisms excluding singletons.
#McDonald–Kreitman test probability excluding singleton polymorphisms. Probabilities are from two-tailed Fisher's exact tests and assume sites are independent. These are likely to be only slight underestimates given probable levels of intragenic recombination (see Supplementary Materials 2).

Does selective constraint alone account for patterns of non-coding
DNA evolution? McDonald and Kreitman3 have proposed a framework
to distinguish neutrality (and variation in mutation rate) from
negative and positive selection in the genome. Their approach
compares levels of polymorphism within and divergence between
species for a putatively selected class of sites in the genome to a
neutral standard. If reduced levels of polymorphism and divergence
in non-coding DNA can be explained by a lower mutation rate, the
ratio of polymorphism to divergence should be similar to that for
synonymous sites. Positive selection will increase divergence relative
to polymorphism at selected sites, whereas negative selection is
expected to result in the opposite pattern21. Although this framework
was originally designed to detect selection within protein-coding
genes, it can be generalized to consider arbitrary classes of putatively
selected sites sampled from multiple genomic regions, including
non-coding DNA (see Supplementary Materials 2). Using all polymorphisms,
there is a significant excess of divergence for amino
acid replacement sites (p = 5 × 10^-7) and for UTRs (p = 3 × 10^-6,
two-tailed Fisher's exact test) but not at other subclasses of non-coding
DNA (Table 1). This preliminary analysis suggests that,
similar to the pattern observed for amino acid substitutions1,2, a
significant proportion of nucleotide divergence at UTRs was also
driven to fixation by positive selection.
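The McDonald–Kreitman comparison described above reduces to a 2 × 2 contingency table. The rows are the site classes (neutral synonymous sites versus putatively selected sites). The columns are the counts of divergent (D) and polymorphic (P) sites. Per the Table 1 notes, the table is tested with a two-tailed Fisher's exact test.

Below is a minimal plain-Python sketch. Two choices in it are assumptions, since the paper does not spell them out:

- Hypergeometric probabilities are computed in log space to avoid overflow with counts in the thousands.
- "Two-tailed" is taken to mean summing all tables no more probable than the observed one, a common convention.

Applied to the Table 1 UTR counts excluding singletons, the sketch gives a p-value well below 10^-8, consistent with the value reported in the text. The function names are illustrative.

```python
import math


def _log_comb(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_p(x):  # hypergeometric log-probability of x in the top-left cell
        return _log_comb(row1, x) + _log_comb(n - row1, col1 - x) - _log_comb(n, col1)

    observed = log_p(a)
    lo, hi = max(0, col1 + row1 - n), min(row1, col1)
    total = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
                if log_p(x) <= observed + 1e-7)  # small tolerance for float ties
    return min(1.0, total)


def mk_test(d_neutral, p_neutral, d_selected, p_selected):
    """McDonald-Kreitman table: rows = site class, columns = (divergent, polymorphic)."""
    return fisher_exact_two_tailed(d_neutral, p_neutral, d_selected, p_selected)


# Table 1 counts, singletons excluded: synonymous D=604, P'=323; UTRs D=471, P'=107.
print(mk_test(604, 323, 471, 107))
```

An excess of divergence in the selected class, relative to the neutral class (here 471/107 versus 604/323), is the direction that points to positive selection.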

Table 2 | Functionally relevant nucleotides in non-coding DNA

Class | C (%)* | α (%)† | p (α ≤ 0)‡ | FRN (%)§
UTRs | 60.4 | 57.5 | <10^-3 | 83.2
5′ UTRs | 52.9 | 60.8 | <10^-3 | 80.9
3′ UTRs | 70.7 | 52.9 | <10^-3 | 86.2
Introns | 39.5 | 19.3 | 0.007 | 51.2
IGRs | 49.3 | 15.3 | 0.036 | 57.1
pIGRs | 40.6 | 11.4 | 0.165 | 47.4
dIGRs | 54.6 | 18.5 | 0.019 | 63.0
Introns + IGR | 44.2 | 17.6 | 0.013 | 54.0

*Constraint (C) is estimated relative to fourfold degenerate synonymous sites.
†α is the estimated fraction of divergence driven by positive selection.
‡Probabilities p(α ≤ 0) have been adjusted for effects of linkage within loci (see Supplementary Materials 2.5).
§FRN is the inferred fraction of functionally relevant nucleotides given levels of constraint and α, that is, $\mathrm{FRN} \approx C + (1 - C)\alpha$.

The presence of weakly negatively selected variants in polymorphism
can mask the signature of adaptive evolution in the
genome1,22, making the McDonald–Kreitman test very conservative.
As I have shown above that polymorphic variants in non-coding
DNA are subject to stronger selective constraint than synonymous
sites (Table 1 and Fig. 1), negatively selected variants contributing to
polymorphism in non-coding DNA are likely to be a factor limiting
power to detect positive selection. This problem can be partially
overcome by considering only those mutations that are not rare
in a sample from both the neutral and putatively selected classes
(see ref. 23 and Supplementary Materials 2). Applying this approach
reveals a significant excess of divergence in UTRs and in most
other classes of non-coding DNA relative to synonymous sites
(Table 1; UTRs, p = 5 × 10^-12; introns, p = 0.01; dIGRs, p = 0.04;
introns+IGRs, p = 0.01). A Hudson–Kreitman–Aguadé (HKA)
test24 also provides statistical support for a reduced ratio of polymorphism
to divergence for non-coding DNA relative to synonymous
sites (UTRs, p < 10^-3; pooled introns and IGRs, p = 0.02; see
Supplementary Materials 2). Together, these results show that a
significant fraction of the divergence in UTRs, introns and intergenic
DNA was probably driven to fixation by positive selection.

To quantify the intensity and the relative importance of positive
selection in shaping the evolution of non-coding DNA, I apply two
extensions of the McDonald–Kreitman approach2,13. First, I estimate
α, defined as the proportion of the divergence between species that
was driven by positive selection2. I estimate that about 20% of the
nucleotide divergence in introns and intergenic DNA was driven to
fixation by positive selection, and about 60% for UTRs (Fig. 2a and
Table 2). Using a hierarchical bayesian framework13, I estimate the
selection intensity on non-coding DNA (including UTRs, introns
and IGRs) to be positive and significantly different from zero in most
cases (Fig. 2b; Supplementary Materials 3). As this bayesian approach
assumes that segregating and fixed variants are subject to the
same direction and intensity of selection, it is likely to underestimate
the magnitude of 2Nₑs (the intensity of selection) for nucleotide
substitutions fixed by positive selection (see Supplementary
Materials 2).

Figure 1 | Mean Tajima's D values for coding and non-coding DNA. Means
across loci are given with bars indicating two standard errors. The
expectation of D under the neutral model is shown as a dotted line. Syn,
synonymous sites; NonSyn, non-synonymous sites; NonCod, pooled
non-coding DNA.

Evidence that a significant fraction of non-coding DNA is functionally
important is emerging from a variety of comparative
genomic studies. However, my finding of a large fraction of positively
selected divergence implies that 'evolutionary constraint' will substantially
underestimate the fraction of functionally relevant nucleotides
because it ignores the contribution of positively selected
mutations to divergence. For the example of UTRs, I estimate
evolutionary constraint to be 60%. However, as 58% of the observed
divergence was positively selected, this implies that about 83% of nucleotides
in UTRs (0.60 + 0.40 × 0.58 ≈ 0.83) are in fact functionally relevant. Likewise, the fraction
of functionally relevant nucleotides in introns and IGRs is likely to be
about 10–20% higher than suggested by levels of constraint alone
(Table 2).
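The FRN relation in the Table 2 notes adds two parts:

- the constrained fraction C;
- the share α of the remaining unconstrained fraction (1 − C) whose divergence was adaptive.

Below is a minimal sketch that reproduces the FRN column from the tabulated C and α. The function name is hypothetical. Rejecting α outside [0, 1] is a design choice of this sketch, since a negative α has no meaning as a fraction of nucleotides.

```python
def functionally_relevant_fraction(constraint, alpha):
    """FRN ~ C + (1 - C) * alpha: constrained sites plus the adaptively
    diverged share of the unconstrained remainder (fractions in [0, 1])."""
    for name, value in (("constraint", constraint), ("alpha", alpha)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return constraint + (1 - constraint) * alpha


# (C, alpha) pairs taken from Table 2
table2 = {"UTRs": (0.604, 0.575), "Introns": (0.395, 0.193), "IGRs": (0.493, 0.153)}
for site_class, (c, a) in table2.items():
    print(f"{site_class}: FRN = {100 * functionally_relevant_fraction(c, a):.1f}%")
```

The loop prints 83.2%, 51.2% and 57.1%, matching the FRN column for those three rows.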

How frequent is adaptation in the Drosophila genome? Rough
calculations (see Supplementary Materials 4) suggest that there has
been about one adaptive amino acid substitution every 20 years since
the split of D. melanogaster and D. simulans (see also ref. 2). Although
this is substantial, consider that the total number of sites contained in
introns, intergenic regions and UTRs far outweighs the number of
codons in the Drosophila genome25. I estimate that UTRs alone
contribute as much to adaptive divergence between species as do
amino acid changes, and the summed contribution of non-coding
DNA to adaptive divergence could easily be an order of magnitude
larger. These findings support previous intuitions4,5 about the great
importance of regulatory changes in evolution.

Figure 2 | Quantifying adaptive divergence and selection intensity.
a, Estimates of α, the fraction of nucleotide divergence driven by positive
selection. Error bars indicate 90% confidence limits determined by non-parametric
bootstrapping. The estimated probabilities p(α ≤ 0), corrected for
partial linkage, are given in Table 2. b, Estimates of the intensity of selection
(2Nₑs) acting on non-synonymous and non-coding DNA sites. Error bars
indicate 90% confidence limits determined by simulation (see Methods).
Singleton polymorphisms were excluded in estimates of α and 2Nₑs (see
Supplementary Materials 3). Abbreviations as in Fig. 1.

METHODS
Data. All loci used in this study, previously published or newly collected, are
X-linked genomic fragments, with a sample size of 12 D. melanogaster alleles
sampled from a population in Zimbabwe, and a single D. simulans sequence.
I collected polymorphism and divergence data for 31 coding regions
(synonymous and non-synonymous sites), selected randomly with respect
to gene function, and for 51 non-coding regions (27 intergenic and 24 untranslated
transcribed regions). Information about these 82 loci and primers used can be
found in Supplementary Materials 1. I used polymerase chain reaction (PCR) to
amplify 700–800-bp regions from genomic DNA extracted from single male flies,
removed primers and nucleotides using exonuclease I and shrimp alkaline
phosphatase, and sequenced the cleaned product on both strands using Big-Dye
(Version 3, Applied Biosystems). Sequences were collected on an ABI 3730
capillary sequencer and were aligned and edited using the program Sequencher
(Gene Codes).

To the 82 regions surveyed above, I added previously published data for
loci that had the same sample size (n = 12 flies) and were surveyed in similar
samples from Zimbabwe26,27. A number of the previously published loci26 had
to be functionally reassigned when compared to Release 4 of the annotated
D. melanogaster genome (http://flybase.bio.indiana.edu/annot/dmel-release4.html).
I excluded any loci in regions of reduced recombination (see below).
Previously published loci fitting these requirements were processed into 106
fragments (4 coding, 7 UTR, 23 intergenic and 72 intron). Thus, the total
number of regions surveyed in this analysis is 188. Alignments for each locus
are available upon request. A reciprocal best-hit BLAST protocol was used to
confirm that the regions compared between D. melanogaster and D. simulans
are indeed orthologous. Extra gaps were introduced into some alignments in
regions that were particularly difficult to align. This procedure is likely to
upwardly bias estimates of constraint, but is conservative with respect to
detecting positive selection.
Analyses. Estimates of the number of synonymous and non-synonymous sites,
average pairwise diversity (π) and average pairwise divergence (Dxy), as well as
counts of the number of polymorphic sites (P), were obtained using DnaSP
software (version 4; http://www.ub.es/dnasp/) and Perl code written by P.A. The
number of divergent sites (D) was estimated as Dxy − π using a Jukes–Cantor
correction for multiple hits. Multiply hit sites were included in the analysis but
insertion–deletion polymorphisms and mutations overlapping alignment gaps
were excluded. Derived mutations were polarized using a single D. simulans
sequence and assuming standard parsimony criteria. Tajima's D value20 was
estimated from the number of polymorphisms and π.
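The Jukes–Cantor correction converts an observed proportion p of differing sites into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3), which accounts for multiple hits at the same site.

Below is a minimal sketch of that correction and of the D ≈ Dxy − π step. Three points are assumptions: that the correction is applied to Dxy before π is subtracted, that inputs are per-site proportions, and that D is scaled by a site count. The helper `divergent_sites` is hypothetical and is not DnaSP's routine.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) for an observed
    proportion p of differing sites; corrects for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergent_sites(raw_dxy, pi, n_sites):
    """Hypothetical helper: D ~ (JC-corrected Dxy - pi) * number of sites,
    with raw_dxy and pi given as per-site proportions."""
    return (jukes_cantor(raw_dxy) - pi) * n_sites


print(round(jukes_cantor(0.10), 4))  # the correction raises 0.10 to about 0.1073
print(round(divergent_sites(0.10, 0.02, 1000), 1))
```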

In this study, I assume that synonymous sites are more neutral than putatively
selected classes of sites (see Supplementary Materials 2.2). I separated non-coding
DNA into subclasses that I expected a priori to experience different
selection pressures: 5′ and 3′ untranslated transcribed regions (UTRs), introns,
intergenic regions within 2 kilobases (kb) of a gene (proximal intergenic regions,
pIGRs), and intergenic regions further than 4 kb from the nearest gene (distal
intergenic regions, dIGRs). My sample of intron fragments is biased towards
introns larger than the median intron size (86 bp) (ref. 28), making estimates of
constraint higher than expected with a random sample of introns14. However,
95% of intronic DNA is contained within introns longer than the median size28,
and thus my estimate reflects levels of constraint for most intronic DNA in the
genome.

For comparisons of polymorphism and divergence between synonymous
sites and non-coding DNA, it was necessary to pool sites in each class. I estimate
evolutionary constraint relative to fourfold degenerate synonymous sites using
the approach in ref. 15, except that I pooled classes of sites and used a Jukes–Cantor
correction for multiple hits19. Given differences in base composition
between coding and non-coding regions, I investigated possible differences in
mutation rates owing to the 16 possible adjacent-base contexts of nucleotides
(suggested by A. Kondrashov). There was no significant effect of adjacent-base
context on rates of divergence (see Supplementary Materials 5).

I estimate the proportion of divergence driven by positive selection1,2 as
$\alpha = 1 - \frac{D_S P_X}{D_X P_S}$, where S denotes synonymous (that is, putatively neutral)
sites, X denotes putatively selected sites, and $D = \sum_{i=1}^{n} D_i$ and $P = \sum_{i=1}^{n} P_i$,
in which $D_i$ and $P_i$ are the number of divergent and polymorphic variants at locus i,
respectively, and n is the number of loci of class S or X. Confidence limits on α
were estimated using a standard non-parametric bootstrapping procedure,
assuming sites are independent. The issue of non-independence of sites within
surveyed fragments is addressed in Supplementary Materials 2.5. For consistency,
α was estimated for non-synonymous sites in the same way. The intensity
of selection (2Nₑs) was estimated on putatively selected classes (pooling sites as
above) using a hierarchical bayesian method (http://cbsuapps.tc.cornell.edu)13.
To avoid problems associated with large-scale variation in recombination rates, I
restricted my survey of loci to regions of the X chromosome that have the highest
rates of recombination29 (see Supplementary Fig. 1.1).
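The α estimator pools D and P across loci before taking the ratio, so it can be computed directly from class totals.

Below is a minimal sketch; the function name is illustrative. Using the singleton-excluded counts P′ from Table 1 reproduces the Table 2 values for UTRs (57.5%) and introns (19.3%). This is consistent with the Fig. 2 note that singletons were excluded from estimates of α. The bootstrap confidence limits are not reproduced here.

```python
def mk_alpha(neutral_loci, selected_loci):
    """alpha = 1 - (D_S * P_X) / (D_X * P_S), pooling counts across loci.

    Each argument is a list of (D_i, P_i) pairs: divergent and polymorphic
    counts at locus i for the neutral (S) or putatively selected (X) class.
    """
    d_s = sum(d for d, _ in neutral_loci)
    p_s = sum(p for _, p in neutral_loci)
    d_x = sum(d for d, _ in selected_loci)
    p_x = sum(p for _, p in selected_loci)
    if d_x == 0 or p_s == 0:
        raise ValueError("alpha is undefined when D_X or P_S is zero")
    return 1 - (d_s * p_x) / (d_x * p_s)


# Pooled Table 1 totals with singletons excluded (P'), entered as single "loci".
synonymous = [(604, 323)]
print(round(mk_alpha(synonymous, [(471, 107)]), 3))   # UTRs: 0.575 (Table 2: 57.5%)
print(round(mk_alpha(synonymous, [(1564, 675)]), 3))  # introns: 0.193 (Table 2: 19.3%)
```

Because only the sums enter the formula, splitting the same totals across more loci leaves α unchanged.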

Received 23 May; accepted 2 August 2005.

1. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415, 1024–1026 (2002).

2. Smith, N. G. & Eyre-Walker, A. Adaptive protein evolution in Drosophila. Nature 415, 1022–1024 (2002).

3. McDonald, J. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).

4. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).

5. Carroll, S. B., Grenier, J. K. & Weatherbee, S. D. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design (Blackwell Science, Malden, Massachusetts, 2001).

6. Ludwig, M. et al. Functional evolution of a cis-regulatory module. PLoS Biol. 3, e93 (2005).

7. Miller, W., Makova, K., Nekrutenko, A. & Hardison, R. Comparative genomics. Annu. Rev. Genomics Hum. Genet. 5, 15–56 (2004).

8. Cliften, P. et al. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 11, 1175–1186 (2001).

9. Gibbs, R. et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521 (2004).

10. Richards, S. et al. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 15, 1–18 (2005).

11. Shabalina, S. & Kondrashov, A. Pattern of selective constraint in C. elegans and C. briggsae genomes. Genet. Res. 74, 23–30 (1999).

12. Clark, A. The search for meaning in noncoding DNA. Genome Res. 11, 1319–1320 (2001).

13. Bustamante, C. et al. The cost of inbreeding in Arabidopsis. Nature 416, 531–534 (2002).

14. Haddrill, P. R., Halligan, D., Charlesworth, B. & Andolfatto, P. Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 6, R67 (2005).

15. Halligan, D., Eyre-Walker, A., Andolfatto, P. & Keightley, P. Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila. Genome Res. 14, 273–279 (2004).

16. Bachtrog, D. Sex chromosome evolution: molecular aspects of Y chromosome degeneration in Drosophila. Genome Res. 15, 1393–1401 (2005).

17. Jareborg, N., Birney, E. & Durbin, R. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9, 815–824 (1999).

18. Bergman, C. & Kreitman, M. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11, 1335–1345 (2001).

19. Li, W. Molecular Evolution (Sinauer Associates, Sunderland, Massachusetts, 1997).

20. Tajima, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595 (1989).

21. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983).

22. Charlesworth, B. The effect of background selection against deleterious mutations on weakly selected, linked variants. Genet. Res. 63, 213–227 (1994).

23. Templeton, A. Contingency tests of neutrality using intra/interspecific gene trees: the rejection of neutrality for the evolution of the mitochondrial cytochrome oxidase II gene in the hominoid primates. Genetics 144, 1263–1270 (1996).

24. Hudson, R., Kreitman, M. & Aguadé, M. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159 (1987).

25. Misra, S. et al. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 3, research0083.1–0083.22 (2002).

26. Glinka, S., Ometto, L., Mousset, S., Stephan, W. & De Lorenzo, D. Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics 165, 1269–1278 (2003).

27. Haddrill, P. R., Thornton, K. R., Charlesworth, B. & Andolfatto, P. Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 15, 790–799 (2005).

28. Yu, J. et al. Minimal introns are not “junk”. Genome Res. 12, 1185–1189 (2002).

29. Charlesworth, B. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet. Res. 68, 131–149 (1996).

Supplementary Information is linked to the online version of the paper at
www.nature.com/nature.

Acknowledgements The author thanks D. Bachtrog for extensive comments on
the manuscript and help with data quality issues, C. Bustamante and K. Thornton
for providing code, and B. Ballard for Zimbabwe fly lines. P. Haddrill and
K. Thornton assisted in designing primers for distal intergenic and coding
regions, respectively. Thanks to B. Fischman for technical help, A. Betancourt,
A. Kondrashov, A. Poon, D. Presgraves, M. Przeworski and S. Wright for critical
comments on the manuscript, and L. Chao and J. Huelsenbeck for advice.
Thanks also to the Washington University Genome Sequencing Center for
providing unpublished D. simulans sequences. This work was funded in part by a
research grant from the Biotechnology and Biological Sciences Research Council
(UK) to P.A. The author is supported by an Alfred P. Sloan Fellowship in
Molecular and Computational Biology.

Author Information Reprints and permissions information is available at
npg.nature.com/reprintsandpermissions. The author declares no competing
financial interests. Correspondence and requests for materials should be
addressed to P.A. (pandolfatto@ucsd.edu).



DQ3 – Culturally Competent Leadership

As discussed in the textbook and online resources, an organization can become truly culturally proficient only with support from its leadership. For DQ3, imagine you have recently been promoted to director of a public health or health care organization and a top item on your agenda is leading a culturally competent organization. Reflect and then respond to the following two questions:

1. As a health professional, what changes in cultural attitudes, behaviors, and values are necessary on MY part?

2. How prepared am I and the organization I serve to effectively respond to cultural practices/nuances of patients/clients/customers?

As you reflect on these questions, it may be beneficial to think about any personal experiences you've had dealing with individuals who have not displayed cultural competence. Remember in your discussions with your classmates that there are no right or wrong answers to this set of questions. The purpose of this discussion is for you to think about what YOU need to change in order to be a culturally proficient leader, as well as what additional training you might need. In responding to your classmates, provide suggestions on how improvements can be made or resources that can be used in order to be culturally proficient.


Topics in Molecular & Organismal Evolution: RStudio Project Part 1

COVID-related molecular evolution analysis of the human gene ACE2

 

1.  Background reading 

 

The 1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526: 68-74.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group. 2011. The variant call format and VCFtools. Bioinformatics App Note 27: 2156-2158.

Gheblawi M, Wang K, Viveiros A, Nguyen Q, Zhong J-C, Turner AJ, Raizada MK, Grant MB, Oudit GY. 2020. Angiotensin-Converting Enzyme 2: SARS-CoV-2 Receptor and Regulator of the Renin-Angiotensin System. Circulation Research, 126(10):1456-1474

Hoffmann M, et al. 2020. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell, 181(2):271-280.e8

David A, Khanna T, Beykou M, Hanna G, Sternberg MJE. [Preprint]. Structure, function and variants analysis of the androgen-regulated TMPRSS2, a drug target candidate for COVID-19 infection. bioRxiv, accessed 10SEP20, DOI: 10.1101/2020.05.26.116608

Sriram K, Insel P, Loomba R. May 14, 2020. What is the ACE2 receptor, how is it connected to coronavirus and why might it be key to treating COVID-19? The experts explain. The Conversation.


http://www.internationalgenome.org/home


https://www.illumina.com/company/news-center/feature-articles/illumina-to-sequence-genomes-for-new-uk-wide-covid19-study.html

 

Question 1.1. What is the 1000 genomes project?

Question 1.2. What are the super populations and sub-populations that contributed genomic DNA to the 1000 genomes project?

Question 1.3. Why are the ACE2 and TMPRSS2 human genes of interest when it comes to understanding the severity of covid infection?

 

2.  Understanding the files we will work with


 

Figure 1 provides an overview of how the sequence data for the 1000 genomes project were generated. The sequencing technology they used was the proprietary Illumina platform (NGS, or Next Generation Sequencing, in the figure). This process involves shearing the genomic DNA into numerous small fragments (i.e. the "shotgun" approach) and making a clonal bacterial library of these fragments (for large genomes only, > 500 Mb; smaller genomes do not require this library and can be sequenced directly).

Figure 1: Overview of NGS and the files generated 


 

The next step is to sequence these fragments to a target coverage depth. Coverage is the average number of sequenced reads that overlap any given position in the genome. Because the fragments overlap, higher coverage makes both the eventual re-assembly back into contiguous chromosomes and the eventual calling of variants more accurate. The initial, raw sequence files are in FASTQ format. FASTQ files contain the sequences themselves along with a quality score for each base call. You can read more about FASTQ files here: https://support.illumina.com/bulletins/2016/04/fastq-files-explained.html

The raw reads are then passed through reference-genome alignment algorithms to generate SAM/BAM files. SAM stands for Sequence Alignment/Map format. You can read more about it here: https://samtools.github.io/hts-specs/SAMv1.pdf

BAM files are the compressed, binary version of SAM files (up to 128MB of sequence). These files typically contain an entire genome's worth of DNA sequence information, as well as positional information on where in the genome each piece of sequence falls (file sizes are typically in the hundreds of gigabytes and often exceed a terabyte). These are NOT files you would typically want to download onto your laptop; rather, you would work with them on a server that has the capacity to handle such large files efficiently. For our purposes (molecular evolution analysis of a single gene, ACE2), VCF (Variant Call Format) files are sufficient, and small enough to work with on our personal laptops in RStudio. VCF files are the result of picking out only the variant nucleotide positions from a SAM/BAM file (i.e. loci where the sequences differ from the reference genome). You can find more information on VCF files here: https://www.internationalgenome.org/wiki/Analysis/vcf4.0/

VCF files are sufficient for most population genetic / molecular evolution analyses.
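A FASTQ record usually spans four lines:

1. `@` followed by the read identifier.
2. The bases.
3. A `+` separator.
4. One quality character per base.

Below is a minimal sketch of reading such records and decoding the quality characters into Phred scores. It assumes Phred+33 encoding (score = character code − 33, as in Sanger and Illumina 1.8+ output) and one sequence line per record. The example read is made up.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records.

    Assumes one sequence line per record and Phred+33 quality encoding
    (quality character = chr(score + 33)).
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ text does not contain a whole number of 4-line records")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


def error_probability(phred):
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


record = ["@read_1 hypothetical example", "ACGT", "+", "II#5"]
for read_id, seq, scores in parse_fastq(record):
    print(read_id, seq, scores, [error_probability(q) for q in scores])
```

Here `I` decodes to Q40 (a 1 in 10,000 chance the base call is wrong) and `#` to Q2, a very unreliable call.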

 

Question 2.1. Make labelled figures for, or describe (or both), the headers of a typical FASTQ, SAM/BAM and VCF file. Include column heading descriptions where appropriate, and highlight what the file types have in common and what is unique to each. The goal here is to be able to recognize when you have successfully created or downloaded a VCF file, and to understand what it looks like so that we can import it into RStudio.

 

3.  Using EnsEMBL to get VCF files from the 1000 genomes project

 

You can find general information on how to get data from the 1000 genomes project using genome browsers (of which EnsEMBL is one) here:
https://www.internationalgenome.org/data

Step by Step instructions:

 

Step 1: Finding ACE2

· Go to the website: 
http://useast.ensembl.org/Homo_sapiens/Info/Index

· Find the search bar in the top left-hand corner and type in "ACE2". Make sure the "category" drop-down menu is set to "Search all categories." Click "Go."

· The first result to come up should be called “ACE2 (Human Gene)”.

· Clicking on that will bring you to the "home page" for the gene ACE2.

 

Question 3.1. What does ACE stand for?

Question 3.2. On which chromosome do we find the human ACE2 gene?

Question 3.3. How many different orthologues of ACE2 have been sequenced? Name at least one vertebrate and one invertebrate in which it is found.

 

Step 2: Use the Data Slicer to get a VCF file containing only the data you would like to analyze (this may or may not work)

The Data Slicer in EnsEMBL is a convenient way to get only the amount of data that you want without using a separate program to cut it out yourself or having to write some code to do it. We will try to use this tool to get the data for our analyses of ACE2. We will be taking one slice of data that contains all of the SNPs (Single Nucleotide Polymorphisms) in ACE2. The link to the Data Slicer is available here: 
https://useast.ensembl.org/Homo_sapiens/Tools/DataSlicer

· First off, in the "Name for this job" category, let's name it after the population we will be sampling from: the Yoruba population (which shows the most variation in the ACE2 gene), so name it "YRI_ACE2".

· The file format should be set for VCF. If it’s not, click the drop-down menu and select VCF.

· In the “region lookup” bar, copy and paste in the location X:15561033-15602148. These are the GRCh38.p13 version alignment coordinates for the gene ACE2.

· In the “Genotype file URL…” field, paste this URL:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chrX_GRCh38.genotypes.20170504.vcf.gz

This will ensure you get data from the last phase of the 1000 Genomes Project.

· In the “filters” category, select “By populations”, then paste in this URL:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/integrated_call_samples_v3.20130502.ALL.panel

· Select the YRI population so that you only get the data for that population.
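If the Data Slicer fails, it helps to know what it is doing: keeping only the records inside one region and only the genotype columns of one population. The Python sketch below mimics that on plain-text VCF lines. It assumes the panel file is tab-separated with a header row containing "sample" and "pop" columns, and that the chromosome name in the VCF matches the one in the region string (some files use "X", others "chrX"); the function names are hypothetical. For the real, compressed 1000 Genomes file, an indexed region query with a dedicated tool would be far more efficient than scanning every line.

```python
import csv
import io


def parse_region(region):
    """Split 'X:15561033-15602148' into ('X', 15561033, 15602148)."""
    chrom, span = region.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return chrom, start, end


def population_samples(panel_text, population):
    """Sample IDs whose 'pop' column equals `population` (assumes 'sample' and 'pop' headers)."""
    reader = csv.DictReader(io.StringIO(panel_text), delimiter="\t")
    return {row["sample"] for row in reader if row["pop"] == population}


def slice_vcf(vcf_lines, region, keep_samples):
    """Yield VCF lines restricted to `region` and to the sample columns in `keep_samples`."""
    chrom, start, end = parse_region(region)
    keep_idx = None  # column indices to keep, set once the #CHROM line is seen
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Indices 0-8 are CHROM ... FORMAT; sample columns start at index 9.
            keep_idx = list(range(9)) + [
                i for i in range(9, len(fields)) if fields[i] in keep_samples
            ]
            yield "\t".join(fields[i] for i in keep_idx)
        elif keep_idx is None:
            raise ValueError("data line found before the #CHROM header")
        elif fields[0] == chrom and start <= int(fields[1]) <= end:
            # POS (column 2) is 1-based, like the region coordinates.
            yield "\t".join(fields[i] for i in keep_idx)
```

With local copies of both files, a call along the lines of slice_vcf(gzip.open(vcf_path, "rt"), "X:15561033-15602148", population_samples(open(panel_path).read(), "YRI")) would stream a YRI_ACE2-style slice; vcf_path and panel_path are placeholders.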

 

 

This may or may not work (the Data Slicer is buggy). If it does not, we will need to get a bit more sophisticated and perform some command-line magic.

 

4. Using the Terminal on Macs or the Command Prompt on Windows

 

The link below is a nice, gentle introduction to terminal basics. Without the basics you will be lost, so please work your way through it. Since the Data Slicer most likely will not work, we will have to learn some terminal basics to download the data set we want to analyze. It is also a skill that is absolutely crucial if you want to do even mildly bioinformatic work these days.

 


https://medium.com/@grace.m.nolan/terminal-for-beginners-e492ba10902a

 

Question 4.1: What does BASH stand for?

Question 4.2: What is a directory? Do directories have extensions like .exe or .txt?

Question 4.3: Using some of the commands you are learning about, briefly describe what the man command does (make sure you understand what the q command does beforehand).


Question 1.1. What is the 1000 genomes project?

Question 1.2. What are the super populations and sub-populations that contributed genomic DNA to the 1000 genomes project?

Question 1.3. Why are the human genes ACE2 and TMPRSS2 of interest when it comes to understanding the severity of COVID-19 infection?

Question 2.1. Make labelled figures for, or describe (or do both), the headers of a typical FASTQ, SAM/BAM and VCF file, including column heading descriptions where appropriate. Highlight what they have in common and what is unique to each file type. The goal here is to be able to recognize when you have successfully created or downloaded a VCF file, and to understand what it looks like, so that we can import it into RStudio.

Question 3.1: What does ACE stand for?

Question 3.2. On which chromosome do we find the human ACE2 gene?

Question 3.3. How many different orthologous genes of ACE2 have been sequenced? Name at least one vertebrate and one invertebrate in which it is found.

Question 4.1: What does BASH stand for?

Question 4.2: What is a directory? Do directories have extensions like .exe or .txt?

Question 4.3: Using some of the commands you are learning about, briefly describe what the man command does (make sure you understand what the q command does beforehand).


The Community Action Plan: Written Report and Slide Presentation

Your Project for this class is a Community Action Plan designed to alleviate or correct a public-health issue in your community. Your community can be your business, school, neighborhood, town or city of residence or birth, or county. 

For this assignment, your Community Action Plan will be a professional project that includes a written report and a slide presentation.

Below are some useful sites where you can find examples of the elements of an action plan.


https://www.epa.gov/community-port-collaboration/community-action-roadmap-overview

http://www.cityofchicago.org/dam/city/depts/cdph/tobacco_alchohol_and_drug_abuse/LGBTCommunityActionPlanHC.pdf

https://smartgrowthamerica.org/program/national-complete-streets-coalition/

https://www.aecf.org/m/e2s/e2s-action-plan-template.pdf

https://sanpabloca.gov/DocumentCenter/View/5537/CAP-only-22-pages?bidId=

https://uknowledge.uky.edu/cgi/viewcontent.cgi?referer=https://www.google.com/&httpsredir=1&article=1018&context=ced_reports

Final Paper Instructions:

· The case study, with five sections (Symptoms, Diagnosis, Cure, Timeline, and Prevention). Each section should be about a page in length. Your entire paper must be 4-5 pages in length not counting the title or reference pages, which must be included.

· In the Symptoms section, describe the public-health issue that you have observed in your community. What “symptoms” does it exhibit? Think about whom it affects, where it affects them, and how (refer back to Module 5, CT 1).

· In the Diagnosis section, discuss the causes of the issue, and give examples of other communities that have suffered the same problem. How did those communities solve or attempt to address the issue?

· In the Cure section, discuss possible options for a cure for this public-health issue. How can you get rid of it, or reduce its occurrence in your community?

· In the Prevention section, discuss possible options for prevention of the public-health issue that you have selected. How can you reduce the chance of people being impacted by your issue in the future?

· In the Timeline section, discuss the time needed for campaigning, education, funding, building, and implementation.

· You must back up your sections using at least two scholarly articles. You may use readings other than the textbook to meet this requirement. The paper should be based on references to scholarly materials (rather than on introductory textbooks, popular website writings, or musings, for example) and should support your claims with evidence.

· A special emphasis on either the demographics of the affected population or the economic implications of the issue.

· A realistic timeline for your plan. Discuss the time needed for campaigning, education, funding, building, and implementation. Use ideas created during Critical Thinking Assignments, either the option for annotated bibliography or that for the brainstorming draft feedback.

· The paper shall comply with the requirements defined within the APA guidelines. 

Final Instructions for Slide Presentation:

· The slide presentation describes the problem in your community and your action plan.

· Your presentation must be 8-10 slides in length not counting the title and reference slides. 

· Your presentation must be supported by at least two scholarly articles.

· You may use a Web-based slide presentation software such as Prezi, for example, or you may use PowerPoint. If you use a Web-based tool, include the URL to your presentation in a Word document and upload with your presentation.

· The audience for this presentation will be community members or organizations you wish to educate about the public-health issue and your proposed plan.

· The purpose of this slide presentation is to educate the audience about the issue. Keep in mind that in real life you will have limited time to convince people of the gravity of the situation and to come on board with their support, so you want to be persuasive and get to the key points quickly and effectively.

Note: Be sure to submit both your written report and your slide presentation for this assignment. Both files should be uploaded in a single submission to the assignment submission page.

The paper and preliminary deliverables must be well written and formatted in conformity


Before you begin crafting your response for this week, be sure to read the required materials and watch Designing Healthy Communities, Searching for Shangri-La. This video is designed to complement the readings and give you a broad understanding of the topics covered in class.

For this paper, take your community action plan and expand your Cure and Prevention plans to support the wider communities (e.g., if you are looking at a neighborhood, broaden your scope to the city or region in which you live or consider your permanent residency; if looking at a city, expand to the county, for example).

Provide some ideas on how the successes might be maintained and how the problems might be solved. For now, think in terms of a 20- to 30-year time period. Remember that healthy-community plans should include ideas on protecting the city from uncontrolled growth and unhealthy construction, and will always favor community building.

350–400 words, excluding references; APA format; a minimum of three references.


Week 4 Discussion Board Poster instructions

In the last posting you will summarize and present your topic in a visually appealing way. Now is the time to present it to your audience: the current Bio100 class. Your audience will know the basics of biology learned in this class, but will be eager to learn more about the specifics of your topic.

Main factors to consider when choosing information:

· Find the most important highlights of your topic (no need to be too detailed, as you already presented them in previous weeks)

· Consider any suggestions or questions during the past 2 weeks

· Visual impact: you want to draw your audience’s attention

· Finding a right balance between text and graphics

Posting instructions

1. Use a PowerPoint slide or a Google Slides slide. You will be using ONE slide only as a digital poster. You can download one of the attached templates and edit it, or start from scratch. I recommend you start with a layout that has Title and Content, and then add more content by inserting text boxes, pictures, tables, etc.

2. Your font size cannot be smaller than 9 pt.

3. Avoid long sentences and use bullet points or lists.

4. Also avoid overly flashy templates and overwhelming figures. Balance is the key.

5. Color: be creative, but avoid harsh background colors or text colors that are too light.

What to post

By now you probably have lots of information available, so the main challenge here is to summarize. Think of a graphic elevator speech: what are the main points I want to get across to my audience?

Sections of the poster:

1. Background: you will define/summarize your topic.

2. 3-4 text boxes with the main points of your topic

3. Conclusions

4. Pictures/graphs: 2–3, with information that is not in the text and works better as a picture

5. References: these should be in a text box. However, if you are running out of space for other relevant information, you may use a text box that says “References/see in Notes section” and paste the references in the Notes. Posting due no later than Friday of Week 4, 11:59 PM PT.

Comments

1. Please comment on two other students’ posters. The comments should include:

1. What did you like the most in the poster?

2. A question related to the content of the poster

3. A suggestion/constructive comment about the poster

2. Respond to at least one question asked to you.

Comments due no later than Sunday of Week 4, 11:59 PM PT.


Downloaded by guest on December 29, 2021

Estimating metazoan divergence times with a molecular clock
Kevin J. Peterson*, Jessica B. Lyons, Kristin S. Nowak, Carter M. Takacs, Matthew J. Wargo, and Mark A. McPeek

Department of Biological Sciences, Dartmouth College, Hanover, NH 03755

Communicated by Eric H. Davidson, California Institute of Technology, Pasadena, CA, March 18, 2004 (received for review December 1, 2003)

Accurately dating when the first bilaterally symmetrical animals arose is crucial to our understanding of early animal evolution. The earliest unequivocally bilaterian fossils are ≈555 million years old. In contrast, molecular-clock analyses calibrated by using the fossil record of vertebrates estimate that vertebrates split from dipterans (Drosophila) ≈900 million years ago (Ma). Nonetheless, comparative genomic analyses suggest that a significant rate difference exists between vertebrates and dipterans, because the percentage difference between the genomes of mosquito and fly is greater than between fish and mouse, even though the vertebrate divergence is almost twice that of the dipteran. Here we show that the dipteran rate of molecular evolution is similar to other invertebrate taxa (echinoderms and bivalve molluscs) but not to vertebrates, which significantly decreased their rate of molecular evolution with respect to invertebrates. Using a data set consisting of the concatenation of seven different amino acid sequences from 23 in-group taxa (giving a total of 11 different invertebrate calibration points scattered throughout the bilaterian tree and across the Phanerozoic), we estimate that the last common ancestor of bilaterians arose somewhere between 573 and 656 Ma, depending on the value assigned to the parameter scaling molecular substitution rate heterogeneity. These results are in accord with the known fossil record and support the view that the Cambrian explosion reflects, in part, the diversification of bilaterian phyla.

Although the Cambrian explosion is of singular importance to our understanding of the history of life, it continues to defy explanation (1). This defiance stems, in part, from our inability to distinguish between two competing hypotheses: whether the Cambrian explosion reflects the rapid appearance of fossils with animals having a deep but cryptic precambrian history, or whether it reflects the true sudden appearance and diversification of animals in the Cambrian (2). Because each hypothesis makes a specific prediction of when animals arose in time, one way to distinguish between these two hypotheses is to date animal diversifications by using a molecular clock (2). A number of previous clock studies (reviewed in refs. 3 and 4) have suggested that the last common ancestor of bilaterians (LCB) lived well over one billion years ago (5, 6), whereas others suggest that LCB arose ≈900 million years ago (Ma) (e.g., refs. 7–10), and still others are more consistent with an origination closer to the Cambrian (11–13). These deep estimates for the origin of LCB raise the question of how hundreds of millions of years of bilaterian evolution can escape detection, given that LCB and its near relatives should have had the capability of leaving both body and trace fossils (14–16).

Because molecular clocks have several inherent problems, including how the clock is calibrated, how molecular substitution rates are estimated, and how heterogeneity in these rates is detected and corrected (3, 4), as well as an inherent statistical bias for overestimating dates (4, 17), a much more recent date for LCB may not yet be refuted. Of crucial importance for clock accuracy is the calibration of the clock itself, which requires not only accurate paleontological estimates (18) but also rate homogeneity between the calibrated and uncalibrated taxa. When estimating the origination date for LCB, virtually all analyses use the vertebrate fossil record to calibrate the clock and ask when vertebrates diverged from dipterans. However, genome-wide sequence comparisons have shown that the average sequence identity of nuclear protein-coding genes between dipterans is lower than that of bony fish, even though the dipteran divergence time, estimated at 235 Ma (19), is only about half as long as the divergence of bony fish at 450 Ma (20). It is usually assumed that dipterans increased their rate of molecular evolution with respect to vertebrates (21), but it is possible that the vertebrate sequences decreased their rate of molecular evolution. If so, then any estimate of an invertebrate divergence (including LCB) derived from a vertebrate calibration will be artifactually twice too deep, a value suspiciously close to the observed molecular estimates of LCB vs. paleontological observations (4).
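The "twice too deep" argument is simple arithmetic under a strict clock, where distance accumulates as d = rt. A minimal worked sketch, assuming (as the paragraph does) that invertebrate sequences evolve at about twice the vertebrate rate, r_i ≈ 2r_v; the symbols are introduced here for illustration only:

```latex
\hat{r} = \frac{d_{\text{cal}}}{t_{\text{cal}}} = r_v ,
\qquad
\hat{t} = \frac{d_{\text{inv}}}{\hat{r}} = \frac{r_i\, t}{r_v} = 2t
\quad \text{when } r_i = 2 r_v .
```

Here t_cal is a vertebrate calibration age, d_inv = r_i t is the distance accumulated along an invertebrate lineage of true age t, and the hats mark estimates: a rate calibrated on slow vertebrate sequences doubles every invertebrate date.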

Here, we test this hypothesis by first showing that a pronounced rate difference exists between vertebrates and dipterans. Next, we show that using concatenated amino acid sequences of seven nuclear-encoded genes, the dipteran rate of sequence evolution is similar to two other invertebrate groups, echinoderms and bivalve molluscs, but all three differ significantly from the vertebrate rate of sequence evolution. Finally, using 11 invertebrate calibration points from all three major clades of bilaterians and across the Phanerozoic, we estimate that LCB arose ≈570 Ma, an estimate in remarkable accord with the fossil record of metazoans.

Materials and Methods
Cloning. Total RNA from 17 taxa was prepared from live animals by using a one-step TRIzol method (GIBCO/BRL) or RNAzol (Leedo Medical Laboratories, Houston). Taxa were purchased from Marine Biological Laboratory (Woods Hole, MA; Nucula proxima, Stylochus sp., Obelia sp., and Metridium senile), Gulf Specimen Aquarium and Marine Biological Supply (Panacea, FL; Encope michenlini, Eucidaris tribuloides, and Modiolus americanus), or Charles Hollahan (Santa Barbara, CA; Dendraster excentricus, Strongylocentrotus purpuratus, Mytilus edulis, and Mytilus califorianus). Saccoglossus kowalevskii clones, Monosiga brevicollis cDNA, Antedon mediterrania cDNA, Asterina miniata cDNA, and Priapulis caudatis animals were kind gifts of John Gerhart (Harvard University, Cambridge, MA), Nicole King and Sean Carroll (University of Wisconsin, Madison), Ina Arnone (Stazione Zoologica Anton Dohrn, Napoli, Italy), Veronica Hinman and Eric Davidson (California Institute of Technology, Pasadena, CA), and Graham Budd (University of Uppsala, Uppsala, Sweden), respectively. Ptychodera flava, Chaetopterus sp., Enallagma aspersum, Lestes congener, and Clypeatula cooperensis were already in the collections of K.J.P. and M.A.M. cDNA synthesis was performed with RETROscript (Ambion, Austin, TX) following the manufacturer’s instructions by using 1–2 µg of total RNA as indicated above.

Partial fragments of seven nuclear-encoded genes were PCR amplified, cloned, and sequenced by using standard techniques: aldolase (200 aa), triosephosphate isomerase (217 aa), phosphofructokinase (175 aa), methionine adenosyltransferase (348 aa), elongation factor 1-α (418 aa), ATP synthase β chain (430 aa), and catalase (264 aa) (we were unable to amplify all seven from the choanoflagellate, M. califorianus, and S. kowalevskii). These genes were chosen because they had previously been shown to support the monophyly of Ecdysozoa or did not significantly support an alternative arrangement (9, 22) and/or had shown potential clock-like behavior (23). We stress that no molecule or region of a molecule was excluded from the analysis, and the successful amplification and cloning of only these seven (of 12 tested) proved tractable from this diversity of taxa using standard techniques. Gene-specific primers (sequences available on request) and 1 or 10 µl of cDNA plus the TaqPlus Precision PCR system (Stratagene) were mixed and used in touchdown style PCR. PCR fragments of the predicted sizes were excised, purified (Qiagen, Valencia, CA), ligated at 16°C overnight into the pGEM-T-Easy vector according to the manufacturer’s instructions (Promega), and electroporated into DH10B cells. Clones containing the correct insert size were sequenced on an ABI373 model sequencer. Sequences were edited, translated, and aligned by using MACVECTOR, Ver. 7.0 (Genetics Computer Group).

Abbreviations: LCB, last common ancestor of bilaterians; ML, maximum likelihood; Ma, million years ago; Myr, million years.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AY580167–AY580307).

*To whom correspondence should be addressed. E-mail: kevin.peterson@dartmouth.edu.

© 2004 by The National Academy of Sciences of the USA

6536–6541 | PNAS | April 27, 2004 | vol. 101 | no. 17 | www.pnas.org/cgi/doi/10.1073/pnas.0401670101

EVOLUTION

Phylogenetic Analyses. Dipteran, vertebrate, and plant [Arabidopsis (mustard weed) and Oryza (rice)] sequences were searched by using BLAST, all significant hits were downloaded, and the inferred amino acid sequences of each gene were analyzed. The topology of each individual gene as deduced by neighbor-joining suggests that each is a case of “many-to-many orthologues” (21). The 50-gene data set was compiled from previous studies (9, 10); the sea urchin genes for this data set were acquired from the Sea Urchin Genome Project web site (http://sugp.caltech.edu). Distance methods used MEGA, Ver. 2.1, with pairwise deletion (24), and both the Poisson correction and Γ distance models [the parameter scaling molecular substitution rate heterogeneity, α, ranged from ∞ (= Poisson distributed) to 0.28]; maximum likelihood (ML) used QUARTET PUZZLE, Ver. 5.0 (25) or PAML (26). ML analyses used the Jones et al. (27) matrix of amino acid substitution, allowing the analysis to estimate the parameter for substitution rate heterogeneity; all amino acid substitution models gave effectively the same tree (analyses not shown). Relative rates tests used the output of QUARTET PUZZLE. Bootstrap values were derived by using 1,000 replications, and 1,000 puzzling steps were performed. Analyses of covariance were performed by using SAS, Ver. 6.1 (SAS Institute, Cary, NC).
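The Poisson and gamma corrections mentioned above convert the observed proportion p of differing amino acid sites into an estimated number of substitutions per site. A minimal Python sketch, assuming the standard textbook forms d = −ln(1 − p) and d = α[(1 − p)^(−1/α) − 1]; we have not checked that MEGA 2.1 uses exactly these expressions, and the function names are hypothetical. Pairwise deletion here means a gapped site is dropped only for the pair being compared. As α grows without bound the gamma distance approaches the Poisson distance, which is why α = ∞ corresponds to the Poisson model.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing aligned sites, using pairwise deletion of gapped sites."""
    compared = differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        differing += int(a != b)
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p, alpha):
    """Gamma distance d = alpha * ((1 - p) ** (-1 / alpha) - 1).

    Smaller alpha means stronger rate variation among sites and a larger
    correction; very large alpha approaches the Poisson distance.
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


p = p_distance("ACDE-G", "ACDFHG")  # 1 difference over 5 comparable sites
print(p, poisson_distance(p), gamma_distance(p, 0.28))
```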

Divergence Estimates. Date estimates for uncalibrated nodes in phylogenies were derived by using R8S, Ver. 1.5 (M. J. Sanderson, http://ginger.ucdavis.edu/r8s). This software uses multiple calibration points to derive estimates of uncalibrated nodes by using various algorithms. All algorithms for estimating divergence times gave very similar results, and so we report only those derived by the Langley–Fitch likelihood method. Confidence intervals for divergence dates are based on the curvature of the likelihood surface (6) as implemented in R8S.
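The Langley–Fitch method in R8S fits a clock to the whole tree by maximum likelihood, which is too involved to reproduce here. The much simpler Python sketch below only illustrates the underlying logic under a strict clock: estimate a single rate from calibration points by least squares through the origin (distance = rate × age), then date an uncalibrated node as distance ÷ rate. It is not the method used in the paper, the function names are hypothetical, and the calibration numbers are made up.

```python
def fit_clock_rate(calibrations):
    """Least-squares rate through the origin for (distance, age_in_Myr) pairs.

    Minimizing sum((d - r * t) ** 2) over r gives r = sum(d * t) / sum(t * t).
    """
    numerator = sum(d * t for d, t in calibrations)
    denominator = sum(t * t for _, t in calibrations)
    if denominator == 0:
        raise ValueError("need at least one non-zero calibration age")
    return numerator / denominator


def estimate_age(distance, rate):
    """Strict-clock age of an uncalibrated node: t = d / r."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / rate


# Hypothetical calibration points: (distance from the data, fossil age in Myr).
calibrations = [(0.10, 50.0), (0.20, 100.0), (0.40, 200.0)]
rate = fit_clock_rate(calibrations)
print(round(estimate_age(0.50, rate)))  # 250
```

Because every estimate is divided by the fitted rate, calibrating on a lineage that evolves at half the rate of the taxa being dated doubles all their estimated ages.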

Results
Rate Heterogeneity Between Vertebrates and Insects. To first ask whether a significant molecular rate difference exists between vertebrates and dipterans, as suggested by comparative genomics (21), we assembled a data set consisting of the concatenation of 50 different nuclear-encoded protein sequences (7,613 aa) taken from the analyses of Wang et al. (9) and Nei et al. (10). The ML analysis of this data set is shown in Fig. 1. As expected, the correct topology is realized and strongly supported. We applied standard relative rates tests to examine whether rate differences exist between the vertebrate and dipteran lineages, using Arabidopsis as the outgroup. All pairwise tests for differences between the two lineages were significant (all P < 0.005), indicating strong rate heterogeneity between the lineages leading to vertebrates and dipterans. However, all pairwise relative rates tests comparing taxa within these lineages (e.g., comparing fish and mouse with Drosophila as the outgroup) were not significant (all P > 0.05), suggesting no rate heterogeneity within the two lineages.

Fig. 1. Rate heterogeneity between vertebrates and dipterans, as assessed by untenable estimates for the divergences of the uncalibrated taxa. (A) ML analysis of the 50-gene data set for vertebrates and dipterans using Arabidopsis as the outgroup. (Upper) Values derived for the origin of bony fish, Diptera, and Bilateria if the tree is calibrated using the amniote divergence at 300 Ma. (Lower) Values derived for the origin of bony fish, amniotes, and bilaterians if the tree is calibrated by using the dipteran divergence at 235 Ma. (B) ML analysis of the seven-gene data set from the same taxa. If the tree is calibrated by using the amniote divergence at 300 Ma, qualitatively similar values as found in A are derived for the origin of bony fish, dipterans, and bilaterians.
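The relative rates tests above were computed from the likelihood output of QUARTET PUZZLE. As a simpler illustration of the same idea, the Python sketch below implements a different, count-based test swapped in for clarity: Tajima's (1993) one-degree-of-freedom relative rate test. Given an outgroup, it counts the sites where only one of the two in-group sequences has changed; under equal rates both counts should be similar. The function name is hypothetical and the sequences in the usage line are made up.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's 1D relative rate test for two aligned in-group sequences.

    m1 counts sites where only seq1 differs (seq2 matches the outgroup);
    m2 counts sites where only seq2 differs (seq1 matches the outgroup).
    Returns (m1, m2, chi_square, p_value) with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip gapped sites
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return m1, m2, chi_square, p_value


m1, m2, chi2, p = tajima_relative_rate("AAAAAC", "AAAAAA", "AAAAAA")
print(m1, m2, chi2, round(p, 4))  # 1 0 1.0 0.3173
```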

If the bird/mammal divergence (300 Ma) is used to calibrate the tree, we find that Osteichthyes arose 445 Ma, an estimate congruent with the paleontological record (20) (Fig. 1A Upper). In addition, we find that LCB arose 1,015 Ma, which is close to previous estimates (e.g., refs. 9 and 10). Nonetheless, we find that dipterans, which are near the apex of the insect tree (19, 28), arose 565 Ma, almost 200 million years (Myr) before the first appearance of insects in the fossil record (19). If instead the dipteran divergence is used to calibrate the tree, then the vertebrate divergences are far too shallow, with amniotes originating in the Early Cretaceous (125 Ma) and Osteichthyes originating during the Early Jurassic (185 Ma) (Fig. 1A Lower).

Vertebrates Significantly Slowed Their Rate of Molecular Evolution. Because the presence of dipterans in the precambrian and the absence of bony fish in the Paleozoic are both untenable, these taxa must differ substantially in the rate of molecular evolution of the included sequences, as suggested by the relative rates tests. Although it is possible that insects increased their rate of molecular evolution with respect to vertebrates, possibly because of their faster generation time (20), it seems as likely that vertebrates slowed their rate of evolution with respect to dipterans.

Fig. 2. ML tree of the seven concatenated protein sequences from 18 in-group taxa by using Arabidopsis as the outgroup. Bootstrap values for ML (Upper) as well as distance (Lower) are given to the left of the respective nodes. Nodes 1–11 are calibration points, whose distances are plotted against the divergence times derived from the fossil record (Table 1) in the regression analysis. The two vertebrate divergences (300 Ma for Amniota and 450 Ma for Osteichthyes) give the vertebrate line whose midpoint value is significantly displaced from the invertebrate line. The open diamonds indicate the position of the node when analyzed with the 50-gene data set; note that it is qualitatively similar to the seven-gene data set (filled diamonds). Echinoderms are shown in red, bivalves in green, and insects in blue; vertebrates are in orange.

To distinguish between these two alternatives, we analyzed seven of the 50 sequences discussed above from 14 invertebrate taxa from all three major bilaterian groups chosen specifically to maximize the number of calibration points across the Phanerozoic (Lowest Ordovician through Miocene) within a known phylogeny: Deuterostomia (five echinoderm calibrations), Ecdysozoa (two additional insect calibrations), and Spiralia (three bivalve calibrations); these calibration points are numbered 1–11 on Figs. 2 and 3 and are listed in Table 1. The seven different sequences were concatenated (2,052 aa) and analyzed with ML and minimum evolution; both analyses were accurate and, for most nodes, precise (Figs. 2 and 3). Furthermore, analyses of this data set gave qualitatively similar results when compared to the 50-gene data set (Figs. 1 and 2).

The regression analysis of calibration dates to distance derived from the ML analysis is also shown in Fig. 2. The dipteran divergence (node 10) is not an outlier from the regression line for the invertebrate calibration points (nodes 1–11); analysis of covariance among the lines generated from echinoderms (red), insects (blue), or spiralians (green) shows that neither the slope (F3,5 = 1.35, P = 0.35) nor the elevation (F2,7 = 3.10, P = 0.10) is significantly different among these taxa. However, analysis of covariance for the regressions of vertebrate vs. invertebrate calibration points shows that the vertebrate regression is displaced significantly above the invertebrate regression (F1,8 = 35.81, P < 0.0001) (Fig. 2). This result demonstrates that rather than evolutionary rates increasing in insects (21), molecular evolutionary rates significantly decreased in vertebrates before the origin of crown-group Osteichthyes.

Fig. 3. Distance (Poisson) phylogram of the seven concatenated protein sequences from 23 in-group taxa by using Arabidopsis and Oryza as outgroups. Bootstrap values are given to the left of the respective nodes. Nodes 1–11 are calibration points (see Fig. 2); the ages of nodes A–K are estimated by using R8S and are given in Table 1 and shown in Fig. 4. Deuterostomes are shown in red, spiralians in green, and ecdysozoans in blue.

Molecular Clock Estimates of Metazoan Divergence Times. To estimate the origination date of Bilateria, as well as several other invertebrate divergences, we used the R8S software package to analyze the concatenated seven-gene data set for 23 in-group taxa (Fig. 3). Using the option in R8S that the calibration points are fixed, we estimate that LCB (node F, Fig. 3) evolved between 573 and 656 Ma, depending on the specified value of the rate heterogeneity parameter (Table 1). However, because the calibration points derived from the fossil record are paleontological minima (i.e., the first occurrence of a recognizable member of a total group), the estimates derived from these points must be minima as well. Estimating maxima for divergences is difficult (3). Nonetheless, if we specify in R8S that the calibration points are variable, then estimates scale linearly with the error; e.g., if a 10% error is associated with the calibration points, then all estimated dates are increased 7–12%. To ask whether the size of the data set changes the estimate for LCB, we added the sea urchin S. purpuratus to the 50-gene data set and reanalyzed the data without the vertebrates. Calibrating the resulting ML tree with the dipteran divergence gives an estimate of 541 Ma for the last common ancestor of S. purpuratus and Diptera, which is equivalent to LCB.


Table 1. Calibrations and estimates in millions of years

                                           Estimated ages (95% confidence intervals)§
Node*  Calibration age†, Myr     Refs.‡    Node*  α = ∞             α = 0.28
1      Eocene (50)               50, 51    A      526 (513, 558)    567 (551, 586)
2      Early Jurassic (190)      51, 52    B      519               548 (534, 564)
3      Late Permian (260)        51, 53    C      538 (523, 554)    580 (563, 598)
4      Early Ordovician (475)    28, 54    D      542 (521, 565)    599 (578, 621)
5      Early Ordovician (485)    54, 55    E      560 (544, 593)    623 (604, 643)
6      Miocene (20)              56, 57    F      573 (556, 592)    656 (636, 678)
7      Late Carboniferous (325)  56, 57    G      548 (519, 579)    595 (561, 626)
8      Early Ordovician (485)    57, 58    H      615 (592, 643)    724 (697, 756)
9      Early Cretaceous (120)    19        I      653 (625, 684)    832 (796, 880)
10     Middle Triassic (235)     19        J      744 (705, 783)    987 (940, 1,033)
11     Late Carboniferous (325)  19        K      404 (370, 436)    412 (381, 442)


*Numbered and lettered nodes from Fig. 3.
†Calibration points are derived from the first occurrence of a member of the crown group. For example, although Permian dipterans are known (19), it is unclear whether they are crown-group dipterans. The first unequivocal crown-group dipterans are Middle Triassic (19), and hence this was used as the calibration point. The 1999 Geological Time Scale (Geological Society of America, www.geosociety.org/science/timescale/timescl.htm) was used for dates.
‡References are for both the age and the phylogenetic position of the node.
§Nodes where R8S could not converge on a solution for the 95% confidence intervals are left blank.

Finally, to address which α parameter estimate (Table 1) might be more accurate, we had R8S estimate the origin of crown-group echinoderms (node 5) using both α = ∞ and α = 0.28. The predicted value based on the fossil record of echinoderms (Fig. 4) is between 485 Ma (the minimum value based on the first occurrence of crinoids, Table 1) and 525 Ma (the maximum value, which corresponds to the first occurrence of echinoderm skeletal material in the fossil record; ref. 29). Using α = ∞, R8S estimated the age of crown-group echinoderms at 508 ± 12 Ma, whereas with α = 0.28, R8S estimated the age at 527 ± 12 Ma. Thus, the simpler model of molecular evolution (α = ∞) gives an estimate more consistent with the fossil record.
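To make the role of α concrete, the sketch below shows how the gamma shape parameter changes the distance (branch-length) estimates that feed into a dating step. It uses the standard textbook Poisson correction d = −ln(1 − p) and its gamma-corrected form d = α[(1 − p)^(−1/α) − 1], where p is the proportion of differing amino acid sites; this is an assumption for illustration, not necessarily the exact model used to produce the branch lengths given to R8S, and the function names and the value p = 0.3 are hypothetical.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance (substitutions per site) from the proportion
    p of differing sites; this is the gamma distance in the limit alpha -> inf."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, alpha: float) -> float:
    """Gamma-corrected Poisson distance for gamma shape parameter alpha.

    Small alpha means strong rate variation among sites, so more hidden
    multiple substitutions are inferred and the distance grows.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if math.isinf(alpha):
        return poisson_distance(p)
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    p = 0.3  # hypothetical proportion of differing amino acid sites
    for alpha in (math.inf, 2.0, 0.28):
        print(f"alpha = {alpha}: d = {gamma_distance(p, alpha):.3f}")
```

Because the gamma correction inflates large distances much more than small ones, deep nodes dated from shallow calibrations tend to come out older under α = 0.28 than under α = ∞, which is the direction of the differences between the two columns of Table 1.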

Discussion
Rate Heterogeneity Between Vertebrates and Invertebrates. Our data suggest that, inconsistent with most molecular clock estimates but consistent with paleontological predictions (e.g., refs. 13 and 14), bilaterians do not have a significant Precambrian evolutionary history. The deep Precambrian estimates for LCB derived from analyses that use nuclear protein-encoding genes calibrated to the vertebrate fossil record (e.g., refs. 9 and 10) are clearly artifacts associated with the significant rate reduction in the molecular evolution of the vertebrate genome.

Fig. 4. Metazoan divergence estimates with metazoan diversity and phylogeny placed into the geological context of the Neoproterozoic–Cambrian transition. Tree nodes are positioned according to age estimates derived from the Poisson analysis (Table 1). Thick lines are the known fossil record, and thin lines are the lineage extensions as deduced from the molecular clock analysis. N–D, Nemakit–Daldynian; T–A, Tommotian–Atdabanian; B–T, Botomian–Toyonian; M, Middle; L, Late (adapted from ref. 1).

Peterson et al.

Martin and Palumbi (30) argued that much of the rate heterogeneity that exists between species can be accounted for by differences in metabolic rates. Although no single factor can fully account for rate variation (30), we note that a difference in metabolic rates is unlikely to be the primary explanation in this case because the teleost fish Danio rerio is evolving at a similar rate to that of the two endothermic amniotes, and the molecular clock estimate of the origin of Osteichthyes is concordant with the vertebrate fossil record (Fig. 1A). Given the metabolic rate difference between amniotes and fishes (31), we would expect a greater disparity between the clock estimate and the vertebrate fossil record if this were a significant driving factor. One possible explanation for the rate heterogeneity seen between vertebrates and invertebrates is the duplication of the vertebrate genome (12), which occurred sometime between the last common ancestor of cephalochordates and vertebrates and the last common ancestor of Osteichthyes (32). Although gene duplication events are thought to increase rather than decrease the rate of molecular evolution (33), a genome duplication event would (at least initially) increase the number of interactors for each protein, potentially slowing the rate of molecular evolution across the entire genome (34).

Phylogenetic Considerations. Our protein tree (Fig. 3) gives the correct topology, where known (Table 1), and finds support for clades such as Ambulacraria (Echinodermata + Hemichordata), Spiralia, and Ecdysozoa, as well as Protostomia. This is now the third independent molecular data set supporting the monophyly of these clades, because all four are also found with 18S rDNA analyses and Hox gene duplications, in addition to many newly elucidated characters (reviewed in ref. 35). We are not able to recover a monophyletic Deuterostomia when vertebrates are included in the analysis (Fig. 2), possibly because of the relatively small number of genes analyzed (36) and the pronounced rate heterogeneity detected with the vertebrate sequences (Fig. 2). Nonetheless, an analysis of the 50-gene data set, which includes both the sea urchin and the three vertebrates, results in a monophyletic but weakly supported Deuterostomia (not shown).

PNAS | April 27, 2004 | vol. 101 | no. 17 | 6539

Tempo and Mode of Early Animal Evolution. Although the use of molecular clocks to infer divergence times is fraught with difficulties (3, 4), this analysis fulfills the suggested requirements of Shaul and Graur (37) for a molecular clock analysis: (i) the use of multiple primary calibration points; (ii) the accommodation of rate variation; and (iii) the calculation of confidence intervals associated with the estimates. Interestingly, both our analysis and the analyses of Aris-Brosou and Yang (11, 12) conclude that LCB evolved ≈570 Ma and split from cnidarians somewhere between 600 and 630 Ma (Fig. 4). Moreover, both analyses agree that the last common ancestor of protostomes evolved ≈550 Ma, and both analyses agree that phylum-level splits within Spiralia and Deuterostomia occurred 520–530 Ma. The congruence between these two clock studies and the fossil record is striking, although possibly not surprising given that both studies use multiple calibration points scattered across both phylogeny and time and account for rate heterogeneity either by removing vertebrates from the analysis, as was done here, or by using a Bayesian approach to account for rate change across lineages (11, 12). Because of this congruence, the Cambrian explosion must reflect, at least in part, the diversification of bilaterian phyla.
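Requirement (i), the use of multiple calibration points, can be illustrated with a deliberately simple strict-clock sketch. Several calibrated nodes, each with a known age and an estimated node depth (substitutions per site from the node to the tips), are combined into one rate by least squares through the origin, and that rate converts the depth of an uncalibrated node into an age. A leave-one-out loop mirrors the kind of check described above for crown-group echinoderms, in which a node's age is predicted without using it as a calibration. This is an assumption-laden illustration, not the R8S method or the Bayesian approach of Aris-Brosou and Yang: it ignores rate variation (requirement ii) and confidence intervals (requirement iii), and all depths and function names are hypothetical.

```python
from typing import Dict, Tuple

# node label -> (calibration age in Myr, node depth in substitutions per site)
Calibrations = Dict[str, Tuple[float, float]]


def fit_rate(calibrations: Calibrations) -> float:
    """Least-squares rate through the origin for the model depth = rate * age."""
    num = sum(age * depth for age, depth in calibrations.values())
    den = sum(age * age for age, _ in calibrations.values())
    if den == 0:
        raise ValueError("need at least one calibration with a nonzero age")
    return num / den


def date_node(depth: float, rate: float) -> float:
    """Convert a node depth into an age in Myr under a strict clock."""
    return depth / rate


def leave_one_out(calibrations: Calibrations) -> Dict[str, float]:
    """Re-estimate each calibrated node's age using only the other calibrations."""
    if len(calibrations) < 2:
        raise ValueError("need at least two calibrations")
    ages = {}
    for label, (_, depth) in calibrations.items():
        others = {k: v for k, v in calibrations.items() if k != label}
        ages[label] = date_node(depth, fit_rate(others))
    return ages


if __name__ == "__main__":
    # Hypothetical depths; the ages merely echo three calibrations in Table 1.
    cals = {"node1": (50.0, 0.026), "node3": (260.0, 0.131), "node5": (485.0, 0.240)}
    rate = fit_rate(cals)
    print(f"rate = {rate:.6f} substitutions/site/Myr")
    print(f"node at depth 0.30 -> {date_node(0.30, rate):.0f} Myr")
    for label, age in leave_one_out(cals).items():
        print(f"{label}: calibrated {cals[label][0]:.0f}, cross-validated {age:.0f} Myr")
```

Fitting through the origin encodes the strict-clock assumption that depth is zero at age zero; a relaxed-clock method would instead allow the rate to change across lineages.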

We extrapolate the origin of total-group Bilateria to be ≈615 Ma, almost 45 Myr before the appearance and rapid diversification of the crown group and ≈60 Myr before their first unequivocal appearance in the fossil record. This cryptic Precambrian history suggests that these stem-group bilaterians must have been “micrometazoans,” because, although benthic (38, 39), they were seemingly incapable of leaving trace fossils (see also ref. 40). The bilaterian trace fossil record would commence only after the invention of pattern formation mechanisms that potentiated the evolution of larger body size in multiple animal clades near the end of the Neoproterozoic (41, 42). Whether the origin of bilaterians or any other metazoan group can be triggered by environmental perturbations such as “snowball Earth” (43, 44) remains highly speculative at the moment, given the uncertainty about the exact number and ages of Neoproterozoic glaciations (45). Nonetheless, if the “Marinoan” glaciation interval is younger than 590 Ma (46), then an increase in bilaterian body size might have been facilitated by the greater productivity of the marine ecosystem after the glacial melt (47, 48). This increase in both body size and planktonic productivity would then allow for the evolution of broadcast spawning and external fertilization and, by ≈525 Ma, planktotrophic development. The absence of Precambrian planktonic metazoans is consistent with our Early Cambrian estimate for the origin of the last common ancestor of living cnidarians, a population of animals whose life cycle lacked a medusa stage and thus was entirely benthic (49). The development of this new planktonic food web (47), coupled with the evolution of a dispersal stage and the reappearance of exposed continental shelf, may have provided the environmental stimuli necessary for the rapid evolution of disparate bilaterian body plans and ultimately the Cambrian explosion itself.

1. Knoll, A. H. & Carroll, S. B. (1999) Science 284, 2129–2137.
2. Runnegar, B. (1982) J. Geol. Soc. Aust. 29, 395–411.
3. Smith, A. B. & Peterson, K. J. (2002) Annu. Rev. Earth Planet. Sci. 30, 65–88.
4. Benton, M. J. & Ayala, F. J. (2003) Science 300, 1698–1700.
5. Wray, G. A., Levinton, J. S. & Shapiro, L. H. (1996) Science 274, 568–573.
6. Cutler, D. J. (2000) Mol. Biol. Evol. 17, 1647–1660.
7. Runnegar, B. (1982) Lethaia 15, 199–205.
8. Gu, X. (1998) J. Mol. Evol. 47, 369–371.
9. Wang, D. Y.-C., Kumar, S. & Hedges, S. B. (1999) Proc. R. Soc. London Ser. B 266, 163–171.
10. Nei, M., Xu, P. & Glazko, G. (2001) Proc. Natl. Acad. Sci. USA 98, 2497–2502.
11. Aris-Brosou, S. & Yang, Z. (2002) Syst. Biol. 51, 703–714.
12. Aris-Brosou, S. & Yang, Z. (2003) Mol. Biol. Evol. 20, 1947–1954.
13. Ayala, F. J., Rzhetsky, A. & Ayala, F. J. (1998) Proc. Natl. Acad. Sci. USA 95, 606–611.
14. Conway Morris, S. (1998) Am. Zool. 38, 867–877.
15. Budd, G. E. & Jensen, S. (2000) Biol. Rev. Camb. Philos. Soc. 75, 253–295.
16. Erwin, D. H. (1999) Am. Zool. 39, 617–629.
17. Rodriguez-Trelles, F., Tarrio, R. & Ayala, F. J. (2002) Proc. Natl. Acad. Sci. USA 99, 8112–8115.
18. Lee, M. S. Y. (1999) J. Mol. Evol. 49, 385–391.
19. Rasnitsyn, A. P. & Quicke, D. L. J. (eds.) (2002) History of Insects (Kluwer, Dordrecht, The Netherlands).
20. Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J. M., Dehal, P., Christoffels, A., Rash, S., Hoon, S., Smit, A., et al. (2002) Science 297, 1301–1310.
21. Zdobnov, E. M., von Mering, C., Letunic, I., Torrents, D., Suyama, M., Copley, R. R., Christophides, G. K., Thomasova, D., Holt, R. A., Subramanian, G. M., et al. (2002) Science 298, 149–159.
22. Mushegian, A. R., Garey, J. R., Martin, J. & Liu, L. X. (1998) Genome Res. 8, 590–598.
23. Nikoh, N., Iwabe, N., Kuma, K., Ohno, M., Sugiyama, T., Watanabe, Y., Yasui, K., Shi-cui, Z., Hori, K., et al. (1997) J. Mol. Evol. 45, 97–106.
24. Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M. (2001) MEGA2: Molecular Evolutionary Genetics Analysis (Arizona State University, Tempe), Ver. 2.1.
25. Strimmer, K. & von Haeseler, A. (1996) Mol. Biol. Evol. 13, 964–969.
26. Yang, Z. (1997) Comput. Appl. Biosci. 13, 555–556.

We thank S. Bengtson, D. Campbell, K. Cottingham, N. Christie-Blick,
E. Davidson, M. Dietrich, D. Erwin, D. Evans, D. Jablonski, M.
LaBarbera, A. Rivera, B. Runnegar, A. Smith, and J. Sprinkle for
comments and discussion. We also thank the individuals who provided
or procured material for us and C. Hanselman and V. Moy for technical
assistance. K.J.P. is supported by the National Science Foundation,
National Aer


Crystal Structure of an Ancient Protein: Evolution
by Conformational Epistasis
Eric A. Ortlund, et al.
Science 317, 1544 (2007);
DOI: 10.1126/science.1142819



Science (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the
American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. Copyright
2007 by the American Association for the Advancement of Science; all rights reserved. The title Science is a
registered trademark of AAAS.

REPORTS

proaches have been and are being considered. For example, in Singapore, where 84% of the population lives in public housing (35), regulations that explicitly recognize the role of spatial segregation in sectarianism specify the percentage of ethnic groups to occupy housing blocks (36). This legally compels ethnic mixing at a scale finer than that which our study finds likely to lead to violence. Given the natural tendency toward social separation, maintaining such mixing requires a level of authoritarianism that might not be entertained in other locations. Still, despite social tensions (37), the current absence of violence provides some support for our analysis. The alternative approach (aiding in the separation process by establishing clear boundaries between cultural groups to prevent violence) has also gained recent attention (38, 39). Although further studies are needed, there exist assessments (39) of the impact of historical partitions in Ireland, Cyprus, the Indian subcontinent, and the Middle East that may be consistent with the understanding of type separation and a critical scale of mixing or separation presented here.

The insight provided by this study may help inform policy debates by guiding our understanding of the consequences of policy alternatives. The purpose of this paper does not include promoting specific policy options. Although our work reinforces suggestions to consider separation, we are not diminishing the relevance of concerns about the desirability of separation or its process. Even where separation may be indicated as a way of preventing violence, caution is warranted to ensure that the goal of preventing violence does not become a justification for violence. Moreover, even a peaceful process of separation is likely to be objectionable. There may be ways to positively motivate separation using incentives, as well as to mitigate negative aspects of separation that often include displacement of populations and mobility barriers.

Our results for the range of filter diameters that provide good statistical agreement between reported and predicted violence in the former Yugoslavia and India suggest that regions of width less than 10 km or greater than 100 km may provide sufficient mixing or isolation to reduce the chance of violence. These bounds may be affected by a variety of secondary factors including social and economic conditions; the simulation resolution may limit the accuracy of the lower limit; and boundaries such as rivers, other physical barriers, or political divisions will surely play a role. Still, this may provide initial guidance for strategic planning. Identifying the nature of boundaries to be established and the means for ensuring their stability, however, must reflect local issues.

Our approach does not consider the relative merits of cultures, individual acts, or immediate causes of violence, but rather the conditions that may promote violence. It is worth considering whether, in places where cultural differentiation is taking place, conflict might be prevented or minimized by political acts that create appropriate boundaries suited to the current geocultural regions rather than the existing historically based state boundaries. Such boundaries need not inhibit trade and commerce and need not mark the boundaries of states, but should allow each cultural group to adopt independent behaviors in separate domains. Peaceful coexistence need not require complete integration.

References and Notes
1. M. White, Deaths by Mass Unpleasantness: Estimated Total for the Entire 20th Century, http://users.erols.com/mwhite28/warstat8.htm (September 2005).
2. D. L. Horowitz, Ethnic Groups in Conflict (Univ. of California Press, Berkeley and Los Angeles, ed. 2, 2000).
3. B. Harff, T. R. Gurr, Ethnic Conflict in World Politics (Westview, Boulder, ed. 2, 2004).
4. S. Huntington, The Clash of Civilizations and the Remaking of World Order (Simon & Schuster, New York, 1996).
5. D. Chirot, M. E. P. Seligman, Eds., Ethnopolitical Warfare: Causes, Consequences, and Possible Solutions (American Psychological Association, Washington, DC, 2001).
6. M. Reynal-Querol, J. Conflict Resolut. 46, 29 (2002).
7. T. R. Gulden, Politics Life Sciences 21, 26 (2002).
8. H. Buhaug, S. Gates, J. Peace Res. 39, 417 (2002).
9. A. Varshney, Ethnic Conflict and Civic Life: Hindus and Muslims in India (Yale Univ. Press, New Haven, CT, 2003).
10. M. D. Toft, The Geography of Ethnic Violence: Identity, Interests, and the Indivisibility of Territory (Princeton Univ. Press, Princeton, NJ, 2003).
11. J. Fox, Religion, Civilization, and Civil War: 1945 through the New Millennium (Lexington Books, Lanham, MD, 2004).
12. M. Mann, The Dark Side of Democracy: Explaining Ethnic Cleansing (Cambridge Univ. Press, New York, 2004).
13. I. Lustick, Am. Polit. Sci. Rev. 98, 209 (2004).
14. Materials and methods are available as supporting material on Science Online.
15. T. C. Schelling, J. Math. Sociol. 1, 143 (1971).
16. J. Mimkes, J. Therm. Anal. 43, 521 (1995).
17. H. P. Young, Individual Strategy and Social Structure (Princeton Univ. Press, Princeton, NJ, 1998).
18. R. Van Kempen, A. S. Ozuekren, Urban Stud. 35, 1631 (1998).
19. Y. Bar-Yam, in Dynamics of Complex Systems (Perseus Press, Cambridge, MA, 1997), chap. 7.
20. A. J. Bray, Adv. Phys. 43, 357 (1994).
21. I. M. Lifshitz, V. V. Slyozov, J. Phys. Chem. Solids 19, 35 (1961).
22. D. A. Huse, Phys. Rev. B 34, 7845 (1986).
23. W. Easterly, R. Levine, Q. J. Econ. 112, 1203 (1997).
24. P. Collier, A. Hoeffler, Oxf. Econ. Pap. 50, 563 (1998).
25. R. H. Bates, Am. Econ. Rev. 90, 131 (2000).
26. J. D. Fearon, D. D. Laitin, Am. Pol. Sci. Rev. 97, 75 (2003).
27. D. N. Posner, Am. J. Pol. Sci. 48, 849 (2004).
28. I. Daubechies, Ten Lectures on Wavelets (SIAM, Philadelphia, 1992).
29. A. Arneodo, E. Bacry, P. V. Graves, J. F. Muzy, Phys. Rev. Lett. 74, 3293 (1995).
30. P. Ch. Ivanov et al., Nature 383, 323 (1996).
31. Map of Yugoslavia, courtesy of the University of Texas Libraries, www.lib.utexas.edu/maps/europe/yugoslav.jpg.
32. R. Petrovic, Yugosl. Surv. 33, 3 (1992).
33. K. Chaudhuri, Frontline 18 (no. 2), www.hinduonnet.com/fline/fl1802/18020330.htm.
34. Final Report, Carnegie Commission on Preventing Deadly Conflict, www.wilsoncenter.org/subsites/ccpdc/pubs/rept97/finfr.htm.
35. A Brief Background, Housing and Development Board, Singapore Government, www.hdb.gov.sg/fi10/fi10296p.nsf/WPDis/About%20UsA%20Brief%20Background%20-%20HDB’s%20Beginnings.
36. Ethnic Integration Policy, Housing and Development Board, Singapore Government, www.hdb.gov.sg/fi10/fi10201p.nsf/WPDis/Buying%20A%20Resale%20FlatEthnic%20Group%20Eligibility.
37. D. Murphy, Christian Science Monitor, 5 February 2002, www.csmonitor.com/2002/0205/p07s01-woap.html.
38. J. Tullberg, B. S. Tullberg, Politics Life Sciences 16, 237 (1997).
39. C. Kaufmann, Int. Secur. 23, 120 (1998).
40. We thank G. Wolfe, M. Woolsey, and L. Burlingame for editing the manuscript; B. Wang for assistance with figures; M. Nguyen and Z. Bar-Yam for assistance with identifying data; and I. Epstein, S. Pimm, F. Schwartz, E. Downs, and S. Frey for helpful comments. We acknowledge internal support by the New England Complex Systems Institute and the U.S. government for support of preliminary results.

Supporting Online Material
www.sciencemag.org/cgi/content/full/317/5844/1540/DC1
Methods
Figs. S1.1 to S4.3
SOM Text
Table S1
References
Bibliography

30 November 2006; accepted 13 August 2007
10.1126/science.1142734

Crystal Structure of an Ancient
Protein: Evolution by
Conformational Epistasis
Eric A. Ortlund,1* Jamie T. Bridgham,2* Matthew R. Redinbo,1 Joseph W. Thornton2†

The structural mechanisms by which proteins have evolved new functions are known only indirectly.
We report x-ray crystal structures of a resurrected ancestral protein—the ~450 million-year-old
precursor of vertebrate glucocorticoid (GR) and mineralocorticoid (MR) receptors. Using structural,
phylogenetic, and functional analysis, we identify the specific set of historical mutations that
recapitulate the evolution of GR’s hormone specificity from an MR-like ancestor. These
substitutions repositioned crucial residues to create new receptor-ligand and intraprotein contacts.
Strong epistatic interactions occur because one substitution changes the conformational position
of another site. “Permissive” mutations—substitutions of no immediate consequence, which
stabilize specific elements of the protein and allow it to tolerate subsequent function-switching
changes—played a major role in determining GR’s evolutionary trajectory.


A central goal in molecular evolution is to understand the mechanisms and dynamics by which changes in gene sequence generate shifts in function and therefore phenotype (1, 2). A complete understanding of this process requires analysis of how changes in protein structure mediate the effects of mutations on function. Comparative analyses of extant proteins have provided indirect insights into the diversification of protein structure (3–6), and protein engineering studies have elucidated structure-function relations that shape the evolutionary process (7–11). To directly identify the mechanisms by which historical mutations generated new functions, however, it is necessary to compare proteins through evolutionary time.

1544 14 SEPTEMBER 2007 VOL 317 SCIENCE www.sciencemag.org

[Fig. 1 plot residue: (A) dose-response panels (fold activation vs. hormone, log M) for HomoGR, RajaGR, HomoMR, AncGR2 (~420 Ma), AncGR1 (~440 Ma), and AncCR (~470 Ma), arranged on a phylogeny of TetrapodGR (4), TeleostGR (6), ElasmobranchGR (1), and MRs (8), with branch labels 25aa and 36aa +1Δ; (C) aldosterone, cortisol, and DOC, with positions C11, C17, and C18 marked.]

Here we report the empirical structures of an ancient protein, which we “resurrected” (12) by phylogenetically determining its maximum likelihood sequence from a large database of extant sequences, biochemically synthesizing a gene coding for the inferred ancestral protein, expressing it in cultured cells, and determining the protein’s structure by x-ray crystallography. Specifically, we investigated the mechanistic basis for the functional evolution of the glucocorticoid receptor (GR), a hormone-regulated transcription factor present in all jawed vertebrates (13). GR and its sister gene, the mineralocorticoid receptor (MR), descend from the duplication of a single ancient gene, the ancestral corticoid receptor (AncCR), deep in the vertebrate lineage ~450 million years ago (Ma) (Fig. 1A) (13). GR is activated by the adrenal steroid cortisol and regulates stress response, glucose homeostasis, and other functions (14). MR is activated by aldosterone in tetrapods and by deoxycorticosterone (DOC) in teleosts to control electrolyte homeostasis, kidney and colon function, and other processes (14). MR is also sensitive to cortisol, though considerably less so than to aldosterone and DOC (13, 15). Previously, AncCR was resurrected and found to have MR-like sensitivity to aldosterone, DOC, and cortisol, indicating that GR’s cortisol specificity is evolutionarily derived (13).

1Department of Chemistry, University of North Carolina, Chapel Hill, NC 27599, USA. 2Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR 97403, USA.

*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: joet@uoregon.edu

Fig. 1. (A) Functional evolution

To identify the structural mechanisms by
which GR evolved this new function, we used
x-ray crystallography to determine the structures
of the resurrected AncCR ligand-binding domain
(LBD) in complex with aldosterone, DOC, and
cortisol (16) at 1.9, 2.0, and 2.4 Å resolution,
respectively (table S1). All structures adopt the
classic active conformation for nuclear receptors
(17), with unambiguous electron density for each
hormone (Fig. 1B and figs. S1 and S2). AncCR’s
structure is extremely similar to the human MR
[root mean square deviation (RMSD) = 0.9 Å for
all backbone atoms] and, to a lesser extent, to the
human GR (RMSD = 1.2 Å). The network of
hydrogen bonds supporting activation in the
human MR (18) is present in AncCR, indicating
that MR’s structural mode of action has been
conserved for >400 million years (fig. S3).
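The RMSD values quoted above summarize how far matched backbone atoms lie from each other once two structures are overlaid: RMSD = sqrt((1/N) Σ |xᵢ − yᵢ|²). The sketch below computes that quantity under two stated assumptions: the structures have already been optimally superposed (the fitting step, for example the Kabsch algorithm, is omitted), and "backbone" means the N, CA, C, and O atoms of each residue. The toy coordinates and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
# Assumed definition of "backbone": the four main-chain atoms of each residue.
BACKBONE = {"N", "CA", "C", "O"}


def backbone_only(atoms: Sequence[Tuple[str, float, float, float]]) -> List[Coord]:
    """Keep the coordinates of backbone atoms from (atom_name, x, y, z) records."""
    return [(x, y, z) for name, x, y, z in atoms if name in BACKBONE]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two matched atom lists that are already superposed.

    No fitting is done here: the caller must supply coordinates in a common
    frame, with atom i of `a` corresponding to atom i of `b`.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    # Toy single-residue records; a real comparison would read PDB/mmCIF files.
    model_1 = [("N", 0.0, 0.0, 0.0), ("CA", 1.5, 0.0, 0.0), ("CB", 2.0, 1.0, 0.0),
               ("C", 2.0, 1.4, 0.0), ("O", 3.2, 1.6, 0.0)]
    model_2 = [("N", 0.1, 0.0, 0.0), ("CA", 1.5, 0.2, 0.0), ("CB", 2.5, 1.0, 0.0),
               ("C", 2.0, 1.4, 0.3), ("O", 3.2, 1.6, 0.0)]
    print(f"backbone RMSD = {rmsd(backbone_only(model_1), backbone_only(model_2)):.3f} Å")
```

Side-chain atoms such as CB are filtered out before the comparison, so differences in side-chain placement do not affect a backbone RMSD.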

Because aldosterone evolved only in the tetrapods, tens of millions of years after AncCR, that receptor’s sensitivity to aldosterone was surprising (13). The AncCR-ligand structures indicate that the receptor’s ancient response to aldosterone was a structural by-product of its sensitivity to DOC, the likely ancestral ligand, which it binds almost identically (Fig. 1C). Key contacts for binding DOC involve conserved surfaces among the hormones, and no obligate contacts are made with moieties at C11, C17, and C18, the only variable positions among the three hormones. These inferences are robust to uncertainty in the sequence reconstruction: We modeled each plausible alternate reconstruction [posterior probability (PP) > 0.20] into the AncCR crystal structures and found that none significantly affected the backbone conformation or ligand interactions. The receptor, therefore, had the structural potential to be fortuitously activated by aldosterone when that hormone evolved tens of millions of years later, providing the mechanism for evolution of the MR-aldosterone partnership by molecular exploitation, as described (13).
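The robustness check described above starts from a list of "plausible alternate" states, meaning non-best states whose posterior probability exceeds 0.20. A minimal sketch of that selection step, assuming the reconstruction software reports a posterior probability for each amino acid at each site; the data layout, function names, and all probabilities below are hypothetical.

```python
from typing import Dict, List, Tuple

# posterior probability of each amino acid at one site of a reconstruction
SitePosterior = Dict[str, float]


def ml_state(post: SitePosterior) -> Tuple[str, float]:
    """Most probable state at a site and its posterior probability."""
    aa = max(post, key=lambda k: post[k])
    return aa, post[aa]


def plausible_alternates(post: SitePosterior, threshold: float = 0.20) -> List[str]:
    """Non-best states whose posterior probability exceeds `threshold`."""
    best, _ = ml_state(post)
    return sorted(aa for aa, pp in post.items() if aa != best and pp > threshold)


def sites_with_alternates(posteriors: Dict[int, SitePosterior],
                          threshold: float = 0.20) -> Dict[int, List[str]]:
    """Site -> plausible alternate states, omitting sites that have none."""
    result = {}
    for site, post in posteriors.items():
        alts = plausible_alternates(post, threshold)
        if alts:
            result[site] = alts
    return result


if __name__ == "__main__":
    # Hypothetical posteriors for three sites (not values from the study).
    posteriors = {
        1: {"N": 0.97, "S": 0.03},
        2: {"F": 0.61, "L": 0.35, "I": 0.04},
        3: {"L": 0.52, "Q": 0.25, "M": 0.23},
    }
    print(sites_with_alternates(posteriors))
```

Each state returned this way would then be modeled into the structure, or introduced by mutagenesis, to test whether the conclusion depends on it.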

To determine how GR’s preference for cortisol evolved, we identified substitutions that occurred during the same period as the shift in GR function. We used maximum likelihood phylogenetics to determine the sequences of ancestral receptors along the GR lineage (16). The reconstructions had strong support, with mean PP > 0.93 and the vast majority of sites with PP > 0.90 (tables S2 and S3). We synthesized a cDNA for each reconstructed LBD, expressed it in cultured cells, and experimentally characterized its hormone sensitivity in a reporter gene transcription assay (16). GR from the common ancestor of all jawed vertebrates (AncGR1 in Fig. 1A) retained AncCR’s sensitivity to aldosterone, DOC, and cortisol. At the next node, however, GR from the common ancestor of bony vertebrates (AncGR2) had a phenotype like that of modern GRs, responding only to cortisol. This inference is robust to reconstruction uncertainty: We introduced plausible alternative states by mutagenesis, but none changed function (fig. S4). GR’s specificity therefore evolved during the interval between these two speciation events, ~420 to 440 Ma (19, 20).

of corticosteroid receptors. Dose-response curves show transcription of a luciferase reporter gene by extant and resurrected ancestral receptors with varying doses (in log M) of aldosterone (green), DOC (orange), and cortisol (purple). Black box indicates evolution of cortisol specificity. The number of sequence changes on each branch is shown (aa, replacement; Δ, deletion). Scale bars, SEM of three replicates. Node dates from the fossil record (19, 20). For complete phylogeny and sequences, see fig. S10 and table S5. (B) Crystal structure of the AncCR LBD with bound aldosterone (green, with red oxygens). Helices are labeled. (C) AncCR’s ligand-binding pocket. Side chains (<4.2 Å from bound ligand) are superimposed from crystal structures of AncCR with aldosterone (green), DOC (orange), and cortisol (purple). Oxygen and nitrogen atoms are red and blue, respectively; dashed lines indicate hydrogen bonds. Arrows show C11, C17, and C18 positions, which differ among the hormones.

[Fig. 2A plot residue: dose-response panels (hormone, log M) for AncGR1, AncGR1 + S106P, AncGR1 + L111Q, and AncGR1 + S106P, L111Q.]

During this interval, there were 36 substitutions and one single-codon deletion (figs. S5 and S6). Four substitutions and the deletion are conserved in one state in all GRs that descend from AncGR2 and in another state in all receptors with the ancestral function. Two of these, S106P and L111Q (21), were previously identified as increasing cortisol specificity when introduced into AncCR (13). We introduced these substitutions into AncGR1 and found that they recapitulate a large portion of the functional shift from AncGR1 to AncGR2, radically reducing aldosterone and DOC response while maintaining moderate sensitivity to cortisol (Fig. 2A); the concentrations required for half-maximal activation (EC50) by aldosterone and DOC increased by 169- and 57-fold, respectively, whereas that for cortisol increased only twofold. A strong epistatic interaction between substitutions was apparent: L111Q alone had little effect on sensitivity to any hormone, but S106P dramatically reduced activation by all ligands. Only the combination switched receptor preference from aldosterone and DOC to cortisol. Introducing these historical substitutions into the human MR yielded a completely nonfunctional receptor, as did reversing them in the human GR (fig. S7). These results emphasize the importance of having the ancestral sequence to reveal the functional impacts of historical substitutions.
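The EC50 fold changes quoted above compare the concentrations giving half-maximal activation before and after the substitutions. A minimal sketch of that arithmetic, assuming a standard Hill (four-parameter logistic) dose-response model and simple linear interpolation on a log10 dose grid rather than the nonlinear curve fitting a real analysis would use. The function names and the synthetic curves are illustrative, not the authors' data or code.

```python
import math
from typing import Sequence


def hill(log_dose: float, log_ec50: float, top: float = 1.0,
         bottom: float = 0.0, n: float = 1.0) -> float:
    """Four-parameter logistic (Hill) response; doses are log10 molar."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (n * (log_ec50 - log_dose)))


def estimate_log_ec50(log_doses: Sequence[float], responses: Sequence[float]) -> float:
    """Log10 dose at which the response crosses half of its observed range.

    Assumes doses are sorted ascending and the response rises with dose; uses
    linear interpolation between the two bracketing points.
    """
    lo, hi = min(responses), max(responses)
    half = (lo + hi) / 2.0
    pts = list(zip(log_doses, responses))
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 <= half <= y1 and y1 > y0:
            return x0 + (half - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("response does not cross its half-maximum")


def ec50_fold_change(log_ec50_after: float, log_ec50_before: float) -> float:
    """Fold change in EC50; values above 1 mean reduced sensitivity."""
    return 10.0 ** (log_ec50_after - log_ec50_before)


if __name__ == "__main__":
    doses = [-11.0 + 0.25 * i for i in range(25)]  # 1e-11 to 1e-5 M
    # Synthetic curves: a hypothetical 169-fold rightward shift in EC50.
    before = [hill(x, -9.0) for x in doses]
    after = [hill(x, -9.0 + math.log10(169)) for x in doses]
    fold = ec50_fold_change(estimate_log_ec50(doses, after),
                            estimate_log_ec50(doses, before))
    print(f"estimated fold change: {fold:.0f}")
```

The printed estimate is only approximately 169, because the half-maximum is taken from the observed range of each curve (the shifted curve has not plateaued by 1e-5 M) and the dose grid is coarse; a fitted model would recover the shift more exactly.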

To determine the mechanism by which these two substitutions shift function, we compared the structures of AncGR1 and AncGR2, which were generated by homology modeling and energy minimization based on the AncCR and human GR crystal structures, respectively (16). These structures are robust to uncertainty in the reconstruction: Modeling plausible alternate states did not significantly alter backbone conformation, interactions with ligand, or intraprotein interactions. The major structural difference between AncGR1 and AncGR2 involves helix 7 and the loop preceding it, which contain S106P and L111Q and form part of the ligand pocket (Fig. 2B and fig. S8). In AncGR1 and AncCR, the loop’s position is stabilized by a hydrogen bond between Ser106 and the backbone carbonyl of Met103. Replacing Ser106 with proline in the derived GRs breaks this bond and introduces a sharp kink into the backbone, which pulls the loop downward, repositioning and partially unwinding helix 7. By destabilizing this crucial region of the receptor, S106P impairs activation by all ligands. The movement of helix 7, however, also dramatically repositions site 111, bringing it close to the ligand. In this conformational background, L111Q generates a hydrogen bond with cortisol’s C17-hydroxyl, stabilizing the receptor-hormone complex. Aldosterone and DOC lack this hydroxyl, so the new bond is cortisol-specific. The net effect of these two substitutions is to destabilize the receptor complex with aldosterone or DOC and restore stability in a cortisol-specific fashion, switching AncGR2’s preference to that hormone. We call this mode of structural evolution conformational epistasis, because one substitution remodels the protein backbone and repositions a second site, changing the functional effect of substitution at the latter.

Fig. 2. Mechanism for switching AncGR1’s ligand preference from al-
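Epistasis of the kind described for S106P and L111Q is often quantified as a departure from additivity of log-scale effects: ε = log f(AB) − log f(A) − log f(B) + log f(WT), where f is some positive measured phenotype (for example, a ratio of EC50 values for two hormones). This standard definition is an assumption here, not necessarily the measure used in the study, and the phenotype values in the example are hypothetical.

```python
import math
from typing import Mapping

GENOTYPES = ("wt", "a", "b", "ab")


def log_epistasis(phenotype: Mapping[str, float]) -> float:
    """Pairwise epistasis on a log10 scale.

    phenotype maps 'wt', 'a', 'b', and 'ab' (background, each single mutant,
    double mutant) to a positive measured value. Zero means the two mutations
    combine additively in log space; any other value means the effect of one
    depends on whether the other is present.
    """
    for g in GENOTYPES:
        if phenotype[g] <= 0:
            raise ValueError(f"phenotype for {g!r} must be positive")
    log = {g: math.log10(phenotype[g]) for g in GENOTYPES}
    return log["ab"] - log["a"] - log["b"] + log["wt"]


if __name__ == "__main__":
    # Hypothetical cortisol-preference ratios (not data from the study): neither
    # single mutant changes preference much, but the double mutant does.
    ratios = {"wt": 0.5, "a": 0.6, "b": 0.55, "ab": 20.0}
    print(f"epistasis = {log_epistasis(ratios):.2f} log10 units")
```

A large positive value, as in this made-up example, corresponds to the pattern in the text where only the combination of the two substitutions switches preference.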

Although S106P and L111Q (“group X” for
convenience) recapitulate the evolutionary switch
in preference from aldosterone to cortisol, the
receptor retains some sensitivity to MR’s ligands,
unlike AncGR2 and extant GRs. We hypothesized
that the other three strictly conserved changes that
occurred between AncGR1 and AncGR2 (L29M,
F98I, and deletion S212D) would complete the
functional switch. Surprisingly, introducing these
“group Y” changes into the AncGR1 and AncGR1 +
X backgrounds produced completely nonfunc-
tional receptors that cannot activate transcription,
even in the presence of high ligand concentrations
(Fig. 3A). Additional epistatic substitutions must
therefore have modulated the effect of group Y,
providing a permissive background for the evolution
of group Y that was not yet present in AncGR1.

The AncCR crystal structure allowed us to
identify these permissive mutations by analyzing
the effects of group Y substitutions (Fig. 3B).
In all steroid receptors, transcriptional activity
depends on the stability of an activation-function
helix (AF-H), which is repositioned when the
ligand binds, generating the interface for tran-
scriptional coactivators. The stability of this
orientation is determined by a network of inter-
actions among three structural elements: the loop
preceding AF-H, the ligand, and helix 3 (17).
Group Y substitutions compromise activation because
they disrupt this network. S212Δ eliminates
a hydrogen bond that directly stabilizes the AF-H
loop, and L29M on helix 3 creates a steric clash
and unfavorable interactions with the D-ring of
the hormone. F98I opens up space between helix
3, helix 7, and the ligand; the resulting instability
is transmitted indirectly to AF-H, impairing
activation by all ligands (Fig. 3B). If the protein
could tolerate group Y, however, the structures
predict that these mutations would enhance
cortisol specificity: L29M forms a hydrogen
bond with cortisol’s unique C17-hydroxyl, and
the additional space created by F98I relieves a
steric clash between the repositioned loop and
Met108, stabilizing the key interaction between
Q111 and the C17-hydroxyl (Fig. 3B).

We hypothesized that historical substitutions
that added stability to the regions destabilized by
group Y might have permitted the evolving pro-
tein to tolerate group Y mutations and to complete
the GR phenotype. Structural analysis suggested
two candidates (group Z): N26T generates a new
hydrogen bond between helix 3 and the AF-H
loop, and Q105L allows helix 7 to pack more
tightly against helix 3, stabilizing the latter and,
indirectly, AF-H (Fig. 3B). As predicted, intro-
ducing group Z into the nonfunctional AncGR1 +
X + Y receptor restored transcriptional activity,
indicating that Z is permissive for Y (Fig. 3A).
Further, AncGR1 + X + Y + Z displays a fully
GR-like phenotype that is unresponsive to
aldosterone and DOC and maintains moderate

dosterone to cortisol. (A) Effect of
substitutions S106P and L111Q on the resurrected AncGR1’s response to hormones. Dashed lines indicate sensitivity to aldosterone (green), cortisol (purple), and DOC (orange) as the EC50 for reporter gene activation. Green arrow shows probable pathway through a functional intermediate; red arrow, intermediate with radically reduced sensitivity to all hormones. (B) Structural change conferring new ligand specificity. Backbones of helices 6 and 7 from AncGR1 (green) and AncGR2 (yellow) in complex with cortisol are superimposed. Substitution S106P induces a kink in the interhelical loop of AncGR2, repositioning sites 106 and 111 (arrows). In this background, L111Q forms a new hydrogen bond with cortisol’s unique C17-hydroxyl (dotted red line).
[Fig. 2A plot axes: fold activation versus hormone concentration (log M).]

1546 14 SEPTEMBER 2007 VOL 317 SCIENCE www.sciencemag.org

REPORTS

cortisol sensitivity. Both N26T and Q105L are
required for this effect (table S4). Strong epistasis
is again apparent: Adding group Z substitutions
in the absence of Y has little or no effect on ligand-
activated transcription, presumably because the
receptor has not yet been destabilized (Fig. 3A).
Evolutionary trajectories that pass through func-
tional intermediates are more likely than those
involving nonfunctional steps (22), so the only
historically likely pathways to AncGR2 are those
in which the permissive substitutions of group Z
and the large-effect mutations of group X occurred
before group Y was complete (Fig. 3C).
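The restricted-pathway argument can be made concrete with a small enumeration. This is an illustrative sketch only: it treats each group (X, Y, Z) as a single step, although the real substitutions occurred one at a time, and it labels each combination functional or nonfunctional as described in the text and the Fig. 3 legend. The label for AncGR1 + X + Z is an assumption, inferred from the statement that group Z has little or no effect in the absence of Y. The names `FUNCTIONAL` and `viable_orders` are hypothetical.

```python
from itertools import permutations

# Functional status of each combination of substitution groups added to AncGR1,
# as read from the text. X+Z is an assumption (Z alone "has little or no effect").
FUNCTIONAL = {
    frozenset(): True,        # AncGR1
    frozenset("X"): True,     # functional intermediate (Fig. 2A)
    frozenset("Z"): True,     # little or no effect without Y
    frozenset("XZ"): True,    # assumed functional
    frozenset("Y"): False,    # Y alone abolishes activity
    frozenset("XY"): False,   # AncGR1 + X + Y is nonfunctional
    frozenset("YZ"): False,   # Y abolishes activity unless X and Z are present
    frozenset("XYZ"): True,   # GR-like, cortisol-specific
}


def viable_orders(groups: str = "XYZ") -> list[str]:
    """Orders of adding whole groups in which every intermediate stays functional."""
    viable = []
    for order in permutations(groups):
        state = frozenset()
        ok = True
        for group in order:
            state = state | {group}
            if not FUNCTIONAL[state]:
                ok = False
                break
        if ok:
            viable.append("".join(order))
    return viable


print(viable_orders())  # ['XZY', 'ZXY']
```

Only the two orders in which X and Z both precede Y avoid a nonfunctional intermediate, which is the conclusion drawn from Fig. 3C.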

Fig. 3. Permissive substitutions in the
evolution of receptor specificity. (A)
Effects of various combinations of
historical substitutions on AncGR1’s
transcriptional activity and hormone-
sensitivity in a reporter gene assay.
Group Y (L29M, F98I, and S212Δ) abolishes
receptor activity unless groups X
(S106P, L111Q) and Z (N26T and
Q105L) are present; the XYZ combina-
tion yields complete cortisol-specificity.
The 95% confidence interval for each
EC50 is in parentheses. Dash, no acti-
vation. (B) Structural prediction of
permissive substitutions. Models of
AncGR1 (green) and AncGR2 (yellow)
are shown with cortisol. Group X and Y
substitutions (circles and rectangles)
yield new interactions with the C17-
hydroxyl of cortisol (purple) but de-
stabilize receptor regions required for
activation. Group Z (underlined) imparts
additional stability to the destabilized
regions. (C) Restricted evolutionary
paths through sequence space. The
corners of the cube represent states for
residue sets X, Y, and Z. Edges represent
pathways from the ancestral sequence
(AncGR1) to the cortisol-specific combi-

Our discovery of permissive substitutions in the
AncGR1-AncGR2 interval suggested that other
permissive mutations might have evolved even
earlier. We used the structures to pred


letters to nature

5. Charlesworth, B. The effect of background selection against deleterious mutations on weakly selected,
linked variants. Genet. Res. 63, 213±227 (1994).

6. Fay, J., Wycoff, G. J. & Wu, C.-I. Positive and negative selection on the human genome. Genetics 158,
1227±1234 (2001).

7. McDonald, J. H. & Kreitman, M. Adaptive evolution at the Adh locus in Drosophila. Nature 351, 652±
654 (1991).

8. Charlesworth, B., Morgan, M. T. & Charlesworth, D. The effect of deleterious mutations on neutral
molecular variation. Genetics 134, 1289±1303 (1993).

9. Maynard Smith, J. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23±35 (1974).
10. Begun, D. J. & Aquadro, C. F. levels of naturally occuring DNA polymorphism correlate with

recombination rates in D. melanogaster. Nature 356, 519±520 (1992).
11. Begun, D. The frequency distribution of nucleotide variation in Drosophila simulans. Mol. Biol. Evol.

18, 1343±1352 (2001).
12. Kliman, R. Recent selection on synonymous codon usage in Drosophila. J. Mol. Evol. 49, 343±351 (1999).
13. Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185±2195 (2000).
14. Powell, J. R. & DeSalle, R. Drosophila molecular phylogenies and their uses. Evol. Biol. 28, 87±138

(1995).
15. Haldane, J. B. S. The cost of natural selection. J. Genet. 55, 511±524 (1957).
16. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624±626 (1968).
17. Thompson, J. D., Higgins, D. G. & Gibson, T. J. ClustalWÐimproving the sensitivity of progressive

multiple alignment through sequence weighting, position-speci®c gap penalties and weight matrix
choice. Nucl. Acids Res. 22, 4673±4680 (1994).

18. Xia, X. Data Analysis in Molecular Biology and Evolution (Kluwer Academic, London, 2000).
19. Rozas, J. & Rozas, R. DnaSP version 3: an integrated program for molecular population genetics and

molecular evolution analysis. Bioinformatics 15, 174±175 (1999).
20. Yang, Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl.

Biosci. 13, 555±556 (1997).

Supplementary Information accompanies the paper on Nature’s website
(http://www.nature.com).

Acknowledgements
We thank B. Charlesworth, C.-I. Wu, S. Otto, M. Whitlock, T. Johnson, P. Awadalla,
J. Gillespie, G. McVean and P. Keightley for helpful discussions, and E. Moriyama for help
with data collection. N.G.C.S. was funded by the Biotechnology and Biological Sciences
Research Council (BBSRC) and A.E.-W. is funded by the Royal Society and the BBSRC.

Competing interests statement
The authors declare that they have no competing financial interests.

Correspondence and requests for materials should be addressed to A.E.-W.
(e-mail: a.c.eyre-walker@sussex.ac.uk).

Testing the neutral theory of molecular evolution with genomic data from Drosophila
Justin C. Fay*†, Gerald J. Wyckoff*† & Chung-I Wu*‡

* Committee on Genetics, University of Chicago, Chicago, Illinois 60637, USA
‡ Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
† Present addresses: Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (J.C.F.); Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA (G.J.W.).

Although positive selection has been detected in many genes, its overall contribution to protein evolution is debatable¹. If the bulk of molecular evolution is neutral, then the ratio of amino-acid (A) to synonymous (S) polymorphism should, on average, equal that of divergence². A comparison of the A/S ratio of polymorphism in Drosophila melanogaster with that of divergence from Drosophila simulans shows that the A/S ratio of divergence is twice as high, a difference that is often attributed to positive selection. But an increase in selective constraint owing to an increase in effective population size could also explain this observation, and, if so, all genes should be affected similarly. Here we show that the difference between polymorphism and divergence is limited to only a fraction of the genes, which are also evolving more rapidly, and this implies that positive selection is responsible. A higher A/S ratio of divergence than of polymorphism is also observed in other species, which suggests a rate of adaptive evolution that is far higher than permitted by the neutral theory of molecular evolution.
The neutral theory holds that the bulk of DNA divergence between species is driven by mutation and drift, rather than by positive darwinian selection³. But because the effect of positive selection is often masked by negative selection⁴, detecting positive selection is a challenging task. A rate of amino-acid substitution greater than that of synonymous substitution can be explained only by positive selection⁵, but such a criterion is very stringent because negative selection lowers the rate of amino-acid substitution. A high rate of amino-acid substitution is limited mostly to genes that are involved in resistance to disease or in sexual reproduction, where there is continual room for improvement⁶,⁷.
The McDonald–Kreitman test can detect positive selection even in the presence of negative selection through a ratio of amino-acid divergence to synonymous divergence greater than that of polymorphism². The A/S ratio of divergence is inflated above that of polymorphism by advantageous amino-acid mutations, which sweep quickly through a population, contributing little to polymorphism, but have a cumulative effect on divergence. The McDonald–Kreitman test has been applied to many genes individually, but only a few have yielded a significant excess of amino-acid divergence (Drosophila genes are reviewed in refs 8, 9). This may in part reflect a lack of power to detect positive selection in individual genes unless a large number of adaptive substitutions have occurred.
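A McDonald–Kreitman comparison reduces to a 2 × 2 table of amino-acid and synonymous counts for divergence and polymorphism. As a minimal sketch, assuming a two-sided Fisher's exact test (the letter does not say which test the authors used), the hypothetical helper `mk_test` below computes both A/S ratios and a P value with only the Python standard library. The example counts are the pooled divergence and common-polymorphism totals reported later in this letter (598/950 and 65/224).

```python
from math import comb


def mk_test(dn: int, ds: int, pn: int, ps: int) -> dict:
    """McDonald-Kreitman 2x2 table with a two-sided Fisher's exact test.

                    amino-acid  synonymous
    divergence          dn          ds
    polymorphism        pn          ps
    """
    row_div, row_poly = dn + ds, pn + ps
    col_aa = dn + pn                      # all amino-acid changes
    total = comb(row_div + row_poly, col_aa)

    def prob(x: int) -> float:
        # Hypergeometric probability of x amino-acid changes in the divergence
        # row, given fixed margins (exact big-integer arithmetic, then a float).
        return comb(row_div, x) * comb(row_poly, col_aa - x) / total

    p_obs = prob(dn)
    lo, hi = max(0, col_aa - row_poly), min(row_div, col_aa)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Two-sided P: sum over all tables no more probable than the observed one.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    return {
        "as_div": dn / ds,
        "as_poly": pn / ps,
        "poly_over_div": (pn / ps) / (dn / ds),
        "p": min(1.0, p_value),
    }


result = mk_test(598, 950, 65, 224)
print(round(result["as_div"], 2), round(result["as_poly"], 2))  # 0.63 0.29
print(result["p"])
```

Pooling counts across genes, as here, is what gives the test enough power when individual genes carry only a few polymorphisms.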
For those genes that have yielded a significant McDonald–Kreitman test result, the A/S ratio of divergence is more than twice that of polymorphism¹⁰⁻¹². The effects of positive selection may also be obscured by slightly deleterious amino-acid mutations that inflate the A/S ratio of polymorphism but not that of divergence. The effects of slightly deleterious mutations can be removed by comparing common polymorphism with divergence, because deleterious amino-acid mutations are kept at low frequency in the population⁴. This can only be done when the data from a large number of genes are combined; individual genes rarely contain more than a few common amino-acid polymorphisms.
An important but rarely appreciated assumption of the McDonald–Kreitman test is that the selective constraint on a gene remains constant over time. The selective constraint on a gene is determined by the proportion of amino-acid mutations that are deleterious³ (2Ns < −1), so both a change in the selection coefficient (s) and a change in effective population size (N) can result in a change in selective constraint. Although it is well known that selective constraint is not static across phylogenetic lineages¹³,¹⁴, this assumption is rarely justified in applications of the McDonald–Kreitman test. Whereas the strength of selection on each gene might fluctuate over time depending on the genetic or environmental background, a genome-wide change in constraint, such as that caused by a change in effective population size, should produce a consistent increase or decrease in the A/S ratio across all genes. Alternatively, under positive selection each gene might be affected to a different degree and some genes might not be affected at all.
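The definition of constraint above can be illustrated numerically. In this hypothetical sketch the selection coefficients are invented values, not data from the letter; the point is only that, with the distribution of s held fixed, a larger effective population size N pushes more mutations below the 2Ns < −1 threshold and so raises constraint. The function name `constrained_fraction` is hypothetical.

```python
def constrained_fraction(s_values: list[float], n_e: float) -> float:
    """Fraction of mutations counted as deleterious, i.e. with 2Ns < -1."""
    deleterious = [s for s in s_values if 2 * n_e * s < -1]
    return len(deleterious) / len(s_values)


# Hypothetical selection coefficients for five amino-acid mutations (not data).
S_VALUES = [-1e-7, -3e-7, -2e-6, -8e-6, -3e-5]

for n_e in (1e5, 1e6):
    frac = constrained_fraction(S_VALUES, n_e)
    print(f"N = {n_e:.0e}: constrained fraction = {frac:.1f}")
```

Because the same shift in N would move every gene's distribution of 2Ns in the same direction, a demographic change predicts a genome-wide effect, whereas positive selection need not.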
Table 1 Polymorphisms in D. melanogaster and divergence from D. simulans

Gene*       Class              Amino-acid, A   Synonymous, S   A/S
X-linked    Rare (≤12.5%)                  4              67   0.06
            Common (>12.5%)                6              46   0.13
            Divergence                    42             189   0.22
Autosomal   Rare                          79             126   0.63
            Common                        44             118   0.37
            Divergence                   421             521   0.81

* There are 5 X-linked and 31 autosomal genes with a sample size of eight or greater (see text for the data from all 45 genes).

1024 © 2002 Macmillan Magazines Ltd NATURE | VOL 415 | 28 FEBRUARY 2002 | www.nature.com

Table 2 African and non-African common polymorphism and divergence

Class          Population      Amino-acid, A   Synonymous, S   A/S
Polymorphism   Non-African                48             124   0.39
               African                    40             159   0.25
Divergence                               413             663   0.62

To compare genomic patterns of amino-acid and synonymous site evolution, we tabulated polymorphism in D. melanogaster and divergence from D. simulans from 45 gene surveys (Methods). If all amino-acid and synonymous variation is neutral, then the A/S ratio should be the same for polymorphism and divergence. The A/S ratio of divergence (598/950 = 0.63) is significantly greater than that of common polymorphism (65/224 = 0.29; P < 10⁻⁶). We compared divergence with the common rather than the total polymorphism because deleterious mutations at low frequency inflate the A/S ratio of polymorphism. For the 36 genes with sample sizes of eight or greater, there is a significant excess of rare over common amino-acid variation in autosomal genes (P = 0.022; Table 1), as is observed in humans⁴. The absence of a difference in X-linked genes suggests that the deleterious mutations are partially recessive and are more readily eliminated from the X chromosome, where recessive mutations are exposed to selection in hemizygous males.
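Both P values in the paragraph on divergence and common polymorphism can be checked approximately from the pooled counts. The sketch below assumes a Pearson chi-square test without continuity correction on each 2 × 2 table (the letter does not state which test was used); with one degree of freedom the upper-tail probability is erfc(√(χ²/2)). With the autosomal rare and common counts from Table 1 it gives P ≈ 0.022, and for divergence versus common polymorphism it gives a value below 10⁻⁶. The helper name `chi2_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P value with one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))


# Divergence vs. common polymorphism: (amino-acid, synonymous) per row.
_, p_div = chi2_2x2(598, 950, 65, 224)
# Autosomal rare vs. common polymorphism from Table 1.
_, p_rare = chi2_2x2(79, 126, 44, 118)
print(f"{p_div:.1e}  {p_rare:.3f}")
```

The closed form for the one-degree-of-freedom tail avoids any dependency beyond the standard library.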
Both positive selection and an increase in selective constraint on amino-acid changes can produce a higher A/S ratio of divergence than of polymorphism. But only under certain restrictive conditions is a genome-wide change in constraint possible. One such condition is an increase in effective population size that is neither too distant nor too recent in the evolutionary past. If this possibility can be ruled out, positive selection may be the only viable explanation for the high rate of amino-acid divergence.
If an increase in selective constraint resulted from a population size increase associated with the spread of D. melanogaster outside Africa¹⁵, it might be more appropriate to compare the A/S ratio of the African population with that of divergence. Table 2, which includes the 32 genes for which both African and non-African populations were surveyed, shows that there is a significantly larger A/S ratio of divergence than of polymorphism in either population. If a recent increase in effective population size increased constraint on amino-acid polymorphism in both African and non-African populations, then patterns of synonymous polymorphism might be skewed towards rare variants. Neither African nor non-African populations show this pattern¹⁶. Finally, if there has been a decrease in effective population size along the D. melanogaster lineage¹⁷,¹⁸, the A/S ratio of polymorphism should be greater than that of divergence between the two species.

[Figure 1 histogram: number of genes (y axis, 0 to 12) against excess of amino-acid divergence (x axis, −12 to >10); legend: ka > 0.02, ka < 0.02.]

Figure 1 The distribution of the excess of amino-acid divergence contributed by each gene. For reference, fast and slowly evolving genes are denoted by a rate of amino-acid substitution (ka) greater than (filled bars) or less than (open bars) 2%.

Table 3 Polymorphism and divergence in neutral and fast genes

Genes*    Class         Amino-acid, A   Synonymous, S   A/S
Neutral   Rare                     31              90   0.34
          Common                   16              69   0.23
          Divergence               65             247   0.26
Fast      Rare                     48              36   1.33
          Common                   28              49   0.57
          Divergence              356             274   1.30

* X-linked genes are excluded.

If an increase in effective population size has produced a genome-
wide increase in selective constraint, the A/S ratio of all genes should
be affected. In Fig. 1, the distribution of each gene’s contribution to
the excess of amino-acid divergence suggests that there are two
classes of gene: neutral and rapidly evolving. The neutral class
comprises 34 genes that deviate by fewer than 10 amino-acid
substitutions from the number expected on the basis of the A/S ratio of all
common polymorphism. The remaining 11 genes all have a higher
A/S ratio of divergence than of polymorphism, and account for the
entire difference between the A/S ratios of polymorphism and divergence.
These genes are Acp26Aa, Acp29Ab, anon1A3, anon1E9, anon1G5, ci,
est-6, Ref2P, Rel, tra and Zw. As expected under positive selection,
which increases the rate of protein evolution, these 11 genes have a
high rate of amino-acid substitution (Fig. 1).
Can the pattern in Fig. 1 be explained by selection or demography? Table 3 shows that, in the rapidly evolving genes, the A/S ratios of divergence and of rare polymorphism are much higher than the A/S ratio of the common polymorphism. This is expected if the genes are under positive selection. Although a large increase in population size in the recent past could account for the difference between the A/S ratio of divergence and that of common polymorphism, this explanation is incompatible with the very small difference found in the 26 neutral genes. Because both the neutral and rapidly evolving genes have a higher A/S ratio of rare polymorphism than of common polymorphism, both should have been affected by an increase in effective population size.
If positive selection is common, other species should also have an A/S ratio of divergence greater than that of polymorphism. In addition, a particular demographic history is unlikely to be shared by several species. In a study of eight genes in D. simulans, Drosophila mauritiana and Drosophila sechellia, the A/S ratio of polymorphism (A/S = 32/183) is 34% that of divergence (28/55)¹⁹. In a study of 42 genes with polymorphism in both D. melanogaster and D. simulans, the A/S ratio of polymorphism is 65% that of divergence (N. G. C. Smith and A. Eyre-Walker, personal communication). In another study of 23 genes, the A/S ratio of polymorphism (45/305) is 30% that of divergence along the D. simulans lineage (65/133)²⁰. In humans, the A/S ratio of common polymorphism (70/122) found in 181 genes is 65% that of divergence (3,660/4,151) found in a different set of 182 human and Old World monkey genes⁴.
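The percentages quoted for other species are the A/S ratio of polymorphism divided by the A/S ratio of divergence. A short check using only the counts given in the text (the 42-gene comparison is reported only as a percentage and is omitted); the function name `poly_as_percent_of_div` is hypothetical.

```python
from fractions import Fraction

# (amino-acid, synonymous) counts quoted in the text: polymorphism, then divergence.
STUDIES = {
    "D. simulans complex, 8 genes": ((32, 183), (28, 55)),
    "D. simulans lineage, 23 genes": ((45, 305), (65, 133)),
    "humans vs. Old World monkeys": ((70, 122), (3660, 4151)),
}


def poly_as_percent_of_div(poly: tuple[int, int], div: tuple[int, int]) -> int:
    """A/S of polymorphism as a whole-number percentage of A/S of divergence."""
    ratio = Fraction(*poly) / Fraction(*div)
    return round(100 * ratio)


for name, (poly, div) in STUDIES.items():
    print(f"{name}: {poly_as_percent_of_div(poly, div)}%")
```

Exact fractions keep the rounding honest; the results match the 34%, 30% and 65% stated above.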
Although these genomic patterns of variation are not explained easily by the neutral theory, slightly deleterious mutations must clearly be accounted for in attempting to measure positive selection. In humans, 38% of amino-acid polymorphism was estimated to be slightly deleterious⁴, and in D. melanogaster the estimate is 26% [(0.63 − 0.37) × 126/123] from the combined neutral and rapidly evolving genes (Table 3). These slightly deleterious mutations, which are emphasized by the nearly neutral theory²¹, could become effectively neutral and fixed during a population bottleneck of sufficient severity, providing a burst of amino-acid substitutions and an increase in the A/S ratio of divergence. We control for the impact of these slightly deleterious mutations by comparing the rapidly evolving class of gene to the neutral class (Fig. 1, Table 3). Additional genomic data from other species will be needed to estimate the general impact of these slightly deleterious mutations on protein evolution.
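The 26% estimate can be reproduced from the pooled counts in Table 3 (neutral plus fast genes: 79 rare and 44 common amino-acid polymorphisms; 126 rare and 118 common synonymous polymorphisms). One reading of the formula, assumed here rather than spelled out in the letter, is that the excess of the rare A/S ratio over the common A/S ratio, multiplied by the 126 rare synonymous polymorphisms, estimates the number of slightly deleterious rare amino-acid polymorphisms, which is then divided by all 123 amino-acid polymorphisms. The function name `deleterious_fraction` is hypothetical.

```python
from fractions import Fraction

# Pooled neutral + fast genes from Table 3.
RARE_A, RARE_S = 31 + 48, 90 + 36        # 79, 126
COMMON_A, COMMON_S = 16 + 28, 69 + 49    # 44, 118


def deleterious_fraction(rare_a: int, rare_s: int, common_a: int, common_s: int) -> Fraction:
    """Excess rare amino-acid polymorphisms as a fraction of all amino-acid polymorphisms."""
    excess_rare_a = (Fraction(rare_a, rare_s) - Fraction(common_a, common_s)) * rare_s
    return excess_rare_a / (rare_a + common_a)


f = deleterious_fraction(RARE_A, RARE_S, COMMON_A, COMMON_S)
print(f"{float(f):.1%}")
```

With unrounded ratios the result is about 26.0%; plugging in the rounded values 0.63 and 0.37 gives a slightly higher figure, about 27%.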



Methods
Data
A literature search yielded 45 genes for which polymorphism had been surveyed in
D. melanogaster and for which an outgroup sequence was available. Of these, 36 had a
sample size of eight or greater, 32 had been surveyed in at least two African and two non-
African individuals and 10 were X-linked. The 45 genes and their references are
listed in Supplementary Information.

Analysis
Polymorphism data were tabulated by hand or from GenBank accession numbers using SITES²² or DnaSP²³. For each polymorphic site, the minor allele was classified as rare (≤12.5%) or common (>12.5%). The cutoff of 12.5% was chosen to exclude deleterious mutations from the common frequency class and to include those genes with samples of eight or more in the analysis of rare compared to common polymorphism. Cutoffs of 10 and 15% produce similar results. We treated three alleles segregating at a single nucleotide as two segregating sites and excluded complex variations. Divergence data were obtained by comparing a randomly chosen sequence of D. melanogaster with that of D. simulans or, if unavailable, either D. mauritiana or D. sechellia. The number of amino-acid and synonymous substitutions between species was estimated using Kimura’s two-parameter model to correct for multiple hits.
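Two steps of this procedure can be sketched under stated assumptions. First, the 12.5% cutoff is exactly 1/8, so in a sample of eight alleles a singleton minor allele is classed as rare and anything more frequent as common; exact fractions avoid floating-point error at the boundary. Second, the Kimura two-parameter distance corrects the observed proportion of differing sites for multiple hits using the proportions of transitions (P) and transversions (Q). How the authors partitioned amino-acid and synonymous sites before applying the correction is not described, so the function below shows only the standard distance formula. Both function names are hypothetical.

```python
from fractions import Fraction
from math import log

RARE_CUTOFF = Fraction(1, 8)   # 12.5%


def frequency_class(minor_count: int, sample_size: int) -> str:
    """Classify a polymorphic site by the frequency of its minor allele."""
    return "rare" if Fraction(minor_count, sample_size) <= RARE_CUTOFF else "common"


def k2p_distance(p_transitions: float, q_transversions: float) -> float:
    """Kimura (1980) two-parameter distance, in substitutions per site."""
    return (-0.5 * log(1 - 2 * p_transitions - q_transversions)
            - 0.25 * log(1 - 2 * q_transversions))


print(frequency_class(1, 8), frequency_class(2, 8))   # rare common
print(round(k2p_distance(0.10, 0.05), 3))             # above the raw proportion 0.15
```

The corrected distance exceeds the raw proportion of differences because some sites are assumed to have changed more than once.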

The contribution of each gene to the excess number of amino-acid substitutions was calculated as the excess number of amino-acid substitutions minus the excess number of amino-acid polymorphisms found in that gene. The excess for polymorphism and divergence is A − S × (65/224), where A and S are the numbers of amino-acid and synonymous changes (polymorphisms or substitutions, respectively), and 65/224 is the ratio of common amino-acid to common synonymous polymorphisms pooled across all genes. (Ideally, the excess of amino-acid divergence in each gene should be calculated using only polymorphism and divergence in that gene, but there is rarely sufficient polymorphism in a single gene for comparison with divergence.) We also calculated the contribution to the excess separately for three groups of genes sorted by their rate of amino-acid divergence. The two methods produced a similar distribution, so the simpler method using a single group of genes was used.
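The per-gene calculation can be written as a short function. In this sketch the expected amino-acid count is S × 65/224, the pooled A/S ratio of common polymorphism; the gene counts in the usage line are invented for illustration, and the threshold of 10 substitutions for the neutral class is taken from the main text. The function names are hypothetical.

```python
from fractions import Fraction

POOLED_AS = Fraction(65, 224)   # pooled common amino-acid / synonymous polymorphisms


def excess(a: int, s: int) -> Fraction:
    """Amino-acid changes beyond those expected from the pooled A/S ratio."""
    return a - s * POOLED_AS


def gene_contribution(div_a: int, div_s: int, poly_a: int, poly_s: int) -> Fraction:
    """Excess amino-acid divergence minus excess amino-acid polymorphism."""
    return excess(div_a, div_s) - excess(poly_a, poly_s)


def gene_class(contribution: Fraction, threshold: int = 10) -> str:
    """'neutral' if the gene deviates by fewer than `threshold` substitutions."""
    return "neutral" if abs(contribution) < threshold else "non-neutral"


# Hypothetical gene: 30 A and 20 S fixed differences; 2 A and 10 S polymorphisms.
c = gene_contribution(30, 20, 2, 10)
print(float(c), gene_class(c))
```

Summing these contributions over genes and binning them reproduces the kind of distribution plotted in Fig. 1.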

Received 27 June; accepted 4 December 2001.

1. Nei, M. Molecular Evolutionary Genetics (Columbia Univ. Press, New York, 1987).
2. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991).
3. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983).
4. Fay, J. C., Wyckoff, G. J. & Wu, C.-I. Positive and negative selection on the human genome. Genetics 158, 1227–1234 (2001).
5. Kimura, M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267, 275–276 (1977).
6. Yang, Z. & Bielawski, J. P. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503 (2000).
7. Wyckoff, G. J., Wang, W. & Wu, C.-I. Rapid evolution of male reproductive genes in the descent of man. Nature 403, 304–309 (2000).
8. Weinreich, D. M. & Rand, D. M. Contrasting patterns of nonneutral evolution in proteins encoded in nuclear and mitochondrial genomes. Genetics 156, 385–399 (2000).
9. Moriyama, E. N. & Powell, J. R. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13, 261–277 (1996).
10. Eanes, W. F., Kirchner, M. & Yoon, J. Evidence for adaptive evolution of the G6pd gene in the Drosophila melanogaster and Drosophila simulans lineages. Proc. Natl Acad. Sci. USA 90, 7475–7479 (1993).
11. Begun, D. J. & Whitley, P. Adaptive evolution of relish, a Drosophila NF-κB/IκB protein. Genetics 154, 1231–1238 (2000).
12. Tsaur, S. C., Ting, C. T. & Wu, C. I. Positive selection driving the evolution of a gene of male reproduction, Acp26Aa, of Drosophila: II. Divergence versus polymorphism. Mol. Biol. Evol. 15, 1040–1046 (1998).
13. Langley, C. H. & Fitch, W. M. An examination of the constancy of the rate of molecular evolution. J. Mol. Evol. 3, 161–177 (1974).
14. Ohta, T. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40, 56–63 (1995).
15. Lachaise, D. M., Cariou, M.-L., David, J. R., Lemeunier, F. & Tsacas, L. The origin and dispersal of the Drosophila melanogaster subgroup: a speculative paleogeographic essay. Evol. Biol. 22, 159–225 (1988).
16. Andolfatto, P. Contrasting patterns of X-linked and autosomal nucleotide variation in Drosophila melanogaster and Drosophila simulans. Mol. Biol. Evol. 18, 279–290 (2001).
17. Akashi, H. Codon bias evolution in Drosophila: Population genetics of mutation-selection drift. Gene 205, 269–278 (1997).
18. McVean, G. A. & Vieira, J. Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 157, 245–257 (2001).
19. Kliman, R. M. et al. The population genetics of the origin and divergence of the Drosophila simulans complex species. Genetics 156, 1913–1931 (2000).
20. Begun, D. J. The frequency distribution of nucleotide variation in Drosophila simulans. Mol. Biol. Evol. 18, 1343–1352 (2001).
21. Ohta, T. Slightly deleterious mutant substitutions during evolution. Nature 246, 96–98 (1973).
22. Hey, J. & Wakeley, J. A coalescent estimator of the population recombination rate. Genetics 145, 833–846 (1997).
23. Rozas, J. & Rozas, R. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15, 174–175 (1999).

Supplementary Information accompanies the paper on Nature’s website
(http://www.nature.com).

Acknowledgements
This work was supported by grants from the NIH and NSF to C.-I.W. and a Genetics
Training Grant and a Department of Education PhD fellowship to J.C.F.

Competing interests statement