Gene isolation and DNA sequencing
Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype.
A second step is determining the sequence of the gene or genes responsible for the phenotype and understanding how their expression is regulated at the transcriptional level.
Subsequent steps involve analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways and how these pathways interact with the environment.
In terms of genetic analysis, the challenge is what to do with all the DNA in a higher plant.
Complete genome sequences are becoming available, but they do not yet exist for many plants (see the $1,000 genome sequence challenge).
Even when complete genome sequence information is available, there will always be reason to study allelic diversity and interactions at specific loci and to compare the genome sequences of multiple individuals, as in the Thousand Genomes Project.
One approach to dealing with the complexity and size of eukaryotic genomes is to use recombinant DNA technology.
The tools of recombinant DNA technology
Obtaining DNA:
Restriction enzymes (see notes on Molecular Markers): _____________________
Vectors: _____________________
The role of the vector is to propagate and maintain the DNA fragments generated by restriction digestion. Vectors should also make it efficient and simple to insert and retrieve the DNA fragments. The choice of vector depends on the objective of the experiment. A key feature of a cloning vector is the size of DNA insert it can efficiently and reliably handle. Example: the principle of cloning a DNA fragment in a plasmid vector.
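The cutting and joining steps behind cloning in a plasmid can be sketched on plain strings. This is a minimal Python sketch, assuming EcoRI (recognition site GAATTC, cut between the G and AATTC) as the illustrative enzyme; the helper names `cut_positions`, `digest_linear` and `ligate_into_vector` are hypothetical, only the top strand is modeled, and the plasmid is treated as a linear string with a single site.

```python
# Illustrative enzyme only: EcoRI, recognition site G^AATTC.
ECORI_SITE = "GAATTC"   # recognition sequence (palindromic, so one strand suffices)
ECORI_CUT = 1           # top-strand cut falls after the first base of the site


def cut_positions(seq: str, site: str = ECORI_SITE, offset: int = ECORI_CUT) -> list[int]:
    """Top-strand cut positions for every occurrence of the recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq: str) -> list[str]:
    """Split a linear top-strand sequence at every EcoRI cut position."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def ligate_into_vector(vector: str, insert: str) -> str:
    """Open the vector at its single EcoRI site and splice in an EcoRI fragment.

    Hypothetical simplification: the circular plasmid is written as a linear
    string, and a site spanning the string's start/end is ignored.
    """
    cuts = cut_positions(vector)
    if len(cuts) != 1:
        raise ValueError("this sketch assumes a vector with exactly one EcoRI site")
    c = cuts[0]
    return vector[:c] + insert + vector[c:]


genomic = "TTGAATTCAAACCCGGGGAATTCTT"          # toy "genomic" DNA
fragments = digest_linear(genomic)              # ['TTG', 'AATTCAAACCCGGGG', 'AATTCTT']
recombinant = ligate_into_vector("ATGCGAATTCGGTA", fragments[1])
print(recombinant)                              # ATGCGAATTCAAACCCGGGGAATTCGGTA
```

Because the middle fragment carries the same sticky ends as the opened vector, the EcoRI site is restored at both junctions of the recombinant molecule.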
Common vectors and approximate insert sizes
Vector | Insert size (kb)
Plasmid | ~1
Lambda phage | ~20
Cosmid | ~50
P1 Artificial Chromosome (PAC) | ~100
Bacterial Artificial Chromosome (BAC) | ~200
Libraries: _____________________
Libraries are repositories of DNA fragments cloned in their vectors. Libraries can be classified based on the cloning vector – e.g. plasmid, BAC, etc. Alternatively, the library can be described in terms of the source of the cloned DNA fragments.
A. Genomic libraries: If total genomic DNA is digested and the fragments are cloned into an appropriate vector, the result is a genomic library. In principle, this library should contain samples of all the genomic DNA present in the organism, including both coding and non-coding sequences. For a gene hunt, a large-insert vector would be used. Libraries intended to serve as sources of probes for linkage map construction will consist of smaller inserts. Ideally, every copy of every gene (or a portion of every sequence) should be represented somewhere in the genomic library. There are strategies for enriching genomic libraries for specific types of sequences and for removing others – e.g. highly repetitive sequences.
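How many clones must a genomic library contain before any given sequence is likely to be in it? A common estimate is the Clarke-Carbon formula, N = ln(1 − P) / ln(1 − f), where P is the desired probability and f is the insert size divided by the genome size; it assumes fragments are cloned at random and without bias. The Python sketch below applies it to the approximate insert sizes in the vector table; the 1,000 Mb genome is a hypothetical size chosen only for illustration.

```python
import math

# Approximate insert sizes (kb) taken from the vector table.
INSERT_KB = {"Plasmid": 1, "Lambda phage": 20, "Cosmid": 50, "PAC": 100, "BAC": 200}


def clones_needed(genome_bp: float, insert_bp: float, prob: float = 0.99) -> int:
    """Clarke-Carbon estimate of library size.

    Returns the number of random clones needed so that a given sequence is
    present in the library with probability `prob`.
    """
    f = insert_bp / genome_bp                    # fraction of the genome in one clone
    # log1p keeps precision when f is very small
    return math.ceil(math.log1p(-prob) / math.log1p(-f))


GENOME_BP = 1_000_000_000                        # hypothetical 1,000 Mb plant genome
for vector, kb in INSERT_KB.items():
    print(f"{vector:>12}: {clones_needed(GENOME_BP, kb * 1_000):,} clones")
```

The output makes the practical argument for large-insert vectors in a gene hunt: the required number of clones scales roughly with genome size divided by insert size.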
B. cDNA libraries: A cDNA (complementary DNA) library is generated from mRNA transcripts using the enzyme reverse transcriptase, which synthesizes a DNA strand complementary to an mRNA template. Because the cDNA library is based on mRNA, it represents only the genes that are expressed in the tissue and/or developmental stage sampled.
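The pairing rules behind cDNA synthesis can be shown with a short Python sketch: reverse transcriptase builds a first strand that is complementary and antiparallel to the mRNA, and a second strand then completes the double-stranded cDNA. The toy transcripts and gene names are hypothetical, and priming, removal of the RNA template, and cloning are not modeled.

```python
# Reverse transcriptase pairing: RNA template base -> DNA base added
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Ordinary DNA pairing, used when synthesizing the second cDNA strand
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA written 5'->3': complementary and antiparallel to the mRNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Second strand written 5'->3': the DNA version of the mRNA (U replaced by T)."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


# Hypothetical transcripts present in one sampled tissue. A gene that is not
# expressed there has no mRNA, so it cannot appear in this library.
leaf_mrna = {"geneA": "AUGGCUUAA", "geneB": "AUGCCCUGA"}
leaf_library = {gene: second_strand_cdna(first_strand_cdna(m)) for gene, m in leaf_mrna.items()}
print(first_strand_cdna("AUGGCUUAA"))   # TTAAGCCAT
print(leaf_library)
```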
Hybridization and amplification (see notes on Molecular Markers).
Determining DNA sequence. Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, chromosome region, or genome. The starting point for any sequencing project – be it of a single cloned fragment or of an entire genome – is a defined fragment of DNA that is labeled at one end.
A. The principles of sequencing.
- Start with a defined fragment of DNA
- Based on this template, generate a population of molecules that differ in length by one base, each ending in a base of known identity.
- Use electrophoresis to fractionate the population of molecules by size.
- Identify the base at the truncated end of each fractionated molecule; read in order of size, these bases give the nucleotide sequence.
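The four steps of the sequencing principle can be mimicked in a few lines of Python. This sketch makes simplifying assumptions: every possible product length is present exactly once, the primer is ignored, and sorting by length stands in for electrophoresis. Note that the read is the newly synthesized strand, which is the reverse complement of the template.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def nested_population(template: str, rng: random.Random) -> list[str]:
    """All copies of the new strand, each one base longer than the previous, mixed."""
    new_strand = reverse_complement(template)       # synthesized 5'->3'
    fragments = [new_strand[:n] for n in range(1, len(new_strand) + 1)]
    rng.shuffle(fragments)                          # molecules are unordered in the tube
    return fragments


def read_by_size(fragments: list[str]) -> str:
    """Stand-in for electrophoresis: order by length, then read each terminal base."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


template = "GGATCCTTAGC"                            # toy template, written 5'->3'
read = read_by_size(nested_population(template, random.Random(0)))
print(read, reverse_complement(read) == template)   # GCTAAGGATCC True
```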
B. Dideoxy (Sanger) sequencing (cycle sequencing animation):
A dideoxynucleotide (ddNTP) lacks the 3'-OH group needed to form a phosphodiester bond with the next nucleotide, so once incorporated it terminates strand synthesis.
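Chain termination can be simulated by letting each growing molecule incorporate a ddNTP at random whenever the matching base is next. The Python sketch below runs four hypothetical reactions (one ddNTP each), records product lengths as bands, and reads the "gel" from shortest to longest product; the 10% incorporation chance and 2,000 molecules per lane are illustrative values, not a lab protocol, and the names `terminated_length` and `sanger_read` are hypothetical.

```python
import random
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_length(new_strand: str, dd_base: str, dd_fraction: float,
                      rng: random.Random) -> Optional[int]:
    """Extend one molecule base by base. Whenever the next base is `dd_base`,
    a ddNTP (no 3'-OH) is incorporated with probability `dd_fraction` and
    synthesis stops. Returns the product length, or None if never terminated."""
    for i, base in enumerate(new_strand):
        if base == dd_base and rng.random() < dd_fraction:
            return i + 1
    return None


def sanger_read(template: str, molecules_per_lane: int = 2000,
                dd_fraction: float = 0.1, seed: int = 1) -> str:
    """Run four ddNTP reactions on `template` (5'->3') and read the bands."""
    rng = random.Random(seed)
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    bands = {}                                   # product length -> ddNTP lane
    for dd_base in "ACGT":                       # four reactions, one ddNTP each
        for _ in range(molecules_per_lane):
            length = terminated_length(new_strand, dd_base, dd_fraction, rng)
            if length is not None:
                bands[length] = dd_base
    # Shortest products migrate farthest; read bands from shortest to longest
    return "".join(bands[n] for n in sorted(bands))


print(sanger_read("GGATCCTTAGC"))                # read of the complementary strand
```

Lowering `dd_fraction` produces longer products on average, which mirrors how the ddNTP:dNTP ratio controls where chains terminate in a real reaction.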
C. Genome sequencing: _____________________
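Genomes come in many sizes, measured in base pairs (bp), kilobases (kb), megabases (Mb), and, for many plants, gigabases (Gb).
Genome sizes (C-values) for many plant species are compiled in the C-Value database at Kew Gardens.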
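C-values are reported as a mass of DNA (picograms) rather than in base pairs. A minimal Python sketch of the conversion, assuming the widely used factor 1 pg ≈ 978 Mb; the example C-values are hypothetical, and the helper names are illustrative.

```python
BP_PER_PG = 978e6  # commonly used conversion: 1 pg of DNA ~ 0.978 x 10^9 bp


def pg_to_bp(c_value_pg: float) -> float:
    """Convert a C-value (picograms of DNA) to base pairs."""
    return c_value_pg * BP_PER_PG


def readable(bp: float) -> str:
    """Express a length in the largest convenient unit: Gb, Mb, kb or bp."""
    for unit, size in (("Gb", 1e9), ("Mb", 1e6), ("kb", 1e3)):
        if bp >= size:
            return f"{bp / size:.1f} {unit}"
    return f"{bp:.0f} bp"


for c_value in (0.5, 5.0, 17.0):                 # hypothetical 1C values in pg
    print(f"{c_value} pg -> {readable(pg_to_bp(c_value))}")
```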
D. Perspectives on automated sequencing.
- Local costs
- 454 (pyrosequencing) technology
- What would you do if it became possible to sequence the equivalent of a full human genome for only $1,000?
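Whatever the technology, the same arithmetic decides how much sequencing a genome needs: mean coverage C = N × L / G for N reads of length L on a genome of size G, and, under the Lander-Waterman model of randomly placed reads, the expected fraction of bases covered at least once is 1 − e^(−C). The Python sketch below uses an approximate 3,000 Mb human genome and a hypothetical 400 bp read length; it says nothing about cost, which depends on the platform.

```python
import math


def reads_needed(genome_bp: int, read_len: int, coverage: float) -> int:
    """Reads required for a mean depth of `coverage`, from C = N * L / G."""
    return math.ceil(coverage * genome_bp / read_len)


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman (Poisson) expectation: fraction of bases hit by >= 1 read."""
    return 1 - math.exp(-coverage)


GENOME_BP = 3_000_000_000   # roughly the size of a haploid human genome
READ_LEN = 400              # hypothetical read length in bp
for depth in (1, 5, 30):
    n = reads_needed(GENOME_BP, READ_LEN, depth)
    print(f"{depth:>2}x: {n:,} reads, ~{fraction_covered(depth):.4%} of bases covered")
```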
Text Readings:
Chapter 13