Gene isolation and DNA sequencing

Knowing how many genes determine a phenotype, and where the genes are located, is a first step in understanding the genetic basis of a phenotype.

A second step is to determine the sequence of the gene, or genes, underlying the phenotype and to understand how expression of those genes is regulated at the transcriptional level.

Subsequent steps involve analyzing post-transcriptional events and understanding how the genes fit into metabolic pathways and how these pathways interact with the environment.

In terms of genetic analysis, the challenge is what to do with all the DNA in a higher plant.

Complete genome sequences are coming, but are not yet available for many plants.

The $1,000 genome sequence challenge

Even when complete genome sequence information is available, there will always be reason to study allelic diversity and interactions at specific loci and to compare genome sequences of multiple individuals.

The Thousand Genomes Project

One approach to dealing with the complexity and size of eukaryotic genomes is to use recombinant DNA technology.

The tools of recombinant DNA technology

Obtaining DNA:

 

 

 

Restriction enzymes (see notes on Molecular Markers):  _____________________

 

 

Vectors: _____________________

The role of the vector is to propagate and maintain the DNA fragments generated by restriction. A vector should also make it efficient and simple to insert and retrieve the DNA fragments. The choice of vector will depend on the objective of the experiment. A key feature of a cloning vector is the size of the DNA insert that it can handle efficiently and reliably. An example: the principle of cloning a DNA fragment in a plasmid vector.

Common vectors and approximate insert sizes

Vector                                   Insert size (kb)
Plasmid                                  ~1
Lambda phage                             ~20
Cosmid                                   ~50
P1 Artificial Chromosome (PAC)           ~100
Bacterial Artificial Chromosome (BAC)    ~200
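The link between insert size and vector choice can be sketched in a few lines of Python. The capacities are the approximate values listed above; the function name and the "smallest vector that fits" rule are assumptions made for illustration only, since in practice the choice also depends on the objective of the experiment.

```python
# Approximate insert capacities in kb, copied from the list of common vectors.
VECTOR_CAPACITY_KB = {
    "Plasmid": 1,
    "Lambda phage": 20,
    "Cosmid": 50,
    "PAC": 100,
    "BAC": 200,
}


def smallest_suitable_vector(insert_kb):
    """Return the smallest-capacity vector that can hold insert_kb, or None.

    Hypothetical helper: "smallest that fits" is a simplification,
    not a rule stated in these notes.
    """
    by_capacity = sorted(VECTOR_CAPACITY_KB.items(), key=lambda item: item[1])
    for name, capacity_kb in by_capacity:
        if insert_kb <= capacity_kb:
            return name
    return None  # larger than any vector listed here


print(smallest_suitable_vector(35))   # Cosmid
print(smallest_suitable_vector(150))  # BAC
```

A fragment larger than about 200 kb returns None, which simply reflects the limit of the vectors listed here.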

Libraries:   _____________________

Libraries are repositories of DNA fragments cloned in their vectors. Libraries can be classified by cloning vector (e.g. plasmid or BAC). Alternatively, a library can be described in terms of the source of the cloned DNA fragments.

A. Genomic libraries: If total genomic DNA is digested and the fragments are cloned into an appropriate vector, the result is a genomic library. In principle, this library should consist of samples of all the genomic DNA present in the organism, including both coding and non-coding sequences. For a gene hunt, a large-insert vector would be employed. Libraries that are intended to serve as sources of probes for linkage map construction will consist of smaller inserts. Ideally, every copy of every gene (or a portion of every sequence) should be represented somewhere in the genomic library. There are strategies for enriching genomic libraries for specific types of sequences and for removing specific types of sequences, e.g. highly repetitive sequences.
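How many clones does it take for a given sequence to be "represented somewhere" in a genomic library? A standard estimate, not given in these notes, is the Clarke and Carbon formula N = ln(1 - P) / ln(1 - f), where P is the desired probability of including any given sequence and f is the insert size as a fraction of the genome size. The sketch below assumes random, unbiased cloning; the 5,000 Mb genome and 100 kb insert are hypothetical values chosen for illustration.

```python
import math


def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Estimate library size with the Clarke and Carbon formula.

    Assumes every fragment is cloned with equal probability, which
    real libraries only approximate.
    """
    fraction = insert_bp / genome_bp  # f: share of the genome in one insert
    # log1p(-f) computes ln(1 - f) accurately when f is very small.
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))


# Hypothetical example: 5,000 Mb genome, 100 kb inserts, 99% probability.
n = clones_needed(5_000_000_000, 100_000, 0.99)
print(n)  # roughly 230,000 clones
```

Doubling the insert size roughly halves the number of clones required, which is one reason large-insert vectors are preferred for a gene hunt.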

B. cDNA libraries: A cDNA (complementary DNA) library is generated from mRNA transcripts using the enzyme reverse transcriptase, which creates a DNA complement to an mRNA template. Since the cDNA library is based on mRNA, it will represent only the genes that are expressed in the tissue and/or developmental stage sampled.
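The base-pairing rule behind first-strand cDNA synthesis can be shown with a short Python sketch. The function name and the example transcript are made up for illustration; the sketch models only the sequence relationship, not priming or second-strand synthesis.

```python
# Pairing of each mRNA base with the DNA base placed opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    The two strands are antiparallel, so the mRNA is read in reverse
    while each base is replaced by its DNA complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Hypothetical transcript fragment.
print(first_strand_cdna("AUGGCCUAA"))  # TTAGGCCAT
```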

Hybridization and amplification (see notes on Molecular Markers). 

 

Determining DNA sequence.   Advances in technology have removed the technical obstacles to determining the nucleotide sequence of a gene, chromosome region, or genome.  The starting point for any sequencing project – be it of a single cloned fragment or of an entire genome – is a defined fragment of DNA that is labeled at one end.

A. The principles of sequencing.   

B. Dideoxy sequencing (Cycle sequencing animation)

A dideoxynucleotide lacks a 3' OH group; once incorporated, it terminates strand synthesis because no further nucleotide can be added.
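The logic of reading a sequence from terminated strands can be sketched in Python. This is a simplified model under stated assumptions: termination is treated as occurring once at every position, the primer length is ignored, and the function names and example template are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3_to_5):
    """Map each dideoxy base to the lengths of the fragments it terminates.

    The template is written 3'->5', so the new strand grows 5'->3'
    in the same left-to-right order.
    """
    lengths = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(template_3_to_5.upper(), start=1):
        # The incorporated base is the complement of the template base.
        lengths[COMPLEMENT[base]].append(position)
    return lengths


def read_sequence(lengths):
    """Sort fragments from shortest to longest and read their terminal bases."""
    by_length = sorted((n, base) for base, ns in lengths.items() for n in ns)
    return "".join(base for _, base in by_length)


fragments = terminated_fragments("TACGGA")
print(fragments)                 # {'A': [1], 'C': [4, 5], 'G': [3], 'T': [2, 6]}
print(read_sequence(fragments))  # ATGCCT
```

Sorting by length plays the role of electrophoresis: the shortest fragment identifies the first base of the new strand, the next shortest the second, and so on.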

Graphic of the process

Sequencer output

C. Genome sequencing: _____________________

Genomes come in many sizes: the units are base pairs (bp), kilobases (kb), or megabases (Mb).
C-Value database at Kew Gardens
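A small Python sketch for moving between these units. C-values are commonly reported in picograms of DNA, so the sketch also includes the widely used approximation 1 pg ≈ 978 Mb, which is not stated in these notes. All function names are hypothetical.

```python
BP_PER_UNIT = {"bp": 1, "kb": 1_000, "Mb": 1_000_000}

# Commonly used approximation (an assumption here, not from these notes).
MB_PER_PG = 978


def to_bp(value, unit):
    """Convert a size given in bp, kb, or Mb to base pairs."""
    return value * BP_PER_UNIT[unit]


def format_size(bp):
    """Express a size in the largest unit that gives a value of at least 1."""
    for unit in ("Mb", "kb"):
        if bp >= BP_PER_UNIT[unit]:
            return f"{bp / BP_PER_UNIT[unit]:g} {unit}"
    return f"{bp} bp"


def pg_to_mb(picograms):
    """Convert a C-value in picograms to an approximate size in Mb."""
    return picograms * MB_PER_PG


print(to_bp(200, "kb"))          # 200000
print(format_size(200_000))      # 200 kb
print(format_size(125_000_000))  # 125 Mb
```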


D. Perspectives on automated sequencing.

 

 

Text Readings:

Chapter 13