Textbooks covering this topic

Brown (2001). Chapters 6 and 8.

Primrose et al. (2001). Chapters 6 and 7.

Gene library

A random collection of DNA fragments, typically representing the entire genome of an organism, that have been inserted into a cloning vector.

Gene libraries are sometimes called genomic libraries or gene banks.

The use of random fragments is sometimes referred to as 'shotgun cloning'.

cDNA library

A random collection of cDNA (complementary DNA) fragments, typically representing the entire mRNA of a tissue, that have been inserted into a cloning vector. 

The mRNA from the tissue is converted into cDNA by reverse transcription, and it is this cDNA that forms the library.

If a particular gene sequence is required, it may be easier to find it in a cDNA library than in a gene library, which is larger. However, the cDNA library screened would have to come from a tissue where the gene in question was being expressed, i.e. where its mRNA was being transcribed. For example, if one wanted the sequence of the insulin gene, one would expect to find it in a cDNA library derived from the mRNA of β cells in the Islets of Langerhans in the pancreas.

Stages in production of a library:

  1. Production of random fragments of genomic DNA or cDNA (depending on type of library).
  2. Insertion of fragments into suitable vector.
  3. Replication of fragments in host-vector system.
  4. Screening and selection of clones for desired recombinants.

Random fragments can be produced in several ways:

[Diagrams 1 and 2: stages in production of a gene library.]

It is best if the fragments are as large as possible, because fewer clones then need to be screened. Unfortunately, insert size is limited by the vector, so vectors that take large inserts are preferred. The most popular prokaryotic vector is lambda phage (λ), which can take inserts up to about 20 kb. Lambda vectors can take larger inserts than plasmid vectors and are more stable than cosmid vectors. Increasingly, eukaryotic vectors such as YACs are used; these can take enormous inserts, which reduces the number of clones that require screening. This is particularly important with large genomes, e.g. human (size = 2.8 × 10^6 kilobases = 2.8 × 10^9 bases).

How many clones are required for a library?

In theory:

Minimum number of clones required (n) = size of genome (b) / mean size of fragments (a):

$$n = \frac{b}{a}$$

For example, if using human genome and lambda phage vector:

$$n = \frac{2.8 \times 10^{6}\ \text{kb}}{20\ \text{kb}} = 1.4 \times 10^{5}\ \text{clones}$$
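As a quick check of this arithmetic, here is a minimal Python sketch of the theoretical minimum. The function name `minimum_clones` and the constants are illustrative, not part of the original notes. The maximum insert sizes quoted in these notes (20 kb for lambda, 40 kb for cosmids) are assumed to stand in for the mean fragment size.

```python
def minimum_clones(genome_size_kb: float, mean_insert_kb: float) -> float:
    """Theoretical minimum number of clones, n = b / a."""
    if genome_size_kb <= 0 or mean_insert_kb <= 0:
        raise ValueError("genome and insert sizes must be positive")
    return genome_size_kb / mean_insert_kb


HUMAN_GENOME_KB = 2.8e6  # genome size quoted in these notes

# Maximum insert sizes quoted in these notes, assumed here to be the mean insert size.
VECTOR_INSERT_KB = {"lambda phage": 20, "cosmid": 40}

for vector, insert_kb in VECTOR_INSERT_KB.items():
    n = minimum_clones(HUMAN_GENOME_KB, insert_kb)
    print(f"{vector}: {n:.1e} clones")
```

Doubling the insert size halves the theoretical minimum, which is the reason for preferring vectors that accept large inserts.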

However, in practice the number of clones must be greater than this because the process is random. If only the minimum number of clones is used, some sequences will be included in the library more than once due to overlaps and others will be missed. Clarke and Carbon (1976) devised a formula relating the number of clones in a library of random fragments to the probability (P) that any given DNA sequence is included in it.

Actual number of clones required (N):

$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{1}{n}\right)}$$

Where,    P = probability
                n = minimum number of clones required (b/a)
                ln = natural logarithm
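One way to see where this formula comes from is sketched below. It assumes each clone is an independent random draw and that any given sequence has a 1/n chance of being in a particular clone. These assumptions are a simplification made for illustration and are not stated in the notes.

```latex
\begin{aligned}
\Pr(\text{sequence absent from one clone}) &= 1 - \frac{1}{n} \\
\Pr(\text{sequence absent from all } N \text{ clones}) &= \left(1 - \frac{1}{n}\right)^{N} = 1 - P \\
N \,\ln\left(1 - \frac{1}{n}\right) &= \ln(1 - P) \\
N &= \frac{\ln(1 - P)}{\ln\left(1 - \frac{1}{n}\right)}
\end{aligned}
```

Both logarithms are negative for 0 < P < 1 and n > 1, so N is positive.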

For example, to achieve 95% (0.95) probability of including a particular sequence in a random human gene library:

$$N = \frac{\ln(1 - 0.95)}{\ln\left(1 - \frac{1}{1.4 \times 10^{5}}\right)} = 4.2 \times 10^{5}\ \text{clones}$$

In other words, 3 times (4.2 × 10^5 / 1.4 × 10^5) the theoretical minimum number of clones have to be used to achieve 95% probability of including a given sequence in the library. It should also be noted that to achieve 100% (1.00) probability an infinite number of clones would have to be used!
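The worked example can be reproduced with a short Python sketch. The function name `clones_required_from_n` is hypothetical. Returning infinity for P = 1 is a choice made here to mirror the remark about 100% probability.

```python
import math


def clones_required_from_n(p: float, n_min: float) -> float:
    """Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - 1/n)."""
    if n_min <= 1:
        raise ValueError("n_min must be greater than 1")
    if p == 1:
        return math.inf  # certainty would need an infinite number of clones
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    # log1p(-x) computes ln(1 - x) accurately when x is very small
    return math.log(1 - p) / math.log1p(-1 / n_min)


n_min = 1.4e5  # theoretical minimum for the human genome in lambda (from the notes)
N = clones_required_from_n(0.95, n_min)
print(f"N = {N:.2e} clones, about {N / n_min:.1f} times the minimum")
```

Because 1/n is tiny for a large genome, `math.log1p` is used rather than `math.log(1 - 1/n)` to avoid loss of precision.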

Another version of the formula:

$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{a}{b}\right)}$$

Where,    P = probability
                a = mean size of fragments
                b = size of genome
                ln = natural logarithm

The number of clones required also indicates the number of clones that have to be screened to find a particular sequence (maybe a gene) in a gene library.
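The same relationship can be turned around to ask what probability a library of a given size achieves, using P = 1 - (1 - a/b)^N. The sketch below is illustrative only. The function name `inclusion_probability` is an assumption, and the genome and insert sizes are the ones quoted in these notes.

```python
import math


def inclusion_probability(num_clones: float, mean_insert_kb: float,
                          genome_size_kb: float) -> float:
    """P = 1 - (1 - a/b)^N, computed with log1p/expm1 for numerical accuracy."""
    if not 0 < mean_insert_kb < genome_size_kb:
        raise ValueError("need 0 < mean insert size < genome size")
    if num_clones < 0:
        raise ValueError("number of clones cannot be negative")
    return -math.expm1(num_clones * math.log1p(-mean_insert_kb / genome_size_kb))


HUMAN_GENOME_KB = 2.8e6  # genome size quoted in these notes

# Library made with only the theoretical minimum number of lambda clones
p_minimum = inclusion_probability(1.4e5, 20, HUMAN_GENOME_KB)
# Library made with three times the minimum
p_three_fold = inclusion_probability(4.2e5, 20, HUMAN_GENOME_KB)

print(f"minimum library: P = {p_minimum:.2f}")
print(f"three-fold library: P = {p_three_fold:.2f}")
```

With only the theoretical minimum number of clones, P comes out at roughly 0.63, which illustrates why the minimum is not sufficient in practice.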


How many clones would be required to achieve 99% probability of including a sequence in a human gene library using a cosmid vector? Try this yourself first and don't be tempted to simply click on the answer!


Screening of libraries

With large genomes, such as human, there will be a great number of clones to screen for particular sequences or genes of interest.

Two alternative approaches

  1. Screen for the DNA sequence in the clones.
  2. Screen for the products due to expression of the sequence.

Grunstein & Hogness (1975) devised a colony hybridization method to detect DNA sequences in transformed bacterial colonies using in situ hybridization with a radiolabelled probe. This method is suitable for screening libraries produced using plasmid or cosmid vectors.

For lambda gene libraries there are modified plaque hybridization methods which can be used to screen recombinant plaques.

However, in both cases appropriate probes are required. These can be synthesized assuming one has some knowledge of the gene sequences.

The second approach is immunological and uses techniques like ELISA and specific antibodies to detect antigenic expression products produced by clones, and to identify and select the ones of interest. This may be necessary if one only has knowledge of the product but not the sequence. However, it relies on the DNA sequence being expressed which cannot always be assumed, particularly in the case of eukaryotic genes inserted into prokaryotic host cells.

Uses of gene libraries

To obtain the sequences of genes for analysis, amplification, cloning, and expression. 

Once the sequence is known, probes, primers, etc. can be synthesized for further diagnostic work using, for example, hybridization reactions, blots and PCR. Knowledge of a gene sequence also offers the possibility of gene therapy.

Cloned genes can also be expressed to synthesize a product in particular host cells, e.g. synthesis of human gene products in prokaryotic cells.

Hint

The question specifies a cosmid vector rather than the lambda phage vector used in the example. Cosmid vectors take inserts up to 40 kb in size.

Answer

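N = 3.2 × 10^5 clones

$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{a}{b}\right)}$$

P = 99% (0.99)
a = 40 kb
b = 2.8 × 10^6 kb
a/b = 0.0000143

$$N = \frac{\ln(1 - 0.99)}{\ln(1 - 0.0000143)} = 3.2 \times 10^{5}\ \text{clones}$$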

Doubling the fragment size (in this case from 20 to 40kb) will halve the number of clones required but, in this question, the probability was increased (from 95 to 99%) which will increase the number of clones required. The two factors together cancel each other out to some extent.
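The two effects can be separated numerically. The following Python sketch is a check under the figures used in these notes; the function name `clones_required` and the variable names are illustrative.

```python
import math


def clones_required(p: float, mean_insert_kb: float, genome_size_kb: float) -> float:
    """N = ln(1 - P) / ln(1 - a/b), for 0 < P < 1."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    if not 0 < mean_insert_kb < genome_size_kb:
        raise ValueError("need 0 < mean insert size < genome size")
    return math.log(1 - p) / math.log1p(-mean_insert_kb / genome_size_kb)


GENOME_KB = 2.8e6  # human genome size quoted in these notes

lambda_95 = clones_required(0.95, 20, GENOME_KB)  # worked example in the notes
cosmid_95 = clones_required(0.95, 40, GENOME_KB)  # change the insert size only
cosmid_99 = clones_required(0.99, 40, GENOME_KB)  # the question asked

size_factor = cosmid_95 / lambda_95  # effect of doubling the insert size
prob_factor = cosmid_99 / cosmid_95  # effect of raising P from 95% to 99%

print(f"lambda, 95%: {lambda_95:.2e}")
print(f"cosmid, 99%: {cosmid_99:.2e}")
print(f"size factor: {size_factor:.2f}, probability factor: {prob_factor:.2f}")
```

The insert-size factor is close to 0.5 and the probability factor is close to 1.5, so the net change is a reduction to about three quarters of the lambda figure.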