53  Restriction enzyme cut sites


Restriction enzymes cut DNA at specific locations called restriction sites. The sequence at a restriction site is called a recognition sequence. Here are the recognition sequences of some commonly used restriction enzymes.

Restriction enzyme    Recognition sequence
HindIII               AAGCTT
EcoRI                 GAATTC
KpnI                  GGTACC

a) New England Biolabs sells purified DNA of the genome of λ-phage, a bacteriophage that infects E. coli. You can download the FASTA file containing the sequence here. Write a function to parse the FASTA file and return the sequence. Use the function to load in the sequence as a string.
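
One possible approach to part (a) is sketched below. It assumes the FASTA file holds a single record, so every line that does not start with ">" is part of the sequence. The function name `read_fasta` and the file name `lambda.fasta` in the usage comment are placeholders chosen for illustration, not names given in the exercise.

```python
def read_fasta(filename):
    """Read a single-record FASTA file and return the sequence as one string.

    Assumption: the file contains exactly one record. Descriptor lines
    (those starting with ">") and blank lines are skipped.
    """
    seq_lines = []
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and the descriptor line
            if not line or line.startswith(">"):
                continue
            seq_lines.append(line)

    # Joining once at the end avoids repeated string concatenation
    return "".join(seq_lines)


# Hypothetical usage; the file name depends on where you saved the download:
# seq = read_fasta("lambda.fasta")
```

Collecting the lines in a list and joining them once tends to be faster than building the string with `+=` inside the loop, which matters little for a genome of this size but is a good habit for larger files.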

b) Write a function with call signature

restriction_sites(seq, recog_seq)

that takes as arguments a sequence and the recognition sequence of a restriction enzyme and returns the indices of the first base of each restriction site in the sequence. Use this function to find the indices of the restriction sites of λ-DNA for HindIII, EcoRI, and KpnI. Compare your results with those given here, which contain a comprehensive list of locations of restriction sites for a variety of enzymes.
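
A minimal sketch of part (b) follows, using the built-in `str.find()` method to step through the sequence. It assumes that returned indices are 0-based, as is usual in Python, and that overlapping occurrences should all be reported; the exercise does not specify either point. The short test sequence is made up for illustration.

```python
def restriction_sites(seq, recog_seq):
    """Return 0-based indices of the first base of each occurrence
    of recog_seq in seq."""
    # Compare in a single case so lowercase input also works
    seq = seq.upper()
    recog_seq = recog_seq.upper()

    sites = []
    i = seq.find(recog_seq)
    while i >= 0:
        sites.append(i)
        # Resume one base past the last hit so overlapping sites are found
        i = seq.find(recog_seq, i + 1)

    return sites


# Toy sequence (not λ-DNA): HindIII site, EcoRI site, HindIII site
toy_seq = "AAGCTTGAATTCAAGCTT"
print(restriction_sites(toy_seq, "AAGCTT"))  # [0, 12]
print(restriction_sites(toy_seq, "GAATTC"))  # [6]
print(restriction_sites(toy_seq, "GGTACC"))  # []
```

When comparing against a published list, keep in mind that such lists may use 1-based coordinates or report the position of the cut rather than the first base of the recognition sequence, so a constant offset between your indices and theirs does not necessarily indicate a bug.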