A Scheme implementation of the DNA to Protein conversion program given in Python 3 at: https://www.geeksforgeeks.org/dna-protein-python-3/
The program defines two functions: one reads a sequence of text from a file and returns the contents as a single string without line breaks; the other converts each DNA triple (codon) in the string into its single-letter amino acid code.
For the first function, we use read-line, which reads all the text on a line and returns it without the line break, so our read-sequence-file function can append all the lines together to make a complete string.
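The line-joining behaviour can be tried without a file by reading from a string port. This is a minimal sketch: join-lines is a hypothetical helper that is not part of the original program, and it takes a port argument only so that the example is self-contained.

```scheme
(import (scheme base) (scheme write))

;; join-lines is a hypothetical helper, not part of the original program.
;; It reads every line from PORT and concatenates them, using the same
;; loop shape as read-sequence-file.
(define (join-lines port)
  (do ((line (read-line port) (read-line port))
       (sequence "" (string-append sequence line)))
      ((eof-object? line) sequence)))

;; open-input-string gives an input port over an in-memory string.
(define example (join-lines (open-input-string "ATGGCC\nTAA\n")))

(write example) ; prints "ATGGCCTAA"
(newline)
```

Because read-line drops the line break, the two lines are joined with nothing between them.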
For the second function, we use an association list for the codon table. We could use a hash table or map instead, but association lists are built into R7RS-small and the example is small enough that efficiency does not matter.
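As a small illustration of the lookup, the sketch below uses a three-entry excerpt of the table. The names small-table, hit and miss are made up for this example. It relies on assoc comparing keys with equal? by default, which is what lets string keys match by content.

```scheme
(import (scheme base) (scheme write))

;; A three-entry excerpt of the codon table, for illustration only.
(define small-table
  '(("ATG" . #\M) ("GCC" . #\A) ("TAA" . #\_)))

;; assoc compares keys with equal?, so string keys match by content.
(define hit (assoc "GCC" small-table))   ; the pair ("GCC" . #\A)
(define miss (assoc "NNN" small-table))  ; #f, as no entry has this key

(write (cdr hit)) ; prints #\A
(newline)
(write miss)      ; prints #f
(newline)
```

assq and assv would not be suitable here, since eq? and eqv? are not guaranteed to treat two separate strings with the same characters as equal.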
(import (scheme base)
        (scheme file)
        (scheme read)
        (scheme write))

(define (read-sequence-file filename)
  (with-input-from-file ; <1>
    filename
    (lambda ()
      (do ((line (read-line) (read-line)) ; <2>
           (sequence "" (string-append sequence line))) ; <3>
          ((eof-object? line) sequence))))) ; <4>

(define (translate sequence)
  (let ((table ; <5>
          '(("ATA" . #\I) ("ATC" . #\I) ("ATT" . #\I) ("ATG" . #\M)
            ("ACA" . #\T) ("ACC" . #\T) ("ACG" . #\T) ("ACT" . #\T)
            ("AAC" . #\N) ("AAT" . #\N) ("AAA" . #\K) ("AAG" . #\K)
            ("AGC" . #\S) ("AGT" . #\S) ("AGA" . #\R) ("AGG" . #\R)
            ("CTA" . #\L) ("CTC" . #\L) ("CTG" . #\L) ("CTT" . #\L)
            ("CCA" . #\P) ("CCC" . #\P) ("CCG" . #\P) ("CCT" . #\P)
            ("CAC" . #\H) ("CAT" . #\H) ("CAA" . #\Q) ("CAG" . #\Q)
            ("CGA" . #\R) ("CGC" . #\R) ("CGG" . #\R) ("CGT" . #\R)
            ("GTA" . #\V) ("GTC" . #\V) ("GTG" . #\V) ("GTT" . #\V)
            ("GCA" . #\A) ("GCC" . #\A) ("GCG" . #\A) ("GCT" . #\A)
            ("GAC" . #\D) ("GAT" . #\D) ("GAA" . #\E) ("GAG" . #\E)
            ("GGA" . #\G) ("GGC" . #\G) ("GGG" . #\G) ("GGT" . #\G)
            ("TCA" . #\S) ("TCC" . #\S) ("TCG" . #\S) ("TCT" . #\S)
            ("TTC" . #\F) ("TTT" . #\F) ("TTA" . #\L) ("TTG" . #\L)
            ("TAC" . #\Y) ("TAT" . #\Y) ("TAA" . #\_) ("TAG" . #\_)
            ("TGC" . #\C) ("TGT" . #\C) ("TGA" . #\_) ("TGG" . #\W))))
    (do ((i 0 (+ i 3)) ; <6>
         (result '()
                 (cons (cdr (assoc (substring sequence i (+ i 3)) table)) ; <7>
                       result)))
        ((>= i (string-length sequence)) ; <8>
         (list->string (reverse result))))))

(let* ((dna-sequence (read-sequence-file "dna_sequence.txt"))
       (protein-sequence (translate dna-sequence))
       (target-sequence (read-sequence-file "amino_acid_sequence.txt")))
  (display "Comparing translated with target: ")
  (display (equal? protein-sequence target-sequence))
  (newline))
- <1> Opens the given file and makes it the current input port.
- <2> Reads each line from the current input port as a string
- <3> ... joining the strings together in turn
- <4> ... until the end of file is reached, when the accumulated sequence is returned.
- <5> The conversion table is stored in an association list.
- <6> An index takes us through the string, one triple (codon) at a time
- <7> ... looking up each codon in turn, and recording the protein letter.
- <8> When the string is fully processed, the list of letters is reversed and turned into a string to return.
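The translate function above assumes that the sequence length is a multiple of three and that every codon appears in the table. The following is a sketch of a more defensive variant, not the original author's code. The name translate-with, the table parameter, and the use of #\? to mark unknown codons are all assumptions made for illustration.

```scheme
(import (scheme base) (scheme write))

;; translate-with is a hypothetical variant of translate that takes the
;; codon table as an argument and is defensive about its input.
(define (translate-with table sequence)
  (let ((len (string-length sequence)))
    (do ((i 0 (+ i 3))
         (result '()
                 (let ((entry (assoc (substring sequence i (+ i 3)) table)))
                   ;; Assumption: unknown codons are marked with #\?.
                   (cons (if entry (cdr entry) #\?) result))))
        ;; Stop when fewer than three characters remain, so a trailing
        ;; incomplete codon is ignored rather than read past the end.
        ((> (+ i 3) len)
         (list->string (reverse result))))))

;; A three-entry excerpt of the codon table, for illustration only.
(define small-table
  '(("ATG" . #\M) ("GCC" . #\A) ("TAA" . #\_)))

(define example (translate-with small-table "ATGGCCNNNTAAG"))

(write example) ; prints "MA?_"
(newline)
```

Whether to skip, mark, or reject bad input is a design choice. Marking keeps the output aligned with the input codons, which may make a mismatch easier to locate.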