Gaurav’s analysis of the occurrence of Shine-Dalgarno-like motifs in prokaryotic coding regions has just been published in GBE. This is the first bioinformatics analysis we’ve done, and we are pretty pleased with the paper!
Bioinformatics analyses of bacterial genomes have uncovered interesting patterns of specific sequence features. Although it is tempting to invoke selection, it is important to distinguish the effects of selection from those of genetic drift, biophysical constraints, or indirect selection. For instance, bacterial translation initiation usually requires ribosomal binding to the Shine-Dalgarno (SD) sequence in a gene’s 5′ untranslated region. Previous analyses showed that SD-like motifs are rare within protein-coding genes of Escherichia coli and Bacillus subtilis (Li et al. 2012, Nature). These authors suggested that because ribosomes pause at internal SD-like motifs, selection against such motifs also explains codon bias across bacteria. However, it is important to consider alternative hypotheses. The SD sequence is GC-rich and well conserved across bacteria; hence the frequency of SD-like motifs will vary simply as a function of genomic GC content (which ranges from 13% to 75% across bacteria). Experimental evidence also suggests positive selection on SD-like motifs: “programmed” internal ribosomal pauses are critical for proper folding and targeting of some proteins (e.g. Fluman et al. 2014, eLife). We found that after accounting for genomic GC content, ~50 out of 284 prokaryotic genomes showed no evidence of selection against internal SD-like motifs. Furthermore, selection on these motifs seems to vary according to their location within a gene. For instance, the C-terminal ends of genes are relatively enriched in SD-like motifs, potentially to initiate translation of the downstream gene. In contrast, the N-terminal ends of genes are depleted in SD-like motifs, perhaps due to their deleterious effects on local mRNA structure (known to affect gene expression). Our work thus highlights the complicated nature of selection acting on sequence elements and motifs, and the importance of accounting for genome-wide features such as GC content.
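
To illustrate why GC content matters, here is a minimal Python sketch of a simple null expectation. It is not the method used in the paper. It assumes that bases occur independently, that G and C are equally frequent (as are A and T), and that the canonical motif AGGAGG stands in for "SD-like" motifs; the function names are hypothetical. Under these assumptions, the expected number of motif occurrences in a sequence changes with GC content alone, so a raw count that looks like "depletion" in one genome may simply reflect its base composition.

```python
def base_probs(gc):
    """Per-base probabilities for a given GC fraction (assumes G=C and A=T)."""
    return {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}


def expected_motif_count(motif, seq_len, gc):
    """Expected occurrences of `motif` in a random sequence of length `seq_len`,
    assuming independent bases drawn with the given GC fraction."""
    probs = base_probs(gc)
    p_motif = 1.0
    for base in motif:
        p_motif *= probs[base]
    n_windows = max(seq_len - len(motif) + 1, 0)
    return p_motif * n_windows


def observed_motif_count(seq, motif):
    """Count occurrences of `motif` in `seq`, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def gc_fraction(seq):
    """Fraction of G and C bases in `seq`."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    motif = "AGGAGG"  # canonical SD core, used here only as an illustration
    for gc in (0.2, 0.5, 0.7):
        exp = expected_motif_count(motif, seq_len=1_000_000, gc=gc)
        print(f"GC={gc:.0%}: expected {motif} count per Mb = {exp:.1f}")
```

Comparing `observed_motif_count` for a coding sequence against `expected_motif_count` at that genome's own `gc_fraction` gives a rough observed/expected ratio. A real analysis would need a more careful null model, for example one that preserves the encoded amino acid sequence and codon usage.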