Of Mice, Men, Genes, and More
In an earlier post I wondered how a mere 30,000 genes could program all the complexity of a human. The Washington Post today has a story, How Science is Rewriting the Book on Genes, that provides some interesting clues. It's a little worse than I thought - it seems that a mere 22,000 genes manage to code for at least 100,000 proteins - but science is deciphering how it is done.
The secret lies in the subtlety of the control programs for gene translation and transcription. For one thing, all that stuff once labelled "junk DNA" no longer looks quite so junky.
It was long thought that the active 5 percent of our DNA consisted almost entirely of genes coding the instructions for making proteins. But it turns out that's not true.
It's now clear that more of those evolutionarily preserved stretches of DNA don't code for proteins than those that do. By one estimate, 70 percent of the conserved elements are non-coding.
"The majority of what evolution cared about is stuff we didn't know about a few years ago," says Eric S. Lander, a geneticist and head of the Broad Institute of MIT.
So what are these conserved non-coding elements? They are molecules worthy of the "Star Wars" cantina scene -- insulators, micro-RNAs, exon-splicing enhancers, 3'-untranslated hairpins and other weird characters only now emerging from the shadows.
What they have in common, other than that they are never translated into proteins, is that they regulate the activity of genes that do carry instructions to make proteins. They turn them on and off, tweak them to make one version of a protein rather than another, increase or decrease the efficiency of production, and coordinate the sequential or simultaneous action of genes.
From the standpoint of computer science, this makes tremendous sense. The problem was never that there weren't enough proteins to build cells out of; the problem was that there didn't seem to be enough control program to integrate the whole assembly process. If much of the control program is DNA that is never translated into proteins, and there are more proteins than genes anyway, the information content makes more sense.
The whole thing is still a marvel of subtlety and efficiency - I found the article fascinating.
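To make the counting argument concrete, here is a toy sketch of how a handful of on/off control decisions can let one gene yield several protein variants. It is an illustration of the combinatorics only, not a model of real splicing: the exon names and the `splice_variants` helper are hypothetical, and it assumes every optional segment can be kept or skipped independently, which real cells do not guarantee.

```python
from itertools import combinations

def splice_variants(exons, optional):
    """List every variant obtained by skipping any subset of the optional exons.

    Toy assumption: each optional exon is kept or skipped independently,
    and exon order is always preserved.
    """
    variants = []
    for k in range(len(optional) + 1):
        for skipped in combinations(sorted(optional), k):
            variants.append(tuple(e for e in exons if e not in skipped))
    return variants

# Hypothetical gene with five segments, three of which are optional.
exons = ["E1", "E2", "E3", "E4", "E5"]
optional = {"E2", "E3", "E4"}

variants = splice_variants(exons, optional)
print(len(variants))  # 2**3 = 8 variants from one gene

# The article's figures imply this many proteins per gene on average.
proteins_per_gene = 100_000 / 22_000
print(round(proteins_per_gene, 1))
```

Under these assumptions, three independent switches already give eight variants from one gene, so the 100,000-proteins-from-22,000-genes figure needs only a little over two such choices per gene on average.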