This post is by Joe Pickrell, and is part of an experiment where I will be posting summaries and critiques of the main points of papers I review for journals. Apologies in advance for any misunderstandings and errors on my end; please correct these in the comments.
TL;DR: A clever analysis of RNA sequencing data identifies natural genetic variation influencing mitochondrial tRNA processing in humans.
I recently reviewed a manuscript titled “High Resolution Genomic Analysis of Human Mitochondrial RNA Sequence Variation”, which has now been published. Overall I thought the paper was creative and surprising; I’d be interested in hearing other folks’ thoughts.
The initial goal of this study seems to have been to use RNA-seq to quantify variation in mitochondrial RNA and DNA sequences. The authors sequenced cDNA libraries prepared from whole-blood mRNA from ~700 individuals, and focused specifically on sequencing reads that mapped to the mitochondrial genome. Since each individual in principle inherited a single mitochondrial genome from their mother, there should be essentially no sequence-level variation within individuals (modulo sequencing and mapping artifacts, more on this later).
The authors then did a simple analysis: they looked for positions in the mitochondrial transcriptome where they observed more than a single base in an individual. They identified ~600 such sites (some observed in multiple individuals), which they call “heteroplasmies”. Putting aside potential technical explanations for these sites, heteroplasmies could be due to either 1) variation at the DNA level (e.g. mutations that have occurred in mitochondria of the individual’s blood during their lifetime) or 2) variation at the RNA level (post-transcriptional modifications of RNA through mechanisms like RNA editing).
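The basic screen can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input format (per-position base counts from one individual's mitochondrial reads) and the depth and frequency cutoffs (`MIN_DEPTH`, `MIN_MINOR_FRAC`) are hypothetical values chosen for the example, and a real analysis would also need to model sequencing error.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoffs for illustration only; the paper's calling
# criteria may differ.
MIN_DEPTH = 100        # minimum total read depth at a position
MIN_MINOR_FRAC = 0.02  # minimum fraction of reads not carrying the major base


def call_heteroplasmies(
    counts: Dict[int, Dict[str, int]],
    min_depth: int = MIN_DEPTH,
    min_minor_frac: float = MIN_MINOR_FRAC,
) -> List[Tuple[int, float]]:
    """Return (position, minor-allele fraction) for every position in one
    individual where more than one base is seen at a non-trivial frequency.

    `counts` maps a mitochondrial position to a {base: read count} dict.
    """
    calls = []
    for pos, base_counts in sorted(counts.items()):
        depth = sum(base_counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        major = max(base_counts.values())
        minor_frac = (depth - major) / depth  # reads not carrying the major base
        if minor_frac >= min_minor_frac:
            calls.append((pos, minor_frac))
    return calls
```

For example, with counts `{101: {"A": 900, "G": 100}, 202: {"C": 500}, 303: {"T": 50, "C": 10}}`, position 101 would be reported with a minor-allele fraction of 0.1, position 202 is monomorphic, and position 303 is skipped for low depth.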
Main result: A genetic variant in MRPP3 influences processing of mitochondrial tRNAs
At 13 of the heteroplasmic sites, the authors noticed that their data contained more than two alleles (rather than the two you would expect from a new mutation or a simple RNA editing event). They also made an odd observation: 11 of these 13 sites fell at the ninth position of tRNA genes. Drawing on what is known about tRNA biology, they argue that the particular patterns of mismatches they observe at these sites are caused by RNA methylation, which produces the observed mismatches through reverse transcriptase errors.
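The distinction between ordinary two-allele sites and these unusual multi-allelic ones can be made concrete by counting how many bases exceed a frequency cutoff at a site. A single DNA mutation or a simple editing event would each produce two alleles, so more than two is a hint of something else. This is only a sketch: the function names and the `min_frac` cutoff are hypothetical, not taken from the paper.

```python
from typing import Dict


def n_alleles(base_counts: Dict[str, int], min_frac: float = 0.02) -> int:
    """Count bases whose share of reads at a site is at least `min_frac`
    (a hypothetical cutoff meant to ignore isolated sequencing errors)."""
    depth = sum(base_counts.values())
    if depth == 0:
        return 0
    return sum(1 for c in base_counts.values() if c / depth >= min_frac)


def is_multiallelic(base_counts: Dict[str, int], min_frac: float = 0.02) -> bool:
    """True if more than two alleles pass the cutoff, i.e. the pattern is
    hard to explain by a single substitution or a simple editing event."""
    return n_alleles(base_counts, min_frac) > 2
```

A site with counts `{"G": 700, "A": 150, "T": 100, "C": 50}` has four alleles above 2% and would be flagged, while `{"A": 900, "G": 100}` would not.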
Under this model, the proportion of non-reference alleles at a site is a quantitative measure of the fraction of mitochondrial tRNA molecules in an individual that are methylated at the site. The authors reasoned that, because this is a quantitative phenotype, genetic variants influencing methylation levels could be mapped with standard human genetics methods. Shown at the top of the post is a “Manhattan plot” of the authors’ results from a genome-wide association study of (putative) tRNA methylation in the mitochondria. The result is essentially every human geneticist’s dream: there’s a single strong peak centered on a nonsynonymous SNP in a biologically plausible gene (in this case, MRPP3, a gene involved in processing of mitochondrial tRNAs).
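For intuition, the core of a single-SNP test in a GWAS like this can be sketched as an ordinary least-squares regression of the phenotype (here, the non-reference allele fraction at a putatively methylated site) on genotype dosage, under an additive model. This is a generic textbook sketch, not the authors' model: real analyses typically add covariates, convert the statistic to a p-value, and run in dedicated software. The function name `additive_association` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def additive_association(
    dosages: Sequence[int], phenotype: Sequence[float]
) -> Tuple[float, float]:
    """Regress a quantitative phenotype on genotype dosage (0, 1 or 2 copies
    of the alternate allele) by ordinary least squares.

    Returns (slope, t_statistic) for the dosage effect.
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matched inputs with at least 3 individuals")
    mx = sum(dosages) / n
    my = sum(phenotype) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit; t-statistic is undefined")
    return slope, slope / se
```

With made-up data where each extra copy of the alternate allele raises the non-reference fraction by about 0.1 (dosages `[0, 0, 1, 1, 2, 2]`, phenotype `[0.10, 0.12, 0.20, 0.22, 0.30, 0.33]`), the slope is 0.1025 and the t-statistic is large, which is the kind of signal that shows up as a tall peak on a Manhattan plot.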
Putting all of this together, it seems that there is variation in mitochondrial tRNA methylation (or some other modification that could cause similar reverse-transcriptase errors) among individuals in a population, and that this variation is partially due to a trans-acting genetic variant of relatively large effect. I found this quite impressive.
A note of caution regarding estimates of the total number of heteroplasmies
At various points in the paper, the authors include other results that are often interesting but not as important to the main conclusion. One of these that is worth thinking about is the overall number of heteroplasmic sites.
The authors estimate that in their samples, there are around 600 mitochondrial sites that have multiple alleles (note that this is a sum of DNA-level heteroplasmies and RNA-level heteroplasmies). I have a nagging suspicion that this is an overestimate.
The reason for this suspicion is that I’m worried about mapping errors from “nuclear mitochondrial DNA” (AKA Numt) sequences causing false inference of heteroplasmies: reads that originate from a Numt can be mis-mapped to the mitochondrial genome, and any base where the Numt differs from the mtDNA will then look like a second allele. Examination of some of the reported sites suggests that the alternative alleles at these “heteroplasmies” are indeed consistent with mismapping errors from autosomal sequences.
For example, below is a screenshot of the UCSC genome browser surrounding two “heteroplasmic” sites from Supplementary Table 1. I’m showing the sequence of the reference mtDNA (at the top), as well as the sequences of all relevant Numts (using the NumtS Sequence track). As you can see, at the two sites called by the authors, the alternative “allele” at the site matches the sequence of the Numt. My guess is that there is no mitochondrial sequence variation at these two sites, just mis-mapped sequencing reads that originated from the Numts.
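A crude automated version of this browser comparison would ask, for each called site, whether any Numt carries the alternative allele at the corresponding position. The sketch below assumes a hypothetical input: Numt sequences already aligned base-for-base to a window of the mtDNA reference (with `-` for gaps), keyed by made-up names. It only flags candidates; a match is consistent with mismapping but does not prove it.

```python
from typing import Dict, List


def numt_matches(
    alt_base: str,
    offset: int,
    numt_alignments: Dict[str, str],
) -> List[str]:
    """Return the names of Numts whose aligned base at `offset` equals the
    alternative allele called at a putative heteroplasmy.

    `numt_alignments` (hypothetical format) maps a Numt name to its sequence
    aligned to the same mtDNA reference window, with '-' marking gaps.
    """
    alt = alt_base.upper()
    hits = []
    for name, seq in numt_alignments.items():
        # Skip Numts whose alignment does not reach this position.
        if offset < len(seq) and seq[offset].upper() == alt:
            hits.append(name)
    return hits
```

For a reference window `ACGTACGT` with an alternative allele `C` called at offset 3, `numt_matches("C", 3, {"numt_a": "ACGCACGT", "numt_b": "ACGTACGT"})` returns `["numt_a"]`, flagging the site as one where the “allele” could simply be Numt sequence.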
It’s unclear how many of the sites identified by the authors are potentially affected by mapping errors (though note that none of the 13 sites used in the association analysis described above show any sign of such problems to my eye). For anyone interested in quantifying the overall extent of the phenomenon the authors observed, this seems like a potentially important source of error to take into account.