RNA samples, sequencing protocols and study design
In order to evaluate the performance of RNA-seq methods for profiling standard, low and ultra-low quantity samples, we compared the quality metrics, gene detection and differential analysis capacities of three different groups of RNA-seq kits. The different conditions used are detailed in Table 1. The first group is based on the widely used Illumina TruSeq technology which we used for total RNA and mRNA extraction, unstranded and stranded, with input amounts of 1 μg and 100 ng. The second group is based on SMARTer technology which we used for total RNA extraction, performing the ribosomal RNA depletion with two different kits (RiboGone and RiboZero) and using input amounts of 100 ng and 10 ng. The third group is based on a combination of SMARTer Ultra-Low plus Nextera technologies for ultra-low input amounts of 750 pg and 130 pg. For the RNA input material, we used two human RNA sample preparations previously used in reference RNA-seq studies9,11,12,13 (see methods). All samples were sequenced in duplicate giving a total of 80 sequenced libraries. Paired-end reads were then checked for quality, aligned to the human genome, and counts were generated for all human genes. Lastly, for the gene detection and differential analysis studies, mapped reads were sampled at four different levels. The sampling level reflects the sequencing depth at which the libraries will be sequenced. This important parameter can of course influence the global cost of the project, but also the type of downstream analysis that can be done. For instance, analyzing transcript isoforms would require a high sequencing depth, whereas restricting the analysis to the most highly expressed genes could be done safely at a much lower sequencing depth.
Read quality metrics
Figure 1A shows the total number of reads sequenced for each sample. All the 1 μg samples (mRNA TruSeq, stranded mRNA TruSeq and stranded total RNA TruSeq categories) display close values (average values per class of 53, 49 and 55 M reads respectively). The 100 ng samples give average values similar to the 1 μg samples (average values of 50, 52 and 52 M mapped reads for stranded mRNA TruSeq, stranded total RNA TruSeq and RiboGone stranded total RNA SMARTer categories), with the exception of the RiboZero stranded total RNA SMARTer samples, which display a much lower output at an average of 31 M reads. The 10 ng input category (RiboZero stranded total RNA SMARTer) shows the lowest value with an average of 22 M reads. The two ultra-low samples, mRNA SMARTer UL 750 pg and 130 pg yielded a relatively high number of reads with an average of 45 and 42 M reads.
The quality metrics mapping rate, duplication rate and average insert size displayed in Fig. 1 were all calculated for a uniform sampling level of 2 × 10 M. Figure 1B shows the mapping rate for the different categories. Not surprisingly, the three 1 μg categories have very high mapping rates with an average percentage of 95–96% of all reads mapped. For the 100 ng samples, two categories show a high read mapping rate (stranded mRNA and total RNA TruSeq), while the other two categories show lower mapping rates (56% and 69% for the RiboGone and the RiboZero stranded total RNA SMARTer respectively). The 10 ng category shows an average mapping rate of 69%, while the two SMARTer ultra-low samples (750 pg and 130 pg) display much higher mapping rates at around 93%. The kits RiboGone SMARTer for 100 ng input total RNA, RiboZero SMARTer for 100 ng input total RNA and RiboZero SMARTer 10 ng input total RNA clearly show lower mapping rates than for the other conditions. Although the mapping rates for these three conditions are comparable but not exactly the same, the number of PCR cycles are the same for the three kits, suggesting the absence of a correlation. We checked the unmapped reads with fastQC22, but did not find any particular pattern that could easily explain these low mapping rates. This is most likely due to less efficient sample preparation for these kits. Mapping rates on ribosomal RNA sequences are available in Supplementary Table S1. Read quality metrics for transcriptome mappings are available in Supplementary Dataset S1. All the quality metrics values used in Figs 1 and 2 are available in Supplementary Dataset S4.
Duplication rate is shown in Fig. 1C. Unsurprisingly, the rates are very low for all the 1 μg samples (around 8% on average for all three categories). As expected the duplication rates for the 100 ng samples are higher, however large variations in average values can be seen depending on the kit used. The lowest average rate is 10% for the category stranded total RNA TruSeq, followed by stranded mRNA TruSeq and RiboGone stranded total RNA SMARTer at 25% and 23% respectively. The last 100 ng group displays a very high rate at 61% (RiboZero stranded total RNA SMARTer). The 10 ng samples have an extremely high rate of duplication with an average of 94%. The two ultra-low sample groups have relatively low duplication rates with average values of 11% and 16% (750 pg and 130 pg respectively). However, caution should be taken in interpreting these values since the ultra-low kit has two consecutive steps for PCR amplification, and the values observed here represent duplication rates for the second stage only (see methods). In the previous paragraph, we noticed that the mapping rates are very low for the three Ribogone and Ribozero SMARTer kits, but their respective duplication rates are quite different, which does not suggest a relationship between these two metrics.
For the average insert size (Fig. 1D), we observed very little variation between the different categories, with an overall average insert size of around 110 nt.
Globally, the results show that the 1 μg groups display the best results for quality metrics in terms of the high number of reads produced, as well as very high mapping rates and low duplication rates. No real distinction can be made at this level between the stranded versus unstranded or mRNA versus total RNA conditions. This group can be considered as the reference group in terms of performance and acceptable metrics levels. For the 100 ng input groups, the results are more contrasted. The two TruSeq groups (stranded mRNA and stranded total RNA) performed well, with a total number of reads and mapping rate comparable to that of the 1 μg groups. Their duplication rates were higher than for the 1 μg groups, although still in the acceptable ranges. The RiboGone stranded total RNA SMARTer group had a significantly lower mapping rate, while the last group, the RiboZero stranded total RNA SMARTer showed the lowest number of reads, a relatively low mapping rate and a very high duplication rate. Quality metrics for the 10 ng samples showed very poor results in all samples compared to the reference 1 μg group. The number of reads is extremely low, the mapping rate is relatively low and the duplication rate very high. On the other hand, the metrics for the two ultra-low input amounts are reasonably good compared to the reference samples, with a relatively high number of reads and a very high mapping rate and low duplication rate, although this last metric should be interpreted cautiously for this group, for the reasons mentioned above. Read quality metrics analysis showed that rRNA depletion methods (RiboGone and especially RiboZero) have a negative effect on RNA-seq quality. The rRNA depletion method was efficient in removing rRNA and tRNA, but was less efficient than a polyA selection method. Notably, it involves one more purification step than for polyA selection and the method loses more RNA of interest. However, rRNA depletion methods are able to give access to other types of RNA molecules (such as long non-coding RNAs or nascent RNAs) because of the use of random primers for the reverse transcription step, and these methods could also be more efficient than polyA selection in cases of low quality RNA (for example, a RIN score <6).
Figure 2 shows the normalized rates of mapping to intergenic, exonic and intronic regions of the genome. As might be expected, there is a clear distinction between the mRNA and total RNA samples. The mRNA samples all map predominantly to exonic regions, then to intronic and finally to intergenic regions. On the other hand, for the total RNA samples an almost equal proportion of reads map to the exonic and intronic regions, and considerably less reads map to the intergenic regions (although this smaller proportion is still higher than for the mRNA samples). All the mRNA samples have similar proportions, with a slight decrease in the exonic regions for the ultra-low (SMARTer UL) groups. In the total RNA samples, the rates are very similar for the different kits and input amounts.
Gene detection and counting
Figure 3 shows the number of genes detected for each sequencing protocol and sequencing depth (2 × 2, 2 × 5, 2 × 10 and 2 × 25 M reads). We used a low threshold whereby a gene was considered expressed if it had a CPM (counts per million) value greater than or equal to one in two replicates (see methods). Overall, the different groups showed similar values for the number of genes detected. As also shown in Fig. 3, the number of detected genes increases with the sequencing depth for all the categories, and the 2 × 2 M sequencing depth category has the lowest number of detected genes in all the groups. Only the ultra-low input samples show a slight decrease in the number of genes detected. Note that if we take the top 5,000 expressed genes in each condition, the number of genes common to all conditions is equal to 1,854. If we take the standard input amount and the low input amount categories shown in Fig. 4A,B, we can see that the proportion of non-coding RNAs and pseudogenes is higher for the total RNA groups. This is easily explained by the fact that non-coding transcripts are much better covered by total RNA protocols. The proportions of protein-coding, non-coding and pseudogenes are similar for the low and ultra-low input amount samples (10 ng stranded total RNA SMARTer and mRNA SMARTer UL 750 and 130 pg, see Supplementary Fig. S1). Figure 5 shows the normalized percentage of mapped reads from the 5′ end to the 3′ end of the genes. For the mRNA Truseq 1 μg, we clearly see a coverage bias towards the 3′ end of the genes. This is a known fact and is a consequence of the polyA selection for these samples. The profiles are much more balanced for the stranded total RNA TruSeq, a consequence of the random priming process. For the low input quantities (100 ng and 10 ng), we can see that the coverage is much more random due to the dilution and a random priming of the reads along the gene. Gene counts for 0.1, 1 and 10 CPM (counts per million) thresholds are displayed in Supplementary Table S2 for the reference sample unstranded mRNA TruSeq 1 μg. A ROC curve showing the true detection rate as a function of the false detection rate for all input quantities and preparation kits at the 2 × 25 M sampling level is given in Supplementary Fig. S2. The Supplementary Fig. S3 shows scatter plots and correlation values between detected genes for all conditions at the 2 × 25 M sampling level and detected genes for the reference condition unstranded mRNA TruSeq 1 μg at the 2 × 25 M sampling level.
In this section, we evaluate the impact of the different kits and input amounts on a typical differential gene expression analysis. For each condition, we compute the list of differentially expressed genes (DEG) between samples A and B (with two technical replicates for each sample) with EdgeR23, at a given sampling depth (2 × 2 M reads, 2 × 5 M reads, 2 × 10 M reads and 2 × 25 M reads) and for a given corrected p-value threshold. We obtained, as a result, a list of DEG that can be compared between the different conditions. Figure 6 shows the number of differentially expressed genes (DEG) in sample A (mixture of normal human tissues) and B (human brain tissues) replicates. For the standard input unstranded mRNA samples, a minimum of 11,059 DEG can be seen for the 2 × 2 M sampling level and a maximum of 17,855 DEG for the 2 × 15 M sampling level. The number of DEG is similar for the standard stranded mRNA, while a slight decrease can be observed for the 1 μg total RNA samples: this is visible at all sampling levels. For the low input (100 ng) samples, the DEG profiles observed for the stranded mRNA and total RNA TruSeq are very similar to those of their standard counterparts. In contrast, there is a clear decrease in the number of DEG detected for the low input amount (100 ng) RiboGone and RiboZero stranded SMARTer samples, at all sampling levels. Out of all the groups, the low input (10 ng) RiboZero stranded SMARTer samples clearly show the lowest number of DEG for all sampling levels. The number of DEG for the ultra-low samples is clearly higher at all sampling levels compared to the low input (10 ng) RiboZero stranded total RNA SMARTer samples. The number of DEG in the ultra-low samples is comparable to that of the low input (100 ng) RiboZero stranded total RNA SMARTer, with even higher levels observed for the sampling levels 2 × 2 M and 2 × 5 M. Overall, despite their very low input amounts, the ultra-low samples performed reasonably well for the detection of DEG. Note that if we take the standard rate of 5% FDR (false discovery rate) level, we obtain a total of 2,463 DEG common to all the conditions. We calculated Gene Ontology biological processes enrichment for this list of genes using the enrichR tool24. We found a total of 285 biological processes categories enriched with a p-value < 0.05. The complete list of categories with p-values, adjusted p-values, proportions, scores and list of genes associated to each category is available in Supplementary Dataset S3. Many of the top enriched categories are related to the central nervous system, such as “central nervous system development”, “axonogenesis”, “glutamate receptor signaling pathway”, “neuron projection morphogenesis” or “neurotransmitter transport”. This is expected, since the B sample is composed of various brain tissue extracts while the A sample is a mixture of various tissues (including some brain tissues). There are nonetheless other types of biological processes categories enriched, such as “cytoskeleton organization” or “elastic fiber assembly”.
Table 2 shows an overlap in the number of DEG between the unstranded mRNA TruSeq 1 μg samples used as the reference set and a selection of kits/input amounts. The overlap with the reference set is highest for the 1 μg input amounts (91% and 79%), but is still relatively high for the 100 ng categories (89% and 84%), and, as expected, is lower for the ultra-low input amount (54% and 55%) but still above 50%. This is not the case for the 10 ng RiboZero stranded total RNA SMARTer which falls below this level to 26%, thus much lower than the ultra-low samples. The complete list of DEG with ensembl gene IDs, log fold change, log CPM, p-values and FDR for mRNATruseq_1 ug at 2 × 25 M sampling depth is available in Supplementary Dataset S2.
Gene expression accuracy
The samples labelled C and D are a mixture of the Human Universal Reference Total RNA (A) and Human Brain Reference RNA (B), with ratios of 75% A + 25% B and 25% A + 75% B respectively. Since we have the gene expression levels for samples A and B, we can predict the theoretical gene expression values for the mixture samples C and D and compare these to the values obtained by analyzing the samples. Thus we can evaluate the accuracy of the gene expression levels in samples C and D for all conditions tested. The boxplots in Figs 7 and 8 show the gene expression level ratios between the real and predicted values for samples C and D respectively. The red dotted line in the figures indicates the perfect agreement (ratio value of 1) between the real and the predicted values. For the sample C ratios (Fig. 7), in most cases the median value for each group is aligned or very close to the red dotted line, indicating very good agreement between the predicted and real gene expression values. For the sample D ratios, the agreement is globally good, but as can be seen from the figure there are larger deviations from the red dotted line, for instance for the standard input TruSeq stranded total RNA, the low input (100 ng) TruSeq stranded mRNA group, the low input (10 ng) RiboZero stranded total RNA SMARTer group and the ultra-low input (130 pg) mRNA SMARTer UL. We also clustered samples A, B, C and D according to the normalized counts (Supplementary Fig. S4) and in most cases, the samples clustered logically, showing the C samples to be more closely related to the A samples, and the D samples to be more closely related to the B samples. Furthermore, mRNA samples and Total RNA samples also cluster together, as well as A, B, C and D samples. Interestingly, we can see on the figure that several RiboZero stranded total RNA Smarter samples with 10 ng input amount do not cluster well with the other samples and form outliers.