Single cell sequencing

Single cell sequencing examines the sequence information of individual cells using optimized next-generation sequencing (NGS) technologies, providing higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment.[1]

Background

A typical human cell contains about 6 billion base pairs of DNA and 600 million bases of mRNA. With such a large amount of sequence, traditional Sanger sequencing is expensive and time-consuming. Deep sequencing of DNA and RNA from a single cell allows cellular functions to be investigated extensively.[1] Like typical NGS experiments, single cell sequencing protocols generally comprise the following steps: isolation of a single cell, nucleic acid extraction and amplification, sequencing library preparation, sequencing, and bioinformatic data analysis. Single cell sequencing is more challenging than sequencing cells in bulk. Because the amount of starting material from a single cell is minimal, degradation, sample loss and contamination have pronounced effects on the quality of sequencing data. In addition, because only picograms of nucleic acids are available,[2] heavy amplification is often needed during sample preparation, resulting in uneven coverage, noise and inaccurate quantification in the sequencing data.
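The picogram scale of the starting material can be checked with a back-of-the-envelope calculation. The sketch below assumes an average molar mass of roughly 650 g/mol per base pair, a commonly used approximation that does not come from the text above; the function name is illustrative.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per base pair (assumed approximate average)


def dna_mass_picograms(base_pairs: float) -> float:
    """Approximate mass of double-stranded DNA, in picograms."""
    grams = base_pairs * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e12  # 1 g = 1e12 pg


genome_bp = 6e9  # base pairs of DNA in a typical human cell, as stated above
print(f"{dna_mass_picograms(genome_bp):.1f} pg")
```

Under this assumption the DNA of one human cell weighs only a few picograms, which is why heavy amplification is needed before library preparation.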

Recent technical improvements make single cell sequencing a promising tool for approaching a set of seemingly inaccessible problems. For example, heterogeneous samples, rare cell types, cell lineage relationships, mosaicism of somatic tissues, analyses of microbes that cannot be cultured, and disease evolution can all be elucidated through single cell sequencing.[3] Single cell sequencing was selected as the method of the year 2013 by Nature Publishing Group.[4]

Single cell genome (DNA) sequencing

Single cell DNA genome sequencing involves isolating a single cell, performing whole-genome amplification (WGA), constructing sequencing libraries and then sequencing the DNA using a next-generation sequencer (e.g. Illumina). It can be used in metagenomics studies and for sequencing novel species for the first time. In addition, it can be combined with high-throughput cell sorting of microorganisms and cancer cells. One popular method for single cell genome sequencing is multiple displacement amplification, which enables research in areas such as microbial genetics, ecology and infectious diseases. Furthermore, data obtained from microorganisms might help establish culturing procedures in the future.[5] Tools that can be used for single cell genome sequencing include SPAdes, IDBA-UD, Cortex and HyDA.[6]

Method

Figure: Workflow of single cell genome sequencing. MDA stands for multiple displacement amplification.

Multiple displacement amplification (MDA) is a widely used technique that can amplify femtograms of DNA from a bacterium to micrograms for use in sequencing. The reagents required for MDA reactions include random primers and the DNA polymerase from bacteriophage phi29. The DNA is amplified with these reagents in an isothermal reaction at 30 °C. As the polymerases synthesize new strands, a strand displacement reaction takes place, producing multiple copies from each template DNA. At the same time, strands that were extended previously are displaced. MDA products are about 12 kb long on average and range up to around 100 kb, which makes them suitable for DNA sequencing.[5] Another method is MALBAC.[7]

Limitations

In comparison to PCR, MDA has higher amplification bias: different regions of the template are overrepresented or underrepresented, resulting in the loss of some sequences. The amplification bias of one MDA reaction is also carried over into any subsequent reaction. Two ways to improve genome coverage are to pool single-cell MDA reactions from the same cell type and to pool cells before the reaction is performed. Ways to identify cells of the same strain include fluorescent in situ hybridization (FISH) and conformation characteristics.[5]
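The effect of pooling on genome coverage can be illustrated with a toy model. The sketch below is only an illustration: the genome is divided into numbered bins, each single-cell MDA reaction covers some subset of them, and the pooled coverage is the union of those subsets. The bin layout, the cell data and the function names are invented for the example.

```python
def coverage_fraction(covered_bins: set, n_bins: int) -> float:
    """Fraction of genome bins covered by at least one read."""
    return len(covered_bins) / n_bins


def pooled_coverage(cells: list, n_bins: int) -> float:
    """Coverage after pooling reactions: a bin counts if any cell covers it."""
    pooled = set().union(*cells)
    return coverage_fraction(pooled, n_bins)


N_BINS = 10  # hypothetical genome split into 10 equal bins
cell_a = {0, 1, 2, 3, 4, 5}
cell_b = {4, 5, 6, 7}
cell_c = {0, 2, 8}

for name, cell in [("a", cell_a), ("b", cell_b), ("c", cell_c)]:
    print(f"cell {name}: {coverage_fraction(cell, N_BINS):.0%}")
print(f"pooled: {pooled_coverage([cell_a, cell_b, cell_c], N_BINS):.0%}")
```

In this toy data no single cell covers more than 60% of the bins, while the pool covers 90%, because the regions lost in one reaction are often recovered in another.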

Single-nucleotide polymorphisms (SNPs), which account for a large part of the genetic variation in the human genome, and copy number variation (CNV) also pose problems in single cell sequencing, because the amount of DNA extracted from a single cell is very limited. With so little DNA, accurate analysis remains difficult even after amplification, as coverage is low and susceptible to errors. With MDA, average genome coverage is less than 80%, and SNPs that are not covered by sequencing reads are missed. In addition, MDA shows a high rate of allele dropout, in which one of the alleles of a heterozygous sample is not detected. Various SNP-calling algorithms are in use, but currently none is specific to single cell sequencing. MDA also poses a problem for CNV analysis in that it produces false CNVs that conceal the real ones. When the false CNVs follow recognizable patterns, algorithms can detect and remove this noise to recover the true variants.[7]
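Allele dropout can be quantified when a trusted reference genotype is available, for example from bulk sequencing of the same sample. The following is a minimal sketch under that assumption: it counts the known heterozygous sites that are covered in the single cell and reports the fraction called homozygous. The site names, genotypes and function name are hypothetical, and uncovered sites are deliberately excluded because they reflect missing coverage rather than allele dropout.

```python
def allele_dropout_rate(reference: dict, cell: dict) -> float:
    """Fraction of covered, known-heterozygous sites called homozygous."""
    het_sites = [site for site, (a, b) in reference.items() if a != b]
    covered = [site for site in het_sites if site in cell]
    if not covered:
        raise ValueError("no heterozygous reference sites are covered")
    dropped = sum(1 for site in covered if cell[site][0] == cell[site][1])
    return dropped / len(covered)


# Hypothetical genotypes: reference (e.g. bulk) versus one amplified cell.
reference = {
    "site1": ("A", "G"), "site2": ("C", "T"), "site3": ("G", "G"),
    "site4": ("A", "C"), "site5": ("T", "C"),
}
cell = {
    "site1": ("A", "A"),  # heterozygous in reference, one allele lost
    "site2": ("C", "T"),
    "site3": ("G", "G"),  # homozygous in reference, not informative
    "site5": ("T", "C"),  # site4 is not covered at all in this cell
}

print(f"allele dropout rate: {allele_dropout_rate(reference, cell):.2f}")
```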

Applications

Microbiomes are major targets of single cell genomics because many of their members are difficult to culture. Single cell genomics is one way to identify the members of a microbiome and their genomes. The first microorganism used for single cell sequencing was a bacterium. Once the data are assembled, new functions of these organisms might be discovered, some of which may prove beneficial and others harmful to human health.[8]

Cancer sequencing is also an emerging application of single cell DNA sequencing (scDNAseq). Fresh or frozen tumors can be analyzed and categorized quite well with respect to somatic copy number alterations (SCNAs), single-nucleotide variants (SNVs) and rearrangements using whole-genome DNA sequencing approaches.[9] Cancer scDNAseq is particularly useful for examining the depth of complexity and the compound mutations present in amplified therapeutic targets such as receptor tyrosine kinase genes (EGFR, PDGFRA etc.), where conventional population-level analyses of the bulk tumor cannot resolve the co-occurrence patterns of these mutations within single cells of the tumor. Such overlap may provide redundancy of pathway activation and contribute to tumor cell resistance.

Single-cell RNA sequencing (scRNA-seq)

Current methods for quantifying the molecular states of cells, from microarrays to standard RNA-seq analysis, mostly estimate a mean value from millions of cells by averaging the signals of individual cells. Given the heterogeneity of cell populations, measuring mean values overlooks the internal interactions and differences within a population that may be crucial for maintaining normal tissue function and for facilitating disease progression. Cell-averaging experiments therefore provide only partial information about the molecular state of the system.[10][11]
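A small numerical illustration shows how averaging can hide heterogeneity. The expression values below are invented for the example: one population expresses a gene uniformly, while in the other only a small subpopulation expresses it strongly. Both give the same bulk mean, and only the per-cell values reveal the difference.

```python
from statistics import mean, pstdev

# Hypothetical transcript counts for one gene across 100 cells.
uniform_population = [50] * 100            # every cell expresses 50 copies
mixed_population = [0] * 90 + [500] * 10   # 10% of cells express 500 copies

for name, cells in [("uniform", uniform_population),
                    ("mixed", mixed_population)]:
    print(name, "bulk mean:", mean(cells), "cell-to-cell SD:", pstdev(cells))
```

The two populations are indistinguishable by their mean, but the cell-to-cell standard deviation is zero in the first and three times the mean in the second.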

Single-cell RNA sequencing (scRNA-seq) provides the expression profile of individual cells. Through gene clustering analyses, rare cell types within a cell population can be identified, making it possible to characterize the subpopulation structure of a heterogeneous cell population. While tumor heterogeneity can be attributed to accumulated mutations, even genetically identical cells in the same environment display high variability in gene and protein expression levels.[12] RNA with low copy number, which may have important functions in the cell, is usually undetectable or regarded as noise in traditional cell-averaging methods. Single-cell RNA sequencing of a large number of single cells can identify such uncommon RNA and also reveal the copy-number distribution of the whole mRNA population in individual cells. Knowledge of the shape of this distribution can be used to understand the mechanisms of transcriptional regulation.[11][13]
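The idea of finding a rare cell type by clustering expression profiles can be sketched with a deliberately simple method. The example below groups cells whose profiles lie within a distance threshold of each other (single-linkage grouping). This is an illustrative stand-in, not the method of any particular scRNA-seq pipeline, which would typically involve normalization, dimensionality reduction and more robust clustering. The profiles, the threshold and the function name are invented.

```python
import math


def cluster_cells(profiles: list, max_distance: float) -> list:
    """Group cell indices; cells closer than max_distance share a cluster."""
    clusters = []
    for i, profile in enumerate(profiles):
        # Existing clusters that contain at least one nearby cell.
        linked = [c for c in clusters
                  if any(math.dist(profile, profiles[j]) <= max_distance
                         for j in c)]
        merged = [i]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return clusters


# Hypothetical expression of two genes (gene A, gene B) in six cells.
profiles = [(10, 1), (11, 0), (9, 1), (0, 8), (10, 0), (11, 1)]

clusters = sorted(cluster_cells(profiles, max_distance=3.0), key=len)
print("rare cell type:", clusters[0])
print("common cell type:", clusters[-1])
```

Here the cell at index 3 forms its own cluster because its profile differs strongly from the rest, which is the kind of signal a bulk average would dilute.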

Experimental procedures

Figure: Single-cell RNA sequencing workflow.

Despite advances in sequencing technologies, it is still not possible to sequence RNA directly from a single cell. In current scRNA-seq protocols, RNA therefore still needs to be converted to cDNA for sequencing. In principle, current scRNA-seq methods comprise the following steps: isolation of a single cell and its RNA, reverse transcription (RT), amplification, library generation and sequencing.

The ideal scRNA-seq method preserves and accurately quantifies the initial relative abundance of mRNA in a cell, covers the entire length of each transcript with equal representation at each position, and retains strand information.[13] Nevertheless, noise and bias may be introduced at various steps of a scRNA-seq protocol. For example, the reverse transcription step is critical, as the efficiency of the RT reaction determines the percentage of a cell's RNA population that is eventually analyzed by the sequencer. The processivity of the reverse transcriptase and the priming strategy used affect the production of full-length cDNA and can generate libraries biased toward the 3' or 5' end of genes.

In the amplification step, either PCR or in vitro transcription (IVT) is currently used to amplify cDNA. One advantage of PCR-based methods is the ability to generate full-length cDNA. However, differences in PCR efficiency between particular sequences (for instance, due to GC content and snapback structure) are also amplified exponentially, producing libraries with uneven coverage. On the other hand, while libraries generated by IVT avoid PCR-induced sequence bias, specific sequences may be transcribed inefficiently, causing sequence drop-out or incomplete sequences.[1][10] Several scRNA-seq protocols have been published: Tang et al.,[14] STRT,[15] SMART-seq,[16] CEL-seq[17] and Quartz-seq.[18]
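The exponential growth of small efficiency differences can be made concrete with the standard idealized PCR model, in which a template amplified with per-cycle efficiency E (between 0 and 1) grows by a factor of (1 + E) each cycle. The efficiencies and the cycle number below are illustrative values, not measurements from the cited protocols.

```python
def fold_amplification(efficiency: float, cycles: int) -> float:
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    return (1.0 + efficiency) ** cycles


def representation_bias(eff_a: float, eff_b: float, cycles: int) -> float:
    """How much template A is overrepresented relative to B after PCR."""
    return fold_amplification(eff_a, cycles) / fold_amplification(eff_b, cycles)


# Hypothetical efficiencies for an easy template and a GC-rich template.
ratio = representation_bias(eff_a=0.95, eff_b=0.85, cycles=20)
print(f"bias after 20 cycles: {ratio:.1f}-fold")
```

With these assumed values, a difference of 0.10 in per-cycle efficiency grows to nearly a threefold difference in representation after 20 cycles, even though both templates started at equal abundance.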

Applications

The number of circulating tumor cells (CTCs) in the peripheral blood of cancer patients has been shown to correlate with prognosis.[19] However, it is challenging to enumerate and characterize isolated CTCs, as they are often contaminated with large numbers of leukocytes and erythrocytes. Single cell RNA-seq could be applied to distinguish cancer cells from normal blood cells and to obtain the expression profiles of the tumor cells at the same time. Similarly, single cell RNA-seq can be used to analyze rare cell types in early human embryos and adult stem cells, both of which exist transiently and are difficult to characterize with current technologies.

Considerations

Isolation of single cells

There is currently no standardized technique for single-cell isolation. Individual cells can be collected by micromanipulation, for example by serial dilution or by using a patch pipette or nanotube to harvest a single cell.[20][21] Micromanipulation is simple and inexpensive, but it is laborious and susceptible to misidentification of cell types under the microscope. Laser-capture microdissection (LCM) can also be used to collect single cells. Although LCM preserves knowledge of the spatial location of a sampled cell within a tissue, it is hard to capture a whole single cell without also collecting material from neighboring cells.[10][22][23] High-throughput methods for single cell isolation include fluorescence-activated cell sorting (FACS) and microfluidics. Both FACS and microfluidics are accurate, automatic and capable of isolating unbiased samples. However, both methods require first detaching cells from their microenvironments, which perturbs the transcriptional profiles measured in RNA expression analysis.[24][25]

Number of cells to be analyzed

scRNA-Seq

Generally speaking, a typical bulk RNA sequencing (RNA-seq) experiment generates ten million reads, and a gene above the threshold of 50 reads per kb per million reads (RPKM) is considered expressed. For a gene that is 1 kb long, this corresponds to 500 reads and a minimum coefficient of variation (CV) of about 4% under the assumption of a Poisson distribution. For a typical mammalian cell containing 200,000 mRNA molecules, sequencing data from at least 50 single cells need to be pooled to achieve this minimum CV. However, because of the limited efficiency of reverse transcription and other noise introduced in the experiments, more cells are required for accurate expression analysis and cell type identification.[10]
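The arithmetic behind these figures can be reproduced as follows. For a Poisson-distributed count with mean N, the CV is 1/sqrt(N). The step from 500 reads to 50 cells is reconstructed here under an assumption that is not stated explicitly above: a 1 kb gene at 50 RPKM is taken to make up about 50 out of every million mRNA molecules in the cell. The function names are illustrative.

```python
import math


def expected_reads(rpkm: float, gene_length_kb: float, total_reads: float) -> float:
    """Reads mapped to a gene, from the definition of RPKM."""
    return rpkm * gene_length_kb * total_reads / 1e6


def poisson_cv(count: float) -> float:
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(count)


reads = expected_reads(rpkm=50, gene_length_kb=1, total_reads=10_000_000)
print(f"reads for the gene: {reads:.0f}")
print(f"Poisson CV: {poisson_cv(reads):.1%}")

# Assumption: 50 RPKM for a 1 kb gene ~ 50 molecules per million mRNAs.
mrna_per_cell = 200_000
molecules_per_cell = mrna_per_cell * 50 / 1e6
cells_needed = math.ceil(reads / molecules_per_cell)
print(f"molecules per cell: {molecules_per_cell:.0f}")
print(f"cells to pool for the same count: {cells_needed}")
```

The computed CV is roughly 4.5%, in line with the approximate 4% quoted above, and pooling 50 cells at 10 molecules each yields the same 500 counts in the ideal case where every molecule is captured.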

References

  1. 1.0 1.1 1.2 Lua error in package.lua at line 80: module 'strict' not found.
  2. Lua error in package.lua at line 80: module 'strict' not found.
  3. Lua error in package.lua at line 80: module 'strict' not found.
  4. Lua error in package.lua at line 80: module 'strict' not found.
  5. 5.0 5.1 5.2 "Lua error in package.lua at line 80: module 'strict' not found."
  6. Lua error in package.lua at line 80: module 'strict' not found.
  7. 7.0 7.1 "Lua error in package.lua at line 80: module 'strict' not found."
  8. Lua error in package.lua at line 80: module 'strict' not found.
  9. Francis J, Zheng CZ, Maire C, et al. Cancer Discovery. 2014.
  10. 10.0 10.1 10.2 10.3 "Lua error in package.lua at line 80: module 'strict' not found."
  11. 11.0 11.1 Lua error in package.lua at line 80: module 'strict' not found.
  12. Lua error in package.lua at line 80: module 'strict' not found.
  13. 13.0 13.1 "Lua error in package.lua at line 80: module 'strict' not found."
  14. Lua error in package.lua at line 80: module 'strict' not found.
  15. Lua error in package.lua at line 80: module 'strict' not found.
  16. Lua error in package.lua at line 80: module 'strict' not found.
  17. ,Lua error in package.lua at line 80: module 'strict' not found.
  18. Lua error in package.lua at line 80: module 'strict' not found.
  19. Lua error in package.lua at line 80: module 'strict' not found.
  20. Lua error in package.lua at line 80: module 'strict' not found.
  21. Lua error in package.lua at line 80: module 'strict' not found.
  22. Lua error in package.lua at line 80: module 'strict' not found.
  23. Lua error in package.lua at line 80: module 'strict' not found.
  24. Lua error in package.lua at line 80: module 'strict' not found.
  25. Lua error in package.lua at line 80: module 'strict' not found.