To investigate cellular diversity across the human cortex, a low-bias approach to profiling cell-type diversity was sought, one compatible with precious and limited tissue sources. Individual cortical layers were dissected from the middle temporal gyrus (MTG), anterior cingulate gyrus (CgGr), primary visual cortex (V1C), primary motor cortex (M1C), primary somatosensory cortex (S1C), and primary auditory cortex (A1C) of human brain, and nuclei were dissociated and sorted using the neuronal marker NeuN. Nuclei were sampled from postmortem donor brains and, for MTG only, from neurosurgical donors, and expression was profiled with SMART-Seq v4 or 10x v3 RNA-sequencing.
This cell-type database includes experimental data derived from adult human brain. Human brain tissue samples of postmortem or neurosurgical origin were made available through the generosity of tissue donors. Clinical summaries and donor characteristics are provided in this document, along with a description of the criteria for accepting tissue for use in this study.
To prepare and archive tissue from suitable cases, whole postmortem brains were bisected through the midline, and individual hemispheres were embedded in alginate for slabbing. Coronal slabs were cut at 0.5-1 cm intervals through each hemisphere, frozen in a bath of dry ice and isopentane, vacuum-sealed in freezer bags to prevent frost damage, and stored at -80°C until use. Regions of interest were then removed from the slabs, sectioned on a vibratome, and processed for nuclei isolation.
Neurosurgical donor tissue (MTG only) was obtained from patients undergoing surgery for epilepsy or brain tumors. The tissue blocks were distal, apparently normal cortical tissue removed to access the underlying pathological tissue. Tissue was transported in chilled artificial cerebrospinal fluid (ACSF), sectioned at 350 µm, and stored at -80°C until it was processed for nuclei isolation.
Single nuclei were captured by gating on DAPI-positive events, excluding debris and doublets, and then gating on NeuN signal to isolate either NeuN-positive (neuronal) or NeuN-negative (non-neuronal) nuclei.
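The three-step gating logic above can be sketched as a sequence of boolean masks over sorter events. This is a minimal illustration, not the sort protocol: the function name gate_nuclei, the use of DAPI pulse width for doublet exclusion, and all threshold values are hypothetical placeholders, since real gates are drawn by hand on the cytometer for each run.

```python
import numpy as np


def gate_nuclei(dapi_area, dapi_width, neun,
                dapi_min=1_000.0, width_max=80.0, neun_min=500.0):
    """Return (neun_pos, neun_neg) boolean masks over sorted events.

    Thresholds are hypothetical placeholders; on a real sorter the gates
    are drawn by hand for each run.
    """
    dapi_area = np.asarray(dapi_area, dtype=float)
    dapi_width = np.asarray(dapi_width, dtype=float)
    neun = np.asarray(neun, dtype=float)

    # Gate 1: DAPI-positive events are DNA-containing nuclei; dim events are debris.
    is_nucleus = dapi_area >= dapi_min
    # Gate 2 (assumed criterion): doublets and clumps give a longer DAPI pulse.
    is_singlet = dapi_width <= width_max
    singlets = is_nucleus & is_singlet
    # Gate 3: split the retained singlet nuclei on NeuN signal.
    neun_pos = singlets & (neun >= neun_min)
    neun_neg = singlets & (neun < neun_min)
    return neun_pos, neun_neg


# Four toy events: a neuron, a debris particle, a doublet, a non-neuron.
pos, neg = gate_nuclei(dapi_area=[5000, 200, 6000, 7000],
                       dapi_width=[50, 40, 120, 55],
                       neun=[900, 900, 900, 100])
print(pos.tolist(), neg.tolist())  # [True, False, False, False] [False, False, False, True]
```

Ordering the gates this way mirrors the text: NeuN status is only evaluated on events that already passed the nucleus and singlet gates.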
The SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (Takara #634894) was used according to the manufacturer’s instructions for cDNA synthesis and subsequent amplification of single-nucleus RNA. Sequencing libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina FC-131-1096) with Nextera XT Index Kit v2 Set A, B, C, or D (FC-131-2001, 2002, 2003, or 2004) or custom 8-base or 10-base unique dual index primers designed and manufactured by IDT (Integrated DNA Technologies). Nextera XT library preparation was done either manually at 0.5x volume or on the Mantis instrument (Formulatrix) at 0.4x or 0.2x volume. Pooled sequencing libraries were sent to outside vendors for sequencing on an Illumina HiSeq 2500 instrument, and all library pools were run using Illumina High Output V4 chemistry. RNA sequencing services were provided by Covance Genomics Laboratory, a Seattle subsidiary of LabCorp Group of Holdings, and by The Broad Institute Genome Sequencing Platform.
Amplification protocols: SMART-Seq v4 (1x) amplification; SMART-Seq v4 (0.5x) amplification.
Raw read (fastq) files were aligned to the GRCh38 human genome sequence (Genome Reference Consortium, 2011) with the RefSeq transcriptome version GRCh38.p2 (current as of 4/13/2015), updated by removing duplicate Entrez gene entries from the GTF reference file for STAR processing. Before alignment, Illumina sequencing adapters were clipped from the reads using the fastqMCF program. After clipping, paired-end reads were mapped with Spliced Transcripts Alignment to a Reference (STAR) using default settings. Reads that did not map to the genome were then aligned to synthetic construct sequences (ERCC spike-ins) and the E. coli genome (version ASM584v2). Quantification was performed using summarizeOverlaps from the R package GenomicAlignments. Expression levels were calculated as counts per million (CPM) of exonic plus intronic reads.
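The CPM calculation above can be illustrated on a small genes x nuclei matrix. This sketch assumes that the library size for each nucleus is the sum of exonic plus intronic reads assigned to genes (rather than, say, all mapped reads); the function name cpm_exon_plus_intron is hypothetical.

```python
import numpy as np


def cpm_exon_plus_intron(exon_counts, intron_counts):
    """Counts per million computed from summed exonic + intronic reads.

    Inputs are genes x nuclei count matrices with matching shapes.
    """
    total = np.asarray(exon_counts, dtype=float) + np.asarray(intron_counts, dtype=float)
    # Assumed library size per nucleus: all exonic + intronic reads assigned to genes.
    lib_size = total.sum(axis=0, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # guard against nuclei with no assigned reads
    return total / lib_size * 1e6


exon = np.array([[10, 0], [30, 5]])
intron = np.array([[10, 5], [50, 0]])
cpm = cpm_exon_plus_intron(exon, intron)
# Nucleus 1: 20 and 80 of 100 reads; nucleus 2: 5 and 5 of 10 reads.
print(cpm)  # [[200000. 500000.] [800000. 500000.]]
```

Because exonic and intronic counts are summed before normalization, each nucleus's CPM values add up to one million regardless of how its reads split between exons and introns.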
Single-nucleus suspensions were frozen in a solution of 1X PBS, 1% BSA, 10% DMSO, and 0.5% RNasin Plus RNase inhibitor (Promega, N2611) and stored at -80°C. At the time of use, frozen nuclei were thawed at 37°C and processed for loading on the 10x Chromium instrument as described (dx.doi.org/10.17504/protocols.io.nx3dfqn). Samples were processed using the 10x Chromium Single Cell 3’ Reagent Kit v3, and chip loading and sample processing were done according to the manufacturer’s protocol. Gene expression was quantified using the default 10x Cell Ranger v3 pipeline, except that the curated genome annotation used for SMART-seq v4 quantification was substituted for the default reference. Introns were annotated as “mRNA,” so that intronic reads were included in expression quantification.
Nuclei were included in the clustering analysis if they passed all QC criteria.
SMART-seq v4 criteria:
10x v3 criteria:
Nuclei passing QC criteria were grouped into transcriptomic cell types using an iterative clustering procedure described previously (Tasic et al. 2018; Hodge, Bakken et al., 2019). Briefly, intronic and exonic read counts were summed, and log2-transformed expression was centered and scaled across nuclei. Genes on the X and Y chromosomes and mitochondrial genes were excluded to avoid clustering nuclei by sex or nucleus quality. Differentially expressed genes were selected, dimensionality was reduced with principal components analysis (PCA), and a nearest-neighbor graph was built using up to 20 principal components. Clusters were identified with Louvain community detection (or Ward's hierarchical clustering when fewer than 3000 nuclei were being clustered), and pairs of clusters were merged if either cluster lacked marker genes. Clustering was applied iteratively to each sub-cluster until clusters could not be split further.
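One round of the clustering step above can be sketched as: log2 transform, gene exclusion, centering and scaling, PCA, a k-nearest-neighbor graph, and Ward clustering for the small-N branch. This is a simplified illustration, not scrattch.hicat: it assumes all non-excluded genes are used (differential gene selection, cluster merging, and iteration are omitted), the helper names and k are hypothetical, and the Louvain branch is not implemented here (one option, not used below, would be running networkx's louvain_communities on nx.from_numpy_array(adj.astype(int))).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def preprocess(cpm, gene_names, excluded, n_pcs=20):
    """log2(CPM + 1), drop excluded genes, center/scale, project onto PCs."""
    keep = np.array([g not in excluded for g in gene_names])
    x = np.log2(cpm[keep] + 1.0).T                 # nuclei x genes
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0                               # constant genes stay at zero
    z = (x - x.mean(axis=0)) / sd
    u, s, _ = np.linalg.svd(z, full_matrices=False)  # PCA via SVD
    k = min(n_pcs, s.size)
    return u[:, :k] * s[:k]                         # nuclei x k PC scores


def knn_graph(pcs, k=15):
    """Symmetric k-nearest-neighbor adjacency (Euclidean distance in PC space)."""
    n = pcs.shape[0]
    k = min(k, n - 1)
    d = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                     # a nucleus is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    return adj | adj.T


def ward_clusters(pcs, n_clusters):
    """Small-N branch: cut a Ward dendrogram into n_clusters groups."""
    return fcluster(linkage(pcs, method="ward"), t=n_clusters, criterion="maxclust")


# Toy data: 50 genes x 40 nuclei with two marker programs.
rng = np.random.default_rng(0)
cpm = rng.poisson(5, size=(50, 40)).astype(float)
cpm[0:10, :20] += 200.0                             # markers of group A
cpm[10:20, 20:] += 200.0                            # markers of group B
genes = [f"g{i}" for i in range(50)]
pcs = preprocess(cpm, genes, excluded={"g48", "g49"})  # stand-ins for chrX/Y/MT genes
adj = knn_graph(pcs, k=5)
labels = ward_clusters(pcs, n_clusters=2)
print(labels)
```

In the full procedure, each resulting cluster would be passed back through the same steps until no further split is supported by marker genes.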
Cluster robustness was assessed by repeating iterative clustering 100 times on random subsets of 80% of nuclei. A co-clustering matrix was generated representing the proportion of clustering iterations in which each pair of nuclei was assigned to the same cluster. We defined consensus clusters by iteratively splitting the co-clustering matrix as described (Tasic et al. 2018; Hodge, Bakken et al., 2019).
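The co-clustering matrix described above can be sketched as follows. The sketch assumes that, for each pair, the proportion is taken over the iterations in which both nuclei were sampled; cluster_fn is a hypothetical stand-in for one full clustering run on the sampled nuclei, and the consensus splitting step is not reproduced.

```python
import numpy as np


def co_clustering_matrix(n_nuclei, cluster_fn, n_iter=100, frac=0.8, seed=0):
    """Fraction of subsampling iterations in which each pair co-clusters.

    Assumed denominator: the number of iterations in which both nuclei were sampled.
    """
    rng = np.random.default_rng(seed)
    same = np.zeros((n_nuclei, n_nuclei))
    both = np.zeros((n_nuclei, n_nuclei))
    m = int(round(frac * n_nuclei))
    for _ in range(n_iter):
        idx = rng.choice(n_nuclei, size=m, replace=False)  # 80% subsample
        labels = np.asarray(cluster_fn(idx))                # labels for sampled nuclei
        same[np.ix_(idx, idx)] += labels[:, None] == labels[None, :]
        both[np.ix_(idx, idx)] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(both > 0, same / both, np.nan)


positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])


def toy_cluster(idx):
    # Hypothetical stand-in for a full clustering run on the sampled nuclei.
    return (positions[idx] > 2.5).astype(int)


C = co_clustering_matrix(len(positions), toy_cluster)
print(np.round(C, 2))
```

Pairs that land together in every run score 1 and pairs that never do score 0; unstable assignments show up as intermediate values, which is what the consensus step then partitions.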
Clusters were curated based on outlier values of the initial QC metrics or of cell class marker expression (GAD1, SLC17A7, SNAP25). Clusters were identified as donor-specific if they included nuclei from fewer donors than expected by chance. To confirm exclusion, clusters automatically flagged as outliers or donor-specific were manually inspected for expression of broad cell class marker genes, mitochondrial genes related to quality, and known activity-dependent genes.
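The donor-specificity check above does not specify a statistical test, so the sketch below uses a permutation test as an assumption: it compares the number of distinct donors in a cluster with the number obtained by drawing the same number of nuclei at random from all nuclei. The function name donor_specific, the permutation count, and the alpha cutoff are hypothetical.

```python
import numpy as np


def donor_specific(cluster_donors, all_donors, n_perm=1000, alpha=0.01, seed=0):
    """Permutation test: does a cluster span fewer donors than random draws?

    Returns (observed donors, mean donors under the null, p-value, flagged).
    """
    rng = np.random.default_rng(seed)
    all_donors = np.asarray(all_donors)
    n = len(cluster_donors)
    observed = len(set(cluster_donors))
    # Null: distinct donors among n nuclei drawn at random from the whole dataset.
    null = np.array([np.unique(rng.choice(all_donors, size=n, replace=False)).size
                     for _ in range(n_perm)])
    # One-sided p-value with +1 correction so it is never exactly zero.
    p = (np.count_nonzero(null <= observed) + 1) / (n_perm + 1)
    return observed, float(null.mean()), p, bool(p < alpha)


all_donors = ["H1"] * 100 + ["H2"] * 100 + ["H3"] * 100 + ["H4"] * 100
one_donor = donor_specific(["H1"] * 20, all_donors)
mixed = donor_specific(["H1", "H2", "H3", "H4"] * 5, all_donors)
print(one_donor[3], mixed[3])  # True False
```

A flagged cluster would still go to the manual inspection described above rather than being excluded automatically.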
The clustering pipeline is implemented in the R package “scrattch.hicat”, and the clustering method is provided by the “run_consensus_clust” function.
Data generation was supported by multiple awards, including BRAIN Initiative Cell Census Network (BICCN) award U01MH114812 from the National Institute of Mental Health and the National Institute of Neurological Disorders and Stroke, and by the Allen Institute for Brain Science.