File Requirements and Limits

Key points before starting

MapMyCells can map any matrix in which rows are "cells" and columns are genes! MapMyCells will attempt to run on anything that "looks like" cell-by-gene data. It works well with single-cell sequencing data and spatial transcriptomics data, but will probably not work well with bulk sequencing data. In some situations it will also work on single-cell epigenetics data that has been summarized to gene-level metrics, but review the results carefully. While we strongly encourage outputting matrices in which rows are "cells" and columns are genes, MapMyCells can transpose the data if needed.
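MapMyCells can transpose the data itself, but if you prefer to fix the orientation before exporting, a quick check helps. The sketch below assumes your matrix is a pandas DataFrame and uses a hypothetical helper, orient_cells_by_genes, whose heuristic (an assumption, not MapMyCells' own logic) transposes when row labels look more like Ensembl gene IDs than column labels do. Only the human (ENSG) and mouse (ENSMUSG) prefixes are assumed.

```python
import pandas as pd


def orient_cells_by_genes(df: pd.DataFrame,
                          gene_prefixes=("ENSG", "ENSMUSG")) -> pd.DataFrame:
    """Hypothetical helper: return `df` with cells as rows and genes as columns.

    Heuristic (an assumption, not MapMyCells' own logic): transpose when a
    larger fraction of row labels than column labels starts with an Ensembl
    gene-ID prefix.
    """
    def gene_fraction(labels) -> float:
        labels = [str(label) for label in labels]
        hits = sum(label.startswith(gene_prefixes) for label in labels)
        return hits / max(len(labels), 1)

    if gene_fraction(df.index) > gene_fraction(df.columns):
        return df.T
    return df


# Toy gene-by-cell matrix (genes as rows), which should be transposed.
genes_by_cells = pd.DataFrame(
    [[0, 1], [2, 3], [4, 5]],
    index=["ENSMUSG00001", "ENSMUSG00002", "ENSMUSG00003"],
    columns=["cell0", "cell1"],
)
cells_by_genes = orient_cells_by_genes(genes_by_cells)
print(cells_by_genes.shape)  # (2, 3): 2 cells by 3 genes
```

If your genes are labelled with symbols rather than Ensembl IDs, this heuristic will not detect them; check the orientation by eye instead.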

MapMyCells expects genes in user data to match the species of the reference taxonomy. This means that cross-species mappings require appropriate orthologs in your input data. An R library and precomputed csv files to convert NCBI gene IDs, Ensembl gene IDs, and gene symbols between species are available in this GitHub repository. Users should ideally provide Ensembl gene IDs in their h5ad file, but gene symbols or NCBI Gene IDs can also be provided.
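Before uploading, it can help to confirm which kind of gene identifier your columns actually hold. The following is a minimal sketch using a hypothetical helper, guess_gene_id_type, that assumes Ensembl gene IDs follow the usual "ENS" + optional species code + "G" + digits pattern (e.g. ENSG00000139618, ENSMUSG00000017167), that NCBI Gene IDs are plain integers, and that anything else is a gene symbol.

```python
import re

# Ensembl gene IDs: "ENS", an optional species code (e.g. "MUS"), "G", digits,
# and an optional ".version" suffix, e.g. ENSG00000139618 or ENSMUSG00000017167.
_ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d+(\.\d+)?$")


def guess_gene_id_type(gene_ids) -> str:
    """Hypothetical helper: label a list of gene IDs by majority vote."""
    counts = {"ensembl": 0, "ncbi": 0, "symbol": 0}
    for gid in gene_ids:
        gid = str(gid).strip()
        if _ENSEMBL_GENE.match(gid):
            counts["ensembl"] += 1
        elif gid.isdigit():  # NCBI Gene IDs are plain integers
            counts["ncbi"] += 1
        else:
            counts["symbol"] += 1
    # Ties resolve to the first key in insertion order ("ensembl").
    return max(counts, key=counts.get)


print(guess_gene_id_type(["ENSMUSG00000017167", "ENSMUSG00000020917.3"]))
```

A majority vote tolerates a few odd entries (such as control probes) without mislabelling the whole column set.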

MapMyCells has a file size limit of 2 GB. Code for compressing data and for splitting a data set into multiple input files is provided below; alternatively, use the code version, which does not have a size restriction.
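A quick way to check an output file against the upload limit is to compare its size on disk with 2 GB. In this sketch, the helper fits_upload_limit is hypothetical, and reading "2 GB" as 2,000,000,000 bytes (rather than 2 x 1024^3) is an assumption chosen because it is the stricter of the two readings.

```python
import os

# Assumption: read "2 GB" as 2e9 bytes, the stricter of the two common
# readings (2e9 vs 2 * 1024**3 bytes).
UPLOAD_LIMIT_BYTES = 2_000_000_000


def fits_upload_limit(path: str, limit: int = UPLOAD_LIMIT_BYTES) -> bool:
    """Hypothetical helper: print the file size and return True if it is
    at or below `limit` bytes."""
    size = os.path.getsize(path)
    print(f"{path}: {size / 1e9:.3f} GB")
    return size <= limit


# Example call (hypothetical file name):
# fits_upload_limit("my_data.h5ad")
```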

MapMyCells can accept h5ad files or csv files as input. H5ad files are produced by AnnData, a widely used tool for creating, manipulating, and saving large data matrices such as expression data. If your cell-by-gene data is in csv format, you can either upload it directly to MapMyCells or follow the R or Python guides below to convert it to h5ad. Any additional data in the h5ad file will be ignored for mapping, but may be useful for downstream analyses.


Creating csv input files

Most programming languages and -omics software applications provide standard methods for writing matrices to csv files. For example, the fwrite() function in the data.table library and the savetxt() function in the NumPy library will efficiently write numeric matrices to csv in R and Python, respectively. To decrease file size, gzip compression is encouraged (leading to csv.gz file extensions). This can be done directly as files are written, or afterwards with third-party applications (like 7-Zip, a free archiver for Windows).
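As one illustration in Python, pandas can write a labelled cell-by-gene matrix straight to a gzip-compressed csv, inferring the compression from the ".csv.gz" extension. NumPy's savetxt() also gzips automatically when the file name ends in .gz, but it does not write row (cell) names, which is why pandas is used in this sketch. The cell names, gene IDs, and the file name counts.csv.gz are placeholders.

```python
import numpy as np
import pandas as pd

# Toy cell-by-gene counts: rows are cells, columns are genes (placeholder names).
counts = pd.DataFrame(
    np.array([[0, 1, 2], [1, 0, 0], [3, 0, 1]]),
    index=["cell0", "cell1", "cell2"],
    columns=["ENSMUSG00001", "ENSMUSG00002", "ENSMUSG00003"],
)

# pandas infers gzip compression from the ".csv.gz" extension. The unnamed
# index produces an empty first header cell, matching the layout shown below.
counts.to_csv("counts.csv.gz")
```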

As an example, a well-formatted csv file should look something like this:

,ENSMUSG00001,ENSMUSG00002,ENSMUSG00003,...
cell0,0,1,2,...
cell1,1,0,0,...
cell2,3,0,1,...
...
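If you build csv files by hand or with unfamiliar tools, a small sanity check can catch ragged rows or stray text before upload. The checker below is a hypothetical sketch based only on the layout shown above (a header row of gene IDs, then one row per cell whose first field is the cell name and whose remaining fields are numbers); it is not MapMyCells' own validation.

```python
import csv
import io


def check_csv_layout(text: str) -> list[str]:
    """Hypothetical checker: return a list of layout problems (empty if none).

    Checks that every cell row has one value per gene in the header and that
    all values after the cell name parse as numbers.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    n_genes = len(rows[0]) - 1  # first header cell is the (empty) row-name slot
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:  # skip blank lines
            continue
        if len(row) - 1 != n_genes:
            problems.append(
                f"line {line_no}: expected {n_genes} values, got {len(row) - 1}")
            continue
        for value in row[1:]:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems


good = ",ENSMUSG00001,ENSMUSG00002,ENSMUSG00003\ncell0,0,1,2\ncell1,1,0,0\n"
print(check_csv_layout(good))  # []
```

For a large file, you could pass only the first few thousand lines to the checker rather than reading everything into memory.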

Creating h5ad input files in R & Python (alternative to csv input)

Using Python

We provide scripts in R and Python for converting files from standard file formats (csv, hdf5, h5ad) into compressed h5ad files ready for upload to MapMyCells. The scripts are broken up into two sections:

Input: how to read your data into R or Python and store it in an AnnData object.
Output: how to write your object to a compressed h5ad file, check its size, and, if the file exceeds 2 GB, split it into multiple files for upload to MapMyCells.
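The splitting step above can be planned before writing anything. This sketch uses a hypothetical helper, plan_splits, which assumes file size grows roughly linearly with the number of cells and keeps a 10% safety margin under the limit (both are assumptions); it returns contiguous row ranges of cells, one per output file.

```python
import math


def plan_splits(n_cells: int, file_size_bytes: int,
                limit_bytes: int = 2_000_000_000,
                headroom: float = 0.9) -> list[tuple[int, int]]:
    """Hypothetical helper: return (start, stop) cell-row ranges so that each
    chunk should land under `limit_bytes`.

    Assumes file size scales roughly linearly with the number of cells;
    `headroom` keeps each chunk below that fraction of the limit.
    """
    if n_cells == 0:
        return []
    n_chunks = max(1, math.ceil(file_size_bytes / (limit_bytes * headroom)))
    chunk = math.ceil(n_cells / n_chunks)
    return [(start, min(start + chunk, n_cells))
            for start in range(0, n_cells, chunk)]


# A 5 GB file with one million cells is split into three row ranges.
print(plan_splits(1_000_000, 5_000_000_000))
```

With an AnnData object (here hypothetically named adata), each range could then be written with something like adata[start:stop].copy().write_h5ad(f"part_{i}.h5ad", compression="gzip"), after which the size of every part should be rechecked, since the linear-size assumption is only approximate.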

These scripts require access to R or Python through user-provided means (e.g., installed on a local computer or accessed via cloud computing). R can be installed from CRAN, and Python can be installed through Miniconda. Instructions for running the Python script on Google Colaboratory are in development.


Allen Institute