
(NEW) MapMyCells: Input file requirements, limits, and creation

Key points before starting

MapMyCells can map any matrix in which rows are cells and columns are genes. It will attempt to run on anything that "looks like" cell-by-gene data. It works well with single-cell sequencing data and spatial transcriptomics data but will probably not work well with bulk sequencing data. In some situations it will also work on single-cell epigenetics data that has been summarized to gene-level metrics, but review the results carefully.

MapMyCells requires input in the h5ad file format. These files are produced by AnnData, a widely used tool for creating, manipulating, and saving large data matrices, such as expression data. If your cell-by-gene data is in csv format, the guide below walks through how to convert it to h5ad using R or Python. Additional data can be included in the h5ad file but will be ignored for mapping.

MapMyCells expects genes in user data to match the species of the reference taxonomy. This means that cross-species mappings require appropriate orthologs in your input data. An R library and precomputed csv files for converting NCBI gene IDs, Ensembl gene IDs, and gene symbols between species are available in this GitHub repository. Users should ideally provide Ensembl gene IDs in their h5ad file, but gene symbols or NCBI gene IDs can also be provided.
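Before uploading, it can help to check which kind of gene identifier your file actually contains. The sketch below is only a sanity check and not part of MapMyCells: it estimates the fraction of gene names that look like Ensembl gene IDs (for example ENSG00000000003 for human or ENSMUSG00000000001 for mouse). The function name fraction_ensembl_ids is hypothetical.

```python
import re

# Ensembl gene IDs: "ENS", an optional species prefix (e.g. "MUS" for mouse),
# "G", 11 digits, and an optional ".version" suffix.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def fraction_ensembl_ids(gene_names):
    """Return the fraction of gene names that match the Ensembl gene ID pattern."""
    names = list(gene_names)
    if not names:
        return 0.0
    hits = sum(1 for name in names if ENSEMBL_GENE_ID.match(str(name)))
    return hits / len(names)
```

With an AnnData object named adata, fraction_ensembl_ids(adata.var_names) close to 1.0 suggests the genes are already Ensembl IDs, while a value near 0.0 suggests gene symbols or NCBI gene IDs.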

MapMyCells has a file size limit of 2 GB. Code for compressing data and for splitting a data set into multiple input files is included below, or use the code version, which does not have a size restriction.

Example mouse file Example human file Convert genes between species Cite this tool

Creating h5ad input files in R & Python

We provide scripts in R and Python for converting files from standard file formats (csv, hdf5, h5ad) into compressed h5ad files ready for upload to MapMyCells. The scripts are broken up into two sections:

  1. Input: how to read your data into R or Python and store it in an AnnData object.
  2. Output: how to write the AnnData object to a compressed h5ad file, check the file size, and then split the output into multiple files for upload to MapMyCells if the size exceeds 2 GB.
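The output step can be sketched in Python roughly as follows. This is an illustrative outline and not the provided script. It assumes that file size grows roughly in proportion to the number of cells and that the 2 GB limit is measured in binary gigabytes; the names plan_chunks and write_for_upload and the "_partN" file naming are hypothetical.

```python
import math
import os

# Upload limit from this guide; interpreting "2 GB" as 2 * 1024**3 bytes is an assumption.
SIZE_LIMIT_BYTES = 2 * 1024**3


def plan_chunks(n_cells, file_bytes, limit_bytes=SIZE_LIMIT_BYTES):
    """Return (start, stop) row ranges so that each piece should fall under the limit."""
    if n_cells <= 0:
        return []
    if file_bytes <= limit_bytes:
        n_chunks = 1
    else:
        # Assume size scales with cell count; one extra chunk gives some headroom.
        n_chunks = math.ceil(file_bytes / limit_bytes) + 1
    n_chunks = min(n_chunks, n_cells)
    rows_per_chunk = math.ceil(n_cells / n_chunks)
    return [
        (start, min(start + rows_per_chunk, n_cells))
        for start in range(0, n_cells, rows_per_chunk)
    ]


def write_for_upload(adata, out_prefix):
    """Write an AnnData object as compressed h5ad, splitting by cells if it is too large."""
    full_path = f"{out_prefix}.h5ad"
    adata.write_h5ad(full_path, compression="gzip")

    chunks = plan_chunks(adata.n_obs, os.path.getsize(full_path))
    if len(chunks) == 1:
        return [full_path]

    paths = []
    for i, (start, stop) in enumerate(chunks, start=1):
        path = f"{out_prefix}_part{i}.h5ad"
        # Slicing the first axis selects cells; copy() turns the view into a real object.
        adata[start:stop].copy().write_h5ad(path, compression="gzip")
        paths.append(path)
    return paths
```

Calling write_for_upload(adata, "my_data") returns the list of files to upload. The sketch does not re-check the size of each piece, so it is worth confirming with os.path.getsize that every returned file is under the limit. Each piece is mapped separately, so the results need to be combined afterward.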

These scripts require access to R or Python through user-provided means (e.g., installed on a local computer or accessed via cloud computing). R can be installed from CRAN, while Python can be installed through Miniconda. Instructions for how to run the Python script on Google Colaboratory are in development.

Download python script [FIX LINK] Download R script [FIX LINK]