Step-by-step guide to using MapMyCells

While MapMyCells itself is straightforward (drag and drop!), there are a few steps before and after, along with some important tips and tricks, to get the most out of the tool. This page goes through each step, from preparing your data for mapping to using the mapping results.

Step 1: Preparing your data for mapping

MapMyCells requires a cell (rows) by gene (columns) matrix in h5ad, csv, or csv.gz format as input. Gene identifiers are recommended, but gene symbols are also allowed. The green button walks through how to set up an input file, while the yellow buttons include start-to-finish R code for all steps of the process.

Input File Requirements & Conversion using R or python

Start to finish: human middle temporal gyrus
Start to finish: mouse motor cortex

Frequently asked questions:

Can I map rat data to mouse brain cell types? While mapping between species is not officially supported, we have found that it works well for phylogenetically related species in some cases. Use GeneOrthology to convert gene symbols between species.
Can I use MapMyCells for bulk RNA-seq data? No. While MapMyCells will technically run with any sample by gene matrix, since each sample only gets a single cell type assignment and bulk tissue contains multiple cell types, these results will not be meaningful.
What should my h5ad file look like? The links above all provide the structure of h5ad files and links to examples. We also provide an example mouse h5ad file HERE.
I’ve never coded before. How do I make an h5ad file? We recognize this may be a challenge for some folks and plan to provide other options in the future. For now, we recommend Google Colaboratory which can run python code online with no setup needed.
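
For readers who would rather start from a csv file than an h5ad file, the sketch below writes a small cell (rows) by gene (columns) matrix to a csv.gz file using only the Python standard library. The layout (gene identifiers in the header row, cell identifiers in the first column), the function name `write_cell_by_gene_csv`, and the cell and gene labels are illustrative assumptions rather than details taken from the MapMyCells documentation, so check the input file requirements linked above for the exact layout expected.

```python
import csv
import gzip
import os
import tempfile


def write_cell_by_gene_csv(path, cell_ids, gene_ids, counts):
    """Write a cell (rows) by gene (columns) matrix to a gzip-compressed csv."""
    if len(counts) != len(cell_ids):
        raise ValueError("counts needs exactly one row per cell")
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed layout: empty corner cell, then one column per gene.
        writer.writerow([""] + list(gene_ids))
        for cell_id, row in zip(cell_ids, counts):
            if len(row) != len(gene_ids):
                raise ValueError(f"row for {cell_id} does not match the gene list")
            writer.writerow([cell_id] + list(row))


# Placeholder labels; real data would use your own cell and gene identifiers.
cell_ids = ["cell_1", "cell_2", "cell_3"]
gene_ids = ["gene_a", "gene_b"]
counts = [[0, 5], [2, 0], [7, 1]]

out_path = os.path.join(tempfile.mkdtemp(), "example_input.csv.gz")
write_cell_by_gene_csv(out_path, cell_ids, gene_ids, counts)

# Read the file back to confirm the shape: 1 header row + 3 cell rows.
with gzip.open(out_path, "rt", newline="") as handle:
    rows = list(csv.reader(handle))
print(len(rows) - 1, "cells x", len(rows[0]) - 1, "genes")
```

Writing through `gzip.open` produces the compressed csv.gz variant directly, which may help keep larger files under the upload limit.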

Step 2: Mapping your data to a reference

MapMyCells provides a variety of reference taxonomies and algorithms for mapping using both a web interface and code-based approaches, which all rely on the same file format. The blue buttons link to the mapping tools, while the green button describes the output files.

Go directly to the MapMyCells UI
Run the code on your own

Understanding algorithm output

Frequently asked questions:

I have too many cells to upload to MapMyCells. What should I do? The web interface requires input files smaller than 2 GB. If your file is too big even after compression (see input links above), you have two options: (1) split your data set into multiple input files and upload them separately, or (2) use the code version, which does not have a size restriction. Since all algorithms treat each cell separately, you should get the same answer no matter how you divide your data set (although there might be very slight differences due to random sampling).
Can I map to only a subset of cell types in one of the reference taxonomies? Currently this is ONLY possible using code (some more details here). This functionality will be added to the web interface in 2025.
I’d like to create my own reference taxonomy to map against. How do I do that? While we encourage the use of reference cell types hosted on this site, we recognize this is not always possible. Scrattch.taxonomy provides both a standard schema and an associated file format for creating cell type taxonomies in an h5ad format compatible with both R and python. You can also follow the approach from the previous question.
How should I decide which algorithm to use? In most cases, we recommend using the default. You could also consider trying multiple algorithms and comparing the results (see below). Regardless of the algorithm chosen, it is important to check that your mapping results make sense. More on this below.
Can I use other mapping algorithms? Currently three mapping algorithms are supported in MapMyCells. Additional approaches are available in scrattch.mapping (the companion package to scrattch.taxonomy). We also provide data for all taxonomies for download.
I’ve run MapMyCells and I can’t find my mapping results. What should I do? If your unpacked zip file does not have a csv file with mapping results, check the validation log, which will often provide sensible messages explaining why your mapping failed. In most cases there is a problem with the input file. If you are still stuck, post a question to the Community Forum. Be sure to include your run ID, which looks something like this: “1698255812324-4d53ffc5-9c7c-4dff-b0b4-e4caf0923569”.
I’m struggling to understand how to use the python code. What should I do? The GitHub repository includes detailed documentation on recommended workflows, input formats, and outputs, including step-by-step Jupyter notebooks for defining and mapping against any taxonomy. If you are still stuck after reviewing these notebooks, post a question to the Community Forum.
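
As a rough illustration of the first option for oversized inputs (splitting a data set into multiple files), the sketch below streams a cell-by-gene csv and writes it out in parts that each repeat the header row. The function name `split_csv_by_cells`, the `part_000.csv` naming, and the row-count threshold are all hypothetical; the web limit is on file size, so the number of cells per part would need to be chosen so that each part stays under that limit.

```python
import csv
import os
import tempfile


def split_csv_by_cells(in_path, out_dir, max_cells_per_file):
    """Split a cell-by-gene csv into parts, repeating the header in each part."""
    if max_cells_per_file < 1:
        raise ValueError("max_cells_per_file must be at least 1")
    out_paths = []
    out_handle = None
    writer = None
    cells_in_part = 0
    try:
        with open(in_path, newline="") as in_handle:
            reader = csv.reader(in_handle)
            header = next(reader)
            for row in reader:
                # Start a new part for the first cell or when the part is full.
                if out_handle is None or cells_in_part == max_cells_per_file:
                    if out_handle is not None:
                        out_handle.close()
                    name = f"part_{len(out_paths):03d}.csv"
                    part_path = os.path.join(out_dir, name)
                    out_handle = open(part_path, "w", newline="")
                    writer = csv.writer(out_handle)
                    writer.writerow(header)
                    out_paths.append(part_path)
                    cells_in_part = 0
                writer.writerow(row)
                cells_in_part += 1
    finally:
        if out_handle is not None:
            out_handle.close()
    return out_paths


# Build a tiny 5-cell example file with placeholder labels, then split it.
work_dir = tempfile.mkdtemp()
big_path = os.path.join(work_dir, "all_cells.csv")
with open(big_path, "w", newline="") as handle:
    demo_writer = csv.writer(handle)
    demo_writer.writerow(["", "gene_a", "gene_b"])
    for i in range(5):
        demo_writer.writerow([f"cell_{i}", i, 2 * i])

part_paths = split_csv_by_cells(big_path, work_dir, max_cells_per_file=2)
print(len(part_paths), "files written")
```

Because every cell is written to exactly one part, the mapping results from the separate uploads can simply be concatenated afterwards.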

Step 3: Using mapping results

A common reason for running MapMyCells is to assign cell type names for user data that would replace or supplement the clustering analysis typically done as part of a single cell omics experiment. Therefore, a critical next step is to connect the cell type names and confidences returned from MapMyCells with original cell-level information in your study. The black button goes to an interactive tool to visualize and sanity-check your mapping results after joining these data together, while the yellow buttons link to start-to-finish R scripts showing similar comparisons.
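
The joining step described above can be sketched in a few lines of standard-library Python. Everything specific here is an assumption made for illustration: the column names (`cell_id`, `class_name`, `class_bootstrapping_probability`, `my_cluster`, `donor`), the leading `#` comment line in the results file, and the helper names are hypothetical, so adjust them to match your actual mapping output and metadata.

```python
import csv
import io


def read_csv_rows(text):
    """Parse csv text into dicts, skipping lines that start with '#'."""
    lines = [line for line in io.StringIO(text) if not line.startswith("#")]
    return list(csv.DictReader(lines))


def join_on_cell_id(mapping_rows, metadata_rows, key="cell_id"):
    """Attach metadata columns to each mapped cell that has metadata."""
    metadata_by_cell = {row[key]: row for row in metadata_rows}
    joined = []
    for row in mapping_rows:
        meta = metadata_by_cell.get(row[key])
        if meta is None:
            continue  # mapped cell with no metadata entry is skipped
        joined.append({**meta, **row})
    return joined


def to_csv_text(rows):
    """Serialize joined rows back to a single csv string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder inputs standing in for a results file and a metadata file.
mapping_text = """# comment line before the table (assumed format)
cell_id,class_name,class_bootstrapping_probability
cell_1,type_A,0.98
cell_2,type_B,0.61
"""
metadata_text = """cell_id,my_cluster,donor
cell_1,cluster_1,donor_x
cell_2,cluster_2,donor_x
cell_3,cluster_2,donor_y
"""

joined = join_on_cell_id(read_csv_rows(mapping_text), read_csv_rows(metadata_text))
print(to_csv_text(joined))
```

The result is one table with a row per mapped cell, carrying both the original annotations and the mapped cell type with its confidence.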

Interactive exploration: Annotation Comparison Explorer

Using R code: human middle temporal gyrus
Using R code: mouse motor cortex

Frequently asked questions:

Can I really forego clustering? In many cases, yes! If you are collecting cells from healthy mouse brain or from human neocortex in healthy or Alzheimer’s disease donors, our reference taxonomies will likely include all cell types in your sample. To improve cross-study comparison, we encourage using our cell types instead of defining your own.
How can I compare clustering and mapping results? Annotation Comparison Explorer (or ACE; above) provides interactive visualizations comparing any annotations. This includes confusion matrices and river plots comparing clustering and mapping results, and visualizations to explore individual clusters. When using ACE be sure to create a single csv file containing your mapping results, clustering, and any other relevant metadata. The R code examples provide additional code-based approaches for comparison.
I’ve successfully used MapMyCells. How can I cite your work? To cite the tool, please refer to MapMyCells as “MapMyCells (RRID:SCR_024672)” (see this link). We would also encourage users to cite the manuscript for the reference taxonomy.
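
For a quick code-based look at how clustering and mapping results agree, a confusion table can be computed by counting cells for each pair of cluster label and mapped cell type. This is a minimal sketch, not the method used by ACE or the R examples; it assumes rows that already carry both labels, and the column names `my_cluster` and `class_name` and both function names are hypothetical.

```python
from collections import Counter


def confusion_counts(rows, cluster_key, mapped_key):
    """Count cells for every (cluster, mapped cell type) pair."""
    return Counter((row[cluster_key], row[mapped_key]) for row in rows)


def format_confusion_table(counts):
    """Render the counts as tab-separated text, clusters as rows."""
    clusters = sorted({cluster for cluster, _ in counts})
    cell_types = sorted({cell_type for _, cell_type in counts})
    lines = ["\t".join(["cluster"] + cell_types)]
    for cluster in clusters:
        # Counter returns 0 for pairs that never occur.
        cells = [str(counts[(cluster, cell_type)]) for cell_type in cell_types]
        lines.append("\t".join([cluster] + cells))
    return "\n".join(lines)


# Placeholder rows: one dict per cell with a cluster and a mapped type.
rows = [
    {"my_cluster": "cluster_1", "class_name": "type_A"},
    {"my_cluster": "cluster_1", "class_name": "type_A"},
    {"my_cluster": "cluster_1", "class_name": "type_B"},
    {"my_cluster": "cluster_2", "class_name": "type_B"},
]

counts = confusion_counts(rows, "my_cluster", "class_name")
table = format_confusion_table(counts)
print(table)
```

Clusters whose cells fall mostly into a single column agree well with the mapping, while clusters spread across several columns are worth a closer look.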

Getting help

The Allen Institute provides an ever-growing list of written (linked above) and video materials to aid users in successful use of MapMyCells. We also have an active Community Forum for reviewing previously asked questions and asking your own.

MapMyCells webinar (47 min)
Video tutorial (24 min)
Community Forum
