
MapMyCells: Output files

MapMyCells produces two output files: a “standard” CSV output file and an “extended” JSON output file. These files are archived into a single .zip file for download; the files you may find in that archive are listed below. Modern operating systems natively support unpacking zip files, usually via a right-click + "Extract all" command.

 

  • validation_log.txt: Log of messages produced by the job. Returned even for failed jobs. Useful for debugging. If the mapping failed, this is probably the file you want.

  • my_job.csv: Returned by all algorithms. CSV table of mapping results. If the mapping worked, this is probably the file you want.

  • my_job.json: Only returned by Hierarchical and Flat mapping. More detailed results and metadata stored in a JSON file.

  • my_job_summary_metadata.json: JSON file recording number of cells mapped to cell types and number of genes mapped to Ensembl IDs.

 

To extract the individual files from the command line, run

tar -xf path/to/downloaded/file.zip

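at which point the constituent CSV and JSON files should appear in your current working directory. Note that this relies on a tar implementation that can read zip archives, such as the bsdtar shipped with macOS and recent versions of Windows. GNU tar, the default on most Linux systems, cannot read zip archives; there, use unzip path/to/downloaded/file.zip instead.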

 

Alternatively, run

tar -xvf path/to/downloaded/file.zip --directory my_directory

to unpack the files to an existing directory of your choice, e.g. my_directory.
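If you would rather not depend on which tar implementation is installed, the same unpacking can be done with Python's standard library. The following is a minimal sketch; the function name extract_results and the paths in the usage comment are placeholders chosen for illustration, not part of MapMyCells.

```python
import zipfile
from pathlib import Path


def extract_results(zip_path, output_dir):
    """Unpack every file in the archive into output_dir.

    Returns the sorted list of file names found in the archive, so you
    can see which output files were actually returned for your job.
    """
    output_dir = Path(output_dir)
    # Create the target directory if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(output_dir)
        return sorted(archive.namelist())


# Hypothetical usage (replace the paths with your own):
# print(extract_results("path/to/downloaded/file.zip", "my_directory"))
```

Unlike the tar command above, this creates my_directory if it does not already exist.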

 

The contents of these files are documented in detail here: https://github.com/AllenInstitute/cell_type_mapper/blob/main/docs/output.md 

 

At a high level, the CSV file is meant to be read into a dataframe, ignoring a few lines of metadata prefixed with a ‘#’, as in (for Python)

import pandas
data_frame = pandas.read_csv('path/to/output.csv', comment='#')

or opened as an Excel spreadsheet. The JSON file is the serialized representation of a dict with more detailed results, intended for those comfortable with deserializing JSON. It is also where the metadata associated with the mapping run is saved.
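Because pandas discards the ‘#’-prefixed lines when comment='#' is set, you may want to read them separately, and to load the JSON file into a dict. The sketch below uses only the Python standard library. The function names and paths are placeholders chosen for illustration, and no particular keys are assumed to exist in the JSON file; see the documentation linked above for its actual contents.

```python
import json


def read_csv_metadata_lines(csv_path):
    """Return the '#'-prefixed lines of the CSV, with the '#' stripped."""
    lines = []
    with open(csv_path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                lines.append(line[1:].strip())
    return lines


def load_extended_results(json_path):
    """Deserialize the JSON output file into a Python object (a dict)."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical usage (replace the paths with your own):
# print(read_csv_metadata_lines("path/to/output.csv"))
# results = load_extended_results("path/to/output.json")
# print(sorted(results))  # top-level keys of the dict
```

Listing the top-level keys first is a cheap way to get oriented before digging into the more detailed results.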