Version: 13.1.1

How to structure the subjects metadata file

Background

SODA helps you prepare the subjects metadata file and automatically generates it in the required structure. This page explains how the file must be structured according to the SPARC rules, so that you can understand the file SODA generates.

How to

  • Format: The subjects file is accepted in xlsx, csv, or json format. SODA generates it in the xlsx format, based on the template provided by the Curation Team.
  • Location in the dataset: The subjects file is typically expected in the high-level dataset folder.
  • Content: The subject_id is mandatory (highlighted in bold and italic below) for all datasets and must be provided with one Value. The other experimental setup elements (highlighted in bold-only below) are also mandatory when available. The remaining fields are either recommended or optional. Custom fields can also be added to the subjects.xlsx file.
    • subject_id: Lab-based schema for identifying each subject. Each subject_id must be unique and should match the name of the corresponding subfolder in the primary folder.
    • pool_id: If data is collected on multiple subjects at the same time, include the identifier of the pool where the data file will be found. If this field is included, it should be the name of the top-level folder inside primary.
    • experimental group: This field refers to the experimental group that a subject is assigned to in the research project.
    • Age: Age of the subject (e.g., in hours, days, weeks, or years), or if unknown, leave it empty. For your convenience, SODA separates this entry into two fields: a number field (e.g., 1, 2, 3) and a unit field (e.g., hours, days, weeks). If an ISO format is expected for this entry, enter the ISO-formatted text in the number field and select N/A for the unit field.
    • Sex: This is the sex of the subject, or if unknown, leave it empty.
    • Species: This is the species of the subject. When users start typing to search for a species, SODA provides species suggestions based on the NCBI taxonomy.
    • Strain: This is the organism strain of the subject.
    • RRID for strain: This is the Research Resource Identifier (RRID) for the strain of the subject. SODA uses SciCrunch to identify the RRID of the strain that users provide.
    • Additional Fields (e.g. MINDS): Provide any additional fields that you would like to include in your subjects.xlsx file.
    • Age category: The age category that the subject belongs to. A search field with suggestions based on a list derived from the UBERON life cycle stages is provided in the interface for your convenience.
    • Age range (min): This is the minimal age (youngest) of the research subjects. The format for this field is numerical value + space + unit (spelled out).
    • Age range (max): This is the maximal age (oldest) of the research subjects. The format for this field is numerical value + space + unit (spelled out).
    • Handedness: This refers to the preference of the subject to use the right or left hand, whenever applicable.
    • Genotype: This refers to the genetic makeup of genetically modified alleles in transgenic animals belonging to the same subject group. Ignore this field if the RRID is already provided.
    • Reference atlas: Enter the reference atlas and organ.
    • Protocol title: This is the title of the research protocol as it appears on Protocols.io once the protocol has been uploaded there.
    • Protocol.io location: This is the Protocols.io URL of the protocol named in the Protocol title field.
    • Experimental log file name: This is a file containing experimental records for each sample, whenever applicable.
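
The rules above for subject_id, pool_id, and the age range fields can be checked with a short script before submission. The following is a minimal sketch, not part of SODA: the function name check_subjects, the column names used as dictionary keys, and the regular expression chosen to represent "numerical value + space + unit (spelled out)" are all assumptions made for illustration. It also assumes that a subject with a pool_id is stored under a folder named after the pool, and that the rows have already been read from the file (for example with Python's csv.DictReader for a csv subjects file).

```python
import re

# Assumed reading of "numerical value + space + unit (spelled out)", e.g. "10 days".
AGE_RANGE_PATTERN = re.compile(r"\d+(\.\d+)? [A-Za-z]+")


def check_subjects(rows, primary_subfolders):
    """Return a list of problems found in the subject rows.

    rows: list of dicts, one per subject (column names are hypothetical).
    primary_subfolders: names of the top-level folders inside primary.
    """
    problems = []
    seen = set()
    for index, row in enumerate(rows, start=1):
        subject_id = (row.get("subject_id") or "").strip()
        if not subject_id:
            # subject_id is the one field that is always mandatory.
            problems.append(f"row {index}: subject_id is missing")
            continue
        if subject_id in seen:
            problems.append(f"row {index}: duplicate subject_id '{subject_id}'")
        seen.add(subject_id)

        # Assumption: pooled subjects live under a folder named after the pool.
        folder = (row.get("pool_id") or "").strip() or subject_id
        if folder not in primary_subfolders:
            problems.append(f"row {index}: no primary subfolder named '{folder}'")

        for column in ("age range (min)", "age range (max)"):
            value = (row.get(column) or "").strip()
            if value and not AGE_RANGE_PATTERN.fullmatch(value):
                problems.append(
                    f"row {index}: {column} '{value}' is not 'number unit'"
                )
    return problems


if __name__ == "__main__":
    example_rows = [
        {"subject_id": "sub-1", "age range (min)": "10 days"},
        {"subject_id": "sub-1", "age range (max)": "3weeks"},
    ]
    for problem in check_subjects(example_rows, {"sub-1"}):
        print(problem)
```

The function returns messages rather than raising an error so that every problem in the file can be reported in one pass.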
