Version: 8.x

How to structure the dataset description metadata file


SODA helps you prepare the dataset description metadata file conveniently. Although SODA automatically generates the file in the required structure, this page explains how the file must be structured according to the SPARC rules, to give some insight into the file that SODA generates.

How to

  • Format: The dataset description file is accepted in either xlsx, csv, or json format. SODA generates it in the xlsx format based on the template provided by the Curation Team.
  • Location in the dataset: The dataset_description file must be included in the high-level dataset folder.
  • Content: Rows highlighted in yellow are optional.
    • Metadata Version: The value in this row must not be changed.
    • Type: The type of this dataset, specifically whether it is experimental or computational.
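As a rough illustration of the format and location rules above, the following Python sketch looks for a dataset_description file directly inside the high-level dataset folder. The function name and the example folder path are hypothetical and are not part of SODA; the sketch only assumes that the file is named dataset_description with one of the three accepted extensions.

```python
from pathlib import Path

# Accepted formats as listed above.
ACCEPTED_EXTENSIONS = {".xlsx", ".csv", ".json"}


def find_dataset_description(dataset_root):
    """Return dataset_description files found directly in the top-level folder.

    Hypothetical helper for illustration only; subfolders are deliberately
    not searched because the file must sit in the high-level dataset folder.
    """
    root = Path(dataset_root)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file()
        and path.stem == "dataset_description"
        and path.suffix.lower() in ACCEPTED_EXTENSIONS
    )
```

Calling `find_dataset_description("my_dataset")` (a hypothetical folder) returns an empty list when the file is missing or was placed in a subfolder.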

Basic information:

  • Title: Descriptive title for the dataset. This field should exactly match your dataset name on Pennsieve.
  • Subtitle: Brief description of the study and the dataset. Equivalent to the abstract of a scientific paper. This could match the subtitle provided on Pennsieve.
  • Keywords: A set of 3-5 keywords, other than those already mentioned in the elements above, that will aid in searching for your dataset once it is published on the SPARC portal. Each keyword must be provided in a separate column.
  • Funding: Specify the number of your SPARC award (mandatory, in the OT2OD0XXXXX format) and any other funding awards, if applicable (optional). If multiple award numbers are specified, each award number must be specified in a separate column.
  • Acknowledgements: Acknowledgements beyond funding and contributors.
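The keyword and funding rules above can be sketched as a small check. This is a minimal sketch, not SODA's own validation: the function name is hypothetical, and it assumes that each X in OT2OD0XXXXX stands for a single digit, which the text does not state explicitly. The award numbers in the usage note are made up.

```python
import re

# Assumption: each "X" in OT2OD0XXXXX is one digit.
SPARC_AWARD_PATTERN = re.compile(r"OT2OD0\d{5}")


def check_basic_information(keywords, funding):
    """Return a list of problems found in the keyword and funding cells.

    keywords: one string per keyword column.
    funding: one string per award-number column.
    """
    problems = []
    if not 3 <= len(keywords) <= 5:
        problems.append(f"expected 3-5 keywords, found {len(keywords)}")
    # At least one award must look like a SPARC award number.
    if not any(SPARC_AWARD_PATTERN.fullmatch(award.strip()) for award in funding):
        problems.append("no SPARC award number in the OT2OD0XXXXX format")
    return problems
```

For example, `check_basic_information(["vagus", "stimulation", "rat"], ["OT2OD012345"])` returns an empty list, while a single keyword combined with an award such as `"R01-XYZ"` yields two problems.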

Study information:

  • Study purpose: A description of the study purpose for the structured abstract.
  • Study data collection: A description of the study data collection process for this dataset. Used to generate the structured abstract.
  • Study primary conclusion: A description of the primary conclusion drawn from the study for the structured abstract.
  • Study organ system: The major organ systems related to this study.
  • Study approach: The experimental approach or approaches taken in this study.
  • Study technique: The experimental techniques used in this study.
  • Study collection title: Title of the larger collection to which this dataset belongs.

Contributor information:

  • Contributor name: Name of any contributors to the dataset. These individuals need not have been authors on any publications describing the data, but should be acknowledged for their role in producing and publishing the dataset. If there is more than one, add each contributor in a new column. Each contributor must have at least one affiliation and at least one role, and a Contact Person or Corresponding Author role must be assigned.
  • Contributor ORCiD: This is the contributor's ORCID ID number. If you do not have one, you can sign up for one at It must be in the format It is not mandatory but highly recommended.
  • Contributor affiliation: Institutional affiliation for contributors. A ROR ID in the format could be provided if available. If a contributor has multiple affiliations, they must be separated by semicolons in a single cell.
  • Contributor role: Role(s) of the contributor. Each role must be one of the following roles provided by the DataCite schema: PrincipalInvestigator, Creator, CoInvestigator, ContactPerson, DataCollector, DataCurator, DataManager, Distributor, Editor, Producer, ProjectLeader, ProjectManager, ProjectMember, RelatedPerson, Researcher, ResearchGroup, Sponsor, Supervisor, WorkPackageLeader, Other. The definition of each of these roles is provided in the document here. At most one PrincipalInvestigator and at least one CorrespondingAuthor are required. If more than one role is to be specified for a contributor, the roles must be comma-separated in a single cell.
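The separator and role rules for contributors can be sketched as follows. This is an illustrative sketch under stated assumptions, not SODA's implementation: the helper names are hypothetical, the principal investigator role is spelled PrincipalInvestigator, and CorrespondingAuthor is added to the allowed set because the text requires it even though it is not in the enumerated list.

```python
# Roles copied from the list above. Assumptions: the spelling
# "PrincipalInvestigator", and the addition of "CorrespondingAuthor".
ALLOWED_ROLES = {
    "PrincipalInvestigator", "Creator", "CoInvestigator", "ContactPerson",
    "DataCollector", "DataCurator", "DataManager", "Distributor", "Editor",
    "Producer", "ProjectLeader", "ProjectManager", "ProjectMember",
    "RelatedPerson", "Researcher", "ResearchGroup", "Sponsor", "Supervisor",
    "WorkPackageLeader", "Other", "CorrespondingAuthor",
}


def parse_cell(cell, separator):
    """Split a multi-value cell and drop surrounding whitespace and empty items."""
    return [item.strip() for item in cell.split(separator) if item.strip()]


def check_contributor_roles(role_cells):
    """Check role cells, one comma-separated string per contributor column."""
    problems = []
    all_roles = []
    for index, cell in enumerate(role_cells, start=1):
        roles = parse_cell(cell, ",")
        if not roles:
            problems.append(f"contributor {index}: no role given")
        for role in roles:
            if role not in ALLOWED_ROLES:
                problems.append(f"contributor {index}: unknown role {role!r}")
        all_roles.extend(roles)
    # Dataset-wide rules: at most one PI, at least one corresponding author.
    if all_roles.count("PrincipalInvestigator") > 1:
        problems.append("more than one PrincipalInvestigator")
    if "CorrespondingAuthor" not in all_roles:
        problems.append("no CorrespondingAuthor")
    return problems
```

The same `parse_cell` helper can split an affiliation cell when called with `";"` as the separator, since affiliations are semicolon-separated while roles are comma-separated.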

Related protocol, paper, dataset, etc.:

  • Identifier description: A description of the referent of the identifier.
  • Relation type: The relationship that this dataset has to the related identifier. For example, for the originating article the relation would read: this dataset IsDescribedBy the originating article.
  • Identifier: DOIs* of published articles that were generated from this dataset.
  • Identifier type: Short description of URL content.

Participant information:

  • Number of subjects: Number of unique subjects in this dataset. It should match the subjects metadata file.
  • Number of samples: Number of unique samples in this dataset. It should match the samples metadata file. Set to zero if there are no samples.
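One way to cross-check these counts is to count the unique identifiers in the subjects or samples metadata file and compare the result with the declared number. The sketch below assumes the metadata is available as CSV text and that the identifier column is named "subject id"; both the helper name and the column name are assumptions for illustration and are not defined on this page.

```python
import csv
import io


def count_unique_ids(csv_text, id_column):
    """Count distinct non-empty values in one column of CSV text.

    Hypothetical helper; the column name is supplied by the caller.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    unique_ids = set()
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value:
            unique_ids.add(value)
    return len(unique_ids)


# Example with made-up data: sub-1 appears twice but is counted once.
subjects_csv = "subject id,species\nsub-1,rat\nsub-2,rat\nsub-1,rat\n"
declared_number_of_subjects = 2
counts_match = count_unique_ids(subjects_csv, "subject id") == declared_number_of_subjects
```

When there are no samples, the declared number is zero and a header-only samples file gives a matching count of zero.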

URLs (if still private) / DOIs (if public) of protocols related to this dataset.
