Stand-alone Dataset Repositories

It is important to deposit research data in a repository that can assign a persistent identifier, in addition to a specialized repository such as SoyBase for relevant datasets. Examples of persistent identifiers are DOIs (Digital Object Identifiers), ARKs (Archival Resource Keys), genomic data accessions at the major genomic data repositories (GenBank, EMBL, DDBJ), and variant data accessions at the European Variation Archive (EVA). SoyBase also assigns identifiers internally, but relies on external persistent identifiers for data stability and to aid integration with other resources. Some frequently used repositories are listed below. Please contact us if you have questions or would like to contribute data.

USDA Ag Data Commons (data.nal.usda.gov)
Free service. The Ag Data Commons "About" message describes it as "a public, government, scientific research data catalog and repository available to help the agricultural research community share and discover research data funded by the United States Department of Agriculture and meet Federal open access requirements."

Dryad (DataDryad.org)
Requires free registration to use. Requires a Publishing Charge of $120 up front. Additional charges apply to file packages larger than 20GB. Assigns a DOI to each package. Packages can contain multiple file types. Minimal metadata required. Text searching of user-supplied metadata.

Figshare (Figshare.com)
Requires free registration to use. Provides web access to user-uploaded datasets, PDFs, presentations and videos. Assigns DOIs to files of any format. Minimal metadata required for submission. Text-based searching of metadata. Up to 20GB of free space and an upload limit of 5GB. Suggests CC BY licensing for figures, media, posters, papers and filesets, and CC0 for datasets and metadata (see below for license definitions).

Mendeley Data (data.mendeley.com)
Provides web-accessible storage for datasets. Collects user-supplied metadata and allows text searches of that metadata. Supplies links to internal or external data archives. A DOI is assigned to datasets that pass Mendeley review. Users select the license under which to publish their data (see below for license options). Storage is free at the moment.

Nature Scientific Data (nature.com/sdata)
A peer-reviewed, open access journal for descriptions of scientific datasets. Accepts descriptions of both large and small datasets. Archives the information needed to interpret, reuse and reproduce data. Provides a way to search for datasets using standard vocabularies and metadata collected in a standardized way. Supplies links to the repositories where the data is actually housed. Publication charge of $1675 USD, which covers less than 100GB of storage at Figshare. Additional charges apply for uploads of more than 100GB. All content types are published under a Creative Commons Attribution 4.0 International Licence (CC BY; see below for definition).

Zenodo (Zenodo.org)
Requires free registration to use. Provides web access to user-uploaded datasets, PDFs, posters, presentations and software. Minimal metadata required for submission. Text-based searching. Can restrict access. Provides DOIs for datasets. Accepts only files smaller than 50GB at a time. Users can have multiple datasets. Users must specify a license for all publicly available files (see below for license options). Funded by CERN.
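
Several of the repositories above assign DOIs. As a minimal sketch of how such an identifier can be turned into a stable link, the Python snippet below builds a resolver URL at https://doi.org for a DOI string. The function name doi_to_url and the sample DOI in the usage line are hypothetical and only for illustration, and the pattern is a loose check of the common "10.prefix/suffix" shape, not a full validation of the DOI syntax.

```python
import re
from urllib.parse import quote

# Loose shape check (an assumption, not the full DOI syntax):
# "10.", a numeric registrant prefix of 4 to 9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org resolver URL for a DOI string."""
    cleaned = doi.strip()
    # Accept the common "doi:" label in front of the identifier.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the call.
    print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the resolver URL instead of a repository-specific page address keeps the link valid even if the repository later reorganizes its web site.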

Data Licensing Types -- Creative Commons Licenses

Attribution : CC BY
This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation. This is the most accommodating of licenses offered. Recommended for maximum dissemination and use of licensed materials.

Attribution-ShareAlike : CC BY-SA
This license lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms. This license is often compared to “copyleft” free and open source software licenses. All new works based on yours will carry the same license, so any derivatives will also allow commercial use. This is the license used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.

Attribution-NoDerivs : CC BY-ND
This license allows for redistribution, commercial and non-commercial, as long as it is passed along unchanged and in whole, with credit to you.

Attribution-NonCommercial : CC BY-NC
This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.

Attribution-NonCommercial-ShareAlike : CC BY-NC-SA
This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.

Attribution-NonCommercial-NoDerivs : CC BY-NC-ND
This license is the most restrictive of the six main licenses, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.
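
The six licenses above differ only in three choices: whether commercial use is allowed, whether derivative works are allowed, and whether derivatives must carry the same license. As an illustrative sketch (the function name choose_cc_license and its parameters are hypothetical, not part of any Creative Commons tool), the following Python function maps those choices to the license abbreviations used above.

```python
def choose_cc_license(allow_commercial: bool,
                      allow_derivatives: bool,
                      share_alike: bool = False) -> str:
    """Map three permission choices to one of the six main CC license names."""
    if share_alike and not allow_derivatives:
        # ShareAlike only constrains derivative works, so it cannot be
        # combined with a no-derivatives choice.
        raise ValueError("share_alike requires allow_derivatives=True")

    parts = ["CC BY"]          # every one of the six licenses requires attribution
    if not allow_commercial:
        parts.append("NC")     # NonCommercial
    if not allow_derivatives:
        parts.append("ND")     # NoDerivs
    elif share_alike:
        parts.append("SA")     # ShareAlike
    return "-".join(parts)


if __name__ == "__main__":
    print(choose_cc_license(allow_commercial=True, allow_derivatives=True))
    print(choose_cc_license(allow_commercial=False, allow_derivatives=False))
```

With both permissions granted and no share-alike requirement the function returns CC BY, the most accommodating license; with both denied it returns CC BY-NC-ND, the most restrictive.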