| Evidence-based decision making | Evidence-based decision making is a process for making decisions about a program, practice, or policy that is grounded in the best available research and informed by experiential evidence from the field and contextual evidence. |
| Curated Data | Data is curated by repository personnel to ensure it follows standardized data structures with comprehensive metadata, so that it is reusable by data consumers. |
| Scientific reproducibility | The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently. |
| Systems interoperability | Data is discoverable and harvestable by users and machines with interoperable Application Programming Interfaces (APIs), formats and semantics. This enables machine-actionability and streamlined workflows for researchers. |
| Third Party Platform Integration | Data from repositories can be harvested directly into tools from other service providers, such as federated data systems, open science platforms and compute infrastructure. |
| Major Science Infrastructure data management | Data management for science that involves large collaborations with dedicated facilities, large data volumes, and multinational investments. |
| Cybersecurity | The repository implements appropriate safeguards (e.g., tiered access, credentialing of data users, firewalls) to protect data from inappropriate access (e.g., manipulation of data, exposure of restricted data), with monitoring and response measures for breaches. |
| Sensitive Data Services | Services (e.g., authentication, access agreements, regulatory adherence) that safeguard sensitive data against unwarranted access or disclosure. Sensitive data may include personal information, personal health information, educational records, customer records, financial information, criminal information, geographic information (e.g., detailed locations of endangered species), and confidential employee information. |
| AI/ML Data Provision | Data repositories may provide access to trustworthy AI-ready data that can be used to train, validate and operate AI and ML applications. Additionally, some repositories provide services such as data visitation and annotation functionality that can support AI models. |
| Technological Responsiveness and Evolution | Repository infrastructure evolves to meet emerging technological needs and anticipates the needs of research and user communities. |
| Open Data Access | Data repositories provide inclusive and equitable access to data, minimizing barriers to participation while remaining mindful of legal and sensitive data constraints. |
| Reuse of Data | Repositories ensure that sufficient documentation and metadata are available to support understanding and reuse of data. |
| Data Deposit Services | Repositories accept data and metadata based on defined criteria to ensure relevance and understandability for users. Researchers are typically guided through a documented data deposit process with the support of repository data stewards. |
| Researcher Community Focused Services | Data repositories are expected to meet the needs of their target user community, which can vary depending on level of maturity and customized needs. This may include development or adoption of community data practices, tools and services. Repositories may use different means to gather feedback and inputs from their community, such as surveys, interviews, focus groups, and more. |
| Near-real-time access | Many data repositories provide near-real-time data streams with minimal latency that serve monitoring and forecasting systems (e.g., meteorological data for weather predictions, earthquake detections), benefiting both science and society. |
| Data Quality Assurance | Repositories follow processes for assessing, measuring, and improving dataset quality so that datasets can be distributed with sufficient information (e.g., quality annotations, uncertainty or bias information) for users to evaluate fitness-for-purpose. |
| Aggregated Data Products | Some repositories provide curated and aggregated datasets, such as long-term time series or geospatially aggregated data for a region, enabling studies that require more expansive data (e.g., climate change). |
| Rare Dataset Access | Data repositories preserve and provide access to rare data, which may include, for example, data from remote areas, historical times, unique situations (e.g., rare disease outbreaks) and natural disasters. The notion of rare and valued data may vary from one community to another. |
| Geographic, Temporal and Thematic Data Discovery and Accessibility | Repositories streamline access according to particular locations, time periods, or themes. |
| Categorized, Classified and Labelled Data | Repositories often provide a consistent and standardized approach to categorizing, classifying and labelling (i.e., annotating, coding) data results or features. In many cases, these enhancements utilize community-accepted ontologies. |
| Trustworthiness | Repositories can be recognized as trusted sources of data as a result of adherence to best practices, which is often demonstrated through certification. |
| Long-term Active Data Preservation | Repositories assure long-term preservation of data and have contingency plans for file format transfer and future data migration, such that data holdings continue to be interpretable and usable into the future. |
| Technical Quality Assurance | Repositories provide technical quality assurance by ensuring datasets comply with a range of standard criteria including acceptable formats, metadata schema, metadata content, and links to other digital objects. |
| Persistent Identifier Application | Repositories assign citations and persistent identifiers (PIDs) to digital objects with comprehensive metadata, including relationships to other relevant PIDs. PIDs remain accessible even when digital objects are no longer available. |
| Best Practice Networks | Repositories provide stability, resourcing, and visibility to support network and relationship building. This, in turn, supports trusted community standards. |
| Cost-Effective Data Management | Data repositories provide expertise and services that enable research data management to be executed more cost-effectively than when researchers attempt to perform these tasks themselves. |