| Evidence-based Decision Making | Evidence-based decision-making is a process for making decisions about a program, practice, or policy that is grounded in the best available research and informed by contextualized experiential evidence and by experimental or observational data. Curated, contextualized, high-quality data is foundational to the information and knowledge that governments, industry, academia, international organizations and other policy-making bodies use in decision-making. |
| Data Curation | Data curation takes place throughout the data lifecycle. Repositories offer data curation services provided by qualified personnel in coordination with researchers. Curators ensure that the data follows standardized data structures with comprehensive metadata, following community best practices where applicable, so that the data is understandable and reusable by data consumers. The work of curators is ongoing to ensure data and metadata remain usable in the present and into the future. |
| Scientific Reproducibility and Robustness | Data repositories safeguard the integrity of the scientific record by ensuring that the data underlying research findings, together with associated metadata on the experimental design and/or computational procedure, are preserved and accessible. Preserving these data and metadata is a necessary step toward reproducibility, which the Turing Way handbook defines as follows: “a result is reproducible when the same analysis steps performed on the same dataset consistently produces the same answer.” (https://book.the-turing-way.org/reproducible-research/overview/overview-definitions.html). By the same token, different analyses of the same data can be used to establish a similar, and therefore robust, result. |
| Systems Interoperability | Data and metadata are discoverable and harvestable by users and machines with interoperable Application Programming Interfaces (APIs), formats and semantics. This enables machine-actionability and streamlined workflows for research, and supports synthesis of diverse collections. |
| Third Party Platform Integration | Repositories can be designed so that appropriately licensed data and metadata integrate with tools and services from other providers, such as federated data systems, open science platforms, computing infrastructures and search engines. These integrations streamline data access and enhance the usability of the data and metadata. |
| Data Management of Major Research Infrastructure | Major research infrastructure (facilities established through large-scale, sometimes international, investments) needs dedicated data management systems, processes and governance that work together with the facility. This collaboration is key to ensuring that the data generated are preserved, accessible, and usable both within the project and by the wider global community. Major research infrastructure supports collaborative, multinational projects that involve substantial data volumes and require specialized computing resources, and it relies on dedicated facilities and workforce to conduct ambitious scientific research. |
| Cybersecurity | The repository implements appropriate safeguards (e.g., tiered access, credentialing of data users, firewalls) to prevent cyber-attacks and to protect data (e.g., against manipulation or unauthorized access to restricted data), together with monitoring for breaches and measures to respond to them. |
| Sensitive Data Services | Repositories can provide services such as authentication, anonymisation, regulatory adherence, secure transfer and controlled access that safeguard sensitive data, in its many forms, against unwarranted access or disclosure. |
| Artificial Intelligence Data Provision | Data repositories provide access to trustworthy, AI-ready data that can be used to train, validate and operate AI applications. Additionally, some repositories provide services such as data visitation and annotation functionality that can support AI models. Transparency and provenance of data usage by AI applications are important for demonstrating reliable outputs and for giving appropriate credit. |
| Technological Responsiveness and Evolution | Repository infrastructure evolves to meet emerging technological needs and anticipates the needs of research and user communities, as well as institutions. |
| Data Reuse | Repositories strive to ensure that sufficient metadata and documentation are available to support the interpretation and reuse of data by humans and machines. Data are provided in sustainable formats with a clear licence to govern the terms of reuse. |
| Open Data Access | Data repositories provide inclusive and equitable access to data, minimizing barriers to participation and employing appropriate strategies, including controlled access where legal or sensitivity constraints apply. Open data access contributes to meeting national and institutional open science obligations and policies. |
| Data Deposit Services | Repositories accept data and metadata based on defined criteria such as a deposit policy to ensure relevance and understandability for users. Depositors are often guided through a documented data deposit process with the support of repository data stewards. |
| Community Focused Services | Data repositories provide services to meet the needs of their target user community; these needs can vary with the community's level of maturity and its specific requirements. This may include developing or adopting community data practices, visualizations, tools and services. Repositories may use different means, such as surveys, interviews and focus groups, to regularly gather feedback and input from their community. |
| Near-Real-Time Access | Many data repositories provide near-real-time data streams with minimal latency that serve monitoring and forecasting systems (e.g., meteorological data for weather prediction, seismic data for earthquake detection). |
| Data Quality Assurance | Repositories follow processes for assessing, measuring and improving dataset quality such that they can be distributed with sufficient information (e.g., quality annotations, uncertainty or bias information) for users to evaluate fitness-for-purpose. |
| Aggregated Data Products | Many repositories provide curated and aggregated datasets such as long-term time-series or geospatially aggregated data for a region, enabling studies that require more expansive data (e.g., climate change, longitudinal surveys). |
| Rare Dataset Access | Data repositories provide access and preservation for rare data which may include, for example, those from remote areas, historical times, unique situations (e.g., rare disease outbreaks, once-in-a-lifetime events) and natural disasters. The notion of rare and valued data may vary from one community to another. |
| Multi-facet Data Discovery and Accessibility | Repositories can streamline discovery and access by allowing users to search and filter data by location, time period, subject, and/or theme. |
| Categorized and Labelled Data | Repositories often provide a consistent and standardized approach to categorizing and labelling (i.e., annotating, coding) data results or features. In many cases, these enhancements utilize community-accepted ontologies which are often developed with data repository staff input. |
| Trustworthiness | Repositories can be recognized as trustworthy data sources by formal certification and/or adherence to community-accepted standards and best practices. Characteristics of trustworthy data repositories include transparent governance, community-compliant metadata and data standards, appropriate curation and long-term preservation. |
| Long-term Active Data Preservation | Repositories commit to the long-term preservation of data by implementing contingency plans for file format migration and other future data migrations, ensuring that data holdings remain interpretable and usable over time. Re-appraisal and retention policies inform how long data are retained. |
| Technical Quality Assurance | Repositories provide technical quality assurance by ensuring datasets comply with a range of standard criteria including acceptable formats, metadata schema, persistent identifiers, and links to other digital objects. |
| Persistent Identifier Application | Repositories recommend citations and assign persistent identifiers (PIDs) to digital objects with comprehensive metadata, including relationships to other relevant PIDs. Repositories ensure that PIDs remain resolvable even when the research object is no longer available, so that its metadata can be accessed and rationale for its removal can be provided. Citations enable downstream impact to be measured and reported. |
| Best Practice Networks | Repositories provide stability, resourcing, and visibility that support networking and relationship building. This, in turn, supports the development of trusted community standards. |
| Cost-Effective Data Management | Data repositories provide expertise and services that enable research data management to be executed more cost-effectively than when researchers perform these tasks themselves. |
| |
| New benefits | |
| Interdisciplinary Study Potential | Repositories facilitate interdisciplinary research by making datasets discoverable and usable across different academic fields, fostering collaboration and innovation beyond traditional disciplinary boundaries. These datasets promote novel problem-solving, addressing complex global challenges such as sustainability, public health, and urban development. |
| Community Driven Standards and Protocols | Data repository representatives often participate in working groups and committees to develop and maintain community standards and protocols. Standards and protocols developed with broad community input are more likely to be adopted and adhered to. |
| Provision of Education Resources | Data repositories provide openly accessible datasets that educators can integrate into Open Educational Resources (OER), enriching course materials and enhancing the availability of freely accessible, high-quality teaching resources for diverse learner communities. |
| User Training and Support | Data repositories provide comprehensive training programs and ongoing support to help users effectively utilize data interfaces and related tools. |
| Provenance and Versioning | Repositories track and preserve different versions of datasets, enabling users to review modifications, access previous versions, and understand the progression of data changes. Provenance information for data processing and modifications enables users to trace the origin and history of data. |
| Data Usage and Impact Analysis | Repositories track dataset impact through citation counts, usage statistics, and user demographics, within ethical and privacy constraints, enabling researchers to demonstrate the value of their data contributions and to satisfy funder mandates. |
| Researcher Collaboration Building | Openly accessible datasets provide a common meeting point for researchers, helping to build collaboration around those datasets. |
| Citizen Science | Data repositories support citizen science initiatives through partnerships, developing customized data ingestion and displays as needed. These partnerships serve the participating communities and also make their data accessible for research. |
| Data Licensing and Legal Compliance | Repositories provide clear licensing options and ensure compliance with legal and regulatory requirements, helping users understand how data can be used and shared. |
| Data Rescue Services | Repositories offer services to recover and preserve at-risk or legacy data, ensuring that valuable information is not lost due to technological obsolescence or neglect. |
| Bias Mitigation | Repositories provide tools and guidelines to identify and mitigate biases in datasets, promoting ethical and equitable use of data. |