The Workshop
Our workshop on Evaluating Dataset AI Readiness for Data Repositories: Considerations and Approaches, held earlier this month, was a great success! Harnessing AI for science and society is a key objective for organizations around the world, from international to national to disciplinary scales. Data repositories have an essential role to play in providing quality data to support quality outputs for AI. This workshop was for data repository representatives who are involved with curating metadata and data (or providing tools for that curation), although anyone was welcome and many interested parties joined us. The emphasis was on helping repositories deliver AI-ready data by examining and discussing the ESIP AI Data Readiness Checklist, created by the ESIP Data Readiness cluster.
Nearly 50 people joined us from across the world. Attendees came from Australia, Austria, Canada, Chile, Colombia, Finland, Hungary, India, Portugal, South Africa and several other places. Attendees also represented a diverse range of disciplines, including Agriculture, Earth and Ocean Sciences, Health, Neuroscience, Social Sciences, and more.
We began with introductions and an acknowledgement that the WDS-ITO office is based on the territory of the Lək̓ʷəŋən (Songhees and Esquimalt) Peoples; we also respect that the historical relationship of the Lək̓ʷəŋən and W̱SÁNEĆ Peoples with the land continues to this day. Douglas Rao, representing the ESIP Data Readiness cluster, then presented an overview of the cluster and the checklist and the rationale behind them. The majority of the workshop was dedicated to an in-depth review and discussion of each of the checklist categories. Finally, a few related initiatives were highlighted, including FARR, RDA FAIR4ML IG and more.
View the slides, Watch the recording
The Pilot
Following the success of this workshop, we at the WDS-ITO are going to run a pilot project for AI Data Readiness using a worksheet based on this checklist. We will assist participants in evaluating a dataset of their choice. More specifically, the pilot will be a dialogue: we will help participants evaluate a dataset and, with their permission, use their feedback to improve the checklist.
The motivation to pursue a pilot execution of the ESIP AI Data Readiness Checklist is two-fold:
1) to assist data repositories in their efforts to produce more AI-ready datasets;
2) to more fully evaluate and discuss the efficacy of the checklist.
We are targeting a small cohort of 6 to 12 data repositories with diverse disciplinary and geographical representation. A kickoff meeting will introduce the cohort members to one another, review the checklist and help them determine which datasets they would like to evaluate. They will have about 3 months to complete the worksheet, with a mid-term review meeting and one-on-one sessions along the way to review progress and discuss challenges. A concluding meeting will allow cohort members to summarize their results and lessons learned. At the conclusion of the pilot, the WDS-ITO will compile feedback for the ESIP Data Readiness Cluster and report key findings.
By the end of the pilot, participants will have a clearer picture of how AI-ready a specific dataset is, where the gaps are in how that dataset is managed, and what it takes to make a dataset AI-ready. Anyone interested in joining this pilot can email us at wds-ai-workshop@oceannetworks.ca by November 1, 2024.