AIthena project deliverable D2.2, titled “ML DevOps-oriented data lifecycle governance and provenance framework”, focuses on ML DevOps tools for data governance and provenance. The primary objective of Task 2.2 (within Work Package 2) is to develop frameworks that support the creation of reliable, transparent and accountable AI systems for Connected, Cooperative and Automated Mobility (CCAM). The AIthena project emphasises balancing innovation with ethical responsibility while preserving privacy and complying with legislative frameworks such as the EU AI Act.
Deliverable D2.2 defines DataOps as a set of practices aimed at improving the quality and speed of data analytics. Key principles include continuous integration and deployment (CI/CD), collaboration, automation, data quality and governance, reproducibility, monitoring and data provenance. Data governance and provenance are needed to ensure trust, safety, regulatory compliance and ethical accountability in ML systems, especially in high-stakes domains such as autonomous mobility.
In the AIthena project, Data Cards are introduced as structured summaries that provide essential information about datasets, thereby enhancing transparency and accountability. AIthena researchers have also considered various DataOps tools, such as FiftyOne for inspecting dataset characteristics and curating data, and ClearML for dataset versioning and integration into DataOps pipelines.
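To make the idea of a Data Card concrete, a card can be thought of as a small structured record stored alongside the dataset. The sketch below is an illustration only and is not the Data Card template used in D2.2: the `DataCard` class, its field names and the example dataset are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DataCard:
    """Hypothetical minimal Data Card; field names are illustrative assumptions."""

    name: str
    version: str
    description: str
    source: str
    license: str
    num_samples: int
    intended_use: str
    known_limitations: List[str] = field(default_factory=list)
    contains_personal_data: bool = False

    def to_json(self) -> str:
        # Serialise with sorted keys so the card diffs cleanly under version control.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example card for an invented dataset (all values are placeholders).
card = DataCard(
    name="example-urban-driving",
    version="0.1.0",
    description="Front-camera frames from urban driving scenes.",
    source="Hypothetical test-vehicle recordings",
    license="CC BY-NC 4.0",
    num_samples=12000,
    intended_use="Training and evaluating object detectors for CCAM research.",
    known_limitations=["Few night-time scenes", "Single geographic region"],
    contains_personal_data=True,
)

print(card.to_json())
```

Keeping the card as plain JSON next to the data means it can be versioned together with the dataset and reviewed like any other project artefact.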
Furthermore, deliverable D2.2 outlines methods for preparing data and creating provenance pipelines, including data aggregation, selection, cleaning, exploration, visualisation, version control and measuring data drift.
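As a rough illustration of what measuring data drift can look like, the sketch below computes the Population Stability Index (PSI) between a reference sample and a newer sample of one numeric feature. PSI is a swapped-in example metric; the summary above does not say which drift measures D2.2 uses, and the function names, bin count and smoothing constant are assumptions.

```python
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values per bin; out-of-range values fall into the first or last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = len(edges) - 2  # default to the last bin
        for i in range(len(edges) - 1):
            if v < edges[i + 1]:
                idx = i
                break
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI with equal-width bins derived from the reference sample."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    ref_p = bin_proportions(reference, edges)
    cur_p = bin_proportions(current, edges)
    psi = 0.0
    for r, c in zip(ref_p, cur_p):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) for empty bins
        psi += (c - r) * math.log(c / r)
    return psi


# Synthetic example: the "current" sample is the reference shifted by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

print(population_stability_index(reference, reference))
print(population_stability_index(reference, shifted))
```

Identical samples give a PSI of zero, and the value grows as the two distributions diverge. A commonly cited rule of thumb treats values above roughly 0.2 as a notable shift, although suitable thresholds depend on the application.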
In their report, the authors of D2.2 emphasised the importance of data lifecycle governance and provenance in building trustworthy AI systems for CCAM. They highlighted the project’s contributions to transparency, privacy preservation and ethical accountability in AI development.
To learn more about AIthena’s work on ML DevOps tools for data governance and provenance, read the project deliverable: AITHENA-D2.2-ML-DevOps-oriented-data-life-cycle-governance-and-provenance.pdf