Decoding the EU AI Act: Data Governance requirements

Ana Teles
Sep 28, 2023

Updated: Jul 31, 2024

People standing in front of an AI-connected city (generated with Midjourney)

Written by Ana Carolina Teles, AI & GRC Specialist at Palqee Technologies

The EU Artificial Intelligence Act, introduced by the European Commission in 2021, is a comprehensive legislative framework governing the use of artificial intelligence technologies across European Union member states. It sets out requirements for providers of high-risk AI systems, and data governance is a central element of those requirements.

In this post, we outline a step-by-step roadmap for translating these data governance requirements into practical actions.


Understanding Data Governance under the EU AI Act

Before turning to the practical implementation of data governance, it is worth looking at its definition:

Data governance, in essence, is the orchestration of people, processes, and technology to ensure that data is managed as an asset across the organisation. This means treating data with the same respect and diligence as any other strategic asset, ensuring its accuracy, accessibility, consistency, and security. 

Implementing good data governance can be quite challenging for AI companies. Many struggle to get access to sufficient, representative data, and some of the methods used to obtain data, such as scraping the internet, raise ethical and sometimes legal concerns.

Nonetheless, the EU AI Act highlights the need for detailed reporting, forcing companies to implement and manage data governance effectively.

The good news is that most AI companies already follow some of these requirements to an extent, because cleaning and preparing data to achieve good generalisation is considered best practice in AI. The EU AI Act demands that these practices are turned into defined processes and procedures, so that the same level of quality is maintained consistently.

Establishing a Data Governance Framework

According to Article 10 of the EU AI Act, a comprehensive data governance framework involves the strategic management of data assets, encompassing design, collection, preparation, and more. It should address the following areas, covering training, validation, and testing data sets:

Design Choices: According to the Act, the AI system's data design choices must align with its intended purpose and ethical considerations.

This means that the provider must ensure that the system's data design choices, including the selection of a diverse dataset, bias review, incorporation of underrepresented groups, meticulous feature selection, data pre-processing, and regular audits, are in harmony with the system's intended goal of delivering precise and impartial generalisation.

Data Collection: Protocols for data collection should be implemented, ensuring data accuracy, legality, and relevance.

The provider should establish precise data collection guidelines that outline the required data types and their sources. It should then create a validation process, involving cross-referencing and validation checks, to ensure data accuracy. In addition, it must comply with applicable data protection laws and regulations, including obtaining the permissions required to collect personal data. Lastly, it should keep the data relevant by regularly reviewing and updating its data collection procedures as requirements evolve.
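
The validation checks described above could be sketched as follows. This is only a minimal illustration, not a method prescribed by the Act: the field names, the list of approved sources, and the `validate_record` helper are all hypothetical and would come from your own data collection guidelines.

```python
from dataclasses import dataclass, field

# Hypothetical guideline: fields every record needs, and sources that have
# been approved (for example, after a legal review).
REQUIRED_FIELDS = {"record_id", "source", "collected_on", "consent_obtained"}
APPROVED_SOURCES = {"customer_survey", "licensed_dataset"}


@dataclass
class ValidationResult:
    record_id: str
    issues: list = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.issues


def validate_record(record: dict) -> ValidationResult:
    """Check one collected record against the hypothetical guidelines."""
    result = ValidationResult(record_id=str(record.get("record_id", "<missing>")))

    missing = REQUIRED_FIELDS - set(record)
    if missing:
        result.issues.append(f"missing fields: {sorted(missing)}")

    if record.get("source") not in APPROVED_SOURCES:
        result.issues.append(f"source not approved: {record.get('source')!r}")

    # Personal data should only be kept when permission was recorded.
    if record.get("contains_personal_data") and not record.get("consent_obtained"):
        result.issues.append("personal data without recorded permission")

    return result


records = [
    {"record_id": 1, "source": "customer_survey", "collected_on": "2024-05-01",
     "consent_obtained": True, "contains_personal_data": True},
    {"record_id": 2, "source": "web_scrape", "collected_on": "2024-05-02",
     "consent_obtained": False, "contains_personal_data": True},
]
results = [validate_record(r) for r in records]
for r in results:
    print(r.record_id, "OK" if r.is_valid else r.issues)
```

Keeping the list of issues per record, rather than a simple pass or fail flag, gives you something concrete to store as evidence that the checks were carried out.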

Data Preparation: Operations such as annotation, labelling, cleaning, enrichment, and aggregation are mandated by the proposed Act to enhance data quality.

As with the previous point, it all boils down to defining processes and guidelines. The company should establish clear processes that ensure consistency and accountability in how data is prepared, and define specific expectations for employees engaged in the development of AI systems.

Assumptions Formulation: Assumptions about the data's measurement and representation should be formulated, maintaining transparency.

During this stage, a record of data assumptions regarding measurement and representation should be maintained. Clear reasoning should be provided for these assumptions, outlining what the data is expected to represent and how the model should perform during training. Company stakeholders should be involved in reviewing and endorsing these assumptions to enhance transparency and effectively mitigate risks.

Availability Assessment: Data availability, quantity, and suitability should be assessed to ensure the adequacy of training, validation, and testing data sets.

Document this assessment and have a second person review it. The aim of this exercise is to provide evidence of the considerations made during your data set assessment.
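
One way to produce such evidence is a small script that summarises each data set. The sketch below is illustrative only: the thresholds, the `region` attribute, and the `assess_availability` helper are assumptions, and suitable values depend on the intended purpose of your system.

```python
from collections import Counter

# Hypothetical thresholds, kept tiny so the example is easy to follow.
MIN_ROWS_PER_SPLIT = 3
MIN_ROWS_PER_GROUP = 1


def assess_availability(splits: dict, group_key: str) -> dict:
    """Summarise size and group coverage for each data split."""
    all_groups = {row[group_key] for rows in splits.values() for row in rows}
    report = {}
    for name, rows in splits.items():
        counts = Counter(row[group_key] for row in rows)
        report[name] = {
            "rows": len(rows),
            "enough_rows": len(rows) >= MIN_ROWS_PER_SPLIT,
            # Groups seen somewhere in the data but too rare in this split.
            "underrepresented_groups": sorted(
                g for g in all_groups if counts[g] < MIN_ROWS_PER_GROUP
            ),
        }
    return report


splits = {
    "training": [{"region": "north"}, {"region": "south"}, {"region": "north"}],
    "validation": [{"region": "north"}, {"region": "south"}, {"region": "south"}],
    "testing": [{"region": "north"}, {"region": "north"}],
}
report = assess_availability(splits, group_key="region")
for split_name, summary in report.items():
    print(split_name, summary)
```

The resulting report can be saved alongside the reviewer's sign-off, so the assessment and its review are documented together.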

Bias Examination: The EU framework mandates the scrutiny of data sets for possible biases that may affect AI system outcomes.

This process could involve using advanced analytical tools to identify patterns, evaluating demographic disparities, and implementing corrective measures to ensure unbiased and equitable results, aligning with ethical and legal considerations.
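
As a simple illustration of evaluating demographic disparities, the sketch below compares the share of positive labels between groups. The column names and the ratio used here are assumptions chosen for the example; the Act does not prescribe a particular metric, and a real review would usually combine several.

```python
from collections import defaultdict


def positive_rate_by_group(rows, group_key, label_key):
    """Return the share of positive labels for each demographic group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        positives[row[group_key]] += int(row[label_key])
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical labelled data: "approved" is the outcome being examined.
rows = [
    {"group": "A", "approved": 1}, {"group": "A", "approved": 1},
    {"group": "A", "approved": 0}, {"group": "A", "approved": 1},
    {"group": "B", "approved": 1}, {"group": "B", "approved": 0},
    {"group": "B", "approved": 0}, {"group": "B", "approved": 0},
]
rates = positive_rate_by_group(rows, "group", "approved")
ratio = disparity_ratio(rates)
print(rates, ratio)
```

A low ratio does not prove unlawful bias on its own, but it flags where the data set deserves closer examination and possibly corrective measures.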

Addressing Gaps and Shortcomings: Any data gaps or shortcomings should be identified and addressed to enhance the overall data quality.

If you don’t have enough data, consider methods such as data enrichment, as previously mentioned, or using synthetic data. These methods need to be assessed carefully, on a case-by-case basis and against the intended purpose of the high-risk AI system.

What should you do if you don’t have enough data to ensure effective data governance?

The most important thing is that you are aware of potential limitations in your data set, and that you have measures in place to identify them and understand how they can influence your model’s performance.

If finding enough data, or data that is representative of the intended purpose of your high-risk AI system, is challenging, include this as part of your risk management process. Consider what measures you can implement to reduce risks and improve robustness over time, and be clear about which risks you’re willing to accept and why.

Make sure to inform your customers about potential risks and shortcomings, and keep them informed about how you’re improving your system. Regulators and stakeholders will want to see evidence of your thought and decision-making process to ensure responsible AI given your available resources, not a 100% perfect and unbiased AI, which regulators recognise is impossible.

Personal Data Processing for Bias Detection

To prevent discrimination resulting from biases in AI systems, the EU AI Act permits the processing of special categories of personal data for bias monitoring, detection, and correction. These special categories encompass sensitive data, including information related to racial or ethnic origin, religious or philosophical beliefs, genetic data, biometric data, health data, and more.

While the processing of special categories of personal data is permitted, it remains subject to European data protection laws, which require suitable measures to protect such data.

To ensure alignment with these laws, the Act makes this processing subject to appropriate safeguards, including technical limitations on the re-use of the data and state-of-the-art security and privacy-preserving measures such as pseudonymisation and encryption.
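
Pseudonymisation, one of the privacy-enhancing methods mentioned above, can be sketched as replacing direct identifiers with a keyed hash while leaving the attribute needed for bias analysis in place. This is a minimal illustration under stated assumptions: the column names are hypothetical, and in practice the key would be managed and stored separately from the data.

```python
import hashlib
import hmac

# Assumption: a real key would come from a secrets manager and be stored
# separately from the data set. It is hard-coded here only for the demo.
SECRET_KEY = b"demo-key-do-not-use-in-production"
DIRECT_IDENTIFIERS = ("name", "email")  # hypothetical column names


def pseudonymise(record: dict) -> dict:
    """Replace direct identifiers with a keyed hash; keep other fields."""
    identity = "|".join(str(record[k]) for k in DIRECT_IDENTIFIERS)
    token = hmac.new(SECRET_KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["subject_token"] = token
    return cleaned


record = {"name": "Jane Doe", "email": "jane@example.com",
          "ethnic_origin": "group_a", "outcome": 1}
safe_record = pseudonymise(record)
print(safe_record)
```

Because the same person always maps to the same token, records can still be linked for bias monitoring without exposing names or email addresses to the people running the analysis.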

Continuous Monitoring and Updates

Data governance is an ongoing process. Your company should implement mechanisms for continuous monitoring of data quality, biases, and updates to ensure that the AI system remains compliant and effective throughout its lifecycle. These mechanisms can include:

Routine Data Quality Assessment: Adopt a proactive stance by regularly assessing the quality and accuracy of the data used to train, validate, and test AI systems.
Stay Current with Updates: In the fast-evolving landscape of AI, it's essential to keep the AI system updated with the latest data, algorithms, and industry best practices.
Stakeholder Collaboration: Companies should involve relevant stakeholders, including data scientists, domain experts, and legal professionals, to collaboratively oversee the continuous monitoring and updates.
Documentation and Reporting: Maintain detailed documentation of the monitoring and update processes, including data quality assessments, bias detection outcomes, and changes made to the AI system.
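
A routine data quality assessment and its documentation could be combined in a small script like the one below. It is a sketch under assumptions: the missing-value check, the 30% threshold, and the names `quality_snapshot` and `log_entry` are illustrative choices, not requirements taken from the Act.

```python
import json
from datetime import datetime, timezone


def quality_snapshot(rows, fields):
    """Compute the share of missing values per field for one data set."""
    total = len(rows)
    missing = {
        f: (sum(1 for row in rows if row.get(f) in (None, "")) / total
            if total else 0.0)
        for f in fields
    }
    return {"rows": total, "missing_share": missing}


def log_entry(dataset_name, snapshot, max_missing_share=0.3):
    """Build a record of the check that can be appended to an audit log."""
    flagged = sorted(
        f for f, share in snapshot["missing_share"].items()
        if share > max_missing_share
    )
    return {
        "dataset": dataset_name,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "snapshot": snapshot,
        "fields_needing_review": flagged,
    }


# Hypothetical training data with some missing values.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": None, "income": 61000},
]
entry = log_entry("training_v2", quality_snapshot(rows, ["age", "income"]))
print(json.dumps(entry, indent=2))
```

Running a check like this on a schedule and storing each entry gives stakeholders a dated trail of what was assessed and which fields were flagged for review.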