
A recent large-scale audit of dataset licensing and attribution in artificial intelligence (AI) has revealed significant issues in the field. Published in Nature Machine Intelligence, the audit examined more than 1,800 AI training sets and identified a 'crisis of misattribution': over 70% of the datasets omitted license information, and over 50% contained license errors. The research highlights the crucial role of data attribution in AI development, emphasizing its importance for fairness, accountability, and model performance. The findings underscore the need for better data provenance as sources increasingly restrict open access to information.
Happy to announce that our research paper, "A large-scale audit of dataset licensing and attribution in AI," was just published in Nature Machine Intelligence. Data provenance is becoming increasingly important as sources shut down open access to information. https://t.co/5AbNQp2e3A
📢 Excited to see our piece, "Data Provenance Initiative: A large-scale audit of dataset licensing and attribution in AI," now in: 📜 @Nature Machine Intelligence ➡️ https://t.co/U9qaD530gh 🗞️ @MIT News ➡️ https://t.co/v7z0910tbX 1/
The problem with #AI training sets uncovered by an audit of >1,800: "a crisis of misattribution"
>70% license omission rates
>50% license error rates
https://t.co/6SBwXsQUBZ https://t.co/TPA3Dq0ZCU @NatMachIntell @sarahookr @ShayneRedford @RobertMahari https://t.co/D1T6STt6J6