
A recent large-scale audit of dataset licensing and attribution in artificial intelligence (AI) has revealed significant issues in the field. Published in Nature Machine Intelligence, the audit examined more than 1,800 AI training sets and identified a 'crisis of misattribution': over 70% of the datasets omitted license information, and over 50% contained license errors. The research highlights the crucial role of data attribution in AI development, emphasizing its importance for fairness, accountability, and model performance. The findings underscore the need for better data provenance as sources increasingly restrict open access to information.
Happy to announce that our research paper, "A large-scale audit of dataset licensing and attribution in AI," was just published in Nature Machine Intelligence. Data provenance is becoming increasingly important as sources shut down open access to information. https://t.co/5AbNQp2e3A
📢 Excited to see our piece, "Data Provenance Initiative: A large-scale audit of dataset licensing and attribution in AI," now in: 📜 @Nature Machine Intelligence ➡️ https://t.co/U9qaD530gh 🗞️ @MIT News ➡️ https://t.co/v7z0910tbX 1/
The problem with #AI training sets uncovered by an audit of >1,800: "a crisis of misattribution"
>70% license omission rates
>50% license error rates
https://t.co/6SBwXsQUBZ https://t.co/TPA3Dq0ZCU @NatMachIntell @sarahookr @ShayneRedford @RobertMahari https://t.co/D1T6STt6J6