Why Smart Data Licensing Will Define the Next Era of AI

Is it fair use to train AI on copyrighted materials?

Courts are starting to weigh in. Early rulings in the Anthropic and Meta cases this summer offer signals - but not definitive clarity. Judge Alsup acknowledged Anthropic’s effort to purchase books for training, but warned that holding millions of titles without full rights could imply real financial risk. The Meta case? Inconclusive, with major questions left hanging due to insufficient arguments.

Meanwhile, a month ago Cloudflare just launched “pay-to-crawl.” A big move considering their internet coverage of 20%. It means creators can now charge for their publicly available content for AI training. A clear step toward recognizing the value of high-quality content.

We’ve seen this movement before. Spotify replaced Pirate Bay by making music available smartly, hence worth paying for, with compensation to creators. As a result - the music industry is stronger than ever. The same flywheel will work for AI - if we set the incentives right we give content creators continued motivation to continue writing, filming, researching, shooting photos, producing news, and other data.

Because the most useful and valuable data isn’t publicly available. It’s proprietary, protected, non-public.

My view: smart rights and/or licensing models are the future. Models that reward data owners and those who create, and give AI systems the quality fuel they need. This isn’t about fairness alone. It’s about unlocking potential, performance, innovation, for building high-accuracy AI that works for the benefit of humanity - in a sustainable way.

Time to align incentives. Unlock new data sources. And build a better AI ecosystem - globally - benefitting both ambitious AI builders and content owners. What’s your take?

View all posts