A profile of the nonprofit Common Crawl, which has scraped billions of web pages since 2013, including paywalled articles, to build an archive used by OpenAI and others (Alex Reisner/The Atlantic)
Positive · Artificial Intelligence

Common Crawl, a nonprofit organization, has been scraping billions of web pages since 2013, including paywalled articles, to build a vast archive that OpenAI and other tech companies now use. The initiative is significant because it widens access to web-scale data, giving researchers and developers a large corpus on which to train AI models. By supplying this dataset, Common Crawl plays a central role in advancing AI technology and supporting work across many fields.
— Curated by the World Pulse Now AI Editorial System