What just happened on archive.org today, as best we know:
Tens of thousands of requests per second for our public domain OCR files were launched from 64 virtual hosts on Amazon’s AWS services. (Even by web standards, tens of thousands of requests per second is a lot.)
This activity brought archive.org down for all users for about an hour.
We are thankful to our engineers who could scramble on a Sunday afternoon on a holiday weekend to work on this.
We got the service back up by blocking those IP addresses.
But another 64 addresses started the same type of activity a couple of hours later.
We figured out how to block this new set, but again at the cost of about an hour of downtime.
How this could have gone better for us:
Those wanting to use our materials in bulk should start slowly, and ramp up.
Also, if you are starting a large project, please contact us at firstname.lastname@example.org. We are here to help.
If you find yourself blocked, please don’t just start again; reach out.
Again, please use the Internet Archive, but don’t bring us down in the process.
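For anyone scripting a bulk download, the "start slowly, and ramp up" advice above can be sketched roughly as follows. This is only an illustration under stated assumptions, not an official client or a published rate limit: the class name RampUpThrottle, the crawl helper, every default number, and the choice to treat HTTP 403 as "blocked" and 429/503 as "overloaded" are all hypothetical. The fetch function is passed in by the caller, so the sketch makes no claim about how any particular file should be requested.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class RampUpThrottle:
    """Hypothetical rate controller: start slow, ramp up, back off on trouble."""
    start_rps: float = 0.5        # assumed starting rate, in requests per second
    max_rps: float = 5.0          # assumed ceiling, NOT a limit published by archive.org
    ramp_factor: float = 1.5      # rate multiplier after a run of healthy responses
    backoff_factor: float = 0.5   # rate multiplier after an overload response
    successes_per_step: int = 20  # healthy responses required before speeding up
    rps: float = field(init=False, default=0.0)
    stopped: bool = field(init=False, default=False)
    _streak: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.rps = self.start_rps

    def delay(self) -> float:
        """Seconds to wait before the next request at the current rate."""
        return 1.0 / self.rps

    def record(self, status: int) -> None:
        """Adjust the rate based on the HTTP status of the last response."""
        if status == 403:
            # Assumption: 403 means we were blocked. Stop and reach out
            # instead of starting again.
            self.stopped = True
        elif status in (429, 503):
            # Assumption: these mean the server is struggling. Slow down,
            # but never below the starting rate.
            self._streak = 0
            self.rps = max(self.start_rps, self.rps * self.backoff_factor)
        elif 200 <= status < 300:
            self._streak += 1
            if self._streak >= self.successes_per_step:
                self._streak = 0
                self.rps = min(self.max_rps, self.rps * self.ramp_factor)


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    throttle: RampUpThrottle,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch URLs one at a time, pausing between requests.

    `fetch` is supplied by the caller and returns an HTTP status code.
    Returns the URLs that succeeded before the run finished or stopped.
    """
    done: List[str] = []
    for url in urls:
        status = fetch(url)
        throttle.record(status)
        if throttle.stopped:
            break  # blocked: do not retry and do not continue
        if 200 <= status < 300:
            done.append(url)
        sleep(throttle.delay())
    return done
```

The design choice here is that speed is earned: the rate only rises after a run of healthy responses, drops immediately on any sign of overload, and the run halts outright on a suspected block rather than resuming from new addresses.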
Whether you are a teacher, filmmaker, journalist, scientist or historian, having access to recordings about the tobacco, drug and other industries can be invaluable.
Still frames from a Marlboro commercial compilation.
For more than fifteen years, archivists at the University of California, San Francisco (UCSF) Industry Documents Library (IDL) have curated a collection of more than 5,000 video and audio files documenting the marketing, manufacturing, sales, and scientific research of tobacco, chemical, drug, and food products, as well as materials produced by public health advocates. As of 2023, the collection has received more than 300,000 views.
This wealth of information is available to the public through the UCSF Industry Archives Videos on the Internet Archive. The recordings include commercials, focus groups, internal corporate meetings and communications, depositions of tobacco industry employees, and government hearings.
Most of the files were made public beginning in 1998, following a lawsuit involving 46 states against tobacco manufacturers. In the settlement, the court ordered the companies to restrict advertising and release internal documents. “The industry put out misinformation for years to hold off on regulations,” said Rachel Taketa, IDL processing and reference archivist at UCSF. Having access to these materials provides new insight into marketing strategies that can help the public be on the lookout for future industry activities.
“It provides transparency and accountability,” said Kate Tasker, IDL managing archivist at UCSF. The collection includes marketing campaigns and materials that targeted marginalized groups, in particular women and the African American and LGBTQ+ communities. “We talk to community advocacy organizations that often say it is powerful to show these videos to a group where it lays out clearly what the industry was doing to their community. It empowers people and inspires them to take action.”
Senate hearings regarding S1883, The Tobacco Education Control Act of 1990.
UCSF archivists say the partnership with the Internet Archive provides users with two different access points and expands the audience for the collection beyond academics. The Medical Heritage Library has also added videos and audio files from UCSF into its larger collection on the Internet Archive, spreading the materials’ reach even further.
Next, the UCSF archivists are looking to develop new ways of working with and accessing the collection, including automated transcription that would let data scientists analyze the recordings. The IDL is also adding opioid industry recordings to the collection as part of its work on the Opioid Industry Documents Archive, a collaboration with Johns Hopkins University. These new recordings will enable the public to learn more about the circumstances leading to the opioid crisis.
“It’s exciting to be connected to such an innovative organization as the Internet Archive,” Tasker said. “It’s out in front of a lot of big issues that most digital archives are facing. Whenever we’re looking to do something with a new media type, format, or a new way of distributing content to people, archivists and librarians look to what the Internet Archive is doing as a guide.”