The Slow Cancellation of Online Libraries (part 2)

Part 1: https://networkcultures.org/blog/2024/09/22/henry-warwick-the-slow-cancellation-of-online-libraries/

We’re at an interesting juncture where smaller libraries are easily transported (I have a USB drive that holds 64GB of data and is literally the size of my thumbnail).

These are available in 128GB as well.

The problem is, the “hero” level systems like UBUweb / libgen / sci-hub / etc. are so huge that reliable offline storage is still expensive and bulky. A five-bay RAID enclosure is about $250, and the five 20TB drives are about $600 each, so with tax that’s over $3,500, and that’s just to cover biggies like libgen and Sci-Hub and UBUweb in RAID 0. That does not include any of it being functional as a locus of online available files – it would be like an Alexandria Project, a big box of files. Also, local climates affect the survivability of these things. I sent an Alexandria Project to Papua New Guinea, and it lasted a few years before the heat and humidity killed it. They enjoyed it while it lasted, though.
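For what it’s worth, the arithmetic sketches out like this (a quick Python back-of-the-envelope; the prices are the rough figures above, and the 13% tax rate is my assumption):

```python
# Back-of-the-envelope for the five-bay RAID described above.
# Prices are the rough figures from the text; the tax rate is assumed.

enclosure_usd = 250     # five-bay RAID enclosure
drive_usd = 600         # one 20TB drive
drive_tb = 20
bays = 5
tax = 0.13              # assumed sales tax; varies by jurisdiction

hardware = enclosure_usd + bays * drive_usd
total = hardware * (1 + tax)
capacity_tb = bays * drive_tb   # RAID 0 stripes: full capacity, zero redundancy

print(f"Hardware: ${hardware:,}, with tax: ${total:,.0f}")
print(f"RAID 0 capacity: {capacity_tb}TB (one dead drive loses everything)")
```

RAID 0 maximizes capacity, but any single drive failure takes the whole array with it, which matters for the survivability point about climate.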

The Cuban file distro is truly inspiring, mostly because it’s competent and operational, and operating under a significant threat from the central government. I bow to them. I see El Paquete Semanal as a distribution mechanism, which could benefit from something like libgen / sci-hub / UBUweb on a drive. I read somewhere that there have been about 500,000 feature films made in the past 100 years. If they’re all rendered in HD and come out to about 4GB per film, that’s 2,000TB, and outside any conceivably affordable collection device. Even if one culls it per Sturgeon’s Law (90% of everything is crap), that’s still 200TB. Someone (I forget who) said that Sturgeon was correct, and that of the remaining 10%, 90% of that is pretty dodgy. So, that would get us to 20TB and about 5,000 films, which would still take nearly 14 years to watch at one a day.
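Spelled out as a minimal Python sketch of the numbers above (the 4GB encode size is the assumed figure from the text):

```python
# The film-library arithmetic from above: ~500,000 features at ~4GB each,
# then two rounds of Sturgeon's Law culling.

films = 500_000
gb_per_film = 4                          # assumed HD encode size

print(f"Everything: {films * gb_per_film / 1000:,.0f}TB")        # 2,000TB

films //= 10                             # Sturgeon's Law: keep the best 10%
print(f"After Sturgeon: {films * gb_per_film / 1000:,.0f}TB")    # 200TB

films //= 10                             # cull the surviving 10% again
print(f"Second cull: {films:,} films, {films * gb_per_film / 1000:,.0f}TB")

print(f"At one film a day: {films / 365:.1f} years")             # ~13.7 years
```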

This makes me consider the contradiction of the uncountably finite. Mathematicians like to say that if a set is finite, it is logically countable. This is true, as far as it goes. However, Graham’s number is finite, and if you assigned a terabyte to every particle in the observable universe (roughly 10^80 particles, so about 10^92 bytes), you would still have little more than a rounding error next to Graham’s number.

Graham’s number is built from Knuth’s up-arrow notation: start with g_1 = 3↑↑↑↑3, let each g_(n+1) be 3 followed by g_n arrows followed by 3, and take G = g_64.

3↑3 = 3^3 = 27
3↑↑3 = 3^(3^3) = 3^27 = 7,625,597,484,987
3↑↑↑3 = 3↑↑(3↑↑3), a power tower of 3s that is 7,625,597,484,987 levels tall
3↑↑↑↑3 = 3↑↑↑(3↑↑↑3), already beyond any physical description
Then iterate: each result becomes the number of arrows in the next step, 64 times over.
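The up-arrow recursion is simple enough to write down, even though almost nothing past the second arrow can actually be computed; a minimal Python sketch:

```python
import sys
sys.setrecursionlimit(100_000)

def arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow: a followed by n arrows, then b."""
    if n == 1:
        return a ** b                         # one arrow is plain exponentiation
    if b == 0:
        return 1                              # base case of the recursion
    return arrow(a, n - 1, arrow(a, n, b - 1))

print(arrow(3, 1, 3))   # 3↑3  = 27
print(arrow(3, 2, 3))   # 3↑↑3 = 7,625,597,484,987
# arrow(3, 3, 3) is a tower of 3s about 7.6 trillion levels tall; it will
# never finish on any physical computer, which is exactly the point.
```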

And of course, Graham’s number is a rounding error compared to TREE(3), and TREE(3) is effectively zero next to Rayo’s number.

These are all finite numbers, but none of them is the least bit renderable in base 10. Which leaves us in a logical quandary: they are logically countable (they are finite), but they might as well be infinite in terms of human experience. We face a similar problem, albeit on a microscopic scale compared to these numbers, with this media glut.

I don’t really see a way out of this. And for the first time in human history, we will have to “curate”, a pretty word for consigning the intellectual output of thousands, if not millions, of people to the digital recycle bin, consciously and decisively erased.

And we all know who will get completely fucked out of that stab at posterity.

So, where does that go? Media (calling it Digital Media at this point is redundant) is about to get tackled, if not body slammed, by generative Artificial Communication systems – text, audio, video – it’s all exploding into a 99% solution of a foaming, frothy fountain of digital crap. Systems will be built to recognise the “AI”-made work, and it will be shoveled wholesale into the hopper of the forgotten and disappeared. The difference between this culling and previous ones is that previously, the objects of knowledge acquisition, usually books, scrolls, and codices, simply rotted away or burned up with the libraries that housed them. Whether accidental or maleficent, the destruction was “fair” in that it was non-judgemental – “BURN THE BOOKS OF THE HEATHEN / INFIDELS / (subaltern).” Now we can be much more discriminatory – we can use AI to choose for us.

In opposition, we librarians and archivists will save what we can, and these objects of knowledge will be subject to BSODs, boot sector failures, and digital rot over time. And the digital rot includes bit depth (try to boot a 16-bit drive on a 64-bit computer; good luck with that) as well as file system incompatibilities (which we see today in, for example, HFS vs. NTFS vs. FAT32) and then of course encryption, where brute force at 1,000 tries per second will still take longer than the universe will exist to break.
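The brute-force claim checks out. Taking a 128-bit key as an assumed example size, at the 1,000-guesses-per-second rate above:

```python
# Expected brute-force time at 1,000 tries per second against a 128-bit key
# (an assumed example size).

SECONDS_PER_YEAR = 3.156e7          # ~365.25 days
AGE_OF_UNIVERSE = 1.38e10           # ~13.8 billion years

keyspace = 2 ** 128
rate = 1_000                        # guesses per second

years = (keyspace / 2) / rate / SECONDS_PER_YEAR   # expect to search half
print(f"Expected time: {years:.1e} years")          # ~5.4e27 years
print(f"= {years / AGE_OF_UNIVERSE:.1e} times the age of the universe")
```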

But we will continue, we will make these drives, we will hoard a small subset of published works, curated to our tastes and preferences, and we’ll hold onto them as long as we can, and transfer them to people we trust, and they will make hypercurated collections of their preferences. I’m convinced we are in the middle of the greatest dark age in human history. With everything digital and completely dependent on a fragile and unsustainable distribution network of resources, we will be as opaque as the monolith in 2001: A Space Odyssey.

It’s clear that while online libraries can be arbitrarily large, they are vulnerable. In further news of the crackdown, this article appeared in Ars Technica; all it shows is how vulnerable online libraries have become.

On Thursday, some links to the notorious shadow library Library Genesis (Libgen) couldn’t be reached after a US district court judge, Colleen McMahon, ordered what TorrentFreak called “one of the broadest anti-piracy injunctions” ever issued by a US court.

In her order, McMahon sided with textbook publishers who accused Libgen of willful copyright infringement after Libgen completely ignored their complaint.

To compensate rightsholders, McMahon ordered Libgen to pay $30 million, but because nobody knows who runs the shadow library, it seems unlikely that publishers will be paid any time soon, if ever.

Because Libgen’s admins remain anonymous and elusive—and previously avoided paying a different set of publishers $15 million in 2017—McMahon granted publishers’ request for an uncommonly broad injunction that may empower publishers to go further than ever to destroy the shadow library.

On 8 October, an article in Tom’s Hardware by Anton Shilov says that, according to the IEEE, we will see 60TB hard drives by 2028.

The arrival of energy-assisted magnetic recording (EAMR) technologies like Seagate’s HAMR will play a crucial role in accelerating HDD capacity growth in the coming years. According to the new IEEE International Roadmap for Devices and Systems Mass Data Storage, we will see 60 TB hard disk drives in 2028. If the prediction is accurate, we will see HDD storage capacity doubling in just four years, something that did not happen for a while. Also, IEEE believes that HDD unit sales will increase.
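The quoted roadmap implies a growth rate that is easy to compute. Assuming drives of roughly 30TB today (my estimate of the current HAMR flagships) doubling to 60TB by 2028:

```python
# Implied annual capacity growth if drives double from ~30TB to 60TB in four
# years, per the IEEE roadmap quoted above (the start point is my estimate).

start_tb, end_tb, years = 30, 60, 4
cagr = (end_tb / start_tb) ** (1 / years) - 1
print(f"Implied annual capacity growth: {cagr:.1%}")   # ~18.9%
```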

While online systems are clearly on the run, offline storage is becoming increasingly powerful. If these online systems can hold on for a few more years, the drive space will arrive for their offline storage. Offline storage may be slower to travel, but the amount of data available upon arrival is extraordinary. This will help provide some resilience for offline libraries. A 60TB hard drive would hold libgen, UBUweb, and Sci-Hub, all in one place. In the meantime, we will have to “make do” with our patchier and heavily curated private collections. That the possibility of public access to knowledge is now contingent on private libraries is an irony we will simply have to live with.
