The Slow Cancellation of Online Libraries

Some 15 years ago, Sean Dockray, Geert Lovink, and I were discussing online libraries and my research into offline libraries. There were many possibilities for the future of online libraries. Sean, as the founder and director of AAAARG.ORG, the now defunct online library, was pretty positive about it all, even as he himself was being hounded by the publishing industry. AAAARG started as ARG, and each time he had to close the site down, he would just add an A. It worked, for a while. Now, in 2024, AAAARG (in its final version, AAAAARG.FAIL) is gone, as Sean is presently facing significant litigation.

I figured this sort of thing would happen, continue, and eventually force online libraries completely off the net. This would result in the only libraries of ebooks being private ones found on personal hard drives. I even assembled one as an experiment as part of my EGS PhD project – “The Alexandria Project”. It was an expensive experiment – large capacity hard drives were (and still are) expensive, and, for them to work in a useful way, the collection needed to be curated and the file names needed rationalization and standardization of some sort. In 2014 INC published the result of my research as a Network Notebook, entitled Radical Tactics of the Offline Library.

As time went on, I found other things occupied my time, although I did keep an eye on online library activity. It is very clear that online libraries are waning – Ubu-web is on terminal life support, aaaaarg is gone, Sci-Hub is facing difficulties, and Libgen is getting sketchier and more precarious with every passing month. Z-Library isn’t as open as Libgen, as it requires membership. Monoskop has transformed into something else, textz is profoundly limited, and Open Media Library still isn’t off the ground. BAK.MA and PAD.MA are brilliant projects: similarly structured video sites that document and host user-made videos of a critical political nature, local to Turkey and India respectively.

This is not a complete list by any means. These are simply some of the better known online library projects. They are all facing similar fates: being banned (viz. AAAAARG), becoming moribund (viz. UBU-WEB), or becoming increasingly precarious (viz. libgen). Obviously, if online libraries have any hope of survival, they will have to be in the form of offline libraries which are personal property and undetectable to the powers that be. Sometimes, I hate being right.

Offline libraries have their own material exigencies, mostly in terms of storage space. Using a rule of thumb of about 5MB per document, it’s pretty easy to calculate how much storage these systems require if you know how many books are in them. IIRC, Libgen says they have about 2.5M books. 2.5M x 5MB = 12.5TB. I think there’s more than that now, because the last time I checked on TOR, the libgen pile was around 18TB, or something like that. More than I could handle. And that was several years ago.
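The rule of thumb above is easy to turn into a few lines of Python. This is only a rough sketch: the 5MB average and the 2.5M book count are the figures quoted above, while the function names and the 4TB drive size are my own illustrative assumptions, and it uses decimal units (1TB = 1,000,000MB), as drive makers do.

```python
import math


def estimate_storage_tb(num_books: int, avg_mb_per_book: float = 5.0) -> float:
    """Estimate total storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    total_mb = num_books * avg_mb_per_book
    return total_mb / 1_000_000


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Whole drives required for ONE copy, ignoring filesystem overhead."""
    return math.ceil(total_tb / drive_tb)


# Figures from the text: about 2.5M books at about 5MB each.
libgen_tb = estimate_storage_tb(2_500_000)
print(libgen_tb)  # 12.5

# Hypothetical 4TB drives; a duplicated library doubles this count.
print(drives_needed(libgen_tb, 4.0))  # 4
```

Note that this counts one copy only. Duplicating the library for safety, as discussed next, at least doubles the number of drives.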

The only secure option, in my less than humble opinion, is an offline RAID, which is then duplicated at some expense, or transferred to a scattering of less expensive, if more mechanically precarious, non-RAID drives, which are then copied to other drives through personal networks, etc.

As also noted in our discussion, what gets lost in huge collections are notions of curation (viz, the tragedy of aaaaarg) and organisation. (Hello the 21st Century re-enactment of the 19th Century Dewey Decimal System? Only without the massive Eurocentrism? This devolves into a database / data entry problem in any case.)

Personal, private libraries have historically been essential for knowledge preservation – whether it was Rome in 24CE, Persia in 524CE, Florence in 1524CE or today in Amsterdam in 2024CE – what private individuals can hold as their personal possessions of books (electronic or otherwise) will, in aggregate, always far exceed what public institutions hold.

The loss of Ubuweb is something I especially find troubling. It’s why I’m not so hung up on losing science or math books – they describe properties of the Universe. If you pursue the adventure of that knowledge, you will find the same results. A good example is Pythagoras. They’ve found clay tablet discussions of A² + B² = C² in Sumeria a thousand years prior to Pythagoras. That sort of information humanity can always reacquire. Same with physics, etc. With science, we have results we can theoretically reconstitute exactly, as it maps the workings of the material universe – if you rigorously examine nature, you will come to / re-enact extremely similar if not identical scientific conclusions. 2 + 2 will always = 4.

But the vagaries of culture and the wisps of thought that course through populations, and how these visions, desires, drives, and ideas are concretised as “art”, are not something people will “reinvent” or “recover”. Once gone, it’s gone. For example: Studebaker Automobiles sponsored the Mister Ed (the talking horse) TV show in the early 1960s. It’s why Wilbur drove a Lark. (Why do I know this? Don’t ask – long story.) Once the avi/mp4 files of Mister Ed disappear or are no longer readable, that fact will pass into story. No one will know, because no one can find out. The situation with cultural artefacts is vastly more fraught. Simply preserving electronic cultural objects is just the first rung, and we’re not even there. Interpretive structures for cross-temporal understanding of electronic cultural objects? Nope. And that’s just something as simple as .txt files or USB protocol. When you get to mp4/avi/wav/amf/flac/pdf and compression algorithms and attendant file systems, all of which are completely dependent on the kind intentions of a given operating system’s file structure, it’s Nopety nope nope nope.

It’s been said there’s a light at the end of the tunnel. I’m pretty sure it’s New Jersey, as that’s where we are going: we’re entering the New Jersey of Modernity. Endlessly crowded and polluted unsustainable suburbs transected by vast highways amid methane belching landfill projects. “I was just a broken head, I sold a world that others plundered now I stumble through the garbage, slide and tumble slide and stumble.”

Thus: the most permanent option is tactical and commutative – as I noted over a decade ago – people curating personal libraries and sharing them. Online libraries require offline repositories of their content.

My friend Florian gave me some tips on how to scrape and strip-mine ubuweb. I think that’s a doable project, and one worth pursuing – relatively low hanging fruit.

Libgen is much higher hanging fruit – whatever can be found on TOR and DL’d to a massive RAID drive would be step 1. Step 2 would be stripping the files out of the database as pdfs / epubs and naming them to a standard. I tried a standardized naming convention with my Alexandria Project of about 100,000 books. I still get massive anxiety just thinking about that – my OCD was so out of control, renaming all those files. My preferred system, which I encourage others to use, is:

LastName, FirstName – TITLE – (pub) – year.FileType
Warwick, Henry – Radical Tactics of the Offline Library – (INC) – 2014.pdf
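
For anyone who would rather not do this by hand, here is a minimal Python sketch of the convention. It assumes the metadata (author, title, publisher, year) is already known. Extracting it from the files is the hard part and is not attempted here. The function name, the lowercase file extension, and the replacement of characters that Windows forbids in file names are my own illustrative choices, not part of the convention itself.

```python
import re

EN_DASH = "\u2013"  # the dash used as a separator in the example above


def _clean(field: str) -> str:
    """Collapse whitespace and replace characters Windows forbids in file names."""
    field = re.sub(r'[<>:"/\\|?*]', "_", field)
    return re.sub(r"\s+", " ", field).strip()


def library_filename(last: str, first: str, title: str,
                     publisher: str, year: int, file_type: str) -> str:
    """Build 'LastName, FirstName – TITLE – (pub) – year.FileType'."""
    ext = file_type.lstrip(".").lower()
    sep = f" {EN_DASH} "
    parts = [
        f"{_clean(last)}, {_clean(first)}",
        _clean(title),
        f"({_clean(publisher)})",
        str(year),
    ]
    return sep.join(parts) + "." + ext


print(library_filename("Warwick", "Henry",
                       "Radical Tactics of the Offline Library",
                       "INC", 2014, "pdf"))
```

Generating the name is the easy bit. A real renaming run would also need to handle multiple authors, missing years, and name collisions, which is where the hours (and the anxiety) go.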

I also assembled a 4TB music library, which required similar attention. I about lost my shit doing all that. I was in VERY poor mental health afterwards and ended up in years of therapy and medication that I’m still working through, never mind significant RSI issues which make typing even something like this slow and difficult. These are some of the (several) reasons why I’m such a scattered and lousy writer anymore, and why most of my output has been musical for the past 10 years. Perhaps these file renaming systems are something a narrowly trained Artificial Communication (Esposito) system can do.

As far as aaaaarg goes, if there was a dump of files from aaaaarg that someone might have collected and can copy and distribute, that would help. Years ago, aaaaarg had a system that was cool – every day there was an email that had direct links to the latest uploads. That made collecting aaaaarg pretty easy. Then they stopped doing that, so it required going to aaaaarg, sorting to find the latest uploads, and then DLing them. My collecting from aaaaarg dropped off then, as it was time consuming – it had been faster to just click a pile of links and get the downloads directly. So, for aaaaarg, I think it would be great if pdf / epub files were dumped to a drive which could then be copied, similar to Alexandria Project drives. How that would be managed is anyone’s guess – such networks tend to be self-generated and small and evanescent.

SciHub is much like libgen, only even more difficult to scrape, as there’s no TOR repository. Same goes for Z-library. These systems are going to eventually fail, and hosting and developing them will become increasingly difficult and expensive. A really good comparison is YouTube. A lifetime of video is uploaded there every day. Eventually YouTube will either become unprofitable or perhaps ignored and out of fashion, or some kind of black swan event will take it out of commission. In any case, its plug will get pulled. One would think that it should be preserved – it’s the single greatest warehouse of information on the planet in human history. Can one say with any conviction that it won’t eventually disappear and everything all these creatives have poured into their work get zeroed out and dumped in a landfill? YouTube is a matter of public cultural preservation, the essence of a shared civilization, yet the countless exabytes of video it contains are unsustainable.

So, like YouTube or Vimeo, I see the dire necessity for offline preservation – if aaaarg / libgen / ubuweb / etc are not preserved offline on drives and then distributed on sub rosa networks it will be a catastrophic permanent loss to scholarship and independent research immediately, and a loss to future generations deprived of the cultural artefacts and conversations of our time, as they will be hived off one by one, and paywalled until they cease being profitable, and then disposed of – the servers decommissioned and recycled.

At the same time, there are other online systems, such as Soulseek.

Soulseek is a peer-to-peer (P2P) network that depends on a pair of central servers to conduct searches and host chat rooms. The servers are not involved in the actual transfer of content, as that is done P2P and is managed by the Soulseek client software.

I looked into Soulseek’s operations with ebooks. I opened up Soulseek and searched on something very generic: “marx” and “.pdf”. I got hundreds of hits. This seemed promising. I then searched on “Mark Fisher” and “.pdf” and got a few dozen hits available on 20 different clients. The hits were rather redundant – Fisher’s output was limited, so everyone is sharing the few books he made. Still – twenty clients have Mark Fisher books.

I then searched on “Artificial Communication” and “Elena Esposito”, and got exactly NOTHING. Which was unfortunate, as the book is available for free download from MIT, and it is an important book.

I then searched on myself, “Henry Warwick”, and lo and behold – 4 servers came up!

2 had Radical Tactics of the Offline Library by way of INC
1 had an interview of me by my friend Darwin Grosse (RIP Darwin, you are missed)
1 was a link to Dionne WARWICK singing “Then Came You” which is in a collection of songs by HENRY Fambrough of the Spinners. So, that one doesn’t really count.

It seems that as long as Soulseek is working, it could be a way to distribute texts. Until it gets shut down, of course, and that is an ongoing problem – Soulseek has been sued for enabling copyright infringement, and the people running Soulseek are not rich.

Getting even a moderately large library available to Soulseek users, such as the contents of an old Alexandria Project drive, wouldn’t be especially difficult per se. Such would simply be put into the folder where Soulseek can share files. Getting it all noticed by Soulseek is another story. Making it available is one thing – getting the system to acknowledge it and make it findable is something else.

That said, theoretically, Soulseek should be able to work and can be seen as a viable alternative. Also, Soulseek operates like Napster – the file linked to a given client is ONLY in that one place, and if the person does not have their computer turned on and hooked up to the internet, it won’t be available. The best systems for this would be with people who have symmetrical upload/download accounts. However, in that regard, having the personal bandwidth on one’s internet account to do that is another issue – my upload speed is vastly slower than my download speed, and that would create bottlenecks. Any *huge* library will attract many users, and since it is P2P, many people will be competing for that user’s bandwidth, making for very slow downloads. Something the size of libgen, even with a symmetric account running at a gigabit per second, would be throttled to a crawl with serious use.
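
To put rough numbers on that bottleneck, here is a back-of-the-envelope sketch in Python. It assumes an idealised link with no protocol overhead and an even split of upload bandwidth between simultaneous downloaders, which real networks will not deliver. The 18TB figure is my recollection from above, and the user counts are purely hypothetical.

```python
def transfer_hours(size_tb: float, link_gbps: float, concurrent_users: int = 1) -> float:
    """Hours for ONE user to pull size_tb from a host whose upload link is
    shared equally between concurrent_users (idealised, no overhead)."""
    bits = size_tb * 1e12 * 8                      # decimal TB -> bits
    per_user_bps = link_gbps * 1e9 / concurrent_users
    return bits / per_user_bps / 3600


# One user with the whole gigabit link to themselves.
print(transfer_hours(18, 1.0))                         # 40.0 hours

# A hypothetical 100 simultaneous users sharing the same link.
print(transfer_hours(18, 1.0, concurrent_users=100))   # 4000.0 hours, about 167 days
```

Individual books would still come down quickly under those assumptions. It is copying the whole collection, which is what preservation actually requires, that slows to a crawl.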

As decent an idea as Soulseek is, it seems clear we are looking at a twilight of the online library, and to depend on Soulseek smacks of desperation.

I’ve been watching this slow creeping electronic death for years, and it’s not a good situation, and it’s not getting better. It’s cultural rust.
