Europe is Not a Data Grossraum

by Pit Schultz (@pitsch)

Apart from outdated spatial metaphors, the sectorization of data economies points to the risk of the emergence of monopoly platforms. There is a danger of “natural” sector-wise monopolies which, thanks to telecommunication-driven 5G infrastructure (edge ML, IoT), allow vertical integration and the centralization of value chains, bypassing the principles of net neutrality.

Instead of coordinating research and development, much redundancy is created through competition. A consistent renewal of Open Data guidelines in the area of algorithms, data structures, training data and publications is necessary, one which learns from past mistakes. New, suitable licensing models are needed to prevent direct data pipelines, e.g. from Wikidata to the Google Knowledge Graph, from being built without any financial compensation. And anyway – where is the mention of Wikipedia as a cost-effective European counter-model to Silicon Valley, with Diderot and d’Alembert in uncharted digital territory? The European Bertelsmann search engine and the digital library have all but disappeared into the Babylonian metadata jungle.

Data is the oil of the 21st century only insofar as it is better not to base an economy on it without being prepared for unpleasant side effects. The many cases of scraping, as well as the problem of patent trolls, show that today’s copyright law, with absurdities such as ancillary copyright, blocks all digital development. Only a radical open-source and open-standards strategy in the field of machine learning can give Europe a unique selling point. To understand the data economy as a “win-win marketplace”, however, would be a mistake, because platform economics tends towards “winner takes all”. Amazon evaluates data internally to optimize brick-and-mortar logistics; Google makes its money not by selling data but through advertising, and so on. Where data sales do occur, as recently with Avast, they are usually ethically questionable. What can be sold are complex machine-learning-supported services, or rather entire environments in which processes (constraints) can be abstracted and logistically optimized (as with the German enterprise resource planning company SAP).

The recently published EU policy paper A European Strategy for Data reads like a clueless manifesto. Not even the distinction between AI (AGI) and machine learning is made, and generalities about image recognition and bias are served up in a public-friendly way. Instead of questioning the ethics of traditional concepts of ownership in the digital world, references are made to the protection of privacy. As one can imagine, private data sets are irrelevant for most training models (e.g. translation software such as DeepL, used here). Somewhere in the margins it is pointed out that reproducibility should be a criterion, which is equivalent to the disclosure of training data.

Rather than further promoting the spreadsheet mentality, buzzword/bullshit bingo and McKinseyfication of European digital policy, it is time to identify the structural principles that distinguish digital networks, and have made them successful, compared to industrial and financial neo-liberal economic models. Examples include the absence of classic economic parameterization within source-code production, the cyclical, iterative ‘agile’ style of management, the barter agreements of Internet providers, and the absence of a money economy within transnational platform monopolies.

The fact that Mark Zuckerberg’s internal planned economy is based on insufficient concepts and ‘Californian dreams’, such as speculation on VR and Augmented Reality as well as home appliances, does not change the decisive competitive advantage of using entirely privatized user data sets to extract added value through machine learning. This is precisely where regulation should step in. The proposal of interoperability, i.e. moving private user data to imaginary competing platforms, or making a small subset of data structures transparent so that we can communicate across all platforms, does little to change the respective monopoly position in deep learning, given the complexity and depth of the already accumulated data sets.

Instead of promoting a ‘platforming’ of sectors through top-down sectorization into “data spaces”, an approach would make sense for Europe that deals with counter- and successor models to these quasi-natural monopoly structures. Data-space sectoring according to the “Airbus” and “Transrapid” models should give way to an approach in which complete human-generated content models, such as Wikipedia, are positioned against American platform monopolies. Wikipedia’s market value has hardly been measured to any extent, yet thanks to its Creative Commons license, which carries no license fees, it has long since been exploited by Google & Co.

Just as the portal model disappeared in the dotcom crash, it is quite conceivable that the era of platforms, i.e. privatized public spaces on the Internet, will soon be a thing of the past, if Europe focuses on its core competence of technological innovation through regulation. Administrative power is also a sleeping giant in networks, embodied in Germany by the Federal Network Agency, which regulates the large-scale infrastructures of transport, water, gas, electricity and information. Depending on the type of network, spatial-physical property is more or less bridged by it and made to disappear; Paul Virilio already pointed to this disappearance of space. Recently, space has been geopolitically reterritorialized by theorists who follow in the footsteps of right-wing conservative thinkers such as Carl Schmitt. This intellectual trend corresponds with the populist, separatist and reactionary local movements which, as an anti-globalization movement of the right, retroactively divide and re-nationalize international network structures, starting from the application level.

The question concerning “AI”

For the time being, the term is outdated, because it refers to Strong AI (Minsky et al.), i.e. top-down, ontological and rule-based approaches. Artificial General Intelligence (AGI) would be the better term, as distinguished from machine learning or deep learning.

Positioned against the then-dominant term of cybernetics, “AI” comes from the early days of computing, which also spoke of ‘electron brains’ and the ‘general problem solver’. It is often overlooked that neural networks originate in analogue computing (the perceptron) and carry certain features of that branch of technical development. Following Friedrich Kittler, machine learning had its breakthrough thanks to a hardware (r)evolution, which consisted of massive parallelization through the availability of graphics cards, today equipped with thousands of small processing cores (following the von Neumann architecture) running in parallel. This parallelism gives algorithms the operational complexity to map multidimensional non-Euclidean vector spaces (among other data structures) that help to statistically reduce the parameters of a complex reality, step by step, which in turn has little to do with the space and time metaphors of classical media theory, or with the search for the mind or soul.
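The parallelism in question can be illustrated with a toy sketch (pure Python, with hypothetical values): every cell of a matrix product depends only on one row of the first matrix and one column of the second, so all cells can be computed independently, which is exactly what lets thousands of GPU cores work on one layer at once.

```python
# Each output cell of a matrix product depends only on row i of A
# and column j of B, so all cells are independent of one another.
def matmul_cell(A, B, i, j):
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    rows, cols = len(A), len(B[0])
    # On a GPU, each (i, j) pair would be handed to its own core;
    # here we simply loop over the independent cell computations.
    return [[matmul_cell(A, B, i, j) for j in range(cols)] for i in range(rows)]

A = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical toy matrices
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

The point of the sketch is structural: because no cell reads another cell’s result, the work distributes across as many cores as there are cells, with no coordination needed until the layer is complete.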

It is interesting to note that, again for simplicity’s sake, we are working with layers, i.e. two-dimensional matrices that are related to each other and perform billions of matrix operations, layer by layer, based on thresholds that depend on one another in relational ways. This threshold logic is reminiscent of fuzzy logic, in contrast to Boolean algebra. Each layer in deep learning takes over certain statistical tasks of complexity reduction; during training, gigantic big-data stocks are recursively averaged for their redundancies and differences. The “machine”, in a social, unconscious, linguistic sense, i.e. the redundancies of the production of difference, becomes extractable and repeatable, within limits.
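The layer-by-layer threshold logic described above can be sketched minimally (all weights, inputs and thresholds here are hypothetical toy values, not taken from the text): each layer forms weighted sums of its inputs and passes each sum through a threshold, and the next layer operates on the previous layer’s thresholded output.

```python
# Threshold ("step") activation: fires only above the threshold.
def step(x, threshold=0.0):
    return 1.0 if x > threshold else 0.0

# One layer: a matrix of weights applied to the input vector,
# each resulting sum passed through the threshold.
def layer(inputs, weights, threshold=0.0):
    return [step(sum(w * x for w, x in zip(row, inputs)), threshold)
            for row in weights]

# Two hypothetical layers applied in sequence, "layer by layer".
W1 = [[0.5, -0.5], [1.0, 1.0]]
W2 = [[-1.0, 1.0]]
hidden = layer([1.0, 0.0], W1)  # first complexity-reducing layer
output = layer(hidden, W2)      # second layer reads the first's output
print(hidden, output)
```

Real deep learning replaces the hard step with differentiable activations and learns the weight matrices from training data; the sketch only shows the relational threshold structure the paragraph refers to.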

The implications are obviously explosive in terms of political economy. In the absence of a universal theory, machine learning empirically and iteratively develops innumerable recipes for the combinatorial architecture of these layers; some speak of an alchemical approach, or of the black-box problem, because in trained machine learning models algorithms, data and data structures fuse into an impenetrable amalgam. Many therefore try to introduce reversibility and control structures at some additional effort; the easiest way to debug ML would be to disclose the complete training data. This calls Big Data’s walled-garden system into question politically and ethically, because at the very least scientific auditing, institutional access, etc. must be provided.

The geopolitical arms race for machine learning is largely uncoordinated, so many are trying to reinvent the wheel at the same time, with a lot of money. In some cases, competition has taken advantage of the network effects of the technology itself, as can be seen in the rise of Google (using open-source strategies) and the triumphant advance of Linux and open source across the cloud infrastructure. The coming revolution in machine learning, with its domain-specific singularity moments, can only be achieved if models, training data, algorithms and documentation are published under an open-science/open-data policy. Then the wasted resources of competition can be avoided and research and development better coordinated. The competitor who implements such an Open Data strategy will have a strategic advantage. There is still no Richard Stallman of machine learning.

All the speculations about consciousness, from homunculi and Roko’s basilisk to uploading the mind, are as amusing as d’Alembert’s dream, and probably part of a narrative that will soon fade away. In that sense, it would be good to stick to Michel Foucault’s anti-humanism and target the techniques of power themselves, instead of indulging in a geological species-narcissism that presents itself as part of the Anthropocene discourse.

A sci-fi scenario that I prefer is the following: when a crisis occurs in the financial sector, the ML-driven prediction algorithms will create a singularity moment that eventually develops a recursive local autonomy. A more or less irrational herd behaviour of the actors is then deliberately exploited and generated, comparable to malware, not only pulling the financial system into the abyss but at the same time providing a price-control system which, from this point onwards, has far more complex game-theoretical options than all the stock-market players in the world put together, who merely follow mass-psychological redundancies. It is then no longer assets but algorithmic complexity that puts the Invisible Hand in its place, which is only an alias for the 1%, who do not necessarily concentrate a large part of human intelligence in themselves.

When AI is warned against, in Davos or by Peter Thiel, one could imagine scenarios in which the economic a priori is replaced by a technical one, which is exactly what the technological determinism of the climate crisis points to. However, I do not see a linear determinism but rather a stochastic one, derivable from the cyclical processes of ecology, but also from the iterative development of technology, in which the social and the technical are two aspects of the same process, separated only by our insufficient knowledge culture. To even conceive of a socialization of certain technologies (following the model of the Norwegian oil industry), the social sciences, for example, lack technological and economic knowledge. Conversely, it can be argued that precisely this lack of knowledge serves certain interests.

Intelligence in this context is always already artificially-technologically constructed, through the techniques of writing, recording, distribution and governability. An intelligence test constructs a technological measurability of human performance on certain problems, based on cultural techniques such as written language, mathematics and statistics, with a multitude of underlying technical processes that make intelligence numerically countable. The same applies to university degrees or citation counts, which meet the business sector’s demand for standardization and quantifiability. These performative tests measure cultural competence with the aim of reproducing certain abilities, and they hide what is called ‘social’ or ‘emotional’ intelligence. Rather than imagining an anthropomorphic intelligence of the technological, which has long since mechanized itself, it would be more interesting to question the nature and quality of the institutionalized, administrative intelligence that is exercised today through formalized processes and procedures.