Panel 4: Contextual Modeling
An unstorable and unmanageable amount of data is coming at us, bringing with it a host of new strategies, such as visualization models, for grasping and analyzing the huge amount of bits and bytes.
mc schraefel: Beyond Keyword Search
Dr schraefel is a reader in the Intelligence, Agents and Multimedia Group at the University of Southampton, UK.
Schraefel first emphasizes that, contrary to what people may expect of a visualization expert, she is not ‘in love with graphs’; in fact, most of the time, big fat graphs suck. The research she presents here deals with the circumstances of serendipity. Following the idea that ‘fate favors the prepared mind’, she argues that discoveries never happen purely by chance, and that an important challenge lies in designing tools that support serendipitous discovery.
She then presents the audience with a 1987 video by Apple Computer, which introduces the ‘Knowledge Navigator’: a tablet-like personal device with a natural language interface, a virtual ‘digital assistant’ and access to a global network of information. Outdated as the device may seem today, the digital assistant seemed able to create graphs by lifting data out of its embedded context (such as other people’s documents), so that the data could be mined and combined to answer a variety of questions. In 1987, schraefel comments, this was a vision of exploration, heterogeneous sources, representation and integration that still inspires research into knowledge building today.
Schraefel notes how Google is the current search paradigm – “what else do you need?”. Drawing a parallel, she notes how Newton’s model in the Principia Mathematica set the tone for seeing the world for ages, until it turned out that in some spaces the model was flawed. It is much the same with Google’s document-centric, single-source search without interrelations – the model frames the questions that may be asked. In order to enable knowledge gathering, we need a different one.
In a 2001 Scientific American article, Tim Berners-Lee, Ora Lassila and Jim Hendler introduced machine-readable mark-up and the Semantic Web as a new paradigm that moved away from keyword search and toward structured data and ontologies. Ontologies in this sense are built from subject-predicate-object statements, such as composer – is a – person, or person – has a – name, etcetera. By giving data a rich (and often multiple) metadata context and applying some logic, one may infer properties of objects that are not explicitly labeled, and enable knowledge gathering from heterogeneous sources.
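To make the inference step concrete, here is a minimal sketch of subject-predicate-object statements and two simple rules applied to them. It is an illustration only, not how any Semantic Web toolkit works internally: the predicate names `is_a` and `has_a`, the sample entity `Mozart`, and the function `infer` are all assumptions introduced for this example.

```python
# Toy knowledge base of (subject, predicate, object) statements.
# All names here are hypothetical and chosen only for illustration.
triples = {
    ("Mozart", "is_a", "composer"),
    ("composer", "is_a", "person"),
    ("person", "has_a", "name"),
}


def infer(statements):
    """Apply two rules until no new statement can be derived.

    Rule 1: is_a is transitive (A is_a B, B is_a C  =>  A is_a C).
    Rule 2: has_a is inherited (A is_a B, B has_a C =>  A has_a C).
    """
    facts = set(statements)
    while True:
        derived = set()
        for (a, p1, b) in facts:
            if p1 != "is_a":
                continue
            for (c, p2, d) in facts:
                if c == b and p2 in ("is_a", "has_a"):
                    derived.add((a, p2, d))
        if derived <= facts:  # nothing new: fixed point reached
            return facts
        facts |= derived


inferred = infer(triples)

# Neither statement below was given explicitly; both follow from the rules.
print(("Mozart", "is_a", "person") in inferred)
print(("Mozart", "has_a", "name") in inferred)
```

Nothing in the input says that Mozart is a person or has a name; both statements follow from combining the explicit ones, which is the kind of unlabeled property the paragraph above refers to.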
Does this imply a reprise of Victorian taxonomies? Nope, quoting schraefel: “it is more pomo than that”; objects are described from multiple contexts. There is no über-ontology, and we are slowly learning to be ‘ok’ with the fact that we cannot know everything in a controlled way, and with being messy. Following Berners-Lee, she emphasizes the importance of liberating our data: placing sources freely on the web so that we may ask questions other than the document kind, and create information rather than merely retrieve it.
How can we make data useful? Schraefel introduces mspace.fm, a multi-faceted, column-based framework for exploring large sets of data. Instead of offering only a result set, the rich context and the mspace UI allow users to see where results have sprung from. The approach also allows for visualizations such as timelines that may lead to new questions and explorations. If we just get what we want, schraefel argues, then how will we make new discoveries?
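The general idea of column-based faceted browsing can be sketched in a few lines. This is not mspace's implementation; the sample records, the facet names `era`, `composer` and `piece`, and the functions `filter_records` and `columns` are assumptions made up for illustration. Each column lists the values still reachable given the selections made in the columns to its left, so the user keeps seeing the context a result came from.

```python
# Hypothetical sample data; the facet names are illustrative only.
records = [
    {"era": "Baroque", "composer": "Bach", "piece": "Goldberg Variations"},
    {"era": "Baroque", "composer": "Vivaldi", "piece": "The Four Seasons"},
    {"era": "Classical", "composer": "Mozart", "piece": "Requiem"},
    {"era": "Classical", "composer": "Haydn", "piece": "The Creation"},
]


def filter_records(items, selections):
    """Keep only the records that match every selected facet value."""
    return [
        r for r in items
        if all(r[facet] == value for facet, value in selections.items())
    ]


def columns(items, facets, selections):
    """Build one column of values per facet, in left-to-right order.

    A column shows every value that is still reachable given the
    selections made in the columns to its left.
    """
    view = {}
    active = {}  # selections applied so far, moving left to right
    for facet in facets:
        remaining = filter_records(items, active)
        view[facet] = sorted({r[facet] for r in remaining})
        if facet in selections:
            active[facet] = selections[facet]
    return view


view = columns(records, ["era", "composer", "piece"], {"era": "Baroque"})
for facet, values in view.items():
    print(facet, values)
```

Selecting "Baroque" in the first column leaves both eras visible there while narrowing the composer and piece columns, so the path to a result stays on screen instead of being collapsed into a flat result list.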
The next topic she addresses is the move from public to personal data fluidity. If we can find ways to keep control over our privacy, then giving the computer awareness of what we are doing, and licensing it to use this contextual data, will allow us to do valuable things with our data. Many of the data streams we create (such as Web trails or Twitter and Facebook data) sit in data centers without being useful. Atomate.me is an example of a program that lets you control your data online and use it to support the work you are doing.
The current research question of the Intelligence, Agents and Multimedia Group is how to enable people to ask questions of the data and find new information by mining data context. If you don’t like what Google is doing, schraefel argues, change your tool to one that is more command driven, exploratory and serendipitous. Concluding her presentation, she urges the Deep Search audience to take a look at webscience.org and to participate, in order to add a much needed humanities perspective to the geek-oriented discussion about knowledge making.
In the Q&A, Yuk Hui asks how mc schraefel sees the relation between context and privacy. He also feels that contextual awareness may lead to short-circuiting rather than serendipity. Schraefel comments that she did not mean to imply that serendipity would by default result from contextual exploration. It will, however, enable people to develop a richer palette of knowledge, by which the opportunities for serendipity simply go up. As for privacy, she feels that the focus should not be on prohibiting access to information, but on developing policies to prosecute people who misuse it.
Another question addresses the general frustration with the linked data already available on the Web: why is nobody using it? It may simply be a case of customer satisfaction – faceted browsing interfaces seem a pain to use most of the time, especially for the less experienced user. Schraefel replies that this is certainly so. Documents are easy, we ‘know’ documents historically, and this familiarity is still missing for data sources. Now that a critical mass of data is becoming available on the Web, we face the fact that there are no standards for its representation. It is a brand new literacy. Yes, we need to get the data out there so we can run the experiment, but until we clean up the databases in a few years’ time we will have to deal with this pain.
Karl H. Müller: From a Tiny Island of Survey Data to the Ocean of Transactional Data
Karl Müller is director of the Wiener Institute for Social Science Documentation and Methodology (WISDOM).
Müller notes that his talk is in some ways a continuation of the previous one: mc schraefel focused on ‘looking at the data’, while Müller will ask us to ‘be aware of data’, or at the very least instill some skepticism in the audience. The focus will be less on people vis-à-vis information, and more on the way in which societies gain information about their internal states.
Müller distinguishes a period of flat search (1750-2000) and one of deep search (2000 onwards). Flat searches are observations and measurements taken in different contexts than the ongoing processes or routines, such as surveys. A survey is a social interaction between you and a respondent according to a script. The data gathered however is not about this interaction but about another context such as, perhaps, your working conditions. In deep search however, the observation and measurement processes occur within the settings of processes and events to be measured.
The flat search period came into being in the 18th century with the census state and the statistical offices that appeared all over Europe. It inspired a wave of micro data that Müller terms the ‘victory march of survey data’, which lasted until well into the 1990s. In relation to flat search, Müller emphasizes that not all data is alike: describing the epistemological status of the survey, he makes a distinction between over-learned facts (such as your name and other context-independent data) and under-learned facts (context-dependent data). Under-learned facts have an entirely different logic – they are inconsistent and tend to be easily forgotten. “All in all life satisfaction” is a well-known example of a survey question yielding such data.
Another flaw in survey data is often caused by the multiple-choice question; Müller takes the example of questions about the political orientation of citizens, to which the answer could be either right or left wing. In his example, 51% of the respondents checked both options. This, he claims, clearly shows how our cognitive organization has (and should have) inconsistencies. Furthermore, the arrangement of under-learned facts such as work satisfaction across Europe is nearly homogeneous, while working conditions tend to vary widely. We should therefore develop skepticism toward this type of data.
In the next few decades, big changes lie ahead in the way societies gain knowledge about their internal states, Müller says. Surveys change; new ‘deliberative online surveys’ are surfacing which offer the user information support and open time horizons in order to stimulate informed decision-making. Furthermore, from tiny islands of survey data, we will move toward an ocean of transactional data. Previously trivial assessment models will become more complex, with a strong potential for visual analysis. The aim is to move toward a map of deep search methods and designs.
Concluding his presentation, which is now running out of time, Müller briefly presents what he terms the evolution of information and societies in terms of code systems, in four stages:
Stage 1: Darwin-Societies (4 billion years – 500 million years): Genetic Code,
Stage 2: Polanyi-Societies (500 million years – 1 million years): Implicit Practices, Communication, Neural Code,
Stage 3: Piaget-Societies (1 million years – 1940/1950): The Age of Languages, Scriptures, Symbolic Codes,
Stage 4: Turing-Societies (from 1940/1950 onwards): Turing Creatures (approx. 100 billion), Machine-Code-Based, Man-Turing Creature-Interactions, and Societal Deep Search.
Closing Discussion (some excerpts)
Felix Stalder notes that one of the recurring themes at Deep Search II is seeing metadata as a way of dealing with exponential growth and unstructuredness, as well as the notion of ‘rent’ as the commercial exploitation of extracted metadata. But also: how do we create knowledge about knowledge, and what would a politics of metadata look like that does not impose value systems? Elisabeth van Couvering feels that the Web has been quite successful precisely because there is no such hierarchy and there are no local agreements. Many metadata schemes have come and gone, but we are still facing the fact that we cannot trust metadata in terms of who assigns what, except by looking at the source. Felix Stalder then clarifies that he sees PageRank as a giant metadata system – it produces data about documents that is becoming proprietary. Isn’t there a way around this?
Matteo Pasquinelli comments that we are now entering a new phase of theory or activism; do we want more clarity or more obscurity? He is curious as to the political orientation of this discussion, which he feels has not developed much in the past few years. mc schraefel emphasizes that the document isn’t the smallest unit for action; the data is. Right now, in the way it is delivered to us, it takes a lot of care to make data usable at all. Even on a non-political level we haven’t figured out how best to work with it. What seems clear is that we should move away from corporatization and towards these new potential spaces of data as data, as opposed to data from documents.
Sebastian Giesmann asks the presenters: if we want to regain the search technologies that are out there, should we privatize them or make them public? Is having access to an RSS feed enough, as mc schraefel mentioned? Private actors seem better able to set up the technology side of it. Should we have a European Search Engine, more recommendations in social networking sites – where do you see this going? mc schraefel replies that as long as there is capitalism, there will be innovation. If search engines could be seen as a utility, as Elisabeth mentioned before, we should take a look at how public utilities are doing. The Clinton administration’s selling off of the backbone of the Internet to private telcos got the business and investments going. Google really is a metaphor for the state of society; it is not so much about search. The Web is so woven into our life model that the question is which party runs it best. Should it be run by the government as a public utility? After all, she argues, governments are selling off their public utilities.
Felix notes that it is a very depressing choice between Google and the State; perhaps we are asking the wrong questions. Matteo mentioned open source and free culture – in between the state and market category, that is the space to think about. Some layers of regulation are necessary, but the state is the party that is hardest to have any influence on. Matteo notes that distributed computing also requires computing power – it is the same discussion as for the governance of the net. What are the options for counter-governance? mc schraefel then mentions the post-earthquake Haiti mapping activities. They were not ordered by the government or Google, but arose because people had tools and were able to do that. Enabling spontaneous goodness – there is a counter layer, or a mid-layer. Chad Wellmon comments that we also have other institutional models that engage Google or the state. For instance there is Google Books, an institutional collaboration between Google and several universities that, while not too democratic, is an alternative model too.
In the final question round, an audience member wonders how search will move beyond factual knowledge. Elisabeth van Couvering notes that prior to her research on search engines, she never considered the web to be a place of information and facts but one of friends, entertainment, learning, reading etcetera. The Web isn’t all about information, and search isn’t either. mc schraefel adds that in terms of metadata and privacy, the best things come of love and lust online: matching criteria in dating sites, privacy algorithms and cryptography in the porn industry.
As a final remark, in response to my own question as to how we may move on with search research after the current conference, perhaps on a European level, Felix comments that Deep Search II, apart from being a sequel to the first Deep Search, is also the fourth conference within a broader network of research that has aimed to develop a certain cultural awareness of search engines over the last three or four years. We are in the process of thinking about creating a relatively loose network that may produce some density within the discursive field between these episodic events and gatherings.
The Society of the Query collaborative research blog is a first attempt to gather dispersed research on this topic and perhaps discover emerging themes. Comments and ideas as to the future of these collaborations and cultural search research in general are very welcome; please send an email to Srividya Balasubramanian: srividya (at) networkcultures (dot) org.