Rethinking Search: 010011

Posted: June 17, 2013 at 1:27 pm  |  By: Frederiek Pennink

“An anti Google. Or, Maimonides builds a Wiki,” explains composer and poet Chris Mann in a short text on the website of The Jewish Museum in New York – the museum that commissioned the rather particular search engine 010011. Together with Sepand Ansari, Mann has designed a ‘machine’ that is “[a] celebration of the question you are trying to learn how to ask. A machine for making sense.”

A digital desk for research
Mann likens 010011 to a working space such as a desk. In place of articles, books and notebooks, the machine uses digital texts, dictionaries and some links to Twitter and Wikipedia. Just as a desk is convenient for juxtaposing sources of information, 010011 allows all sources of information to be dragged anywhere within the working space it provides. When you enter a keyword in the search box of 010011, it returns lists of texts that use this keyword. When you click on one of these texts it expands and forms a new element that you can drag anywhere you want. You can choose to keep searching in the same list, but it is also possible to enter another keyword and thereby create a new list of texts (still on the same page). There is no limit to the number of lists you can create or to the number of keywords you can use.

An important element of 010011 is that you can connect keywords in the opened texts by dragging lines between them. When these keywords are connected a ‘synthesized’ text will form a new element. 010011 also includes a separate box in which you can make notes and it is also used for links and suggestions for possible supplementary searches. To get an idea of what the search engine looks like (although, of course, it’s best to go see for yourself) see the picture given above of a page with lists, links and synthesized texts (the texts in green, blue and red).

The anti Google
So, how does this ‘digital desktop’ form an ‘anti Google’ as Mann proposes? In a reply to questions asked via email, Mann elaborates that 010011 is a way of interrogating links and ideas, and that it employs the user to synthesize responses and connections. In so doing, he argues, it becomes a way of thinking. In contrast to other web search engines that focus on providing you with the most relevant answers, 010011 does not return any ‘answers’ to ‘questions’. Instead, it provides you with sources with which you can formulate questions. In Mann’s words: “while Google is a repository of facts and statements dressed as answers which are dedicated to trying to get you to ask the right questions, 010011 is a way of celebrating the question.” It is a machine for making sense in that it requires your input to make sense of the data it provides based on your keywords. It is an active process, and as Mann concludes in the text on the website of The Jewish Museum: “Knowledge after all is only knowledge if it’s in motion.”

To seek information or to answer questions
What is interesting about 010011’s goal to ‘celebrate the question’ is that it raises questions about what exactly it is that we expect from web search engines. Are we looking for information to make sense with, or are we looking for ‘definite answers’ (or both)? To see what we expect from search engines, let us look at how we define them. The online Merriam Webster dictionary gives the following definition:

“computer software used to search data (as text or a database) for specified information; also : a site on the World Wide Web that uses such software to locate key words in other sites”

It is notable that the definition only states that you search data for specified information and that it doesn’t mention questions or answers. Consequently, the definition perfectly describes what 010011 does, but it is questionable whether it also describes what Google does or aims to do. Take a look at what Google’s chief executive Eric Schmidt told journalists in a 2007 interview with the Financial Times:

“The goal is to enable Google users to be able to ask the question such as ‘What shall I do tomorrow?’ and ‘What job shall I take?’ ”

If it aims to answer such personal questions, it could well be argued that the information users of Google (are stimulated to) seek can no longer be specified as ‘data (as text or a database)’. The results that Google returns would thus be better defined as answers instead of data. As the space of this blog is too limited to elaborate on the nature of data, a definition taken from the online Merriam Webster dictionary will have to do. It defines ‘answer’ as follows:

a : something spoken or written in reply to a question
b : a correct response <knows the answer>

And ‘data’ is defined as:

[F]actual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation

Mann’s description of Google as “a repository of facts and statements dressed as answers” thus seems quite fitting. In a way, Google resembles an oracle more than it does a search engine.

Data or a friend’s suggestion?
What Mann thus shows us with 010011 is that we might have to reconsider what we expect from search engines. With techniques such as personalization, location awareness and, especially, semantic search techniques, search engines are stimulating searchers to use questions to search for answers rather than using keywords to search for data. These techniques give the impression that they perfectly understand our questions, even if the keywords we use are vague and ambiguous. A good example of how these techniques are currently being used is Google’s recent development of showing instant results with the so-called Google Knowledge Graph. It suggests that it knows what you want to know, and as a consequence, searchers might expect that Google not only knows all the answers, but also that it understands and knows all questions. Stimulated by this belief, searchers might then indeed turn to personal questions such as “what job shall I take?” However, if searchers at the same time expect that search engines handle ‘data’ – factual information – this might lead to confusion about what a search engine truly is and does. Searchers should be able to recognize that the ‘instant answer’ given by Google to such a personal question is merely a subjective interpretation (based on algorithms) of the ‘limited’ amount of data it has collected. It is more or less similar to having a friend answer something like: “Based on how well I know you, your location and the vacancies I know of, I think it would be best to go for job ‘x’” (with the difference that of course Google has a far broader knowledge base to derive its answer from). The information provided by a search engine on such personal questions can never be more factual than what a friend would suggest (since it is equally biased), and it is important that searchers also do not expect it to be more factual than that.

Making sense of search
To summarize and conclude: the search engine 010011 has demonstrated that there is a difference between searching for answers and searching for data. It is important to recognize and understand what kind of results we expect from a search engine in order to avoid confusion between different types of search. Some search engines can be perfectly used to search for data and links to sources, whereas others are great for distilling answers from data they have collected from their users. They all have their merits, but the differences between them have to be clear. Perhaps we should look for new definitions of search that take into account our expectations.

This is the second blog in the ‘Rethinking Search’ series. The first blog was on YossarianLives and lateral thinking.

 

Creating poems with Google’s autocomplete

Posted: June 7, 2013 at 2:38 pm  |  By: Frederiek Pennink

There has been much ado about Google’s autocomplete function. A quick search produces articles on how it has been accused of perpetuating prejudices and on how it has just had its first day in court in Germany. On this blog too there has been a post about the ‘politics of autocomplete’, which discusses how it may violate personal rights and how it can be politically loaded and controversial.

Users of the Reddit page ‘googlepoems’, however, have uncovered another, more good-natured side to this much-discussed function. The combination of suggestions made by autocomplete can at times be quite poetic and humorous. See this example that was posted on the blog googlepoetics.com.

What is perhaps most interesting about these poems is that they give you a sense of what matters to people. This varies from the deep and profound, to the mundane – all within the space of a single poem. To see some examples, visit the Reddit page ‘googlepoems’ and the blog Google Poetics, or have some fun by creating your own.

Book Review: Google and the Culture of Search (2013)

Posted: May 28, 2013 at 9:55 am  |  By: Frederiek Pennink

‘Google and the Culture of Search’, written by Ken Hillis, Michael Petit and Kylie Jarrett, begins with an anecdote about Dave and Justin who are setting up a new business. Reflecting on how they have used Google in their preparations, Justin mentions how he views Google as a useful ‘thing’ that has helped them through ‘everything’, whereas Dave dismisses this as magical thinking and recalls how he actually had to ‘dig into’ this ‘magical box’ that Justin is describing (p. xi – xii).  What is interesting about these views on Google – according to the authors – is that they illustrate that when “online search enters the picture”, material forces (business practices and the world of political economy) intersect and overlap with metaphysical forces (the divine and magic) (p. xii). In the rest of the chapters, then, the authors set out to explore and elaborate on why this might be so.

Google as a consecrated actor
One of the main arguments they make in explaining the presence of metaphysical forces in the ‘culture of search’ is that Google has achieved a ‘consecrated status’. Bourdieu describes consecration as a form of ‘cultural legitimacy’ from peers and elite audiences and argues that it allows an actor “to define what constitutes a field’s best practice and in so doing also to influence the field’s internal dynamics” (p. 43). Google, the authors argue, gains its legitimacy by accumulating symbolic capital not only through its ‘nerd status’ and its roots in Californian ideology, but more importantly through its corporate slogan and cornerstone of its brand identity, “Don’t be evil”. The authors describe how this accumulation of symbolic capital has shaped the idea that Google has a ‘higher calling’ and how this image is further stimulated by the wide range of philanthropic, environmental, and social justice causes that the firm supports (ibid.). In return – and much in line with how Bourdieu described the role of a consecrated actor – Google gets to define what ‘good search’ means and in so doing its algorithms actively shape how we encounter information in our culture of search and therefore how we come to know (p. 52). Most searchers do not know how the black box of search returns relevant results so successfully (and thereby comfortably); they only know that it does. It barely needs saying how this acceptance of ‘business-as-God’ has made Google a rather powerful entity.

From Neoplatonism to the World Brain
What follows is a whirlwind of theories, concepts, arguments and examples that serve to logically explain and reflect on Google’s ambition to organize all the ‘world’s information’. What lies at the core of this ambition? How exactly do the metaphysical and material forces intersect? And should we look into ways of curbing Google’s material, economic and metaphysical power, for example by appealing to the purportedly forgotten collective ‘we’ (p. 198)? The authors leave nothing unexplored as they move from concepts such as ‘nous’ and Neoplatonism to Borges’ dystopian writings on the Library of Babel, to Wells’ ideas of the World Brain, and to reflections on the reunification of the godhead with technology. With this all-embracing approach, the authors of Google and the Culture of Search provide their readers with a very thorough and precise text, but it should be added that because of their extensive use of philosophical concepts and terms it forms a demanding read for readers who lack such a theoretical background.

What does Google want?
Finally, one of the things that stands out in Google and the Culture of Search is its sharp interpretation and analysis of ‘what Google wants’ and what this says about our current ‘culture of search’. Central to Google’s aim to organize all the world’s information is the ambition to answer the question “What shall I do tomorrow?” For the authors of Google and the Culture of Search this question illustrates both the arrogance and naivety of Google’s utopian project. It shows many similarities with fundamentalist belief systems that think of their ‘way’ as the only ‘way’. It ignores the moral lesson of the Tower of Babel myth that the construction of a universal One always goes hand in hand with hubris, immoral power imbalances, and deep loss (p. 202). They quote Fredric Jameson’s assessment of the political value of utopia to stress how its value lies in showing us the “ideological shackles under which we currently labor” (p. 203).

In this assessment of Google’s utopian vision, then, Google and the Culture of Search stimulates readers to think about the challenges posed by Google’s combining of metaphysical forces with material forces. Do we want the business-as-God that Google is to define our existence, by updating the cogito to “I search therefore I am”? In a time in which the question “What did you do before Google?” has become important both epistemologically and ontologically, Google and the Culture of Search makes a provoking read and provides many new insights into how Google (problematically?) positions itself in our ‘society of the query’.


Source
Ken Hillis, Michael Petit, and Kylie Jarrett. Google and the Culture of Search (2013). Routledge, New York, NY.

Interested in online search and the power of Google? The INC is looking for an intern

Posted: May 21, 2013 at 10:56 am  |  By: Serena Westra

Internship at INC
Institute of Network Cultures (INC) is a media research centre that actively contributes to the field of network cultures through research, events, publications and online dialogue. INC was founded in 2004 by media theorist Geert Lovink as part of the Amsterdam University of Applied Sciences (Hogeschool van Amsterdam). The institute acts as a framework sustaining several research projects, conferences, meetings and publications.

The institute is looking for a research intern with production skills and overall research skills for the Society of the Query #2 conference. The internship runs for 3 months to half a year (to be decided on), and starts August 26, 2013.

Society of the Query #2
On November 7th and 8th 2013 the Institute of Network Cultures will organize the second Society of the Query event, focusing on online search, search engines and alternative strategies. The aim of Society of the Query is not only to open up new perspectives by bringing together scholars from different relevant disciplines (e.g. information science, sociology, media and communication) but also to increase the public’s awareness and knowledge about the societal and cultural implications of web search. This will include artistic approaches, which may help us to question this highly routinized practice.

In 2009, the conference Society of the Query already tackled a number of these questions. While this has contributed to a better understanding of the impact of search engines, many open questions remain and the dynamics in this field have led to new ones: How does the rise of the social web affect search engines and the practices around them? What consequences do innovations like personalization, localization or autocomplete have? How can we re-think the established search routines? With the second conference and the publication of a reader at the beginning of 2014, we seek answers and restart the debate around web search.

For more information about Society of the Query visit:
http://networkcultures.org/query

For the event to be organized on the 7th and 8th of November 2013 in Amsterdam, INC is looking for a JUNIOR RESEARCHER (internship) with PRODUCTION skills.

We are looking for an enthusiastic, energetic, inquisitive and precise (former) student with knowledge and interests in the field of new media. As the conference has an international scope, active English skills are required, in speaking and writing. You are strong socially and theoretically. The Institute of Network Cultures offers you the possibility for an internship of a period of 3 months to half a year (to be decided on), starting from August 26, 2013.

Research tasks:
• attend meetings
• write literature reviews on subjects related to the Society of the Query research initiative
• collect interesting and relevant literature
• assist the program committee with writing the program (which will be published on the website and in print)
• write contributions for the Society of the Query blog

Production tasks:
• attend meetings
• work as part of the organizational team
• prepare the congress location
• be responsible for registration
• assist in developing and executing a communication plan
• add the program to the blog and keep it updated

The internship will take up 4 days a week. Fewer days are negotiable. Please note that the workload will be lighter at the beginning of the internship, building up towards a few weeks of full-time work around the actual event.

For further information you can contact Miriam Rasch: miriam[@]networkcultures.org.
Applications: if you are interested please send your motivation (1 page) and your CV to
miriam[@]networkcultures.org by the 9th of June 2013.

Rethinking Search: YossarianLives!

Posted: May 16, 2013 at 1:52 pm  |  By: Frederiek Pennink

In a series of blogs I will look at alternative search engines in order to rethink the standard concept of “search”. I will explore what alternative ways of using algorithms to organize knowledge there are and what we can learn from these new methods. The first alternative search engine I will look at is YossarianLives!

Discussions of the ‘society of the query’ or ‘the culture of search’ often feature the critique that search engines isolate us intellectually. Web companies do their best to tailor their search results to suit our preferences, which is why we hardly come across conflicting viewpoints or other information that can broaden our view. This problem was most famously addressed in the book The Filter Bubble by Eli Pariser. The logical next step is thus to wonder how we can avoid this bubble, and this is where YossarianLives! comes into view.

Introducing metaphorical search
YossarianLives! was named after the main character in Joseph Heller’s novel Catch-22 to highlight the ‘catch’ that comes with using search algorithms. They create a paradoxical situation since their use “both simultaneously helps us […] and hurts us”; they offer us guidance in the big jungle of existing knowledge, but this comes at the cost of the reinforcement of that same knowledge (description from the ‘about’ section of the homepage). To solve this dilemma, the makers of YossarianLives! aimed to create an engine with which to assist searchers in the search and creation of new knowledge. The result is what has now been coined a ‘metaphorical search engine’. YossarianLives! uses your query and translates it into a concept, after which it analyzes the structural components and attributes of the concept. After this it searches for concepts with similar structural attributes, to finally return a result from an entirely new domain. To illustrate the difference between regular search and metaphorical search see the difference when using the query ‘question’ in Google Images and YossarianLives!:

[Image: results for the query ‘question’ in Google Images and YossarianLives!]

Unlike, say, Google Images, YossarianLives! returns results that have no immediately obvious relationship with the query. It is up to you to discover a metaphorical link between the concepts. However, if you think the results are too obvious or too vague, there is also the option to change the ‘conceptual distance’ from ‘moderate’ to ‘distant’ or ‘close’. YossarianLives! is a visual search engine and thus only shows images as results.

Loosening up thought patterns
Because of its use of metaphorical connections YossarianLives! recalls the practice of lateral thinking. This term – coined by Edward de Bono in 1967* – refers to the method of approaching problems in a creative and non-linear manner. De Bono explains that the most basic principle of lateral thinking is that “any particular way of looking at things is only one from among many other possible ways” (1967, p. 63). Instead of “moving straight ahead with the development of one particular pattern”, in lateral thinking one tries to produce as many alternative patterns as possible. An important aspect of lateral thinking is thus the search for alternatives. Whereas with “normal” search people tend to stop when they come to a promising approach, with a lateral search for alternatives “one acknowledges the promising approach and may return to it later, but one goes on generating other alternatives” (ibid.). De Bono believes that through this method people can make more thoughtful choices since they will be more aware of all the possible alternatives, and this knowledge will thus add value to the final decision. Besides that, he also thinks that lateral thinking is a great tool for getting new insights, since the method loosens up rigid patterns and thereby provokes new patterns of thought.

Challenging the filter bubble
Lateral thinking seems to provide a useful insight into how YossarianLives! challenges the dilemma of the filter bubble. Currently, most search tools focus on narrowing down our choices as much as possible to return highly relevant results only. When the results are not what we are looking for we adjust the query to something that will affirm our ‘rigid thought patterns’. We are stimulated to think vertically by digging deeper and deeper ‘in the same hole’. It is fair to assume – as De Bono does as well – that most people will stop at the very first promising result they come across and that this leads to the passive search behaviour that lies at the heart of the filter bubble. YossarianLives!, however, challenges this passive behaviour by having the searchers assess and create the relevance of the results themselves. In the example for which the query “question” was used in the metaphorical search engine (see picture) there were many results that did not seem relevant at all. Consequently, different searchers will end up with completely different interpretations and choices. If you compare this metaphorical approach to the vertical approach of Google Images, it becomes clear that a vertical search approach merely seems to reinforce ‘rigid thought patterns’. The query “question” unsurprisingly corresponds with question marks and “love” with countless pictures of hearts.

Imagining lateral search
So what can we learn from YossarianLives! or more generally from lateral thinking? It could well be argued that the main promise of this “alternative approach” (as opposed to the vertical approach) lies in what De Bono described as the very purpose of lateral thinking, namely “to help develop the habit of looking for alternatives instead of blindly accepting the most obvious approach” (1967, p. 64). Especially for children growing up in the ‘society of the query’ it seems essential to have some sort of tool that can stimulate them to become aware of their own search behaviour. At the same time it should be stressed that lateral thinking should not be viewed as the one and only ‘magical method’ to think creatively with. De Bono emphasizes throughout his book that it should be used as a tool alongside vertical thinking. For factual queries and the exploration of a very specific concept it would of course be of little use to deliberately keep on searching for alternatives. Another problem that the makers of YossarianLives! encountered is that their metaphorical results can be hit-or-miss: “a metaphor that is astonishing and insightful to one person can be mundane to another, and meaningless to someone else” (see this blog on the ‘Stephen Fry problem’). This problem illustrates the difficulty of coming up with algorithms that successfully master the conceptual distance: results should neither be too random nor too obvious.

To summarize this blog then: if we agree that the approach towards search adopted by YossarianLives! corresponds with the lateral method of thinking as described by De Bono, we can conclude that a lateral search engine could form a useful tool to challenge the filter bubble.

*Reference: de Bono, Edward (1967). The Use of Lateral Thinking. New York: Penguin

For more alternative search engines check out the list we compiled here.

 

Call for contributions: Society of the Query Reader

Posted: May 7, 2013 at 12:52 pm  |  By: Miriam Rasch

CALL FOR CONTRIBUTIONS (print / PDF)

Society of the Query Reader

The INC Reader Series, edited by Geert Lovink, gives an overview of present-day research, critique, and artistic practices in a thematic research field at once broad and limited. The set-up is multidisciplinary, with academic (humanities, social sciences, software studies etc.), artistic, and activist contributors.

Following the success of the previous INC readers we would like to put together an anthology with key texts on online search and search engines. In parallel with the second Society of the Query conference, which will take place in Amsterdam on November 7-8, 2013, the Institute of Network Cultures is devoted to producing a reader that brings together current theory about the foundation and history of search, the economics of search engines, search and education, alternatives, and much more.

This publication is edited by René König and Miriam Rasch, and produced by the Institute of Network Cultures in Amsterdam, to be launched early 2014. It will be open access and available in print and various digital formats (see below for information on the INC reader series).

POSSIBLE TOPICS
Theory and Foundations of Search // Googlization: Mapping Google’s Dominance // Search Engines and Education // Searching Elsewhere: Non-Western Perspectives // Personalization: Testing the Filter Bubble // Regulation in a Globalizing World // Localization as the New Paradigm // Software Matters: Sociotechnical and Algorithmic Cultures // Showcasing Alternative Search Engines

WE INVITE
Internet, visual culture and media scholars, researchers, artists, curators, producers, lawyers, engineers, open-source and open-content advocates, activists, conference participants, and others to submit materials and proposals.

FORMATS
We welcome interviews, dialogues, essays and articles, images (b/w), email exchanges, manifestos, with a maximum of 8,000 words, but preferably shorter at around 5,000 words. For scope and style, take a look at the previous INC Readers and the style guide (pdf).

WANT TO JOIN?
Send in your proposal (500 words max.) before June 15th, 2013. You may expect a response before July 15th, 2013.

DEADLINE FOR CONTRIBUTIONS
September 15th, 2013.

EMAIL TO
Miriam Rasch (publications Institute of Network Cultures) at miriam[at]networkcultures[dot]org

MORE INFORMATION
Society of the Query: http://networkcultures.org/query
INC readers: http://networkcultures.org/publications

ABOUT THE READER SERIES
The INC reader series are derived from conference contributions and produced by the Institute of Network Cultures in Amsterdam. They are available (for free) in print and pdf; check http://networkcultures.org/publications.

INC Reader #8: Geert Lovink and Miriam Rasch (eds.), Unlike Us Reader: Social Media Monopolies and Their Alternatives, Amsterdam: Institute of Network Cultures, 2013.

INC Reader #7: Geert Lovink and Nathaniel Tkacz (eds.), Critical Point of View: A Wikipedia Reader, Amsterdam: Institute of Network Cultures, 2011.

INC Reader #6: Geert Lovink and Rachel Somers Miles (eds.), Video Vortex Reader II: moving images beyond YouTube, Amsterdam: Institute of Network Cultures, 2011.

INC Reader #5: Scott McQuire, Meredith Martin, and Sabine Niederer (eds.), Urban Screens Reader, Amsterdam: Institute of Network Cultures, 2009.

INC Reader #4: Geert Lovink and Sabine Niederer (eds.), Video Vortex Reader: Responses to YouTube, Amsterdam: Institute of Network Cultures, 2008.

INC Reader #3: Geert Lovink and Ned Rossiter (eds.), MyCreativity Reader: A Critique of Creative Industries, Amsterdam: Institute of Network Cultures, 2007.

INC Reader #2: Katrien Jacobs, Marije Janssen and Matteo Pasquinelli (eds.), C’Lick Me: A Netporn Studies Reader, Amsterdam: Institute of Network Cultures, 2007.

INC Reader #1: Geert Lovink and Soenke Zehle (eds.), Incommunicado Reader, Amsterdam: Institute of Network Cultures, 2005.

CONTACT
Miriam Rasch
Publications Institute of Network Cultures
miriam[at]networkcultures[dot]org
t: +31 (0)20 595 1865

René König
ITAS, Karlsruhe Institute of Technology
kontakt[at]renekoenig[dot]eu
t: +49 (0)721 608 22209

Evaluating Google as an Epistemic Tool

Posted: December 4, 2012 at 6:34 pm  |  By: admin

Cross posting from my KMi blog (which has paragraph level commenting enabled to facilitate discussion).

I’ve just read an article which explicitly considers the evaluation of search engines with respect to their epistemic functions from a social epistemological perspective. There’s a pre-print available at http://www.phil.cam.ac.uk/teaching_staff/simpson/simpson_index.html and the citation is: Evaluating Google as an Epistemic Tool, Metaphilosophy 43:4 (2012) 426-445.
Interestingly, the article arose from PhD research funded by Microsoft Research Cambridge – so it’s great to see they have an interest in the knowledge implications of their tools, and evaluation of them along those lines. I’ve been thinking about this a bit, but the below was written in a morning, so apologies if it isn’t clear.

The Article

The article suggests:

  • “Search engines are epistemically significant because they play the role of a surrogate expert” (p.428)
  • Search engines should be assessed by:
    1. precision and recall, where precision is a measure of relevance of recalled documents (relevant : irrelevant) and recall is a measure of completeness of the set (relevant recalled : relevant on the web) (p.431)
    2. ‘timeliness’ – the duration it takes for searchers to find a relevant link (thus, if the first result on a SERP is relevant, this will give the best score) (p.432)
    3. ‘authority prioritisation’ – they should prioritise those sources which are credible. This could be assessed in the same way as timeliness with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve. I would suggest that ‘link is in part to measure this quality.
    4. Objectivity – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”, thus, if there are 2 sides to a story with an equal quantity of ‘hits’ behind them, ordering such that the first 50% regard one side, and the latter 50% the other lacks objectivity.
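To make the first two criteria concrete, here is a minimal sketch in Python. It is my own illustration and not taken from Simpson's article. It assumes the standard information-retrieval definitions (precision as the share of returned documents that are relevant, recall as the share of relevant documents that were returned), which differ slightly from the ratios as phrased above, and it assumes we somehow already know which documents are relevant. The function names and the sample data are hypothetical.

```python
def precision(returned, relevant):
    """Share of returned documents that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(returned)


def recall(returned, relevant):
    """Share of all relevant documents that were returned."""
    if not relevant:
        return 0.0
    hits = len(set(returned) & set(relevant))
    return hits / len(relevant)


def first_relevant_rank(returned, relevant):
    """1-based rank of the first relevant result, or None if there is none.

    A rough stand-in for 'timeliness': the lower the rank, the sooner
    a searcher reaches a relevant link.
    """
    for rank, doc in enumerate(returned, start=1):
        if doc in relevant:
            return rank
    return None


# Hypothetical data: a ranked results page and the pages judged relevant.
results = ["a", "b", "c", "d"]
relevant_pages = {"b", "d", "e"}

print(precision(results, relevant_pages))            # 2 of the 4 returned pages are relevant
print(recall(results, relevant_pages))               # 2 of the 3 relevant pages were returned
print(first_relevant_rank(results, relevant_pages))  # the first relevant page sits at rank 2
```

In practice recall is the hard one, since nobody has a full list of the relevant documents on the web. Authority prioritisation and objectivity are left out of the sketch because they depend on judgements about credibility and ordering rather than on simple counts.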

Now this last point is particularly interesting and I shall return to it shortly. First I’ll cover what Simpson goes on to argue.

  1. Personalisation in search results occurs
  2. We have a tendency towards confirmation bias: we are biased towards information which affirms our prior beliefs
  3. Our searches may seek affirming information, but moreover search engines – in customising our results based on our prior searches and the results we open – may also do this
  4. They therefore fail to represent objectively the domain being searched over, instead representing a subset of that relevant domain which affirms the searcher’s prior beliefs
  5. Personalisation fails ‘4’ above (objectivity).

The point is “When the bubble is based on individual personalisation, only “epistemic saints” are immune from its effects – the people who scrupulously ensure that they seek out opposing views for every search they conduct in which there are importantly opposed views on the matter. The rest of us one hopes, make the effort to seek out opposing views in serious enquiries” (p.439).

Simpson thus suggests two solutions:

  1. Rational enquirers should turn off personalisation (throw off the shackles of Plato’s cave!) or use search engines such as DuckDuckGo
  2. There is a prima facie case for regulation of search engine providers because there is a public good in objectivity

Objections & comments

On ‘objectivity’

There is a straw man at play here – we’re asked to imagine the world exactly as it is (with important philosophers from many nationalities), but imagine you search for “important philosophers” and the top 1/3 of results are German, the next 1/3 neither French nor German, and the bottom 1/3 French (p.435). But this isn’t how search engines work – if we assume that 1/3 of all important philosophers are French, 1/3 German, 1/3 neither, and that the internet reflects this to some degree, we can almost certainly assume that there will be at least a ‘mixing’ of philosophers although there might be some bias.

Now, there is a concern that not all the information is on the internet – so there’s an interesting question here about the testimony of silence (when can the absence of testimony be taken to tell you something?) and the epistemic virtues of searchers (what abilities should searchers deploy prior to making assumptions based on the absence of results?) – but these are separate issues. As an aside, an obvious example case here is the gender bias of historical artefacts, and the subsequent gender bias on Wikipedia – the silence tells us something, but not what might be taken at face value (that women did very little); we expect good epistemic agents to assess such information in light of epistemic norms which include some awareness of historiography.

I think this is a pretty minor issue in so far as the actual concern being raised is whether or not the search engine reflects the epistemic environment or not. If not, but – algorithmically – it should, that is of concern because it suggests some downgrading of websites for questionable reasons (they contain French philosophers, they’re not English language sites, etc.). But by meeting the first 3 requirements using some version of the pagerank algorithm this should be less possible – issues regarding the testimony of silence aside. Where such issues remain (despite good algorithms) this is a problem with the epistemic community – the community has failed to highlight relevant links, write articles, connect pages, etc.

So the question of objectivity, in so far as objectivity goes, is whether the SERP reflects the epistemic landscape of the internet as a whole. I think there is an issue with this in that it is entirely possible to “swamp” issues (e.g. Google bombing) such that there are two takes on a concept, but only one is presented. However, I do not think this is an issue of search engine objectivity, but rather of the epistemic community; indeed, the community often puts rather a lot of effort into combating this issue, with punitive action taken by search engine providers against those sites that engage in such practices. I will, below, talk about one possible solution to this issue of ‘presenting the various takes’ on an issue.

On the testimonial nature of search engines

This concern relates to what we’re taking the search engine to do. And again, I’d say this is a relatively minor point, but an important one. The article in various places implies that the search engine itself is a surrogate expert, or that it is epistemically responsible for the contents of the epistemic environment (across which it surveys/crawls), or that it provides testimony in itself (the Knowledge Graph is an interesting counter point to me here) – but I do not think this is the epistemic quality of a search engine. Search engines can testify that x is considered a good informant – but that is all. They are good at pointing out experts – and indeed, those are the criteria they are judged against. Simpson does make it clear (p.428) that what search engines testify on, is not the content itself, but rather who might be a good source of testimony. However, I think this causes a problem for some arguments (see the above paragraph). What the search engine represents is qualities of an epistemic community.

On personalisation and objectivity

The crux of Simpson’s article is that personalisation is bad because it breaches objectivity – a criterion against which search engines should be judged. I have raised above one objection to the issue of objectivity. I think there’s also an issue regarding what ‘objective’ means – particularly given the use of a broadly pragmatist epistemology (which I take social epistemology to be), it seems odd to think about objectivity outside of the context of use. For some uses personalisation may be important and indeed the only way to make sense of information, while for others it might be rather more pernicious. Here I raise some more issues in the specific context of personalisation.

Geolocation and (good) personalisation – the redundancy of ‘nearby’

There are context-sensitive points at which, when we say “is there a restaurant?”, the “nearby” is almost redundant – it is entirely sensible for search engines to do the same thing. That is completely within what we would understand by ordinary testimony, but still involves personalisation. This isn’t an issue of “find me the nearest expert” – because the web makes that somewhat redundant – rather it’s an issue of “find me the most relevant expert”; and that just happens to be regarding nearby restaurants.

Geolocation and (good) personalisation – understanding local customs, and requirements for deep/shallow information

When two people search for things in two different countries, we don’t need to imagine a benign dictator (or brainwashed populace) to think that there might be normative reasons why the local population would be searching for a different ‘angle’ on the topic to the international population. For example, there may be religious or (perhaps less controversially) cultural practices of a fairly benign but ‘unusual’ nature (a festival or something). A local search engine could do the job; Google needs to personalise to do so. This seems intuitively appropriate. In fact, the exact same information could be presented both to a ‘local’ and ‘non-local’, but with locals receiving results which gave more detail – including, but not limited to, geographic information, local websites regarding upcoming events, and so on. Again, this is in line with testimonial knowledge – a search engine might direct someone to me to give a novice an overview of some topic, while sending me to some other website. This of course hints at another facet of search personalisation – expertise might matter: informants that do not take into account the capabilities of their audience to understand their testimony fail to give knowledge in their testimonial role.

Geolocation and (bad) personalisation – living with a racist

We can also imagine a situation in which two neighbours get two different sets of results, through no impact of their own behaviours (perhaps one of them has a racist housemate and the other doesn’t). In all other ways, the two users are the same. Let’s say they have no knowledge of some civil rights movement figure, but there are a number of websites on that figure including some run by white supremacist groups (as is the case for Martin Luther King). We don’t have to imagine the 50/50 division of results described in criterion 4 above to see the concern of objectivity here. Nor do we need to imagine that the searchers have made different searches on this occasion; both have used the same query, but one has received results from across the web, while the other is receiving a results set which is skewed towards the subset of biased (racist, etc.) results. We would hope both users would exercise their epistemic abilities to make judgements on the information found. However, in this case the concern is not only with the individual searcher – although the search engine could do more to help them – it is with the personalisation on which the results were returned, and the appropriateness of customising results for that individual.

Geolocation and (bad) personalisation – group dominance

My understanding of Google’s search in China is that they were filtering out search results which those within the Great Firewall would not have been able to access. That is, they were not acting as a filter themselves, but rather removing results which had already been vetoed by the Chinese Government. (I may be wrong about that – do correct me if I am). We can imagine a case such as this, or another in which the dominant group opens particular results more often, and makes particular searches more often (ones which are close enough that they’d influence returned results), and that these groups are tied to some particular geographic location such that a search engine can personalise results based on this data. In both these cases, the concern is that rather than reflecting PageRank, ‘link juice’ or whatever other measure is designed to take account of the broad epistemic environment, the results are biased towards a particular subset of pages with an epistemic perspective while elsewhere (globally) the full set of results is returned. Further force is added to the example if we imagine an individual in such a country who is attempting to find information on the ‘other side’ – but cannot, due to an imposed personalisation. The knowledge terrain would appear very different to such a searcher, despite their own epistemic virtue. Such an example, however, simply extends the issue – even in a location full of ‘epistemic devils’ search engines should properly be judged if they provide results that fail on some measure of ‘objectivity’, limiting their indications of ‘good informants’ to only a biased subset.

Recall and Objectivity

If we recall Simpson’s criteria for assessment of search engines (as highlighted above):

  1. precision and recall, where precision is a measure of relevance of recalled documents (relevant : irrelevant) and recall is a measure of completeness of the set (relevant recalled : relevant on the web) (p.431)
  2. ‘timeliness’ – the duration it takes for searchers to find a relevant link (thus, if the first result is relevant, this will give the best score) (p.432)
  3. ‘authority prioritisation’ – they should prioritise those sources which are credible. This could be assessed in the same way as timeliness with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve. I would suggest that ‘link juice’ is in part a measure of this quality.
  4. Objectivity – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”, thus, if there are 2 sides to a story with an equal quantity of ‘hits’ behind them, ordering such that the first 50% regard one side, and the latter 50% the other lacks objectivity.
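To make the first two criteria concrete, here is a minimal sketch in Python. It is my own illustration rather than anything from Simpson’s article: it uses the standard information-retrieval fractions (relevant returned over all returned, and relevant returned over all relevant) instead of the ratios quoted above, it treats ‘timeliness’ simply as the rank of the first relevant result, and the function names and toy document IDs are hypothetical.

```python
def precision(returned, relevant):
    """Share of the returned documents that are relevant."""
    if not returned:
        return 0.0
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(returned)


def recall(returned, relevant):
    """Share of all relevant documents that were actually returned."""
    if not relevant:
        return 0.0
    return len(set(returned) & set(relevant)) / len(relevant)


def first_relevant_rank(returned, relevant):
    """Rank (1-based) of the first relevant result, or None if there is none.

    Used here as a crude stand-in for 'timeliness': rank 1 is the best score.
    """
    for rank, doc in enumerate(returned, start=1):
        if doc in relevant:
            return rank
    return None


# Toy SERP: four results returned, three relevant documents exist overall.
serp = ["a", "b", "c", "d"]
relevant_docs = {"a", "c", "e"}

print(precision(serp, relevant_docs))            # 2 of 4 returned are relevant
print(recall(serp, relevant_docs))               # 2 of 3 relevant were returned
print(first_relevant_rank(serp, relevant_docs))  # first result is relevant
```

On this framing, a personalised SERP that silently drops document "e" for one user but not another shows up as a difference in recall, which is the sense in which I argue below that objectivity is entailed in the first three criteria.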

One outcome of the preceding argument is, I think, that objectivity – while a useful summary concept – is in fact entailed in the first three. This is because, as I have framed it, the concern with search engine personalisation is that they fail with regard to ‘recall’ (they ignore relevant results) as long as ‘authority prioritisation’ and ‘timeliness’ hold. That is, either they fail to return relevant results, or if they do, they are not defensibly ranked such that authority and timeliness hold true – in particular that of ‘authority’, on which the epistemic qualities of the searching agent have little bearing. There are issues related to the presence of ‘poor’ information on the internet, but the epistemic community – which search engines ‘survey’ when they crawl – acts to address these, and does so rather effectively, such that blips in search engine informativeness regarding good sources for testimony are just that – blips. This is not incompatible with their status as objective informants regarding good sources of information; human informants would be in the same position. Furthermore, we expect good informants to be prepared to tell us information they may not agree with; many a courtroom drama is predicated on just such an assumption. The issues of relevance, timeliness (not ‘hiding away’ information in that case), and authority (the credibility of results) are key here too – where testimony is contextualised (but this is not the place to discuss this analogy further/I have no time to right now!).

Summary

I have presented some arguments for personalisation using ordinary standards of testimony, and some against, based on geolocation (although most could be adapted to other features of personalisation).

My suggestion is that it is not the case that personalisation is bad qua unobjective, but that, when giving an objective judgement of testimony – and when making judgements on the likelihood of someone else’s testimony meeting your information needs – we expect informants to tell us the substantive assumptions they have made to reach their conclusions. Search engines often fail to do this, except where there is good (often advertising based) reason for them to do so (clarifying geolocation, for example). Furthermore, where these assumptions are explicit, their impact is often not made clear.

Suggestions

In so far as I think personalisation is often useful, I think it should remain. However, there are some considerations:

  1. Rather than keeping implicit personalisation facets, they should be made clear – perhaps as added terms in a search query (this would work, for example, for restaurant searches which could add postcodes to queries)
  2. Suggested search features sometimes include elements of such alteration – highlighting ‘deeper’ queries (adding key words) or broader queries (removing query terms). The appearance of multimedia in the main SERP probably helps here too.
  3. Features to indicate why personalisation has occurred might be useful (although, of course this is then open to gaming). It is hard to see how this could be done without “we selected this result because based on your previous searches, you’re a racist” or “you’re a climate change denier”, etc. being returned…
  4. The things we search for are likely to contain epistemic bias (indeed, part of my research interest is on this topic!) which search engines in their current state may not be able to deal with, and which it may not be their place to deal with in their role as surrogate experts. For example a search for “Al Gore inconvenient truth” is likely to return rather different results than a search for “Al Gore liar” – this is about the epistemic judgement of searchers in their query formation, not personalisation. Subsequent results may be personalised off the back of such searches (and this may be problematic as discussed above), but in the first instance this is not the concern.
  5. Search engines could use a sort of ‘faceted search’ in which ‘takes’ on concepts are classified, such that for a given concept ‘x’ pages which relate to one definition might appear together, while those relating to an opposing definition might appear separately. Facets could be added for location, difficulty, and so on. Some of this is already implemented. Some of it is implicit – but should perhaps not be.
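Suggestion 1 (turning implicit personalisation facets into visible query terms) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `facet:"value"` notation, the function name and the example postcode are hypothetical, and do not correspond to any real search engine’s operators.

```python
def explicit_query(query, facets):
    """Append each non-empty personalisation facet to the query as a visible term.

    `facets` maps a facet name (e.g. "postcode") to the value the engine
    would otherwise have applied silently.
    """
    added = [f'{name}:"{value}"' for name, value in facets.items() if value]
    return " ".join([query] + added)


# A restaurant search where the engine has inferred the searcher's location.
print(explicit_query("restaurants", {"postcode": "B1 1AA"}))

# With no facets applied, the query is shown unchanged.
print(explicit_query("restaurants", {}))
```

The point of showing the facet in the query box is that the searcher can see the assumption and delete it, which is what we would expect of a human informant who says “I assumed you meant nearby”.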

Based on ‘3’ above, we might argue – as I have indicated in a number of places above – that search engines should not be considered as informants, but as second order informants – informants about who knows the answer (as Simpson also indicates). In this case, a significant component of their behaviour should be oriented to covering the web, and avoiding presenting subsets unless there are good reasons for personalisation – which should be made explicit, as we would expect in cases of testimonial knowledge. In some cases, personalisation is the action of a good informant (unless the question is “know of any good restaurants? – oh, but I’m going to Birmingham tonight”); in many, such assumptions should be made clear, perhaps by ‘5’. Removing the option reduces the quality of the search engine as an epistemic tool. In making the tool a good epistemic tool, unfortunately, bad epistemic agents may still use information poorly; but this is already the case. Agents who choose to bias their results already do so through their queries, and the sites they select. Search engines are not in a position to assess the epistemic agents who use them, but they are able to position themselves as good epistemic tools, ones that survey the epistemic landscape, and attempt to avoid undue assumption – that is the role of the good informant.

I think, then, that I’m making some key statements here:

  1. Personalisation sometimes makes sense
  2. There are two issues of objectivity – one regards the recall, authority, and timeliness (and precision possibly) of results in SERPs, the other is to do with the epistemic community
  3. Objectivity of SERPs and of the epistemic community (including the searching agent) should not be conflated; Simpson’s analysis of objectivity and his definition lend themselves to conflation. Objectivity for search engines is (probably) entailed in their recall, authority and timeliness (2. above).
  4. Search engines could do more to make clear the assumptions they make when personalising results

 

On the way to academic search engine optimization?

Posted: November 14, 2012 at 6:20 pm  |  By: admin

I recently talked about academic search engine optimization at the EASST conference in Copenhagen. The term was coined by Beel et al. (2010) who suggested that academic papers should be adapted to the technology of search engines in order to be indexed and ranked properly, for example by using correct metadata and carefully selecting and placing keywords. This can be seen as an indicator of the relevance of search engines in the scholarly realm. At the same time, it shows that academic content is not always well-represented in search engines. Beel et al. were focusing on scientific publications in academic search engines such as Google Scholar. But of course we may broaden their approach and wonder about the position of academic content within generic web search engines where it has to compete with countless other perspectives.

Misrepresentation of scientific content

I have already talked about the example of 9/11 (e.g. here and here) which illustrates how the algorithmic selection and ranking mechanisms of search engines can differ from those of traditional gatekeepers (e.g. publishers). In this case, the so-called “9/11 Truth movement” presented a fundamentally different account of the events of September 11 than the official authorities, scientific experts and most mass media institutions: They argued that they had proof of a controlled demolition of the World Trade Center, indicating that 9/11 was an “inside job” orchestrated by the US government. This claim became very popular and was supported by websites which framed themselves as scientific, for example the Architects & Engineers for 9/11 Truth or the Journal of 9/11 Studies. Apparently, the “Truth movement” has established very well-linked networks leading to high rankings of their websites for 9/11-related searches, whereas websites representing e.g. the actual scientific reports by NIST seem rather underrepresented. Besides, many people seem to search for content like this.

Optimization or spam?

Having this case in mind, one could argue that the call for optimizing academic content for search engines is more than necessary if science wants to keep its position as a knowledge authority in modern societies. Of course, 9/11 is just one example to illustrate this development and we may think of others which are more worrisome. For example, commercial actors often dominate the field of health information (Mager 2009). Whenever economic interests are involved, it is not unlikely that they will dominate search engine rankings because they are usually better equipped than non-commercial actors like universities and often also better than academic publishers who to a large extent “hide” articles in the “academic invisible web” (Lewandowski/Mayr 2006). Moreover, most academics will probably regard marketing techniques such as search engine optimization as rather unethical. Indeed, Beel et al. were criticized for promoting “spamming” with their approach. They answered with a study which actually supported this criticism. In order to gain insights into Google Scholar’s vulnerability towards manipulation attempts, Joeran Beel and Bela Gipp tested various techniques, for example adding invisible text or creating fake citations by uploading nonsense publications (citations are used as an important ranking factor in Google Scholar). In short, they were remarkably successful and concluded: “Google Scholar is far easier to spam than the classic Google Search for Web pages” (Beel/Gipp 2010). Apparently, Google Scholar is not very strict when it comes to judging whether content is scientific or not. Often it is enough to publish PDF files that are structured in the typical way of academic journal articles with sections like “introduction”, “methodology”, “results” and “references”. It seems like broad coverage is more important to Google than scientific rigor. Therefore, it is not surprising that people of the “9/11 Truth movement” even lead Google Scholar’s results for the query “WTC 7” (with a paper by Steven Jones, a prominent figure of the movement).


Manipulating Google Scholar by adding invisible text (image credit: Beel/Gipp 2010)

An algorithmic shift

Depending on one’s point of view, Google Scholar’s fairly inclusive policy can be embraced as a democratization of the highly exclusive and selective academic publishing system. It provides scholars access to papers which are not part of this elitist and mostly expensive circle. But the flaws of Google’s automated selection and ranking process are also obvious. As Beel and Gipp have shown, it is pretty easy to manipulate Google Scholar. This opens academia’s door not only to alternative fringe science but also to commercial interests, potentially leading to undesired activities like greenwashing, underplaying the risks of drugs and technologies etc. In any case, there are hardly any possibilities for academics to directly interfere here. Google ultimately defines the relevance of scientific content in their search engines, not academics. Given the wide usage of Google services inside and outside of academia, I think it is justified to speak of an algorithmic shift in information dissemination. Of course, this shift is not limited to Google but includes many more platforms which apply algorithms to organize information. More and more, these algorithms decide on the relevance of scholars and their texts. Evidently, it is still humans who have to assess the actual scientific relevance of publications. But algorithms may ultimately provide or deny visibility. Academics will have to deal with these new mechanisms of inclusion and exclusion. Search engine optimization is one answer to this algorithmic shift but given the vulnerability to manipulation, it is debatable whether it is desirable. Unfortunately, this is more or less the only way to influence the questionable rankings by Google and other service providers.

My related Prezi presentation can be found here. We discuss these developments in greater detail in our book Cyberscience 2.0. Research in the Age of Digital Social Networks

Digital Methods Winter School

Posted: October 25, 2012 at 3:57 pm  |  By: admin

This summer, I attended the summer school of the Digital Methods Initiative at the University of Amsterdam. I can say that I have learned a lot there and I believe that anybody who does research on search engines will find some of the DMI tools useful. My comparative study of Google Autocomplete is one example for their application and you can find many more at the DMI wiki.

I should mention that the DMI team is not only professional but also really nice. Together with an extremely international and interdisciplinary group of participants, a great and productive working atmosphere was created. Therefore, I absolutely recommend attending one of their programs.

The next chance will be the winter school Data Sprint: The New Logistics of Short-form Method, 22-25 January 2013. It is addressed to “PhD candidates, advanced MA students and motivated scholars”. If you have no prior experience with digital methods, the more comprehensive two-week summer school might suit you better. The winter school includes a workshop as well as a mini-conference with the opportunity to present and discuss research papers.

Googling “9/11”: A cross-cultural comparison of suggestions for a loaded term

Posted: October 1, 2012 at 6:22 pm  |  By: admin

In my recent blog post I explained the politics of autocomplete, a feature that suggests queries while they are being typed. We may wonder how this is affected by the trend of creating tailor-made search engine results for specific audiences. How do Google’s suggestions differ from country to country (and language to language)?

Methodology and research question

I tried to gain insights into this question during a small project at the summer school of the Digital Methods Initiative (DMI) this year. With a tool provided by the Digital Methods team I was able to conduct a limited cross-cultural comparison for the query “9/11”. On July 4th and 5th 2012, I crawled the query in 4 languages (English, Arabic, Hebrew, German) in 12 country versions (Australia, Canada, UK, US, Egypt, Lebanon, Palestinian Territories, Iraq, Israel, Germany, Austria, Switzerland). Earlier, I had noticed that Google ranked websites with alternative accounts of 9/11 (commonly described as “conspiracy theories”) fairly high (see my post here). Thus, I wanted to know whether this is also reflected in the autocomplete suggestions and if there are differences between the language versions. While the tool I used is very helpful for retrieving the suggestions, it does not provide any help for interpreting them. The output comes in the form of numerous tables, simply listing the suggestions in the order of their appearance in Google, together with some additional information (see below). Google’s API allows for retrieving ten results (the suggestions have partly been limited to four in the regular search interface).
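The crawl design can be written down as a small grid of language and country versions. The Python sketch below is only an illustration: the pairing of each country with a single language is my reading of the list above (an assumption, not something the DMI tool prescribes), the names are hypothetical, and no request to Google is made.

```python
# Assumed pairing of country versions with the language used for the query.
VERSIONS = {
    "English": ["Australia", "Canada", "UK", "US"],
    "Arabic": ["Egypt", "Lebanon", "Palestinian Territories", "Iraq"],
    "Hebrew": ["Israel"],
    "German": ["Germany", "Austria", "Switzerland"],
}


def crawl_plan(query, versions):
    """List every (query, language, country) combination to be crawled."""
    return [
        (query, language, country)
        for language, countries in versions.items()
        for country in countries
    ]


plan = crawl_plan("9/11", VERSIONS)
print(len(plan))  # 12 country versions across 4 languages
for item in plan:
    print(item)
```

Each entry in the plan corresponds to one output table of suggestions, which is why the raw output quickly becomes hard to read without further categorization.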

Output of the DMI autocomplete tool

To interpret this vast amount of data, I categorized each suggestion using content analysis oriented towards Grounded Theory. That means the categories emerged from the data itself, rather than being imposed on the data before analysis. The outcome is eight categories which helped to structure the data.

Categories for the suggestions

To get a summarizing overview of all the data, I created a word cloud (with Wordle) which shows all the suggestions in relation to their frequency (the bigger the word, the more often it appeared in the suggestions) and in the color of the categories. The rare cases of suggestions in non-Latin script (Arabic/Hebrew) were translated into English and brackets indicating the original script were added.
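The frequency count behind such a word cloud is simple to reproduce. The following Python sketch is a rough illustration, not the procedure Wordle uses internally: it counts the whole phrase that follows the query rather than individual words, and the sample suggestions are invented placeholders, not the data collected in this study.

```python
from collections import Counter


def term_frequencies(suggestions, query):
    """Count how often each completion (the text after the query) occurs."""
    counts = Counter()
    for suggestion in suggestions:
        # Drop the first occurrence of the query itself and normalise case.
        remainder = suggestion.replace(query, "", 1).strip().lower()
        if remainder:
            counts[remainder] += 1
    return counts


# Placeholder suggestions pooled from several (hypothetical) country versions.
sample = ["9/11 conspiracy", "9/11 memorial", "9/11 Conspiracy", "9/11 truth"]
freqs = term_frequencies(sample, "9/11")

for term, count in freqs.most_common():
    print(term, count)
```

The resulting counts determine the size of each term in the cloud; the category assigned to each suggestion would then determine its color.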

Results

The word cloud immediately reveals a striking result: Google’s suggestions for 9/11 over-represent alternative accounts of the event, most notably the word “conspiracy” which was suggested most often. Additionally, a number of “conspiracy-related” queries appear, for example, “truth” (pointing to the so-called “9/11 truth movement” which advocates alternative accounts) or the catch phrase “[was an] inside job”.

Overview: Word cloud of all suggestions

In contrast, only a few suggestions refer to the mainstream account or “neutral” facts. Another frequently suggested term is “memorial”, probably pointing to the 9/11 memorial museum in New York City. Less visible, but also striking, are the rare suggestions of queries which are not related to September 11, 2001. These are more common in the Arabic countries, for example in regard to a section in the Quran. The following tables show the categorized results for all the analyzed language and country versions.

Google’s suggestions for the query “9/11” in English

Google’s suggestions for the query “9/11” in German

Google’s suggestions for the query “9/11” in Hebrew

Google’s suggestions for the query “9/11” in Arabic

The following figure shows only the categories but gives a better overview for direct comparison.

Categorized suggestions in comparison

The differences across the language versions are significant, especially between the Arabic and the Western countries, while there are only slight variations between countries with the same language version. Alternative accounts seem to play a major role in the Arab world, followed by German-speaking countries, Hebrew and finally the English-speaking countries. The latter ones apparently have a more heterogeneous interest in 9/11, including jokes which otherwise only appear in Hebrew but in none of the Arabic or German language versions. In all Western countries “memorial” is the most popular query, whereas all Arabic countries suggest the slogan “was an inside job” first.

Discussion

Google’s suggestions for the query “9/11” dominantly associate it with the events of September 11, 2001. Given the massive global impact of this incident this was to be expected, although we can think of other meanings for this query (for example the Porsche 911 car, other events on September 11 or the emergency number). The general popularity of alternative accounts is striking. The observation that they are particularly relevant in the Arabic world is supported by a representative study which showed that many people in the Arabic world see other forces behind 9/11 than Al Qaeda. However, we must be careful when we draw conclusions about a society’s opinion based on Google’s suggestions. First of all, we cannot be sure if they really represent what users actually search for, although that’s what Google claims. Secondly, even if autocomplete represents previous queries, they give us only insights into a very specific part of a population, namely those who actively search for the term “9/11” with Google. This also implies a certain language bias. Although the term 9/11 is also commonly related to the September 11 attacks in non-English countries, it is still mostly used in English queries. Only in the German-speaking countries do we find combinations where 9/11 is paired with a local expression and related to the incident (e.g. “9/11 wahrheit”, “9/11 ablauf”). Therefore, it might be useful to conduct comparative studies with queries in the local language.

Still, these results give interesting insights into the local differences of Google’s search engine. They show that the autocomplete suggestions vary significantly between language versions but rather slightly between countries within one language.