Updates

Posted: April 25, 2013 at 2:28 pm  |  By: Frederiek Pennink

As René König wrote earlier on this blog – none of the issues discussed in the last Society of the Query conference have become any less relevant. This is why we are returning with a follow-up conference on search in November 2013! For more posts, updates, research and reports of the previous conference, go to this blog.

Evaluating Google as an Epistemic Tool

Posted: December 4, 2012 at 6:34 pm  |  By: sjgknight  |  2 Comments

Cross-posted from my KMi blog (which has paragraph-level commenting enabled to facilitate discussion).

I’ve just read an article which explicitly considers the evaluation of search engines with respect to their epistemic functions, from a social epistemological perspective. There’s a pre-print available at http://www.phil.cam.ac.uk/teaching_staff/simpson/simpson_index.html and the citation is: Simpson, ‘Evaluating Google as an Epistemic Tool’, Metaphilosophy 43:4 (2012), 426-445.
Interestingly, the article arose from PhD research funded by Microsoft Research Cambridge – so it’s great to see they have an interest in the knowledge implications of their tools, and in evaluating them along those lines. I’ve been thinking about this a bit, but what follows was written in a morning, so apologies if it isn’t clear.

The Article

The article suggests:

  • “Search engines are epistemically significant because they play the role of a surrogate expert” (p.428)
  • Search engines should be assessed by:
    1. precision and recall, where precision is a measure of the relevance of the returned documents (relevant : irrelevant among the results) and recall is a measure of the completeness of the set (relevant recalled : relevant on the web) (p.431) – see the short sketch after this list
    2. ‘timeliness’ – the duration it takes for searchers to find a relevant link (thus, if the first result on a SERP is relevant, this will give the best score) (p.432)
    3. ‘authority prioritisation’ – they should prioritise those sources which are credible. This could be assessed in the same way as timeliness, with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve; I would suggest that ‘link juice’ is in part an attempt to measure this quality.
    4. Objectivity – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”, thus, if there are 2 sides to a story with an equal quantity of ‘hits’ behind them, ordering such that the first 50% regard one side, and the latter 50% the other lacks objectivity.
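
To make the first criteria concrete, here is a minimal sketch (mine, not Simpson's) of how precision, recall and 'timeliness' might be computed for a single results page – the documents and relevance judgements are purely illustrative assumptions:

```python
# Toy illustration of Simpson's first two criteria plus 'timeliness'.
# The documents and relevance judgements below are invented.

def precision(returned, relevant):
    """Share of the returned documents that are relevant."""
    returned = list(returned)
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(returned) if returned else 0.0

def recall(returned, relevant):
    """Share of all relevant documents (on the web) that were returned."""
    relevant = set(relevant)
    hits = sum(1 for doc in returned if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def timeliness(returned, relevant):
    """Rank of the first relevant result (1 = best), or None if absent."""
    for rank, doc in enumerate(returned, start=1):
        if doc in relevant:
            return rank
    return None

serp = ["page_a", "page_b", "page_c", "page_d"]      # hypothetical results page
relevant_on_web = {"page_b", "page_d", "page_e"}     # hypothetical relevant set

print(precision(serp, relevant_on_web))   # 0.5
print(recall(serp, relevant_on_web))      # 0.666...
print(timeliness(serp, relevant_on_web))  # 2
```

On this toy page, half of what is returned is relevant, two thirds of what is relevant has been found, and the first relevant link sits at rank 2.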

Now this last point is particularly interesting and I shall return to it shortly. First I’ll cover what Simpson goes on to argue.

  1. Personalisation in search results occurs
  2. We have a tendency towards confirmation bias: we are biased towards information which affirms our prior beliefs
  3. Our searches may seek affirming information, but moreover search engines – in customising our results based on our prior searches and the results we open – may also do this
  4. They therefore fail to represent objectively the domain being searched over, instead representing a subset of that relevant domain which affirms the searcher’s prior beliefs
  5. Personalisation therefore fails criterion 4 above.

The point is: “When the bubble is based on individual personalisation, only ‘epistemic saints’ are immune from its effects – the people who scrupulously ensure that they seek out opposing views for every search they conduct in which there are importantly opposed views on the matter. The rest of us, one hopes, make the effort to seek out opposing views in serious enquiries” (p.439).

Simpson thus suggests two solutions:

  1. Rational enquirers should turn off personalisation (throw off the shackles of Plato’s cave!) or use search engines such as DuckDuckGo
  2. There is a prima facie case for regulation of search engine providers because there is a public good in objectivity

Objections & comments

On ‘objectivity’

There is a straw man at play here – we’re asked to imagine the world exactly as it is (with important philosophers of many nationalities), and then to suppose that a search for “important philosophers” returns results where the top 1/3 are German, the next 1/3 neither French nor German, and the bottom 1/3 French (p.435). But this isn’t how search engines work – if we assume that 1/3 of all important philosophers are French, 1/3 German, 1/3 neither, and that the internet reflects this to some degree, we can almost certainly assume that the results will at least ‘mix’ the philosophers, although there might be some bias.

Now, there is a concern that not all the information is on the internet – so there’s an interesting question here about the testimony of silence (when can the absence of testimony be taken to tell you something?) and the epistemic virtues of searchers (what abilities should searchers deploy prior to making assumptions based on the absence of results?) – but these are separate issues. As an aside, an obvious example here is the gender bias of historical artefacts, and the subsequent gender bias on Wikipedia – the silence tells us something, but not something that can be taken at face value (i.e. that women did very little); we expect good epistemic agents to assess such information in light of epistemic norms which include some awareness of historiography.

I think this is a pretty minor issue, in so far as the actual concern being raised is whether or not the search engine reflects the epistemic environment. If it does not, but – algorithmically – it should, that is of concern because it suggests some downgrading of websites for questionable reasons (they contain French philosophers, they’re not English-language sites, etc.). But by meeting the first three requirements using some version of the PageRank algorithm (sketched below), this should be less possible – issues regarding the testimony of silence aside. Where such issues remain (despite good algorithms), this is a problem with the epistemic community – the community has failed to highlight relevant links, write articles, connect pages, etc.
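
For readers unfamiliar with it, PageRank is at bottom a link-based measure of how the epistemic community itself 'votes' with its hyperlinks. The following is a minimal power-iteration sketch over an invented toy graph – the textbook algorithm only, not Google's production ranking, which combines many more signals:

```python
# Minimal PageRank power iteration over an invented toy link graph.
# This is the textbook algorithm, not Google's production ranking.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {
    "philosophy-portal": ["kant.example.org", "descartes.example.org"],
    "kant.example.org": ["philosophy-portal"],
    "descartes.example.org": ["philosophy-portal", "kant.example.org"],
}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(round(score, 3), page)
```

The point of the sketch is simply that the scores are a function of the community's linking behaviour, not of any editorial judgement by the engine.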

So the question of objectivity, in so far as objectivity goes, is: does the SERP reflect the epistemic landscape of the internet as a whole? I think there is an issue with this, in that it is entirely possible to “swamp” issues (e.g. Google bombing) such that there are two takes on a concept but only one is presented. I do not think this is an issue of search engine objectivity, however, but rather of the epistemic community – and indeed the community often puts rather a lot of effort into combating this issue, with punitive action taken by search engine providers against those sites that engage in such practices. I will, below, talk about one possible solution to this issue of ‘presenting the various takes’ on an issue.

On the testimonial nature of search engines

This concern relates to what we’re taking the search engine to do. And again, I’d say this is a relatively minor point, but an important one. The article in various places implies that the search engine itself is a surrogate expert, or that it is epistemically responsible for the contents of the epistemic environment (across which it surveys/crawls), or that it provides testimony in itself (the Knowledge Graph is an interesting counterpoint to me here) – but I do not think this is the epistemic quality of a search engine. Search engines can testify that x is considered a good informant – but that is all. They are good at pointing out experts – and indeed, those are the criteria they are judged against. Simpson does make it clear (p.428) that what search engines testify on is not the content itself, but rather who might be a good source of testimony. However, I think this causes a problem for some arguments (see the above paragraph). What the search engine represents is the qualities of an epistemic community.

On personalisation and objectivity

The crux of Simpson’s article is that personalisation is bad because it breaches objectivity – a criterion against which search engines should be judged. I have raised above one objection to the issue of objectivity. I think there’s also an issue regarding what ‘objective’ means – particularly given the use of a broadly pragmatist epistemology (which I take social epistemology to be), it seems odd to think about objectivity outside of the context of use; for some uses personalisation may be important, and indeed the only way to make sense of information, while for others it might be rather more pernicious. Here I raise some more issues in the specific context of personalisation.

Geolocation and (good) personalisation – the redundancy of ‘nearby’

In ordinary conversation there are context-sensitive points at which, when we ask “is there a restaurant?”, the “nearby” is almost redundant – and it is entirely sensible for search engines to do the same thing. That is completely within what we would understand by ordinary testimony, but still involves personalisation. This isn’t an issue of “find me the nearest expert” – because the web makes that somewhat redundant – rather it’s an issue of “find me the most relevant expert”; and that just happens to be regarding nearby restaurants.

Geolocation and (good) personalisation – understanding local customs, and requirements for deep/shallow information

If two people search for things in two different countries, we don’t need to imagine a benign dictator (or brainwashed populace) to think that there might be normative reasons why one population would be searching for a different ‘angle’ on the topic than the international population – for example, particularly religious or (perhaps less controversially) cultural practices of a fairly benign but ‘unusual’ nature (a festival or something). A local search engine could do the job; Google needs to personalise to do so. This seems intuitively appropriate. In fact, the exact same information could be presented both to a ‘local’ and a ‘non-local’, but with locals receiving results which gave more detail – including, but not limited to, geographic information, local websites regarding upcoming events, and so on. Again, this is in line with testimonial knowledge – a search engine might direct someone to me to give a novice an overview of some topic, while sending me to some other website. This of course hints at another facet of search personalisation – expertise might matter: informants that do not take into account the capabilities of their audience to understand their testimony fail to give knowledge in their testimonial role.

Geolocation and (bad) personalisation – living with a racist

We can also imagine a situation in which two neighbours get two different sets of results through no action of their own (perhaps one of them has a racist housemate and the other doesn’t). In all other ways, the two users are the same. Let’s say they have no knowledge of some civil rights movement figure, but there are a number of websites on that figure, including some run by white supremacist groups (as is the case for Martin Luther King). We don’t have to imagine the 50/50 division of results described in criterion 4 above to see the concern about objectivity here. Nor do we need to imagine that the searchers have made different searches on this occasion; both have used the same query, but one has received results from across the web, while the other is receiving a results set which is skewed towards the subset of biased (racist, etc.) results. We would hope both users would exercise their epistemic abilities to make judgements on the information found. However, in this case the concern is not only with the individual searcher – although the search engine could do more to help them – it is with the personalisation on which the results were returned, and the appropriateness of customising results for that individual.

Geolocation and (bad) personalisation – group dominance

My understanding of Google’s search in China is that it was filtering out search results which those within the Great Firewall would not have been able to access anyway. That is, Google was not acting as a filter of itself, but rather removing results which had already been vetoed by the Chinese government. (I may be wrong about that – do correct me if I am.) We can imagine a case such as this, or another in which the dominant group opens particular results more often, and makes particular searches more often (ones which are close enough that they’d influence returned results), and in which these groups are tied to some particular geographic location such that a search engine can personalise results based on this data. In both these cases, the concern is that rather than PageRank or ‘link juice’ or whatever other measure is designed to take account of the broad epistemic environment, the results are biased towards a particular subset of pages with one epistemic perspective, while elsewhere (globally) the full set of results is returned. Further force is added to the example if we imagine an individual in such a country who is attempting to find information on the ‘other side’ – but cannot, due to an imposed personalisation. The knowledge terrain would appear very different to such a searcher, despite their own epistemic virtue. Such an example, however, simply extends the issue – even in a location full of ‘epistemic devils’, search engines should properly be criticised if they provide results that fail on some measure of ‘objectivity’, limiting their indications of ‘good informants’ to only a biased subset.

Recall and Objectivity

If we recall Simpson’s criteria for assessment of search engines (as highlighted above):

  1. precision and recall, where precision is a measure of the relevance of the returned documents (relevant : irrelevant among the results) and recall is a measure of the completeness of the set (relevant recalled : relevant on the web) (p.431)
  2. ‘timeliness’ – the duration it takes for searchers to find a relevant link (thus, if the first result is relevant, this will give the best score) (p.432)
  3. ‘authority prioritisation’ – they should prioritise those sources which are credible. This could be assessed in the same way as timeliness, with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve; I would suggest that ‘link juice’ is in part an attempt to measure this quality.
  4. Objectivity – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”, thus, if there are 2 sides to a story with an equal quantity of ‘hits’ behind them, ordering such that the first 50% regard one side, and the latter 50% the other lacks objectivity.

One outcome of the preceding argument is, I think, that objectivity – while a useful summary concept – is in fact entailed in the first three criteria. This is because, as I have framed it, the concern with search engine personalisation is that personalised engines either fail with regard to ‘recall’ (they ignore relevant results), or, where recall holds, the results are not defensibly ranked such that ‘authority prioritisation’ and ‘timeliness’ hold true – in particular ‘authority’, on which the epistemic qualities of the searching agent have little bearing. There are issues related to the presence of ‘poor’ information on the internet, but the epistemic community – which search engines ‘survey’ when they crawl – acts to address these, and does so rather effectively, such that blips in search engine informativeness regarding good sources for testimony are just that – blips. This is not incompatible with their status as objective informants regarding good sources of information; human informants would be in the same position. Furthermore, we expect good informants to be prepared to tell us information they may not agree with – many a courtroom drama is predicated on just such an assumption. The issues of relevance, timeliness (not ‘hiding away’ information in that case), and authority (the credibility of results) are key here too – where testimony is contextualised (but this is not the place to discuss this analogy further, and I have no time to right now!).

Summary

I have presented some arguments for personalisation using ordinary standards of testimony, and some against, based on geolocation (although most could be adapted to other features of personalisation).

My suggestion is that it is not the case that personalisation is bad qua unobjective, but that, when giving an objective judgement of testimony – and when making judgements on the likelihood of someone else’s testimony meeting your information needs – we expect informants to tell us the substantive assumptions they have made in reaching their conclusions. Search engines often fail to do this, except where there is good (often advertising-based) reason for them to do so (clarifying geolocation, for example). Furthermore, where these assumptions are made explicit, their impact is often not made clear.

Suggestions

In so far as I think personalisation is often useful, I think it should remain. However, there are some considerations:

  1. Rather than keeping personalisation facets implicit, they should be made clear – perhaps as added terms in a search query (this would work, for example, for restaurant searches, which could add postcodes to queries; see the sketch after this list)
  2. Suggested search features sometimes include elements of such alteration – highlighting ‘deeper’ queries (adding key words) or broader queries (removing query terms). The appearance of multimedia in the main SERP probably helps here too.
  3. Features to indicate why personalisation has occurred might be useful (although, of course, this is then open to gaming). It is hard to see how this could be done without returning something like “we selected this result because, based on your previous searches, you’re a racist” or “you’re a climate change denier”, etc.
  4. The things we search for are likely to contain epistemic bias (indeed, part of my research interest is in this topic!) which search engines in their current state may not be able to deal with, and which it may not be their place to deal with in their role as surrogate experts. For example, a search for “Al Gore inconvenient truth” is likely to return rather different results than a search for “Al Gore liar” – this is about the epistemic judgement of searchers in their query formation, not personalisation. Subsequent results may be personalised off the back of such searches (and this may be problematic, as discussed above), but in the first instance this is not the concern.
  5. Search engines could use a sort of ‘faceted search’ in which ‘takes’ on concepts are classified, such that for a given concept ‘x’ pages which relate to one definition appear together, while those relating to an opposing definition appear separately. Facets could be added for location, difficulty, and so on. Some of this is already implemented. Some of it is implicit – but should perhaps not be.
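
As a toy illustration of suggestions 1 and 5, a hypothetical engine could carry its personalisation assumptions as explicit, removable facets appended to the query. Everything in this sketch (field names, facet syntax, values) is invented for illustration, not a description of any existing engine:

```python
# Hypothetical sketch of suggestions 1 and 5: spell out the personalisation
# assumptions as visible, removable facets on the query itself.
# All field names, the facet syntax and the values are invented.

from typing import Optional

class PersonalisationFacets:
    def __init__(self, location: Optional[str] = None,
                 expertise: Optional[str] = None, **extra: str):
        self.location = location    # e.g. inferred from geolocation
        self.expertise = expertise  # e.g. "novice" or "expert"
        self.extra = extra

def explicit_query(query: str, facets: PersonalisationFacets) -> str:
    """Return the query with every assumed facet spelled out, so the searcher
    can see (and strike out) the assumptions behind the personalised results."""
    parts = [query]
    if facets.location:
        parts.append(f"near:{facets.location}")
    if facets.expertise:
        parts.append(f"level:{facets.expertise}")
    for key, value in facets.extra.items():
        parts.append(f"{key}:{value}")
    return " ".join(parts)

facets = PersonalisationFacets(location="B3 2TA", expertise="novice")
print(explicit_query("good restaurants", facets))
# -> good restaurants near:B3 2TA level:novice
```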

Based on suggestion 3 above, we might argue – as I have indicated in a number of places above – that search engines should not be considered as informants, but as second-order informants – informants about who knows the answer (as Simpson also indicates). In this case, a significant component of their behaviour should be oriented to covering the web, and avoiding presenting subsets unless there are good reasons for personalisation – which should be made explicit, as we would expect in cases of testimonial knowledge. In some cases, personalisation is the action of a good informant (unless the question is “know of any good restaurants? – oh, but I’m going to Birmingham tonight”); in many cases such assumptions should be made clear, perhaps via suggestion 5. Removing the option reduces the quality of the search engine as an epistemic tool. In making the tool a good epistemic tool, unfortunately, bad epistemic agents may still use information poorly; but this is already the case. Agents who choose to bias their results already do so through their queries, and the sites they select. Search engines are not in a position to assess the epistemic agents who use them, but they are able to position themselves as good epistemic tools – ones that survey the epistemic landscape and attempt to avoid undue assumption. That is the role of the good informant.

I think, then, that I’m making some key statements here:

  1. Personalisation sometimes makes sense
  2. There are two issues of objectivity – one regards the recall, authority, and timeliness (and possibly precision) of results in SERPs; the other is to do with the epistemic community
  3. Objectivity of SERPs and objectivity of the epistemic community (including the searching agent) should not be conflated; Simpson’s analysis and definition of objectivity lend themselves to conflation. Objectivity for search engines is (probably) entailed in their recall, authority and timeliness (point 2 above).
  4. Search engines could do more to make it clear the assumptions they make when personalising results

 

On the way to academic search engine optimization?

Posted: November 14, 2012 at 6:20 pm  |  By: René König

I recently talked about academic search engine optimization at the EASST conference in Copenhagen. The term was coined by Beel et al. (2010), who suggested that academic papers should be adapted to the technology of search engines in order to be indexed and ranked properly, for example by using correct metadata and carefully selecting and placing keywords. This can be seen as an indicator of the relevance of search engines in the scholarly realm. At the same time, it shows that academic content is not always well represented in search engines. Beel et al. focused on scientific publications in academic search engines such as Google Scholar. But of course we may broaden their approach and wonder about the position of academic content within generic web search engines, where it has to compete with countless other perspectives.
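
By way of illustration, 'correct metadata' in Beel et al.'s sense largely means exposing clean, machine-readable bibliographic fields. The sketch below emits Highwire-style citation meta tags of the kind commonly recommended for Google Scholar indexing; the paper details are invented, and the exact tag set an engine honours should be checked against its own guidelines:

```python
# Sketch: emitting Highwire-style citation meta tags, the kind of clean,
# machine-readable metadata commonly recommended for Google Scholar indexing.
# The paper details are invented; check the target engine's own guidelines
# for the exact tags it honours.

def citation_meta_tags(title, authors, date, journal, pdf_url):
    tags = [("citation_title", title)]
    tags += [("citation_author", author) for author in authors]
    tags += [
        ("citation_publication_date", date),
        ("citation_journal_title", journal),
        ("citation_pdf_url", pdf_url),
    ]
    return "\n".join(
        f'<meta name="{name}" content="{value}">' for name, value in tags
    )

print(citation_meta_tags(
    title="Society of the Query: An Example Paper",
    authors=["Jane Doe", "John Doe"],
    date="2012/11/14",
    journal="Journal of Web Search Studies",
    pdf_url="http://example.org/example-paper.pdf",
))
```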

Misrepresentation of scientific content

I have already talked about the example of 9/11 (e.g. here and here), which illustrates how the algorithmic selection and ranking mechanisms of search engines can differ from those of traditional gatekeepers (e.g. publishers). In this case, the so-called “9/11 Truth movement” presented a fundamentally different account of the events of September 11 than the official authorities, scientific experts and most mass media institutions: they argued that they had proof of a controlled demolition of the World Trade Center, indicating that 9/11 was an “inside job” orchestrated by the US government. This claim became very popular and was supported by websites which framed themselves as scientific, for example the Architects & Engineers for 9/11 Truth or the Journal of 9/11 Studies. Apparently, the “Truth movement” has established very well-linked networks, leading to high rankings of their websites for 9/11-related searches, whereas websites representing e.g. the actual scientific reports by NIST seem rather underrepresented. Besides, many people seem to search for content like this.

Optimization or spam?

Having this case in mind, one could argue that the call for optimizing academic content for search engines is more than necessary if science wants to keep its position as a knowledge authority in modern societies. Of course, 9/11 is just one example to illustrate this development, and we may think of others which are more worrisome. For example, commercial actors often dominate the field of health information (Mager 2009). Whenever economic interests are involved, it is not unlikely that they will dominate search engine rankings, because commercial actors are usually better equipped than non-commercial actors like universities, and often also better than academic publishers, who to a large extent “hide” articles in the “academic invisible web” (Lewandowski/Mayr 2006). Moreover, most academics will probably regard marketing techniques such as search engine optimization as rather unethical. Indeed, Beel et al. were criticized for promoting “spamming” with their approach. They answered with a study which actually supported this criticism. In order to gain insights into Google Scholar’s vulnerability to manipulation attempts, Joeran Beel and Bela Gipp tested various techniques, for example adding invisible text or creating fake citations by uploading nonsense publications (citations are used as an important ranking factor in Google Scholar). In short, they were remarkably successful and concluded: “Google Scholar is far easier to spam than the classic Google Search for Web pages” (Beel/Gipp 2010). Apparently, Google Scholar is not very strict when it comes to judging whether content is scientific or not. Often it is enough to publish PDF files that are structured in the typical way of academic journal articles, with sections like “introduction”, “methodology”, “results” and “references”. It seems that broad coverage is more important to Google than scientific rigor. Therefore, it is not surprising that people of the “9/11 Truth movement” even top Google Scholar’s results for the query “WTC 7” (with a paper by Steven Jones, a prominent figure of the movement).


Manipulating Google Scholar by adding invisible text (image credit: Beel/Gipp 2010)

An algorithmic shift

Depending on one’s point of view, Google Scholar’s fairly inclusive policy can be embraced as a democratization of the highly exclusive and selective academic publishing system. It provides scholars access to papers which are not part of this elitist and mostly expensive circle. But the flaws of Google’s automated selection and ranking process are also obvious. As Beel and Gipp have shown, it is pretty easy to manipulate Google Scholar. This not only opens academia’s door to alternative fringe science but also to commercial interests, potentially leading to undesired activities like greenwashing, underplaying the risks of drugs and technologies, etc. In any case, there are hardly any possibilities for academics to directly interfere here. Google ultimately defines the relevance of scientific content in its search engines, not academics. Given the wide usage of Google services inside and outside of academia, I think it is justified to speak of an algorithmic shift in information dissemination. Of course, this shift is not limited to Google but includes many more platforms which apply algorithms to organize information. More and more, these algorithms decide on the relevance of scholars and their texts. Evidently, it is still humans who have to assess the actual scientific relevance of publications. But algorithms may ultimately provide or deny visibility. Academics will have to deal with these new mechanisms of inclusion and exclusion. Search engine optimization is one answer to this algorithmic shift, but given the vulnerability to manipulation, it is debatable whether it is desirable. Unfortunately, it is more or less the only way to influence the questionable rankings by Google and other service providers.

My related Prezi presentation can be found here. We discuss these developments in greater detail in our book Cyberscience 2.0: Research in the Age of Digital Social Networks.

Digital Methods Winter School

Posted: October 25, 2012 at 3:57 pm  |  By: René König

This summer, I attended the summer school of the Digital Methods Initiative at the University of Amsterdam. I can say that I learned a lot there, and I believe that anybody who does research on search engines will find some of the DMI tools useful. My comparative study of Google Autocomplete is one example of their application, and you can find many more at the DMI wiki.

I should mention that the DMI team is not only professional but also really nice. Together with an extremely international and interdisciplinary group of participants, they created a great and productive working atmosphere. I therefore absolutely recommend attending one of their programs.

The next chance will be the winter school Data Sprint: The New Logistics of Short-form Method, 22-25 January 2013. It is addressed to “PhD candidates, advanced MA students and motivated scholars”. If you have no prior experience with digital methods, the more comprehensive two-week summer school might suit you better. The winter school includes a workshop as well as a mini-conference with the opportunity to present and discuss research papers.

Googling “9/11”: A cross-cultural comparison of suggestions for a loaded term

Posted: October 1, 2012 at 6:22 pm  |  By: René König  |  1 Comment

In my recent blog post I explained the politics of autocomplete, a feature that suggests queries while they are being typed. We may wonder how this is affected by the trend of creating tailor-made search engine results for specific audiences. How do Google’s suggestions differ from country to country (and language to language)?

Methodology and research question

I tried to gain insights into this question during a small project at the summer school of the Digital Methods Initiative (DMI) this year. With a tool provided by the Digital Methods team I was able to conduct a limited cross-cultural comparison for the query “9/11”. On July 4th and 5th 2012, I crawled the query in 4 languages (English, Arabic, Hebrew, German) and 12 country versions (Australia, Canada, UK, US, Egypt, Lebanon, Palestinian Territories, Iraq, Israel, Germany, Austria, Switzerland). I had previously noticed that Google ranked websites with alternative accounts of 9/11 (commonly described as “conspiracy theories”) fairly high (see my post here). Thus, I wanted to know whether this is also reflected in the autocomplete suggestions, and whether there are differences between the language versions. While the tool is very helpful for retrieving the suggestions, it does not provide any help with interpreting them. The output comes in the form of numerous tables, simply listing the suggestions in the order of their appearance in Google, together with some additional information (see below). Google’s API allows for retrieving ten suggestions (in the regular search interface, the suggestions have partly been limited to four).
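
For readers who want to experiment themselves: suggestions of this kind can also be retrieved without the DMI tool, for example via Google's unofficial suggest endpoint. The sketch below is an assumption-laden approximation (the endpoint, the "hl"/"gl" parameters, the language codes and the response format are undocumented and may change), not a description of how the DMI tool works:

```python
# Rough sketch for retrieving autocomplete suggestions yourself. This is NOT
# the DMI tool: it calls Google's unofficial suggest endpoint, and the URL,
# the "hl" (language) and "gl" (country) parameters and the response format
# are assumptions that may change or be rate-limited at any time.

import json
import urllib.parse
import urllib.request

def get_suggestions(query, language="en", country="us"):
    params = urllib.parse.urlencode({
        "client": "firefox",  # this client is assumed to return a plain JSON list
        "q": query,
        "hl": language,
        "gl": country,
    })
    url = "https://suggestqueries.google.com/complete/search?" + params
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf-8", errors="replace"))
    return data[1]  # data looks like [query, [suggestion, suggestion, ...]]

# Illustrative language/country pairs; the codes are assumptions too.
for lang, country in [("en", "us"), ("de", "de"), ("ar", "eg"), ("he", "il")]:
    print(lang, country, get_suggestions("9/11", lang, country))
```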

Output of the DMI autocomplete tool

To interpret this vast amount of data, I categorized each suggestion using a Grounded Theory-oriented content analysis. That means the categories resulted from the data itself, rather than being forced onto the data before the analysis. The outcome was eight categories which helped to structure the data.

Categories for the suggestions

To get a summarizing overview of all the data, I created a word cloud (with Wordle) which shows all the suggestions in relation to their frequency (the bigger the word, the more often it appeared in the suggestions) and in the color of the categories. The rare cases of suggestions in non-Latin script (Arabic/Hebrew) were translated into English and brackets indicating the original script were added.
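
The aggregation step behind such a word cloud is simple frequency counting across all language/country versions. A minimal sketch, with an invented data structure standing in for the tool's output tables, is:

```python
# Sketch of the aggregation behind the word cloud: tally how often each
# suggestion appears across all language/country versions. The data structure
# and sample values are invented stand-ins for the DMI tool's output tables.

from collections import Counter

suggestions_by_version = {
    ("en", "us"): ["9/11 memorial", "9/11 conspiracy", "9/11 jokes"],
    ("de", "de"): ["9/11 wahrheit", "9/11 conspiracy"],
    ("ar", "eg"): ["9/11 was an inside job", "9/11 conspiracy"],
}

frequencies = Counter(
    suggestion
    for ranked_list in suggestions_by_version.values()
    for suggestion in ranked_list
)

for suggestion, count in frequencies.most_common():
    print(count, suggestion)   # these counts can then be fed into Wordle
```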

Results

The word cloud immediately reveals a striking result: Google’s suggestions for 9/11 over-represent alternative accounts of the event, most notably the word “conspiracy”, which was suggested most often. Additionally, a number of “conspiracy-related” queries appear, for example “truth” (pointing to the so-called “9/11 truth movement” which advocates alternative accounts) or the catch phrase “[was an] inside job”.

Overview: Word cloud of all suggestions

By contrast, only a few suggestions refer to the mainstream account or to “neutral” facts. Another frequently suggested term is “memorial”, probably pointing to the 9/11 memorial museum in New York City. Less visible, but also striking, are the rare suggestions of queries which are not related to September 11, 2001. These are more common in the Arabic countries, for example in regard to a section in the Quran. The following tables show the categorized results for all the analyzed language and country versions.

Google’s suggestions for the query “9/11” in English

Google’s suggestions for the query “9/11” in German

Google’s suggestions for the query “9/11” in Hebrew

Google’s suggestions for the query “9/11” in Arabic

The following figure shows only the categories but gives a better overview for direct comparison.

Categorized suggestions in comparison

The differences across the language versions are significant, especially between the Arabic and the Western countries, while there are only slight variations between countries with the same language version. Alternative accounts seem to play a major role in the Arab world, followed by German-speaking countries, Hebrew and finally the English-speaking countries. The latter ones apparently have a more heterogeneous interest in 9/11, including jokes which otherwise only appear in Hebrew but in none of the Arabic or German language versions. In all Western countries “memorial” is the most popular query, whereas all Arabic countries suggest the slogan “was an inside job” first.

Discussion

Google’s suggestions for the query “9/11” predominantly associate it with the events of September 11, 2001. Given the massive global impact of this incident, this was to be expected, although we can think of other meanings for this query (for example the Porsche 911, other events on a September 11, or the emergency number). The general popularity of alternative accounts is striking. The observation that they are particularly relevant in the Arabic world is supported by a representative study which showed that many people in the Arab world see other forces behind 9/11 than Al Qaeda. However, we must be careful when we draw conclusions about a society’s opinion based on Google’s suggestions. First of all, we cannot be sure whether they really represent what users actually search for, although that’s what Google claims. Secondly, even if autocomplete represents previous queries, it gives us insights only into a very specific part of a population, namely those who actively search for the term “9/11” with Google. This also implies a certain language bias. Although the term 9/11 is also commonly related to the September 11 attacks in non-English-speaking countries, it is still mostly used in English queries. Only in the German-speaking countries do we find combinations where 9/11 is paired with a local expression and related to the incident (e.g. “9/11 wahrheit”, “9/11 ablauf”). Therefore, it might be useful to conduct comparative studies with queries in the local language.

Still, these results give interesting insights into the local differences of Google´s search engine. They show that the autocomplete suggestions vary significantly between language versions but rather slightly between countries within one language.

The politics of autocomplete

Posted: September 20, 2012 at 3:34 pm  |  By: René König  |  3 Comments

In September 2010 Google introduced autocomplete (also known as Google Suggest) to its search engine. Based on previous queries, it tries to predict what we want to search for while we are still typing. How does this impact search engine usage, and what do the suggested queries tell us about our societies? It is already clear that the suggestions are often problematic. They may violate personal rights and can be politically loaded and controversial.
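
To make the mechanism concrete: in its most naive form, such a feature simply matches the typed prefix against a log of past queries and returns the most frequent completions. The sketch below is that naive version with an invented log – Google's actual system is far more elaborate and, as discussed below, applies removal policies on top:

```python
# Naive sketch of autocompletion from a query log: match the typed prefix
# against past queries and return the most frequent completions. The log is
# invented, and Google's real system is far more elaborate.

from collections import Counter

query_log = [
    "society of the query", "society of the spectacle",
    "society of the query conference", "society of the query",
    "society of the spectacle",
]
frequency = Counter(query_log)

def suggest(prefix, k=3):
    """Return up to k of the most frequent past queries starting with prefix."""
    matches = [(q, n) for q, n in frequency.items() if q.startswith(prefix)]
    matches.sort(key=lambda item: item[1], reverse=True)
    return [q for q, _ in matches[:k]]

print(suggest("society of the q"))
# -> ['society of the query', 'society of the query conference']
```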

Trials against Google

As I briefly pointed out in my last blog post, Bettina Wulff (Germany’s former first lady) recently sued Google because its autocomplete feature supported the rumor that she worked as a prostitute. She was not the first to see her personal rights violated by Google’s suggestions. Earlier this year, a Japanese man fought and won a lawsuit against the search engine provider because it associated his name with criminal acts. He was afraid that Google’s suggestions could damage his reputation and might cost him his job. In 2011 Google lost another lawsuit, brought by an Italian businessman who did not want to be associated with suggestions like “truffa” (fraud), similar to another case in France. In Ireland, a legal settlement resolved the conflict between Google and a hotel which did not like Google’s suggestion “receivership” next to its name. Courts worldwide seem to side with the complainants rather than with Google. While most will agree that false allegations should not be supported by search engines, we may also ask: what if they are correct? Where do personal rights end and where does manipulation begin?

Censorship and manipulation

Google usually argues that its algorithms simply mirror the web and that autocomplete is just based on previous queries. However, the company also declares it applies “a narrow set of removal policies for pornography, violence, hate speech, and terms that are frequently used to find content that infringes copyrights.” At least in some cases it is debatable what should fall under this category. For example, Emil Protalinski has pointed out that censoring “thepiratebay” was not justified because the platform also provides legitimate torrent links and not only pirated material.

In any case, this relatively new feature is another powerful way of influencing our information practices. Although it comes in the subtle shape of mere suggestions, it may have a massive impact on users’ search behavior. It disciplines us. We get rewarded if we follow the suggestions, because they make us type less. Are we also going to feel bad if we search for something which does not appear there, knowing it might be illegal or at least a query on the margins of society?

Search engine optimizers have already acknowledged the power of autocomplete. They try to polish the images of brands by highlighting “the positive aspects or activities associated with the brand and push negative values out” (Brian Patterson 2012).

Search engine optimizers try to manipulate Google´s suggestions.

Image by Search Engine Land (2012)

Cultural impact

Such manipulations call into question the alleged democratic principle of autocomplete. Do the suggestions really represent what people search for? Even if they do, we may question the “wisdom of crowds” (Surowiecki), as masses have always had the potential to turn into mobs. Can autocomplete foster prejudices by reproducing them in its suggestions? Does it manipulate the public in a similar way to big tabloids, as Krystian Woznicki wonders? Romanians, for example, were confronted with not very flattering predictions for the query “Romanians are”: “scum”, “ugly” and “rude” were among Google’s associations. A campaign tried to change this by asking users to google positive attributes like “Romanians are smart”. Try it yourself to see whether it was successful (on my computer in the Netherlands it doesn’t seem so).

I would love to hear more about your experience (maybe even research?) on autocomplete. What do the suggestions tell us about our societies and how can we use them for social research? Soon, I’m going to write more about a cross-cultural comparison of autocomplete suggestions here.

Stay tuned!

Society of the Query returns

Posted: September 10, 2012 at 6:30 pm  |  By: René König  |  1 Comment

Search engines are so deeply rooted in our daily routines that most people rarely think about them. It is a black-boxed technology, appearing simple on the surface but with a sophisticated infrastructure and complicated functionalities underneath. The hype around Facebook and other social media platforms has drawn attention even further away from these opaque services. Nevertheless, there is no doubt that search engines still dominate the current Internet, and nothing indicates that this will change anytime soon.

None of the problems discussed in INC’s Society of the Query initiative a few years ago has become less relevant: Google is still dominating the market and alternative providers can hardly compete. Online search remains an unquestioned practice, and a new literacy to understand this technology is largely missing. Google’s policy on privacy has not become any less problematic, and many legal questions remain unanswered. For example, just recently Germany’s former first lady sued Google because its autocomplete feature supported the rumor that she worked as a prostitute.

Controversy: Google suggests “prostitute” for the name of Bettina Wulff

So, there is little doubt that search engines need to be studied and the attention shift towards social media platforms has made it even more important to address the issues related to them. We are going to do this in various ways: Firstly, we want to find out more about the recent developments in this field. Which new insights can we gain from theoretical and empirical research? Are there artistic approaches which give us a novel perspective on search engines and their impact on culture? How do technological advances change our search experience (for example, the increasing usage of search engines through mobile devices or the trend of personalization)?

In order to tackle these questions, we will benefit from the already existing networks which emerged from the first Society of the Query initiative. But of course we are also eager to get to know more researchers, artists and activists who work on innovative projects in this area. We have already created a mailing list to get in touch and exchange information quickly. Please join if you are interested. We also would like to revive this blog which will ideally become a vibrant collaborative platform for various pieces to the wide area of search: short articles, reviews, news on related events etc. Drop me a line, if you would like to contribute. Finally, we aim at a follow-up conference of Society of the Query in September 2013.

So stay tuned and get in touch!

 

P.S.: I’m a new INC intern. If you want to know more about me and my work click here.

Robert Darnton writes about the challenges that libraries face these days

Posted: December 20, 2010 at 1:32 pm  |  By: admin

In the Xmas issue of the New York Review of Books, Robert Darnton writes about the challenges that libraries face these days. Towards the end of his article he discusses the initiative to start a Digital Public Library of America (DPLA). In this context Darnton makes some interesting remarks about Google Books:

“Perhaps even Google itself could be enlisted in the cause. It has digitized about two million books in the public domain. It could turn them over to the DPLA as the foundation of a collection that would grow to include more recent books—at first those from the problematic period of 1923–1964, then those made available by their rights holders. Google would lose nothing by this generosity; each digitized book that it made available could, if other donors agree, be identified as a contribution from Google; and it might win admiration for its public-spiritedness.”

http://www.nybooks.com/articles/archives/2010/dec/23/library-three-jeremiads/?page=3

Deep Search ll: Panel 4, Contextual Modeling and closing discussion

Posted: July 11, 2010 at 5:04 pm  |  By: Shirley Niemans  |  1 Comment

Panel 4: Contextual Modeling

An unstorable and unmanageable amount of data is coming at us, bringing with it a host of new strategies for grasping and analyzing the huge amount of bits and bytes, such as visualization models.

mc schraefel: Beyond Keyword Search
Dr schraefel is a reader in the Intelligence, Agents and Multimedia Group at the University of Southampton, UK.

Schraefel first emphasizes that, in contrast to what people may assume of a visualization expert, she is not ‘in love with graphs’ – actually, most of the time, big fat graphs suck. The research she presents here deals with the circumstances of serendipity. Following the idea that ‘fate favors the prepared mind’, she argues that discoveries never happen by chance, and that an important challenge lies in designing tools that support serendipitous discovery.

She then presents the audience with a 1987 video by Apple Computer, which introduces the ‘Knowledge Navigator’: a tablet-like personal device with a natural language interface, a virtual ‘digital assistant’ and access to a global network of information. Outdated as the device may seem today, the digital assistant was able to create graphs by pulling data out of its embodied context (such as other people’s documents), so that it could be mined and combined to answer a variety of questions. In 1987, schraefel comments, this was a vision of exploration, heterogeneous sources, representation and integration that still inspires research into knowledge building today.

Schraefel notes how Google is the current search paradigm – “what else do you need?”. Drawing a parallel, she notes how Newton’s mathematical model set the tone for seeing the world for ages, until it turned out that in some spaces the model was flawed. It is much the same with Google’s document-centric, single-source search without interrelations – the model frames the questions that may be asked. In order to enable knowledge gathering, we need a different one.

In a 2001 Scientific American article, Tim Berners-Lee, Ora Lassila and Jim Hendler introduced machine-readable mark-up and the Semantic Web as a new paradigm that moved away from keyword search and toward structured data and ontologies. Ontologies in this sense are subject–predicate–object triples, such as a composer – is a – person, or a person – has a – name, etcetera. By giving data a rich (and often multiple) metadata context and using some logic, one may infer properties of objects that are not explicitly labeled, and enable knowledge gathering from heterogeneous sources.
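
As a small illustration of such triples, here is a sketch using the Python rdflib library (assumed to be installed). It mixes the real FOAF vocabulary with an invented example namespace, so the vocabulary choices are mine, not schraefel's:

```python
# Sketch of subject-predicate-object triples with rdflib (assumed installed:
# pip install rdflib). It mixes the real FOAF vocabulary with an invented
# example namespace; the "ontology" here is a toy.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, FOAF

MUSIC = Namespace("http://example.org/music/")          # invented namespace
bach = URIRef("http://example.org/people/bach")

g = Graph()
g.add((MUSIC.Composer, RDFS.subClassOf, FOAF.Person))   # composer - is a - person
g.add((bach, RDF.type, MUSIC.Composer))                 # Bach - is a - composer
g.add((bach, FOAF.name, Literal("Johann Sebastian Bach")))  # person - has a - name

# Even this toy graph supports questions that are not document lookups,
# e.g. "what is known about Bach?" across whatever sources contributed triples.
for predicate, obj in g.predicate_objects(subject=bach):
    print(predicate, obj)

print(g.serialize(format="turtle"))
```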

Does this imply a reprise of Victorian taxonomies? Nope – quoting schraefel, “it is more pomo than that”: objects are described from multiple contexts. There is no über-ontology, and we are slowly learning to be ‘ok’ with the fact that we don’t know everything controllably, and to be messy. Following Berners-Lee, she emphasizes the importance of liberating our data: placing sources freely on the web so that we may ask questions other than the document kind, and create information rather than merely retrieve it.

Read the rest of this entry »

Define: Web Search, Semantic Dreams in the Age of the Engine

Posted: July 8, 2010 at 12:25 pm  |  By: Shirley Niemans

During my research internship at the Institute of Network Cultures in 2008/2009, I was given the opportunity to explore the broad field of Web search using the Institute’s elaborate network and the extensive knowledge of its staff, and to deliver an editorial outline for the Society of the Query conference. This research also culminated in an MA thesis in December 2009 that has recently become available for downloading at the Igitur Library of Utrecht University. Please find an abstract below, and a download link here.

Abstract: In 2000, Lucas Introna and Helen Nissenbaum argued that search engines raise not just technical but distinctly ethical and political questions, as they seem to work against the basic architecture of the Web and the values that allowed for its growth. Their article was the starting point of a critical Web search debate that is still gaining a foothold today. When we consider the semantic metaphor that has been inspiring a refashioning of the Web architecture since 2001, we can see the exact same values of inclusivity, fairness and decentralization reappear that fueled the development of the original WWW. This thesis will explore the ‘promise’ of the Semantic Web in light of the current debate about the politics of Web search. I will argue that a balanced debate about Semantic Web developments is non-existent and that this is problematic for several reasons. Concluding the thesis, I will consider the dubious position of the W3C in enforcing the implementation of new standards and the power of protocol to be an ‘engine of change’.