Evaluating Google as an Epistemic Tool

Cross-posting from my KMi blog (which has paragraph-level commenting enabled to facilitate discussion).

I’ve just read an article which explicitly considers the evaluation of search engines with respect to their epistemic functions, from a social epistemological perspective. There’s a pre-print available at http://www.phil.cam.ac.uk/teaching_staff/simpson/simpson_index.html and the citation is: Simpson, ‘Evaluating Google as an Epistemic Tool’, Metaphilosophy 43:4 (2012) 426-445.
Interestingly, the article arose from PhD research funded by Microsoft Research Cambridge – so it’s great to see they have an interest in the knowledge implications of their tools, and in evaluating them along those lines. I’ve been thinking about this a bit, but what follows was written in a morning, so apologies if it isn’t clear.

The Article

The article suggests:

“Search engines are epistemically significant because they play the role of a surrogate expert” (p.428)
Search engines should be assessed by:
1. precision and recall, where precision measures the relevance of the retrieved documents (relevant retrieved : all retrieved) and recall measures the completeness of the set (relevant ∩ retrieved : all relevant documents on the web) (p.431)
2. ‘timeliness’ – how long it takes searchers to find a relevant link (thus, if the first result on a search engine results page (SERP) is relevant, this gives the best score) (p.432)
3. ‘authority prioritisation’ – search engines should prioritise sources which are credible. This could be assessed in the same way as timeliness, with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve. I would suggest that ‘link juice’ is in part intended to measure this quality.
4. ‘objectivity’ – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”. Thus, if there are two sides to a story with an equal quantity of ‘hits’ behind them, an ordering in which the first 50% of results concern one side and the latter 50% the other lacks objectivity.
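To make the first three criteria a little more concrete, here is a minimal Python sketch. It assumes that relevance and reliability judgements are already available as sets of document identifiers (in practice, obtaining those judgements is the hard part), and it approximates ‘timeliness’ by the rank of the first relevant result rather than by measured time. The function names and example data are hypothetical illustrations, not Simpson’s own formalisation.

```python
from typing import Optional, Sequence, Set


def precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Share of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(doc in relevant for doc in retrieved) / len(retrieved)


def recall(retrieved: Sequence[str], relevant_on_web: Set[str]) -> float:
    """Share of all relevant documents on the web that were retrieved."""
    if not relevant_on_web:
        return 0.0
    return len(set(retrieved) & relevant_on_web) / len(relevant_on_web)


def first_hit_rank(ranking: Sequence[str], targets: Set[str]) -> Optional[int]:
    """1-based position of the first result in `targets`, or None if there is none.

    With `targets` = relevant documents this is a proxy for 'timeliness';
    with `targets` = reliable sources it is a proxy for 'authority prioritisation'.
    Lower is better in both cases.
    """
    for position, doc in enumerate(ranking, start=1):
        if doc in targets:
            return position
    return None


serp = ["a", "b", "c", "d"]      # hypothetical ranked results
relevant = {"b", "d", "e"}       # "e" is relevant but was not retrieved
reliable = {"d"}                 # hypothetical credibility judgement
print(precision(serp, relevant))        # 0.5
print(recall(serp, relevant))           # 0.666... (2 of 3 relevant pages)
print(first_hit_rank(serp, relevant))   # 2
print(first_hit_rank(serp, reliable))   # 4
```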

Now this last point is particularly interesting and I shall return to it shortly. First I’ll cover what Simpson goes on to argue.

Personalisation in search results occurs
We tend towards confirmation bias: we favour information which affirms our prior beliefs
Our searches may seek affirming information; moreover, search engines, in customising our results based on our prior searches and the results we open, may do the same
They therefore fail to represent the domain being searched objectively, instead representing a subset of that relevant domain which affirms the searcher’s prior beliefs
So personalisation fails criterion 4 (objectivity) above
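The mechanism in this argument can be illustrated with a toy re-ranker. This is a hypothetical sketch, not how any real search engine personalises: each result is assumed to carry a non-personalised score and a label for the ‘take’ it presents, and the personalisation simply boosts takes the user has clicked on before.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    url: str
    base_score: float  # assumed non-personalised relevance/authority score
    side: str          # assumed label for the 'take' the page presents


def personalise(results: List[Result], clicks_by_side: Dict[str, int],
                weight: float = 0.5) -> List[Result]:
    """Re-rank by base score plus a boost proportional to past clicks on each side."""
    total_clicks = sum(clicks_by_side.values()) or 1  # avoid dividing by zero

    def score(result: Result) -> float:
        boost = weight * clicks_by_side.get(result.side, 0) / total_clicks
        return result.base_score + boost

    return sorted(results, key=score, reverse=True)


def sides_in_top(ranking: List[Result], k: int) -> Counter:
    """Count how many of the top-k results present each side."""
    return Counter(result.side for result in ranking[:k])


# Hypothetical balanced result set: sides A and B alternate in the global ranking.
results = [
    Result("a1", 0.90, "A"), Result("b1", 0.85, "B"),
    Result("a2", 0.80, "A"), Result("b2", 0.75, "B"),
    Result("a3", 0.70, "A"), Result("b3", 0.65, "B"),
]
print(sides_in_top(results, 4))                                  # A: 2, B: 2
print(sides_in_top(personalise(results, {"A": 9, "B": 1}), 4))  # A: 3, B: 1
```

With a balanced starting point, a click history weighted towards one side is enough to tip the top results that way, even though the query and the underlying pages are unchanged.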

The point is: “When the bubble is based on individual personalisation, only ‘epistemic saints’ are immune from its effects – the people who scrupulously ensure that they seek out opposing views for every search they conduct in which there are importantly opposed views on the matter. The rest of us, one hopes, make the effort to seek out opposing views in serious enquiries” (p.439).

Simpson thus suggests two solutions:

Rational enquirers should turn off personalisation (throw off the shackles of Plato’s cave!) or use search engines such as DuckDuckGo
There is a prima facie case for regulation of search engine providers because there is a public good in objectivity

Objections & comments

On ‘objectivity’

There is a straw man at play here. We’re asked to imagine the world exactly as it is (with important philosophers of many nationalities), but then to imagine that a search for “important philosophers” returns German philosophers in the top third of results, philosophers who are neither French nor German in the next third, and French philosophers in the bottom third (p.435). But this isn’t how search engines work: if we assume that a third of all important philosophers are French, a third German and a third neither, and that the internet reflects this to some degree, we can almost certainly assume that the results will be at least somewhat ‘mixed’, although there might be some bias.
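One rough way to make the ‘mixing’ point measurable: for every prefix of the ranking, compare each group’s share of the results so far with its share of the whole list, and average the largest gap. This ‘prefix imbalance’ score is an illustrative assumption, not a measure Simpson proposes; it is 0 for a list containing a single group, and it grows as groups are pushed into blocks.

```python
from collections import Counter
from typing import Sequence


def prefix_imbalance(groups: Sequence[str]) -> float:
    """Average, over every prefix of a ranking, of the largest gap between a
    group's share of that prefix and its share of the full ranking.

    Lower values mean the groups are more evenly mixed through the ranking;
    orderings that clump each group into a block score higher.
    """
    n = len(groups)
    if n == 0:
        return 0.0
    overall = {g: count / n for g, count in Counter(groups).items()}
    seen: Counter = Counter()
    total_gap = 0.0
    for k, group in enumerate(groups, start=1):
        seen[group] += 1
        total_gap += max(abs(seen[g] / k - share) for g, share in overall.items())
    return total_gap / n


# Simpson's example: a third German (G), a third neither (N), a third French (F).
blocked = ["G"] * 3 + ["N"] * 3 + ["F"] * 3
mixed = ["G", "N", "F"] * 3
print(prefix_imbalance(blocked) > prefix_imbalance(mixed))  # True
```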

Now, there is a concern that not all information is on the internet – so there’s an interesting question here about the testimony of silence (when can the absence of testimony be taken to tell you something?) and the epistemic virtues of searchers (what abilities should searchers deploy before making assumptions based on the absence of results?) – but these are separate issues. As an aside, an obvious example here is the gender bias of historical artefacts, and the consequent gender bias on Wikipedia: the silence tells us something, but not what it appears to say at face value (that women did very little). We expect good epistemic agents to assess such information in light of epistemic norms which include some awareness of historiography.

I think this is a fairly minor issue, in so far as the actual concern being raised is whether or not the search engine reflects the epistemic environment. If it does not, but algorithmically it should, that is a concern, because it suggests that websites are being downgraded for questionable reasons (they discuss French philosophers, they aren’t English-language sites, etc.). But if a search engine meets the first three requirements using some version of the PageRank algorithm, this should be less likely – issues regarding the testimony of silence aside. Where such issues remain despite good algorithms, this is a problem with the epistemic community: the community has failed to highlight relevant links, write articles, connect pages, etc.
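For readers unfamiliar with it, the core idea of PageRank can be sketched in a few lines: a page’s score depends on how many pages link to it and on those pages’ own scores, so the ranking is built from the linking behaviour of the wider community. This is a textbook power-iteration version with the commonly cited damping factor of 0.85, run on a made-up link graph; Google’s production ranking combines many more signals and is not public.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no outgoing
    links ('dangling' pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical link graph: three of the four pages link to "hub".
graph = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Here ‘hub’ comes out on top because the other pages point to it; a relevant page that nobody links to gives the algorithm little to work with.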

So the question of objectivity, in so far as it goes, is this: does the SERP reflect the epistemic landscape of the internet as a whole? One complication is that it is entirely possible to “swamp” issues (e.g. Google bombing) such that there are two takes on a concept but only one is presented. However, I do not think this is an issue of search engine objectivity, but rather one of the epistemic community, and indeed the community often puts a great deal of effort into combating it, with search engine providers taking punitive action against sites that engage in such practices. Below, I discuss one possible solution to this issue of ‘presenting the various takes’ on an issue.

On the testimonial nature of search engines

This concern relates to what we take the search engine to do. Again, I’d say this is a relatively minor point, but an important one. In various places the article implies that the search engine itself is a surrogate expert, or that it is epistemically responsible for the contents of the epistemic environment (which it surveys/crawls), or that it provides testimony in itself (the Knowledge Graph is an interesting counterpoint for me here) – but I do not think this is the epistemic quality of a search engine. Search engines can testify that x is considered a good informant, but that is all. They are good at pointing out experts, and indeed that is what the criteria above judge them on. Simpson does make it clear (p.428) that what search engines testify to is not the content itself, but rather who might be a good source of testimony. However, I think this causes a problem for some of the arguments (see the paragraph above). What the search engine represents is the qualities of an epistemic community.

On personalisation and objectivity

The crux of Simpson’s article is that personalisation is bad because it breaches objectivity, a criterion against which search engines should be judged. I have raised one objection to this notion of objectivity above. I think there’s also an issue regarding what ‘objective’ means. Particularly given the use of a broadly pragmatist epistemology (which I take social epistemology to be), it seems odd to think about objectivity outside the context of use: for some uses personalisation may be important, and indeed the only way to make sense of information, while for others it might be rather more pernicious. Below I raise some further issues in the specific context of personalisation.

Geolocation and (good) personalisation – the redundancy of ‘nearby’

In context-sensitive cases, such as when we ask “is there a restaurant?”, the implicit “nearby” is almost redundant, and it is entirely sensible for search engines to do the same thing. That is completely within what we would understand by ordinary testimony, but it still involves personalisation. This isn’t an issue of “find me the nearest expert” – the web makes that somewhat redundant – rather, it’s an issue of “find me the most relevant expert”, which here just happens to concern nearby restaurants.

Geolocation and (good) personalisation – understanding local customs, and requirements for deep/shallow information

Suppose two people in two different countries search for the same topic. We don’t need to imagine a benign dictator (or a brainwashed populace) to think that there might be normative reasons why one population would be searching for a different ‘angle’ on the topic from the international population. Examples might include religious or (perhaps less controversially) cultural practices of a fairly benign but ‘unusual’ nature, such as a festival. A local search engine could do this job; Google needs to personalise to do so. This seems intuitively appropriate. In fact, the exact same information could be presented to both a ‘local’ and a ‘non-local’, with locals receiving results which gave more detail – including, but not limited to, geographic information, local websites about upcoming events, and so on. Again, this is in line with testimonial knowledge: a search engine might direct a novice to me for an overview of some topic, while sending me to some other website. This of course hints at another facet of search personalisation: expertise might matter, since informants that do not take into account their audience’s capacity to understand their testimony fail to give knowledge in their testimonial role.

Geolocation and (bad) personalisation – living with a racist

We can also imagine a situation in which two neighbours get two different sets of results through nothing in their own behaviour (perhaps one of them has a racist housemate and the other doesn’t). In all other ways, the two users are the same. Let’s say they have no knowledge of some civil rights movement figure, but there are a number of websites on that figure, including some run by white supremacist groups (as is the case for Martin Luther King). We don’t have to imagine the 50/50 division of results described in criterion 4 above to see the concern about objectivity here. Nor do we need to imagine that the searchers have made different searches on this occasion: both have used the same query, but one has received results from across the web, while the other receives a result set skewed towards the subset of biased (racist, etc.) results. We would hope both users would exercise their epistemic abilities to make judgements on the information found. However, in this case the concern is not only with the individual searcher (although the search engine could do more to help them); it is also with the personalisation on which the results were based, and with the appropriateness of customising results for that individual.

Geolocation and (bad) personalisation – group dominance

My understanding of Google’s search in China is that it filtered out search results which those within the Great Firewall would not have been able to access. That is, Google was not acting as a filter itself, but rather removing results which had already been vetoed by the Chinese Government. (I may be wrong about that – do correct me if I am.) We can imagine a case such as this, or another in which a dominant group opens particular results more often and makes particular searches more often (ones close enough to influence the results returned), and in which this group is tied to a particular geographic location such that a search engine can personalise results based on this data. In both cases, the concern is that, rather than reflecting PageRank, ‘link juice’ or any other measure designed to take account of the broad epistemic environment, the results are biased towards a particular subset of pages with a particular epistemic perspective, while elsewhere (globally) the full set of results is returned. The example gains further force if we imagine an individual in such a country who is attempting to find information on the ‘other side’ but cannot, due to an imposed personalisation. The knowledge terrain would appear very different to such a searcher, despite their own epistemic virtue. Such an example, however, simply extends the issue: even in a location full of ‘epistemic devils’, search engines should rightly be criticised if they provide results that fail on some measure of ‘objectivity’, limiting their indications of ‘good informants’ to a biased subset.

Recall and Objectivity

To recap, Simpson’s criteria for assessing search engines (as highlighted above) are:

1. precision and recall, where precision measures the relevance of the retrieved documents (relevant retrieved : all retrieved) and recall measures the completeness of the set (relevant ∩ retrieved : all relevant documents on the web) (p.431)
2. ‘timeliness’ – how long it takes searchers to find a relevant link (thus, if the first result is relevant, this gives the best score) (p.432)
3. ‘authority prioritisation’ – search engines should prioritise sources which are credible. This could be assessed in the same way as timeliness, with relevance replaced by reliability (p.433). Computational markers for such ranking are challenging to achieve. I would suggest that ‘link juice’ is in part intended to measure this quality.
4. ‘objectivity’ – “a search engine’s results are objective when their rank ordering represents a defensible judgement about the relative relevance of available online reports”. Thus, if there are two sides to a story with an equal quantity of ‘hits’ behind them, an ordering in which the first 50% of results concern one side and the latter 50% the other lacks objectivity.

One outcome of the preceding argument is, I think, that objectivity – while a useful summary concept – is in fact entailed in the first three criteria. This is because, as I have framed it, the concern with search engine personalisation is that it causes a failure either of ‘recall’ (relevant results are ignored) or, where recall holds, of ‘authority prioritisation’ and ‘timeliness’. That is, either search engines fail to return relevant results or, if they do return them, the results are not defensibly ranked such that authority and timeliness hold – in particular authority, on which the epistemic qualities of the searching agent have little bearing. There are issues related to the presence of ‘poor’ information on the internet, but the epistemic community – which search engines ‘survey’ when they crawl – acts to address these, and does so rather effectively, such that blips in search engine informativeness regarding good sources of testimony are just that: blips. This is not incompatible with their status as objective informants regarding good sources of information; human informants would be in the same position. Furthermore, we expect good informants to be prepared to tell us information they may not agree with; many a courtroom drama is predicated on just such an assumption. The issues of relevance, timeliness (not ‘hiding away’ information, in that case) and authority (the credibility of results) are key here too, where testimony is contextualised (but this is not the place to discuss this analogy further, and I have no time to do so right now!).
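A small worked example of the claim that an objectivity failure shows up in the first three criteria: suppose, hypothetically, that four pages are relevant to a query, two from each side of a debate, and that a personalised SERP silently drops one side. Precision and the rank of the first relevant result are unchanged; only recall registers the loss. The page names are invented for illustration.

```python
from typing import Optional, Sequence, Set


def precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Share of retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved) / len(retrieved) if retrieved else 0.0


def recall(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Share of all relevant documents that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant) if relevant else 0.0


def first_relevant_rank(retrieved: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant result (a proxy for 'timeliness')."""
    return next((i for i, doc in enumerate(retrieved, start=1) if doc in relevant), None)


relevant = {"pro1", "pro2", "con1", "con2"}  # both sides are relevant to the query
global_serp = ["pro1", "con1", "pro2", "con2"]
personalised_serp = ["pro1", "pro2"]         # the 'con' side has been filtered out

for name, serp in [("global", global_serp), ("personalised", personalised_serp)]:
    print(name, precision(serp, relevant), recall(serp, relevant),
          first_relevant_rank(serp, relevant))
# global 1.0 1.0 1
# personalised 1.0 0.5 1
```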

Summary

I have presented some arguments for personalisation using ordinary standards of testimony, and some against, based on geolocation (although most could be adapted to other features of personalisation).

My suggestion is not that personalisation is bad qua unobjective, but that, when giving an objective judgement of testimony – and when judging the likelihood of someone else’s testimony meeting our information needs – we expect informants to tell us the substantive assumptions they have made in reaching their conclusions. Search engines often fail to do this, except where there is good (often advertising-based) reason for them to do so (clarifying geolocation, for example). Furthermore, where these assumptions are explicit, their impact is often not made clear.

Suggestions

In so far as I think personalisation is often useful, I think it should remain. However, there are some considerations:

1. Rather than being kept implicit, personalisation facets should be made clear – perhaps as added terms in a search query (this would work, for example, for restaurant searches, where a postcode could be added to the query)
2. Suggested-search features sometimes include elements of such alteration, highlighting ‘deeper’ queries (adding keywords) or broader queries (removing query terms). The appearance of multimedia in the main SERP probably helps here too.
3. Features indicating why personalisation has occurred might be useful (although, of course, these would then be open to gaming). It is hard to see how this could be done without messages such as “we selected this result because, based on your previous searches, you’re a racist” (or “a climate change denier”, etc.) being returned…
4. The things we search for are likely to contain epistemic bias (indeed, part of my research interest is on this topic!), which search engines in their current state may not be able to deal with, and which it may not be their place to deal with in their role as surrogate experts. For example, a search for “Al Gore inconvenient truth” is likely to return rather different results from a search for “Al Gore liar” – this is about the epistemic judgement of searchers in their query formation, not personalisation. Subsequent results may be personalised off the back of such searches (and this may be problematic, as discussed above), but in the first instance this is not the concern.
5. Search engines could use a sort of ‘faceted search’, in which ‘takes’ on concepts are classified such that, for a given concept ‘x’, pages relating to one definition appear together, while pages relating to an opposing definition appear separately. Facets could be added for location, difficulty, and so on. Some of this is already implemented; some of it is implicit, but perhaps should not be.
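The first and last suggestions can be sketched together in Python. This is a hypothetical illustration only: the facet names, the postcode value and the ‘take’ labels are invented, and deciding which take a page represents is assumed to have been done already (that classification is the genuinely hard part). The point is simply that an assumption such as location can be surfaced as a visible query term with an explanation, and that results can be grouped by take rather than silently filtered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def explicit_query(query: str, facets: Dict[str, str]) -> Tuple[str, List[str]]:
    """Append personalisation facets to the query as visible terms, and
    return a human-readable note for each assumption made."""
    terms = [query] + list(facets.values())
    notes = [f"added '{value}' because of your {name}" for name, value in facets.items()]
    return " ".join(terms), notes


def group_by_take(results: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (url, take) pairs so that each 'take' on a concept is shown together."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, take in results:
        groups[take].append(url)
    return dict(groups)


query, notes = explicit_query("restaurant", {"postcode": "MK7"})  # illustrative values
print(query)   # restaurant MK7
print(notes)   # ["added 'MK7' because of your postcode"]
print(group_by_take([("u1", "take A"), ("u2", "take B"), ("u3", "take A")]))
# {'take A': ['u1', 'u3'], 'take B': ['u2']}
```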

Based on ‘3’ above, we might argue – as I have indicated in a number of places – that search engines should not be considered as informants, but as second-order informants: informants about who knows the answer (as Simpson also indicates). In this case, a significant component of their behaviour should be oriented to covering the web, and avoiding presenting subsets unless there are good reasons for personalisation – reasons which should be made explicit, as we would expect in cases of testimonial knowledge. In some cases, personalisation is what a good informant would do (unless, say, the question is “know of any good restaurants? – oh, but I’m going to Birmingham tonight”); in many cases such assumptions should be made clear, perhaps via ‘5’. Removing the option reduces the quality of the search engine as an epistemic tool. Unfortunately, even when the tool is a good epistemic tool, bad epistemic agents may still use information poorly; but this is already the case. Agents who choose to bias their results already do so through their queries and the sites they select. Search engines are not in a position to assess the epistemic agents who use them, but they are able to position themselves as good epistemic tools, ones that survey the epistemic landscape and attempt to avoid undue assumptions – that is the role of the good informant.

I think, then, that I’m making some key statements here:

1. Personalisation sometimes makes sense
2. There are two issues of objectivity – one regards the recall, authority and timeliness (and possibly precision) of results in SERPs; the other has to do with the epistemic community
3. Objectivity of SERPs and objectivity of the epistemic community (including the searching agent) should not be conflated; Simpson’s analysis and definition of objectivity lend themselves to such conflation. Objectivity for search engines is (probably) entailed in their recall, authority and timeliness (2. above).
4. Search engines could do more to make clear the assumptions they make when personalising results