On the way to academic search engine optimization?

I recently talked about academic search engine optimization at the EASST conference in Copenhagen. The term was coined by Beel et al. (2010), who suggested that academic papers should be adapted to the technology of search engines in order to be indexed and ranked properly, for example by using correct metadata and carefully selecting and placing keywords. This can be seen as an indicator of the relevance of search engines in the scholarly realm. At the same time, it shows that academic content is not always well represented in search engines. Beel et al. focused on scientific publications in academic search engines such as Google Scholar. But of course we may broaden their approach and ask about the position of academic content within generic web search engines, where it has to compete with countless other perspectives.

Misrepresentation of scientific content

I have already talked about the example of 9/11 (e.g. here and here), which illustrates how the algorithmic selection and ranking mechanisms of search engines can differ from those of traditional gatekeepers (e.g. publishers). In this case, the so-called “9/11 Truth movement” presented an account of the events of September 11 that differed fundamentally from that of the official authorities, scientific experts and most mass media institutions: its proponents argued that they had proof of a controlled demolition of the World Trade Center, indicating that 9/11 was an “inside job” orchestrated by the US government. This claim became very popular and was supported by websites which framed themselves as scientific, for example the Architects & Engineers for 9/11 Truth or the Journal of 9/11 Studies. Apparently, the “Truth movement” has established very well-linked networks leading to high rankings of their websites for 9/11-related searches, whereas websites representing e.g. the actual scientific reports by NIST seem rather underrepresented. Moreover, many people seem to search for content of this kind.

Optimization or spam?

With this case in mind, one could argue that the call for optimizing academic content for search engines is more than necessary if science wants to keep its position as a knowledge authority in modern societies. Of course, 9/11 is just one example to illustrate this development, and we may think of others which are more worrisome. For example, commercial actors often dominate the field of health information (Mager 2009). Whenever economic interests are involved, it is not unlikely that they will dominate search engine rankings because commercial actors are usually better equipped than non-commercial actors like universities, and often also better than academic publishers, who to a large extent “hide” articles in the “academic invisible web” (Lewandowski/Mayr 2006). Moreover, most academics will probably regard marketing techniques such as search engine optimization as rather unethical. Indeed, Beel et al. were criticized for promoting “spamming” with their approach. They answered with a study which actually supported this criticism. In order to gain insights into Google Scholar's vulnerability to manipulation attempts, Joeran Beel and Bela Gipp tested various techniques, for example adding invisible text or creating fake citations by uploading nonsense publications (citations are used as an important ranking factor in Google Scholar). In short, they were remarkably successful and concluded: “Google Scholar is far easier to spam than the classic Google Search for Web pages” (Beel/Gipp 2010). Apparently, Google Scholar is not very strict when it comes to judging whether content is scientific or not. Often it is enough to publish PDF files that are structured in the typical way of academic journal articles, with sections like “introduction”, “methodology”, “results” and “references”. It seems that broad coverage is more important to Google than scientific rigor. Therefore, it is not surprising that members of the “9/11 Truth movement” even lead Google Scholar's results for the query “WTC 7” (with a paper by Steven Jones, a prominent figure of the movement).


Manipulating Google Scholar by adding invisible text (image credit: Beel/Gipp 2010)

An algorithmic shift

Depending on one's point of view, Google Scholar's fairly inclusive policy can be embraced as a democratization of the highly exclusive and selective academic publishing system. It gives scholars access to papers which are not part of this elitist and mostly expensive circle. But the flaws of Google's automated selection and ranking process are also obvious. As Beel and Gipp have shown, it is fairly easy to manipulate Google Scholar. This opens academia's door not only to alternative fringe science but also to commercial interests, potentially leading to undesired activities like greenwashing or underplaying the risks of drugs and technologies. In any case, there are hardly any possibilities for academics to intervene directly here. Google, not academics, ultimately defines the relevance of scientific content in its search engines. Given the wide usage of Google services inside and outside of academia, I think it is justified to speak of an algorithmic shift in information dissemination. Of course, this shift is not limited to Google but includes many more platforms which apply algorithms to organize information. More and more, these algorithms decide on the relevance of scholars and their texts. Evidently, it is still humans who have to assess the actual scientific relevance of publications. But algorithms may ultimately provide or deny visibility. Academics will have to deal with these new mechanisms of inclusion and exclusion. Search engine optimization is one answer to this algorithmic shift, but given the vulnerability to manipulation, it is debatable whether it is desirable. Unfortunately, it is more or less the only way to influence the questionable rankings by Google and other service providers.

My related Prezi presentation can be found here. We discuss these developments in greater detail in our book Cyberscience 2.0: Research in the Age of Digital Social Networks.