Antoine Isaac: Semantic Search for Europeana

Society of the Query

Thanks for the opportunity to talk. I work at the VU and I am going to talk about the Europeana project. This is the result of teamwork at the university; I am just presenting it.

What is Europeana? It is a portal that aims to interconnect museum archives and give access to their digital content. Currently there are 50 providers, and the number is growing. The target is 10 million objects. From a more practical point of view: we want to create access, but we also want to create channels to other websites and so on. Such a thing does not come without challenges. Providing access is in itself very difficult, and the challenges are of an international nature. And how do you get at data besides the pictures? The method is to use metadata.

Antoine shows the current portal, which he describes as a “basic search box”. When you run a search query, different kinds of results linked to the query are returned (pictures, books etc.). You can then refine your search by filtering (language, date and so on). This is called semantic search. To some extent this does not match the richness of the data that is out there in the databases. The idea is to go a step beyond, to semantics-enabled search. Some functionalities are explained, such as clustering. Antoine explains that by exploiting semantics, we can exploit relations that are stored with the objects. We can use information that is already there in the metadata. Some kind of organized knowledge is already there, and we want to exploit it. The proper information, properly accessible: that is the goal.
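The filter-based refinement described above can be illustrated with a minimal sketch. It is not the portal's implementation: the records, field names and provider names below are invented for illustration. It shows keyword search over titles, counting the values available for a filter, and narrowing the results by metadata.

```python
from collections import Counter

# Hypothetical metadata records; every field and value here is an assumption.
RECORDS = [
    {"title": "Mummy portrait", "type": "image", "language": "fr", "provider": "Museum A"},
    {"title": "Voyage en Egypte", "type": "text", "language": "fr", "provider": "Library B"},
    {"title": "Map of Egypt", "type": "image", "language": "en", "provider": "Museum A"},
    {"title": "View of Delft", "type": "image", "language": "nl", "provider": "Museum C"},
]


def search(records, query):
    """Plain keyword search: keep records whose title contains the query string."""
    q = query.lower()
    return [r for r in records if q in r["title"].lower()]


def facet_counts(records, field):
    """Count how many records carry each value of a metadata field."""
    return Counter(r[field] for r in records)


def refine(records, **filters):
    """Keep only the records whose metadata equals every given filter value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


hits = search(RECORDS, "egypt")
print(facet_counts(hits, "language"))
print([r["title"] for r in refine(hits, language="fr")])
```

Note that the string match misses “Mummy portrait”, although the object is about Egypt. That is the kind of gap between a search box and the richness of the underlying data that the talk points at.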

A semantic layer on top of the ‘normal’ results is presented. A graph of a semantic web is shown. According to Antoine, it needs to become more useful for users. A common concept can be used to aggregate relations. A screenshot of the prototype is shown. It is a mini-version of the total project: three museums are currently represented. You can start typing your search. The first difference (from a normal search engine, ed.) is that it can suggest concepts and locations that could match your string. If you select one of the suggestions, you get a number of new possible links and clusters based on different criteria. It is important to note that the results come from really specific entities. We can see that the subject “Egypt”, for example, gives a whole set of related objects. It is much more than a returned string.
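To make the difference between matching a string and selecting an entity concrete, here is a small sketch under stated assumptions. The concept identifiers, labels and objects are hypothetical, and the prototype's actual data model is not described in the talk. The typed string is only used to suggest controlled concepts and locations. Once one is chosen, objects are retrieved by its identifier and can be clustered by a criterion.

```python
from collections import defaultdict

# Hypothetical controlled entities: identifier -> label and kind.
CONCEPTS = {
    "place:egypt": {"label": "Egypt", "kind": "location"},
    "concept:egyptology": {"label": "Egyptology", "kind": "concept"},
    "place:eindhoven": {"label": "Eindhoven", "kind": "location"},
}

# Hypothetical objects whose metadata points at a concept identifier, not a string.
OBJECTS = [
    {"title": "Mummy portrait", "type": "painting", "subject": "place:egypt"},
    {"title": "Excavation notebook", "type": "manuscript", "subject": "concept:egyptology"},
    {"title": "Sphinx at Giza", "type": "photograph", "subject": "place:egypt"},
]


def suggest(prefix):
    """Suggest (identifier, label, kind) for concepts whose label starts with the prefix."""
    p = prefix.lower()
    return sorted(
        (cid, c["label"], c["kind"])
        for cid, c in CONCEPTS.items()
        if c["label"].lower().startswith(p)
    )


def objects_about(concept_id):
    """Return the objects whose subject is exactly the selected entity."""
    return [o for o in OBJECTS if o["subject"] == concept_id]


def cluster_by(objects, field):
    """Group object titles by the value of one metadata field."""
    clusters = defaultdict(list)
    for o in objects:
        clusters[o[field]].append(o["title"])
    return dict(clusters)


print(suggest("eg"))
print(cluster_by(objects_about("place:egypt"), "type"))
```

Because retrieval goes through the identifier, “Mummy portrait” is found for Egypt even though the word does not appear in its title.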

This idea of having controlled entities can be used in more complex ways. Users can start exploring further knowledge and concepts. An example is given with the search “Egypt” and the meta results. We are now searching via concepts and relations. This is an example of richer information. I also get clusters like works created by somebody who was in Egypt, and so on. The reason for getting such an object in the results is that its metadata links back to the subject (the query). There is a kind of person space emerging here. Via this person, we can find out the place, and we end up in Cairo. One very important point is that we benefit from existing models and vocabularies. Via labels on concepts, these concepts can be linked. This is very important because now you can access this information. We continue by determining these links (exact matches and relational matches). The main difficulty with metadata is that it is heterogeneous. There are different description models. You cannot really anticipate them. Some form of alignment is required in order for the system to work, because these databases use different vocabularies. A data cloud is presented which represents the different vocabularies of the three museums. These vocabularies are glued together.
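As an illustration of what gluing vocabularies together can mean, the sketch below treats exact-match links between concepts from different vocabularies as edges and follows them to find all equivalents of a concept. This is an assumption-laden sketch, not the project's alignment method: the identifiers and links are invented, and the link type is only loosely modelled on the idea of an exact match (as in SKOS `exactMatch`), since the talk does not say which properties the prototype uses.

```python
from collections import defaultdict, deque

# Hypothetical exact-match links between concepts of three museum vocabularies.
EXACT_MATCHES = [
    ("museumA:egypt", "museumB:egypte"),
    ("museumB:egypte", "museumC:aegyptus"),
]

# Hypothetical objects, each described with its own museum's vocabulary.
OBJECTS = [
    ("A-001", "museumA:egypt"),
    ("B-017", "museumB:egypte"),
    ("C-230", "museumC:aegyptus"),
    ("C-231", "museumC:roma"),
]


def build_index(pairs):
    """Store each exact-match link in both directions."""
    neighbours = defaultdict(set)
    for left, right in pairs:
        neighbours[left].add(right)
        neighbours[right].add(left)
    return neighbours


def equivalents(concept, neighbours):
    """Breadth-first walk over exact-match links, starting from one concept."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def objects_for(concept, neighbours):
    """Objects described by the concept or by any aligned equivalent."""
    wanted = equivalents(concept, neighbours)
    return sorted(obj for obj, subject in OBJECTS if subject in wanted)


index = build_index(EXACT_MATCHES)
print(objects_for("museumC:aegyptus", index))
```

A query that starts from one museum's term then reaches objects from all three collections, without any of the vocabularies being rewritten. Relational matches (for example broader or narrower concepts) would need a separate, weaker kind of link and are left out here.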

The semantics in our case are about getting structure into the data. It is about coupling the data. It is a flexible architecture. It is about loading data, which makes ingestion of new data easy. You don’t need to fully merge the workings of all the institutions and content providers. It is about connecting structures together. It allows easier access to the different vocabularies. You can start your search and you are provided with different vocabularies. Next, we have to bring in more vocabularies. You can have quality data in this system. Finally, this vision of the variable links model is nice, but some problems occur at the level of semantic matching. This is difficult. A link is given where you can try the portal.
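The idea of loading each provider's data as it is and connecting the structures afterwards can be sketched with a tiny in-memory store of subject, predicate, object statements. Everything here is an assumption made for illustration: the `TripleStore` class, the prefixes and the `link:sameConceptAs` predicate are hypothetical, and the talk does not describe the storage layer actually used.

```python
class TripleStore:
    """A tiny in-memory set of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def ingest(self, triples):
        """Load statements unchanged; return how many were new."""
        before = len(self.triples)
        self.triples.update(triples)
        return len(self.triples) - before

    def match(self, s=None, p=None, o=None):
        """Return the statements that agree with every given position."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Each hypothetical provider keeps its own identifiers and property names.
MUSEUM_A = [("a:obj1", "a:depicts", "a:egypt"), ("a:obj2", "a:depicts", "a:nile")]
MUSEUM_B = [("b:item9", "b:onderwerp", "b:egypte")]
# Connecting the two structures takes one extra statement, not a merged schema.
LINKS = [("a:egypt", "link:sameConceptAs", "b:egypte")]

store = TripleStore()
added_a = store.ingest(MUSEUM_A)
added_b = store.ingest(MUSEUM_B)
added_links = store.ingest(LINKS)
print(store.match(p="link:sameConceptAs"))
```

Adding a further provider is another `ingest` call plus whatever link statements the alignment produces. The existing data is not touched, which is one way to read the claim that ingestion of new data is easy.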

Rogers: Don’t you need an army if you want to actually make the links and translations between all the records?
Isaac: You are right. We actually implemented something (the three museums’ vocabularies), and we are not experts in library science. Until recently, however, library scientists did not come out of their institutions. Now they are starting to realize they can integrate their knowledge. I believe this is an added value.

Rogers: Is this more than digitizing library systems? Is this indexable by Google?
Isaac: Yes, it should be.
Rogers: Is it deep-indexable? Isn’t this a huge policy question?
Isaac: This prototype publishes the data. You can see the source of the data.

Pembleton: An analogy: Tim Berners-Lee created a website that can point to all your data. What I see here is the same move, by linking the concepts, not the data. This provides a richer web.
Rogers: Is this a Europe-island web, then?
Cramer: We already have such a system: it is called RSS.

Audience: One method that I see here is that we need glue to link existing concepts and vocabularies. The other is to generate new vocabularies. To me that seems to be a large debate.
Pembleton: We use the same underlying technology. I see added value rather than competition.
Cramer: RDFa is not a vocabulary; it is a language for formatting the vocabulary (which is a huge difference).