Technical Description – Digital Livonia

Digital Livonia (dl.tlu.ee) is an open digital platform that will provide integrated access to sources and various databases about medieval Livonia (c. 1200–1550). It is not a software development project: our aim is to keep the data generated and enriched by the project independent of the technical solution. The reason is simple: technology ages quickly, and keeping information systems up and running cannot be the main goal for a small research group. The most suitable approach is therefore to focus on preserving the collected data in a meaningful and accessible way, using and combining tools that have been tested and proven useful for the given tasks. As we promote an open culture, all the tools we use, as well as all the helper tools we create, are and will be open source.

The Digital Livonia project will produce and work with different kinds of datasets. We will build our own databases based on research by the project members. We will digitize new material, which will be made accessible via ETERA (etera.ee), the online platform of the Academic Library of Tallinn University. We will also include sources from other repositories, such as the Estonian National Archives (ra.ee/dgs), the Estonian Museum Information System MuIS (muis.ee), the University of Tartu Library (dspace.ut.ee), and others. This variety of sources poses multiple challenges: we have to bring together and work with material collected by different institutions, where the access level as well as the quality of the digitization and of the descriptive metadata vary significantly.

This diversity of data and ideas determines the tools and approaches we follow. We will create the user interface by combining different approaches and good practices for easy, user-friendly access to our resources. Below is a more detailed description of the methods, standards and approaches we are going to use.

Functional requirements

To meet the project goals we have set the following functional requirements for the Digital Livonia platform (see Fig. 1):

Ability to host and work with interlinked databases. It has to be simple for the end user to create a new database and link it with existing databases and aggregated datasets in the system.
Aggregate data from other systems. It needs to be relatively easy to set up a new remote repository.
Make data (aggregated as well as our own) accessible via an API so that it can be consumed and analysed by other systems, for example for network, textual and GIS analyses.
Tools for enriching datasets, such as adding keywords and concepts based on controlled vocabularies.
Semantic search over all the databases and aggregated datasets.

Databases

One of the main aims of the project is to build major prosopographic and other digital databases concerning medieval Livonian history. So far, the main tool researchers have used to collect their data has been the spreadsheet (e.g. Excel). Spreadsheets are very powerful but have their limitations when it comes to collaborative use, linking multiple datasets together or defining concepts based on a conceptual reference model (CRM). They are, however, very good for initial data collection and for normalising the data structure. Our plan is to work out a data model based on spreadsheets and then move over to a relational database system. In our project we aim for a generic database schema that supports the evolution of databases and changes in user viewpoints. A traditional database management approach assumes a stable database structure, where new datasets need to fit into the predefined structure. [1] Because research often introduces new challenges and questions, we need to be flexible in our data management to accommodate changes we were not able to foresee.

CIDOC Conceptual Reference Model

The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation. [2] CRM acts as a ‘semantic glue’ that helps to link together different data sources and datasets. One of the data providers of the project, the Estonian Museum Information System (MuIS), uses CIDOC CRM for its data structure. There are also CIDOC CRM extensions for ancient and medieval texts, which we will look into. [3]

Our web application would benefit from CRM, as queries based on predefined concepts return far more accurate results. For example, we can easily find all the people related to a certain type of event (e.g. a birth) without processing the data itself or understanding the language of the returned items. In this project we aim to map all the databases we create to CIDOC CRM. For aggregated datasets a basic mapping will be done as well.
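
The kind of concept-based query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifiers (person:1, event:1 and so on) and the flat triple list are invented for the example and are not the project's data model. Only the class and property labels (E21 Person, E67 Birth, E5 Event, P98 brought into life, P11 had participant) are taken from CIDOC CRM.

```python
# Hypothetical sample data: the identifiers are invented for illustration.
# Class and property labels follow CIDOC CRM naming.
TRIPLES = [
    ("person:1", "rdf:type", "E21_Person"),
    ("person:2", "rdf:type", "E21_Person"),
    ("event:1", "rdf:type", "E67_Birth"),
    ("event:1", "P98_brought_into_life", "person:1"),
    ("event:2", "rdf:type", "E5_Event"),
    ("event:2", "P11_had_participant", "person:2"),
]


def instances_of(triples, crm_class):
    """Return the identifiers of everything typed with the given CRM class."""
    return {s for s, p, o in triples if p == "rdf:type" and o == crm_class}


def people_linked_to(triples, event_class):
    """Find persons that any event of the given class points to.

    The query only looks at concepts (classes and links), never at the
    free-text content or the language of the records.
    """
    events = instances_of(triples, event_class)
    people = instances_of(triples, "E21_Person")
    return sorted(o for s, p, o in triples if s in events and o in people)


print(people_linked_to(TRIPLES, "E67_Birth"))
```

Because the query is expressed in terms of classes rather than field labels, the same function would work unchanged on any dataset that has been mapped to the same concepts.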

Aggregation

In addition to the research databases, the Digital Livonia project will prepare new digital online editions and online collections of medieval Livonian sources. This corpus will be hosted on ETERA, the platform of the Academic Library of Tallinn University. ETERA is a standalone library and repository system that offers a data access interface (API) for third-party applications. As our aim is not to host and maintain the digitized material ourselves, we will use their services and make all the content available over their API. Our system will aggregate the basic metadata from the remote systems (including MuIS etc.) to make data browsing and basic search faster. For full-text searches, queries are sent directly to the remote system.
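
The split between locally aggregated metadata and remotely executed full-text search could look roughly like the following Python sketch. All names here (RemoteRepository, Aggregator, the record fields id and title, the demo repository) are hypothetical, and the remote calls are replaced by plain functions; the real APIs of ETERA, MuIS and the other providers are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RemoteRepository:
    """Stand-in for a remote system; both callables are assumptions."""
    name: str
    harvest: Callable[[], List[dict]]             # returns basic metadata records
    full_text_search: Callable[[str], List[str]]  # returns matching record ids


@dataclass
class Aggregator:
    repositories: Dict[str, RemoteRepository] = field(default_factory=dict)
    index: Dict[str, dict] = field(default_factory=dict)

    def register(self, repo: RemoteRepository) -> None:
        """Adding a new remote repository is a single call."""
        self.repositories[repo.name] = repo

    def harvest_all(self) -> None:
        """Copy only the basic metadata into the local index."""
        for repo in self.repositories.values():
            for record in repo.harvest():
                key = f"{repo.name}:{record['id']}"
                self.index[key] = {"title": record["title"], "source": repo.name}

    def browse(self, term: str) -> List[str]:
        """Fast basic search that touches only the local index."""
        term = term.lower()
        return sorted(k for k, r in self.index.items() if term in r["title"].lower())

    def full_text(self, query: str) -> List[str]:
        """Full-text queries are delegated to each remote system."""
        hits: List[str] = []
        for repo in self.repositories.values():
            hits.extend(f"{repo.name}:{rid}" for rid in repo.full_text_search(query))
        return sorted(hits)


# Placeholder repository with one invented record.
demo = RemoteRepository(
    name="demo",
    harvest=lambda: [{"id": "1", "title": "Sample charter"}],
    full_text_search=lambda q: ["1"] if "charter" in q.lower() else [],
)
aggregator = Aggregator()
aggregator.register(demo)
aggregator.harvest_all()
print(aggregator.browse("sample"), aggregator.full_text("charter"))
```

Keeping the heavy content on the provider's side means the local index stays small and can be rebuilt from the remote systems at any time.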

Our main data providers with API access:

TLU Academic Library ETERA – www.etera.ee
MuIS – www.muis.ee
Estonian National Archives (partial) – www.ra.ee/dgs
University of Tartu Library – www.dspace.ut.ee
Estonian National Registry of Cultural Monuments – www.register.muinas.ee

We have already established preliminary memorandums of cooperation with our main partners (Estonian National Archives, Tallinn University Academic Library) in order to guarantee flexible and reliable data exchange between the Digital Livonia platform and major data providers.

Enriching aggregated data

In some cases we will enrich the aggregated sources with new or amended information. We will keep a record of all changes, bearing in mind the possibility of sending enriched data back to its original source. This approach has not been widely used (probably not at all in Estonia), and it will introduce interesting challenges for us as well as for the data owners. We will focus on user interface design that differentiates the original data from the edited and enriched data, so that one can easily see what is original and what has been added by the project team. Crowdsourcing methods such as social validation could also be used in the data enrichment process: for example, a fact or interpretation can be confirmed by several researchers to make it more reliable.
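
One possible way to keep original and enriched data apart is sketched below in Python. The class names, field names and the sample record are all hypothetical, and the validation rule (a simple count of confirming researchers other than the author) is an assumption chosen for illustration, not a decision the project has made.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Enrichment:
    """A single change made by the project team, with its provenance."""
    field_name: str
    value: str
    author: str
    validated_by: Set[str] = field(default_factory=set)

    def validate(self, researcher: str) -> None:
        # Authors cannot confirm their own enrichment.
        if researcher != self.author:
            self.validated_by.add(researcher)


@dataclass
class AggregatedRecord:
    original: dict  # data as received from the provider, never modified
    enrichments: List[Enrichment] = field(default_factory=list)

    def enrich(self, field_name: str, value: str, author: str) -> Enrichment:
        entry = Enrichment(field_name, value, author)
        self.enrichments.append(entry)  # the list doubles as a change log
        return entry

    def view(self, min_validations: int = 0) -> dict:
        """Merge original and enriched values, labelling where each came from."""
        merged = {k: {"value": v, "origin": "source"} for k, v in self.original.items()}
        for e in self.enrichments:
            if len(e.validated_by) >= min_validations:
                merged[e.field_name] = {"value": e.value, "origin": "project"}
        return merged


# Invented sample record.
record = AggregatedRecord(original={"title": "Sample charter", "date": "unknown"})
note = record.enrich("date", "c. 1400", author="researcher_a")
note.validate("researcher_b")
print(record.view(min_validations=1)["date"])
```

Since the original dictionary is never overwritten, the list of enrichments can later be exported as a set of proposed changes for the data owner.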

Metadata

Each data item, for example the digitized pages of a manuscript or a text document derived from it, has some descriptive data attached to it. This could be the name of the document, basic information about the content and some dates. This so-called metadata describes the data and is often used for searching and browsing the datasets. One of the most widely used metadata standards is Dublin Core (DC). As our main data providers (ETERA and MuIS) use DC, we need to build our approach on top of it. We will set up our databases and enriched datasets using the DC schema, following the Premodern Manuscripts Application Profile. [4] This set of fields is designed specifically to aid medievalists. It is also relatively easy to map DC to CIDOC CRM. [5]
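
As a rough illustration of how DC records can be used for searching and browsing, the Python sketch below restricts records to the fifteen elements of the Dublin Core Metadata Element Set. The helper names and the sample record are invented, and the additional fields of the Premodern Manuscripts Application Profile are deliberately not modelled here.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def make_dc_record(**fields):
    """Build a record, rejecting anything that is not a DC element.

    Every DC element is repeatable, so values are stored as lists.
    """
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {
        name: list(value) if isinstance(value, (list, tuple)) else [value]
        for name, value in fields.items()
    }


def find(records, element, term):
    """Case-insensitive substring search within one DC element."""
    term = term.casefold()
    return [r for r in records if any(term in v.casefold() for v in r.get(element, []))]


# Invented sample record.
sample = make_dc_record(title="Sample charter", date="1400", subject=["trade", "town"])
print(find([sample], "subject", "trade"))
```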

Controlled vocabularies

Controlled vocabularies help us to speak the same language when it comes to terminology (typologies, taxonomies, place names) and to how we describe something in our datasets. Vocabularies and gazetteers help to combine multiple datasets in which the same concept goes by different terms. For example, a search for Viljandi would also return results for Fellin and Felin.
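
The place-name example can be made concrete with a minimal Python sketch. The gazetteer entry uses the three name forms mentioned above; the record structure and the helper names are assumptions for illustration, and a real gazetteer would come from an external vocabulary rather than a hard-coded dictionary.

```python
# Preferred name mapped to its variant forms (taken from the example in the text).
GAZETTEER = {"Viljandi": {"Fellin", "Felin"}}


def build_lookup(gazetteer):
    """Map every known spelling (case-insensitively) to the preferred name."""
    lookup = {}
    for preferred, variants in gazetteer.items():
        for name in {preferred, *variants}:
            lookup[name.casefold()] = preferred
    return lookup


def normalise(name, lookup):
    """Return the preferred form, or the casefolded input if the name is unknown."""
    return lookup.get(name.casefold(), name.casefold())


def search_places(records, query, lookup):
    """Return ids of records whose place matches the query under any variant."""
    target = normalise(query, lookup)
    return [r["id"] for r in records if normalise(r["place"], lookup) == target]


# Hypothetical records from different datasets.
RECORDS = [
    {"id": 1, "place": "Fellin"},
    {"id": 2, "place": "Felin"},
    {"id": 3, "place": "Viljandi"},
    {"id": 4, "place": "Reval"},
]
LOOKUP = build_lookup(GAZETTEER)
print(search_places(RECORDS, "Viljandi", LOOKUP))
```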

We plan to use vocabularies that are used by other heritage institutions and to which we can also contribute. One such set of vocabularies is provided and hosted by the Getty Foundation. [6] We will try not to compile our own vocabularies or build our own gazetteer, but rather to complement (translate) existing ones. Like CRM concepts, vocabularies will also help to link different datasets and fields.

User Interface (UI)

Our technical implementation and data model will be set up so that it is easy to create multiple user interfaces for different purposes. For example, the view and functionality for a public user would be very different from those for an academic researcher. We also envisage gamified applications (for example, a “Facebook” of Reval traders) on top of our system. The main idea is to keep the data functional and usable even if the Digital Livonia website is no longer active.

All the topics discussed above will be utilized by the UI. Metadata, CRM and search indexing will ensure that users get results that are as accurate as possible. The aim is to provide a UI that is more than a database with rows and columns. It has to be intellectually accessible to users who have no prior knowledge of the topics.

We have prepared a preliminary prototype of our platform with the interface designed for academic researchers.

Other interfaces (for a wider audience) will be developed when the project gets funded.

Semantic search

Semantic search is based on understanding the context of the user's intent and providing far more accurate results based on that interpretation. CRM provides the framework on which the semantic search can be built. In our project we will implement the search so that users state their intent by selecting the context for the search term. For example, one can build queries such as: show all people who are related to Hans Viant, or show all places related to Hans Viant.
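
A context-driven query of this kind might be sketched as follows in Python. Apart from the name Hans Viant, which comes from the example above, every entity, relation and function name is a placeholder invented for illustration; no claim is made about the actual people or places connected to him.

```python
# Placeholder entities; "kind" is the context the user selects in the search form.
ENTITIES = {
    "hans_viant": {"label": "Hans Viant", "kind": "person"},
    "person_a": {"label": "Person A", "kind": "person"},
    "place_a": {"label": "Place A", "kind": "place"},
    "place_b": {"label": "Place B", "kind": "place"},
}

# Undirected links between entities (invented for the example).
RELATIONS = [
    ("hans_viant", "person_a"),
    ("place_a", "hans_viant"),
    ("person_a", "place_b"),
]


def related(label, kind):
    """Return labels of entities of the chosen kind directly linked to `label`."""
    ids = {i for i, e in ENTITIES.items() if e["label"] == label}
    found = set()
    for a, b in RELATIONS:
        if a in ids:
            found.add(b)
        if b in ids:
            found.add(a)
    return sorted(ENTITIES[i]["label"] for i in found if ENTITIES[i]["kind"] == kind)


print(related("Hans Viant", "person"))
print(related("Hans Viant", "place"))
```

The search term stays the same in both calls; only the selected context changes, which is what lets the system return people in one case and places in the other.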
