Refinement of digitised newspapers
The main objective is to enhance and refine already digitised newspaper pages as part of the general aggregation process. In addition to full text for about 10 million pages, millions of individual articles with related metadata and named entities (persons, geographic names, etc.) will be automatically detected, tagged and packaged for delivery to Europeana. In this way the user experience of searching and retrieving newspapers via Europeana will be substantially improved compared to the current solutions.
Evaluation and Quality Assessment
The expected outcome of the project is an evaluation and quality assessment infrastructure for newspaper digitisation. An important first task is to analyse use scenarios for digitised newspapers with regard to aspects such as required accuracy, level of detail, speed and cost. Based on these use scenarios, it is possible to provide evaluation resources and tools for objectively measuring the quality of recognition results obtained from complex workflows as well as from individual processing steps. A complete evaluation infrastructure also requires collecting representative datasets and manually producing ground truth (a representation of the ideal result of a processing step). The performance of a specific method can then be ascertained by comparing the actual output of that method to the ground truth. Scenarios, datasets including ground truth, and evaluation tools will be used to assess the potential of already existing material from partner libraries and to evaluate the success of enhancement and refinement processes.
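As an illustration of comparing a method's output to ground truth, the following is a minimal sketch in Python of one common text-recognition measure, the character error rate, computed from the Levenshtein edit distance. It is not the project's evaluation tooling: the function names and the sample strings are hypothetical and chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(recognised: str, ground_truth: str) -> float:
    """Edit distance between the recognised text and the ground truth,
    normalised by the length of the ground truth (hypothetical helper)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(recognised, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Invented sample strings: two characters were misrecognised.
    print(character_error_rate("Eur0peana Newspapcrs", "Europeana Newspapers"))  # 0.1
```

A lower value means the recognised text is closer to the ground truth. Which threshold counts as acceptable depends on the use scenario, for example full-text search versus article-level reuse.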
Aggregation and presentation of digitised newspapers for Europeana
The project aims to aggregate, refine and present newspaper content through the freely accessible online service of the Europeana Foundation. The main objectives are the creation of a full-text index of newspaper content and the development of a newspaper content browser. In addition, a survey has been undertaken to identify and analyse all newspaper collections digitised by national, research and public libraries in Europe by 2012.
As a result of the survey, 11 libraries were chosen to become associated partners of the project. Associated partnership means that the library has the potential to bring digitised newspaper content to Europeana and will be invited to all project events to engage in the Best Practice Network.
Metadata best practice recommendations
Within the project, a Europeana Data Model (EDM) for newspapers will be created for use inside and outside the project consortium. Based on the analysis and workflows, the results will be given as best practice recommendations for digitisation and refinement. The main objectives are the analysis of metadata models, the design and release of a comprehensive metadata model based on de facto standards (METS, MODS, MARC, ALTO, etc.), and the inclusion of a number of correct examples of the data format in the online resource as best practice, in order to support the uptake of the format.
Dissemination and Exploitation
Dissemination and Exploitation aims at raising awareness of and promoting the Europeana Newspapers Project through media communication and stakeholder engagement. Media communication aims at widely disseminating the project's objectives, results and achievements. The Workshops and Information Days organised by project partners and WP6 will inform stakeholders and end-users about the technical challenges of the project and the content and policy related issues it addresses, and will engage them in these topics.