Sharing best practices with library professionals at LIBER2013

Five partners from the Europeana Newspapers Project travelled to Munich on June 26th for a workshop at the LIBER 2013 conference – a major gathering for library professionals from around the world. The workshop was attended by 30 people. Through a series of presentations, the participants were introduced to the work of Europeana Newspapers and learned about best-practice techniques related to the refinement and enrichment of digital newspapers.

The workshop was based around three types of activities:

Presentations of the project’s activities, aimed at sharing best-practice techniques.
Buzz Groups, where participants could discuss the presentations and write down questions for the panel discussion.
A panel discussion, where participants had a chance to pose questions to the project’s partners.

Presentations
Marieke Willems from LIBER opened the workshop with an introduction to Europeana Newspapers and the structure of the workshop.

She was followed by Alastair Dunning from The European Library. His presentation “Surveying newspaper digitisation in European libraries, then aggregating them!” looked at the survey that was conducted among libraries on the extent of newspaper digitisation. One key finding of the survey was that only 26% of the libraries have digitised more than 10% of their newspaper content.

The survey is now being re-conducted, to allow more libraries to share their experience with newspaper digitisation. Add your voice.

The third speaker was Clemens Neudecker from the National Library of the Netherlands. He presented and explained many technical aspects of the project, including the complex refinement processes being used, the tools that were created specifically for the project and the development of Named Entity Recognition (NER) in four languages for newspaper content.

Stefan Pletschacher from the University of Salford spoke about “Digitisation Workflows and Evaluation Approaches”. In order to create a good user experience, Pletschacher said it was critical to have a clear idea of the different use-case scenarios and of how various features such as layout analysis, reading order detection and text recognition would be used.

The final workshop speaker was Günter Mühlberger from the University of Innsbruck. He spoke about the ENMAP metadata profile being used by the project. A public version of this profile will be available in October. Mühlberger said that ENMAP provided a practical solution for coping with different data formats, and that this was important for a project aiming to create vast amounts of digital data and make them accessible. He finished his presentation with food for thought on structural metadata: what is a headline, an advertisement, a supplement or an opinion section?

Panel Discussion

The second part of the workshop was moderated by Alastair Dunning. On the panel were two of the speakers – Clemens Neudecker and Stefan Pletschacher – along with Birgit Seiderer (Bavarian State Library) and Tomas Foltyn (National Library of the Czech Republic).

The discussion started with the question: “Do you give the institutions a copy of the re-processed data?” Neudecker confirmed that this was the case, and added that the workflow of the project also allowed libraries to evaluate old and new results after processing.

Other topics raised during the discussion included:

Named Entity Recognition (NER) – Challenges were discussed, including historical spelling variations and the occurrence of different spellings between various source materials. The Europeana Newspapers Project is developing NER software (the training data in 4 languages will follow in the second half of 2013) that is freely available here: https://github.com/KBNLresearch/europeananp-ner. The software was built upon open source technology from Stanford University (http://nlp.stanford.edu/software/CRF-NER.shtml) and was inspired by the work carried out by INL in Leiden during IMPACT, on which you can find more information here: http://www.digitisation.eu/tools/browse/toolbox-for-lexicon-building/named-entities-recognition-tool-nert/. All panellists stated that extending NER to other languages would be interesting.
Crowd Sourcing – Several questions came up concerning crowd sourcing that includes user features. Pletschacher said one study had shown that user-feature systems are often not appealing or user-friendly. Dunning disagreed and mentioned two examples, from Australia (http://trove.nla.gov.au/general/participating-in-digitised-newspapers-faq/) and the UK (http://blogs.ucl.ac.uk/transcribe-bentham/), that worked very well. Foltyn spoke about his idea to establish a step-by-step method of correcting errors in digitised content by way of crowd sourcing. The discussion ended with Dunning’s observation that niche sourcing (a more targeted type of crowd sourcing) was also a possibility.
Metadata and Zoning – Different approaches to defining structural metadata were discussed. In the case of newspapers, it was felt that the structure and layout characteristics affected the meaning and perception of the text. These issues were still being discussed and defined.
User Behaviour – A delegate from Latvia stated that some users were reluctant to consult the digital version of a newspaper. Seiderer said that some researchers needed to consult physical newspapers because they were looking for personal notes, or examining the type of ink and paper used. In other words, they were looking for more than the historical content in the newspaper.

The final question concerned the difficult field of copyright and personal rights for 20th-century newspapers. It was noted that the issue of rights is handled very differently across Europe, and that examples from Norway and Switzerland show that newspaper publishers are ready to cooperate with libraries to make more recent content available.

Further Workshops
The Europeana Newspapers Project will hold two more workshops, where you can learn more about the work we are doing with digital newspapers.

“Aggregation and Presentation” at the joint The European Library conference “Improving innovation in Europe” on September 16th in Amsterdam. http://www.eventbrite.nl/org/3891830439?s=14727265
“European Newspapers and the Digital Agenda for Europe”, on September 29th-30th at the British Library in London.
