Five partners from the Europeana Newspapers Project travelled to Munich on June 26th for a workshop at the LIBER 2013 conference – a major gathering for library professionals from around the world. The workshop was attended by 30 people. Through a series of presentations, the participants were introduced to the work of Europeana Newspapers and learned about best-practice techniques related to the refinement and enrichment of digital newspapers.
The workshop was based around three types of activities:
- Presentations of the project’s activities, aimed at sharing best-practice techniques.
- Buzz Groups, where participants could discuss the presentations and write down questions for the panel discussion.
- A panel discussion, where participants had a chance to pose questions to the project’s partners.
Marieke Willems from LIBER opened the workshop with an introduction to Europeana Newspapers and the structure of the workshop.
The survey is now being re-conducted, to allow more libraries to share their experience with newspaper digitisation. Add your voice.
The final workshop speaker was Günter Mühlberger from the University of Innsbruck.
He spoke about the ENMAP metadata profile being used by the project. A public version of this profile will be available in October. Mühlberger said that ENMAP provided a practical solution for coping with different data formats, and that this was important for a project aiming to create vast amounts of digital data and make them accessible. He finished his presentation with food for thought on structural metadata: what is a headline, an advertisement, a supplement or an opinion section?
The second part of the workshop was moderated by Alastair Dunning. On the panel were two of the speakers – Clemens Neudecker and Stefan Pletschacher – along with Birgit Seiderer (Bavarian State Library) and Tomas Foltyn (National Library of the Czech Republic).
The discussion started with the question: “Do you give the institutions a copy of the re-processed data?” Neudecker confirmed that this was the case, and added that the workflow of the project also allowed libraries to evaluate old and new results after processing.
Other topics raised during the discussion included:
- Named Entity Recognition (NER) – Challenges were discussed, including historical spelling variations and spellings that differ between source materials. The Europeana Newspapers Project is developing NER software that is freely available here: https://github.com/KBNLresearch/europeananp-ner (the training data in 4 languages will follow in the second half of 2013). The software was built upon open source technology from Stanford University (http://nlp.stanford.edu/software/CRF-NER.shtml) and was inspired by the work carried out by INL in Leiden during IMPACT, on which you can find more information here: http://www.digitisation.eu/tools/browse/toolbox-for-lexicon-building/named-entities-recognition-tool-nert/. All panellists stated that extending NER to other languages would be interesting.
- Crowd Sourcing – Several questions came up concerning crowd sourcing that includes user features. Pletschacher said one study had shown that user-feature systems are often not appealing or user-friendly. Dunning disagreed and mentioned two examples, from Australia (http://trove.nla.gov.au/general/participating-in-digitised-newspapers-faq/) and the UK (http://blogs.ucl.ac.uk/transcribe-bentham/), that worked very well. Foltyn spoke about his idea to establish a step-by-step method of correcting errors in digitised content by way of crowd sourcing. The discussion ended with Dunning’s observation that niche sourcing (a more targeted type of crowd sourcing) was also a possibility.
- Metadata and Zoning – Different approaches to defining structural metadata were discussed. In the case of newspapers, it was felt that the structure and layout characteristics affected the meaning and perception of the text. These issues were still being discussed and defined.
- User Behaviour – A delegate from Latvia stated that some users were reluctant to consult the digital version of a newspaper. Seiderer said that some researchers needed to consult physical newspapers because they were looking for personal notes, or examining the type of ink and paper used. In other words, they were looking for more than the historical content in the newspaper.
The final question concerned the difficult field of copyright and personal rights for 20th-century newspapers. It was noted that the issue of rights is handled very differently across Europe, and that examples from Norway and Switzerland show that newspaper publishers are ready to cooperate with libraries to make more recent content available.
The Europeana Newspapers Project will hold two more workshops, where you can learn more about the work we are doing with digital newspapers.
- “Aggregation and Presentation” at the joint The European Library conference “Improving innovation in Europe” on September 16th in Amsterdam. http://www.eventbrite.nl/org/3891830439?s=14727265
- “European Newspapers and the Digital Agenda for Europe”, on September 29-30th at the British Library in London.