1. Could you introduce yourself, your organisation and your role in the Europeana Newspapers Project?
I work for the Berlin State Library as head of the Department for Bibliographic Services. My background is as a manuscript librarian, and over the last couple of years I have moved on to serial materials, mainly newspapers and journals. While the bulk of my daily work centres on the library's massive "Union Catalogue of Serials" (in German: Zeitschriftendatenbank), I am currently also responsible for all of the library's cooperation projects relating to newspapers.
The Berlin State Library is the largest universal library in the German-speaking world and has a history dating back more than 350 years. We are currently proud to coordinate two EU-funded projects. As well as Europeana Newspapers, the library coordinates the Europeana Collections 1914-1918 project, though I am only involved with the newspapers. While this is by no means the first European project we have coordinated, it is certainly one of the most interesting and challenging.
2. Why newspapers?
As a former manuscript librarian, I was surprised and truly amazed to find similarities between unique, handwritten materials and the most serial form of all publications, newspapers.
Newspapers often suffer a rather deplorable fate after their brief lifespan as readable objects: they are used to light the oven or to wrap the proverbial fish, or, nowadays, they enter the greenish world of paper recycling sooner rather than later. As a result, very few copies of specific titles actually find their way into public archiving institutions. It doesn't help either that newspapers, at least since the late 18th century, tended to be printed on paper that makes the professional librarian want to cry: brittle wood-pulp paper that threatens to crumble away as soon as one even thinks about touching it. The net result is that historical newspapers are, very much like handwritten manuscripts, extremely scarce objects.
In another respect, however, newspapers differ fundamentally from manuscripts, and that respect is, of course, usage. Manuscripts are usually requested by rather scholarly users and require a fair degree of expertise in reading individual or even "standardized" handwriting. Handwritten materials are frequently even written in what is nowadays considered an almost exotic language: Latin, the lingua franca from the Middle Ages to the late 19th century. The level of expertise required clearly limits the number of potential users of these materials.
Newspapers, on the other hand, are truly an "everyman's resource". One doesn't need much apart from general reading skills and, what's more, everybody has some connection to newspapers, if only through the obituary of one's own grandfather or grandmother. Add a dose of regional or local information and the odd glimpse of the world at large, with its wheeling and dealing in political, economic and cultural terms, and you will find that newspapers are interesting and important for virtually everyone. For a librarian, this outreach to a wider audience is an entirely pleasing aspect of our work.
3. What do you think the project will bring to your library and to the partner libraries?
The many, many newspaper editions (more than 18 million pages in total) that the Europeana Newspapers Project will turn into electronic full texts will, without any doubt, fundamentally change the information we can offer, both at the local and at the European level. Even while our work is still underway, we encounter a tremendous degree of interest from very different user communities. Almost daily, end users ask me whether the project includes a specific newspaper title that is of particular interest to them. Other projects and initiatives understand very well the importance of a large text base as a kind of "knowledge base" on which other services can be developed, and they regularly ask for permission to make use of our newspaper texts. Since the digital data will be published by each data-providing library (in my library's case through ZEFYS, our local newspaper portal) as well as through the portals of both Europeana and The European Library, the benefits for the libraries involved are obvious: they will be able to cater much better for the needs of their audiences.
There is another aspect of the project that I find exciting. Through our work, aided by software, we will gain the expertise to better understand the potential outcome of our digitisation and refinement efforts.
At present, commercial service providers are often contracted to carry out the digitisation and OCR (Optical Character Recognition) of newspaper holdings. Not surprisingly, many of these service providers promise to deliver nothing less than perfect results. Frequently, the library is then informed midway through the work that, due to some unexpected circumstance, the results will not be quite as promised.
The Europeana Newspapers Project will, for the first time, enable librarians to estimate realistically whether a planned project is feasible at all or, perhaps, why such a project should not be undertaken just yet. It will do this by automatically measuring the difference between the actual results of automated OCR and OLR (Optical Layout Recognition) and a "ground truthed" (i.e. fully corrected) version of the same content. This evaluation requires us to understand precisely which parts of the workflow work well and which do not. By providing such software to the library community, we hope to make librarians' decisions better informed and less dependent on commercial players.
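To make the idea of "evaluating the difference" from a ground truth more concrete, here is a minimal sketch of one widely used OCR quality measure, the character error rate (CER): the edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth. The interview does not say which metrics the project's evaluation software actually uses, so the choice of CER and the function names below are assumptions for illustration only; layout (OLR) evaluation is not covered here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Hypothetical helper: edit distance relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    # Invented example: two misread letters in an 18-character title line.
    truth = "Berliner Tageblatt"
    ocr = "Bcrliner Tagcblatt"
    print(f"CER: {character_error_rate(ocr, truth):.3f}")
```

Computing such a score per page or per title, on a sample for which ground truth exists, is what would let a library compare OCR engines or source materials (for instance microfilm versus paper) before committing to a full digitisation run.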
4. What do you think will be the most valuable public outcomes of the project and who will enjoy them?
The first and most obvious outcome of the project will be millions and millions of full texts derived from historical newspapers. These will be made available to our users through a number of search services and access portals. Let me say a sentence or two about the European level: we can confidently say that newspapers reflect all aspects of life and are therefore potentially interesting for everyone. This "everyone" quite literally means every European citizen.
Now imagine every European being able to tap into the newspapers of his or her home region and find all kinds of events described there. Let us take this thought one step further: that user will be able to read about a specific event not only from the perspective of his or her home region, but will also be able to compare how the same event was described in other regions or, indeed, in other countries. I believe that the opportunity to do so will have a tremendous impact on our understanding of our common history. I suspect that we will find more similarities between the countries than we expect. However, I also believe that we will be able to identify differences that we are currently not even aware of.
One needs to be aware that the mission of libraries is fundamentally a political one. Any democratic society requires an informed electorate, and unhindered access to information is the fundamental precondition for this. This is true both at the local and at the European level, and at the European level the pent-up demand may be even greater. Since we all come from a history of nation states, we will have to invest in the concept of a common European identity, or even citizenship, if you will. In other words, we depend on understanding our common European history and identity if we want to go on living on a unified, peaceful continent that has, hopefully, transcended the traditional mechanism for resolving conflicts between nations: warfare. Even though this is not traditional library-speak, I am convinced that this aspect of European identity building is at the very heart of all our endeavours.
5. What are the biggest challenges in the project as you see it?
The challenges we encounter in our work are by no means trivial. What I said before about the scarcity of specific newspapers is true. But it is equally true that the newspapers handed down to us by our professional predecessors create fundamental problems when it comes to processing vast amounts of content. Seen from this point of view, every newspaper we digitise and further refine by applying OCR to the actual articles has to be weighed against a number of titles we cannot work on because of limited resources. Not surprisingly, we librarians like to think that we can now make good use of the microforms we created during past decades. Digitising from microfilm is much faster than using the actual folio volumes of bound newspapers as a starting point. Alas, our microfilms are quite often of substandard quality, which makes the OCR work much more cumbersome than one would hope. A further challenge is the use of gothic fonts (such as Fraktur) in newspapers, a phenomenon that is not restricted to the German-speaking world but poses a particular problem there. It is my hope that the software we use will, over time, become better and better at dealing with these challenges, and I believe that the technological developments we have seen over, say, the last 10 to 15 years provide a solid basis for this hope.
Though I usually try to avoid this subject, I feel it would be unfair not to mention one fundamental challenge we face as we move closer and closer towards providing contemporary resources: the vexed question of rights. In a world in which money is the main lubricant of all kinds of enterprise, it is not surprising that the "commercial" and the "public" worlds will need time to resolve this issue.
Another way of putting this is to say that present-day newspaper publishers, who frequently also own more or less historic newspaper archives, will need to identify their preferred business models. Our ultimate goal has to be to reconcile positions that currently hardly seem reconcilable. I firmly believe that we face the same challenges that any new technology poses, not least one as far-reaching as the internet. In the past we have successfully managed such technological changes. In newspaper publishing we have seen lithography, the rotary press and phototypesetting come and go, and none of these changes has marked the end of the world as we know it. In the course of time, the players involved have always been able to renegotiate their working and business relations. Without doubt, this will be the case in the newspaper business as well. However, as a librarian I am not a "jack of all trades", so allow me to end this interview with a timidly whispered wish: let the lawyers worry about that.