Quality Report 2022
The Year 2022 of the Finnish National Digital Preservation Services
General
Digital preservation services (DPS) refer to services produced together for the digital preservation of cultural heritage and research data. The development of DPS is continuous and takes place in close cooperation with the organizations that make use of them. The aim is that the most significant digitized and born-digital cultural heritage content in the Digital Preservation Service for Cultural Heritage will be preserved for future generations and that long-term utilization of the content is possible. Similarly, the Digital Preservation Service for Research Data ensures the availability and preservation of digital research data. Both services use a common digital preservation system for bit-level preservation.
The Digital Preservation Service for Cultural Heritage started preserving content in 2015 and the Digital Preservation Service for Research Data in late 2019. Organizations using the Digital Preservation Service for Research Data for preparing and storing data can also make more extensive use of Fairdata services, including the packaging service and the management interface.
The Main Results of the Year
During 2022, the amount of preserved data exceeded 2 petabytes, while the annual growth was over 419 terabytes, which is slightly less than in 2021. Over 600,000 new packages were taken into preservation.
A data protection impact assessment (DPIA) template for the preservation of research data was produced for the partner organizations of the Digital Preservation Service for Research Data. The template was created during a DPIA use case for the digital preservation of research data. Its purpose is to help data controllers assess the data protection impact of preserving research data.
During the year, the Enterprise Architecture of the National Digital Preservation Services was drafted. The enterprise architecture document will be published at the beginning of 2023. With respect to digital preservation, it will update and specify the enterprise architecture of the National Digital Library and the reference architecture of open science and research. The architecture was produced in accordance with the guidance of the Ministry of Education and Culture and after consulting the digital preservation collaboration group. The aim of the enterprise architecture is to describe the principles of the national DPS, the services offered to partner organizations and the factors that guide operation, to support the development of the DPS, and to help reconcile the needs of the partner organizations with the goals and production of the DPS.
The carbon footprint of the DPS was calculated in 2022. The calculations considered emissions from the production of the device infrastructure, from daily usage, and from the people working on the maintenance and development of the services. The results were reported to the digital preservation collaboration group, and monitoring will continue.
A customer satisfaction survey for the digital preservation collaboration group was carried out in late 2022. The customer satisfaction with the DPS was quite positive (5.0/6). Additionally, the partner organizations were especially satisfied with the user support and consulting as a whole (5.3/6).
The digital preservation system's storage capacity was expanded by 1.8 petabytes in 2022, which raised the total storage capacity to 5.4 petabytes.
Partner Organizations
By the end of 2022, the Ministry of Education and Culture had granted capacity in the digital preservation services to the following organizations:
Organization | Purpose of use | Capacity (TB) |
---|---|---|
Celia | Master archive and selected new audiobooks for long-term preservation | 110 |
Kansallinen audiovisuaalinen instituutti | A selected part of the domestic film materials to be digitized | 2400 |
Kansallisarkisto | Born-digital records of the state administration received by Kansallisarkisto | 41 |
Kansallisarkisto | Datasets transferred to the VAPA system | 1 |
Kansallisarkisto | Materials of Kansallisarkisto's mass digitization project | 114 |
Kansallisarkisto | Materials to be transferred from Kansallisarkisto's digital archive and retrospective digitization materials | 805 |
Kansallisarkisto | Kansallisarkisto's private archive materials that exist only in digital form | 27 |
Kansallisgalleria | Long-term preservation of Kiasma's media art works | 20 |
Kansalliskirjasto | Cultural heritage materials digitized by Kansalliskirjasto | 175 |
Kansalliskirjasto | Materials collected under the Cultural Materials Act (Kulttuuriaineistolaki) | 355 |
Kotimaisten kielten keskus Kotus | Long-term preservation of Kotus's linguistic research and cultural heritage materials | 60 |
Museovirasto | Research reports on the cultural environment | 1 |
Musiikkiarkisto | Musiikkiarkisto's materials for long-term preservation | 70 |
Svenska Litteratursällskapet SLS | SLS's materials for long-term preservation | 50 |
Yhteiskuntatieteellinen tietoarkisto, FSD | Long-term preservation of the collection of research datasets archived by the Data Archive (Tietoarkisto) | 1 |

Organization | Purpose of use | Capacity (TB) |
---|---|---|
Geologian Tutkimuskeskus | Datasets produced by GTK's tomography device | 12 |
Helsingin yliopisto | A selection of meteorological and air quality measurements from the SMEAR data of Helsingin yliopisto | 2 |
Helsingin yliopisto | M. cinxia and C. melitaearum in the Åland metapopulation system | 2 |
Helsingin yliopisto | FIRE (The Finnish Reflection Experiment) | 1 |
Helsingin yliopisto | Materials of Luomus | 150 |
Itä-Suomen yliopisto | SENSOTRA | 1 |
Jyväskylän yliopiston kiihdytinlaboratorio | Decay spectroscopy of nobelium-250 | 1 |
Oulun yliopisto, Sodankylän geofysikaalinen observatorio | Observational data | 30 |
Tampereen yliopisto | A-K collection of the Folklife Archives (Kansanperinteen arkisto) of the Faculty of Social Sciences | 2 |
Turun yliopisto | Materials of the Archives of the School of History, Culture and Arts Studies (HKT-arkisto) | 20 |
Data Accumulation in 2022
About 419 terabytes of new data were received for preservation during the year, and the total amount of data in preservation at the end of 2022 was over 2,085 terabytes. The data accumulation during 2022 is shown in the figure below.
During 2022, the DPS took responsibility for preserving more than 600,000 content packages, and at the end of 2022 there were more than 2,600,000 content packages in preservation. The accumulation of content packages during 2022 is shown in the figure below.
Maintenance of the Digital Preservation Service
Producing digital preservation services requires a wide range of activities: maintenance tasks, development of methods and models, software development, development of equipment infrastructure, and administrative work. The following section focuses in particular on the maintenance tasks of the digital preservation services. It follows the model for quality reporting of IT service production operations, which typically focuses on the growth of data, on incidents, and on the recovery from them over a certain period of time.
The main objectives of maintaining the Digital Preservation Services are:
- ensure the integrity and availability of archival information packages in preservation;
- monitor the functionality of the service; and
- support organizations in utilizing the DPS services (e.g. fixing invalid or incomplete submission information packages detected during ingest).
Monitoring of the Digital Preservation Services
Monitoring of the DPS has been automated as far as possible. The monitoring provides status and event information both for the maintenance of the services and for the organizations that make use of them, which enables the experts to infer the status of the services and take the necessary measures when needed.
The following items are automatically monitored in the DPS at the moment:
- device failures (such as broken hard drives),
- broken tape drives,
- server availability,
- disk area fill rate,
- visibility of distributed storage areas on different servers,
- up-to-dateness of virus database for virus checks,
- storage layer integrity,
- availability of tape libraries,
- SSL certificate life cycles, and
- failed login attempts of SFTP port on frontend servers.
In addition, the following items are manually monitored:
- the progress of the work queue at ingest,
- processing submission information packages stuck in the work queue,
- checking the integrity of archival information packages,
- analysing problems with rejected submission information packages,
- replicating broken media, and
- creating copies for the dark archive.
As part of the development of the DPS, monitoring the service will be improved and new processes will be automated. This makes it possible to maintain a cost-effective service even though the amount of content to be preserved is increasing.
Quality Deviations Related to the Data in Preservation in 2022
Together with the partner organizations, we have considered what quality means in terms of the long-term preservation of data. It has been agreed that the integrity of the data and the reliability of preservation are of particular importance. Quality deviations are therefore situations where the preservation of data is threatened, and not, for example, situations where the service is temporarily unavailable.
Reporting on the quality of the service using these criteria is somewhat challenging, as the usual indicators of IT environments (e.g. service availability percentages) do not indicate deviations or actual threats to the preservation of the data. We have defined the preservation of data as threatened when fewer than three intact copies of the archival information packages of the data exist. Recovery from these situations typically uses an intact copy on another media type. The maintenance of the DPS is able to restore these situations to normal as part of its normal operation.
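The three-copy criterion described above can be illustrated with a minimal sketch. The `Copy` class, the media labels and the function names are hypothetical and are not taken from the digital preservation system; only the threshold of three intact copies comes from this report.

```python
from dataclasses import dataclass

# Threshold stated in the report: preservation is threatened when
# fewer than three intact copies of a package exist.
MIN_INTACT_COPIES = 3


@dataclass
class Copy:
    """One stored copy of an archival information package (hypothetical model)."""
    medium: str   # illustrative label such as "disk" or "tape"
    intact: bool  # outcome of the latest integrity check on this copy


def is_threatened(copies):
    """Return True when fewer than MIN_INTACT_COPIES copies are intact."""
    intact_count = sum(1 for copy in copies if copy.intact)
    return intact_count < MIN_INTACT_COPIES


def recovery_sources(copies):
    """Return the intact copies that could be used to restore a broken one."""
    return [copy for copy in copies if copy.intact]
```

For example, a package with one broken disk copy and two intact tape copies would be flagged by `is_threatened`, and `recovery_sources` would return the two tape copies as candidates for restoring the disk copy.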
During 2022, the archival information packages from two LTO-9 tapes that had come to the end of their life cycle were copied to new LTO-9 tapes. Altogether 88,350 archival information packages were copied in the process.
One of the hundred RAID6 disks reported a malfunction. The size of the malfunctioning RAID6 disk was 32 terabytes. The archival information packages on the disk were copied to a functioning RAID6 disk and the fixity of the packages was checked. All checked packages were found to be intact after the copying. The malfunctioning RAID6 disk was formatted and taken back into use.
A single disk copy of an archival information package was destroyed due to a human error. The destroyed disk copy of the package was restored from a tape copy.
Scheduled fixity checks reported a failure for one disk copy of one archival information package. It was discovered that an earlier attempt to copy the package to storage during ingest had failed and left the corrupted copy behind, but that the package had later been copied to disk storage successfully by an automatic process.
The digital preservation system encountered nine broken disks during the year. None of the disk failures threatened the data in preservation because RAID disks protected the preserved data from corruption.
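A fixity check of the kind mentioned above compares a freshly computed checksum of a stored file with the checksum recorded for it. The sketch below assumes SHA-256 and hypothetical function names; the report does not state which algorithm or tooling the digital preservation system uses.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read fixed-size chunks so that large files do not need to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, expected_hex):
    """Return True when the file's current digest matches the recorded one."""
    # SHA-256 is an assumption made for this sketch, not a documented DPS choice.
    return sha256_of(path) == expected_hex.lower()
```

A copy for which `check_fixity` returns False would be treated as corrupted and replaced from an intact copy.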
New Features of Software Development
Several operations were carried out for the Digital Preservation Service for Cultural Heritage and the digital preservation system in 2022. New tape drivers were taken into use, and support for multiple tape libraries was implemented to extend the capacity of the tape storage. A workflow for ARC/WARC mass migration, including a stage for metadata handling, was developed. Additionally, software changes required by the changes in the specifications were implemented and metadata indexing for the search interface was improved.
The validation capacity was further extended. File format support for EPUB, DNG and AIFF was implemented. Changes were made to the management of AudioMD and MIX metadata and to the metadata handling of file formats other than the recommended or acceptable ones.
The software layer was migrated to a Python 3 environment and Python 3 installation guides were added to the published tools. Additionally, work on migrating the operating system to AlmaLinux was started.
Several operations were carried out for the Digital Preservation Service for Research Data during 2022. The packaging tool used by the service, and also available to the partner organizations, has nearly reached the end of its life. The requirements for packaging have changed so that the packaging tool no longer fulfils its purpose well enough. For this reason, the service has started developing a completely new tool, for which a high-level plan and a builder component for METS metadata have been completed. The tool will be made available to the partner organizations upon its full completion.
A management feature was enabled that makes it possible to send data for packaging to temporary storage when using the graphical user interface. The representations of the packaging stages were made clearer and software changes required by the changes in the specifications were implemented. An automatically created event history and the events related to the Metax metadata database were added to the DPS’ packaging process.
Support for Partner Organizations
The DPS help the organizations that use the services with questions related to the digital preservation of their data. In particular, this support is provided during the DPS deployment process, but organizations can also submit service requests in other situations. Requests for support are received at the support address of the DPS: pas-support@csc.fi.
In 2022, a total of 140 service requests were received from organizations utilizing the DPS. The requests dealt especially with the pre-ingest and ingest operations on the data. Another significant topic was support in the deployment of the service. In addition to the service requests, discussions with the partner organizations are held through the digital preservation collaboration group 3-4 times a year. The group continued its established practices, such as regular monitoring of software development, and continued developing the logical preservation requirement specification, which updates and specifies earlier plans and descriptions of the DPS’ operations. Monthly virtual group meetings (#PASKaffet) with potential partner organizations, started in 2020, continued in 2022, as did the virtual support meetings (#PASKlinikka). The virtual support meetings are held for the partner organizations, as well as for organizations interested in becoming partners. They offer discussions with a low participation threshold, in which the organization has its own reserved time slot with the DPS' specialists.
The events and current affairs of the DPS were announced on the digitalpreservation.fi website, the Twitter account (@dpres_fi) and on an email list intended for information purposes.