Quality Report 2023

The Year 2023 of the National Digital Preservation Services in Finland

General

Digital preservation services (DPS) are services produced jointly for the digital preservation of cultural heritage and research data. The development of the DPS is continuous and takes place in close cooperation with the organizations that use them. The aim of the Digital Preservation Service for Cultural Heritage is to preserve the most significant digitized and born-digital cultural heritage content for future generations and to keep that content usable in the long term. Similarly, the Digital Preservation Service for Research Data ensures the availability and preservation of digital research data. Both services use a common digital preservation system for bit-level preservation.

The Digital Preservation Service for Cultural Heritage started preserving content in 2015 and the Digital Preservation Service for Research Data in late 2019. Organizations using the Digital Preservation Service for Research Data for preparing and storing data can also make more extensive use of Fairdata services, including the packaging service and the management interface.

The Main Results of the Year 2023

During 2023, the amount of data in preservation grew by 678 terabytes, a new record in the history of the DPS. More than 970,000 new archival information packages were added to preservation, which is also a record for annual growth. The amount of content in preservation exceeded 2.7 petabytes by the end of 2023.

The storage capacity of the DPS was increased to 7.5 petabytes, which ensures sufficient space for the utilizing organizations to preserve new content. The life cycle of the DPS equipment was extended until the early 2030s.

To speed up and facilitate the long-term preservation of content, a digital object validation service was launched. With the help of the validation service, anyone can easily check the preservability of their data in the DPS even before packaging and sending content to the DPS for ingestion.

An authorization reform was implemented for the user management of the Digital Preservation Service for Research Data. The reform made it easier to manage the rights of the users of the service, the organizations that utilize the service, and the preservation agreements. At the same time, the reform enabled the automated dissemination of data through the management interface.

During 2023, the calculation of the carbon footprint of the DPS continued, building on the previous year's calculations. The storage capacity was increased and the life cycle of the equipment was extended by two years. CSC's data centre at Renforsin Ranta in Kajaani, one of the most ecological data centres in Finland, was selected as the new data centre. With these measures, the carbon footprint of the DPS stayed at almost the same level as reported in 2022. As part of deploying the new data centre, the first petabyte of preserved data was migrated to the new infrastructure at the end of 2023. These changes put the DPS on a more sustainable footing for the future.

The organization of the DPS was reformed by establishing a joint service manager role for the DPS and Fairdata Services. The goal is to improve the services and to bring the DPS and Fairdata Services closer together. Two major events were held in 2023: a seminar on digital preservation for our partner organizations and the annual meeting of the members of the Open Preservation Foundation (OPF). The two-day digital preservation seminar discussed long-term preservation experiences and perspectives with international experts and provided training in material management and the use of digital preservation services. The annual meeting of the OPF was attended by representatives of the organization's international membership, including CSC. A total of 150 people participated in the events.

A customer satisfaction survey for the digital preservation collaboration group was carried out in late 2023. The customer satisfaction with the DPS was quite positive (5.2/6).

Partner Organizations

Partner organizations of the Digital Preservation Service for Cultural Heritage
Organization | Purpose of use | Capacity (TB)
Celia | Master archive and selected new audiobooks for long-term preservation | 110
Kansallinen audiovisuaalinen instituutti | A selected part of the domestic film materials to be digitized | 2400
Kansallisarkisto | Born-digital documentary materials of the state administration received by Kansallisarkisto | 41
Kansallisarkisto | Data transferred to the VAPA system | 1
Kansallisarkisto | Materials of Kansallisarkisto's mass digitization project | 114
Kansallisarkisto | Materials transferred from Kansallisarkisto's digital archive and retrospective digitization materials | 805
Kansallisarkisto | Private archive materials of Kansallisarkisto that exist solely in digital form | 27
Kansallisgalleria | Long-term preservation of Kiasma's media art works | 20
Kansalliskirjasto | Cultural heritage materials digitized by Kansalliskirjasto | 1083
Kansalliskirjasto | Materials collected under the Finnish cultural materials act (kulttuuriaineistolaki) | 355
Kotimaisten kielten keskus Kotus | Long-term preservation of Kotus's linguistic research and cultural heritage materials | 60
Museovirasto | Research reports on the cultural environment | 1
Musiikkiarkisto | Musiikkiarkisto's materials for long-term preservation | 70
Postimuseo | Long-term preservation of Postimuseo's philatelic collection | 2
Svenska Litteratursällskapet SLS | SLS's materials for long-term preservation | 50
Yhteiskuntatieteellinen tietoarkisto, FSD | Long-term preservation of the collection of research data archived by FSD | 1

Partner organizations of the Digital Preservation Service for Research Data (preservation agreements)
Organization | Purpose of use | Capacity (TB)
Geologian Tutkimuskeskus | Data produced by GTK's tomography device | 12
Helsingin yliopisto | A selection of meteorological and air quality measurements from Helsingin yliopisto's SMEAR data | 2
Helsingin yliopisto | M. cinxia and C. melitaearum in the Åland metapopulation system | 2
Helsingin yliopisto | FIRE (The Finnish Reflection Experiment) | 1
Helsingin yliopisto | Luomus materials | 150
Itä-Suomen yliopisto | SENSOTRA | 1
Jyväskylän yliopiston kiihdytinlaboratorio | Decay spectroscopy of nobelium-250 | 1
Oulun yliopisto, Sodankylän geofysikaalinen observatorio | Observation data | 30
Tampereen yliopisto | A-K collection of Kansanperinteen arkisto, Faculty of Social Sciences | 2
Turun yliopisto | Materials of the archive of history, culture and arts research (HKT-arkisto) | 20
Åbo Akademi | Collections at the Åbo Akademi library | 10

Data Accumulation in 2023

About 678 terabytes of new data were received for preservation during the year, and the total amount of data in preservation at the end of 2023 was over 2.7 petabytes. The data accumulation during 2023 is shown in the figure below.

[Figure: Data accumulation in 2023]

During 2023, the DPS took responsibility for preserving more than 970,000 content packages, and at the end of 2023 there were more than 3,621,000 content packages in preservation. The accumulation of content packages during 2023 is shown in the figure below.

[Figure: Package accumulation in 2023]

Maintenance of the Digital Preservation Services

Producing digital preservation services requires a wide range of activities: maintenance tasks, development of methods and models, software development, development of equipment infrastructure, and administrative work. The following section focuses on the maintenance tasks of the digital preservation services. It follows the model for quality reporting of IT service production operations, which typically covers the growth of data, incidents, and recovery from them over a given period.

The main objectives of maintaining the Digital Preservation Services are:

  • ensure the integrity and availability of archival information packages in preservation;
  • monitor the functionality of the service; and
  • support organizations in utilizing the DPS services (e.g. fixing invalid or incomplete submission information packages detected during ingest).

Monitoring the Digital Preservation Services

Monitoring of the DPS has been automated as far as possible. The monitoring provides status and event information to the maintainers of the services and to the organizations that use them, which enables the experts to assess the status of the service and take the necessary measures when needed.

The following items are currently monitored automatically in the DPS: device failures (such as broken hard drives), broken tape drives, server availability, disk area fill rate, visibility of distributed storage areas on different servers, up-to-dateness of the virus database used for virus checks, storage layer integrity, availability of tape libraries, SSL certificate life cycles, and failed login attempts on the SFTP port of the frontend servers.

In addition, the following tasks are monitored or carried out manually: following the progress of the work queue at ingest, processing submission information packages stuck in the work queue, checking the integrity of archival information packages, analysing problems with rejected submission information packages, replicating broken media, and creating copies for the dark archive.

As part of the development of the DPS, monitoring the service will be improved and new processes will be automated. This makes it possible to maintain a cost-effective service even though the amount of content to be preserved is increasing.

Quality Deviations Related to the Data in Preservation in 2023

Together with the partner organizations, we have considered what quality means in terms of the long-term preservation of data. It has been agreed that the integrity of the data and the reliability of preservation are of particular importance. In this context, quality deviations are situations where the preservation of data is threatened, and not, for example, situations where the service is temporarily unavailable. Reporting on the quality of the service using these criteria is somewhat challenging, as the usual indicators of IT environments (e.g. service availability percentages) do not indicate deviations or actual threats to the preservation of the data. We have defined a threat to the preservation of data as a situation where there are fewer than three intact copies of an archival information package. Such situations are typically resolved by using an intact copy on another media type. The maintenance of the DPS is able to restore these situations to normal as part of its normal operation.
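The definition above can be expressed as a simple rule. The following is a minimal sketch only, not the DPS implementation: the `Copy` class, the function names, and the media labels are hypothetical and chosen for illustration. The only value taken from this report is the threshold of three intact copies.

```python
from dataclasses import dataclass

# Threshold stated in this report: fewer than three intact copies
# of an archival information package counts as a quality deviation.
MIN_INTACT_COPIES = 3


@dataclass(frozen=True)
class Copy:
    """One stored copy of an archival information package (hypothetical model)."""
    medium: str   # illustrative label such as "disk" or "tape"
    intact: bool  # outcome of an integrity check, e.g. a checksum comparison


def is_quality_deviation(copies):
    """Return True if fewer than MIN_INTACT_COPIES copies are intact."""
    intact_count = sum(1 for copy in copies if copy.intact)
    return intact_count < MIN_INTACT_COPIES


def recovery_sources(copies):
    """Return the intact copies that could be used to replace a broken one."""
    return [copy for copy in copies if copy.intact]
```

For example, a package with one corrupted disk copy and two intact tape copies would be flagged by `is_quality_deviation`, and `recovery_sources` would return the two tape copies as candidates for restoring the third copy.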

During 2023, the hardware infrastructure of the digital preservation system experienced seven disk failures, three power supply breakdowns, and two cases where a magnetic tape drive jammed. In addition, one tape became jammed in place. The most significant deviation in the hardware infrastructure was the crash of one of the RAID controllers. None of these problems led to corruption of archival information package copies, and no deviations in the preserved data have been detected.

In the autumn of 2023, IBM announced a possible manufacturing defect in LTO9 tapes and asked customers to check whether the tapes delivered to them belonged to any of the affected batches. None of the tapes checked at the DPS belonged to the batches covered by the recall.

New Features of Software Development

The code base of the DPS was extensively revised in preparation for the operating system migration in 2024, when the CentOS 7 operating system reaches its end of life. A new, easier-to-use model for providing DPS tools to partner organizations was prepared.

The management interface of the Digital Preservation Service for Research Data was renewed during 2023. As a result, the layout of the management interface is more closely aligned with the common style of the Fairdata Services. The pre-ingest tool offered to organizations was renewed with the aim of publishing the new version of the tool at the beginning of 2024.

In the Digital Preservation Service for Cultural Heritage, the preparation of the mass migration of ARC/WARC content proceeded as planned. Regarding file formats, support for image files containing multiple images was added, and MPEG-PS version recognition was improved. The handling of validation results was also improved. It was specified how non-supported file formats are supported and processed in ingest when a recommended version is included in the same package. Python 2 code was removed from the code base after the Python 3 migration was completed.

Support for Partner Organizations

The DPS helps the organizations that use it with questions related to the digital preservation of their data. In particular, this support is provided during the DPS deployment process, but organizations can also submit service requests in other situations. Requests for support are received at the support address of the DPS: pas-support@csc.fi.

In 2023, a total of 110 service requests were received from organizations utilizing the DPS. In addition to service requests, discussions are held with partner organizations, for example through the digital preservation collaboration group which meets 3-4 times a year.

The events and current affairs of the DPS were announced on the digitalpreservation.fi website, on X (@dpres_fi), and on an email list intended for information purposes.