Lead image by Blue Planet Studio / iStockPhoto

By Ty Burke

When it comes to technology, change comes at you fast. For today’s children, the floppy disks that dominated data storage in the 1990s are an unrecognizable relic. So too are the USB drives that prevailed in the early 2010s, now replaced by cloud-based data storage services. And today’s computer technologies will one day be replaced by new ones that are better and faster.

Technological obsolescence is a major challenge for archives. For centuries, they have served as repositories safeguarding important records about our past – from ancient scrolls and dusty manuscripts to land registries, historical maps and century-old court records. The sources of information archives preserve are central to our ability to understand and interpret history and to manage the future. They help us know and govern ourselves.

The rapid pace of technological change raises questions about whether future generations will benefit similarly from the digital information we generate today. Even if the data are preserved, will archives have the hardware and know-how needed to access, use and see them?

This is just one of many questions about the future working of archives. The InterPARES Trust AI (I Trust AI) research project is anticipating these challenges and aims to preserve today’s digital artifacts – computer games, digital twins, TikTok videos – for centuries to come.

Archiving Challenges in the Age of Massive Data Generation

Funded by the Social Sciences and Humanities Research Council (SSHRC), the five-year multinational, multisector and interdisciplinary research project led by Dr Luciana Duranti and Muhammad Abdul-Mageed of the University of British Columbia will leverage artificial intelligence (AI) to archive trustworthy public records.

Tracey Lauriault, an associate professor of Critical Media and Big Data in the School of Journalism and Communication, is a co-applicant leading Carleton’s partnership that includes the departments of Communication and Media Studies, Architecture, Engineering and Data Science. Her contributions will centre on how to archive complex digital systems, such as those used for geographic information systems (GIS), building information modelling (BIM) and smart grids, and on assessing whether a record is AI-generated.

Carleton University associate professor Tracey Lauriault

“I Trust AI asks how we can maintain digital records for the next two or three hundred years,” says Lauriault.

The challenge is particularly acute for complex digital systems. For example, the Carleton Immersive Media Studio (CIMS) is creating hyper-realistic digital twins of Canada that immerse viewers in a multi-dimensional replica of an environment. CIMS creates detailed BIMs, such as the one for the architectural rehabilitation and heritage conservation project of Canada’s Parliament.

This award-winning model used point cloud data to replicate the curved geometry, intricate details and surface deformations of the physical space, from the gargoyles and construction materials to the entire precinct. Digital twins incorporate data from terrestrial laser scans, geo-referenced photogrammetry, computer-aided design software, historical photographs, and technical and research reports. It is only by integrating these different methods and tools that CIMS can create immersive, leading-edge models.

“There are hundreds of different file formats, computer code and all kinds of systems involved,” says Lauriault.

“I Trust AI asks whether artificial intelligence could keep all these digital pieces together and ensure they work.”

Lauriault further explains that “today, we can go to Library and Archives Canada and look at 16th-century paper maps, but we’re not sure that people in the future will be able to look at the complex geospatial artifacts we create today. We are creating these wonderful, engaging systems and environments of the world around us, without preservation in mind. This project is about knowing how these work and developing ways to make these available to future generations of creators, researchers, engineers and archivists all working together.”

A Time-Saving Aid for Digital Archivists

Other systems like smart grids that involve the internet of things (IoT) face similar challenges – you can’t understand the whole without all of the parts. But there are other issues facing archives, such as the sheer volume of information we generate – just think of the millions of photos, videos and computer games out there. Here too, AI could help.

More than 500 hours of video are uploaded to YouTube each minute. It would be humanly impossible to watch them all, let alone catalog, sort and manage them. AI can help reduce the workload for archivists as machine learning models can be developed to read, curate, sort and write descriptions. Imagine trying to preserve the data and the way they are shared on platforms that generate copious quantities of information like Facebook or Reddit.

“Archivists used to be able to watch all the films they ingested, but they can’t watch 200,000 videos,” says Lauriault.

“AI will help appraise these and describe them, but the results need to be accurate and reliable. And how do we know if these were generated by AI?”

Trust and Governance in Preserving AI-Generated Archives

The quantity of data that archivists need to manage grows continuously, and given the emergence of generative AI applications like ChatGPT, the records in those archives will be what the ChatGPTs of the world use to generate new content. These AI systems draw on copious amounts of text and images and then generate more, and it is not always obvious whether this content has been AI-generated or comes from an artifact created by a human author, director or artist.

Lauriault, in collaboration with computer scientists at Carleton University, is contributing research that will help identify what was generated by AI.

This I Trust AI work is especially important as archives around the world will need to assess and ingest ‘truthful’ and ‘trustworthy’ audio, text, video and images resulting from war, the documentation related to human rights abuses, and artifacts created from newsworthy events.

These are but some of the many key issues that will shape the future of archives. Over the next five years, Carleton will be contributing I Trust AI research to address these challenges.

“As we move forward, we will want to look back on the systems we create today,” says Lauriault.

“Fifty years from now, we may need them. We may need to look at a digital twin of Ottawa as it is today, in the event the city is damaged by a natural disaster like a flood, or to pre-empt and to prevent future calamities related to climate change. And we need to think of how to govern these complex spaces. For example, today’s smart grid will help shape future electricity systems. What kind of data agreements do we need to create digital twins, to manage smart grids, to study AI-generated content? How do we appraise these, and how do we keep them intact for future generations?

“How do we keep these models live, so that future generations will be able to look back and engage with these artifacts, learn from them, and plan better?”