I recently read an article on Tedium, linked from Hacker News, about disc rot. It reminded me of two moments from my past:

The first was when one of my hard drives failed. It had been working fine, and I put it aside for a few months. One day I plugged it back in to find it wouldn’t mount. I spent some time on the command line trying to force it to mount and eventually discovered significant sector errors and a totally corrupt boot manager. Data recovery ensued, and the HDD, beyond repair, was destroyed afterwards.

The second was whilst working with a research group based in the Netherlands. They specialised in long-term storage of architectural 3D point cloud scans for conversion to BIM models. This was probably the first time I truly considered what evolving file types mean for long-term data storage.

Since then I have pondered two questions: how can we store data long term? And, the less practical and more philosophical one, should we? So let’s get into it.

How can we store data long term?

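Storing data long term is incredibly complex. The media itself degrades, as with disc rot and my failed hard drive, and file formats keep evolving, as the point cloud work showed me.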

Should we?

I have often theorized about the idea of data “memory”: data that fades over time. What rules would have to be coded to allow this to happen in an organic, natural way? Should we even allow it to? How do we rate the importance of data to know what needs to be stored forever and what we can let disappear from existence?
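To make the "what rules" question a little more concrete, here is a minimal sketch of one possible rule, purely as a thought experiment. It assumes every piece of data carries an importance rating and that its value halves each time a fixed period passes without access. The `Record` type, the `importance` scale, the one-year half-life and the threshold are all my own hypothetical choices, not something any real system is known to use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    name: str
    importance: float         # hypothetical rating: 0.0 (trivial) to 1.0 (keep forever)
    days_since_access: float  # how long the data has sat untouched


def retention_score(record: Record, half_life_days: float = 365.0) -> float:
    """Importance that halves for every half-life spent unaccessed."""
    if record.importance >= 1.0:
        return 1.0  # "stored forever" data never fades
    return record.importance * 0.5 ** (record.days_since_access / half_life_days)


def faded(records: List[Record], threshold: float = 0.05) -> List[str]:
    """Names of records whose score has dropped below the threshold."""
    return [r.name for r in records if retention_score(r) < threshold]


records = [
    Record("cat_meme", importance=0.2, days_since_access=1095),
    Record("point_cloud_scan", importance=1.0, days_since_access=3650),
    Record("holiday_photo", importance=0.6, days_since_access=365),
]
print(faded(records))
```

Even this toy version dodges the hard part: somebody, or something, still has to decide the importance rating in the first place.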

I remember seeing a story detailing how Facebook had to build an entirely new data center, which it filled with its users’ (customers’) data, most likely images of cats. They not only had to store billions of images, but also had to duplicate the images across block devices to keep the data reliable. If a block device failed and Facebook lost the cat image, they had broken their promise to customers that the data was safe from disappearing. To avoid building twice the number of servers, they devised an algorithm that could use only 60% of the image data to rebuild 100% of the image as an exact copy.
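I don't know the details of the algorithm from that story, but the general idea of rebuilding lost data without keeping a full second copy can be sketched with the simplest technique I know of: a single XOR parity shard. This is not Facebook's method, and production systems typically use stronger codes such as Reed-Solomon. The function names and the choice of four shards are my own assumptions for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int = 4) -> List[bytes]:
    """Split data into k equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the rest."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can only repair one loss")
    repaired = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


def decode(shards: List[Optional[bytes]], original_len: int) -> bytes:
    """Drop the parity shard and the padding to get the original bytes."""
    return b"".join(recover(shards)[:-1])[:original_len]


image = b"pretend this is a cat picture"
stored = encode(image, k=4)   # 5 shards, each on a different device
stored[2] = None              # one device fails
assert decode(stored, len(image)) == image
```

With four data shards and one parity shard, the storage cost is 1.25 times the original rather than 2 times for a full duplicate, at the price of surviving only one failure per group.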

Assuming a large proportion of these images are memes, cats, food, etc., is it worth building massive data centers to ensure the data keeps? I personally cannot think of a larger example of procrastination as a species. I always imagine aliens coming to observe the Earth: what would they think, not knowing the reasons behind these decisions?