The permanence of scientific data

by Konrad Hinsen, posted on 29 May 2015

In a recent blog post, Bret Victor succinctly describes one of the main weaknesses of today's Web technology: it was not designed with any specific level of permanence in mind. As a result, information published on the Web is too permanent for some uses (privacy concerns, for example) but not permanent enough for others. Scientific publishing clearly belongs to the second category. The scientific record is meant to be as permanent as is technically and economically feasible. As Victor points out, the possibility of distributed archiving is important for ensuring the permanence of information.

This point has always been important in the development of ActivePapers. An ActivePaper is a file, which anyone can copy and archive as often as desired. Web technology is used to distribute ActivePaper files, but their existence does not depend on Web technology. In contrast, some scientific data is accessible only through Web servers. This includes in particular software offered in the form of Web services (see also my earlier blog post on this topic), but also databases that do not permit downloading datasets in well-defined file formats. Such data can disappear at any moment, either through accident or error (we all know how reliable computer technology is) or by a decision of whoever runs the Web server.
