by Felix Schönbrodt & Roland Ramthun
The open availability of scientific material (such as research data, code, or other material) has often been identified as one cornerstone of a trustworthy, reproducible, and verifiable science. At the same time, the actual availability of such reproducible material still is scarce (though on the rise).
To increase the availability of open scientific material, we propose a license for scientific research data that increases the availability of other open scientific material. It borrows a mechanism from open source software development: The application of copyleft (or, in the CC terminology, “sharealike”) licenses. These are so-called “sticky licenses”, because they require that every reuse of the licensed material has to have the same license. This means, if you reuse material under this license, your own product/derivative must also (a) be freely reusable and (b) use that license, so that any derivative from your product is free as well, ad infinitum.
The promise of such a “viral” license is that it can induce more and more freedom into a system. It is supposed to be a strategy to reform the environment: The more artifacts have a copyleft license, the more likely it is that future products have the same license, until, at the end, everything is free.
One criticism of such licenses stems from the definition of “freedom”: According to this point of view, the highest degree of freedom is if you can do anything with a material. This also includes commercial usage, which is usually closed for competitive reasons, or to integrate the material into a larger dataset which itself can not be open, because other parts of the data have restrictive licenses. We are not lawyers, but in our understanding this could, for example, also include restrictions due to privacy rights.
For example, imagine the compilation of an integrative database that includes both material from a copyleft source and another source that has individual-related material, which cannot be openly shared due to privacy rights (but could be shared as a restricted scientific use file). At least from our understanding, a strict copyleft license would preclude the reuse in such a restricted way. Hence, the copyleft license, although claiming to ensure freedom, does preclude a lot of potential reuse scenarios. From this point of view, a so-called permissive license (such as CC0, MIT, or BSD) provides more freedom than a copyleft license (see, e.g., The Whys and Hows of Licensing Scientific Code).
We propose a system that addresses both points of view, with the goal to provide some stickiness of scientific open sharing, but also the possibility to operate with scientific material that require restrictiveness, for example due to privacy rights.
We suggest the following clause for the reuse of open research data:
Upon publication of any scientific work under a broad definition (including, but not limited to journal papers, books or book chapters, conference proceedings, blog posts) that is based in full or in part on this data set, all data analysis scripts involved in the creation of this work must be made openly available under a license that allows reuse (e.g., BSD or MIT).
(Of course more topics must be addressed in the license, such as the obligation to properly cite the authors of the data set, not to try to reidentify research participants, etc. But we focus only on the copyleft aspect here).
This system has some differences from traditional copyleft licenses.
The proposed system offers some protection against the “research parasites” argument. The parasite discussion refers to the free-rider problem in social dilemmas: While some people invest resources to provide a public good, others (the parasites/free-riders) profit from the public good, without giving back to the community (see also Linek et al., 2017). This often creates a feeling of injustice, and impulses to punish the free-riders. (An entire scientific field is devoted to the structural, sociological, political, and psychological properties and consequences of such social dilemma structures.)
In the proposed licensing system, those who profit from openness by reusing open data must give something back to the community. This increases overall openness, reusability, and reproducibility of scientific outputs, and probably decreases feelings of exploitation and unfairness for the data providers.
Do you think such a license would work? Do you see any drawbacks we didn’t think of?