In 2021, the German Psychological Society (DGPs) signed the DORA declaration. As a consequence, it recently set up a task force to develop recommendations for how responsible research assessment could be practically implemented in hiring and promotion within the field of psychology.
In our current draft (not public yet), we want to decenter (A) scientific publications as the primary research output that counts, and recommend also taking (B) published data sets and (C) the development and maintenance of research software into consideration. (Along with Recognition and Rewards and other initiatives, we also call for taking Teaching, Leadership skills, Service to the institution/field, and Societal impact into account. In the white paper, however, we only address the operationalization of the Research dimension.)
For research software, we worked on an operationalization that is inspired by:
- the INRIA Evaluation Committee Criteria for Software Self-Assessment
- Alliez, P., Cosmo, R. D., Guedj, B., Girault, A., Hacid, M.-S., Legrand, A., & Rougier, N. (2020). Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria. Computing in Science & Engineering, 22(1), 39–52. https://doi.org/10.1109/MCSE.2019.2949413
- Gomez-Diaz, T., & Recio, T. (2019). On the evaluation of research software: The CDUR procedure [version 2; peer review: 2 approved]. F1000Research, 8:1353. https://doi.org/10.12688/f1000research.19994.2
Please note that …
- The system should be as easy as possible (otherwise it will not be used in hiring committees)
- Psychologists are not computer scientists, so existing criteria developed for computer scientists might be too advanced.
- R is the #1 open source software for statistical computing in psychology, so all examples relate to R.
Here is our current draft of the research software section. As we are not aware of any concrete implementation of assessing research software for hiring or promotion purposes (at least not in psychology or neighboring fields), we would like to ask the community for feedback. At the end of the post we list three ways to comment.
DRAFT SECTION FOR OPERATIONALIZING RESEARCH SOFTWARE CONTRIBUTIONS IN HIRING AND PROMOTION
(C) Research Software Contributions
Research software is a vital part of modern data-driven science that fuels both data collection (e.g., PsychoPy, Peirce et al., 2019, or lab.js, Henninger et al., 2021) and analysis (see, for example, R and the many contributed packages). In some cases, the functioning of entire scientific disciplines depends on the work of a few (often unpaid) maintainers of critical software (Muna et al., 2016). Furthermore, non-commercial open source software is a necessary building block for computational transparency, reproducibility, and a thriving and inclusive scientific community. Developing research software should no longer be “career suicide”: it is high time that this work is properly acknowledged in hiring and promotion.
Some research software is accompanied by a citable paper describing the software (e.g., for the lavaan structural equation modeling package in R: Rosseel, 2012). However, these “one-shot” descriptions often do not appropriately reflect the continuous work and changing teams that are necessary to develop and maintain research software. Therefore, we include “Contributions to Research Software” as a separate category with its own quality criteria. Note that this category (C) only refers to dedicated, reusable research software, not to specific analysis scripts for a particular project. The latter should be listed under “Open reproducible scripts” of the respective paper in section (A).
For the evaluation of contributed research software, applicants can list up to 5 software artifacts along with the self-assessment criteria presented in Table 3 (a more comprehensive evaluation scheme with more quality criteria is proposed in Appendix A). Contributor roles are taken from the INRIA Evaluation Committee Criteria for Software Self-Assessment.
Table 3. Simple evaluation scheme for research software, with one specific example
|Research Software 1||URL||Comment|
|Title||R package RSA||https://CRAN.R-project.org/package=RSA|
|Citation||Schönbrodt, F. D. & Humberg, S. (2021). RSA: An R package for response surface analysis (version 0.10.4). Retrieved from https://cran.r-project.org/package=RSA|
|Short description||An R package for Response Surface Analysis|
|Date of first full release||2013||Necessary to compute citations relative to age of software|
|Date of most recent major release||2020||Indicates whether software is actively maintained|
|Contributor roles and involvement||DA-3, CD-3, MS-3||What has the applicant contributed? For each of the 3 roles, design and architecture (DA), coding and debugging (CD), and maintenance and support (MS), specify whether you are (0) not involved, (1) an occasional contributor, (2) a regular contributor, or (3) a main contributor. Example: DA-2, CD-3, MS-1|
|License||GPLv3||Is the software open source?|
|Scientific impact indicators:|
|Downloads or users per month||710 downloads / month||https://cranlogs.r-pkg.org/badges/RSA|
|Citations||110||https://scholar.google.de/citations?view_op=view_citation&hl=de&user=KMy_6VIAAAAJ&citation_for_view=KMy_6VIAAAAJ:mB3voiENLucC||Evaluate relative to the age of software|
|Other impact indicators (optional)||–||E.g., GitHub stars, number of dependencies. Be careful and responsible when using metrics, in particular when they are black-box algorithms.|
|Reusability indicator||R3||Levels of the reusability indicator:
- R1 (0.25 points): Single scripts, loose documentation, no long-term maintenance. Prototype: A collection of reusable R scripts on OSF.
- R2 (1 point): Well-developed and tested software, fairly extensive documentation. Some attention to usability and user feedback. Not necessarily regularly updated. Prototype: A small CRAN package that is no longer actively developed (maintenance only).
- R3 (2 points): Major software project, strong attention to functionality and usability, extensive documentation, systematic bug chasing and unit testing, external quality control (e.g., by uploading to CRAN). Regularly updated. Prototype: A well-received and actively maintained CRAN package.
- R4 (6 points): Critical infrastructure software. Hundreds of research projects use or depend on the software (plus all criteria of R3). Prototype: The lavaan package.|
|Merit / impact statement (narrative, max 100 words)||The RSA package has become a standard package for computing and visualizing response surface analyses in psychology. A PsycInfo search for “response surface analysis” (on 2022-05-18) revealed that of the 20 most recent publications, 35% used our package (although 2 of these 7 did not cite it). Several features, such as the computation of multiple standard models and model comparisons, are unique to this package.|
|Reward Points||(3+3+3)/3 * 3 = 9||Take the average value of the 3 contributor roles and multiply it by the points for the level of the reusability indicator.|
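For readers who want to try the scheme on their own software, here is a minimal Python sketch of the two computations the table describes: the reward points (average of the three contributor roles times the points of the reusability level) and citations relative to the age of the software. The function names, the input format, and the choice of reference year are our own assumptions for illustration, not part of the draft. The sketch applies the rule exactly as worded in the table comment and uses the point values listed for R1 to R4; with the 2 points listed for R3 it yields 6 for a main contributor in all three roles, whereas the worked example in the table multiplies by 3.

```python
# Point values for the reusability levels, as listed in Table 3.
REUSABILITY_POINTS = {"R1": 0.25, "R2": 1.0, "R3": 2.0, "R4": 6.0}

# The three contributor roles from the self-assessment criteria.
ROLES = ("DA", "CD", "MS")


def parse_roles(spec):
    """Parse a string such as 'DA-2, CD-3, MS-1' into {'DA': 2, 'CD': 3, 'MS': 1}.

    The string format is a hypothetical convention for this sketch.
    """
    levels = {}
    for part in spec.split(","):
        role, _, level = part.strip().partition("-")
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        value = int(level)
        if not 0 <= value <= 3:
            raise ValueError(f"involvement must be between 0 and 3, got {value}")
        levels[role] = value
    missing = [role for role in ROLES if role not in levels]
    if missing:
        raise ValueError(f"missing roles: {missing}")
    return levels


def reward_points(spec, reusability_level):
    """Average involvement across the 3 roles, times the reusability points."""
    levels = parse_roles(spec)
    mean_involvement = sum(levels.values()) / len(ROLES)
    return mean_involvement * REUSABILITY_POINTS[reusability_level]


def citations_per_year(citations, first_release_year, reference_year):
    """Citations relative to the age of the software (reference year is assumed)."""
    age = reference_year - first_release_year
    if age <= 0:
        raise ValueError("reference year must be after the first release")
    return citations / age


if __name__ == "__main__":
    print(reward_points("DA-2, CD-3, MS-1", "R2"))
    print(round(citations_per_year(110, 2013, 2022), 1))
```

Keeping the point values in a single dictionary makes it easy to recalibrate the levels without touching the rest of the computation.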
Is there essential information missing in the table?
Calibrating reward points
We also want to offer a suggestion for how to compute “reward points”. The goal is to bring the categories of “publications” and “software contributions” onto a common evaluative dimension. This gets a bit complicated, as we also propose bonus points for publications with certain quality criteria, so not every publication gets the same number of points. For the moment, imagine a publication of good quality (neither a quickly churned out low-quality publication, nor an outstanding, seminal contribution). What is the “paper equivalent” of a software contribution? Note that these bonus points are meant to be incremental to an existing paper that describes the software.
Here’s our suggestion. We are aware that it is easy to find counter-examples that do not fit the system, but we would be happy if our system is an incremental improvement over the status quo (which is to ignore software contributions and to count the number of papers without any quality weighting):
|Research Software Prototype||Paper equivalents (of good quality)|
|Simple script (a few hundred lines) with reuse potential, completely done by applicant||0.25|
|A well-developed CRAN package: Occasional co-developer with a minor contribution||0.5|
|A well-developed CRAN package: Active co-developer with major contribution||1|
|A well-developed CRAN package: Main developer||2|
|Critical infrastructure: Regular co-developer||2|
|Critical infrastructure (e.g., lavaan): Main developer||5|
How to comment?
If you have comments, you can …
- post them here below the blog post
- write an email to firstname.lastname@example.org
- directly add your comments in a Google doc
Thanks for your help!