Lab 4: Reproducibility
Reproducibility
Today’s lab is about reproducibility, an important skill for professionals in industry and academia alike. Intimately related to communication and documentation, reproducibility aims to ensure that others (and your future self) can reach the same results you did by following a sequence of steps. In this course, we will attempt to reproduce the results of a recently published paper, pondering along the way whether we would have done anything differently and assessing our overall experience. The paper in question is “Comparative Advantage of Humans versus AI in the Long Tail” (2024) by Agarwal, Huang, Moehring, Rajpurkar, Salz and Yu.
Setup
Find the link to the “Replication Package” on the paper’s page and click on “Download this project”. Authentication is required; we recommend choosing “access through your institution” and falling back to the other options if this fails. By the end of this step, you should have a file named 202185-V1.zip on your machine. If none of these options works, you may instead try:
wget www.stat.berkeley.edu/~paciorek/transfer/202185-V1.zip
Unzip 202185-V1.zip and enter Rad_AI_Longtail/. Investigate which files are likely to contain reproducibility instructions and give them a quick read.
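The unzip-and-look step can be scripted roughly as below. This is only a sketch: the helper name list_doc_candidates is hypothetical, and the file-name patterns it searches for (README, "instruction", .pdf, .md) are guesses about what a replication package usually contains, not a description of this particular package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files whose names suggest documentation.
# The name patterns are guesses; adjust them after a first look at the package.
list_doc_candidates() {
  local dir="${1:-.}"
  find "$dir" -maxdepth 2 -type f \( \
      -iname 'readme*' -o -iname '*instruction*' -o \
      -iname '*.pdf' -o -iname '*.md' \
    \) | sort
}

# Typical use, starting in the directory that holds the downloaded archive:
#   unzip 202185-V1.zip
#   cd Rad_AI_Longtail/
#   list_doc_candidates .
```

A file-name search only narrows things down; the instructions may just as well live in a notebook or a script header, so skim the directory listing too.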
We now need to use the specific Python version mentioned by the authors (3.11) to install the packages necessary to reproduce the results, which should be straightforward given the provided requirements.txt file. However, there are several caveats here. Try installing those packages and document your experience. Creating a conda environment from the provided requirements.txt file may be a reasonable first attempt:
conda create --name lab4 python=3.11 --file requirements.txt
If it fails on your system, try to identify why. As an alternative, you may try the following command to generate an adjusted requirements file, which can then be used to install the packages:
grep -rh import . |
sed -e 's/"//g; s/,//g; s/\\n//g; s/^ *//g; s/ *$//g; /^\(import \|from \)/!d' |
cut -d ' ' -f 2 |
cut -d '.' -f 1 |
sort |
uniq |
grep -f - requirements.txt |
grep -v '^#' |
cut -d '=' -f 1,2 |
sed 's/=/==/g' |
grep -v '_' |
grep -v '\-base' > requirements_adjusted.txt
Running the command above incrementally may help you understand what each step does, if it is not immediately clear. Installing the packages from this requirements_adjusted.txt file should then be less error-prone. You will also need nbformat==5.9.2 and jinja2==3.1.2 to be installed. Think in particular about the modification sed 's/=/==/g' and what its implications are.
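One way to study the last two steps of that pipeline (cut -d '=' -f 1,2 followed by sed 's/=/==/g') is to feed them a few sample lines in different formats. The helper name to_pip_pin is hypothetical, and the package names, versions and build string below are made up; the name=version=build form mimics what conda list --export writes, which the pipeline appears to assume.

```shell
#!/usr/bin/env bash
# Hypothetical helper applying the last two steps of the pipeline:
# keep the first two '='-separated fields, then turn every '=' into '=='.
to_pip_pin() {
  cut -d '=' -f 1,2 | sed 's/=/==/g'
}

# Made-up sample lines in different formats.
printf '%s\n' \
  'numpy=1.26.0=py311h64a7726_0' \
  'pandas=2.1.1' \
  'scipy' \
  'requests==2.31.0' |
  to_pip_pin
# Output:
# numpy==1.26.0
# pandas==2.1.1
# scipy
# requests==
```

Note what happens to the last line, which was already in pip's name==version form, and compare how conda and pip each interpret a single = versus ==.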
Replicating the results in the paper
With a proper environment set up, identify the minimal set of input files needed to run all the scripts. Are all input files used? Are there any files provided that you wouldn’t expect to be inputs?
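A rough first pass at this question is to check which data files are ever mentioned by name in the code. The sketch below assumes nothing about the package layout: the helper name unreferenced_files is hypothetical, and you pass in whichever data and code directories you identified. A file that is never named is only a candidate for "not an input", since scripts may also build paths or use globs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print files under a data directory whose base names
# never appear in any .py, .ipynb or .sh file under the code directory.
# This is a heuristic; confirm each hit by reading the scripts.
unreferenced_files() {
  local data_dir="$1" code_dir="$2" f name
  find "$data_dir" -type f | sort | while IFS= read -r f; do
    name=$(basename "$f")
    if ! grep -rqF --include='*.py' --include='*.ipynb' --include='*.sh' \
        -- "$name" "$code_dir"; then
      echo "$f"
    fi
  done
}

# Example (placeholder paths):
#   unreferenced_files path/to/data .
```

Matching on base names is deliberately loose: it will miss files read through constructed paths and can be fooled when one file name is a substring of another.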
Start following the instructions to replicate the results. Try to locate the “radiology experiment data” mentioned and report back. If you are unable to find it, you may download it instead via:
wget https://www.stat.berkeley.edu/~paciorek/transfer/data_public.txt
Discuss the implications.
Check the file permissions of the shell script via:
ls -l make.sh
You may change the permissions via chmod if desired.
Explore the make.sh script and discuss how it helps or hinders reproducibility. In addition, pay attention to how the script handles failures in bash and discuss what you think the motivation was.
Note that make.sh invokes both python and ipython. Double-check that both executables run the correct Python version (3.11). If you do not have ipython installed, you may install it via conda install ipython or pip install ipython, depending on what kind of virtual environment you set up.
In what follows, we are going to run scripts that output text to the terminal. Document along the way your opinion about the text that is purposefully printed (informative or uninformative; excessive or lacking; etc.) and about any other text printed to the screen (are there warnings, for example?).
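The version double-check can be done by asking each executable which interpreter it runs. This is a sketch assuming the lab environment is already activated; the helper name report_interpreter is hypothetical. The last lines are a generic reminder of bash's default failure behaviour versus set -e, included only for comparison when reading make.sh; they do not describe what make.sh itself does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show where an executable lives and which Python it runs.
# Assumes the lab environment (conda or venv) is already activated.
report_interpreter() {
  local exe="$1"
  if ! command -v "$exe" >/dev/null 2>&1; then
    echo "$exe: not found on PATH" >&2
    return 1
  fi
  printf '%s -> ' "$(command -v "$exe")"
  # Both python and ipython accept -c to run a single command and exit.
  "$exe" -c 'import sys; print(sys.version.split()[0], sys.executable)'
}

for exe in python ipython; do
  report_interpreter "$exe" || true
done

# Generic bash behaviour, for comparison when reading make.sh:
bash -c 'false; echo "still running"'   # default: continues after a failure
bash -c 'set -e; false; echo "still running"' ||
  echo "stopped at the first failure (status $?)"
```

In a correctly set up environment, both lines should normally report a 3.11.x version and the same interpreter path inside that environment.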
Run the make.sh script up to the “main calculations” part. That step takes hours because the number of “bootstrap replicates” is large. Identify how that parameter is set and decrease it. Discuss your experience in making this change and whether you would have programmed this differently.
Even though we adjusted the “bootstrap replicates” parameter to reduce the time the script takes, we are going to skip this step and conveniently download the files that the script would generate with “bootstrap replicates” set to 51. To do so, run:
wget www.stat.berkeley.edu/~paciorek/transfer/data-analysis-bootstrap.zip
wget www.stat.berkeley.edu/~paciorek/transfer/data-analysis.zip
Then unzip the files and make sure they end up in the correct directories.
Run the last two scripts in make.sh to generate the plots. Do they look similar to the ones in the paper? If not, what could explain the difference?
Note the contact information given in the instructions file. Are the two options feasible ways to reach the authors?
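Two small aids for this part, sketched under assumptions: a keyword search to find where the number of bootstrap replicates is set, and listing each archive before extracting it so you can see which directories its files expect. The helper name find_setting is hypothetical, and "bootstrap" is only a guess at how the parameter is named in the code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: case-insensitive search for a keyword in the code,
# printing file:line:text for every match.
find_setting() {
  local pattern="$1" dir="${2:-.}"
  grep -rni --include='*.py' --include='*.ipynb' --include='*.sh' \
    -- "$pattern" "$dir"
}

# Typical use inside Rad_AI_Longtail/ (try other keywords if this finds nothing):
#   find_setting bootstrap .
#
# List each archive's contents and paths before extracting it:
#   unzip -l data-analysis-bootstrap.zip
#   unzip -l data-analysis.zip
```

Listing first makes it easier to tell whether an archive should be extracted in the project root or in a subdirectory, instead of discovering it through a failing script.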