I'm just finishing an intense and exciting week focused on the role of software and data in science. The big event was a workshop on Indexing Astronomical Software, jointly organized by AAS and GitHub and supported by the Alfred P. Sloan Foundation. It was a small workshop with about twenty people who are deeply involved in these issues. I was happy to be invited: I'm not an astronomer, but I think about the same issues for particle physics.
So why was I there? There are a few connections:
I have good connections with Arfon Smith of GitHub and Lars Holm Nielsen of Zenodo/CERN, who worked together to create the GitHub Guide to Make Your Code Citable. Some code of mine was used in the first GitHub → Zenodo DOI example. In just a year, nearly 3000 repositories have acquired DOIs.
As part of the Moore-Sloan Data Science Environment, I am co-lead for NYU's Open Science and Reproducibility working group.
Shortly before the meeting there was a blog post by Titus Brown asking "is software a primary product of science?". Surprisingly, at least to many people, Titus concluded that no, it isn't. That led to some discussion on Twitter and prompted Dan Katz to write a response on his blog. The discussion is interesting: it explores the analogy with instrumentation and the role of software in the communication and encapsulation of knowledge. It also got into the very tough set of issues around jobs in academia, the brain drain, and recognition for the essential role of software in science today. I took some time to try to reconcile the situation for myself and wrote some comments in Disqus. Luckily it remained civil, and later Titus followed up with these thoughts. If you are interested in those topics, I definitely recommend reading some of those links.
So on Sunday I made it to San Francisco. I had an awesome lunch with the team from experiment.com, a crowdfunding platform for science, and some artisanal hipster ice cream with my bud Roy Keys (physicist turned data scientist). On Monday the AAS/GitHub workshop started with a round of short introductions. That initial stage was fairly efficient because we had all been asked to do some homework beforehand. In particular, we were asked to provide a mission statement for the meeting and to identify the three biggest tangible obstacles that we saw to achieving the goal described in our mission statement. Reading these ahead of time was very informative and established some context.
Later we broke into groups. I was in the "infrastructure & frictions" group. We threw out a lot of issues, Lars helped us organize them, and then Robert Hanisch of NIST helped us reduce the scope a little to something more achievable in a short time. During lunch we started to form a plan for some specific infrastructure that could help streamline the current curation effort being performed by the Astrophysics Source Code Library (ASCL). As for myself, I saw a lot of parallels between ASCL/ADS in astrophysics and HepData/INSPIRE in particle physics. Both ASCL and HepData perform an important curation role, and both run on a very light budget, with worries about sustainability and older underlying technology. Another obvious parallel is the custom identifiers used by ASCL and HepData, which would ideally be DOIs. I was volunteered to present for our group when we came back together, I think partially because I was an outsider.
After an exhausting, but productive, session we took advantage of GitHub's nicely stocked bar.
After a few drinks we had a quick demo session. There was an impressive demo of the new ADS interface. I also showed some recent work that my student Lukas Heinrich has done to use GitHub webhooks to initiate requests to the Recast reinterpretation framework (more on that in the next post). After that I had to run to SFO to catch the red-eye to JFK.