For the IRIS-HEP organization we need to collect publications and update our webpage regularly. I also have to do this for our group webpage, my CV, etc., and it's annoying to copy/paste all of this information. The http://inspirehep.net website lets you export BibTeX, LaTeX, and plain text for individual papers or for a set of papers matching a search result, but that's still not very convenient for web content, where Markdown is common. The IRIS-HEP webpage is also based on Jekyll, which can parse YAML files for making publication pages. So I wanted a tool that could ingest a bunch of paper identifiers and output YAML.
The new INSPIRE beta has a more modern API, so I wanted to try that out. Here's a repo with what I came up with while at CERN:
recid_unpublished = 1726790  # not published
recid_published = 1705857    # published
list_of_recids = [recid_published, recid_unpublished]

print(yaml.dump(summarize_records(list_of_recids), default_flow_style=False))
which yields
publications:
- arxiv_eprint: '1811.12113'
  authors: Aaboud, Morad; Aad, Georges; Abbott, Brad; Abdinov, Ovsat; Abeloos, Baptiste;
    et. al.
  collaboration: ATLAS
  creation_date: '2018-11-30'
  doi: 10.1007/JHEP04(2019)046
  journal_title: JHEP
  journal_volume: '04'
  journal_year: 2019
  page_start: '046'
  recid: 1705857
  title: Measurements of fiducial and differential cross-sections of $t\bar{t}$ production
    with additional heavy-flavour jets in proton-proton collisions at $\sqrt{s}$ =
    13 TeV with the ATLAS detector
  url: https://arxiv.org/abs/1811.12113
- arxiv_eprint: '1903.10563'
  authors: Carleo, Giuseppe; Cirac, Ignacio; Cranmer, Kyle; Daudet, Laurent; Schuld,
    Maria; et. al.
  creation_date: '2019-03-27'
  recid: 1726790
  title: Machine learning and the physical sciences
  url: https://arxiv.org/abs/1903.10563
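Since the webpage is built with Jekyll, this output can go straight into the site's data directory. A minimal sketch of that step (the _data/publications.yml path is an assumption about the site layout, not something from the notebook):

# write the summary where Jekyll picks up data files (path is an assumption)
with open('_data/publications.yml', 'w') as f:
    yaml.dump(summarize_records(list_of_recids), f, default_flow_style=False)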
Here's an excerpt from the notebook.
Convert INSPIRE IDs into short Python dictionaries for the website
by Kyle Cranmer, April 14, 2019
import requests
import json

# if you are running on Binder, you will need to uncomment the next line and execute it
# !pip install pyyaml
import yaml
recid_unpublished = 1726790  # not published
recid_published = 1705857    # published

recid = recid_unpublished
# the beta API serves each record's metadata as JSON at this endpoint
url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
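Before summarizing, it helps to peek at what the API actually returns. A quick sketch (the top-level 'metadata' key is what the function below relies on; the exact set of fields varies by record):

# fetch one record and list the metadata fields the API exposes
r = requests.get(url)
data = r.json()['metadata']
print(sorted(data.keys()))  # e.g. 'titles', 'authors', 'arxiv_eprints', ...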
def summarize_record(recid):
    """Fetch one INSPIRE record and condense it into a small flat dict."""
    url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
    max_authors = 5
    r = requests.get(url)
    data = r.json()['metadata']
    mini_dict = {'recid': recid}
    mini_dict.update({'title': data['titles'][0]['title']})
    # truncate long author lists (e.g. big collaborations) after max_authors names;
    # join so 'authors' is always a single string in the YAML
    if len(data['authors']) > max_authors:
        mini_dict.update({'authors': "; ".join([a['full_name'] for a in data['authors'][:max_authors]] + ['et. al.'])})
    else:
        mini_dict.update({'authors': "; ".join([a['full_name'] for a in data['authors']])})
    if 'collaborations' in data:
        mini_dict.update({'collaboration': data['collaborations'][0]['value']})
    mini_dict.update({'arxiv_eprint': data['arxiv_eprints'][0]['value']})
    mini_dict.update({'url': 'https://arxiv.org/abs/' + data['arxiv_eprints'][0]['value']})
    mini_dict.update({'creation_date': data['legacy_creation_date']})
    # journal info is only present once the paper is published
    if 'publication_info' in data:
        mini_dict.update({'journal_title': data['publication_info'][0]['journal_title']})
        mini_dict.update({'journal_volume': data['publication_info'][0]['journal_volume']})
        mini_dict.update({'page_start': data['publication_info'][0]['page_start']})
        mini_dict.update({'journal_year': data['publication_info'][0]['year']})
    if 'dois' in data:
        mini_dict.update({'doi': data['dois'][0]['value']})
    return mini_dict
def summarize_records(recids):
    """Summarize a list of record IDs under a top-level 'publications' key."""
    return {'publications': [summarize_record(recid) for recid in recids]}
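Note that summarize_record assumes every record has arXiv eprints and a legacy creation date, which holds for the two records above but not for every INSPIRE entry. A hypothetical, more defensive variant (not in the original notebook) could copy only the fields that are present:

def summarize_record_safe(recid):
    # hypothetical variant: skip optional fields instead of raising KeyError
    url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
    data = requests.get(url).json()['metadata']
    mini_dict = {'recid': recid, 'title': data['titles'][0]['title']}
    if 'arxiv_eprints' in data:
        eprint = data['arxiv_eprints'][0]['value']
        mini_dict.update({'arxiv_eprint': eprint, 'url': 'https://arxiv.org/abs/' + eprint})
    if 'legacy_creation_date' in data:
        mini_dict.update({'creation_date': data['legacy_creation_date']})
    return mini_dict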
Example summarizing 2 individual records
summarize_record(recid_published)
Hacking in the sun on the @CERN patio while jet lagged.

- Input: @inspirehep record IDs
- Output: yaml for @iris_hep webpage
- Bonus: try it on Binder thanks to @mybinderteam
- code: https://t.co/9FKalW43oM pic.twitter.com/AxatW6YiJP

— Kyle Cranmer (@KyleCranmer) April 14, 2019