For the IRIS-HEP organization we need to collect publications and update our webpage regularly. I also have to do this for our group webpage, my CV, etc., and it's annoying to copy/paste all of this information. The http://inspirehep.net website lets you export BibTeX, LaTeX, and plain text for individual papers or for a set of papers matching a search result, but that's still not very convenient for web content, where Markdown is common. The IRIS-HEP webpage is also based on Jekyll, which can parse YAML files for making publication pages. So I wanted a tool that could ingest a bunch of paper identifiers and output YAML.
The new INSPIRE beta has a more modern API, so I wanted to try that out. Here's a repo with what I came up with while at CERN:
recid_unpublished = 1726790  # not published
recid_published = 1705857    # published
list_of_recids = [recid_published, recid_unpublished]

print(yaml.dump(summarize_records(list_of_recids), default_flow_style=False))
which yields
publications:
- arxiv_eprint: '1811.12113'
  authors: Aaboud, Morad; Aad, Georges; Abbott, Brad; Abdinov, Ovsat; Abeloos, Baptiste;
    et. al.
  collaboration: ATLAS
  creation_date: '2018-11-30'
  doi: 10.1007/JHEP04(2019)046
  journal_title: JHEP
  journal_volume: '04'
  journal_year: 2019
  page_start: '046'
  recid: 1705857
  title: Measurements of fiducial and differential cross-sections of $t\bar{t}$ production
    with additional heavy-flavour jets in proton-proton collisions at $\sqrt{s}$ =
    13 TeV with the ATLAS detector
  url: https://arxiv.org/abs/1811.12113
- arxiv_eprint: '1903.10563'
  authors: Carleo, Giuseppe; Cirac, Ignacio; Cranmer, Kyle; Daudet, Laurent; Schuld,
    Maria; et. al.
  creation_date: '2019-03-27'
  recid: 1726790
  title: Machine learning and the physical sciences
  url: https://arxiv.org/abs/1903.10563
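Since the webpage is built with Jekyll, this output can go straight into the site's data directory. A minimal sketch of that step (the _data/publications.yml path is an assumption about the site layout, not something from the notebook):

# write the summary where Jekyll picks up data files (path is an assumption)
with open('_data/publications.yml', 'w') as f:
    yaml.dump(summarize_records(list_of_recids), f, default_flow_style=False)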
Here's an excerpt from the notebook.
Convert INSPIRE IDs into short Python dictionaries for the website
by Kyle Cranmer, April 14, 2019
import requests
import json

# if you are running on Binder, you will need to uncomment the next line and execute it
# !pip install pyyaml
import yaml
recid_unpublished = 1726790  # not published
recid_published = 1705857    # published

recid = recid_unpublished
# the beta API serves each record's metadata as JSON at this endpoint
url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
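Before summarizing, it helps to peek at what the API actually returns. A quick sketch (the top-level 'metadata' key is what the function below relies on; the exact set of fields varies by record):

# fetch one record and list the metadata fields the API exposes
r = requests.get(url)
data = r.json()['metadata']
print(sorted(data.keys()))  # e.g. 'titles', 'authors', 'arxiv_eprints', ...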
def summarize_record(recid):
    """Fetch one INSPIRE record and condense it into a small flat dict."""
    url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
    max_authors = 5
    r = requests.get(url)
    data = r.json()['metadata']
    mini_dict = {'recid': recid}
    mini_dict.update({'title': data['titles'][0]['title']})
    # truncate long author lists (e.g. big collaborations) after max_authors names;
    # join so 'authors' is always a single string in the YAML
    if len(data['authors']) > max_authors:
        mini_dict.update({'authors': "; ".join([a['full_name'] for a in data['authors'][:max_authors]] + ['et. al.'])})
    else:
        mini_dict.update({'authors': "; ".join([a['full_name'] for a in data['authors']])})
    if 'collaborations' in data:
        mini_dict.update({'collaboration': data['collaborations'][0]['value']})
    mini_dict.update({'arxiv_eprint': data['arxiv_eprints'][0]['value']})
    mini_dict.update({'url': 'https://arxiv.org/abs/' + data['arxiv_eprints'][0]['value']})
    mini_dict.update({'creation_date': data['legacy_creation_date']})
    # journal info is only present once the paper is published
    if 'publication_info' in data:
        mini_dict.update({'journal_title': data['publication_info'][0]['journal_title']})
        mini_dict.update({'journal_volume': data['publication_info'][0]['journal_volume']})
        mini_dict.update({'page_start': data['publication_info'][0]['page_start']})
        mini_dict.update({'journal_year': data['publication_info'][0]['year']})
    if 'dois' in data:
        mini_dict.update({'doi': data['dois'][0]['value']})
    return mini_dict
def summarize_records(recids):
    """Summarize a list of record IDs under a top-level 'publications' key."""
    return {'publications': [summarize_record(recid) for recid in recids]}
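Note that summarize_record assumes every record has arXiv eprints and a legacy creation date, which holds for the two records above but not for every INSPIRE entry. A hypothetical, more defensive variant (not in the original notebook) could copy only the fields that are present:

def summarize_record_safe(recid):
    # hypothetical variant: skip optional fields instead of raising KeyError
    url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
    data = requests.get(url).json()['metadata']
    mini_dict = {'recid': recid, 'title': data['titles'][0]['title']}
    if 'arxiv_eprints' in data:
        eprint = data['arxiv_eprints'][0]['value']
        mini_dict.update({'arxiv_eprint': eprint, 'url': 'https://arxiv.org/abs/' + eprint})
    if 'legacy_creation_date' in data:
        mini_dict.update({'creation_date': data['legacy_creation_date']})
    return mini_dict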
Example summarizing 2 individual records
summarize_record(recid_published)
Hacking in the sun on the @CERN patio while jet lagged.

- Input: @inspirehep record IDs
- Output: yaml for @iris_hep webpage
- Bonus: try it on Binder thanks to @mybinderteam
- code: https://t.co/9FKalW43oM pic.twitter.com/AxatW6YiJP

— Kyle Cranmer (@KyleCranmer) April 14, 2019