libbmc package

Submodules

libbmc.bibtex module

This file contains functions to read and edit BibTeX files.

TODO: Unittests

libbmc.bibtex.append(filename, data)

Append some entries to a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file to edit.
  • data – A bibtexparser.BibDatabase object.
libbmc.bibtex.bibdatabase2bibtex(data)

Convert a BibDatabase object to a BibTeX string.

Parameters:data – A bibtexparser.BibDatabase object.
Returns:A formatted BibTeX string.
libbmc.bibtex.delete(filename, identifier)

Delete an entry in a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file to edit.
  • identifier – The id of the entry to delete, in the BibTeX file.
libbmc.bibtex.dict2bibtex(data)

Convert a single BibTeX entry dict to a BibTeX string.

Parameters:data – A dict representing a BibTeX entry, such as those found in bibtexparser.BibDatabase.entries.
Returns:A formatted BibTeX string.
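
As a rough illustration of what converting an entry dict to a BibTeX string involves, here is a minimal sketch. It is not libbmc's implementation: the function name entry_to_bibtex is hypothetical, and it assumes the bibtexparser convention of storing the entry type and citation key under the "ENTRYTYPE" and "ID" keys.

```python
def entry_to_bibtex(entry):
    """Serialize one entry dict to a BibTeX string (illustrative only)."""
    # Assumption: entry type and citation key live under these two keys.
    entry_type = entry.get("ENTRYTYPE", "misc")
    key = entry.get("ID", "unknown")
    # Every other key is treated as a regular field, emitted in sorted order.
    fields = sorted(
        (name, value) for name, value in entry.items()
        if name not in ("ENTRYTYPE", "ID")
    )
    body = ",\n".join("  {} = {{{}}}".format(name, value) for name, value in fields)
    return "@{}{{{},\n{}\n}}".format(entry_type, key, body)


example = {"ENTRYTYPE": "article", "ID": "Verney_2015",
           "year": "2015", "title": "Hybridization"}
print(entry_to_bibtex(example))
```

The real function may order and indent fields differently; only the overall shape of the output is meant to carry over.
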
libbmc.bibtex.edit(filename, identifier, data)

Update an entry in a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file to edit.
  • identifier – The id of the entry to update, in the BibTeX file.
  • data – A dict associating fields and updated values. Fields present in the BibTeX file but not in this dict will be kept as is.
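
The update semantics described above (fields missing from data are kept as is) amount to a dict merge. The sketch below works on a plain list of entry dicts rather than on a file; update_entry is a hypothetical helper, and the "ID" key is assumed to hold the entry identifier as in bibtexparser.

```python
def update_entry(entries, identifier, updates):
    """Return a new list where the entry with the given ID has updated fields."""
    result = []
    for entry in entries:
        if entry.get("ID") == identifier:
            # Keys in `updates` override existing ones; other fields are kept.
            entry = {**entry, **updates}
        result.append(entry)
    return result


entries = [{"ID": "a", "year": "2014", "title": "Old"}, {"ID": "b", "year": "2010"}]
updated = update_entry(entries, "a", {"year": "2015"})
print(updated[0])
```
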
libbmc.bibtex.get(filename, ignore_fields=None)

Get all entries from a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file.
  • ignore_fields – An optional list of fields to strip from the BibTeX file.
Returns:

A bibtexparser.BibDatabase object representing the fetched entries.

libbmc.bibtex.get_entry(filename, identifier, ignore_fields=None)

Get an entry from a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file.
  • identifier – The id of the entry to fetch, in the BibTeX file.
  • ignore_fields – An optional list of fields to strip from the BibTeX file.
Returns:

A bibtexparser.BibDatabase object representing the fetched entry. None if entry was not found.

libbmc.bibtex.get_entry_by_filter(filename, filter_function, ignore_fields=None)

Get an entry from a BibTeX file using a filter function.

Note

Returns the first matching entry.

Parameters:
  • filename – The name of the BibTeX file.
  • filter_function – A function returning True if the entry should be included, False otherwise.
  • ignore_fields – An optional list of fields to strip from the BibTeX file.
Returns:

A bibtexparser.BibDatabase object representing the first matching entry. None if entry was not found.
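
The "first matching entry, or None" behaviour can be sketched on a plain list of entry dicts. The helper first_match is hypothetical and only illustrates the selection logic, not how libbmc reads the file or builds the returned object.

```python
def first_match(entries, filter_function):
    """Return the first entry accepted by filter_function, or None."""
    return next((entry for entry in entries if filter_function(entry)), None)


entries = [
    {"ID": "a", "year": "2010"},
    {"ID": "b", "year": "2015"},
    {"ID": "c", "year": "2015"},
]
print(first_match(entries, lambda entry: entry["year"] == "2015"))
```
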

libbmc.bibtex.replace(filename, identifier, data)

Replace an entry in a BibTeX file.

Parameters:
  • filename – The name of the BibTeX file to edit.
  • identifier – The id of the entry to replace, in the BibTeX file.
  • data – A bibtexparser.BibDatabase object containing a single entry.
libbmc.bibtex.to_filename(data, mask='{first}_{last}-{journal}-{year}{arxiv_version}', extra_formatters=None)

Convert a BibTeX entry to a formatted filename according to a given mask.

Note

Available formatters out of the box are:
  • journal
  • title
  • year
  • first for the first author
  • last for the last author
  • authors for the list of authors
  • arxiv_version (discarded if no arXiv version in the BibTeX)

The filename is slugified after applying the mask.

Parameters:
  • data – A bibtexparser.BibDatabase object representing a BibTeX entry, such as the one returned by bibtexparser.
  • mask – A Python format string.
  • extra_formatters – A dict mapping formatter names (usable in the mask) to the functions that perform the formatting.
Returns:

A formatted filename.
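
To make the mask mechanism concrete, here is a small sketch of mask-based formatting with pluggable formatters. It is not libbmc's code: format_filename and split_authors are hypothetical names, the entry is a plain dict of fields, authors are kept as full names, and slugification is replaced by a crude whitespace-to-underscore substitution.

```python
import re
import string


def split_authors(fields):
    """Split a BibTeX author field on the ' and ' separator."""
    return [a.strip() for a in fields.get("author", "").split(" and ") if a.strip()]


def format_filename(fields, mask="{first}_{last}-{journal}-{year}", extra_formatters=None):
    """Fill the mask using built-in and user-supplied formatters."""
    formatters = {
        "journal": lambda f: f.get("journal", ""),
        "title": lambda f: f.get("title", ""),
        "year": lambda f: str(f.get("year", "")),
        "first": lambda f: (split_authors(f) or [""])[0],
        "last": lambda f: (split_authors(f) or [""])[-1],
        "authors": lambda f: ", ".join(split_authors(f)),
    }
    formatters.update(extra_formatters or {})
    # Only evaluate the formatters that the mask actually references.
    names = {name for _, name, _, _ in string.Formatter().parse(mask) if name}
    values = {name: formatters[name](fields) for name in names}
    raw = mask.format(**values)
    # Crude stand-in for slugification: replace whitespace with underscores.
    return re.sub(r"\s+", "_", raw.strip())


fields = {
    "author": "Lucas Verney and Lev Pitaevskii and Sandro Stringari",
    "journal": "EPL",
    "year": "2015",
    "title": "abc",
}
print(format_filename(fields))
print(format_filename(fields, mask="{year}-{shout}",
                      extra_formatters={"shout": lambda f: f["title"].upper()}))
```

An extra formatter is simply a function of the entry; a name in the mask with no registered formatter raises KeyError in this sketch.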

libbmc.bibtex.write(filename, data)

Create a new BibTeX file.

Parameters:
  • filename – The name of the BibTeX file to write.
  • data – A bibtexparser.BibDatabase object.

libbmc.doi module

This file contains all the DOI-related functions.

libbmc.doi.extract_from_text(text)

Extract canonical DOIs from a text.

Parameters:text – The text to extract DOIs from.
Returns:A list of found DOIs.
>>> sorted(extract_from_text('10.1209/0295-5075/111/40005 10.1016.12.31/nature.S0735-1097(98)2000/12/31/34:7-7 10.1002/(SICI)1522-2594(199911)42:5<952::AID-MRM16>3.0.CO;2-S 10.1007/978-3-642-28108-2_19 10.1007.10/978-3-642-28108-2_19 10.1016/S0735-1097(98)00347-7 10.1579/0044-7447(2006)35\[89:RDUICP\]2.0.CO;2 <geo coords="10.4515260,51.1656910"></geo>'))
['10.1002/(SICI)1522-2594(199911)42:5<952::AID-MRM16>3.0.CO;2-S', '10.1007.10/978-3-642-28108-2_19', '10.1007/978-3-642-28108-2_19', '10.1016.12.31/nature.S0735-1097(98)2000/12/31/34:7-7', '10.1016/S0735-1097(98)00347-7', '10.1209/0295-5075/111/40005', '10.1579/0044-7447(2006)35\\[89:RDUICP\\]2.0.CO;2']
libbmc.doi.get_bibtex(doi)

Get a BibTeX entry for a given DOI.

Parameters:doi – The canonical DOI to get BibTeX from.
Returns:A BibTeX string or None.
>>> get_bibtex('10.1209/0295-5075/111/40005')
'@article{Verney_2015,\n\tdoi = {10.1209/0295-5075/111/40005},\n\turl = {http://dx.doi.org/10.1209/0295-5075/111/40005},\n\tyear = 2015,\n\tmonth = {aug},\n\tpublisher = {{IOP} Publishing},\n\tvolume = {111},\n\tnumber = {4},\n\tpages = {40005},\n\tauthor = {Lucas Verney and Lev Pitaevskii and Sandro Stringari},\n\ttitle = {Hybridization of first and second sound in a weakly interacting Bose gas},\n\tjournal = {{EPL}}\n}'
libbmc.doi.get_linked_version(doi)

Get the original link behind the DOI.

Parameters:doi – A canonical DOI.
Returns:The canonical URL behind the DOI, or None.
>>> get_linked_version('10.1209/0295-5075/111/40005')
'http://stacks.iop.org/0295-5075/111/i=4/a=40005?key=crossref.9ad851948a976ecdf216d4929b0b6f01'
libbmc.doi.get_oa_policy(doi)

Get OA policy for a given DOI.

Note

Uses beta.dissem.in API.

Parameters:doi – A canonical DOI.
Returns:The open access policy for the associated publication, or None if unknown.
>>> tmp = get_oa_policy('10.1209/0295-5075/111/40005'); (tmp["published"], tmp["preprint"], tmp["postprint"], tmp["romeo_id"])
('can', 'can', 'can', '1896')
>>> get_oa_policy('10.1215/9780822387268') is None
True
libbmc.doi.get_oa_version(doi)

Get an OA version for a given DOI.

Note

Uses beta.dissem.in API.

Parameters:doi – A canonical DOI.
Returns:The URL of the OA version of the given DOI, or None.
>>> get_oa_version('10.1209/0295-5075/111/40005')
'http://arxiv.org/abs/1506.06690'
libbmc.doi.is_valid(doi)

Check that a given DOI is a valid canonical DOI.

Parameters:doi – The DOI to be checked.
Returns:Boolean indicating whether the DOI is valid or not.
>>> is_valid('10.1209/0295-5075/111/40005')
True
>>> is_valid('10.1016.12.31/nature.S0735-1097(98)2000/12/31/34:7-7')
True
>>> is_valid('10.1002/(SICI)1522-2594(199911)42:5<952::AID-MRM16>3.0.CO;2-S')
True
>>> is_valid('10.1007/978-3-642-28108-2_19')
True
>>> is_valid('10.1007.10/978-3-642-28108-2_19')
True
>>> is_valid('10.1016/S0735-1097(98)00347-7')
True
>>> is_valid('10.1579/0044-7447(2006)35\[89:RDUICP\]2.0.CO;2')
True
>>> is_valid('<geo coords="10.4515260,51.1656910"></geo>')
False
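
For readers curious how such a check can work, the sketch below validates the general shape of a DOI (a "10." prefix, a registrant code of at least four digits with optional dotted sub-codes, a slash, then a non-empty suffix). The pattern is a simplified assumption chosen to agree with several of the examples above; it is not the expression libbmc uses, and looks_like_doi is a hypothetical name.

```python
import re

# Simplified, assumed pattern: prefix "10.", registrant code, "/", suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,}(?:\.\d+)*/\S+")


def looks_like_doi(candidate):
    """Return True if the whole string has the general shape of a DOI."""
    return DOI_PATTERN.fullmatch(candidate) is not None


print(looks_like_doi("10.1209/0295-5075/111/40005"))
print(looks_like_doi("10.1007.10/978-3-642-28108-2_19"))
print(looks_like_doi("not a doi"))
```
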
libbmc.doi.to_canonical(urls)

Convert a list of DOI URLs to a list of canonical DOIs.

Parameters:urls – A list of DOI URLs. Can also be a single DOI URL.
Returns:List of canonical DOIs (resp. a single value). None if an error occurred.
>>> to_canonical(['http://dx.doi.org/10.1209/0295-5075/111/40005'])
['10.1209/0295-5075/111/40005']
>>> to_canonical('http://dx.doi.org/10.1209/0295-5075/111/40005')
'10.1209/0295-5075/111/40005'
>>> to_canonical('aaaa') is None
True
>>> to_canonical(['aaaa']) is None
True
libbmc.doi.to_url(dois)

Convert a list of canonical DOIs to a list of DOI URLs.

Parameters:dois – List of canonical DOIs. Can also be a single canonical DOI.
Returns:A list of DOI URLs (resp. a single value).
>>> to_url(['10.1209/0295-5075/111/40005'])
['http://dx.doi.org/10.1209/0295-5075/111/40005']
>>> to_url('10.1209/0295-5075/111/40005')
'http://dx.doi.org/10.1209/0295-5075/111/40005'
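
The round trip between a canonical DOI and its URL can be sketched for a single value as follows. The helper names doi_to_url and url_to_doi are hypothetical, the resolver prefix is taken from the examples above, and accepting https and the bare doi.org host is an assumption of this sketch rather than documented libbmc behaviour.

```python
import re

DOI_URL_PREFIX = "http://dx.doi.org/"


def doi_to_url(doi):
    """Prepend the resolver prefix to a canonical DOI."""
    return DOI_URL_PREFIX + doi


def url_to_doi(url):
    """Strip the resolver prefix from a DOI URL, or return None if absent."""
    match = re.match(r"^https?://(?:dx\.)?doi\.org/(10\..+)$", url)
    return match.group(1) if match else None


url = doi_to_url("10.1209/0295-5075/111/40005")
print(url)
print(url_to_doi(url))
print(url_to_doi("aaaa"))
```
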

libbmc.fetcher module

This file contains functions to download papers locally, optionally using a proxy.

libbmc.fetcher.download(url, proxies=None)

Download a PDF or DJVU document from a URL, optionally using proxies.

Parameters:
  • url – The URL to the PDF/DJVU document to fetch.
  • proxies – An optional list of proxy strings. Proxies are tried sequentially. Include "" (empty string) in the list if you want to try direct fetching without any proxy.
Returns:A tuple of the raw content of the downloaded data and its associated content-type. Returns (None, None) if it was unable to download the document.
>>> download("http://arxiv.org/pdf/1312.4006.pdf") 

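The sequential proxy fallback described for libbmc.fetcher.download can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: download_with_fallback and fetch_via_proxy are hypothetical names, a missing proxy list is treated as "direct connection only", and the fetch function is injectable so the fallback logic can be exercised without network access.

```python
import urllib.request


def fetch_via_proxy(url, proxy):
    """Fetch url through one proxy; an empty string means a direct connection."""
    # An empty mapping tells ProxyHandler not to use any proxy at all.
    mapping = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
    with opener.open(url, timeout=30) as response:
        return response.read(), response.headers.get("Content-Type")


def download_with_fallback(url, proxies=None, fetch=fetch_via_proxy):
    """Try each proxy in order and return (content, content_type)."""
    for proxy in (proxies if proxies is not None else [""]):
        try:
            return fetch(url, proxy)
        except OSError:
            # urllib.error.URLError subclasses OSError; try the next proxy.
            continue
    return (None, None)
```

Returning (None, None) when every attempt fails mirrors the return value documented above.
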
libbmc.isbn module

This file contains all the ISBN-related functions.

libbmc.isbn.extract_from_text(text)

Extract ISBNs from a text.

Parameters:text – Some text.
Returns:A list of canonical ISBNs found in the text.
>>> extract_from_text("978-3-16-148410-0 9783161484100 9783161484100aa abcd 0136091814 0136091812 9780136091817 123456789X")
['9783161484100', '9783161484100', '9783161484100', '0136091814', '123456789X']
libbmc.isbn.from_doi(doi_identifier)

Make an ISBN out of the given DOI.

Note

See https://github.com/xlcnd/isbnlib#note. The returned ISBN may not be issued yet (it is a valid one, but does not necessarily correspond to an existing book).

Parameters:doi_identifier – A valid canonical DOI.
Returns:An ISBN string.
>>> from_doi('10.978.316/1484100')
'9783161484100'
libbmc.isbn.get_bibtex(isbn_identifier)

Get a BibTeX string for the given ISBN.

Parameters:isbn_identifier – ISBN to fetch BibTeX entry for.
Returns:A BibTeX string, or None if it could not be fetched.
>>> get_bibtex('9783161484100')
'@book{9783161484100,\n     title = {Berkeley, Oakland: Albany, Emeryville, Alameda, Kensington},\n    author = {Peekaboo Maps},\n      isbn = {9783161484100},\n      year = {2009},\n publisher = {Peek A Boo Maps}\n}'
libbmc.isbn.is_valid(isbn_id)

Check that a given string is a valid ISBN.

Parameters:isbn_id – The ISBN to be checked.
Returns:Boolean indicating whether the ISBN is valid or not.
>>> is_valid("978-3-16-148410-0")
True
>>> is_valid("9783161484100")
True
>>> is_valid("9783161484100aa")
False
>>> is_valid("abcd")
False
>>> is_valid("0136091814")
True
>>> is_valid("0136091812")
False
>>> is_valid("9780136091817")
False
>>> is_valid("123456789X")
True
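
The results above are consistent with the standard ISBN check-digit rules, which the following sketch implements directly: ISBN-10 uses weights 10 down to 1 modulo 11 (with X standing for 10 in the last position), and ISBN-13 uses alternating weights 1 and 3 modulo 10. The name isbn_checksum_ok is hypothetical, and libbmc may well delegate this check to a dedicated library instead.

```python
import re


def isbn_checksum_ok(candidate):
    """Check ISBN-10 or ISBN-13 check digits, ignoring hyphens and spaces."""
    chars = candidate.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10: weighted sum with weights 10..1 must be divisible by 11.
        values = [10 if c == "X" else int(c) for c in chars]
        return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0
    if re.fullmatch(r"[0-9]{13}", chars):
        # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars)) % 10 == 0
    return False


for sample in ["978-3-16-148410-0", "0136091812", "123456789X"]:
    print(sample, isbn_checksum_ok(sample))
```
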
libbmc.isbn.to_doi(isbn_identifier)

Make a DOI out of the given ISBN.

Note

See https://github.com/xlcnd/isbnlib#note. The returned DOI may not be issued yet.

Parameters:isbn_identifier – A valid ISBN string.
Returns:A DOI as string.
>>> to_doi('9783161484100')
'10.978.316/1484100'

libbmc.tools module

This file contains various utility functions.

libbmc.tools.batch(iterable, size)

Get items from a sequence a batch at a time.

Parameters:
  • iterable – An iterable to get batches from.
  • size – Size of the batches.
Returns:An iterator yielding successive batches of the given size; the last batch may be smaller.
>>> [list(i) for i in batch([1, 2, 3, 4, 5], 2)]
[[1, 2], [3, 4], [5]]
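
One common way to get this behaviour is to pull fixed-size slices from a single shared iterator. The sketch below uses itertools.islice; batch_sketch is a hypothetical name, and unlike the documented function it yields lists rather than lazy sub-iterators.

```python
from itertools import islice


def batch_sketch(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


print(list(batch_sketch([1, 2, 3, 4, 5], 2)))
```
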
libbmc.tools.clean_whitespaces(text)

Collapse runs of whitespace in a text into single spaces. Also removes leading and trailing whitespace.

Parameters:text – Text to remove multiple whitespaces from.
Returns:A cleaned text.
>>> clean_whitespaces("this  is    a text with    spaces")
'this is a text with spaces'
libbmc.tools.map_or_apply(function, param)

Map the function over param, or apply it directly, depending on whether param is a list or a single item.

Parameters:
  • function – The function to apply.
  • param – The parameter to feed the function with (list or item).
Returns:

The computed value or None.
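
The list-or-item dispatch can be sketched as below. This is an assumption about the general idea only: map_or_apply_sketch is a hypothetical name, and it does not reproduce whatever error handling leads the real function to return None.

```python
def map_or_apply_sketch(function, param):
    """Apply function to each item of a list, or to a single value."""
    if isinstance(param, list):
        return [function(item) for item in param]
    return function(param)


print(map_or_apply_sketch(str.upper, ["doi", "isbn"]))
print(map_or_apply_sketch(str.upper, "doi"))
```

This is the pattern that lets functions such as to_url accept either a list or a single value and return a result of the same kind.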

libbmc.tools.remove_duplicates(some_list)

Remove the duplicates from a list.

Parameters:some_list – List to remove duplicates from.
Returns:A list without duplicates.
>>> remove_duplicates([1, 2, 3, 1])
[1, 2, 3]
>>> remove_duplicates([1, 2, 1, 2])
[1, 2]
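
Both examples keep the first occurrence of each item in its original position. A minimal sketch of order-preserving deduplication follows; unique_in_order is a hypothetical name, and the sketch assumes the items are hashable, which libbmc's own function may not require.

```python
def unique_in_order(items):
    """Return the items with later duplicates removed, preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([1, 2, 3, 1]))
print(unique_in_order([1, 2, 1, 2]))
```
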
libbmc.tools.remove_urls(text)

Remove URLs from a given text (only removes http, https and naked-domain URLs).

Parameters:text – The text to remove URLs from.
Returns:The text without URLs.
>>> remove_urls("foobar http://example.com https://example.com foobar")
'foobar foobar'
libbmc.tools.replace_all(text, replace_dict)

Replace multiple strings in a text.

Note

Replacements are made successively, with no guarantee on the order in which they are made.

Parameters:
  • text – Text to replace in.
  • replace_dict – Dictionary mapping strings to replace with their substitution.
Returns:

Text after replacements.

>>> replace_all("foo bar foo thing", {"foo": "oof", "bar": "rab"})
'oof rab oof thing'
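
Why the order matters can be seen from a sketch of successive replacement, where each substitution runs on the output of the previous one. The name replace_all_sketch is hypothetical; the sketch applies replacements in dict iteration order, which is an assumption and not something the library promises.

```python
def replace_all_sketch(text, replace_dict):
    """Apply each replacement in turn to the result of the previous one."""
    for old, new in replace_dict.items():
        text = text.replace(old, new)
    return text


print(replace_all_sketch("foo bar foo thing", {"foo": "oof", "bar": "rab"}))
# Chained keys interact: "a" first becomes "b", which then becomes "c".
print(replace_all_sketch("a", {"a": "b", "b": "c"}))
```

When one replacement's output contains another replacement's key, the final text depends on the order, so such dictionaries are best avoided.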
libbmc.tools.slugify(value)

Normalize a string for use in filenames: remove non-alphanumeric characters and convert spaces to underscores. As the example below shows, case is preserved and characters without an ASCII equivalent are dropped.

Adapted from Django’s “django/template/defaultfilters.py”.

>>> slugify("El pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío, añoraba a su querido cachorro. ortez ce vieux whisky au juge blond qui fume sur son île intérieure, à Γαζέες καὶ μυρτιὲς δὲν θὰ βρῶ πιὰ στὸ χρυσαφὶ ξέφωτο いろはにほへとちりぬるを Pchnąć w tę łódź jeża lub ośm skrzyń fig กว่าบรรดาฝูงสัตว์เดรัจฉาน")
'El_pinguino_Wenceslao_hizo_kilometros_bajo_exhaustiva_lluvia_y_frio_anoraba_a_su_querido_cachorro_ortez_ce_vieux_whisky_au_juge_blond_qui_fume_sur_son_ile_interieure_a_Pchnac_w_te_odz_jeza_lub_osm_skrzyn_fig'
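
The behaviour visible in this example (accents stripped, untranslatable characters dropped, spaces turned into underscores) can be approximated with Unicode decomposition. The sketch below is an assumption about the approach, in the spirit of the Django filter, and slugify_sketch is a hypothetical name; it is not guaranteed to reproduce the library's output for every input.

```python
import re
import unicodedata


def slugify_sketch(value):
    """Transliterate to ASCII, drop punctuation, join words with underscores."""
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with "ignore" then drops the marks and anything else
    # that has no ASCII form.
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text).strip()
    return re.sub(r"[-\s]+", "_", cleaned)


print(slugify_sketch("El pingüino, añoraba"))
print(slugify_sketch("łódź"))
```

Letters such as "ł" have no Unicode decomposition, so they disappear entirely, which matches the "odz" fragment in the output above.
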

Module contents

libbmc

libbmc is a generic Python library for managing bibliographies and working with scientific papers.