Tips

Migration Guide from 0.x to 1.x

The upgrade from scopus 0.x to 1.x saw many changes in scopus’ internal architecture, but also in four classes (see change log): ScopusAbstract(), ScopusAffiliation(), ScopusAuthor() and ScopusSearch().

To avoid too many issues resulting from missing backward-compatibility, new classes were introduced to gradually replace other ones: AbstractRetrieval() (replacing ScopusAbstract()), AuthorRetrieval() (replacing ScopusAuthor()) and ContentAffiliationRetrieval() (replacing ScopusAffiliation()). The corresponding old classes will stay until scopus 2.x but their maintenance has been suspended. Cached files that were downloaded with the old classes are not usable by the new classes.

ScopusSearch() had to be revamped completely; code that uses ScopusSearch() has to be updated, but not significantly.

Guiding principles

The change to scopus 1.x was guided by five principles:

1. Use json rather than xml for the cached files to reduce overhead and lower maintenance efforts
2. Align class names, script names, attribute names and names of folders with the names the Scopus API uses
3. Use properties to return a high share of information provided by Scopus, and get functions to improve user experience
4. Allow users to set and change configuration via a configuration file
5. Return namedtuples when Scopus provides combined information to increase interoperability with other python modules

How to update code

Class AbstractRetrieval() replaces ScopusAbstract(). This class has seen the most changes. The following attributes have been renamed but their return value stays the same (so that simply renaming them will suffice): citationLanguage becomes language, citationType becomes srctype, citingby_url becomes citingby_link, scopus_url becomes scopus_link. Some attributes are now methods: bibtex becomes get_bibtex(), html becomes get_html(), ris becomes get_ris() and latex becomes get_latex(). Properties affiliations (new: affiliation), subjectAreas (new: subject_areas), authkeywords and authors are entirely different now: they return namedtuples. Please see the examples for how to use them. Property nauthors has been removed; use len(AbstractRetrieval(<eid>).authors) instead. Finally, method get_corresponding_author_info() has been removed, as Scopus does not provide this information any more.
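The pure renames and the attributes that turned into get functions lend themselves to a mechanical rewrite of existing scripts. The following is a minimal sketch of such a rewrite, not part of scopus itself: the function name migrate_attribute_access and the two dictionaries are hypothetical, and the mapping is copied from the paragraph above. It is a blind textual substitution, so it may also touch unrelated attributes with the same name (for example .html on some other object).

```python
import re

# Renames whose return value is unchanged (taken from the paragraph above).
SIMPLE_RENAMES = {
    "citationLanguage": "language",
    "citationType": "srctype",
    "citingby_url": "citingby_link",
    "scopus_url": "scopus_link",
}

# Former attributes that now have to be called.
NOW_METHODS = {
    "bibtex": "get_bibtex()",
    "html": "get_html()",
    "ris": "get_ris()",
    "latex": "get_latex()",
}


def migrate_attribute_access(source):
    """Rewrite `.old_name` attribute accesses in source text to 1.x names.

    Hypothetical helper: it matches on text only and knows nothing about
    the type of the object in front of the dot.
    """
    replacements = {**SIMPLE_RENAMES, **NOW_METHODS}
    names = "|".join(re.escape(name) for name in replacements)
    # Match ".name" as a whole word that is not already followed by "(".
    pattern = re.compile(r"\.(" + names + r")\b(?!\s*\()")
    return pattern.sub(lambda m: "." + replacements[m.group(1)], source)
```

Review the resulting diff by hand. Properties whose return value changed (affiliation, subject_areas, authkeywords, authors) are deliberately left out, because renaming alone does not make old code work for them.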

Class AuthorRetrieval() replaces ScopusAuthor(). The following properties have been renamed but their value stays the same: author_id becomes identifier, coauthor_url becomes coauthor_link, firstname becomes given_name, hindex becomes h_index, lastname becomes surname, name becomes indexed_name, ncited_by becomes cited_by_count, ncoauthors becomes coauthor_count, ndocuments becomes document_count. Property current_affiliation has been renamed to affiliation_current but the return value is now the Scopus ID of the affiliation. Property publication_history has been renamed to journal_history and returns a list of namedtuples rather than a list of tuples. Property affiliation_history now returns a list of Scopus IDs instead of a list of ScopusAffiliation() objects. Property subject_areas now returns a list of namedtuples instead of a list of tuples.
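What the switch from tuples to namedtuples means for calling code can be illustrated with the standard library alone. The sketch below uses a made-up record type: the name Area, its field names and the sample values are assumptions for illustration and are not the fields scopus returns. On a real result, the _fields attribute of an element lists the actual names.

```python
from collections import namedtuple

# Hypothetical record type: the field names are made up for illustration.
Area = namedtuple("Area", ["area", "count"])

# Made-up sample values standing in for a property's return value.
areas = [Area(area="Economics", count=12), Area(area="Mathematics", count=3)]

first = areas[0]
by_position = first[0]      # positional access works as with a plain tuple
by_name = first.area        # access by field name is new with namedtuples
name, count = first         # unpacking still works
as_dict = first._asdict()   # dict form, convenient for other modules
```

Whether the positions match those of the 0.x tuples is not stated here, so check the field order before relying on index-based access in migrated code.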

Class ContentAffiliationRetrieval() replaces ScopusAffiliation(). It will suffice to replace the class name in your scripts and rename the following attributes: nauthors becomes author_count, ndocuments becomes document_count, name becomes affiliation_name, org_url becomes org_URL, api_url becomes self_link, scopus_id becomes identifier.
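If many scripts still use the old attribute names, a thin wrapper can bridge the gap while they are being updated. This is only a sketch under assumptions: the class LegacyAffiliationNames is hypothetical and not provided by scopus, and a SimpleNamespace with made-up values stands in for a real ContentAffiliationRetrieval() object so that the example runs without network access.

```python
from types import SimpleNamespace


class LegacyAffiliationNames:
    """Expose 0.x attribute names on an object that uses 1.x names.

    Hypothetical compatibility shim; the mapping is copied from the
    paragraph above.
    """

    _RENAMES = {
        "nauthors": "author_count",
        "ndocuments": "document_count",
        "name": "affiliation_name",
        "org_url": "org_URL",
        "api_url": "self_link",
        "scopus_id": "identifier",
    }

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so `_wrapped` and
        # `_RENAMES` themselves are resolved without recursion.
        return getattr(self._wrapped, self._RENAMES.get(attr, attr))


# Stand-in for a retrieved affiliation; all values are made up.
new_style = SimpleNamespace(author_count=10, affiliation_name="Example University")
legacy = LegacyAffiliationNames(new_style)
```

Both legacy.nauthors and legacy.author_count resolve to the same value, so old and new names can coexist during the transition. Renaming the attributes in the scripts remains the cleaner long-term fix.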

Class ScopusSearch() remains but was revamped. The search results are now cached under a hex-ed filename to allow for complex queries. Files are now saved in a different folder (by default). results is now the main property, returning a list of namedtuples containing all useful information regarding the search results. For convenience, get_eids() returns just the list of EIDs of the articles. Property EIDS returns the same list but will be removed in a future release.
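Why a hex-ed filename helps with complex queries: a query may contain parentheses, quotes, spaces or slashes that are awkward or invalid in filenames, whereas a hexadecimal digest is always short and filesystem-safe. The sketch below assumes an MD5 digest of the UTF-8 encoded query; how scopus derives the name is not stated here, and the function name cache_path is hypothetical.

```python
import hashlib
from pathlib import Path


def cache_path(query, cache_dir):
    """Return a filesystem-safe cache path for an arbitrary search query.

    Assumption: the filename is the MD5 hex digest of the query string.
    The library's actual scheme may differ.
    """
    digest = hashlib.md5(query.encode("utf8")).hexdigest()
    return Path(cache_dir) / digest
```

The same query always maps to the same file, while any change to the query string, even in whitespace, yields a different file under this scheme.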

Updates

Scopus is a living database with changes happening constantly. These are not just additions of new items (Articles, Books, …) as they are published or updated citation counts, but also backfills of existing sources and corrections. Corrections include changes of titles, names or abstracts, mergers of duplicate authors, affiliations or even research items. Mergers affect multiple entities: for example, mergers of authors affect both the authors and the articles of the duplicate.

For these reasons, update your cached files regularly. Implement cross-checks to verify that an abstract is also listed as a publication in the author profile.
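Both recommendations can be sketched with the standard library. The helpers below are hypothetical and not part of scopus: stale_files lists cached files older than a chosen age (the 30-day default is an arbitrary assumption), and missing_from_profile compares EIDs you retrieved as abstracts against the EIDs listed in an author profile. How the two EID lists are obtained is left to your own code.

```python
import os
import time


def stale_files(cache_dir, max_age_days=30, now=None):
    """Return names of files in cache_dir not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(cache_dir):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            stale.append(entry.name)
    return sorted(stale)


def missing_from_profile(abstract_eids, profile_eids):
    """Return EIDs retrieved as abstracts but absent from the author profile."""
    return sorted(set(abstract_eids) - set(profile_eids))
```

Files reported by stale_files are candidates for a fresh download. A non-empty result from missing_from_profile hints at a merger or correction that happened after one of the two sources was cached.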

Corrections in the Scopus database can be reported here.

Error messages

Since scopus 0.2.0, an exception is raised when the download status is not ok. This is to prevent faulty information (i.e. the error status and message) being saved as a cached file.

The Scopus API returns a number of errors, upon which the current scopus run is interrupted and the error is printed to the screen.

Here are common exception classes, status lines and possible causes:

requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
The entity you are looking for was not properly merged with another entity in the sense that it is not forwarding. Happens rarely when Scopus Author profiles are merged. May also occur less often with Abstract EIDs and Affiliation IDs.
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url
Usually an invalid search query, such as a missing parenthesis. Verify that your query works in Advanced Search.
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url
Either the provided key is not correct, in which case you should change it in ~/.scopus/my_scopus.py, or you are outside the network that provides you access to the Scopus database (e.g. your university network). Remember that you need both to access Scopus.
requests.exceptions.HTTPError: 404 Client Error: Not Found for url
The entity you are looking for does not exist. Check that your identifier is still pointing to the item you are looking for.
requests.exceptions.HTTPError: 421 Quota Exceeded
Your provided API key’s weekly allowance of 5000 requests (for standard views) is depleted. Wait a week or change the key in ~/.scopus/my_scopus.py.
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url
Formally, the server does not respond, for various reasons. A common reason in searches is that you use a fieldname that does not exist. Verify that your query works in Advanced Search.
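In a long-running script it can help to translate a failed download into one of the hints above instead of only printing the raw error. The following is a minimal sketch: explain_status and STATUS_HINTS are hypothetical names, and the status codes and causes are condensed from the list above exactly as given there.

```python
# Hints condensed from the list above, keyed by the status codes as listed.
STATUS_HINTS = {
    400: "Bad Request: usually an invalid search query, e.g. a missing parenthesis.",
    401: "Unauthorized: wrong API key, or outside the network that grants access.",
    404: "Not Found: the entity does not exist; check the identifier.",
    421: "Quota Exceeded: the key's weekly allowance is depleted.",
    500: "Internal Server Error: in searches, often a field name that does not exist.",
}


def explain_status(status_code):
    """Return a troubleshooting hint for a status code, with a generic fallback."""
    fallback = f"Unexpected status {status_code}; no hint available."
    return STATUS_HINTS.get(status_code, fallback)
```

When catching requests.exceptions.HTTPError as exc, the numeric code is typically available as exc.response.status_code and can be passed to explain_status. TooManyRedirects carries no status code and needs its own except clause.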

Affiliations

Scopus knows two types of affiliations: Org profiles and Non-Org profiles.

Org profiles are those entities that perform or sponsor research, such as a university, research institute, or government organization, which leads to the origination of documents by its members. Affiliations that are org profiles (OrgID) according to Scopus start with a 6 (6XXXXXXX). Scopus strives to have precise information about the institution, such as type and address.

Non-Org profiles correspond to automatically clustered profiles. In theory, Non-Org profiles should correspond to research networks and virtual institutes, as they neither have a type nor an address. Affiliations that are Non-Org profiles start with a 1 (1XXXXXXXX). Often these are duplicates of Org profiles, for which a merger should be requested here.
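The leading-digit rule makes it easy to flag Non-Org profiles in a list of affiliation IDs, for instance to look for duplicates of Org profiles. The helper below is a hypothetical sketch that relies only on the first digit as described above; it does not validate the length of the ID or query Scopus.

```python
def affiliation_profile_type(affiliation_id):
    """Classify an affiliation ID as 'org', 'non-org' or 'unknown'.

    Uses only the leading digit: 6 for Org profiles, 1 for Non-Org profiles.
    """
    text = str(affiliation_id).strip()
    if not text.isdigit():
        return "unknown"
    if text.startswith("6"):
        return "org"
    if text.startswith("1"):
        return "non-org"
    return "unknown"
```

The function accepts both strings and integers, since IDs often arrive as either depending on how they were read.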