CitationXpert


CitationXpert - Data Science for Citation Analysis

CitationXpert is an open-source data science tool for analyzing publication citations from Google Scholar.


 

Releases


CitationXpert 1.0.0 is available for download (Aug 2016).
The tool can also be fetched and tested from its GitHub repository. Please see the instructions below on how to use CitationXpert.

 

Documentation


Both CitationXpert and its documentation are constantly evolving. Because development is incremental, no separate documentation is kept for past releases. Instead, the documentation notes the version from which each feature is available.

 

 

CitationXpert is fully developed in Python.

1. Download the latest stable version (citationxpert-1.0.0.zip) and uncompress it:

$ unzip citationxpert-1.0.0.zip

2. Alternatively, clone the source code (unstable development version):

$ git clone https://github.com/bibxpert/citationxpert.git

3. The tool is ready to use once the release or the source code has been downloaded and the dependencies are installed.

CitationXpert does not require any additional Python modules to perform the analyses. However, to generate plots, you need to install the following Python modules:
$ pip install pygal
$ pip install pygal_maps_world

4. Once all dependencies are installed (if needed), go into the folder and execute the tool:

$ cd citationxpert
$ ./citationxpert --help

CitationXpert uses the scholar.py Python module to process Google Scholar requests. No additional installation or configuration is required for this package.
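
Before using the plotting options, it can be handy to confirm that the optional modules from step 3 are importable. The following is a minimal sketch, not part of CitationXpert: the helper name missing_modules is hypothetical, and it assumes each package is importable under the same name used in the pip commands above.

```python
import importlib.util

# Module names taken from the pip commands in step 3. Assumption: each
# package is importable under the same name it is installed with.
PLOT_MODULES = ["pygal", "pygal_maps_world"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PLOT_MODULES)
    if missing:
        print("Plotting unavailable; missing modules: " + ", ".join(missing))
    else:
        print("All plotting modules are installed.")
```

If any module is reported as missing, the analyses still run; only the plot generation is affected.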

 

CitationXpert provides several automated features for analyzing publication data. Before any analysis can be performed, a list of citations for a publication must be obtained from Google Scholar. CitationXpert features include:

 


1. Data Acquisition

CitationXpert performs its analyses over the list of citations of a publication. The -c (or --citations) option sends a request to Google Scholar to obtain all citations of a specific publication. Use it as follows:

$ ./citationxpert -c "Online Task Resource Consumption Prediction for Scientific Workflows"
$ ./citationxpert --citations="Online Task Resource Consumption Prediction for Scientific Workflows"

The output data can be redirected to a file by using the -o (or --output) option:

$ ./citationxpert -c "Online Task Resource Consumption Prediction for Scientific Workflows" -o citations.out
$ ./citationxpert --citations="Online Task Resource Consumption Prediction for Scientific Workflows" --output=citations.out

In this case, the citations downloaded from Google Scholar will be written to the citations.out file.
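
When citations for several publications are needed, the call can be scripted. The sketch below wraps the command line shown above using Python's subprocess module. The function names are hypothetical, only the documented -c and -o options are used, and the assumption that the tool exits with a non-zero status on failure has not been verified.

```python
import subprocess


def build_citations_command(title, output=None, executable="./citationxpert"):
    """Build the argument list for a citations request (-c), optionally with -o."""
    command = [executable, "-c", title]
    if output is not None:
        command += ["-o", output]
    return command


def fetch_citations(title, output):
    """Run the tool and write the citations to `output`.

    Assumption: a failed run exits with a non-zero status, in which case
    subprocess.run raises CalledProcessError because check=True is set.
    """
    subprocess.run(build_citations_command(title, output), check=True)


if __name__ == "__main__":
    title = "Online Task Resource Consumption Prediction for Scientific Workflows"
    # Only print the command here; call fetch_citations(...) to execute it.
    print(build_citations_command(title, "citations.out"))
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces or punctuation.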

Most CitationXpert features require an input file, which can be either a citations file obtained with the -c or --citations option or the output file of another analysis (e.g., authors).


2. Self/External Analysis

CitationXpert provides an analysis of self-references and external references to a publication. A citation is considered a self-reference if at least one of its authors also co-authored the main publication. Otherwise, it is considered an external reference. Use this option as follows:

$ ./citationxpert -i citations.out -o citations-self.out -s
$ ./citationxpert --input=citations.out --output=citations-self.out --self

Note that the output data is redirected to a file by using the -o or --output options.

This option requires a citations file, obtained with the -c or --citations option, as its input file.
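
The self/external rule described above can be illustrated with a few lines of Python. This is a sketch of the rule only, not CitationXpert's implementation: the function names are hypothetical, and the name matching (case-insensitive, whitespace-normalized exact match) is an assumption, since the way the tool compares author names is not documented here.

```python
def normalize(name):
    """Lower-case a name and collapse whitespace so comparisons are lenient."""
    return " ".join(name.lower().split())


def is_self_reference(citing_authors, main_authors):
    """True if at least one citing author also co-authored the main publication."""
    main = {normalize(author) for author in main_authors}
    return any(normalize(author) in main for author in citing_authors)


def classify(citing_authors, main_authors):
    """Label a citation as "self" or "external"."""
    return "self" if is_self_reference(citing_authors, main_authors) else "external"


if __name__ == "__main__":
    # Hypothetical author lists, for illustration only.
    main_authors = ["A. Author", "B. Writer"]
    print(classify(["C. Other", "b. writer"], main_authors))
    print(classify(["C. Other"], main_authors))
```

Exact matching is deliberately simple; real author lists often abbreviate names differently, so a production matcher would likely need a more tolerant comparison.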

The command above performs the analysis and writes the results to the output channel (file or stdout). To also generate the graphs, add the -p (or --plot) option:

$ ./citationxpert -i citations.out -o citations-self.out -p -s
$ ./citationxpert --input=citations.out --output=citations-self.out --plot --self

The above command generates three graphs:

citations-self-overall.svg
citations-self-external-year.svg
citations-self-self-year.svg

The first file shows the total number of self-references and external references, the second the distribution of external references per year, and the third the distribution of self-references per year.
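
The numbers behind these three graphs are simple tallies. The sketch below shows how an overall total and the per-year distributions could be computed from already classified citations. The input format (a list of (year, kind) pairs), the function names, and the sample data are all hypothetical; the actual layout of CitationXpert's output file is not assumed here.

```python
from collections import Counter


def per_year_distribution(citations):
    """Count citations per year for each kind ("self" or "external")."""
    counts = {"self": Counter(), "external": Counter()}
    for year, kind in citations:
        counts[kind][year] += 1
    return counts


def overall_totals(counts):
    """Sum the per-year counts into one total per kind."""
    return {kind: sum(by_year.values()) for kind, by_year in counts.items()}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [(2015, "self"), (2015, "external"), (2016, "external"), (2016, "external")]
    counts = per_year_distribution(sample)
    print(overall_totals(counts))
    print(dict(counts["external"]))
    print(dict(counts["self"]))
```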

Note that generating graphs requires the Python pygal library. Please refer to the Installation and Configuration tab for the steps to install the library.


3. Citation h-index Analysis

Documentation soon.
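
For orientation while this feature is undocumented, the standard h-index definition can be sketched as follows: the h-index is the largest h such that at least h items have at least h citations each. This is a generic illustration; the function name is hypothetical, and which items CitationXpert ranks and how it reports the result are not documented here and may differ.

```python
def h_index(citation_counts):
    """Largest h such that at least h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    # Made-up citation counts, for illustration only.
    print(h_index([10, 8, 5, 4, 3]))
```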


4. Authors Analysis and Map

Documentation soon.

 

 

CitationXpert provides three logging modes: default, verbose, and debug.

The default mode prints only standard output, warning, and error messages. The verbose mode also shows the tool's progress messages. This mode can be enabled by using the -v (or --verbose) option:

$ ./citationxpert -v -c "Toward Fine-Grained Online Task Characteristics Estimation in Scientific Workflows" -o citations.out
$ ./citationxpert --verbose --citations="Toward Fine-Grained Online Task Characteristics Estimation in Scientific Workflows" -o citations.out

The debug mode shows detailed information about the tool's progress. This mode should be used sparingly, since a significant amount of information may be printed. Debugging can be enabled by using the -d (or --debug) option:

$ ./citationxpert -d -c "Toward Fine-Grained Online Task Characteristics Estimation in Scientific Workflows" -o citations.out
$ ./citationxpert --debug --citations="Toward Fine-Grained Online Task Characteristics Estimation in Scientific Workflows" -o citations.out
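
The three modes correspond naturally to the levels of Python's standard logging module. The sketch below shows one way such flags could be wired up with argparse. It is an illustration only: the mapping (default to WARNING, -v to INFO, -d to DEBUG), the function names, and the program name are assumptions, not CitationXpert's actual code.

```python
import argparse
import logging


def build_parser():
    """Parser exposing only the two logging flags described above."""
    parser = argparse.ArgumentParser(prog="logging-demo")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-d", "--debug", action="store_true")
    return parser


def level_for(args):
    """Assumed mapping: debug -> DEBUG, verbose -> INFO, default -> WARNING."""
    if args.debug:
        return logging.DEBUG
    if args.verbose:
        return logging.INFO
    return logging.WARNING


if __name__ == "__main__":
    args = build_parser().parse_args(["-v"])
    logging.basicConfig(level=level_for(args))
    logging.info("progress message, shown in verbose and debug modes")
    logging.debug("detailed message, shown only in debug mode")
```

In this sketch, debug takes precedence when both flags are given, since it is the more detailed of the two modes.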

 

 

CitationXpert is used by the Pegasus Workflow Management System team to conduct data science analyses of the publications that cite or use the Pegasus software. The analysis aggregates information from all Pegasus publications (for the software) and the Pegasus project website.

View the Pegasus Research Impact project analysis