Sparser (`indra.sources.sparser`)

Sparser API (`indra.sources.sparser.api`)

Provides an API used to run and get Statements from the Sparser reading system.

indra.sources.sparser.api.process_text(text, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading text with Sparser.

Parameters:

text (str) – The text to be processed.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, both the temporary file created as input for Sparser and the output file created by Sparser are removed. Default: True
key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.

Returns:

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on the output format chosen.
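
As a minimal usage sketch, the helper below reads a piece of text and returns the extracted Statements. The helper name `read_statements` and its `reader` argument are hypothetical, introduced only so the example can be exercised without a Sparser installation; by default it calls `process_text` as documented above. It also assumes that a failed reading may yield `None` instead of a processor, which this reference does not state.

```python
def read_statements(text, reader=None, output_fmt='json'):
    """Return the list of Statements Sparser extracts from text.

    `reader` is a hypothetical hook for substituting a stand-in;
    when omitted, indra.sources.sparser.api.process_text is used,
    which requires INDRA and a working Sparser executable.
    """
    if reader is None:
        from indra.sources.sparser import api
        reader = api.process_text
    sp = reader(text, output_fmt=output_fmt)
    # Assumption: guard against a reading that produced no processor.
    if sp is None:
        return []
    return list(sp.statements)
```

With INDRA and Sparser available, `read_statements('MEK phosphorylates ERK.')` would return whatever Statements the reader extracts from that sentence.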

indra.sources.sparser.api.process_nxml_str(nxml_str, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)

Return processor with Statements extracted by reading an NXML string.

Parameters:

nxml_str (str) – The string value of the NXML-formatted paper to be read.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, both the temporary file created in this function as input for Sparser and the output file created by Sparser are removed. Default: True
key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default is empty string.

Returns:

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on the output format chosen.
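
The interplay of `key` and `cleanup` can be sketched as follows: the NXML string is written to a temporary file whose name embeds the key, a reader is run on that file, and the file is removed afterwards if cleanup is requested. The helper names, the `reader` callable, and the `PMC<key>_<pid>.nxml` naming scheme are assumptions made for illustration; the exact naming INDRA uses is not given in this reference.

```python
import os
import tempfile


def write_temp_nxml(nxml_str, key='', directory=None):
    """Write nxml_str to a temporary file whose name embeds key.

    The 'PMC<key>_<pid>.nxml' naming scheme is an assumption made
    for illustration, not necessarily what INDRA uses.
    """
    directory = directory or tempfile.gettempdir()
    fname = os.path.join(directory, 'PMC%s_%d.nxml' % (key, os.getpid()))
    with open(fname, 'w', encoding='utf-8') as fh:
        fh.write(nxml_str)
    return fname


def with_temp_nxml(nxml_str, reader, key='', cleanup=True):
    """Call reader(path) on a temporary NXML file, then optionally delete it."""
    fname = write_temp_nxml(nxml_str, key=key)
    try:
        return reader(fname)
    finally:
        # Remove the temporary input file even if the reader raised.
        if cleanup and os.path.exists(fname):
            os.remove(fname)
```

Embedding a distinct key per document keeps temporary file names from colliding when several readings run side by side.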

indra.sources.sparser.api.process_nxml_file(fname, output_fmt='json', outbuf=None, cleanup=True, **kwargs)

Return processor with Statements extracted by reading an NXML file.

Parameters:

fname (str) – The path to the NXML file to be read.
output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
cleanup (Optional[bool]) – If True, the output file created by Sparser is removed. Default: True

Returns:

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on the output format chosen.

indra.sources.sparser.api.process_sparser_output(output_fname, output_fmt='json')

Return a processor with Statements extracted from Sparser XML or JSON output.

Parameters:

output_fname (str) – The path to the Sparser output file to be processed. The file can contain either JSON or XML output from Sparser; the output_fmt parameter specifies which format is assumed.
output_fmt (Optional[str]) – The format of the Sparser output to be processed, either ‘json’ or ‘xml’. Default: ‘json’

Returns:

sp – A SparserXMLProcessor or SparserJSONProcessor, depending on the output format chosen.
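
The format dispatch described above can be sketched as: read the output file, then hand its content to a JSON or XML handler according to `output_fmt`. The function `load_sparser_output` and its `handle_json` / `handle_xml` hooks are hypothetical and exist only so the sketch runs without INDRA; when the hooks are omitted it falls back to `process_json_dict` and `process_xml` from this module. This is an illustration of the idea, not necessarily how `process_sparser_output` is implemented.

```python
import json


def load_sparser_output(output_fname, output_fmt='json',
                        handle_json=None, handle_xml=None):
    """Read a Sparser output file and pass its content to a handler.

    handle_json and handle_xml are hypothetical hooks; by default they
    resolve to process_json_dict and process_xml from
    indra.sources.sparser.api.
    """
    if output_fmt not in ('json', 'xml'):
        raise ValueError("output_fmt must be 'json' or 'xml'")
    with open(output_fname, 'r', encoding='utf-8') as fh:
        content = fh.read()
    if output_fmt == 'json':
        if handle_json is None:
            from indra.sources.sparser.api import process_json_dict
            handle_json = process_json_dict
        # JSON output is parsed first; the handler receives the parsed object.
        return handle_json(json.loads(content))
    if handle_xml is None:
        from indra.sources.sparser.api import process_xml
        handle_xml = process_xml
    # XML output is passed on as a string.
    return handle_xml(content)
```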

indra.sources.sparser.api.process_json_dict(json_dict)

Return processor with Statements extracted from a Sparser JSON.

Parameters:	json_dict (dict) – The JSON object obtained by reading content with Sparser, using the ‘json’ output mode.
Returns:	sp – A SparserJSONProcessor which has extracted Statements as its statements attribute.
Return type:	SparserJSONProcessor

indra.sources.sparser.api.process_xml(xml_str)

Return processor with Statements extracted from a Sparser XML.

Parameters:	xml_str (str) – The XML string obtained by reading content with Sparser, using the ‘xml’ output mode.
Returns:	sp – A SparserXMLProcessor which has extracted Statements as its statements attribute.
Return type:	SparserXMLProcessor

indra.sources.sparser.api.run_sparser(fname, output_fmt, outbuf=None, timeout=600)

Return the path to the reading output after running Sparser.

Parameters:

fname (str) – The path to an input file to be processed. Due to the Sparser executable’s assumptions, the file name needs to start with PMC and the file should be NXML-formatted.
output_fmt (Optional[str]) – The format in which Sparser should produce its output, either ‘json’ or ‘xml’.
outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
timeout (int) – The number of seconds to wait before giving up on this one reading. The default is 600 seconds (i.e., 10 minutes). Sparser is a fast reader, and the typical time to read a single full text is a matter of seconds.
Returns:	output_path – The path to the output file created by Sparser.
Return type:	str
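
To illustrate the `timeout` behavior in general terms, the sketch below runs an external command and gives up after a fixed number of seconds. It uses the standard library's `subprocess.run` with stand-in Python commands in place of the Sparser executable; the helper name `run_with_timeout` is hypothetical, and how `run_sparser` enforces its timeout internally is not stated in this reference.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=600):
    """Run cmd and return its output, or None if it exceeds timeout seconds."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return None
    return result.stdout.decode('utf-8', errors='replace')


# Stand-in commands; a real call would invoke the Sparser executable.
fast = run_with_timeout([sys.executable, '-c', 'print("done")'], timeout=30)
slow = run_with_timeout([sys.executable, '-c', 'import time; time.sleep(10)'],
                        timeout=1)
```

Here `fast` holds the command's output, while `slow` is `None` because the command was abandoned after one second.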

indra.sources.sparser.api.get_version()

Return the version of the Sparser executable on the path.

Returns:	version – The version of Sparser that is found on the Sparser path.
Return type:	str

indra.sources.sparser.api.make_nxml_from_text(text)

Return raw text wrapped in NXML structure.

Parameters:	text (str) – The raw text content to be wrapped in an NXML structure.
Returns:	nxml_str – The NXML string wrapping the raw text input.
Return type:	str
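
A rough idea of what wrapping raw text in NXML involves is sketched below. The function `wrap_text_in_nxml` is a hypothetical stand-in, and the `article/body/sec/p` layout is an assumption; the exact wrapper elements produced by `make_nxml_from_text` may differ. The key point is that special characters in the raw text need escaping so the result remains well-formed XML.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_text_in_nxml(text):
    """Wrap raw text in a minimal article/body/sec/p XML structure."""
    header = ('<?xml version="1.0" encoding="UTF-8"?>'
              '<article><body><sec id="s1"><p>')
    footer = '</p></sec></body></article>'
    # escape() replaces &, < and > so the text cannot break the markup.
    return header + escape(text) + footer


nxml_str = wrap_text_in_nxml('p38 & JNK are kinases; x < y.')
# Parsing the result confirms it is well-formed and the text round-trips.
root = ET.fromstring(nxml_str.encode('utf-8'))
paragraph = root.find('./body/sec/p').text
```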