Sparser (indra.sources.sparser)
Sparser API (indra.sources.sparser.api)
Provides an API for running the Sparser reading system and extracting Statements from its output.
-
indra.sources.sparser.api.process_text(text, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)
Return a processor with Statements extracted by reading text with Sparser.
Parameters: - text (str) – The text to be processed
- output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
- outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
- cleanup (Optional[bool]) – If True, both the temporary file created as Sparser's input and the output file created by Sparser are removed. Default: True
- key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default: empty string
Returns: A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
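To make the roles of key and cleanup concrete, here is a minimal sketch of the temporary-file lifecycle the parameter descriptions imply. It is not INDRA's implementation: read_text_via_tempfile and the stand-in fake_reader are hypothetical, and the PMC prefix in the temporary file name is an assumption borrowed from the file-name requirement documented for run_sparser.

```python
import os
import tempfile


def fake_reader(input_path):
    """Hypothetical stand-in for Sparser: writes a small JSON output file."""
    output_path = input_path + '.json'
    with open(input_path) as fin, open(output_path, 'w') as fout:
        fout.write('{"n_chars": %d}' % len(fin.read()))
    return output_path


def read_text_via_tempfile(text, reader, key='', cleanup=True):
    """Write text to a temporary input file, run reader on it, return its output.

    The key is embedded in the temporary file name (prefix assumed to be PMC).
    If cleanup is True, both the input and the output file are deleted.
    Returns (output_text, input_path, output_path).
    """
    fd, input_path = tempfile.mkstemp(prefix='PMC' + key + '_', suffix='.nxml')
    with os.fdopen(fd, 'w') as fh:
        fh.write(text)
    output_path = None
    try:
        output_path = reader(input_path)
        with open(output_path) as fh:
            output = fh.read()
    finally:
        # Remove whichever files exist, even if the reader failed.
        if cleanup:
            for path in (input_path, output_path):
                if path is not None and os.path.exists(path):
                    os.remove(path)
    return output, input_path, output_path


output, _, _ = read_text_via_tempfile('MEK phosphorylates ERK.', fake_reader,
                                      key='demo')
```

With cleanup=False both files are left on disk, which can help when inspecting what the reader actually received and produced.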
-
indra.sources.sparser.api.process_nxml_str(nxml_str, output_fmt='json', outbuf=None, cleanup=True, key='', **kwargs)
Return a processor with Statements extracted by reading an NXML string.
Parameters: - nxml_str (str) – The string value of the NXML-formatted paper to be read.
- output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
- outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
- cleanup (Optional[bool]) – If True, both the temporary file created in this function as Sparser's input and the output file created by Sparser are removed. Default: True
- key (Optional[str]) – A key which is embedded into the name of the temporary file passed to Sparser for reading. Default: empty string
Returns: A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
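process_nxml_str expects a complete NXML (JATS-style XML) document as a string rather than plain text. The snippet below builds a deliberately minimal article with the standard library so the shape of such a string is visible. Which elements Sparser actually reads is not specified here, so treat the front/body/sec/p layout and the helper make_minimal_nxml as hypothetical.

```python
import xml.etree.ElementTree as ET


def make_minimal_nxml(title, paragraphs):
    """Build a minimal JATS-style article as a string (hypothetical helper)."""
    article = ET.Element('article')
    # Front matter: article-meta > title-group > article-title
    meta = ET.SubElement(ET.SubElement(article, 'front'), 'article-meta')
    title_group = ET.SubElement(meta, 'title-group')
    ET.SubElement(title_group, 'article-title').text = title
    # Body: one section holding one <p> per paragraph
    sec = ET.SubElement(ET.SubElement(article, 'body'), 'sec')
    for text in paragraphs:
        ET.SubElement(sec, 'p').text = text
    return ET.tostring(article, encoding='unicode')


nxml_str = make_minimal_nxml('A toy paper', ['MEK phosphorylates ERK.'])
# With INDRA and Sparser installed, the string could then be read:
# from indra.sources.sparser.api import process_nxml_str
# sp = process_nxml_str(nxml_str, output_fmt='json')
```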
-
indra.sources.sparser.api.process_nxml_file(fname, output_fmt='json', outbuf=None, cleanup=True, **kwargs)
Return a processor with Statements extracted by reading an NXML file.
Parameters: - fname (str) – The path to the NXML file to be read.
- output_fmt (Optional[str]) – The output format to obtain from Sparser, with the two options being ‘json’ and ‘xml’. Default: ‘json’
- outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
- cleanup (Optional[bool]) – If True, the output file created by Sparser is removed. Default: True
Returns: sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
-
indra.sources.sparser.api.process_sparser_output(output_fname, output_fmt='json')
Return a processor with Statements extracted from a Sparser XML or JSON output file.
Parameters: - output_fname (str) – The path to the Sparser output file to be processed. The file can contain either JSON or XML output from Sparser; the output_fmt parameter determines which format is assumed.
- output_fmt (Optional[str]) – The format of the Sparser output to be processed, either ‘json’ or ‘xml’. Default: ‘json’
Returns: sp – A SparserXMLProcessor or SparserJSONProcessor, depending on which output format was chosen.
-
indra.sources.sparser.api.process_json_dict(json_dict)
Return a processor with Statements extracted from a Sparser JSON object.
Parameters: json_dict (dict) – The JSON object obtained by reading content with Sparser, using the ‘json’ output mode.
Returns: sp – A SparserJSONProcessor which has extracted Statements as its statements attribute.
Return type: SparserJSONProcessor
-
indra.sources.sparser.api.process_xml(xml_str)
Return a processor with Statements extracted from a Sparser XML string.
Parameters: xml_str (str) – The XML string obtained by reading content with Sparser, using the ‘xml’ output mode.
Returns: sp – A SparserXMLProcessor which has extracted Statements as its statements attribute.
Return type: SparserXMLProcessor
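process_sparser_output takes a file path, while process_json_dict and process_xml take already-loaded content (a dict and a string, respectively). One plausible reading of these descriptions, sketched below, is that the file is loaded according to output_fmt and then handed to the matching function. The loader load_sparser_output is hypothetical, only performs the loading step, and raises ValueError for an unknown format; how INDRA itself treats an unknown format is not stated here.

```python
import json


def load_sparser_output(output_fname, output_fmt='json'):
    """Load a Sparser output file in the form the processing functions accept.

    'json' -> the parsed JSON object (process_json_dict expects a dict)
    'xml'  -> the raw XML text as a str (what process_xml expects)
    """
    if output_fmt == 'json':
        with open(output_fname) as fh:
            return json.load(fh)
    if output_fmt == 'xml':
        with open(output_fname) as fh:
            return fh.read()
    raise ValueError("output_fmt must be 'json' or 'xml', not %r" % output_fmt)


# With INDRA installed, the loaded content would go to the matching function:
# from indra.sources.sparser.api import process_json_dict, process_xml
# sp = process_json_dict(load_sparser_output('PMC123.json', 'json'))
# sp = process_xml(load_sparser_output('PMC123.xml', 'xml'))
```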
-
indra.sources.sparser.api.run_sparser(fname, output_fmt, outbuf=None, timeout=600)
Return the path to the output file after running Sparser on an input file.
Parameters: - fname (str) – The path to an input file to be processed. Due to assumptions built into the Sparser executable, the file name needs to start with PMC and the file should be NXML-formatted.
- output_fmt (str) – The format in which Sparser should produce its output, either ‘json’ or ‘xml’.
- outbuf (Optional[file]) – A file-like object that the Sparser output is written to.
- timeout (int) – The number of seconds to wait before giving up on this reading. Default: 600 (10 minutes). Sparser is a fast reader; reading a single full text typically takes a matter of seconds.
Returns: output_path – The path to the output file created by Sparser.
Return type: str
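The timeout parameter corresponds to the common pattern of running an external executable under a time limit. The sketch below shows that pattern with subprocess.run, together with the PMC file-name check described for fname. The actual Sparser command line is not documented here, so a trivial Python one-liner stands in for it; writing the process's console output to outbuf is an assumption about what that parameter receives, and run_reader_with_timeout is a hypothetical helper, not INDRA's code.

```python
import os
import subprocess
import sys


def run_reader_with_timeout(fname, cmd, outbuf=None, timeout=600):
    """Run a reader command on fname, giving up after timeout seconds.

    cmd is the argument list of a (hypothetical) reader executable; fname
    is appended as its last argument. Returns the captured console output,
    or None if the process timed out. A non-zero exit status raises
    subprocess.CalledProcessError.
    """
    # Mirror the documented requirement that input file names start with PMC.
    if not os.path.basename(fname).startswith('PMC'):
        raise ValueError('Input file name must start with PMC: %s' % fname)
    try:
        result = subprocess.run(cmd + [fname], capture_output=True,
                                text=True, timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    if outbuf is not None:
        outbuf.write(result.stdout)
    return result.stdout


if __name__ == '__main__':
    # A Python one-liner stands in for the Sparser executable.
    stand_in = [sys.executable, '-c', 'import sys; print("read " + sys.argv[1])']
    print(run_reader_with_timeout('PMC123.nxml', stand_in, timeout=30), end='')
```

Returning None on timeout, rather than raising, lets a batch job skip one slow paper and continue with the rest.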