TRIPS (indra.sources.trips)¶
TRIPS API (indra.sources.trips.api)¶
-
indra.sources.trips.api.process_text(text, save_xml_name='trips_output.xml', save_xml_pretty=True, offline=False, service_endpoint='drum', service_host=None)[source]¶ Return a TripsProcessor by processing text.
Parameters: - text (str) – The text to be processed.
- save_xml_name (Optional[str]) – The name of the file to save the returned TRIPS extraction knowledge base XML. Default: trips_output.xml
- save_xml_pretty (Optional[bool]) – If True, the saved XML is pretty-printed. Some third-party tools require non-pretty-printed XMLs which can be obtained by setting this to False. Default: True
- offline (Optional[bool]) – If True, offline reading is used with a local instance of DRUM, if available. Default: False
- service_endpoint (Optional[str]) – Selects the TRIPS/DRUM web service endpoint to use. Either “drum” (default) or “drum-dev” (a nightly build).
- service_host (Optional[str]) – Address of a service host different from the public IHMC server (e.g., a locally running service).
Returns: tp – A TripsProcessor containing the extracted INDRA Statements in tp.statements.
Return type: TripsProcessor
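A minimal usage sketch for process_text, assuming INDRA is installed and the TRIPS web service is reachable. The wrapper name read_sentence and its process_fn parameter are hypothetical; process_fn exists only so the wrapper can be exercised with a stand-in when neither INDRA nor the service is available. The None check is a defensive assumption about failed reads, not documented behavior.

```python
def read_sentence(text, process_fn=None):
    """Return the INDRA Statements that TRIPS extracts from `text`.

    `read_sentence` and `process_fn` are hypothetical names for this sketch.
    """
    if process_fn is None:
        # Imported lazily so the sketch can be loaded without INDRA installed.
        from indra.sources.trips.api import process_text
        process_fn = process_text
    # Keyword arguments mirror the documented signature of process_text.
    tp = process_fn(text, save_xml_name='trips_output.xml', offline=False)
    # Assumption: treat a missing processor as "no extractions".
    if tp is None:
        return []
    return list(tp.statements)
```

With INDRA available, read_sentence('MEK phosphorylates ERK.') would send the sentence to the default “drum” endpoint and return whatever Statements were extracted.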
-
indra.sources.trips.api.process_xml(xml_string)[source]¶ Return a TripsProcessor by processing a TRIPS EKB XML string.
Parameters: xml_string (str) – A TRIPS extraction knowledge base (EKB) string to be processed. See http://trips.ihmc.us/parser/api.html Returns: tp – A TripsProcessor containing the extracted INDRA Statements in tp.statements. Return type: TripsProcessor
-
indra.sources.trips.api.process_xml_file(file_name)[source]¶ Return a TripsProcessor by processing a TRIPS EKB XML file.
Parameters: file_name (str) – Path to a TRIPS extraction knowledge base (EKB) file to be processed. Returns: tp – A TripsProcessor containing the extracted INDRA Statements in tp.statements. Return type: TripsProcessor
TRIPS Processor (indra.sources.trips.processor)¶
-
class
indra.sources.trips.processor.TripsProcessor(xml_string)[source]¶ The TripsProcessor extracts INDRA Statements from a TRIPS XML.
For more details on the TRIPS EKB XML format, see http://trips.ihmc.us/parser/cgi/drum
Parameters: xml_string (str) – A TRIPS extraction knowledge base (EKB) in XML format as a string.
-
tree¶ An ElementTree object representation of the TRIPS EKB XML.
Type: xml.etree.ElementTree.Element
-
statements¶ A list of INDRA Statements that were extracted from the EKB.
Type: list[indra.statements.Statement]
-
doc_id¶ The PubMed ID of the paper that the extractions are from.
Type: str
-
sentences¶ A dict of all sentences in the EKB, keyed by their IDs.
Type: dict[str: str]
-
paragraphs¶ A dict of all paragraphs in the EKB, keyed by their IDs.
Type: dict[str: str]
-
par_to_sec¶ A map from paragraph IDs to their associated section types
Type: dict[str: str]
-
extracted_events¶ A list of Event elements that have been extracted as INDRA Statements.
Type: list[xml.etree.ElementTree.Element]
-
get_agents()[source]¶ Return list of INDRA Agents corresponding to TERMs in the EKB.
This is meant to be used when entities (e.g., “phosphorylated ERK”), rather than events, need to be extracted from processed natural language. These entities, with their respective states, are represented as INDRA Agents.
Returns: agents – List of INDRA Agents extracted from EKB. Return type: list[indra.statements.Agent]
-
get_all_events()[source]¶ Make a list of all events in the TRIPS EKB.
The events are stored in self.all_events.
-
get_term_agents()[source]¶ Return dict of INDRA Agents keyed by corresponding TERMs in the EKB.
This is meant to be used when entities (e.g., “phosphorylated ERK”), rather than events, need to be extracted from processed natural language. These entities, with their respective states, are represented as INDRA Agents. Each key of the dictionary is the ID assigned by TRIPS to the TERM that the Agent was extracted from.
Returns: agents – Dict of INDRA Agents extracted from EKB. Return type: dict[str, indra.statements.Agent]
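A short sketch of how the dict returned by get_term_agents might be consumed, assuming tp is a TripsProcessor (for instance, one returned by process_xml). The helper name agent_names_by_term is hypothetical, and the None filter is a defensive assumption rather than documented behavior.

```python
def agent_names_by_term(tp):
    """Map each TRIPS TERM ID to the name of the Agent extracted from it.

    `agent_names_by_term` is a hypothetical helper; `tp` is expected to
    provide get_term_agents() as documented for TripsProcessor.
    """
    term_agents = tp.get_term_agents()
    # Assumption: skip any TERM whose value is None instead of an Agent.
    return {term_id: agent.name
            for term_id, agent in term_agents.items()
            if agent is not None}
```

The resulting mapping makes it easy to look up which entity a given TERM ID refers to when inspecting the EKB alongside the extracted Agents.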
TRIPS Web-service Client (indra.sources.trips.client)¶
-
indra.sources.trips.client.get_xml(html, content_tag='ekb', fail_if_empty=False)[source]¶ Extract the content XML from the HTML output of the TRIPS web service.
Parameters: - html (str) – The HTML output from the TRIPS web service.
- content_tag (str) – The xml tag used to label the content. Default is ‘ekb’.
- fail_if_empty (bool) – If True, and if the xml content found is an empty string, raise an exception. Default is False.
Returns: The extraction knowledge base (e.g. EKB) XML that contains the event and term extractions.
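To illustrate the idea behind get_xml, the sketch below pulls a tagged XML block out of a larger HTML string using only the standard library. It is not INDRA's implementation: the function name extract_tagged_xml, the regular expression, and the choice of ValueError for the empty case are all assumptions made for this example.

```python
import re


def extract_tagged_xml(html, content_tag='ekb', fail_if_empty=False):
    """Return the first <content_tag>...</content_tag> block found in `html`.

    Hypothetical stand-in for illustration only; returns '' if nothing
    matches, or raises ValueError when `fail_if_empty` is True.
    """
    tag = re.escape(content_tag)
    # Match the opening tag (with any attributes) through its closing tag.
    pattern = r'<{0}\b[^>]*>.*?</{0}>'.format(tag)
    match = re.search(pattern, html, flags=re.DOTALL)
    xml = match.group(0) if match else ''
    if fail_if_empty and not xml:
        raise ValueError('No <%s> content found' % content_tag)
    return xml
```

Setting fail_if_empty=True is useful in batch jobs where an empty result should stop processing rather than silently produce no Statements.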
-
indra.sources.trips.client.save_xml(xml_str, file_name, pretty=True)[source]¶ Save the TRIPS EKB XML in a file.
Parameters: - xml_str (str) – The TRIPS EKB XML string to be saved.
- file_name (str) – The name of the file to save the result in.
- pretty (Optional[bool]) – If True, the XML is pretty printed.
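The effect of the pretty flag can be sketched with the standard library's xml.dom.minidom. This is an illustration under assumptions, not INDRA's save_xml: the function name write_xml and the two-space indent are choices made for this example.

```python
from xml.dom import minidom


def write_xml(xml_str, file_name, pretty=True):
    """Write `xml_str` to `file_name`, optionally pretty-printed.

    Hypothetical stand-in for illustration only.
    """
    if pretty:
        # Re-serialize with line breaks and indentation for readability.
        xml_str = minidom.parseString(xml_str).toprettyxml(indent='  ')
    with open(file_name, 'w', encoding='utf-8') as fh:
        fh.write(xml_str)
```

Passing pretty=False writes the string unchanged, which matches the case where a downstream tool expects XML that has not been pretty-printed.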
-
indra.sources.trips.client.send_query(text, service_endpoint='drum', query_args=None, service_host=None)[source]¶ Send a query to the TRIPS web service.
Parameters: - text (str) – The text to be processed.
- service_endpoint (Optional[str]) – Selects the TRIPS/DRUM web service endpoint to use. One of “drum” (default), “drum-dev” (a nightly build), or “cwms” (for more general knowledge extraction).
- query_args (Optional[dict]) – A dictionary of arguments to be passed with the query.
- service_host (Optional[str]) – The server’s base URL under which service_endpoint is an endpoint. By default, IHMC’s public server is used.
Returns: html – The HTML result returned by the web service.
Return type: str
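The client functions can be chained by hand when the raw EKB XML is wanted rather than a TripsProcessor. The sketch below assumes INDRA is installed and the web service is reachable; the function name text_to_ekb_xml and its send and extract parameters are hypothetical and exist so the flow can be dry-run with stand-ins. Returning an empty string when no HTML comes back is an assumption of this sketch.

```python
def text_to_ekb_xml(text, send=None, extract=None, service_endpoint='drum'):
    """Send `text` to the TRIPS web service and return the EKB XML string.

    `text_to_ekb_xml`, `send` and `extract` are hypothetical names.
    """
    if send is None or extract is None:
        # Imported lazily so the sketch can be loaded without INDRA installed.
        from indra.sources.trips import client
        send = send or client.send_query
        extract = extract or client.get_xml
    html = send(text, service_endpoint=service_endpoint)
    # Assumption: an empty response means the query did not succeed.
    if not html:
        return ''
    return extract(html, content_tag='ekb')
```

The returned string can then be saved with save_xml or passed to process_xml to obtain INDRA Statements.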
TRIPS/DRUM Local Reader (indra.sources.trips.drum_reader)¶
-
class
indra.sources.trips.drum_reader.DrumReader(**kwargs)[source]¶ Agent which processes text through a local TRIPS/DRUM instance.
This class is implemented as a communicative agent which sends and receives KQML messages through a socket. It sends text (ideally in small blocks, such as one sentence at a time) to the running DRUM instance and receives extraction knowledge base (EKB) XML responses asynchronously through the socket.
To install DRUM and its dependencies locally, follow the instructions at https://github.com/wdebeaum/drum
Once installed, run drum/bin/trips-drum -nouser to run DRUM without a GUI. Once DRUM is running, this class can be instantiated as dr = DrumReader(), at which point it attempts to connect to DRUM via the socket.
Use dr.read_text(text) to send text for reading. In another usage mode, dr.read_pmc(pmcid) can be used to read a full open-access PMC paper. To start receiving responses, call dr.start(), which waits for responses from the reader and returns once all responses have been received. Once finished, the list of EKB XML extractions can be accessed via dr.extractions.
Parameters: - run_drum (Optional[bool]) – If True, the DRUM reading system is launched as a subprocess for reading. If False, DRUM is expected to be running independently. Default: False
- drum_system (Optional[subprocess.Popen]) – A handle to the subprocess of a running DRUM system instance. This can be passed in if the instance is to be reused rather than restarted. Default: None
- **kwargs – All other keyword arguments are passed through to the DrumReader KQML module’s constructor.
-
extractions¶ A list of EKB XML extractions corresponding to the input text list.
Type: list[str]
-
drum_system¶ A subprocess handle that points to a running instance of the DRUM reading system. In case the DRUM system is running independently, this is None.
Type: subprocess.Popen
-
read_pmc(pmcid)[source]¶ Read a given PMC article.
Parameters: pmcid (str) – The PMC ID of the article to read. Note that only articles in the open-access subset of PMC will work.
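The DrumReader workflow described above (instantiate, send text, start, collect extractions) can be sketched as follows, assuming a local DRUM instance is already running. The helper name read_with_drum and its reader parameter are hypothetical; the parameter allows a stand-in object to be supplied when DRUM is not available.

```python
def read_with_drum(sentences, reader=None):
    """Send each sentence to DRUM and return the EKB XML extractions.

    `read_with_drum` and `reader` are hypothetical names for this sketch.
    """
    if reader is None:
        # Imported lazily; connecting requires a running DRUM instance.
        from indra.sources.trips.drum_reader import DrumReader
        reader = DrumReader()
    # Send text in small blocks, one sentence at a time.
    for sentence in sentences:
        reader.read_text(sentence)
    # Wait until all responses have been received.
    reader.start()
    return list(reader.extractions)
```

Each returned EKB XML string can then be passed to indra.sources.trips.api.process_xml to obtain a TripsProcessor.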