Util (indra.util)
Utilities for using AWS (indra.util.aws)
-
class indra.util.aws.JobLog(job_info, log_group_name='/aws/batch/job', verbose=False, append_dumps=True)
Gets the CloudWatch log associated with the given job.
Parameters:
- job_info (dict) – A dict containing entries for ‘jobName’ and ‘jobId’, e.g., as returned by get_jobs()
- log_group_name (string) – Name of the log group; defaults to ‘/aws/batch/job’
Returns: The event messages in the log, with the earliest events listed first.
Return type: list of strings
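As a rough illustration of what fetching a Batch job's CloudWatch log involves, the sketch below uses two boto3 clients: the Batch `describe_jobs` call to find the job's log stream, and CloudWatch Logs `get_log_events` with `startFromHead=True` to read events oldest first. The function name `fetch_job_log` is hypothetical. This is not INDRA's JobLog implementation, and it omits the `verbose` and `append_dumps` options. The clients are passed in, so stand-ins with the same methods can replace real boto3 clients.

```python
from typing import Any, List


def fetch_job_log(batch: Any, logs: Any, job_info: dict,
                  log_group_name: str = '/aws/batch/job') -> List[str]:
    """Return a Batch job's CloudWatch log messages, earliest first.

    Hypothetical sketch: `batch` and `logs` are assumed to be boto3
    clients for the 'batch' and 'logs' services (or stand-ins).
    """
    # Look up the log stream that AWS Batch assigned to the job's container.
    desc = batch.describe_jobs(jobs=[job_info['jobId']])
    stream = desc['jobs'][0]['container']['logStreamName']

    kwargs = {'logGroupName': log_group_name, 'logStreamName': stream,
              'startFromHead': True}  # read the oldest events first
    messages: List[str] = []
    prev_token = None
    while True:
        resp = logs.get_log_events(**kwargs)
        messages.extend(event['message'] for event in resp['events'])
        token = resp.get('nextForwardToken')
        # CloudWatch signals the end of the stream by returning the same
        # forward token that was passed in.
        if token is None or token == prev_token:
            break
        prev_token = token
        kwargs['nextToken'] = token
    return messages
```

Following the forward token until it stops changing is what keeps the messages in chronological order across pages.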
-
indra.util.aws.dump_logs(job_queue='run_reach_queue', job_status='RUNNING')
Write the logs for all jobs with the given status to files.
-
indra.util.aws.get_batch_command(command_list, project=None, purpose=None)
Get the command appropriate for running something on AWS Batch.
-
indra.util.aws.get_date_from_str(date_str)
Get a UTC datetime object from a string of the format %Y-%m-%d-%H-%M-%S.
Parameters: date_str (str) – A string of the format %Y(-%m-%d-%H-%M-%S). The string is assumed to represent a UTC time.
Returns: A datetime object for the given UTC time.
Return type: datetime.datetime
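A minimal sketch of parsing such partial date strings, assuming that each missing trailing field takes its earliest value (month 1, day 1, 00:00:00), which is how `strptime` fills them in. The name `parse_utc_date` is hypothetical. This sketch returns a timezone-aware datetime (tzinfo set to UTC); whether INDRA's function does the same is not stated here.

```python
from datetime import datetime, timezone


def parse_utc_date(date_str: str) -> datetime:
    """Parse '%Y', '%Y-%m', ... up to '%Y-%m-%d-%H-%M-%S' as a UTC datetime."""
    full_fmt = ['%Y', '%m', '%d', '%H', '%M', '%S']
    n_parts = len(date_str.split('-'))
    if not 1 <= n_parts <= len(full_fmt):
        raise ValueError(f'Unexpected date string: {date_str!r}')
    # Use only as many format fields as the string actually contains.
    fmt = '-'.join(full_fmt[:n_parts])
    # Attach UTC explicitly, since the string is assumed to be UTC.
    return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
```

For example, `parse_utc_date('2019-05')` gives midnight UTC on 1 May 2019.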
-
indra.util.aws.get_jobs(job_queue='run_reach_queue', job_status='RUNNING')
Returns a list of dicts with jobName and jobId for each job with the given status.
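One way such a listing could be built is with the Batch `list_jobs` call, whose results are paginated through `nextToken`. The sketch below is an assumption about the approach, not INDRA's code; the name `list_jobs_by_status` is hypothetical, and `batch` is assumed to be a boto3 'batch' client or a stand-in with the same method.

```python
from typing import Any, Dict, List


def list_jobs_by_status(batch: Any, job_queue: str = 'run_reach_queue',
                        job_status: str = 'RUNNING') -> List[Dict[str, str]]:
    """Return [{'jobName': ..., 'jobId': ...}] for jobs with the given status."""
    jobs: List[Dict[str, str]] = []
    kwargs = {'jobQueue': job_queue, 'jobStatus': job_status}
    while True:
        resp = batch.list_jobs(**kwargs)
        # Keep only the two fields named in the description above.
        jobs.extend({'jobName': s['jobName'], 'jobId': s['jobId']}
                    for s in resp['jobSummaryList'])
        # list_jobs is paginated; follow nextToken until it is absent.
        token = resp.get('nextToken')
        if not token:
            return jobs
        kwargs['nextToken'] = token
```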
-
indra.util.aws.get_s3_client(unsigned=True)
Return a boto3 S3 client with optional unsigned config.
Parameters: unsigned (Optional[bool]) – If True, the client will use unsigned mode, in which public resources can be accessed without credentials. Default: True
Returns: A client object to AWS S3.
Return type: botocore.client.S3
-
indra.util.aws.get_s3_file_tree(s3, bucket, prefix, date_cutoff=None, after=True, with_dt=False)
Overcome the S3 limit on the number of keys returned per listing response and return a NestedDict tree of paths.
The NestedDict object also allows the user to search by the ends of a path.
The tree mimics a file directory structure, with the leaf nodes being the full unbroken key. For example, ‘path/to/file.txt’ would be retrieved by

    ret['path']['to']['file.txt']['key']

The NestedDict object returned can also find the paths that lead to a given value. For example, to get all paths that lead to something called ‘file.txt’, you could use

    ret.get_paths('file.txt')

For more details, see the NestedDict docs.
Parameters:
- s3 (boto3.client.S3) – A boto3.client.S3 instance
- bucket (str) – The name of the bucket to list objects in
- prefix (str) – The prefix used to filter the listed objects
- date_cutoff (str|datetime.datetime) – A date string of the format %Y(-%m-%d-%H-%M-%S) or a datetime.datetime object. The date is assumed to be in UTC. By default, no filtering is done. Default: None.
- after (bool) – If True, only return objects after the given date cutoff. Otherwise, return objects before. Default: True
- with_dt (bool) – If True, yield a tuple (key, datetime.datetime(LastModified)) of the S3 key and the object’s LastModified date as a datetime.datetime object; otherwise, yield only the S3 key. Default: False.
Returns: A file tree represented as a NestedDict
Return type: NestedDict
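The core idea of the tree can be sketched with a recursive `collections.defaultdict` standing in for NestedDict: each '/'-separated part of a key becomes one level, and the leaf stores the full key under 'key'. The helper names `_tree` and `build_key_tree` are hypothetical. The sketch takes an already-listed iterable of keys instead of an S3 client, and it does not implement `get_paths`.

```python
from collections import defaultdict
from typing import Iterable


def _tree():
    # A recursive defaultdict: any missing key creates a new subtree.
    return defaultdict(_tree)


def build_key_tree(keys: Iterable[str]):
    """Nest S3 keys by their '/'-separated parts; leaves store the full key."""
    tree = _tree()
    for key in keys:
        node = tree
        for part in key.split('/'):
            node = node[part]  # descend, creating levels as needed
        node['key'] = key      # the leaf keeps the full, unbroken key
    return tree
```

With this sketch, `build_key_tree(['path/to/file.txt'])['path']['to']['file.txt']['key']` gives back `'path/to/file.txt'`, mirroring the lookup described above.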
-
indra.util.aws.iter_s3_keys(s3, bucket, prefix, date_cutoff=None, after=True, with_dt=False, do_retry=True)
Iterate over the keys in an S3 bucket that start with the given prefix.
Parameters:
- s3 (boto3.client.S3) – A boto3.client.S3 instance
- bucket (str) – The name of the bucket to list objects in
- prefix (str) – The prefix used to filter the listed objects
- date_cutoff (str|datetime.datetime) – A date string of the format %Y(-%m-%d-%H-%M-%S) or a datetime.datetime object. The date is assumed to be in UTC. By default, no filtering is done. Default: None.
- after (bool) – If True, only return objects after the given date cutoff. Otherwise, return objects before. Default: True
- with_dt (bool) – If True, yield a tuple (key, datetime.datetime(LastModified)) of the S3 key and the object’s LastModified date as a datetime.datetime object; otherwise, yield only the S3 key. Default: False.
- do_retry (bool) – If True and no contents appear, try again in case there was simply a brief lag. If False, do not retry, and accept that the “directory” is empty. Default: True
Returns: An iterator over s3 keys or (key, LastModified) tuples.
Return type: iterator[key]|iterator[(key, datetime.datetime)]
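A minimal sketch of this kind of iteration, assuming boto3's `list_objects_v2` paginator, which follows continuation tokens past the limit of 1,000 keys per request. The name `iter_keys_sketch` is hypothetical and this is not INDRA's implementation: it accepts only datetime cutoffs (not date strings), treats the cutoff as inclusive in both directions, and leaves out `do_retry`.

```python
from datetime import datetime
from typing import Any, Iterator, Optional


def iter_keys_sketch(s3: Any, bucket: str, prefix: str,
                     date_cutoff: Optional[datetime] = None,
                     after: bool = True, with_dt: bool = False) -> Iterator:
    """Yield keys (or (key, LastModified) pairs) under a prefix.

    `s3` is assumed to be a boto3 S3 client; `date_cutoff` must be a
    timezone-aware datetime so it can be compared with LastModified.
    """
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no 'Contents' entry at all.
        for obj in page.get('Contents', []):
            dt = obj['LastModified']
            if date_cutoff is not None:
                # Skip objects on the wrong side of the cutoff; an object
                # exactly at the cutoff is kept in both modes.
                if after and dt < date_cutoff:
                    continue
                if not after and dt > date_cutoff:
                    continue
            yield (obj['Key'], dt) if with_dt else obj['Key']
```

Because it is a generator, keys are produced page by page instead of being collected into one list first.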
-
indra.util.aws.kill_all(job_queue, reason='None given', states=None, kill_list=None)
Terminates/cancels all jobs on the specified queue.
Parameters:
- job_queue (str) – The name of the Batch job queue on which you wish to terminate/cancel jobs.
- reason (str) – Provide a reason for the kill that will be recorded with the job’s record on AWS.
- states (None or list[str]) – A list of job states to remove. Possible states are ‘STARTING’, ‘RUNNABLE’, and ‘RUNNING’. If None, all jobs in all states will be ended (modulo the kill_list below).
- kill_list (None or list[dict]) – A list of job dictionaries (as returned by the submit function) that you specifically wish to kill. All other jobs on the queue will be ignored. If None, all jobs on the queue will be ended (modulo the above).
Returns: killed_ids – A list of the job ids for jobs that were killed.
Return type: list[str]
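The selection logic described by `states` and `kill_list` can be sketched with the Batch `list_jobs` and `terminate_job` calls. The name `kill_jobs_sketch` is hypothetical, and the sketch assumes that each `kill_list` entry carries a 'jobId' key and that `states=None` means the three states listed above. It skips `nextToken` pagination for brevity and is not INDRA's implementation.

```python
from typing import Any, Dict, List, Optional


def kill_jobs_sketch(batch: Any, job_queue: str, reason: str = 'None given',
                     states: Optional[List[str]] = None,
                     kill_list: Optional[List[Dict[str, str]]] = None
                     ) -> List[str]:
    """Terminate jobs on a queue, filtered by state and/or an explicit list."""
    if states is None:
        states = ['STARTING', 'RUNNABLE', 'RUNNING']
    # None means "no restriction"; otherwise only these job ids are killed.
    wanted = None if kill_list is None else {j['jobId'] for j in kill_list}
    killed_ids: List[str] = []
    for status in states:
        # Pagination via nextToken is omitted here for brevity.
        resp = batch.list_jobs(jobQueue=job_queue, jobStatus=status)
        for summary in resp['jobSummaryList']:
            job_id = summary['jobId']
            if wanted is not None and job_id not in wanted:
                continue  # ignore jobs not named in kill_list
            # terminate_job also cancels jobs that have not started yet.
            batch.terminate_job(jobId=job_id, reason=reason)
            killed_ids.append(job_id)
    return killed_ids
```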
A utility to get the INDRA version (indra.util.get_version)
This tool provides a uniform method for creating a robust INDRA version string, both from within Python and from the command line. If possible, the version will include the git commit hash; otherwise, the version will be marked with ‘UNHASHED’.
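The hash-or-UNHASHED fallback can be sketched with `git rev-parse HEAD`. The function name `version_with_hash` and the '-' separator between version and hash are assumptions for illustration, not the exact format INDRA produces.

```python
import subprocess


def version_with_hash(base_version: str, repo_dir: str) -> str:
    """Append the current git commit hash, or 'UNHASHED' if it can't be read."""
    try:
        result = subprocess.run(['git', 'rev-parse', 'HEAD'], cwd=repo_dir,
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        # No git executable, no such directory, or not a git checkout.
        return f'{base_version}-UNHASHED'
    return f'{base_version}-{result.stdout.strip()}'
```

Catching both exception types means a missing git installation and a non-repository directory degrade the same way, to the 'UNHASHED' marker.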
Define NestedDict (indra.util.nested_dict)
-
class indra.util.nested_dict.NestedDict
A dict-like object that recursively populates elements of a dict.
More specifically, this acts like a recursive defaultdict, allowing, for example:
    >> nd = NestedDict()
    >> nd['a']['b']['c'] = 'foo'
In addition, useful methods have been defined that allow the user to search the data structure. Note that these methods are not particularly optimized at this time. However, for convenience, you can, for example, simply call get_path to get the path to a particular key and the value at that key:

    >> nd.get_path('c')
    (('a', 'b', 'c'), 'foo')

Similarly:

    >> nd.get_path('b')
    (('a', 'b'), NestedDict(
      'c': 'foo'))
get, gets, and get_paths operate on similar principles, and are documented below.
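To show the mechanism behind the examples above, here is an independent minimal sketch: a `dict` subclass whose `__missing__` creates and stores an empty child, plus a depth-first `get_path` that returns the first match. The class name `MiniNestedDict` is hypothetical. This is not INDRA's NestedDict, and it omits `get`, `gets` and `get_paths`.

```python
from typing import Any, Optional, Tuple


class MiniNestedDict(dict):
    """A recursive defaultdict-like dict with a depth-first get_path search."""

    def __missing__(self, key):
        # Create (and store) an empty child so nd['a']['b'] = ... just works.
        child = MiniNestedDict()
        self[key] = child
        return child

    def get_path(self, key: Any,
                 path: Tuple = ()) -> Optional[Tuple[Tuple, Any]]:
        """Return (path_to_key, value) for the first match, else None."""
        for k, v in self.items():
            if k == key:
                return path + (k,), v
            if isinstance(v, MiniNestedDict):
                found = v.get_path(key, path + (k,))
                if found is not None:
                    return found
        return None
```

Because `get_path` iterates over existing items instead of indexing, searching for an absent key does not create new empty entries.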