galaxy.objectstore package

Submodules

galaxy.objectstore.azure_blob module

Object Store plugin for the Microsoft Azure Block Blob Storage system

class galaxy.objectstore.azure_blob.AzureBlobObjectStore(config, config_xml)[source]

Bases: galaxy.objectstore.ObjectStore

Object store that stores objects as blobs in an Azure Blob Container. A local cache exists that is used as an intermediate location for files between Galaxy and Azure.

create(obj, **kwargs)[source]

Mark the object (obj) as existing in the store, but with no content.

This method will create a proper directory structure for the file if the directory does not already exist.

delete(obj, entire_dir=False, **kwargs)[source]

Delete the object identified by obj.

Parameters:entire_dir (boolean) – If True, delete the entire directory pointed to by extra_dir. For safety reasons, this option applies only in conjunction with the extra_dir or obj_dir options.
empty(obj, **kwargs)[source]

Test if the object identified by obj has content.

Raises ObjectNotFound if the object does not exist.

exists(obj, **kwargs)[source]

Return True if the object identified by obj exists, False otherwise.

file_ready(obj, **kwargs)[source]

A helper method that checks if a file corresponding to a dataset is ready and available to be used. Return True if so, False otherwise.

get_data(obj, start=0, count=-1, **kwargs)[source]

Fetch count bytes of data offset by start bytes using obj.id.

Raises ObjectNotFound if the object does not exist.

Parameters:
  • start (int) – Set the position to start reading the dataset file
  • count (int) – Read at most count bytes from the dataset
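
The start and count semantics can be illustrated on an ordinary local file. This is a minimal sketch, not the Azure plugin's implementation: read_chunk is a hypothetical helper, and it assumes the object's content is already available at a local path.

```python
import os
import tempfile


def read_chunk(path, start=0, count=-1):
    """Read at most `count` bytes from `path`, starting `start` bytes in.

    A negative `count` reads through to the end of the file.
    """
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(count)


# Demonstrate on a throwaway file holding ten bytes.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"0123456789")

first_four = read_chunk(demo_path, start=2, count=4)  # four bytes from offset 2
rest = read_chunk(demo_path, start=6)                 # default count=-1 reads to the end
os.remove(demo_path)
```
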
get_filename(obj, **kwargs)[source]

Get the expected filename with absolute path for object with id obj.id.

This can be used to access the contents of the object.

get_object_url(obj, **kwargs)[source]

Return the URL for direct access if supported, otherwise return None.

Note: take care not to bypass dataset security with this.

get_store_usage_percent()[source]

Return the percentage indicating how full the store is.

size(obj, **kwargs)[source]

Return size of the object identified by obj.

If the object does not exist, return 0.

update_from_file(obj, file_name=None, create=False, **kwargs)[source]

Inform the store that the file associated with obj.id has been updated.

If file_name is provided, update from that file instead of the default. Raises ObjectNotFound if the object does not exist.

Parameters:
  • file_name (string) – Use file pointed to by file_name as the source for updating the dataset identified by obj
  • create (boolean) – If True and the default dataset does not exist, create it first.

galaxy.objectstore.pithos module

class galaxy.objectstore.pithos.PithosObjectStore(config, config_xml)[source]

Bases: galaxy.objectstore.ObjectStore

Object store that stores objects as items in a Pithos+ container. Cache is ignored for the time being.

create(obj, **kwargs)[source]

Touch the file (that is, create it empty) if it does not exist.

delete(obj, **kwargs)[source]

Delete the object.

Returns:whether the object was deleted

empty(obj, **kwargs)[source]

Returns:whether the object has content
Raises:ObjectNotFound

exists(obj, **kwargs)[source]

Check whether the file exists. If the file is in the cache but not on Pithos+, fix the inconsistency.

Returns:whether the file exists remotely or in the cache

get_data(obj, start=0, count=-1, **kwargs)[source]

Fetch (e.g., download) data.

Parameters:
  • start – Chunk of data starts here
  • count – Fetch at most this many bytes; fetch everything if negative

get_filename(obj, **kwargs)[source]

Get the expected filename with absolute path

get_object_url(obj, **kwargs)[source]

Returns:URL for direct access, None if no object

get_store_usage_percent()[source]

Returns:percentage indicating how full the store is

size(obj, **kwargs)[source]

Returns:the size of the object, or 0 if it does not exist (sorry for that, not our fault, the ObjectStore interface is like that sometimes)

update_from_file(obj, **kwargs)[source]

Update the store when a file is updated

galaxy.objectstore.pithos.parse_config_xml(config_xml)[source]

Parse and validate config_xml, returning a dict for convenience.

Parameters:config_xml (xml.etree.ElementTree.Element) – root of the XML subtree
Returns:(dict) according to syntax
Raises:various XML parse errors

galaxy.objectstore.pulsar module

class galaxy.objectstore.pulsar.PulsarObjectStore(config, config_xml)[source]

Bases: galaxy.objectstore.ObjectStore

Object store implementation that delegates to a remote Pulsar server.

This may be more aspirational than practical for now. It would be good to get Galaxy to a point where a handler thread could be set up that does not attempt to access the disk files returned by an (this) object store and instead just passes them along to Pulsar unmodified. That modification, along with this implementation and Pulsar job destinations, would then allow Galaxy to fully manage jobs on remote servers with completely different mount points.

This implementation should be considered beta and may be dropped from Galaxy at some future point or significantly modified.

create(obj, **kwds)[source]

Mark the object (obj) as existing in the store, but with no content.

This method will create a proper directory structure for the file if the directory does not already exist.

delete(obj, **kwds)[source]

Delete the object identified by obj.

Parameters:entire_dir (boolean) – If True, delete the entire directory pointed to by extra_dir. For safety reasons, this option applies only in conjunction with the extra_dir or obj_dir options.
empty(obj, **kwds)[source]

Test if the object identified by obj has content.

Raises ObjectNotFound if the object does not exist.

exists(obj, **kwds)[source]

Return True if the object identified by obj exists, False otherwise.

file_ready(obj, **kwds)[source]

Check if a file corresponding to a dataset is ready to be used.

Return True if so, False otherwise.

get_data(obj, **kwds)[source]

Fetch count bytes of data offset by start bytes using obj.id.

Raises ObjectNotFound if the object does not exist.

Parameters:
  • start (int) – Set the position to start reading the dataset file
  • count (int) – Read at most count bytes from the dataset
get_filename(obj, **kwds)[source]

Get the expected filename with absolute path for object with id obj.id.

This can be used to access the contents of the object.

get_object_url(obj, extra_dir=None, extra_dir_at_root=False, alt_name=None)[source]

Return the URL for direct access if supported, otherwise return None.

Note: take care not to bypass dataset security with this.

get_store_usage_percent()[source]

Return the percentage indicating how full the store is.

shutdown()[source]

Close any connections for this ObjectStore.

size(obj, **kwds)[source]

Return size of the object identified by obj.

If the object does not exist, return 0.

update_from_file(obj, **kwds)[source]

Inform the store that the file associated with obj.id has been updated.

If file_name is provided, update from that file instead of the default. Raises ObjectNotFound if the object does not exist.

Parameters:
  • file_name (string) – Use file pointed to by file_name as the source for updating the dataset identified by obj
  • create (boolean) – If True and the default dataset does not exist, create it first.

galaxy.objectstore.rods module

Object Store plugin for the Integrated Rule-Oriented Data Store (iRODS)

The module is named rods to avoid conflicting with the PyRods module, irods.

class galaxy.objectstore.rods.IRODSObjectStore(config, file_path=None, extra_dirs=None)[source]

Bases: galaxy.objectstore.DiskObjectStore

Galaxy object store based on iRODS

create(*args, **kwargs)
delete(*args, **kwargs)
empty(*args, **kwargs)
exists(*args, **kwargs)
get_data(*args, **kwargs)
get_filename(*args, **kwargs)
get_object_url(obj, **kwargs)[source]

Override ObjectStore’s stub.

Returns None; we have no URLs.

get_store_usage_percent()[source]

Override ObjectStore’s stub by returning the percentage of storage used.

size(obj, **kwargs)[source]

Override ObjectStore’s stub by returning the file size on disk.

Returns 0 if the object does not exist yet or if another error occurs.

update_from_file(*args, **kwargs)
galaxy.objectstore.rods.rods_connect()[source]

A basic iRODS connection mechanism that connects using the current iRODS environment

galaxy.objectstore.s3 module

Object Store plugin for the Amazon Simple Storage Service (S3)

class galaxy.objectstore.s3.S3ObjectStore(config, config_xml)[source]

Bases: galaxy.objectstore.ObjectStore

Object store that stores objects as items in an AWS S3 bucket. A local cache exists that is used as an intermediate location for files between Galaxy and S3.

create(obj, **kwargs)[source]

Mark the object (obj) as existing in the store, but with no content.

This method will create a proper directory structure for the file if the directory does not already exist.

delete(obj, entire_dir=False, **kwargs)[source]

Delete the object identified by obj.

Parameters:entire_dir (boolean) – If True, delete the entire directory pointed to by extra_dir. For safety reasons, this option applies only in conjunction with the extra_dir or obj_dir options.
empty(obj, **kwargs)[source]

Test if the object identified by obj has content.

Raises ObjectNotFound if the object does not exist.

exists(obj, **kwargs)[source]

Return True if the object identified by obj exists, False otherwise.

file_ready(obj, **kwargs)[source]

A helper method that checks if a file corresponding to a dataset is ready and available to be used. Return True if so, False otherwise.

get_data(obj, start=0, count=-1, **kwargs)[source]

Fetch count bytes of data offset by start bytes using obj.id.

Raises ObjectNotFound if the object does not exist.

Parameters:
  • start (int) – Set the position to start reading the dataset file
  • count (int) – Read at most count bytes from the dataset
get_filename(obj, **kwargs)[source]

Get the expected filename with absolute path for object with id obj.id.

This can be used to access the contents of the object.

get_object_url(obj, **kwargs)[source]

Return the URL for direct access if supported, otherwise return None.

Note: take care not to bypass dataset security with this.

get_store_usage_percent()[source]

Return the percentage indicating how full the store is.

size(obj, **kwargs)[source]

Return size of the object identified by obj.

If the object does not exist, return 0.

update_from_file(obj, file_name=None, create=False, **kwargs)[source]

Inform the store that the file associated with obj.id has been updated.

If file_name is provided, update from that file instead of the default. Raises ObjectNotFound if the object does not exist.

Parameters:
  • file_name (string) – Use file pointed to by file_name as the source for updating the dataset identified by obj
  • create (boolean) – If True and the default dataset does not exist, create it first.
class galaxy.objectstore.s3.SwiftObjectStore(config, config_xml)[source]

Bases: galaxy.objectstore.s3.S3ObjectStore

Object store that stores objects as items in a Swift bucket. A local cache exists that is used as an intermediate location for files between Galaxy and Swift.

galaxy.objectstore.s3_multipart_upload module

Split a large file into multiple pieces for upload to S3. This parallelizes the task over available cores using multiprocessing. Code mostly taken from CloudBioLinux.

galaxy.objectstore.s3_multipart_upload.map_wrap(f)[source]
galaxy.objectstore.s3_multipart_upload.mp_from_ids(s3server, mp_id, mp_keyname, mp_bucketname)[source]

Get the multipart upload from the bucket and multipart IDs.

This allows us to reconstitute a connection to the upload from within multiprocessing functions.

galaxy.objectstore.s3_multipart_upload.multimap(*args, **kwds)[source]

Provide multiprocessing imap like function.

The context manager handles setting up the pool, working around interrupt issues, and terminating the pool on completion.
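
A context manager of this shape can be sketched as follows. This is an illustration only, not the module's code: pool_imap and square are hypothetical names, and the sketch uses the thread-backed multiprocessing.dummy.Pool (which shares the multiprocessing.Pool API) so that it runs without pickling concerns. It does not reproduce the interrupt workaround.

```python
from contextlib import contextmanager
from multiprocessing.dummy import Pool  # thread-backed pool with the Pool API


@contextmanager
def pool_imap(workers):
    """Yield an imap-like callable and always tear the pool down afterwards."""
    pool = Pool(workers)
    try:
        yield pool.imap
    finally:
        # Terminate even if the body raised, then wait for the workers to exit.
        pool.terminate()
        pool.join()


def square(value):
    return value * value


with pool_imap(2) as imap:
    # imap yields results in input order, so the list is deterministic.
    results = list(imap(square, range(5)))
```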

galaxy.objectstore.s3_multipart_upload.multipart_upload(s3server, bucket, s3_key_name, tarball, mb_size)[source]

Upload large files using Amazon’s multipart upload functionality.

galaxy.objectstore.s3_multipart_upload.transfer_part(args)[source]

Transfer a part of a multipart upload. Designed to be run in parallel.
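
The splitting step can be pictured as computing one byte range per part. The sketch below is an assumption about how such ranges might be derived from a part size in megabytes; part_ranges is a hypothetical helper, and the module itself may split the file differently.

```python
def part_ranges(file_size, mb_size):
    """Return (offset, length) pairs that cover `file_size` bytes.

    Every part is `mb_size` megabytes long except possibly the last one.
    """
    part_bytes = mb_size * 1024 * 1024
    ranges = []
    offset = 0
    while offset < file_size:
        length = min(part_bytes, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges


# A 120 MB file split into 50 MB parts gives two full parts and a 20 MB tail.
ranges = part_ranges(file_size=120 * 1024 * 1024, mb_size=50)
```

Each pair could then be handed to an independent worker, which is what makes the upload parallelizable.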

Module contents

Object store package: an abstraction for storing blobs of data for use in Galaxy.

All providers ensure that data can be accessed on the filesystem for running tools.

class galaxy.objectstore.DiskObjectStore(config, config_xml=None, file_path=None, extra_dirs=None)[source]

Bases: galaxy.objectstore.ObjectStore

Standard Galaxy object store.

Stores objects in files under a specific directory on disk.

>>> from galaxy.util.bunch import Bunch
>>> import tempfile
>>> file_path=tempfile.mkdtemp()
>>> obj = Bunch(id=1)
>>> s = DiskObjectStore(Bunch(umask=0o077, jobs_directory=file_path, new_file_path=file_path, object_store_check_old_style=False), file_path=file_path)
>>> s.create(obj)
>>> s.exists(obj)
True
>>> assert s.get_filename(obj) == file_path + '/000/dataset_1.dat'
create(obj, **kwargs)[source]

Override ObjectStore’s stub by creating any files and folders on disk.

delete(obj, entire_dir=False, **kwargs)[source]

Override ObjectStore’s stub; delete the file or folder on disk.

empty(obj, **kwargs)[source]

Override ObjectStore’s stub by checking file size on disk.

exists(obj, **kwargs)[source]

Override ObjectStore’s stub and check on disk.

get_data(obj, start=0, count=-1, **kwargs)[source]

Override ObjectStore’s stub; retrieve data directly from disk.

get_filename(obj, **kwargs)[source]

Override ObjectStore’s stub.

If object_store_check_old_style is set to True in config then the root path is checked first.

get_object_url(obj, **kwargs)[source]

Override ObjectStore’s stub.

Returns None; we have no URLs.

get_store_usage_percent()[source]

Override ObjectStore’s stub by returning the percentage of storage used.
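
One way to compute such a percentage for a directory on disk is shown below. This is a sketch under the assumption that the figure is simply used space over total space for the filesystem holding the directory; usage_percent is a hypothetical helper, and the actual implementation may use a different call.

```python
import shutil
import tempfile


def usage_percent(path):
    """Return how full the filesystem containing `path` is, from 0.0 to 100.0."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report no capacity; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


percent = usage_percent(tempfile.gettempdir())
```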

size(obj, **kwargs)[source]

Override ObjectStore’s stub by returning the file size on disk.

Returns 0 if the object does not exist yet or if another error occurs.

update_from_file(obj, file_name=None, create=False, **kwargs)[source]

The create parameter is not used in this implementation.

class galaxy.objectstore.DistributedObjectStore(config, config_xml=None, fsmon=False)[source]

Bases: galaxy.objectstore.NestedObjectStore

ObjectStore that defers to a list of backends.

When getting objects the first store where the object exists is used. When creating objects they are created in a store selected randomly, but with weighting.
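
Weighted random selection can be sketched with the standard library. The backend ids and weights below are made up for illustration, pick_backend is a hypothetical helper, and the class itself may implement the weighting differently.

```python
import random


def pick_backend(weights, rng=random):
    """Pick one backend id at random, proportionally to its weight.

    `weights` maps a backend id to a positive number.
    """
    backend_ids = list(weights)
    backend_weights = [weights[backend_id] for backend_id in backend_ids]
    return rng.choices(backend_ids, weights=backend_weights, k=1)[0]


# With weights 3:1, roughly three quarters of new objects land on fast_disk.
rng = random.Random(0)  # seeded only to make the demonstration repeatable
weights = {"fast_disk": 3, "slow_disk": 1}
counts = {backend_id: 0 for backend_id in weights}
for _ in range(4000):
    counts[pick_backend(weights, rng)] += 1
```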

create(obj, **kwargs)[source]

The only method in which obj.object_store_id may be None.

shutdown()[source]

Shut down. Kill the free space monitor if there is one.

class galaxy.objectstore.HierarchicalObjectStore(config, config_xml=None, fsmon=False)[source]

Bases: galaxy.objectstore.NestedObjectStore

ObjectStore that defers to a list of backends.

When getting objects the first store where the object exists is used. When creating objects only the first store is used.

create(obj, **kwargs)[source]

Call the primary object store.

exists(obj, **kwargs)[source]

Check all child object stores.
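
The lookup and creation rules described for HierarchicalObjectStore can be sketched with in-memory stand-ins. DictStore and FirstMatchStore are hypothetical classes written for this example; they take a bare id rather than a Galaxy object and are not part of the package.

```python
class DictStore:
    """Hypothetical in-memory backend keyed by object id."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def exists(self, obj_id):
        return obj_id in self.objects

    def create(self, obj_id):
        # Mark the object as existing, with no content.
        self.objects.setdefault(obj_id, b"")


class FirstMatchStore:
    """Defer to an ordered list of backends, primary first."""

    def __init__(self, backends):
        self.backends = backends

    def create(self, obj_id):
        # New objects always go to the primary (first) backend.
        self.backends[0].create(obj_id)

    def exists(self, obj_id):
        # Check every child store.
        return any(backend.exists(obj_id) for backend in self.backends)

    def find(self, obj_id):
        # Reads use the first backend that already holds the object.
        for backend in self.backends:
            if backend.exists(obj_id):
                return backend
        return None


primary, archive = DictStore("primary"), DictStore("archive")
archive.create(7)  # an object that already lives in the second backend
store = FirstMatchStore([primary, archive])
store.create(42)   # lands in the primary backend
```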

class galaxy.objectstore.NestedObjectStore(config, config_xml=None)[source]

Bases: galaxy.objectstore.ObjectStore

Base for ObjectStores that use other ObjectStores.

Examples: DistributedObjectStore, HierarchicalObjectStore

create(obj, **kwargs)[source]

Create a backing file in a random backend.

delete(obj, **kwargs)[source]

For the first backend that has this obj, delete it.

empty(obj, **kwargs)[source]

For the first backend that has this obj, determine if it is empty.

exists(obj, **kwargs)[source]

Determine if the obj exists in any of the backends.

file_ready(obj, **kwargs)[source]

Determine if the file for obj is ready to be used by any of the backends.

get_data(obj, **kwargs)[source]

For the first backend that has this obj, get data from it.

get_filename(obj, **kwargs)[source]

For the first backend that has this obj, get its filename.

get_object_url(obj, **kwargs)[source]

For the first backend that has this obj, get its URL.

shutdown()[source]

Shut down each backend.

size(obj, **kwargs)[source]

For the first backend that has this obj, return its size.

update_from_file(obj, **kwargs)[source]

For the first backend that has this obj, update it from the given file.

class galaxy.objectstore.ObjectStore(config, **kwargs)[source]

Bases: object

ObjectStore abstract interface.

FIELD DESCRIPTIONS (these apply to all the methods in this class):

Parameters:
  • obj (StorableObject) – A Galaxy object with an assigned database ID accessible via the .id attribute.
  • base_dir (string) – A key in self.extra_dirs corresponding to the base directory in which this object should be created, or None to specify the default directory.
  • dir_only (boolean) – If True, check only the path where the file identified by obj should be located, not the dataset itself. This option applies to the extra_dir argument as well.
  • extra_dir (string) – Append extra_dir to the directory structure where the dataset identified by obj should be located. (e.g., 000/extra_dir/obj.id). Valid values include ‘job_work’ (defaulting to config.jobs_directory = ‘$GALAXY_ROOT/database/jobs_directory’); ‘temp’ (defaulting to config.new_file_path = ‘$GALAXY_ROOT/database/tmp’).
  • extra_dir_at_root (boolean) – Applicable only if extra_dir is set. If True, the extra_dir argument is placed at the root of the created directory structure rather than at the end (e.g., extra_dir/000/obj.id vs. 000/extra_dir/obj.id).
  • alt_name (string) – Use this name as the alternative name for the created dataset rather than the default.
  • obj_dir (boolean) – Append a subdirectory named with the object’s ID (e.g., 000/obj.id).
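
How these fields combine into a relative path can be sketched as below. This is a hedged illustration rather than the package's code: rel_path is a hypothetical helper, the hashed directory (000 in the examples above) is passed in as hash_dir because the hashing scheme is not described here, the ordering of extra_dir before the obj_dir subdirectory is an assumption, and the default file name follows the dataset_1.dat form shown in the DiskObjectStore example.

```python
def rel_path(obj_id, hash_dir="000", extra_dir=None, extra_dir_at_root=False,
             alt_name=None, obj_dir=False):
    """Compose a relative path for an object from the documented fields."""
    parts = [hash_dir]
    if extra_dir is not None:
        if extra_dir_at_root:
            parts.insert(0, extra_dir)   # extra_dir/000/...
        else:
            parts.append(extra_dir)      # 000/extra_dir/...
    if obj_dir:
        parts.append(str(obj_id))        # subdirectory named after the id
    # alt_name replaces the default file name when given.
    parts.append(alt_name if alt_name else "dataset_%d.dat" % obj_id)
    return "/".join(parts)


default_path = rel_path(1)
nested_path = rel_path(1, extra_dir="job_work")
rooted_path = rel_path(1, extra_dir="job_work", extra_dir_at_root=True)
```
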
create(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Mark the object (obj) as existing in the store, but with no content.

This method will create a proper directory structure for the file if the directory does not already exist.

delete(obj, entire_dir=False, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Delete the object identified by obj.

Parameters:entire_dir (boolean) – If True, delete the entire directory pointed to by extra_dir. For safety reasons, this option applies only in conjunction with the extra_dir or obj_dir options.
empty(obj, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Test if the object identified by obj has content.

Raises ObjectNotFound if the object does not exist.

exists(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None)[source]

Return True if the object identified by obj exists, False otherwise.

file_ready(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Check if a file corresponding to a dataset is ready to be used.

Return True if so, False otherwise.

get_data(obj, start=0, count=-1, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Fetch count bytes of data offset by start bytes using obj.id.

Raises ObjectNotFound if the object does not exist.

Parameters:
  • start (int) – Set the position to start reading the dataset file
  • count (int) – Read at most count bytes from the dataset
get_filename(obj, base_dir=None, dir_only=False, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Get the expected filename with absolute path for object with id obj.id.

This can be used to access the contents of the object.

get_object_url(obj, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Return the URL for direct access if supported, otherwise return None.

Note: take care not to bypass dataset security with this.

get_store_usage_percent()[source]

Return the percentage indicating how full the store is.

shutdown()[source]

Close any connections for this ObjectStore.

size(obj, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False)[source]

Return size of the object identified by obj.

If the object does not exist, return 0.
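
Note the asymmetry in this interface: size reports 0 for a missing object, while empty raises ObjectNotFound. The sketch below shows that contrast for a file on local disk. size_or_zero and is_empty are hypothetical helpers, and the ObjectNotFound class defined here is a local stand-in for the exception named in this interface, not an import from Galaxy.

```python
import os
import tempfile


class ObjectNotFound(Exception):
    """Local stand-in for the exception named in this interface."""


def size_or_zero(path):
    """Return the file size in bytes, or 0 if the file is missing or unreadable."""
    try:
        return os.path.getsize(path)
    except OSError:
        return 0


def is_empty(path):
    """Return True if the file has no content; raise if it does not exist."""
    if not os.path.exists(path):
        raise ObjectNotFound(path)
    return os.path.getsize(path) == 0


# A path inside a fresh temporary directory is guaranteed not to exist yet.
missing = os.path.join(tempfile.mkdtemp(), "dataset_1.dat")
missing_size = size_or_zero(missing)
```

Callers that need to tell a missing object from a zero-length one therefore cannot rely on size alone.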

update_from_file(obj, base_dir=None, extra_dir=None, extra_dir_at_root=False, alt_name=None, obj_dir=False, file_name=None, create=False)[source]

Inform the store that the file associated with obj.id has been updated.

If file_name is provided, update from that file instead of the default. Raises ObjectNotFound if the object does not exist.

Parameters:
  • file_name (string) – Use file pointed to by file_name as the source for updating the dataset identified by obj
  • create (boolean) – If True and the default dataset does not exist, create it first.
class galaxy.objectstore.ObjectStorePopulator(app)[source]

Bases: object

Small helper for interacting with the object store and making sure all datasets from a job end up with the same object_store_id.

set_object_store_id(data)[source]
galaxy.objectstore.build_object_store_from_config(config, fsmon=False, config_xml=None)[source]

Invoke the appropriate object store.

Will use the object_store_config_file attribute of the config object to configure a new object store from the specified XML file.

Alternatively, specify the object store type in the object_store attribute of the config object. Currently ‘disk’, ‘s3’, ‘swift’, ‘distributed’, ‘hierarchical’, ‘irods’, and ‘pulsar’ are supported values.
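
Selecting a store class from a type string is commonly done with a lookup table. The sketch below illustrates that pattern only: FakeDiskStore, FakeS3Store, STORE_TYPES, and build_store are hypothetical names, the default of 'disk' is an assumption, and none of this reproduces how build_object_store_from_config is actually written.

```python
from types import SimpleNamespace


class FakeDiskStore:
    """Hypothetical stand-in for a store class."""

    def __init__(self, config):
        self.config = config


class FakeS3Store(FakeDiskStore):
    """Hypothetical stand-in for a second store class."""


# Map each type string to the class that handles it.
STORE_TYPES = {"disk": FakeDiskStore, "s3": FakeS3Store}


def build_store(config, default="disk"):
    """Instantiate the store named by config.object_store."""
    store_type = getattr(config, "object_store", default)
    try:
        store_class = STORE_TYPES[store_type]
    except KeyError:
        raise ValueError("Unrecognized object store type: %r" % store_type)
    return store_class(config)


s3_store = build_store(SimpleNamespace(object_store="s3"))
default_store = build_store(SimpleNamespace())  # no attribute, so the default applies
```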

galaxy.objectstore.convert_bytes(bytes)[source]

A helper function used for pretty printing disk usage.

galaxy.objectstore.local_extra_dirs(func)[source]

Non-local plugin decorator using local directories for the extra_dirs (job_work and temp).