Asynchronous processes

This page contains an overview of asynchronous processes in Yoda.

Metadata
Metadata schema update job
Replication
Replication job
Revision management
Revision management - creation job
Revision management - cleanup job
Statistics
Statistics job
Archiving
Archiving - copy-to-vault job
Archiving - publication job

Metadata

Metadata changes are handled synchronously, except for batch updates of data package metadata after schema updates.

Schema update job


Script	/etc/irods/yoda-ruleset/tools/check-metadata-for-schema-updates.r
Purpose	verify and update data package metadata to new schema versions
Lock file	no lock file
Scheduling	delayed rule queue
Typically started by	manually by the application administrator, after a schema or application upgrade

Any changes to data package metadata will be recorded in the rodsLog.

Replication

By default, data in Yoda is replicated across two servers. Policies add a metadata attribute to data objects that should be replicated, and the asynchronous replication job replicates these objects. The default name of the metadata attribute is org_replication_scheduled; the attribute value contains the source and destination resource, separated by commas.
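The format of the attribute value can be illustrated with a minimal Python sketch. It is not taken from the Yoda code base: the function name, the class name and the resource names are hypothetical, and it assumes that the first two comma-separated fields are the source and destination resource.

```python
from typing import NamedTuple


class ReplicationRequest(NamedTuple):
    source_resource: str
    destination_resource: str


def parse_replication_value(value: str) -> ReplicationRequest:
    """Split an attribute value into source and destination resource names."""
    fields = [field.strip() for field in value.split(",")]
    # Assumption: the first two fields are the source and destination
    # resource; any further fields are ignored by this sketch.
    if len(fields) < 2 or not fields[0] or not fields[1]:
        raise ValueError(f"expected 'source,destination', got {value!r}")
    return ReplicationRequest(fields[0], fields[1])


# Hypothetical resource names, for illustration only.
request = parse_replication_value("sourceResc,destinationResc")
print(request.source_resource, "->", request.destination_resource)
```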

Replication job


Script	/etc/irods/yoda-ruleset/tools/async-data-replicate.py
Purpose	replicate data objects; also adds missing checksums
Lock file	/tmp/irods-async-data-replicate.py.lock
Scheduling	cronjob, data object queue based on data object metadata attributes
Typically started by	cronjob, runs every five minutes

The replication job handles replicating data objects to a replication resource. It also adds checksums to data objects that do not have a checksum yet.
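A lock file keeps a new cron run from starting while the previous run is still busy. The sketch below shows one common way to do this in Python, using an advisory lock from the standard fcntl module. It is an assumption about the general technique, not a description of how the Yoda script implements its lock; the helper name and the example lock path are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this run holds the lock, False if another run does."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


# Hypothetical lock path for this sketch.
example_lock = os.path.join(tempfile.gettempdir(), "example-async-job.lock")
with single_instance(example_lock) as acquired:
    if acquired:
        print("lock acquired, processing queue")
    else:
        print("another run is still active, exiting")
```

With this approach the lock is released automatically when the process exits, so a crashed run does not leave a stale lock behind.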

Data objects are marked for replication using a metadata attribute. The default name of this attribute is org_replication_scheduled.

The script has a verbose mode (which can be enabled using the -v switch). This will log additional information for troubleshooting to the rodsLog.

The script has a dry run mode (which can be enabled using the -n switch). In this mode, no replicas are created.
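How a verbose switch and a dry run switch typically interact can be sketched with Python's argparse module. Only the short switches -v and -n come from this page; the long option names, the function names and the processing loop are hypothetical and serve as illustration only.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a queue-processing job")
    # The long option names are hypothetical; only -v and -n are documented.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log additional troubleshooting information")
    parser.add_argument("-n", "--dry-run", action="store_true",
                        help="go through the queue without changing anything")
    return parser.parse_args(argv)


def process(items, args, action):
    """Apply action to each item unless dry run is enabled.

    Returns the number of items that were actually acted on.
    """
    handled = 0
    for item in items:
        if args.verbose:
            print(f"processing {item}")
        if args.dry_run:
            continue  # dry run: skip the action itself
        action(item)
        handled += 1
    return handled


# Example: a dry run in verbose mode reports items but performs no action.
args = parse_args(["-v", "-n"])
print(process(["object-1", "object-2"], args, action=lambda item: None))
```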

If a flag named /ZONE/yoda/flags/stop_replication is present, the script will stop processing data objects. See the page about setting job flags for more information.
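The effect of a stop flag can be sketched as a loop that checks for the flag before handling each data object. This is an illustration under assumptions: how often the actual script checks the flag is not stated here, and the flag lookup is represented by a plain callable rather than a real iRODS query. All function names are hypothetical.

```python
def process_until_stopped(queue, handle, stop_flag_present):
    """Process items in order, checking the stop flag before each one."""
    processed = []
    for item in queue:
        # Assumption: the flag is checked once per item.
        if stop_flag_present():
            break
        handle(item)
        processed.append(item)
    return processed


# Stand-in for the catalog: a set of paths that currently exist.
existing_paths = set()
flag_path = "/ZONE/yoda/flags/stop_replication"


def handle(item):
    # For demonstration, the flag appears while the second item is handled.
    if item == "object-2":
        existing_paths.add(flag_path)


done = process_until_stopped(
    ["object-1", "object-2", "object-3"],
    handle,
    stop_flag_present=lambda: flag_path in existing_paths,
)
print(done)
```

Items that are not processed stay in the queue, so they can be picked up by a later run once the flag has been removed.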

Revision management

Yoda supports revision management of data objects, so that users can recover older versions of files. Old revisions are removed regularly using a revision strategy. Both revision creation and revision cleanup are handled asynchronously.

Revision creation job


Script	/etc/irods/yoda-ruleset/tools/async-data-revision.py
Purpose	create revisions of data objects
Lock file	/tmp/irods-async-data-revision.py.lock
Scheduling	cronjob, queue based on data object metadata attributes
Typically started by	cronjob, runs every ten minutes

Data objects are marked for revision creation using a metadata attribute. The default name of this attribute is org_revision_scheduled.

The script has a verbose mode (which can be enabled using the -v switch). This will log additional information for troubleshooting to the rodsLog.

The script has a dry run mode (which can be enabled using the -n switch). In this mode, no revisions are created.

If a flag named /ZONE/yoda/flags/stop_revisions is present, the script will stop processing data objects. See the page about setting job flags for more information.

Revision cleanup job


Script	/var/lib/irods/.irods/cronjob-revision-cleanup.sh
Purpose	remove unneeded revisions of data objects, as per revision strategy
Lock file	no lock file
Typically started by	daily cronjob

Statistics

The statistics module provides users with an overview of the amount of data stored in Yoda groups and communities.

Statistics job


Script	/etc/irods/yoda-ruleset/tools/storage-statistics.r
Purpose	record the size of stored data in group metadata
Lock file	no lock file
Typically started by	cronjob, runs at least daily

Archiving and publication

Asynchronous jobs are also used to copy data packages from a research folder to the vault, as well as to process publications.

Copy to vault job


Script	/etc/irods/yoda-ruleset/tools/copy-to-vault.r
Purpose	copy data packages from research groups to the vault
Lock file	no lock file, but collection metadata attribute records processing status
Typically started by	cronjob, runs every 5 minutes

By default, groups that are to be copied to the vault are marked with a metadata attribute named cronjob_copy_to_vault.
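Recording the processing status in metadata can take the place of a lock file: a run only picks up items that are still pending and marks them before it starts working on them. The sketch below illustrates this idea with an in-memory dictionary standing in for the metadata attribute. The status values and function names are hypothetical and are not the values used by Yoda.

```python
# Hypothetical status values for this sketch.
PENDING, PROCESSING, DONE, FAILED = "pending", "processing", "done", "failed"


def run_copy_jobs(status_by_collection, copy):
    """Copy every pending collection and record the outcome as its status."""
    for collection, status in list(status_by_collection.items()):
        if status != PENDING:
            continue  # already claimed by another run, or finished
        # Claim the item before starting the work.
        status_by_collection[collection] = PROCESSING
        try:
            copy(collection)
        except Exception:
            status_by_collection[collection] = FAILED
        else:
            status_by_collection[collection] = DONE


# Hypothetical collection paths; the dict stands in for collection metadata.
statuses = {
    "/ZONE/home/research-example/package-a": PENDING,
    "/ZONE/home/research-example/package-b": PROCESSING,
}
run_copy_jobs(statuses, copy=lambda collection: print("copying", collection))
print(statuses)
```

In a real deployment the status update has to be safe against two runs claiming the same item at the same time, which a plain dictionary does not demonstrate.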

Process publication job


Script	/etc/irods/yoda-ruleset/tools/process-publication.r
Purpose	handle publication and depublication of data packages asynchronously
Lock file	no lock file, but status is recorded in metadata
Typically started by	cronjob, runs every minute