Asynchronous processes

This page contains an overview of asynchronous processes in Yoda.

Metadata

Metadata changes are handled synchronously, except for batch updates of data package metadata after schema updates.

Schema update job

   
Script: /etc/irods/yoda-ruleset/tools/check-metadata-for-schema-updates.r
Purpose: verify and update data package metadata to new schema versions
Lock file: no locking
Scheduling: delayed rule queue
Typically started by: application administrator, manually, after a schema or application upgrade

Any changes to data package metadata will be recorded in the rodsLog.

Replication

By default, data in Yoda is replicated across two servers. Policies add a metadata attribute to data objects that should be replicated, and the asynchronous replication job replicates these objects. The default name of the metadata attribute is org_replication_scheduled; the attribute value contains the source and destination resource, separated by commas.
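As an illustration of the attribute value format described above, the following minimal sketch splits a value into its source and destination resource. It assumes the value holds exactly two comma-separated fields; the function name, the class name, and the resource names in the example are hypothetical and are not taken from the Yoda ruleset.

```python
from typing import NamedTuple


class ReplicationRequest(NamedTuple):
    """Hypothetical container for a parsed org_replication_scheduled value."""
    source_resource: str
    destination_resource: str


def parse_replication_value(value: str) -> ReplicationRequest:
    """Split a 'source,destination' attribute value into its two resource names."""
    parts = [part.strip() for part in value.split(",")]
    # Assumption: exactly two non-empty fields, as described in the text above.
    if len(parts) != 2 or not all(parts):
        raise ValueError("unexpected attribute value: {!r}".format(value))
    return ReplicationRequest(parts[0], parts[1])


# Resource names below are placeholders for illustration only.
request = parse_replication_value("sourceResc,destResc")
print(request.source_resource, "->", request.destination_resource)
```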

Replication job

   
Script: /etc/irods/yoda-ruleset/tools/async-data-replicate.py
Purpose: replicate data objects; also handles checksumming
Lock file: /tmp/irods-async-data-replicate.py.lock
Scheduling: cronjob, data object queue based on data object metadata attributes
Typically started by: cronjob, runs every five minutes

The replication job handles replicating data objects to a replication resource. It also adds checksums to data objects that do not have a checksum yet.
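A lock file is a common way to keep a cronjob from starting a second run while the previous one is still busy. How the replication script implements its lock is not described on this page; the sketch below shows one widely used pattern based on fcntl.flock, with hypothetical function names.

```python
import fcntl


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    lock_file = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock; return True if it ran."""
    lock = acquire_lock(lock_path)
    if lock is None:
        return False  # another run is still active, so skip this one
    try:
        job()
        return True
    finally:
        lock.close()  # closing the file releases the lock
```

With this pattern, a run that overlaps with a previous one exits without doing any work, and the next scheduled run picks up the remaining queue.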

Data objects are marked for replication using a metadata attribute, named org_replication_scheduled by default.

The script has a verbose mode (which can be enabled using the -v switch). This will log additional information for troubleshooting to the rodsLog.

The script has a dry run mode (which can be enabled using the -n switch). This will not create any replications.

If a flag named /ZONE/yoda/flags/stop_replication is present, the script will stop processing data objects. See the page about setting job flags for more information.
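The stop flag behaviour can be sketched as a processing loop that checks for the flag before each item. This is an illustration only: the function names are hypothetical, the zone name in the usage example is a placeholder, and flag_is_set stands in for whatever iRODS lookup the real script performs.

```python
def flag_path(zone, flag_name):
    """Build the flag path following the /ZONE/yoda/flags/NAME layout."""
    return "/{}/yoda/flags/{}".format(zone, flag_name)


def process_queue(queue, handle, flag_is_set):
    """Handle queued items until the queue is empty or the stop flag appears.

    flag_is_set is a hypothetical callable that returns True once the flag exists.
    Returns the number of items that were handled.
    """
    processed = 0
    for item in queue:
        if flag_is_set():
            break  # leave the remaining items for a later run
        handle(item)
        processed += 1
    return processed


# Usage with stand-in values; "exampleZone" is a placeholder zone name.
print(flag_path("exampleZone", "stop_replication"))
```

Checking the flag once per item, rather than once per run, lets an administrator interrupt a long run without killing the process.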

Revision management

Yoda supports revision management of data objects, so that users can recover older versions of files. Old revisions are removed regularly using a revision strategy. Both revision creation and revision cleanup are handled asynchronously.

Revision creation job

   
Script: /etc/irods/yoda-ruleset/tools/async-data-revision.py
Purpose: create revisions of data objects
Lock file: /tmp/irods-async-data-revision.py.lock
Scheduling: cronjob, queue based on data object metadata attributes
Typically started by: cronjob, runs every ten minutes

Data objects are marked for revision creation using a metadata attribute, named org_revision_scheduled by default.

The script has a verbose mode (which can be enabled using the -v switch). This will log additional information for troubleshooting to the rodsLog.

The script has a dry run mode (which can be enabled using the -n switch). This will not create any revisions.
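To show how the -v and -n switches typically interact, here is a minimal argparse sketch. Only the two switch names come from this page; the function names, the log messages, and the action callable are hypothetical and do not reflect the actual script.

```python
import argparse


def parse_args(argv=None):
    """Parse the -v (verbose) and -n (dry run) switches."""
    parser = argparse.ArgumentParser(description="Sketch of an asynchronous job command line")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="log additional troubleshooting information")
    parser.add_argument("-n", dest="dry_run", action="store_true",
                        help="report what would be done without doing it")
    return parser.parse_args(argv)


def process(path, args, action, log=print):
    """Apply action(path) unless dry run is enabled; return True if it was applied."""
    if args.verbose:
        log("processing " + path)
    if args.dry_run:
        if args.verbose:
            log("dry run, skipping " + path)
        return False
    action(path)
    return True
```

Combining both switches is useful for troubleshooting: the job reports which data objects it would process without changing anything.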

If a flag named /ZONE/yoda/flags/stop_revisions is present, the script will stop processing data objects. See the page about setting job flags for more information.

Revision cleanup job

   
Script: /var/lib/irods/.irods/cronjob-revision-cleanup.sh
Purpose: remove unneeded revisions of data objects, as per revision strategy
Lock file: no lock file
Typically started by: daily cronjob

Statistics

The statistics module provides users with an overview of the amount of data stored in Yoda groups and communities.

Statistics job

   
Script: /etc/irods/yoda-ruleset/tools/storage-statistics.r
Purpose: record size of storage data in group metadata
Lock file: no lock file
Typically started by: cronjob, at least daily

Archiving and publication

Asynchronous jobs are also used to copy data packages from a research folder to the vault, as well as to process publications.

Retry copy to vault job

   
Script: /etc/irods/yoda-ruleset/tools/retry-copy-to-vault.r
Purpose: copy data packages from research groups to the vault
Lock file: no lock file, but a collection metadata attribute records processing status
Typically started by: cronjob, runs every 15 minutes (or every 5 minutes on development)

By default, groups that are to be copied to the vault are marked with a metadata attribute named cronjob_copy_to_vault.

Process publication job

   
Script: /etc/irods/yoda-ruleset/tools/process-publication.r
Purpose: handle publication and depublication of data packages asynchronously
Lock file: no lock file, but status is recorded in metadata
Typically started by: cronjob, runs every minute