Asynchronous processes
This page contains an overview of asynchronous processes in Yoda.
Table of contents
- Metadata
  - Schema update job
- Replication
  - Replication job
- Revision management
  - Revision creation job
  - Revision cleanup job
- Statistics
  - Statistics job
- Archiving and publication
  - Retry copy to vault job
  - Process publication job
Metadata
Metadata changes are handled synchronously, except for batch updates of data package metadata after schema updates.
Schema update job
Script | /etc/irods/yoda-ruleset/tools/check-metadata-for-schema-updates.r |
Purpose | verify and update data package metadata to new schema versions |
Lock file | no lock file |
Scheduling | delayed rule queue |
Typically started by | manually by an application administrator after a schema or application upgrade |
Any changes to data package metadata will be recorded in the rodsLog.
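The chained upgrade that such a job performs can be pictured with the minimal Python sketch below. It is not the Yoda ruleset's code: the schema identifiers (default-0, default-1, default-2), the field names and the upgrade_metadata helper are hypothetical, and the sketch assumes each transformation upgrades metadata by exactly one schema version.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: source schema id -> (target schema id, transformation).
Transformations = Dict[str, Tuple[str, Callable[[dict], dict]]]


def upgrade_metadata(metadata: dict, schema: str, latest: str,
                     transformations: Transformations) -> Tuple[dict, str]:
    """Apply one transformation at a time until the latest schema is reached."""
    while schema != latest:
        if schema not in transformations:
            raise ValueError(f"no transformation available for schema {schema}")
        schema, transform = transformations[schema]
        metadata = transform(metadata)
    return metadata, schema


# Example chain with made-up schema ids and fields.
EXAMPLE_TRANSFORMATIONS: Transformations = {
    # default-0 -> default-1: add a field with a default value
    "default-0": ("default-1", lambda m: {**m, "Language": m.get("Language", "en")}),
    # default-1 -> default-2: drop a field that the new schema no longer has
    "default-1": ("default-2", lambda m: {k: v for k, v in m.items() if k != "Obsolete"}),
}

upgraded, schema = upgrade_metadata({"Title": "Example", "Obsolete": 1},
                                    "default-0", "default-2", EXAMPLE_TRANSFORMATIONS)
```

Applying one step at a time means a data package that is several versions behind passes through every intermediate schema, so only one transformation per version step is needed rather than one for every pair of versions.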
Replication
By default, data in Yoda is replicated across two servers. Policies add a metadata attribute
to data objects that should be replicated, and the asynchronous replication job replicates these
objects. The default name of the metadata attribute is org_replication_scheduled; the attribute value contains the source and destination resources, separated by commas.
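As an illustration, the following Python sketch splits such an attribute value into its parts. It assumes the value holds exactly two comma-separated fields, source first and destination second; the resource names and the parse_replication_value helper are hypothetical.

```python
from typing import NamedTuple


class ReplicationRequest(NamedTuple):
    source: str
    destination: str


def parse_replication_value(value: str) -> ReplicationRequest:
    """Split an attribute value of the assumed form 'source,destination'."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"unexpected replication attribute value: {value!r}")
    return ReplicationRequest(source=parts[0], destination=parts[1])


request = parse_replication_value("irodsResc,irodsRescRepl")  # example resource names
print(request.source, "->", request.destination)
```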
Replication job
Script | /etc/irods/yoda-ruleset/tools/async-data-replicate.py |
Purpose | replicate data objects, also handles checksumming |
Lock file | /tmp/irods-async-data-replicate.py.lock |
Scheduling | cronjob, data object queue based on data object metadata attributes |
Typically started by | cronjob, runs every five minutes |
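The lock file prevents a new cron run from starting while a previous run is still busy. A minimal Python sketch of that pattern, using an exclusive, non-blocking flock on the lock file, is shown below. Whether the Yoda script uses flock or another locking mechanism is an assumption here, and acquire_lock and run_exclusively are hypothetical names.

```python
import fcntl
from typing import IO, Callable, Optional


def acquire_lock(path: str) -> Optional[IO[str]]:
    """Return an open file holding an exclusive lock on path, or None if it is held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_exclusively(path: str, job: Callable[[], None]) -> bool:
    """Run job only if no other process holds the lock; report whether it ran."""
    lock = acquire_lock(path)
    if lock is None:
        return False  # a previous run is still busy: skip this cron cycle
    try:
        job()
    finally:
        lock.close()  # closing the file releases the lock
    return True
```

For example, run_exclusively("/tmp/irods-async-data-replicate.py.lock", job) returns False without calling job while another run holds the lock, so overlapping cron cycles simply skip their turn.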
The replication job handles replicating data objects to a replication resource. It also adds checksums to data objects that do not have a checksum yet.
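To cross-check a checksum registered in the catalog against a local copy of a file, the sketch below computes a checksum in the format iRODS uses for its SHA-256 scheme: the prefix sha2: followed by the Base64-encoded digest. It assumes the zone uses SHA-256 checksums (the iRODS default); irods_sha256_checksum is a hypothetical helper.

```python
import base64
import hashlib


def irods_sha256_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a checksum in the iRODS SHA-256 format: 'sha2:' + Base64(digest)."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")
```

The result can be compared with the checksum that iRODS reports for the data object, for example in the output of ils -L.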
Data objects are marked for replication using a metadata attribute. The default name of this attribute is org_replication_scheduled.
The script has a verbose mode (enabled with the -v switch), which logs additional troubleshooting information to the rodsLog.
The script has a dry run mode (enabled with the -n switch). In this mode, the script does not create any replicas.
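The two switches can be pictured with this Python sketch based on argparse. Only -v and -n come from this page; the parse_arguments and process helpers and the injected replicate_one and log callables are hypothetical stand-ins for the real job logic.

```python
import argparse
from typing import Callable, Iterable, List, Optional, Sequence


def parse_arguments(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sketch of an asynchronous job's switches")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="log additional troubleshooting information")
    parser.add_argument("-n", dest="dry_run", action="store_true",
                        help="dry run: report what would happen without changing anything")
    return parser.parse_args(argv)


def process(paths: Iterable[str], args: argparse.Namespace,
            replicate_one: Callable[[str], None], log: Callable[[str], None]) -> List[str]:
    """Replicate each path unless in dry run mode; return the paths actually replicated."""
    done = []
    for path in paths:
        if args.verbose:
            log(("would replicate " if args.dry_run else "replicating ") + path)
        if args.dry_run:
            continue  # dry run: no replicas are created
        replicate_one(path)
        done.append(path)
    return done
```

The revision creation job has the same two switches, so the same pattern applies there.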
If a flag named /ZONE/yoda/flags/stop_replication is present, the script stops processing data objects. See the page about setting job flags for more information.
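A processing loop that respects such a stop flag might look like the following Python sketch. It assumes the flag is checked before each data object; flag_exists stands in for a hypothetical catalog lookup, and tempZone is an example zone name.

```python
from typing import Callable, Iterable, List


def process_until_stopped(paths: Iterable[str], zone: str,
                          flag_exists: Callable[[str], bool],
                          handle: Callable[[str], None]) -> List[str]:
    """Handle data objects in order, stopping as soon as the stop flag appears."""
    flag = f"/{zone}/yoda/flags/stop_replication"
    processed = []
    for path in paths:
        if flag_exists(flag):
            break  # an administrator asked the job to stop
        handle(path)
        processed.append(path)
    return processed
```

Objects that were not processed keep their scheduling attribute in this model, so a later run can pick them up once the flag is removed.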
Revision management
Yoda supports revision management of data objects, so that users can recover older versions of files. Old revisions are removed regularly according to a revision strategy. Both revision creation and revision cleanup are handled asynchronously.
Revision creation job
Script | /etc/irods/yoda-ruleset/tools/async-data-revision.py |
Purpose | create revisions of data objects |
Lock file | /tmp/irods-async-data-revision.py.lock |
Scheduling | cronjob, queue based on data object metadata attributes |
Typically started by | cronjob, runs every ten minutes |
Data objects are marked for revision creation using a metadata attribute. The default name of this attribute is org_revision_scheduled.
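The queue therefore consists of the data objects that carry this attribute. In the Python sketch below, an in-memory dictionary of (attribute, value) pairs stands in for the iRODS catalog; a real job would query the catalog instead. The helper names and the example paths and values are hypothetical, and removing the attribute after processing is an assumption about how an object leaves the queue.

```python
from typing import Dict, List, Tuple

ATTRIBUTE = "org_revision_scheduled"  # default attribute name from this page
Catalog = Dict[str, List[Tuple[str, str]]]  # path -> list of (attribute, value)


def scheduled_objects(catalog: Catalog, attribute: str = ATTRIBUTE) -> List[str]:
    """Return the paths of data objects that carry the scheduling attribute."""
    return sorted(path for path, avus in catalog.items()
                  if any(name == attribute for name, _ in avus))


def mark_done(catalog: Catalog, path: str, attribute: str = ATTRIBUTE) -> None:
    """Remove the scheduling attribute so the object leaves the queue."""
    catalog[path] = [(name, value) for name, value in catalog[path] if name != attribute]
```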
The script has a verbose mode (enabled with the -v switch), which logs additional troubleshooting information to the rodsLog.
The script has a dry run mode (enabled with the -n switch). In this mode, the script does not create any revisions.
If a flag named /ZONE/yoda/flags/stop_revisions is present, the script stops processing data objects. See the page about setting job flags for more information.
Revision cleanup job
Script | /var/lib/irods/.irods/cronjob-revision-cleanup.sh |
Purpose | remove unneeded revisions of data objects according to the revision strategy |
Lock file | no lock file |
Typically started by | daily cronjob |
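Yoda's revision strategies are not described on this page. As a hedged illustration of how a time-bucketed strategy can decide which revisions to remove, the Python sketch below keeps at most a fixed number of the youngest revisions per age bucket and removes everything older than the last bucket. The bucket boundaries and limits are example values, not Yoda's defaults, and revisions_to_remove is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int]  # (upper age bound in seconds, maximum revisions kept)
HOUR = 3600


def revisions_to_remove(ages: Sequence[int], buckets: Sequence[Bucket]) -> List[int]:
    """Return the ages (in seconds) of revisions this example strategy would delete.

    Buckets must be ordered by increasing upper bound. Within each bucket the
    youngest revisions are kept; revisions older than the last bucket are removed.
    """
    remaining = sorted(ages)  # youngest first
    remove: List[int] = []
    lower = 0
    for upper, keep in buckets:
        in_bucket = [age for age in remaining if lower <= age < upper]
        remove.extend(in_bucket[keep:])
        lower = upper
    remove.extend(age for age in remaining if age >= lower)
    return sorted(remove)


# Example: keep 2 revisions younger than 6 hours and 1 between 6 and 24 hours.
EXAMPLE_BUCKETS = [(6 * HOUR, 2), (24 * HOUR, 1)]
```

With revisions aged 1, 2, 3, 7, 10 and 30 hours, this example strategy removes the 3, 10 and 30 hour old revisions.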
Statistics
The statistics module provides users with an overview of the amount of data stored in Yoda groups and communities.
Statistics job
Script | /etc/irods/yoda-ruleset/tools/storage-statistics.r |
Purpose | record the size of stored data in group metadata |
Lock file | no lock file |
Typically started by | cronjob, runs at least daily |
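As a rough illustration of the aggregation involved, the Python sketch below sums data object sizes per group from (logical path, size) pairs. It assumes group collections live directly under /ZONE/home/ and counts each data object once regardless of replicas; it is not the implementation used by storage-statistics.r, and storage_per_group is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def storage_per_group(objects: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum data object sizes (in bytes) per group collection under /<zone>/home/."""
    totals: Dict[str, int] = defaultdict(int)
    for path, size in objects:
        parts = path.strip("/").split("/")  # e.g. [zone, "home", group, ...]
        if len(parts) >= 3 and parts[1] == "home":
            totals[parts[2]] += size
    return dict(totals)
```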
Archiving and publication
Asynchronous jobs are also used to copy data packages from a research folder to the vault, as well as to process publications.
Retry copy to vault job
Script | /etc/irods/yoda-ruleset/tools/retry-copy-to-vault.r |
Purpose | copy data packages from research groups to the vault |
Lock file | no lock file, but collection metadata attribute records processing status |
Typically started by | cronjob, runs every 15 minutes (every 5 minutes on development environments) |
By default, collections that are to be copied to the vault are marked with a metadata attribute named cronjob_copy_to_vault.
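The retry behaviour can be sketched in Python as follows: every run picks up the marked collections, and a collection stays marked until it has been copied successfully, so failed copies are retried on the next cron run. This simplifies the real job, which records a processing status in the collection metadata; the in-memory model, the copy_to_vault callable and the example paths are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

MARKER = "cronjob_copy_to_vault"  # default attribute name from this page


def copy_marked_collections(collections: Dict[str, Set[str]],
                            copy_to_vault: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Try to copy every marked collection; return (copied, failed) paths.

    collections maps a collection path to the set of attribute names on it.
    """
    copied: List[str] = []
    failed: List[str] = []
    for path in sorted(p for p, attributes in collections.items() if MARKER in attributes):
        if copy_to_vault(path):
            collections[path].discard(MARKER)  # done: leave the queue
            copied.append(path)
        else:
            failed.append(path)  # marker stays, so the next cron run retries
    return copied, failed
```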
Process publication job
Script | /etc/irods/yoda-ruleset/tools/process-publication.r |
Purpose | handle publication and depublication of data packages asynchronously |
Lock file | no lock file, but status is recorded in metadata |
Typically started by | cronjob, runs every minute |