rook

rook (the bird)

The rook belongs to the crow family …

rook

Remote Operations On Klimadaten.

Rook is a Web Processing Service (WPS) of the roocs project that allows remote operations, such as subsetting, on climate model data. This service provides a one-to-one mapping to the operations available in the daops library, which is based on xarray.
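For orientation, the kind of daops call that a rook subset request maps to looks roughly like the sketch below. This is a hypothetical illustration only: the exact module path, signature and parameter formats depend on the daops version, so consult the daops documentation.

# Hypothetical sketch; the exact daops signature may differ between versions.
from daops.ops.subset import subset

result = subset(
    "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619",
    time="2016-01-01/2020-12-30",   # temporal subset
    area="0,10,175,90",             # lon/lat bounding box (illustrative)
    output_dir="/tmp/rook-output",  # where the NetCDF output is written
)
print(result)  # the returned object describes the generated NetCDF outputs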

Documentation

Learn more about rook in its official documentation at https://rook-wps.readthedocs.io.

Submit bug reports, questions and feature requests at https://github.com/roocs/rook/issues

Contributing

You can find information about contributing in our Developer Guide.

Please use bumpversion to release a new version.

Tests

The tests folder includes additional tests for a deployed rook service.

  • Smoke test: ensure service is operational. See tests/smoke/README.md.

  • Storm test: load-test using locust. See tests/storm/README.md.
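For reference, a locust load test against a deployed rook endpoint can be as small as the sketch below. This is a minimal illustration assuming the default WPS endpoint, not the actual locustfile from tests/storm/.

# locustfile.py -- minimal sketch of a load test against a deployed rook service.
from locust import HttpUser, between, task


class RookUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def get_capabilities(self):
        # Hit the WPS GetCapabilities endpoint of the deployed service.
        self.client.get(
            "/wps",
            params={"service": "WPS", "version": "1.0.0", "request": "GetCapabilities"},
        )

Run it with locust -f locustfile.py --host http://localhost:5000 (or point --host at the deployed service); the real storm test in tests/storm/ is the reference.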

License

Free software: Apache Software License 2.0

Credits

This package was created with Cookiecutter and the bird-house/cookiecutter-birdhouse project template.

Installation

Install from Conda

Warning

TODO: Prepare Conda package.

Install from GitHub

Check out code from the rook GitHub repo and start the installation:

$ git clone https://github.com/roocs/rook.git
$ cd rook

Create Conda environment named rook:

$ conda env create -f environment.yml
$ source activate rook

Install rook app:

$ pip install -e .
OR
$ make install

For development you can use this command:

$ pip install -e ".[dev]"
OR
$ make develop

Configure roocs

rook uses daops for its operations and needs a roocs.ini configuration file. You can override the defaults by setting the environment variable ROOCS_CONFIG.

$ export ROOCS_CONFIG=~/.roocs.ini

There is an example in etc/sample-roocs.ini.

For more information on the configuration settings, see https://roocs-utils.readthedocs.io/en/latest/configuration.html
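When working from Python directly (for example in a notebook), the same setting can be made before daops reads its configuration; a minimal sketch, assuming the file lives at ~/.roocs.ini:

import os

# Point roocs/daops at a custom configuration file; this must happen
# before daops (or rook) loads its configuration.
os.environ["ROOCS_CONFIG"] = os.path.expanduser("~/.roocs.ini")

import daops  # imported only after the environment is prepared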

Start rook PyWPS service

After a successful installation you can start the service using the rook command line:

$ rook --help # show help
$ rook start  # start service with default configuration

OR

$ rook start --daemon # start service as daemon
loading configuration
forked process id: 42

The deployed WPS service is by default available at:

http://localhost:5000/wps?service=WPS&version=1.0.0&request=GetCapabilities
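To verify from Python that the service is answering, you can fetch the capabilities document with the requests library; a minimal sketch assuming the default host and port:

import requests

# Request the WPS capabilities document from the running rook service.
resp = requests.get(
    "http://localhost:5000/wps",
    params={"service": "WPS", "version": "1.0.0", "request": "GetCapabilities"},
)
resp.raise_for_status()
print(resp.text[:200])  # should show the start of an XML Capabilities document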

Note

Remember the process ID (PID) so you can stop the service with kill PID.

You can find which process uses a given port using the following command (here for port 5000):

$ netstat -nlp | grep :5000

Check the log files for errors:

$ tail -f pywps.log
… or do it the lazy way

You can also use the Makefile to start and stop the service:

$ make start
$ make status
$ tail -f pywps.log
$ make stop

Run rook as Docker container

You can also run rook as a Docker container.

Warning

TODO: Describe Docker container support.

Use Ansible to deploy rook on your System

Use the Ansible playbook for PyWPS to deploy rook on your system.

Configuration

Command-line options

You can override the default PyWPS configuration by using command-line options. See the rook help to find out which options are available:

$ rook start --help
--hostname HOSTNAME        hostname in PyWPS configuration.
--port PORT                port in PyWPS configuration.

Start service with different hostname and port:

$ rook start --hostname localhost --port 5001

Use a custom configuration file

You can override the default PyWPS configuration by providing your own PyWPS configuration file (just modify the options you want to change). Use one of the existing sample-*.cfg files as an example and copy it to etc/custom.cfg.

For example, change the hostname (demo.org) and the logging level:

$ cd rook
$ vim etc/custom.cfg
$ cat etc/custom.cfg
[server]
url = http://demo.org:5000/wps
outputurl = http://demo.org:5000/outputs

[logging]
level = DEBUG

Start the service with your custom configuration:

# start the service with this configuration
$ rook start -c etc/custom.cfg

Developer Guide

Warning

To create new processes, look at the examples in Emu.

Building the docs

First install dependencies for the documentation:

$ make develop

Run the Sphinx docs generator:

$ make docs

Add pre-commit hooks

Before committing your changes, we ask that you install pre-commit in your environment. pre-commit runs git hooks that ensure your code follows the project's style and that catch and correct small errors or inconsistencies when you run git commit:

$ conda install -c conda-forge pre_commit
$ pre-commit install

Running tests

Run tests using pytest.

First activate the rook Conda environment and install pytest.

$ source activate rook
$ pip install -r requirements_dev.txt  # if not already installed
OR
$ make develop

Configure PyWPS with the path to the test data by setting the PYWPS_CFG environment variable:

$ export PYWPS_CFG=/path/to/test/pywps.cfg

Run quick tests (skip slow and online):

$ pytest -m 'not slow and not online'

Run all tests:

$ pytest

Check PEP 8 compliance:

$ flake8

Run tests the lazy way

Do the same as above using the Makefile.

$ make test
$ make test-all
$ make lint

Prepare a release

Update the Conda specification file to build identical environments on a specific OS.

Note

You should run this on your target OS, in our case Linux.

$ conda env create -f environment.yml
$ source activate rook
$ make clean
$ make install
$ conda list -n rook --explicit > spec-list.txt

Bump a new version

Make a new version of rook with the following steps:

  • Make sure everything is committed to GitHub.

  • Update CHANGES.rst with the next version.

  • Dry Run: bumpversion --dry-run --verbose --new-version 0.8.1 patch

  • Do it: bumpversion --new-version 0.8.1 patch

  • … or: bumpversion --new-version 0.9.0 minor

  • Push it: git push

  • Push tag: git push --tags

See the bumpversion documentation for details.

Notebooks

You can use the rooki Python client to interact with the rook service. See the online notebooks for examples.
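A minimal sketch of a rooki call against a deployed rook service is shown below. It only uses the operators that also appear in the provenance example further down, and it assumes that rooki reads the service URL from the ROOK_URL environment variable (check the rooki documentation for the exact name).

import os

# Assumed: rooki picks up the rook service URL from ROOK_URL.
os.environ["ROOK_URL"] = "http://localhost:5000/wps"

from rooki import operators as ops

wf = ops.Subset(
    ops.Input(
        "tas",
        ["c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619"],
    ),
    time="2016-01-01/2020-12-30",
)
resp = wf.orchestrate()
print(resp.download_urls())  # URLs of the generated NetCDF files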

Processes

Subset

Average

Orchestrate

Provenance

Introduction

The rook processes record provenance information about the details of each process execution. This information includes:

  • used software and versions (rook, daops, …)

  • applied operators like subset and average

  • used input data and parameters (CMIP6 dataset, time, area)

  • generated outputs (NetCDF files)

  • execution time (start-time and end-time)

This information is described using the W3C PROV standard and the Python prov library.
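As a simplified illustration of how such a record is put together with the prov package, the sketch below builds a tiny document with two agents, one activity and two entities; the identifiers and attributes are illustrative, not rook's exact ones.

from prov.model import ProvDocument

doc = ProvDocument()
doc.set_default_namespace("http://purl.org/roocs/prov#")
doc.add_namespace("dcterms", "http://purl.org/dc/terms/")

# Agents: the software carrying out the work.
rook = doc.agent("rook", {"prov:type": "prov:SoftwareAgent"})
daops = doc.agent("daops", {"prov:type": "prov:SoftwareAgent"})
doc.actedOnBehalfOf(daops, rook)  # daops acts for the rook service

# Entities: the input dataset and the generated NetCDF file.
source = doc.entity("c3s-cmip6-dataset")
output = doc.entity("tas_subset.nc")

# Activity: the subset operator, carried out by daops.
subset = doc.activity("subset", other_attributes={"time": "2016-01-01/2020-12-30"})
doc.wasAssociatedWith(subset, daops)

# Relation: the output was derived from the input by the subset activity.
doc.wasDerivedFrom(output, source, activity=subset)

print(doc.serialize(indent=2))  # PROV-JSON, like the example further below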

Overview of PROV

The W3C PROV Primer document gives an overview of the W3C PROV standard.

[Image: overview of a PROV document (prov-overview.png)]

A PROV document consists of agents, activities and entities. These can be connected via PROV relations like wasDerivedFrom.

Entities
W3C PROV

In PROV, physical, digital, conceptual, or other kinds of thing are called entities.

In rook we use entities for:

  • workflow description,

  • input datasets and

  • resulting output NetCDF files.

Activities
W3C PROV

Activities are how entities come into existence and how their attributes change to become new entities, often making use of previously existing entities to achieve this.

In rook we use activities for:

  • operators like subset and average.

  • processes like orchestrate to run a workflow.

Agent
W3C PROV

An agent takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. An agent can be a person, a piece of software or an organisation.

In rook we use agents for:

  • software like rook and daops,

  • organisations like Copernicus Climate Data Store.

Namespaces
W3C PROV

Using URIs and namespaces, a provenance record can draw from multiple sources on the Web.

We use namespaces to reference existing PROV vocabularies like prov:SoftwareAgent, together with the provone, dcterms and roocs default namespaces listed in the prefix section of the PROV-JSON example below.

Subset Example
[Image: provenance of a subset operation (prov-subset.png)]

The subset activity is started by the software agent daops (a Python library), which in turn was triggered by rook (the data-reduction service).

The NetCDF file entity tas_day_...nc was derived from the c3s-cmip6 dataset entity by the subset activity.

Workflow Example
[Image: provenance of a workflow execution (prov-workflow.png)]
W3C PROV Plans

Activities may follow pre-defined procedures, such as recipes, tutorials, instructions, or workflows. PROV refers to these, in general, as plans.

In W3C PROV, workflows are called plans.

The orchestrate activity is started by the agent rook. It uses a workflow document entity (the plan), which consists of a subset and an average activity. These activities are started by the software agent daops.

Example: Workflow with Subsetting Operators

The rooki client for rook provides example notebooks for executing processes and displaying the provenance information.

You can run the orchestrate process to execute a workflow with subsetting operators and show the provenance document:

from rooki import operators as ops

wf = ops.Subset(
    ops.Subset(
        ops.Input(
            'tas', ['c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619']
        ),
        time="2016-01-01/2020-12-30",
    ),
    time="2017-01-01/2017-12-30",
)
resp = wf.orchestrate()
# show URLs of output files
resp.download_urls()
# show URL to provenance document
resp.provenance()
# show URL to provenance image
resp.provenance_image()

The response of the process includes a provenance document in PROV-JSON format:

{
  "prefix": {
    "provone": "http://purl.dataone.org/provone/2015/01/15/ontology#",
    "dcterms": "http://purl.org/dc/terms/",
    "default": "http://purl.org/roocs/prov#"
  },
  "agent": {
    "copernicus_CDS": {
      "prov:type": "prov:Organization",
      "dcterms:title": "Copernicus Climate Data Store"
    },
    "rook": {
      "prov:type": "prov:SoftwareAgent",
      "dcterms:source": "https://github.com/roocs/rook/releases/tag/v0.2.0"
    },
    "daops": {
      "prov:type": "prov:SoftwareAgent",
      "dcterms:source": "https://github.com/roocs/daops/releases/tag/v0.3.0"
    }
  },
  "wasAttributedTo": {
    "_:id1": {
      "prov:entity": "rook",
      "prov:agent": "copernicus_CDS"
    }
  },
  "entity": {
    "workflow": {
      "prov:type": "provone:Workflow"
    },
    "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619": {},
    "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc": [{}, {}],
    "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc": {}
  },
  "activity": {
    "orchestrate": [{
      "prov:startedAtTime": "2021-02-15T13:24:33"
    }, {
      "prov:endedAtTime": "2021-02-15T13:24:57"
    }],
    "subset_tas_1": {
      "time": "2016-01-01/2020-12-30",
      "apply_fixes": false
    },
    "subset_tas_2": {
      "time": "2017-01-01/2017-12-30",
      "apply_fixes": false
    }
  },
  "wasAssociatedWith": {
    "_:id2": {
      "prov:activity": "orchestrate",
      "prov:agent": "rook",
      "prov:plan": "workflow"
    },
    "_:id3": {
      "prov:activity": "subset_tas_1",
      "prov:agent": "daops",
      "prov:plan": "workflow"
    },
    "_:id5": {
      "prov:activity": "subset_tas_2",
      "prov:agent": "daops",
      "prov:plan": "workflow"
    }
  },
  "wasDerivedFrom": {
    "_:id4": {
      "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
      "prov:usedEntity": "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.day.tas.gr1.v20190619",
      "prov:activity": "subset_tas_1"
    },
    "_:id6": {
      "prov:generatedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20170101-20171229.nc",
      "prov:usedEntity": "tas_day_INM-CM5-0_ssp245_r1i1p1f1_gr1_20160101-20201229.nc",
      "prov:activity": "subset_tas_2"
    }
  }
}

This provenance document can also be displayed as an image:

[Image: Provenance Example]
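You can reproduce this rendering locally from the PROV-JSON document with the prov package and Graphviz; a minimal sketch with placeholder file names:

from prov.dot import prov_to_dot
from prov.model import ProvDocument

# Load the PROV-JSON document returned by the service ...
with open("prov.json") as f:
    doc = ProvDocument.deserialize(f)

# ... and render it to PNG via Graphviz/pydot.
prov_to_dot(doc).write_png("prov.png")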

Changes

0.8.2 (2022-05-16)

  • Updated to daops 0.8.1 and clisops 0.9.1 (#211).

  • Added tests to check correct metadata (#211).

0.8.1 (2022-04-20)

  • Updated to roocs-utils 0.6.1 (#209).

  • Fixed director for new average_time operator (#208).

  • Added smoke tests for c3s-cmip5 and c3s-cordex (#208, #209).

0.8.0 (2022-04-14)

  • Added “average” and “average_time” operators (#191, #206).

  • Removed “diff” operator (#204).

  • Cleaned up workflow and tests (#205).

  • Added changes for CMIP6 decadal (#202).

  • Updated to daops 0.8.0 (#207).

  • Updated to clisops 0.9.0 (#207).

  • Updated to latest bokeh 2.4.2 in dashboard (#207).

  • Updated pre-commit (#207).

  • Updated pywps 4.5.2 (#203, #207).

0.7.0 (2021-11-08)

  • Added “subset-by-point” (#190).

  • Updated to clisops 0.7.0.

  • Updated to daops 0.7.0.

  • Updated dashboard (#195).

  • Updated provenance namespace (#188).

0.6.2 (2021-08-11)

  • Update pywps 4.4.5 (#186).

  • Updated provenance types and ids (#184).

  • Update dashboard (#183).

0.6.1 (2021-06-18)

  • Added initial dashboard (#182).

  • Update clisops 0.6.5.

0.6.0 (2021-05-20)

  • Inventory urls removed from etc/roocs.ini. Intake catalog url now lives in daops. (#175)

  • Intake catalog base and search functionality moved to daops. Database intake implementation remains in rook. (#175)

  • Updated to roocs-utils 0.4.2.

  • Updated to clisops 0.6.4.

  • Updated to daops 0.6.0.

  • Added initial usage process (#178)

0.5.0 (2021-04-01)

  • Updated pywps 4.4.2.

  • Updated clisops 0.6.3.

  • Updated roocs-utils 0.3.0.

  • Use FileMapper for search results (#169).

  • Using intake catalog (#148).

0.4.2 (2021-03-22)

  • Updated clisops 0.6.2

0.4.1 (2021-03-21)

  • Updated pywps 4.4.1 (#162, #154, #151).

  • Use pywps storage_copy_function=link (#154).

  • Updated director with InvalidCollection error (#153).

  • Added locust (storm) tests (#141, #149, #155).

  • Updated smoke tests (#134, #137).

  • Cleaned requirements (#152).

  • Fixed warning in workflow yaml loaded (#142).

  • Removed original files option for average and added test (#136).

0.4.0 (2021-03-04)

  • Removed cfunits, udunits2, cf-xarray and python-dateutil as dependencies.

  • Use daops>=0.5.0

  • Renamed axes input of wps_average.Average to dims

  • Added wps_average to work with daops.ops.average (#126)

  • Fixed tests for new inventory (#127)

  • Use apply_fixes=False for average (#129)

  • Added smoke tests (#131, #134)

0.3.1 (2021-02-24)

  • Pin cf_xarray <0.5.0 … does not work with daops/clisops.

0.3.0 (2021-02-24)

  • Fixed testdata using git-python (#123).

  • Removed xfail where not needed (#121).

  • Updated PyWPS 4.4.0 (#120).

  • Updated provenance (#112, #114, #119).

  • Fixed subset alignment (#117).

  • apply_fixes and original_files option added for WPS processes and the Operator class (#111).

  • Replaced travis with GitHub CI (#104).

  • director module added. This makes decisions on what is returned: NetCDF files or original file URLs (#77, #83).

  • python-dateutil>=2.8.1 added as a new dependency.

  • Allow no inventory option when processing datasets

  • c3s-cmip6 dataset ids must now be identified by the use of c3s-cmip6 (#87).

  • Fixed workflow (#79, #75, #71).

0.2.0 (2020-11-19)

Changes:

  • Build on cookiecutter template with cruft update.

  • Available processes: subset, orchestrate.

  • Using daops for subsetting operation.

  • Using a simple workflow implementation for combining operators.

  • Process outputs are provided as Metalink documents.

  • Added initial support for provenance documentation.

0.1.0 (2020-04-03)

  • First release.
