04 Oct Unlocking Global Satellite data using C-SCALE infrastructure with openEO
Aqua Monitor provides a global view of land-to-water and water-to-land changes over the last 35 years. A notebook including course material was developed to help anyone get started with global EO data using openEO, and the algorithm is translated and broken down in detail for anyone to apply it to their use case.
With increasing volumes of Earth observation (EO) data becoming readily available, the need for a holistic way of working with these big data also increases. openEO is a proposed solution to this problem, and its recent developments are being explored by porting the existing EO product Aqua Monitor from Google Earth Engine to the openEO processing back-end. Aqua Monitor tracks global land-to-water changes, and vice versa, over the last 35 years. The multi-petabyte volume of data used in this application poses an interesting challenge for the European Open Science Cloud (EOSC) infrastructures and services.
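To make the idea of land-to-water and water-to-land change concrete, here is a minimal sketch in plain Python. It is not the Aqua Monitor algorithm itself: the choice of the NDWI water index (green and near-infrared bands), the threshold of 0.0, the function names and the tiny sample grids are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def ndwi(green: float, nir: float) -> float:
    """Normalised difference water index for one pixel (assumed index choice)."""
    total = green + nir
    # Guard against division by zero for empty pixels.
    return (green - nir) / total if total != 0 else 0.0


def classify_change(ndwi_before: float, ndwi_after: float,
                    threshold: float = 0.0) -> str:
    """Label one pixel by comparing its water state in two periods."""
    was_water = ndwi_before > threshold  # threshold is an illustrative assumption
    is_water = ndwi_after > threshold
    if was_water and not is_water:
        return "water-to-land"
    if is_water and not was_water:
        return "land-to-water"
    return "no change"


def change_map(green_before: Grid, nir_before: Grid,
               green_after: Grid, nir_after: Grid,
               threshold: float = 0.0) -> List[List[str]]:
    """Apply the per-pixel classification to two small reflectance grids."""
    result = []
    for gb_row, nb_row, ga_row, na_row in zip(
            green_before, nir_before, green_after, nir_after):
        result.append([
            classify_change(ndwi(gb, nb), ndwi(ga, na), threshold)
            for gb, nb, ga, na in zip(gb_row, nb_row, ga_row, na_row)
        ])
    return result


# Hypothetical 2x2 reflectance values for an earlier and a later period.
green_before = [[0.10, 0.30], [0.10, 0.30]]
nir_before = [[0.40, 0.05], [0.40, 0.05]]
green_after = [[0.30, 0.08], [0.10, 0.30]]
nir_after = [[0.05, 0.35], [0.40, 0.05]]

changes = change_map(green_before, nir_before, green_after, nir_after)
```

In a real workflow each "before" and "after" grid would be a composite over many scenes rather than a single observation, which is where the data volume comes from.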
Within C-SCALE, the Google Earth Engine-based Aqua Monitor algorithm is translated to openEO, providing a European alternative for processing large volumes of EO data. By implementing the use case, data and compute requirements are communicated to the EOSC providers, and as a result the openEO backend has become easier to install on the distributed infrastructures available in Europe. To work towards planetary-scale big data analytics, the infrastructure and data availability are tested by running the workflow on large spatiotemporal scales.
The current algorithm uses a multi-petabyte dataset and requires parallel processing on large tiles. This presents significant data challenges for the infrastructures in terms of storage, rapid access and compute, as the openEO backend used is based on Apache Spark, which keeps (most) data in memory to do calculations. The current implementation on the openEO backend is a proof-of-concept version; the algorithm is successfully ported and applied but limited in spatiotemporal scale. Scaling this up requires considerable resources, which should be allocated in collaboration with backend providers.
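As a rough illustration of processing on tiles, the sketch below splits a bounding box into fixed-size tiles that could each be handled independently. The function name, the (west, south, east, north) tuple layout, the tile size in degrees and the example extent are assumptions for this example; it does not show how the openEO backend itself partitions data.

```python
import math
from typing import List, Tuple

# Assumed layout: (west, south, east, north) in degrees.
BBox = Tuple[float, float, float, float]


def split_into_tiles(bbox: BBox, tile_size: float) -> List[BBox]:
    """Split a bounding box into tiles of at most tile_size degrees per side."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    west, south, east, north = bbox
    n_cols = math.ceil((east - west) / tile_size)
    n_rows = math.ceil((north - south) / tile_size)
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            tile_west = west + col * tile_size
            tile_south = south + row * tile_size
            # Clip the last row and column so tiles never exceed the extent.
            tile_east = min(tile_west + tile_size, east)
            tile_north = min(tile_south + tile_size, north)
            tiles.append((tile_west, tile_south, tile_east, tile_north))
    return tiles


# Hypothetical extent of 2 by 1 degrees, split into 1-degree tiles.
tiles = split_into_tiles((4.0, 51.0, 6.0, 52.0), tile_size=1.0)
```

Because the tiles do not overlap, each one can be submitted as a separate job, and the tile size becomes the knob that trades memory per job against the number of jobs.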
The next challenge is that the analysis code should be understandable to experts without a technical background: our aim is to share this wealth of data not only with technical experts, but also with the scientists who best know where and how to apply the Aqua Monitor and similar algorithms.
Support from C-SCALE
The implementation of Aqua Monitor was part of the start of the C-SCALE project. INCD and INFN started by installing the openEO GeoTrellis backend on their infrastructure, with the help of the backend authors from VITO. So far, the current implementation has been run on the openEO Platform and VITO infrastructure; the implementation on the other two backends will be tested later this year.
C-SCALE was also afforded the opportunity to present the openEO Aqua Monitor implementation at the recent UN / Austria symposium organised by UNOOSA. There we gave a workshop on the Jupyter Notebook containing the code, including exercises so that the participants can start creating their own openEO-based applications.
C-SCALE services used
The openEO Aqua Monitor uses, or will use, the following C-SCALE services:
- EGI Check-in for authorization.
- The openEO Platform JupyterLab environment for teaching purposes.
- The openEO Platform backend for data access and as a processing platform.
Testimony by a researcher at Deltares
who implements the Aqua Monitor and Global Water Watch use cases in C-SCALE
- We need a processing platform for EO data. C-SCALE helped us learn about, get access to and obtain support for openEO, which we did not have before.
- EGI Check-in makes it easy to solve the authorization problem, which can be time-consuming for projects with many partners.
- The openEO Platform creates a great place where all backends, including their data access, can be collated, and is promising as the ecosystem grows.