ODC Google Sandbox
Contributed by the CEOS SEO
How to use the ODC-Google Sandbox
The Open Data Cube (ODC) Google Sandbox is a free and open programming interface that connects users to Google Earth Engine datasets. The open source tool allows users to run Python application algorithms using Google's Colab notebook environment. This tool demonstrates rapid creation of science products anywhere in the world without the need to download and process the satellite data. Some example applications include: scene-based cloud statistics, custom cloud-filtered mosaics, spectral index products including vegetation fractional cover, historic water extent, and vegetation land change. Basic operation of the tool will support many users for small-scale demonstrations and training but can also be scaled in size and scope, with Google Cloud resources, to support enhanced user needs.
Earth Engine authorization is required to run the Google Colab notebooks. This requires that you have a Google account with Earth Engine user authorization. Users with a Gmail address already have a Google account. The links below explain how to get a Google account and Earth Engine user authorization. Earth Engine authorization may take up to 2 days, but is much faster for existing Gmail, "edu" and "gov" addresses.
Creating a new Google Account: https://support.google.com/accounts/answer/27441
Getting Earth Engine Authorization: https://signup.earthengine.google.com/
Steps to Run a Notebook in the ODC-Google Sandbox
The quickest way to start running the ODC-Google Colab Sandbox is to open the notebook called "Getting Started: Open Data Cube on Google Colab" in Colab.
In this notebook, you will find an overview of ODC and Colab, simple interactions with the Data Cube, and in Section 5, links to explore several example applications.
If you’d like to access the applications directly, you can start by opening the GitHub folder with a list of sample Python notebooks: https://github.com/ceos-seo/odc-colab/tree/master/notebooks
Right-click on any notebook and select "Open Link in a New Tab". You can do this for up to 5 separate notebooks, as each one is considered a separate "session" and has its own dedicated computing instance with ~12 GB RAM and ~100 GB of storage.
Click on "Open in Colab" at the top of the notebook. This will prepare the notebook to run. An example of the Colab view and menu is shown below.
From the menu, select "Runtime > Run all" to run the notebook for the first time.
Wait about 90 seconds for the ODC and data index to load (the first 2 code cells).
When the notebook reaches cell #3, there is an Earth Engine authorization step. Click the Earth Engine authorization link to open another window. Select your Google account email address and then get the verification code. You can copy the code manually or click the copy icon to the right of the code. An example of this window is shown below. Each user and notebook will have a unique authorization code.
Return to the notebook, paste the verification code into the box, and press Return.
The rest of the code should then run to completion.
As a notebook is running, cells are numbered as they are completed. A cell that is currently being executed will have a "circle" moving around the cell number label. Users can also view the execution time in the banner at the very bottom of the notebook and view the resources being consumed (RAM and Storage) using the banner in the top-right of the notebook screen.
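The same resource figures can also be checked from a code cell. The following is a minimal sketch using only the Python standard library; the helper names (gib, instance_resources) are hypothetical and not part of the sandbox code, and the RAM query assumes a Linux system such as a Colab instance.

```python
import os
import shutil


def gib(num_bytes):
    """Convert a byte count to gibibytes."""
    return num_bytes / 1024 ** 3


def instance_resources(path="/"):
    """Return (total_ram_bytes or None, disk_total_bytes, disk_free_bytes)."""
    try:
        # Available on Linux; other platforms may not support these names.
        total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        total_ram = None
    usage = shutil.disk_usage(path)
    return total_ram, usage.total, usage.free


ram, disk_total, disk_free = instance_resources()
if ram is not None:
    print(f"Total RAM: {gib(ram):.1f} GiB")
print(f"Disk: {gib(disk_free):.1f} GiB free of {gib(disk_total):.1f} GiB")
```

This reports totals for the instance, not the memory used by the notebook itself, so the banner in the notebook remains the better guide to current consumption.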
NOTE: Each notebook must be run in a separate tab and will have its own dedicated Google processing instance. The Earth Engine authorization step is required for each separate notebook.
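The notebooks perform the authorization step for you. For readers curious about what such a cell typically does, here is a minimal sketch based on the public earthengine-api package (imported as ee). It is not taken from the sandbox code: the function name and the ee_module parameter are hypothetical, and recent versions of the library may also require a Cloud project argument when initializing.

```python
def authorize_earth_engine(ee_module=None):
    """Authenticate and initialize Earth Engine; return the module used.

    ee_module is a hypothetical hook for this sketch so the function can be
    exercised without the real library. By default it imports `ee`.
    """
    if ee_module is None:
        import ee as ee_module  # requires the earthengine-api package
    # Starts the interactive authorization flow (link and verification code).
    ee_module.Authenticate()
    # Opens the Earth Engine session using the stored credentials.
    ee_module.Initialize()
    return ee_module


# In a notebook cell one would call, once per notebook session:
# ee = authorize_earth_engine()
```

Because each notebook runs in its own instance, a call like this would need to be repeated in every notebook tab, which matches the note above.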
Running all or parts of a notebook after the first run
Once a notebook has run to completion, it is possible to make changes to the notebook and run it again. This does not require a new Earth Engine authorization unless the notebook is restarted from the beginning using one of the "Runtime > Restart" commands. To run the notebook with new edits (e.g. new region, new time window, new plot configuration), users can select “Runtime > Run after” from within the block where the edits were made. This will first run the current block and then continue running blocks until the end of the notebook is reached. A user may also select “Runtime > Run all” to run the entire notebook again. This is not the same as a restart, so it keeps the Colab setup and the current authorization. Finally, a common method of running notebooks is to run one block at a time. This is done by selecting the block and then pressing "Shift+Return". This runs only a single block, but it can be quickly repeated to run successive blocks.
Saving and sharing notebooks is possible using Google Drive. Users can save a copy with the "Copy to Drive" link at the top of the notebook or with the menu at "File > Save a copy in Drive". The saved file is stored in a "Colab Notebooks" folder on the user's Google Drive with an assigned name and "time stamp". The filename can be changed later from the user's Google Drive account. To share this notebook, users should click the "share" button in the Drive menu or right-click the filename to find the sharing link, then enter the name or email address of another Google user to give them access. These notebooks are generally small files, and users receive up to 15 GB of free Google Drive storage.
Several export options are available: downloading the Jupyter notebook as a *.ipynb file to your local drive, saving the notebook to your personal GitHub account, printing the notebook on a local printer, or saving the notebook as a PDF (select "Save as PDF" as the print destination).
Output files are saved in the temporary Colab instance. Users can find these output files by clicking the folder icon on the far-left menu. This folder structure will allow users to find the output file and download it to their local computer. Once the Colab instance is closed, the output file is lost.
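Since outputs disappear with the instance, it can help to bundle them into a single archive before downloading. The sketch below uses only the standard library; the function name bundle_outputs and the default file patterns are assumptions for illustration, not names used by the notebooks.

```python
import shutil
import tempfile
from pathlib import Path


def bundle_outputs(output_dir, archive_stem, patterns=("*.tif", "*.png", "*.csv")):
    """Zip files in output_dir that match patterns; return the zip path or None."""
    output_dir = Path(output_dir)
    matches = sorted(
        p for pattern in patterns for p in output_dir.glob(pattern) if p.is_file()
    )
    if not matches:
        return None
    with tempfile.TemporaryDirectory() as staging:
        # Copy the selected files to a staging folder so only they are archived.
        for path in matches:
            shutil.copy2(path, staging)
        archive = shutil.make_archive(str(archive_stem), "zip", root_dir=staging)
    return Path(archive)


# Hypothetical usage inside Colab, where /content is the working folder:
# archive = bundle_outputs("/content", "/content/odc_outputs")
# from google.colab import files
# files.download(str(archive))
```

The archive can also be downloaded by hand from the folder icon described above; the commented download call only works inside a Colab session.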
The background code for the ODC-Google Sandbox is located in these GitHub locations:
Summary of the Sample Notebooks and How to Make Modifications
Each of the baseline notebooks uses global Landsat-8 data. Users can change the region and time window to run sample cases anywhere in the world. It is suggested that users keep their regions and time windows similar in scale to those used in the baseline notebooks, as this will allow the code to run to completion in a reasonable amount of time (e.g. less than 5 minutes). Larger regions and longer time windows are possible, but they may exceed the limits of the Google Colab environment (12 GB RAM, 100 GB storage) or take a long time to run to completion. In addition to modifying regions and times, users may also want to modify plot settings or add their own code. Comments throughout the notebooks describe the details of code blocks. Look for "MODIFY HERE" statements in the code to identify blocks that users can easily modify to yield new results.
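Before enlarging a region or time window, a rough size estimate can indicate whether the request is likely to fit in the Colab instance. The following is a back-of-the-envelope sketch, not code from the notebooks: the variable names, the example coordinates, the band count, and the 16-bit pixel assumption are all illustrative, while the 30 m pixel size and 16-day repeat cycle are the nominal Landsat-8 values.

```python
import math
from datetime import date

# MODIFY HERE: hypothetical analysis region (degrees) and time window.
latitude = (-1.40, -1.20)
longitude = (36.70, 36.90)
time_window = (date(2020, 1, 1), date(2020, 12, 31))

PIXEL_SIZE_M = 30          # nominal Landsat-8 multispectral resolution
REVISIT_DAYS = 16          # Landsat-8 repeat cycle
BYTES_PER_VALUE = 2        # assumption: 16-bit integer pixel values
METERS_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def estimate_cube_bytes(latitude, longitude, time_window, bands=6):
    """Rough estimate of the uncompressed size of the loaded data, in bytes."""
    mid_lat = math.radians(sum(latitude) / 2)
    height_m = abs(latitude[1] - latitude[0]) * METERS_PER_DEGREE
    width_m = abs(longitude[1] - longitude[0]) * METERS_PER_DEGREE * math.cos(mid_lat)
    pixels = math.ceil(height_m / PIXEL_SIZE_M) * math.ceil(width_m / PIXEL_SIZE_M)
    days = (time_window[1] - time_window[0]).days + 1
    scenes = math.ceil(days / REVISIT_DAYS)
    return pixels * scenes * bands * BYTES_PER_VALUE


size = estimate_cube_bytes(latitude, longitude, time_window)
print(f"Estimated cube size: {size / 1e9:.2f} GB")
```

Actual memory use is often several times higher than this figure, because intermediate products and floating-point conversions add to the raw data volume, so it is best treated as a lower bound when comparing against the 12 GB limit.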
We recommend that questions be posted to our online forum.