QA masks for MODIS NDVI time series from Google Earth Engine

The MODIS vegetation product has a Summary QA band which has 4 values:

  • 0: Good data, use with confidence
  • 1: Marginal data, useful but look at detailed QA for more information
  • 2: Pixel covered with snow/ice
  • 3: Pixel is cloudy

It also has a DetailedQA band which is bit packed and gives more information about the pixel quality. Here we'll unpack the DetailedQA band to see.

This uses the gee_subset packge to query Google Earth Engine (GEE) for a single pixel timeseries across 6 years. To use it you'll need a GEE account.

In [73]:
import ee
from gee_subset.gee_subset import gee_subset
import pandas as pd
import seaborn as sns
import numpy as np
# ee.Authenticate()
ee.Initialize()

import unpackqa
In [74]:
df = gee_subset(product    = 'MODIS/006/MOD13A1',
                bands      = ['NDVI','DetailedQA'],
                scale      = 500,
                start_date = '2015-01-01',
                end_date   = '2020-12-31',
                latitude   =   44.8166,
                longitude  = -114.8177,
                pad        =  0)
df.head()
Out[74]:
id longitude latitude date NDVI DetailedQA product
0 2015_01_01 -114.815922 44.814704 2015-01-01 806 18709 MODIS/006/MOD13A1
1 2015_01_17 -114.815922 44.814704 2015-01-17 2622 36330 MODIS/006/MOD13A1
2 2015_02_02 -114.815922 44.814704 2015-02-02 2578 2116 MODIS/006/MOD13A1
3 2015_02_18 -114.815922 44.814704 2015-02-18 2614 2441 MODIS/006/MOD13A1
4 2015_03_06 -114.815922 44.814704 2015-03-06 2470 2372 MODIS/006/MOD13A1

First adjust the NDVI by the scaling factor and drop any NA values

In [75]:
df['NDVI'] = df.NDVI * 0.0001
df = df[~df.NDVI.isna()].reset_index()

Values from GEE come in as floats, but unpackqa only works on integers. Here DetailedQA are 16-bits

In [76]:
df['DetailedQA'] = df.DetailedQA.astype(np.uint16)

expanded_detailed_qa = unpackqa.unpack_to_dict(df.DetailedQA.values, product = 'MOD13Q1v006_DetailedQA', flags='all')

unpack_to_dict produces a dictionary where each key is a flag, and each value is an array the same length as the df data.frame. This can be converted directly to a new data.frame where the columns are the flag names, and then appended to the original data.frame.

In [77]:
expanded_detailed_qa = pd.DataFrame(expanded_detailed_qa)
df = pd.concat([df, expanded_detailed_qa], axis=1)
df.head()
Out[77]:
index id longitude latitude date NDVI DetailedQA product VI_Quality VI_Usefulness Aerosol_Quantity Adjacent_cloud_detected Atmosphere_BRDF_Correction Mixed_Clouds Land_Water_Mask Possible_snow_ice Possible_shadow
0 0 2015_01_01 -114.815922 44.814704 2015-01-01 0.0806 18709 MODIS/006/MOD13A1 1 5 0 1 0 0 1 1 0
1 1 2015_01_17 -114.815922 44.814704 2015-01-17 0.2622 36330 MODIS/006/MOD13A1 2 10 3 1 0 1 1 0 1
2 2 2015_02_02 -114.815922 44.814704 2015-02-02 0.2578 2116 MODIS/006/MOD13A1 0 1 1 0 0 0 1 0 0
3 3 2015_02_18 -114.815922 44.814704 2015-02-18 0.2614 2441 MODIS/006/MOD13A1 1 2 2 1 0 0 1 0 0
4 4 2015_03_06 -114.815922 44.814704 2015-03-06 0.2470 2372 MODIS/006/MOD13A1 0 1 1 1 0 0 1 0 0

Each pixel value in the time series now has the associated flags as columns.
The VI_Quality flag has 4 values, where 0 signifies the best quality, 1 signifies a QA issue, 2 indicates likely clouds, and 3 indicates the pixel was not produced. Note in this instance there are no "not produced" pixels.

In [78]:
colors = {0:'green',
          1:'red',
          2:'darkred',
          3:'black'}

sns.scatterplot(x='date',y='NDVI',hue='VI_Quality',data=df,palette=colors)
Out[78]:
<AxesSubplot:xlabel='date', ylabel='NDVI'>

Other DetailedQA flags can give more insight into what it is affecting pixel quality. Here lets single out clouds, snow/ice, or cloud shadows.

In [79]:
colors = {0:'green',
          1:'red'}
sns.scatterplot(x='date',y='NDVI',hue='Mixed_Clouds',data=df,palette=colors)
Out[79]:
<AxesSubplot:xlabel='date', ylabel='NDVI'>
In [80]:
sns.scatterplot(x='date',y='NDVI',hue='Possible_snow_ice',data=df,palette=colors)
Out[80]:
<AxesSubplot:xlabel='date', ylabel='NDVI'>
In [81]:
sns.scatterplot(x='date',y='NDVI',hue='Possible_shadow',data=df,palette=colors)
Out[81]:
<AxesSubplot:xlabel='date', ylabel='NDVI'>

Other quality issues may also be affecting it. Looking at the unique combinations of VI_Quality along with the other flags shows what is driving the final quality indicator.

In [82]:
 df[['VI_Quality','Mixed_Clouds','Adjacent_cloud_detected','Possible_snow_ice','Possible_shadow','Atmosphere_BRDF_Correction']].drop_duplicates().sort_values('VI_Quality')
Out[82]:
VI_Quality Mixed_Clouds Adjacent_cloud_detected Possible_snow_ice Possible_shadow Atmosphere_BRDF_Correction
2 0 0 0 0 0 0
43 0 0 0 0 1 0
4 0 0 1 0 0 0
0 1 0 1 1 0 0
68 1 0 0 1 1 0
24 1 0 1 1 1 0
23 1 0 0 1 0 0
14 1 0 0 0 0 0
3 1 0 1 0 0 0
19 1 0 1 0 1 0
20 2 1 0 0 0 0
95 2 0 1 0 1 0
28 2 1 0 0 1 0
49 2 0 0 0 0 0
1 2 1 1 0 1 0
74 2 1 1 0 0 0
93 2 0 0 0 1 0
116 2 0 1 0 0 0