The xarray
package allows you to work with labelled arrays. It uses numpy arrays, so unpackqa
works with it without any modifications.
import xarray as xr
import dask
import numpy as np
import pooch
import unpackqa
L8_QA_PIXEL_FILE = 'https://landsat.usgs.gov/sites/default/files/C2_Sample_Data/LC08_L1TP_140041_20130503_20200912_02_T1.zip'
L8_qa_product = 'LANDSAT_8_C2_L2_QAPixel'
file_path = pooch.retrieve(
L8_QA_PIXEL_FILE,
known_hash=None,
processor=pooch.Unzip()
)
my_file = [f for f in file_path if 'QA_PIXEL' in f][0]
ds = xr.open_rasterio(my_file, chunks={'x':2000,'y':2000})
# Drop the band axis since the file has a single band.
ds = ds.isel(band=0)
unpack_to_array
can be used directly inside the xarray apply_ufunc
function. With this setup you can take advantage of a parallel computing environment. Read more in the xarray docs: http://xarray.pydata.org/en/stable/user-guide/dask.html
Here a new dim will be added called flag
, with the same length as the number of flags in the Landsat 8 QA_PIXEL band. The flag
dim should be set as a core dim.
L8_flag_names = unpackqa.list_qa_flags(L8_qa_product)
flag_ds = xr.apply_ufunc(
unpackqa.unpack_to_array,
ds,
kwargs = dict(product=L8_qa_product),
output_core_dims = [['flag']],
dask_gufunc_kwargs = dict(output_sizes={'flag':len(L8_flag_names)}),
output_dtypes=[np.uint8],
vectorize=False,
dask='parallelized'
)
# put labels on the flag coordinates
flag_ds['flag'] = L8_flag_names
print(flag_ds)
<xarray.DataArray (y: 7351, x: 7551, flag: 12)> dask.array<transpose, shape=(7351, 7551, 12), dtype=uint8, chunksize=(2000, 2000, 12), chunktype=numpy.ndarray> Coordinates: band int64 1 * y (y) float64 3.144e+06 3.144e+06 3.144e+06 ... 2.924e+06 2.924e+06 * x (x) float64 3.708e+05 3.708e+05 3.709e+05 ... 5.973e+05 5.973e+05 * flag (flag) <U23 'Fill' 'Dilated_Cloud' ... 'Cirrus_Confidence'
Another option would be to have each flag as it's own data variable. That can be done with some rearranging.
flag_ds2 = [flag_ds.sel(flag=flag_name).drop('flag').rename(flag_name) for flag_name in L8_flag_names]
flag_ds2 = xr.merge(flag_ds2)
print(flag_ds2)
<xarray.Dataset> Dimensions: (y: 7351, x: 7551) Coordinates: band int64 1 * y (y) float64 3.144e+06 3.144e+06 ... 2.924e+06 * x (x) float64 3.708e+05 3.708e+05 ... 5.973e+05 Data variables: Fill (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Dilated_Cloud (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cirrus (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cloud (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cloud_Shadow (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Snow (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Clear (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Water (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cloud_Confidence (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cloud_Shadow_Confidence (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Snow_Ice_Confidence (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray> Cirrus_Confidence (y, x) uint8 dask.array<chunksize=(2000, 2000), meta=np.ndarray>
When the chunks
argument is set for xr.open_rasterio
, all files are accessed lazily. The load
options executes all underlying functions and loads all data into memory.
flag_ds2.load()
print(flag_ds2)
<xarray.Dataset> Dimensions: (y: 7351, x: 7551) Coordinates: band int64 1 * y (y) float64 3.144e+06 3.144e+06 ... 2.924e+06 * x (x) float64 3.708e+05 3.708e+05 ... 5.973e+05 Data variables: Fill (y, x) uint8 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 Dilated_Cloud (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cirrus (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cloud (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cloud_Shadow (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Snow (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Clear (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Water (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cloud_Confidence (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cloud_Shadow_Confidence (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Snow_Ice_Confidence (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 Cirrus_Confidence (y, x) uint8 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0