Define common-metadata operations on split attribute dictionaries. #77

Draft: wants to merge 13 commits into base branch splitattrs_ncsave_redo

18 changes: 15 additions & 3 deletions docs/src/further_topics/metadata.rst
@@ -91,6 +91,16 @@ actual `data attribute`_ names of the metadata members on the Iris class.
metadata members are Iris specific terms, rather than recognised `CF Conventions`_
terms.

.. note::

:class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` implement the
concept of dataset-level and variable-level attributes, to enable correct
NetCDF loading and saving (see :class:`~iris.cube.CubeAttrsDict` and NetCDF
:func:`~iris.fileformats.netcdf.saver.save` for more). ``attributes`` on
the other classes do not have this distinction, but the ``attributes``
members of ALL the classes still have the same interface, and can be
compared.


Common Metadata API
===================
@@ -128,7 +138,9 @@ For example, given the following :class:`~iris.cube.Cube`,
source 'Data from Met Office Unified Model 6.05'

We can easily get all of the associated metadata of the :class:`~iris.cube.Cube`
using the ``metadata`` property:
using the ``metadata`` property (note the specialised
:class:`~iris.cube.CubeAttrsDict` for the :attr:`~iris.cube.Cube.attributes`,
as mentioned earlier):

>>> cube.metadata
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
@@ -701,7 +713,7 @@ which is replaced with a **different value**,
>>> metadata != cube.metadata
True
>>> metadata.combine(cube.metadata) # doctest: +SKIP
CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'Model scenario': 'A1B', 'STASH': STASH(model=1, section=3, item=236), 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05', 'Conventions': 'CF-1.5'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))

The ``combine`` method combines metadata by performing a **strict** comparison
between each of the associated metadata member values,
@@ -724,7 +736,7 @@ Let's reinforce this behaviour, but this time by combining metadata where the
>>> metadata != cube.metadata
True
>>> metadata.combine(cube.metadata).attributes
{'Model scenario': 'A1B'}
CubeAttrsDict(globals={}, locals={'Model scenario': 'A1B'})

The combined result for the ``attributes`` member only contains those
**common keys** with **common values**.
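The strict `combine` behaviour on split attributes can be sketched in plain Python. This is a minimal illustration, not Iris code: `SplitAttrs` is a hypothetical stand-in for `iris.cube.CubeAttrsDict`, and `combine_common` / `combine_split` are hypothetical helpers. It assumes, as the `_split_attribute_dicts` module docstring in this PR states, that global and local attributes are combined independently, and it compares values with plain `==` for simplicity. The attribute values are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SplitAttrs:
    """Hypothetical stand-in for iris.cube.CubeAttrsDict (globals + locals)."""

    globals: dict = field(default_factory=dict)
    locals: dict = field(default_factory=dict)


def combine_common(left: dict, right: dict) -> dict:
    """Keep only the keys present in both dicts with equal values."""
    return {
        key: value
        for key, value in left.items()
        if key in right and right[key] == value
    }


def combine_split(left: SplitAttrs, right: SplitAttrs) -> SplitAttrs:
    # Globals and locals are combined separately: a global 'x' is never
    # matched against a local 'x'.
    return SplitAttrs(
        globals=combine_common(left.globals, right.globals),
        locals=combine_common(left.locals, right.locals),
    )


left = SplitAttrs(
    globals={"Conventions": "CF-1.5"},
    locals={"Model scenario": "A1B", "source": "model run X"},
)
right = SplitAttrs(
    globals={"Conventions": "CF-1.7"},
    locals={"Model scenario": "A1B", "source": "model run Y", "extra": 1},
)
print(combine_split(left, right))
# SplitAttrs(globals={}, locals={'Model scenario': 'A1B'})
```

Only the key that is common to both sides with an identical value survives, mirroring the `CubeAttrsDict(globals={}, locals={'Model scenario': 'A1B'})` shape of the doctest above.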
5 changes: 4 additions & 1 deletion docs/src/userguide/iris_cubes.rst
@@ -85,7 +85,10 @@ A cube consists of:
data dimensions as the coordinate has dimensions.

* an attributes dictionary which, other than some protected CF names, can
hold arbitrary extra metadata.
hold arbitrary extra metadata. This implements the concept of dataset-level
and variable-level attributes when loading and saving NetCDF files (see
:class:`~iris.cube.CubeAttrsDict` and NetCDF
:func:`~iris.fileformats.netcdf.saver.save` for more).
* a list of cell methods to represent operations which have already been
applied to the data (e.g. "mean over time")
* a list of coordinate "factories" used for deriving coordinates from the
9 changes: 9 additions & 0 deletions docs/src/whatsnew/latest.rst
@@ -30,6 +30,14 @@ This document explains the changes made to Iris for this release
✨ Features
===========

#. `@pp-mo`_, `@lbdreyer`_ and `@trexfeathers`_ improved
:class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` handling to
better preserve the distinction between dataset-level and variable-level
attributes, allowing file-Cube-file round-tripping of NetCDF attributes. See
:class:`~iris.cube.CubeAttrsDict` and NetCDF
:func:`~iris.fileformats.netcdf.saver.save` for more. (:pull:`5152`,
`split attributes project`_)

#. `@rcomer`_ rewrote :func:`~iris.util.broadcast_to_shape` so it now handles
lazy data. (:pull:`5307`)

@@ -94,3 +102,4 @@ This document explains the changes made to Iris for this release

.. comment
Whatsnew resources in alphabetical order:
.. _split attributes project: https://github.com/orgs/SciTools/projects/5?pane=info
16 changes: 13 additions & 3 deletions lib/iris/__init__.py
@@ -142,7 +142,9 @@ def callback(cube, field, filename):
class Future(threading.local):
"""Run-time configuration controller."""

def __init__(self, datum_support=False, pandas_ndim=False):
def __init__(
self, datum_support=False, pandas_ndim=False, save_split_attrs=False
):
"""
A container for run-time options controls.

@@ -164,6 +166,11 @@ def __init__(self, datum_support=False, pandas_ndim=False):
pandas_ndim : bool, default=False
See :func:`iris.pandas.as_data_frame` for details - opts in to the
newer n-dimensional behaviour.
save_split_attrs : bool, default=False
Save "global" and "local" cube attributes to NetCDF in different ways:
"global" ones are saved as dataset attributes, where possible, while
"local" ones are saved as data-variable attributes.
See :func:`iris.fileformats.netcdf.saver.save`.

"""
# The flag 'example_future_flag' is provided as a reference for the
@@ -175,12 +182,15 @@ def __init__(self, datum_support=False, pandas_ndim=False):
# self.__dict__['example_future_flag'] = example_future_flag
self.__dict__["datum_support"] = datum_support
self.__dict__["pandas_ndim"] = pandas_ndim
self.__dict__["save_split_attrs"] = save_split_attrs

def __repr__(self):
# msg = ('Future(example_future_flag={})')
# return msg.format(self.example_future_flag)
msg = "Future(datum_support={}, pandas_ndim={})"
return msg.format(self.datum_support, self.pandas_ndim)
msg = "Future(datum_support={}, pandas_ndim={}, save_split_attrs={})"
return msg.format(
self.datum_support, self.pandas_ndim, self.save_split_attrs
)

# deprecated_options = {'example_future_flag': 'warning',}
deprecated_options = {}
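The `Future` change follows the existing pattern in that class: each option is written straight into `self.__dict__` in `__init__` and echoed by `__repr__`. The sketch below reproduces that pattern in a standalone class so it runs without Iris. `MiniFuture` is hypothetical, and its guarded `__setattr__` and `context()` method are assumptions about how such an options container is typically used (Iris's own `Future` has its own versions of these), not a copy of the real implementation.

```python
import threading
from contextlib import contextmanager


class MiniFuture(threading.local):
    """Hypothetical cut-down run-time options container (not iris.Future)."""

    def __init__(self, datum_support=False, pandas_ndim=False, save_split_attrs=False):
        # Write through __dict__ to bypass the guarded __setattr__ below,
        # as the Iris diff does.
        self.__dict__["datum_support"] = datum_support
        self.__dict__["pandas_ndim"] = pandas_ndim
        self.__dict__["save_split_attrs"] = save_split_attrs

    def __repr__(self):
        msg = "Future(datum_support={}, pandas_ndim={}, save_split_attrs={})"
        return msg.format(self.datum_support, self.pandas_ndim, self.save_split_attrs)

    def __setattr__(self, name, value):
        # Reject unknown option names instead of silently creating them.
        if name not in self.__dict__:
            raise AttributeError(f"{name!r} is not a valid option")
        self.__dict__[name] = value

    @contextmanager
    def context(self, **kwargs):
        # Temporarily override options, restoring the previous values on exit.
        saved = dict(self.__dict__)
        try:
            for name, value in kwargs.items():
                setattr(self, name, value)
            yield
        finally:
            self.__dict__.update(saved)


future = MiniFuture()
print(future)  # Future(datum_support=False, pandas_ndim=False, save_split_attrs=False)
with future.context(save_split_attrs=True):
    print(future.save_split_attrs)  # True
print(future.save_split_attrs)  # False
```

Subclassing `threading.local` (as the real class does) gives each thread its own copy of the options, so a temporary override in one thread does not leak into another.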
125 changes: 125 additions & 0 deletions lib/iris/common/_split_attribute_dicts.py
@@ -0,0 +1,125 @@
# Copyright Iris contributors
#
# This file is part of Iris and is released under the LGPL license.
# See COPYING and COPYING.LESSER in the root of the repository for full
# licensing details.
"""
Dictionary operations for dealing with the CubeAttrsDict "split"-style attribute
dictionaries.

The idea here is to convert a split-dictionary into a "plain" one for calculations,
whose keys are all pairs of the form ('global', <keyname>) or ('local', <keyname>),
and then to convert back again after the operation, if the result is a dictionary.

For "strict" operations this clearly does all that is needed. For lenient ones,
we _might_ want local and global attributes of the same name to interact.
However, on careful consideration, this does not seem desirable for any of the
common-metadata operations.
So, we simply treat "global" and "local" attributes of the same name as entirely
independent, which happily is also the easiest approach to code and to explain.
"""

from collections.abc import Mapping, Sequence
from functools import wraps


def _convert_splitattrs_to_pairedkeys_dict(dic):
"""
Convert a split-attributes dictionary to a "normal" dict.

Transform a :class:`~iris.cube.CubeAttrsDict` "split" attributes dictionary
into a 'normal' :class:`dict`, with paired keys of the form ('global', name) or
('local', name).
"""

def _global_then_local_items(dic):
# Routine to produce global, then local 'items' in order, and with all keys
# "labelled" as local or global type, to ensure they are all unique.
for key, value in dic.globals.items():
yield ("global", key), value
for key, value in dic.locals.items():
yield ("local", key), value

return dict(_global_then_local_items(dic))


def _convert_pairedkeys_dict_to_splitattrs(dic):
"""
Convert an input with global/local paired keys back into a split-attrs dict.

For now, this is always and only a :class:`iris.cube.CubeAttrsDict`.
"""
from iris.cube import CubeAttrsDict

result = CubeAttrsDict()
for key, value in dic.items():
keytype, keyname = key
if keytype == "global":
result.globals[keyname] = value
else:
assert keytype == "local"
result.locals[keyname] = value
return result


def adjust_for_split_attribute_dictionaries(operation):
"""
Decorator to make a function of attribute-dictionaries work with split attributes.

The wrapped function of attribute-dictionaries is currently always one of "equals",
"combine" or "difference", with signatures like:
equals(left: dict, right: dict) -> bool
combine(left: dict, right: dict) -> dict
difference(left: dict, right: dict) -> None | (dict, dict)

The results of the wrapped operation are either:
* for "equals" (or "__eq__"): a boolean
* for "combine": a (converted) attributes-dictionary
* for "difference": None, or a pair of (converted) attributes-dictionaries

Before calling the wrapped operation, its inputs (left, right) are modified by
converting any "split" dictionaries to a form where the keys are pairs
of the form ("global", name) or ("local", name).

After calling the wrapped operation, for "combine" or "difference", the result can
contain a dictionary or dictionaries. These are then transformed back from the
'converted' form to split-attribute dictionaries, before returning.

"Split" dictionaries are all of class :class:`~iris.cube.CubeAttrsDict`, since
the only usage of 'split' attribute dictionaries is in Cubes (i.e. they are not
used for cube components).
"""

@wraps(operation)
def _inner_function(*args, **kwargs):
from iris.cube import CubeAttrsDict

# First make all inputs into CubeAttrsDict, if not already.
args = [
arg if isinstance(arg, CubeAttrsDict) else CubeAttrsDict(arg)
for arg in args
]
# Convert all inputs into 'pairedkeys' type dicts
args = [_convert_splitattrs_to_pairedkeys_dict(arg) for arg in args]

result = operation(*args, **kwargs)

# Find known specific cases of 'pairedkeys' dicts in the result, and convert
# those back into split-attribute dictionaries.
if isinstance(result, Mapping):
# Fix a result which is a single dictionary -- for "combine"
result = _convert_pairedkeys_dict_to_splitattrs(result)
elif isinstance(result, Sequence) and len(result) == 2:
# Fix a result which is a pair of dictionaries -- for "difference"
left, right = result
left, right = (
_convert_pairedkeys_dict_to_splitattrs(left),
_convert_pairedkeys_dict_to_splitattrs(right),
)
result = result.__class__([left, right])
# ELSE: leave other types of result unchanged, e.g. None or bool.

return result

return _inner_function
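The two conversion helpers in this new module can be illustrated with a standalone round-trip sketch. It uses a hypothetical `SplitAttrs` dataclass in place of `CubeAttrsDict` so that it runs without Iris, and hypothetical `to_paired_keys` / `from_paired_keys` functions modelled on `_convert_splitattrs_to_pairedkeys_dict` and `_convert_pairedkeys_dict_to_splitattrs`. The point it shows is that a global and a local attribute with the same name become two distinct keys, so they never collide during an operation.

```python
from dataclasses import dataclass, field


@dataclass
class SplitAttrs:
    """Hypothetical stand-in for iris.cube.CubeAttrsDict."""

    globals: dict = field(default_factory=dict)
    locals: dict = field(default_factory=dict)


def to_paired_keys(split: SplitAttrs) -> dict:
    # Label every key with its scope, globals first, then locals.
    paired = {("global", key): value for key, value in split.globals.items()}
    paired.update({("local", key): value for key, value in split.locals.items()})
    return paired


def from_paired_keys(paired: dict) -> SplitAttrs:
    result = SplitAttrs()
    for (scope, name), value in paired.items():
        if scope == "global":
            result.globals[name] = value
        elif scope == "local":
            result.locals[name] = value
        else:
            raise ValueError(f"unexpected key scope: {scope!r}")
    return result


attrs = SplitAttrs(globals={"history": "g"}, locals={"history": "l", "note": 3})
paired = to_paired_keys(attrs)
print(paired)
# {('global', 'history'): 'g', ('local', 'history'): 'l', ('local', 'note'): 3}
print(from_paired_keys(paired) == attrs)  # True
```

This sketch raises `ValueError` for an unknown scope, where the PR code uses an `assert`; either is reasonable for a private helper, but an explicit exception still fires when Python runs with `-O`.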
41 changes: 41 additions & 0 deletions lib/iris/common/metadata.py
@@ -21,6 +21,7 @@
from xxhash import xxh64_hexdigest

from ..config import get_logger
from ._split_attribute_dicts import adjust_for_split_attribute_dictionaries
from .lenient import _LENIENT
from .lenient import _lenient_service as lenient_service
from .lenient import _qualname as qualname
@@ -1255,6 +1256,46 @@ def _check(item):

return result

#
# Override each of the attribute-dict operations in BaseMetadata, to enable
# them to deal with split-attribute dictionaries correctly.
# There are 6 of these, for (equals/combine/difference) * (lenient/strict).
# Each is overridden with a *wrapped* version of the parent method, using the
# "@adjust_for_split_attribute_dictionaries" decorator, which converts any
# split-attribute dictionaries in the inputs to ordinary dicts, and likewise
# re-converts any dictionaries in the return value.
#

@staticmethod
@adjust_for_split_attribute_dictionaries
def _combine_lenient_attributes(left, right):
return BaseMetadata._combine_lenient_attributes(left, right)

@staticmethod
@adjust_for_split_attribute_dictionaries
def _combine_strict_attributes(left, right):
return BaseMetadata._combine_strict_attributes(left, right)

@staticmethod
@adjust_for_split_attribute_dictionaries
def _compare_lenient_attributes(left, right):
return BaseMetadata._compare_lenient_attributes(left, right)

@staticmethod
@adjust_for_split_attribute_dictionaries
def _compare_strict_attributes(left, right):
return BaseMetadata._compare_strict_attributes(left, right)

@staticmethod
@adjust_for_split_attribute_dictionaries
def _difference_lenient_attributes(left, right):
return BaseMetadata._difference_lenient_attributes(left, right)

@staticmethod
@adjust_for_split_attribute_dictionaries
def _difference_strict_attributes(left, right):
return BaseMetadata._difference_strict_attributes(left, right)


class DimCoordMetadata(CoordMetadata):
"""
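The six `CubeMetadata` overrides in this hunk all delegate to the `BaseMetadata` methods through the `adjust_for_split_attribute_dictionaries` decorator. A cut-down, standalone version of that decorator is sketched below, applied to hypothetical plain-dict `equals_strict` and `difference_strict` operations; these are illustrative only, not the `BaseMetadata` implementations. `SplitAttrs` is a hypothetical stand-in for `CubeAttrsDict`, and unlike the real decorator this sketch does not first convert plain-dict inputs, so every argument must already be a `SplitAttrs`.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class SplitAttrs:
    """Hypothetical stand-in for iris.cube.CubeAttrsDict."""

    globals: dict = field(default_factory=dict)
    locals: dict = field(default_factory=dict)


def _to_paired(split):
    # ("global", name) / ("local", name) keys keep same-named attributes apart.
    paired = {("global", key): value for key, value in split.globals.items()}
    paired.update({("local", key): value for key, value in split.locals.items()})
    return paired


def _from_paired(paired):
    result = SplitAttrs()
    for (scope, name), value in paired.items():
        target = result.globals if scope == "global" else result.locals
        target[name] = value
    return result


def adjust_for_split(operation):
    """Run a plain-dict operation on split dicts, re-splitting any dict results."""

    @wraps(operation)
    def inner(*args):
        result = operation(*(_to_paired(arg) for arg in args))
        if isinstance(result, Mapping):
            # A single dict, e.g. from "combine".
            return _from_paired(result)
        if isinstance(result, Sequence) and len(result) == 2:
            # A pair of dicts, e.g. from "difference".
            return type(result)(_from_paired(item) for item in result)
        # Anything else (None, bool) passes through unchanged.
        return result

    return inner


_MISSING = object()


@adjust_for_split
def equals_strict(left, right):
    # Hypothetical strict equality on plain dicts.
    return left == right


@adjust_for_split
def difference_strict(left, right):
    # Hypothetical strict difference: None if identical, otherwise the
    # differing items from each side.
    differing = {
        key
        for key in left.keys() | right.keys()
        if left.get(key, _MISSING) != right.get(key, _MISSING)
    }
    if not differing:
        return None
    return (
        {key: left[key] for key in differing if key in left},
        {key: right[key] for key in differing if key in right},
    )


a = SplitAttrs(globals={"history": "x"}, locals={"source": "A"})
b = SplitAttrs(globals={"history": "x"}, locals={"source": "B"})
print(equals_strict(a, b))  # False
print(difference_strict(a, a))  # None
print(difference_strict(a, b))
```

Because the decorator only rewrites inputs and outputs, the wrapped operation stays a plain-dict function, which is what lets the PR reuse the existing `BaseMetadata` logic unchanged.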
23 changes: 16 additions & 7 deletions lib/iris/cube.py
@@ -825,6 +825,7 @@ class CubeAttrsDict(MutableMapping):

>>> from iris.cube import Cube
>>> cube = Cube([0])
>>> # CF defines 'history' as global by default.
>>> cube.attributes.update({"history": "from test-123", "mycode": 3})
>>> print(cube.attributes)
{'history': 'from test-123', 'mycode': 3}
@@ -843,6 +844,9 @@ class CubeAttrsDict(MutableMapping):

"""

# TODO: Create a 'further topic' / 'tech paper' on NetCDF I/O, including
# discussion of attribute handling.

def __init__(
self,
combined: Optional[Union[Mapping, str]] = "__unspecified",
@@ -853,7 +857,7 @@ def __init__(
Create a cube attributes dictionary.

We support initialisation from a single generic mapping input, using the default
global/local assignment rules explained at :meth:`__setatrr__`, or from
global/local assignment rules explained at :meth:`__setattr__`, or from
two separate mappings. Two separate dicts can be passed in the ``locals``
and ``globals`` args, **or** via a ``combined`` arg which has its own
``.globals`` and ``.locals`` properties -- so this allows passing an existing
@@ -878,6 +882,7 @@ def __init__(
--------

>>> from iris.cube import CubeAttrsDict
>>> # CF defines 'history' as global by default.
>>> CubeAttrsDict({'history': 'data-story', 'comment': 'this-cube'})
CubeAttrsDict(globals={'history': 'data-story'}, locals={'comment': 'this-cube'})

@@ -930,19 +935,19 @@ def _normalise_attrs(
return attributes

@property
def locals(self):
def locals(self) -> LimitedAttributeDict:
return self._locals

@locals.setter
def locals(self, attributes):
def locals(self, attributes: Optional[Mapping]):
self._locals = self._normalise_attrs(attributes)

@property
def globals(self):
def globals(self) -> LimitedAttributeDict:
return self._globals

@globals.setter
def globals(self, attributes):
def globals(self, attributes: Optional[Mapping]):
self._globals = self._normalise_attrs(attributes)

#
@@ -1335,8 +1340,12 @@ def _names(self):
#
# Ensure that .attributes is always a :class:`CubeAttrsDict`.
#
@CFVariableMixin.attributes.setter
def attributes(self, attributes):
@property
def attributes(self) -> CubeAttrsDict:
return super().attributes

@attributes.setter
def attributes(self, attributes: Optional[Mapping]):
"""
An override to CFVariableMixin.attributes.setter, which ensures that Cube
attributes are stored in a way which distinguishes global + local ones.
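The `Cube.attributes` change replaces the `@CFVariableMixin.attributes.setter` shortcut with a full property redefinition, presumably so that the getter can declare the narrower `-> CubeAttrsDict` return type. The general Python pattern is sketched below with hypothetical classes: `Base` stands in for `CFVariableMixin`, `Derived` for `Cube`, and `TaggedDict` for `CubeAttrsDict`. The getter defers to the parent through `super()`, while the setter always stores the specialised dict type.

```python
from collections.abc import Mapping
from typing import Optional


class TaggedDict(dict):
    """Hypothetical stand-in for iris.cube.CubeAttrsDict."""


class Base:
    """Hypothetical stand-in for CFVariableMixin."""

    @property
    def attributes(self) -> dict:
        return self._attributes

    @attributes.setter
    def attributes(self, attributes: Optional[Mapping]):
        self._attributes = dict(attributes or {})


class Derived(Base):
    """Hypothetical stand-in for Cube."""

    @property
    def attributes(self) -> TaggedDict:
        # Same behaviour as the parent getter; redefined here so the
        # property can declare the narrower return type.
        return super().attributes

    @attributes.setter
    def attributes(self, attributes: Optional[Mapping]):
        # Always store the specialised dict type, whatever the input.
        self._attributes = TaggedDict(attributes or {})


obj = Derived()
obj.attributes = {"a": 1}
print(type(obj.attributes).__name__)  # TaggedDict
obj.attributes = None
print(obj.attributes)  # {}
```

Redefining only the setter via `@Base.attributes.setter` would behave the same at run time; the explicit getter changes only what type checkers and readers see.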