Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dataless cubes #6253

Open
wants to merge 57 commits into
base: main
Choose a base branch
from
Open

Implement dataless cubes #6253

wants to merge 57 commits into from

Conversation

ESadek-MO
Copy link
Contributor

@ESadek-MO ESadek-MO commented Dec 11, 2024

Closes #4447.

I plan to do this in four stages:

  • Ensure that cubes can handle data is None when handed a shape value, and can essentially round-trip removing and adding data. This shouldn't break existing tests. 71c7ae8
  • Create new tests for the above. b2fb2f8
  • Ensure that all DataManager methods all make sense and work with None data. DataManager.copy() is an example of a method that won't make sense with None data.
  • Create new tests for the above.

@ESadek-MO ESadek-MO marked this pull request as draft December 11, 2024 11:36
Copy link

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 66.00000% with 34 lines in your changes missing coverage. Please review.

Project coverage is 89.73%. Comparing base (d135ebb) to head (9221917).
Report is 16 commits behind head on main.

Files with missing lines Patch % Lines
lib/iris/cube.py 17.39% 9 Missing and 10 partials ⚠️
lib/iris/_data_manager.py 91.52% 3 Missing and 2 partials ⚠️
lib/iris/_merge.py 0.00% 2 Missing and 2 partials ⚠️
lib/iris/_concatenate.py 60.00% 1 Missing and 1 partial ⚠️
lib/iris/_lazy_data.py 50.00% 1 Missing and 1 partial ⚠️
lib/iris/exceptions.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6253      +/-   ##
==========================================
- Coverage   89.83%   89.73%   -0.10%     
==========================================
  Files          88       88              
  Lines       23347    23451     +104     
  Branches     4344     4383      +39     
==========================================
+ Hits        20974    21044      +70     
- Misses       1646     1664      +18     
- Partials      727      743      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ESadek-MO ESadek-MO requested a review from bjlittle December 18, 2024 12:49
@ESadek-MO ESadek-MO marked this pull request as ready for review December 18, 2024 12:50
Copy link
Member

@bjlittle bjlittle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO I'm only partially through the review, but I thought I'd push these review comments early so that you can see them and address them ASAP.

Note that you also need to refactor the following DataManager methods:

  • __equal__ to account for dataless cubes, particularly when dataless cubes are involved in the operation i.e., for the new use cases between data and dataless, and dataless and dataless
  • lazy_data to deal with the dataless case
  • akin to the lazy_data method, you also have to deal with core_data for the dataless case
  • __repr__ requires to cope with the dataless case i.e., provide the shape

@@ -832,3 +841,6 @@ def use_plugin(plugin_name):
significance of the import statement and warn that it is an unused import.
"""
importlib.import_module(f"iris.plugins.{plugin_name}")


DATALESS_COPY = "NONE"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO It's better to have such constants defined at the top of the module, plus with a comment please 👍

Around about line+142 (after the constraint definition conveniences seems about right)

managed. If a value of None is given, the data manager will be
considered dataless.

shape :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO We've been adopting the following numpydoc standard for specifying the type of parameters i.e., shape : tuple, optional

Same standard applies to the data parameter 👍


shape :
A tuple, representing the shape of the data manager. This can only
be used in the case of `data=None`, and will render the data manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use double-ticks, see here

i.e., ``data=None``


"""
if (shape is not None) and (data is not None):
msg = "`shape` should only be provided if `data is None`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg = "`shape` should only be provided if `data is None`"
msg = '"shape" should only be provided if "data" is None'

# Initialise the instance.
self._lazy_array = None
self._real_array = None

# Assign the data payload to be managed.
self._shape = shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO The comment on line+45 applies to self.data = data on line+47.

Could you move self._shape = shape to the above # Initialise the instance. block 👍

Comment on lines 270 to 271
if not dataless:
data = np.asarray(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO We shouldn't need this defensive code for the dataless case i.e., we shouldn't get here.

result = self.core_data().shape
return result

def is_dataless(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_dataless(self):
def is_dataless(self) -> bool:

return result

def is_dataless(self):
"""Determine whether the cube is dataless.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO Perhaps it's best to not use dataless to describle is_dataless e.g., maybe something like Determine whether the cube has no data. instead?

Comment on lines 295 to 299
if self.core_data() is None:
result = self._shape
else:
result = self.core_data().shape
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.core_data() is None:
result = self._shape
else:
result = self.core_data().shape
return result
return self._shape if self._shape else self.core_data().shape

bool

"""
return (self.core_data() is None) and (self.shape is not None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ESadek-MO Given our axiom (and if this isn't true then something is wrong) it must always be the case that:

  • self._shape = None and (self._lazy_array is not None or self._real_array is not None)
  • self._shape is not None and (self._lazy_array is None and self._real_array is None)

Therefore, is_dataless should be defined simply as return self._shape is not None, right?

@ESadek-MO
Copy link
Contributor Author

Something is wrong with the _assert_axiom code now. I need to catch a bus, so will have to figure out why tomorrow.

@ESadek-MO
Copy link
Contributor Author

ESadek-MO commented Dec 23, 2024

Although the tests are passing, next time I look at this, I need to investigate the data setter. Currently, line 256 causes issues, as self._shape should be changable if self._shape = (), and it won't be.

That is to say, it works as expected, but I believe it hasn't been written in such a way as it SHOULD be.


@property
def shape(self):
"""The shape of the data being managed."""
return self.core_data().shape
return self._shape if self._shape else self.core_data().shape
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is defensive; it wouldn't cause any issues that I can think of to just have return self._shape, but this should protect us in case I've missed something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually removed this now.

() is falsy. I could have easily done ...if self._shape is not None, but I decided there's no point in adding redundant code. Easy undo if my reviewer disagrees!

state = is_lazy ^ is_real
assert state, emsg.format("" if is_lazy else "no ", "" if is_real else "no ")

if not (is_lazy ^ is_real):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be clearer as if is_lazy == is_real.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see this, but forgot to comment.
Personally, I think I prefer it as it is, but if there's significant desire for your suggestion, I don't mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

A Dataless Cube
3 participants