Add ProximityArchive for novelty search #472

Merged
merged 40 commits into from
Jun 28, 2024

Conversation

gresavage
Contributor

@gresavage gresavage commented Jun 15, 2024

Description

Implement the ProximityArchive (also known as unstructured archive or novelty archive) from Novelty Search (Lehman, 2011) http://eplex.cs.ucf.edu/papers/lehman_ecj11.pdf

Resolves #468

Naming

ProximityArchive is a specific name that reflects how the archive operations are based on how close solutions are to each other in measure space. We also considered NearestNeighborArchive, but that name is a bit long. We find that NoveltyArchive and UnstructuredArchive are rather imprecise terms.
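
As a rough illustration of what "close in measure space" means here, the following is a minimal pure-Python sketch of the novelty search rule: a solution's novelty is the mean distance to its k nearest neighbors in the archive, and it is added when that score clears a threshold. The names `novelty`, `try_add`, `k_neighbors`, and `novelty_threshold` are hypothetical and chosen for this sketch; treating an empty archive as infinitely novel and using `>=` for the comparison are assumptions, not a description of the merged class.

```python
import math


def novelty(measures, archive, k_neighbors):
    """Mean Euclidean distance from `measures` to its k nearest archive entries."""
    if not archive:
        # Assumption: with nothing to compare against, the point is maximally novel.
        return math.inf
    distances = sorted(math.dist(measures, other) for other in archive)
    nearest = distances[:k_neighbors]
    return sum(nearest) / len(nearest)


def try_add(measures, archive, k_neighbors, novelty_threshold):
    """Appends `measures` to `archive` if it is novel enough; returns the score."""
    score = novelty(measures, archive, k_neighbors)
    if score >= novelty_threshold:
        archive.append(tuple(measures))
    return score


archive = []
try_add((0.0, 0.0), archive, k_neighbors=2, novelty_threshold=1.0)  # added: empty archive
try_add((0.1, 0.0), archive, k_neighbors=2, novelty_threshold=1.0)  # rejected: too close
try_add((3.0, 4.0), archive, k_neighbors=2, novelty_threshold=1.0)  # added: far from (0, 0)
```

Note that no grid or tessellation is involved: membership depends only on distances between stored measures, which is the property the name is meant to convey.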

TODO

  • Add tests -> also added python-box as a dev dependency in setup.py to make tests easier
  • Implement class
  • Allow passing in objective=None -> objective will then default to 0
  • Add tests for scheduler with None objective (also fixed handling of None objectives for add_mode="single")
  • Tidy up TODOs in class (i.e., clean up code)
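
The `objective=None` item can be pictured with a small helper like the one below. This is only a sketch under assumptions: `resolve_objective` is a hypothetical name, and plain lists stand in for the arrays the library would use.

```python
def resolve_objective(objective, measures):
    """Returns one objective per row of `measures`, using 0.0 when none is given."""
    if objective is None:
        # Diversity-only case: every solution gets a placeholder objective of 0.
        return [0.0] * len(measures)
    if len(objective) != len(measures):
        raise ValueError("objective and measures must have the same batch size")
    return list(objective)


resolve_objective(None, [[0.1, 0.2], [0.3, 0.4]])        # placeholder zeros
resolve_objective([1.5, 2.5], [[0.1, 0.2], [0.3, 0.4]])  # passed through unchanged
```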

Status

  • I have read the guidelines in CONTRIBUTING.md
  • I have formatted my code using yapf
  • I have tested my code by running pytest
  • I have linted my code with pylint
  • I have added a one-line description of my change to the changelog in HISTORY.md
  • This PR is ready to go

Member

@btjanaka btjanaka left a comment

Hi @gresavage, thank you for your PR! I appreciate that it is well-documented and tested and that you clearly put a lot of time into it. I have left comments requesting a couple of changes.

Overall, I think the UnstructuredArchive is great. The UnstructuredGridArchive seems rather specialized, so I do not think we should add it to pyribs. Could you remove the UnstructuredGridArchive and the roll() method in ArrayStore from this PR?

I know these things can take quite a bit of time; if you need to work on other things, just let me know and I'm happy to help make the changes I mentioned.

gresavage and others added 20 commits June 19, 2024 13:46
change `GridArchive` to `GridUnstructuredArchive` in `roll` docstring
narrow pylint disable scope
upper and lower bound properties return measure max/min without checking archive size.

update docstrings: UnstructuredArchive, index_of

remove learning_rate, threshold_min arguments from __init__

rename "sparsity" to "novelty"

remove boundaries property and `List` typehints
@btjanaka
Member

Hi @gresavage, just wanted to give a quick update on things. I'm going through and modifying the UnstructuredArchive -- I've changed it a decent amount to align more with the rest of the library and how we implement things. I removed GridUnstructuredArchive and ArrayStore.roll as per my earlier comments, but do consider taking a look at my comment here #472 (comment), as I think it reveals an unexpected behavior. I'll keep working on this bit by bit, and it should be ready soon. Thanks again!

@btjanaka btjanaka changed the title feat: unstructured archives Add UnstructuredArchive for novelty search Jun 25, 2024
@gresavage
Contributor Author

Hi @btjanaka I'm eager to see what you've done.

I haven't pushed any of my changes w.r.t. NS + local competition yet, as they are still in flux and I'd like to know your thoughts on handling non-dominated sorting. I also want to fold my changes into what you've got so the two strategies are homogeneous if you decide to include NS + local competition.

I was also thinking more about the objective threshold and learning rate. Are we sure we want to do away with this behavior entirely if we implement NS + local competition? I removed the keyword arguments in my last commit but I think it could be useful to have CMA-MAE behavior with the local competition variant.

@gresavage
Contributor Author

gresavage commented Jun 26, 2024

@gresavage I'll reply to both your comments here.

To implement NSLC in pyribs, I think we would create an emitter that contains the MOEA, most likely using one of the libraries you mentioned (indeed, it would likely be out of scope to create NDS algorithms in pyribs unless there is some custom feature we want). The archive would then output the novelty and local competition via the add_info returned from the add() method, and the emitter would use the NDS to rank based on the novelty and local competition. Alternatively, if the emitter needs to also include the population in novelty and local competition computations, then it can compute those metrics inside of the emitter rather than in the archive.

Regarding the archive, I took another look at the NSLC paper, and I think one point that may be confusing us (at least me) is whether solutions are replaced based on their local competition. My understanding is that NSLC just uses a regular archive identical to the one from novelty search. The local competition seems to be a metric that is computed using the solutions in the archive, but it does not seem the metric plays any role in updating the archive. If you saw a section or piece of code saying otherwise, could you send it along? I've also asked some friends in the QD community for their thoughts on this; I'm genuinely not sure how this part works.

If I am right, then the current archive would be roughly what we need for both NS and NSLC, and the job of implementing NSLC would likely involve making a new emitter as I described above.
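
To make the distinction concrete, here is a minimal sketch of local competition as a read-only metric: it counts how many of a solution's k nearest neighbors in the archive it outperforms, and it never modifies the archive. The function name, the `(objective, measures)` tuple layout, and the use of Euclidean distance are assumptions made for this example.

```python
import math


def local_competition(objective, measures, archive, k_neighbors):
    """Counts how many of the k nearest archive entries have a lower objective.

    `archive` is a list of (objective, measures) pairs and is not modified.
    """
    by_distance = sorted(archive, key=lambda entry: math.dist(measures, entry[1]))
    neighbors = by_distance[:k_neighbors]
    return sum(1 for neighbor_obj, _ in neighbors if objective > neighbor_obj)


archive = [(1.0, (0.0, 0.0)), (5.0, (1.0, 0.0)), (0.5, (10.0, 10.0))]
# The two nearest neighbors of (0.5, 0.0) have objectives 1.0 and 5.0;
# a candidate with objective 3.0 beats only the first of them.
score = local_competition(3.0, (0.5, 0.0), archive, k_neighbors=2)
```

Because the metric only reads from the archive, the same novelty-style archive could serve both NS and NSLC, which is the point being made above.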

RE: Emitters

I think an emitter could work for NSLC so long as the ranking doesn't play a role in what gets added to the archive. The only real hurdle I see with that route is that NSLC doesn't bother with how candidates are generated, so an NSLC emitter could essentially wrap any of the other emitters and just leave the ask method unchanged. In my eyes this means making an NSLC variant for each of the other emitters, which is only a hurdle inasmuch as it is tedious 😆 . The tell method would be where the magic happens.
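
One way to picture the "wrap any emitter" idea is composition rather than one variant per emitter. The sketch below is purely hypothetical: `RankingWrapper`, `RecordingEmitter`, and the `tell(solutions, scores)` interface are invented for illustration and are not pyribs APIs, and the sum used as `rank_fn` in the usage lines is a placeholder for a real non-dominated sort.

```python
class RecordingEmitter:
    """Stand-in for a real emitter, used only to make the sketch runnable."""

    def __init__(self):
        self.received = None

    def ask(self):
        return ["a", "b", "c"]

    def tell(self, solutions, scores):
        self.received = solutions


class RankingWrapper:
    """Hypothetical wrapper: delegates ask() and reorders results in tell()."""

    def __init__(self, inner, rank_fn):
        self.inner = inner      # any object with ask() and tell(solutions, scores)
        self.rank_fn = rank_fn  # maps (novelty, local_competition) to a sort key

    def ask(self):
        # Candidate generation is left entirely to the wrapped emitter.
        return self.inner.ask()

    def tell(self, solutions, novelty, local_competition):
        keys = [self.rank_fn(n, c) for n, c in zip(novelty, local_competition)]
        order = sorted(range(len(solutions)), key=lambda i: keys[i], reverse=True)
        # Hand the wrapped emitter its solutions best-first.
        self.inner.tell([solutions[i] for i in order], [keys[i] for i in order])


inner = RecordingEmitter()
wrapper = RankingWrapper(inner, rank_fn=lambda n, c: n + c)
wrapper.tell(wrapper.ask(), novelty=[0.1, 0.9, 0.5], local_competition=[0, 1, 3])
```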

RE: Replacement & Use of NDS Rank

I looked over the papers as well and I think you're right that solutions are not replaced - which greatly simplifies the issue.

So after thinking about it even more, I think the way the archive works in NSLC is that essentially every member of the current population is added to the archive if it survives the genetic selection. A solution survives selection only via NSGA-II, which optimizes morphological novelty and the local competition objective, with novelty also playing the role of crowding distance. So I think the archive in NSLC doesn't actually decide what gets added: if a solution is sent to the add method, it gets added.

I also think that whether the "selection" based on non-domination rank and novelty for NSLC is done outside of pyribs by a user algorithm or in the add method is kind of a "six of one, half a dozen of the other" situation. So I am happy to assist with whatever path you think is best and within scope.

btjanaka added a commit that referenced this pull request Jun 26, 2024
## Description

We are working on supporting diversity optimization algorithms such as
novelty search (see #472). One difference from other algorithms is that
such algorithms do not require objectives. However, the current API to
Scheduler.tell() is:

```
tell(objective, measures, **fields)
```

i.e., it assumes objective and measures will be provided. This PR seeks
to make it possible to call `scheduler.tell(measures)` in addition to
the current `scheduler.tell(objective, measures)`.

### Use Cases

There are a couple of use cases that the scheduler must satisfy:

1. QD optimization (i.e., current behavior should not break):
   - Positional arguments: `scheduler.tell(objective, measures)`
   - Keyword arguments: `scheduler.tell(objective=objective,
     measures=measures)`
   - Mixed (a rather extreme case): `scheduler.tell(objective,
     measures=measures)`
2. Diversity optimization:
   - Positional arguments: `scheduler.tell(measures)`
   - Keyword arguments: `scheduler.tell(measures=measures)`
3. Diversity optimization but where an objective has been added as an
   extra field:
   - Positional arguments: `scheduler.tell(measures,
     objective=objective)`
   - Keyword arguments: `scheduler.tell(measures=measures,
     objective=objective)`

Ideally, we would also be robust to a case where we need to only have
objectives in the future:

4. Objective optimization:
   - Positional arguments: `scheduler.tell(objective)`
   - Keyword arguments: `scheduler.tell(objective=objective)`

### Potential Solutions

1. Change scheduler API to `tell(*args, **kwargs)` and add a
   `problem_type` to each archive.
   - args and kwargs are interpreted based on the type of the archive.
     If the problem type is quality_diversity, then the args are
     interpreted as objectives and measures. If the problem type is
     diversity_optimization, then the args are interpreted as measures.
     If the problem type is single_objective, the args are interpreted
     as objective. kwargs will be treated the same as fields in all
     cases.
   - Pros: Backwards-compatible and handles all 4 use cases above.
   - Cons: Makes the archives a bit more complex, and may be a bit
     confusing to users because the signature will not be very
     informative. However, I think we can get away with this with good
     docstrings and documentation.
2. Allow the current objective argument to be treated as measures, and
   assign a default argument of None to the current objective and
   measures -> i.e., `tell(objective=None, measures=None, **fields)`
   - Inspired by how numpy's `integers` can be either
     `rng.integers(high)` or `rng.integers(low, high)` -> see
     [here](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html)
   - Pros: Minimal changes to current API, and still quite
     interpretable for users.
   - Cons: On the other hand, it may be confusing to see that objective
     can be set to measures. This will also fail case 3 above, i.e.,
     `scheduler.tell(measures, objective=objective)` will throw an
     error because `measures` is actually `objective` due to the
     positional argument, so `objective` is being passed in twice.
3. **Require the user to pass in `objective=None` when performing
   diversity optimization and maintain the same `scheduler.tell` API.**
   - Pros: This requires the fewest changes to the API. It also
     provides a good model for what EvolutionStrategyEmitter and other
     emitters should do -- all these classes can now implement behavior
     for `objective=None`. It also does not require modifying the
     archives to have a `problem_type` or `archive_type` attribute.
     Furthermore, `objective` can still be provided if that is a field
     in the archive.
   - Cons: It is a bit verbose to have `scheduler.tell(None, measures)`
     or `scheduler.tell(objective=None, measures=measures)` but I think
     users can understand that.
   - Optionally, we can also set a default of `objective=None` and
     `measures=None` so that one can pass just measures, e.g.,
     `scheduler.tell(measures=...)` or just objectives, e.g.,
     `scheduler.tell(objective=...)`. However, I think we may not want
     to add this feature for now as it requires adding default values
     to a lot of places.
4. Add a new parameter to `scheduler.tell` called `mode` or something
   similar to indicate diversity optimization is in effect, i.e.,
   `scheduler.tell(measures, mode="diversity")`
   - Pros: This is very explicit and makes it clear that diversity
     optimization is happening without objectives.
   - Cons: Rather verbose.
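
The failure mode described under option 2 follows directly from how Python binds arguments, and it can be reproduced with two stand-in functions. This is a sketch only: `tell_option2` and `tell_option3` are hypothetical names that mimic the proposed signatures and simply return what they received.

```python
def tell_option2(objective=None, measures=None, **fields):
    """Option 2: a lone positional argument is reinterpreted as measures."""
    if measures is None:
        objective, measures = None, objective
    return {"objective": objective, "measures": measures, **fields}


def tell_option3(objective, measures, **fields):
    """Option 3: the signature is unchanged; callers pass objective=None."""
    return {"objective": objective, "measures": measures, **fields}


measures = [[0.1, 0.2]]

# Use case 2 (diversity optimization) works under both options.
tell_option2(measures)
tell_option3(None, measures)

# Use case 3 breaks option 2: `measures` binds positionally to the
# `objective` parameter, and the keyword then supplies `objective` again.
try:
    tell_option2(measures, objective=[1.0])
    option2_error = None
except TypeError as e:
    option2_error = str(e)
```

Under option 3 the equivalent call is unambiguous, since the first positional slot always means the objective, even when it is `None`.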

### Decision

I believe **Solution 3** is the best solution, since it involves the
fewest changes to the API and keeps those changes confined to the
scheduler.

## TODO

- [x] Implement behavior for when `objective=None` in `Scheduler.tell`
      -- specifically, the scheduler will now allow any field to be
      None, and when the field is None, the scheduler will simply pass
      None down to the emitter tell functions.
- [x] Write test for passing `objective=None` to the scheduler, both
      with positional and keyword arguments. This test will be updated
      once the UnstructuredArchive is implemented.
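
The pass-through behavior in the first item might look roughly like the sketch below. `MiniScheduler` and `EchoEmitter` are hypothetical stand-ins written for this example, with plain lists in place of arrays; the real scheduler does considerably more (for example, adding solutions to the archive).

```python
class EchoEmitter:
    """Stand-in emitter that records what the scheduler hands it."""

    def __init__(self):
        self.last_call = None

    def tell(self, objective, measures, **fields):
        self.last_call = (objective, measures, fields)


class MiniScheduler:
    """Hypothetical scheduler that forwards None fields untouched."""

    def __init__(self, emitters, batch_size):
        self.emitters = emitters
        self.batch_size = batch_size

    def tell(self, objective, measures, **fields):
        data = {"objective": objective, "measures": measures, **fields}
        for i, emitter in enumerate(self.emitters):
            start, end = i * self.batch_size, (i + 1) * self.batch_size
            # Slice real data per emitter; pass None through as-is.
            chunk = {
                name: None if value is None else value[start:end]
                for name, value in data.items()
            }
            emitter.tell(**chunk)


first, second = EchoEmitter(), EchoEmitter()
scheduler = MiniScheduler([first, second], batch_size=2)
scheduler.tell(None, [[0], [1], [2], [3]])  # positional form
```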

## Status

- [x] I have read the guidelines in
      [CONTRIBUTING.md](https://github.com/icaros-usc/pyribs/blob/master/CONTRIBUTING.md)
- [x] I have formatted my code using `yapf`
- [x] I have tested my code by running `pytest`
- [x] I have linted my code with `pylint`
- [x] I have added a one-line description of my change to the changelog
      in `HISTORY.md`
- [x] This PR is ready to go
@btjanaka
Member

btjanaka commented Jun 27, 2024

@gresavage I see what you are saying about using the NSLC ranking. I think a good path would be to implement something in ribs.emitters.rankers. Essentially, the ranker would integrate an NDS, and the ranker could be used in emitters like EvolutionStrategyEmitter. Meanwhile, the archive could return local_competition in add_info. The emitter would take in the novelty and local_competition from the add_info and pass these pieces of info to the ranker.

For now, I've updated the code to just be a regular Novelty Search archive. I added a TODO to add local competition on issue #474. Would you mind taking a look at the code? Thanks!
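
As a sketch of what such a ranker could compute, the following pure-Python example sorts solutions into Pareto fronts over (novelty, local_competition) and breaks ties within a front by novelty. The function names are hypothetical, this is a simple quadratic-time non-dominated sort rather than NSGA-II proper, and nothing here reflects an existing class in `ribs.emitters.rankers`.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated_fronts(points):
    """Groups indices of `points` into Pareto fronts, best front first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


def rank(novelty, local_competition):
    """Returns solution indices ordered by front, then by descending novelty."""
    points = list(zip(novelty, local_competition))
    order = []
    for front in nondominated_fronts(points):
        order.extend(sorted(front, key=lambda i: -novelty[i]))
    return order


# Values like these would come from the archive's add_info in the design above.
order = rank(novelty=[0.9, 0.2, 0.5, 0.1], local_competition=[1, 3, 2, 0])
```

In this example the first three solutions are mutually non-dominated and form the first front, while the last one is dominated by all of them.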

@btjanaka btjanaka changed the title Add UnstructuredArchive for novelty search Add NearestNeighborArchive for novelty search Jun 27, 2024
@gresavage
Contributor Author

@btjanaka AHHHH, rankers, of course! Clever :)

I will take a look.

@gresavage
Contributor Author

@btjanaka Do you think we need to implement visualization for the archive?

@btjanaka
Member

Yes, I added a TODO on #474

@gresavage
Contributor Author

Apologies! I literally just opened that up 😬

@btjanaka btjanaka changed the title Add NearestNeighborArchive for novelty search Add ProximityArchive for novelty search Jun 27, 2024
@btjanaka btjanaka merged commit 9aec95a into icaros-usc:master Jun 28, 2024
18 checks passed
Development

Successfully merging this pull request may close these issues.

Support unstructured archives [FEATURE REQUEST]
2 participants