Add ProximityArchive for novelty search #472

gresavage · 2024-06-15T00:35:59Z

Description

Implement the ProximityArchive (also known as unstructured archive or novelty archive) from Novelty Search (Lehman, 2011) http://eplex.cs.ucf.edu/papers/lehman_ecj11.pdf

Resolves #468

Naming

The name ProximityArchive reflects that the archive's operations depend on how close solutions are to each other in measure space. We also considered NearestNeighborArchive, but that name is a bit long, and we find NoveltyArchive and UnstructuredArchive rather imprecise.
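
As a rough illustration of what "how close solutions are to each other in measure space" can mean, here is a minimal NumPy sketch of the novelty search rule: a solution's novelty is the mean distance to its k nearest neighbors, and it is added when that novelty meets a threshold. The function `novelty_scores` and the parameter `k_neighbors` are assumptions made for this sketch, not necessarily the API implemented in this PR, and the sketch only compares against the archive (not the current population).

```python
import numpy as np


def novelty_scores(archive_measures, new_measures, k_neighbors):
    """Mean Euclidean distance from each new point to its k nearest archive points.

    Hypothetical helper for illustration; not the pyribs implementation.
    """
    archive_measures = np.asarray(archive_measures, dtype=float)
    new_measures = np.asarray(new_measures, dtype=float)
    if len(archive_measures) == 0:
        # Nothing to compare against: treat every point as maximally novel.
        return np.full(len(new_measures), np.inf)
    # Pairwise distances, shape (n_new, n_archive).
    dists = np.linalg.norm(
        new_measures[:, None, :] - archive_measures[None, :, :], axis=-1)
    k = min(k_neighbors, archive_measures.shape[0])
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


archive = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[0.1, 0.0], [3.0, 3.0]])
novelty = novelty_scores(archive, candidates, k_neighbors=2)

# Assumed acceptance rule: keep candidates whose novelty meets the threshold.
novelty_threshold = 1.0
to_add = novelty >= novelty_threshold
```

With these inputs, the first candidate sits close to existing points and is rejected, while the second is far from everything and is accepted.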

TODO

Add tests -> also added python-box as a dev dependency in setup.py to make tests easier
Implement class
Allow passing in objective=None -> objective will then default to 0
Add tests for scheduler with None objective (also fixed handling of None objectives for add_mode="single")
Tidy up TODOs in class (i.e., clean up code)

Status

I have read the guidelines in CONTRIBUTING.md
I have formatted my code using yapf
I have tested my code by running pytest
I have linted my code with pylint
I have added a one-line description of my change to the changelog in HISTORY.md
This PR is ready to go

btjanaka

Hi @gresavage, thank you for your PR! I appreciate that it is well-documented and tested and that you clearly put a lot of time into it. I have left comments requesting a couple of changes.

Overall, I think the UnstructuredArchive is great. The UnstructuredGridArchive seems rather specialized, so I do not think we should add it to pyribs. Could you remove from this PR the UnstructuredGridArchive and the roll() method in ArrayStore?

I know these things can take quite a bit of time; if you need to work on other things, just let me know and I'm happy to help make the changes I mentioned.


upper and lower bound properties return measure max/min without checking archive size.
update docstrings: UnstructuredArchive, index_of
remove learning_rate, threshold_min arguments from __init__
rename "sparsity" to "novelty"
remove boundaries property and `List` typehints


btjanaka · 2024-06-25T09:09:34Z

Hi @gresavage, just wanted to give a quick update on things. I'm going through and modifying the UnstructuredArchive -- I've changed it a decent amount to align more with the rest of the library and how we implement things. I removed GridUnstructuredArchive and ArrayStore.roll as per my earlier comments, but do consider taking a look at my comment here #472 (comment), as I think it reveals some unexpected behavior. I'll keep working on this bit by bit, and it should be ready soon. Thanks again!

gresavage · 2024-06-25T17:21:49Z

Hi @btjanaka I'm eager to see what you've done.

I haven't pushed any of my changes w.r.t NS + local competition yet as they are still in flux and I'd like to know your thoughts on handling non-dominated sorting... I also want to fold my changes into what you've got so the two strategies are homogeneous if you decide to include NS + local competition.

I was also thinking more about the objective threshold and learning rate. Are we sure we want to do away with this behavior entirely if we implement NS + local competition? I removed the keyword arguments in my last commit but I think it could be useful to have CMA-MAE behavior with the local competition variant.

gresavage · 2024-06-26T02:46:37Z

@gresavage I'll reply to both your comments here.

To implement NSLC in pyribs, I think we would create an emitter that contains the MOEA, most likely using one of the libraries you mentioned (indeed, it would likely be out of scope to create NDS algorithms in pyribs unless there is some custom feature we want). The archive would then output the novelty and local competition via the add_info returned from the add() method, and the emitter would use the NDS to rank based on the novelty and local competition. Alternatively, if the emitter needs to also include the population in novelty and local competition computations, then it can compute those metrics inside of the emitter rather than in the archive.

Regarding the archive, I took another look at the NSLC paper, and I think one point that may be confusing us (at least me) is whether solutions are replaced based on their local competition. My understanding is that NSLC just uses a regular archive identical to the one from novelty search. The local competition seems to be a metric that is computed using the solutions in the archive, but it does not seem the metric plays any role in updating the archive. If you saw a section or piece of code saying otherwise, could you send it along? I've also asked some friends in the QD community for their thoughts on this; I'm genuinely not sure how this part works.

If I am right, then the current archive would be roughly what we need for both NS and NSLC, and the job of implementing NSLC would likely involve making a new emitter as I described above.
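
To make the idea of local competition as a metric computed from archive contents more concrete, here is a minimal sketch. It assumes the usual NSLC definition, where local competition is the number of a solution's k nearest archive neighbors (in measure space) that it outperforms on the objective. The function name, its arguments, and the requirement of a non-empty archive are assumptions for this sketch; nothing here modifies the archive.

```python
import numpy as np


def novelty_and_local_competition(archive_measures, archive_objectives,
                                  measures, objectives, k_neighbors):
    """Compute both metrics against a non-empty archive (hypothetical helper)."""
    archive_measures = np.asarray(archive_measures, dtype=float)
    archive_objectives = np.asarray(archive_objectives, dtype=float)
    measures = np.asarray(measures, dtype=float)
    objectives = np.asarray(objectives, dtype=float)

    # Distances from each new solution to each archive entry.
    dists = np.linalg.norm(
        measures[:, None, :] - archive_measures[None, :, :], axis=-1)
    k = min(k_neighbors, len(archive_measures))
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]

    # Novelty: mean distance to the k nearest neighbors.
    neighbor_dists = np.take_along_axis(dists, neighbor_idx, axis=1)
    novelty = neighbor_dists.mean(axis=1)

    # Local competition: how many of those neighbors the solution outperforms.
    neighbor_objectives = archive_objectives[neighbor_idx]
    local_competition = (objectives[:, None] > neighbor_objectives).sum(axis=1)
    return novelty, local_competition


nov, lc = novelty_and_local_competition(
    archive_measures=[[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]],
    archive_objectives=[1.0, 3.0, 10.0],
    measures=[[0.5, 0.0]],
    objectives=[2.0],
    k_neighbors=2,
)
```

In this example the new solution beats one of its two nearest neighbors, so its local competition score is 1; the distant high-objective entry is not a neighbor and does not count.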

RE: Emitters

I think an emitter could work for NSLC so long as the ranking doesn't play a role in what gets added to the archive. The only real hurdle I see with that route is that NSLC doesn't bother with how candidates are generated, so an NSLC emitter could essentially wrap any of the other emitters and just leave the ask method unchanged. In my eyes this means making an NSLC variant for each of the other emitters, which is only a hurdle inasmuch as it is tedious 😆. The tell method would be where the magic happens.
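
The wrapping idea could look roughly like the sketch below: `ask` is delegated untouched and `tell` adds the ranking step. Both classes are hypothetical stand-ins written for this illustration (they are not pyribs emitters, and the `tell` signatures are assumptions), and `placeholder_rank` is deliberately not a real non-dominated sort.

```python
import numpy as np


class GaussianEmitter:
    """Hypothetical stand-in for an existing emitter; not the pyribs class."""

    def __init__(self, x0, sigma, batch_size, seed=None):
        self.mean = np.asarray(x0, dtype=float)
        self.sigma = sigma
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def ask(self):
        noise = self.rng.standard_normal((self.batch_size, self.mean.size))
        return self.mean + self.sigma * noise

    def tell(self, solutions, ranking):
        # Move the mean to the best-ranked solution (ranking[0] is best).
        self.mean = np.asarray(solutions)[ranking[0]]


class NSLCWrapper:
    """Wraps any emitter: ask() is delegated, tell() adds the ranking step."""

    def __init__(self, emitter, rank_fn):
        self.emitter = emitter
        self.rank_fn = rank_fn

    def ask(self):
        return self.emitter.ask()

    def tell(self, solutions, novelty, local_competition):
        ranking = self.rank_fn(np.asarray(novelty), np.asarray(local_competition))
        self.emitter.tell(solutions, ranking)


def placeholder_rank(novelty, local_competition):
    # Placeholder only: sort by local competition, break ties by novelty.
    # A real NSLC ranking would use non-dominated sorting instead.
    return np.lexsort((-novelty, -local_competition))


emitter = NSLCWrapper(
    GaussianEmitter(x0=[0.0, 0.0], sigma=0.1, batch_size=3, seed=0),
    placeholder_rank,
)
solutions = emitter.ask()
emitter.tell(solutions, novelty=[0.2, 0.9, 0.5], local_competition=[1, 1, 0])
```

Taking the ranking function as a constructor argument is one way to avoid writing a separate NSLC variant per emitter, at the cost of requiring every wrapped emitter to accept a ranking in `tell`.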

RE: Replacement & Use of NDS Rank

I looked over the papers as well and I think you're right that solutions are not replaced - which greatly simplifies the issue.

So after thinking about it even more, I think the way the archive works in NSLC is that essentially every member of the current population is added to the archive if it survives the genetic selection. A solution survives selection based on NSGA-II, which optimizes the morphological novelty and local competition objectives, with novelty playing the role of crowding distance. So I think the archive in NSLC doesn't actually decide what gets added: if a solution is sent to the add method, it gets added.

I also think whether the "selection" based on non-domination rank and novelty for NSLC is done outside of pyribs by a user algorithm or in the add method is kind of a "6 one way, half dozen the other" situation. So I am happy to assist with whatever path you think is best and within scope.

## Description

We are working on supporting diversity optimization algorithms such as novelty search (see #472). One difference from other algorithms is that such algorithms do not require objectives. However, the current API to Scheduler.tell() is:

```
tell(objective, measures, **fields)
```

i.e., it assumes objective and measures will be provided. This PR seeks to make it possible to call `scheduler.tell(measures)` in addition to the current `scheduler.tell(objective, measures)`.

### Use Cases

There are a couple of use cases that the scheduler must satisfy:

1. QD optimization (i.e., current behavior should not break):
   - Positional arguments: `scheduler.tell(objective, measures)`
   - Keyword arguments: `scheduler.tell(objective=objective, measures=measures)`
   - Mixed (a rather extreme case): `scheduler.tell(objective, measures=measures)`
2. Diversity optimization:
   - Positional arguments: `scheduler.tell(measures)`
   - Keyword arguments: `scheduler.tell(measures=measures)`
3. Diversity optimization but where an objective has been added as an extra field:
   - Positional arguments: `scheduler.tell(measures, objective=objective)`
   - Keyword arguments: `scheduler.tell(measures=measures, objective=objective)`

Ideally, we would also be robust to a case where we need to only have objectives in the future:

4. Objective optimization:
   - Positional arguments: `scheduler.tell(objective)`
   - Keyword arguments: `scheduler.tell(objective=objective)`

### Potential Solutions

1. Change scheduler API to `tell(*args, **kwargs)` and add a `problem_type` to each archive.
   - args and kwargs are interpreted based on the type of the archive. If the problem type is quality_diversity, then the args are interpreted as objectives and measures. If the problem type is diversity_optimization, then the args are interpreted as measures. If the problem type is single_objective, the args are interpreted as objective. kwargs will be treated the same as fields in all cases.
   - Pros: Backwards-compatible and handles all 4 use cases above.
   - Cons: Makes the archives a bit more complex, may be a bit confusing to users because the signature will not be very informative. However, I think we can get away with this with good docstrings and documentation.
2. Allow the current objective argument to be treated as measures, and assign a default argument of None to the current objective and measures -> i.e., `tell(objective=None, measures=None, **fields)`
   - Inspired by how numpy's `integers` can be either `rng.integers(high)` or `rng.integers(low, high)` -> see [here](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.integers.html)
   - Pros: Minimal changes to current API, and still quite interpretable for users.
   - Cons: On the other hand, it may be confusing to see that objective can be set to measures. This will also fail case 3 above, i.e., `scheduler.tell(measures, objective=objective)` will throw an error because `measures` is actually `objective` due to the positional argument, so `objective` is being passed in twice.
3. **Require the user to pass in `objective=None` when performing diversity optimization and maintain the same `scheduler.tell` API.**
   - Pros: This requires the fewest changes to the API. It also provides a good model for what EvolutionStrategyEmitter and other emitters should do -- all these classes can now implement behavior for `objective=None`. It also does not require modifying the archives to have a `problem_type` or `archive_type` attribute. Furthermore, `objective` can still be provided if that is a field in the archive.
   - Cons: It is a bit verbose to have `scheduler.tell(None, measures)` or `scheduler.tell(objective=None, measures=measures)` but I think users can understand that.
   - Optionally, we can also set a default of `objective=None` and `measures=None` so that one can pass just measures, e.g., `scheduler.tell(measures=...)` or just objectives `scheduler.tell(objective=)`. However, I think we may not want to add this feature for now as it requires adding default values to a lot of places.
4. Add a new parameter to `scheduler.tell` called `mode` or something similar to indicate diversity optimization is in effect, i.e., `scheduler.tell(measures, mode="diversity")`
   - Pros: This is very explicit and makes it clear that diversity optimization is happening without objectives.
   - Cons: Rather verbose.

### Decision

I believe **Solution 3** is the best solution, since it involves the fewest changes to the API and also contains the changes to the scheduler.

## TODO

- [x] Implement behavior for when `objective=None` in `Scheduler.tell` -- specifically, the scheduler will now allow any field to be None, and when the field is None, the scheduler will simply pass None down to the emitter tell functions.
- [x] Write test for passing `objective=None` to the scheduler, both with positional and keyword arguments. This test will be updated once the UnstructuredArchive is implemented.

## Status

- [x] I have read the guidelines in [CONTRIBUTING.md](https://github.com/icaros-usc/pyribs/blob/master/CONTRIBUTING.md)
- [x] I have formatted my code using `yapf`
- [x] I have tested my code by running `pytest`
- [x] I have linted my code with `pylint`
- [x] I have added a one-line description of my change to the changelog in `HISTORY.md`
- [x] This PR is ready to go
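
A minimal sketch of the `objective=None` behavior described in that TODO, assuming a hypothetical helper `validate_tell_args` (this is not the actual scheduler code): array-like fields are checked against the batch size, while any field that is None is passed through unchanged for the emitters to handle.

```python
import numpy as np


def validate_tell_args(batch_size, objective, measures, **fields):
    """Hypothetical helper: check batch sizes but let None pass through."""
    data = {"objective": objective, "measures": measures, **fields}
    checked = {}
    for name, value in data.items():
        if value is None:
            # None is allowed and handed down as-is.
            checked[name] = None
            continue
        arr = np.asarray(value)
        if arr.ndim == 0 or arr.shape[0] != batch_size:
            raise ValueError(
                f"{name} must have a leading dimension of {batch_size}")
        checked[name] = arr
    return checked


measures = np.zeros((4, 2))
# Positional and keyword forms, mirroring scheduler.tell(None, measures)
# and scheduler.tell(objective=None, measures=measures).
positional = validate_tell_args(4, None, measures)
keyword = validate_tell_args(4, objective=None, measures=measures)
```

Both call styles produce the same result here, which is the property the test in the TODO list is meant to cover.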

btjanaka · 2024-06-27T02:05:34Z

@gresavage I see what you are saying about using the NSLC ranking. I think a good path would be to implement something in ribs.emitters.rankers. Essentially, the ranker would integrate an NDS, and the ranker could be used in emitters like EvolutionStrategyEmitter. Meanwhile, the archive could return local_competition in add_info. The emitter would take in the novelty and local_competition from the add_info and pass these pieces of info to the ranker.

For now, I've updated the code to just be a regular Novelty Search archive. I added a TODO to add local competition on issue #474. Would you mind taking a look at the code? Thanks!
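
For reference, the ranking step such a ranker would perform might look like the sketch below: a simple O(n^2) non-dominated sort over novelty and local competition (both maximized), with novelty used to order solutions within a front. The function names are assumptions for this illustration and do not follow the `ribs.emitters.rankers` interface; the within-front tie-break is one possible choice, not a confirmed design.

```python
import numpy as np


def non_dominated_fronts(scores):
    """Assign each row of `scores` (higher is better) to a Pareto front index."""
    scores = np.asarray(scores, dtype=float)
    front = np.full(len(scores), -1)
    remaining = np.arange(len(scores))
    level = 0
    while remaining.size > 0:
        sub = scores[remaining]
        # ge[i, j]: j is at least as good as i in every objective.
        ge = (sub[None, :, :] >= sub[:, None, :]).all(axis=-1)
        # gt[i, j]: j is strictly better than i in some objective.
        gt = (sub[None, :, :] > sub[:, None, :]).any(axis=-1)
        dominated = (ge & gt).any(axis=1)
        front[remaining[~dominated]] = level
        remaining = remaining[dominated]
        level += 1
    return front


def nslc_rank(novelty, local_competition):
    """Return indices ordered best-first: by front, then by descending novelty."""
    novelty = np.asarray(novelty, dtype=float)
    scores = np.column_stack([novelty, local_competition])
    fronts = non_dominated_fronts(scores)
    return np.lexsort((-novelty, fronts))


novelty = [0.9, 0.2, 0.5, 0.1]
local_competition = [1, 3, 2, 0]
fronts = non_dominated_fronts(np.column_stack([novelty, local_competition]))
ranking = nslc_rank(novelty, local_competition)
```

Here the first three solutions trade off novelty against local competition and share the first front, while the last solution is dominated by all of them and is ranked last.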

gresavage · 2024-06-27T16:45:53Z

@btjanaka AHHHH, rankers, of course! Clever :)

I will take a look.

gresavage · 2024-06-27T20:25:31Z

@btjanaka Do you think we need to implement visualization for the archive?

btjanaka · 2024-06-27T20:37:18Z

Yes, I added a TODO on #474

gresavage · 2024-06-27T20:39:40Z

Apologies! I literally just opened that up 😬

feat: unstructured archives

c7d4bf6

gresavage mentioned this pull request Jun 15, 2024

Support unstructured archives [FEATURE REQUEST] #468

Closed

btjanaka added 3 commits June 17, 2024 17:17

Formatting

d8199d1

Merge branch 'master' into 468-unstructured-archives

8b426a4

Fix dtype

dbb13ed

btjanaka requested changes Jun 18, 2024

gresavage and others added 20 commits June 19, 2024 13:46

Update _array_store.py

0a46707

change `GridArchive` to `GridUnstructuredArchive` in `roll` docstring

Update conftest.py

f8cadeb

narrow pylint disable scope

Alphabetize listing

45465aa

Merge branch '468-unstructured-archives' of github.com:gresavage/pyri…

ed249d0

…bs into 468-unstructured-archives

Rearrange list

747955f

history

0cdc70a

Fix calls in tests

2695d1e

Update docstring and second draft of index_of

71759f7

Make index_of return nearest neighbors

4c03201

Remove GridUnstructuredArchive test

f08d0c9

Update conftest for unstructuredarchive

19772cf

Add cells method

cc0449e

Use box to simplify conftest

9c8c93f

Start new draft of unstructuredarchive

165e85e

More tests

b84e62a

Finish most of tests

7dd8f84

Fix bug

73d04a2

Add cqd_score

3cf2adb

Remove GridUnstructuredArchive

794c13b

Fix more tests

a4c30a8

btjanaka changed the title ~~feat: unstructured archives~~ Add UnstructuredArchive for novelty search Jun 25, 2024

btjanaka mentioned this pull request Jun 26, 2024

Support diversity optimization in Scheduler.tell #473

Merged

8 tasks

Merge branch 'master' into 468-unstructured-archives

515aba6

btjanaka mentioned this pull request Jun 26, 2024

[FEATURE REQUEST] Diversity Optimization #474

Open

15 tasks

btjanaka added 6 commits June 26, 2024 17:01

Allow None objective in validation

2179228

Clean up class and tests

999e0d1

Merge branch 'master' into 468-unstructured-archives

243e6ba

allow none objective in utils

8d70ae3

Fix scheduler tests

c68c2be

Fix tests

e1e3f72

btjanaka added 2 commits June 26, 2024 19:25

Fix tests; add more docs

2d1e9e4

Rename to NearestNeighborArchive

99dab43

btjanaka changed the title ~~Add UnstructuredArchive for novelty search~~ Add NearestNeighborArchive for novelty search Jun 27, 2024

gresavage commented Jun 27, 2024

ribs/archives/_nearest_neighbor_archive.py (outdated)

gresavage commented Jun 27, 2024

ribs/archives/_nearest_neighbor_archive.py (outdated)

btjanaka changed the title ~~Add NearestNeighborArchive for novelty search~~ Add ProximityArchive for novelty search Jun 27, 2024

btjanaka added 5 commits June 27, 2024 15:03

Rename to ProximityArchive

6ae9c0c

Remove compare_to_batch

7837855

Throw error for initial capacity

0692a26

Rename novelty_threshold to proximity_threshold

66f2d44

Edits

945c458

btjanaka approved these changes Jun 28, 2024

Revert "Rename novelty_threshold to proximity_threshold"

b6432a4

This reverts commit 66f2d44.

btjanaka merged commit 9aec95a into icaros-usc:master Jun 28, 2024
18 checks passed
