Idea: New way to organize many tests #1353
Comments
I like this idea a lot. We'll be able to document this dataset as well, and theoretically it could be a useful research asset in its own right. 👍 💪
Great. I hacked out an early version to help me find seeds for #1288 by watching all the invocations of

---
coplayer:
  init_kwargs: {}
  name: Cooperator
expected_outcome:
  coplayer_actions: CCCCCCCCCCCCCC
  player_actions: CCCCCCDDDDDDDD
  player_attributes: null
match_parameters:
  game: null
  noise: null
  prob_end: null
  seed: null
  turns: null
player:
  init_kwargs:
    initial_plays: null
  name: Adaptive
---
coplayer:
  init_kwargs: {}
  name: Defector
expected_outcome:
  coplayer_actions: DDDDDDDDDDDDDD
  player_actions: CCCCCCDDDDDDDD
  player_attributes: null
match_parameters:
  game: null
  noise: null
  prob_end: null
  seed: null
  turns: null
player:
  init_kwargs:
    initial_plays: null
  name: Adaptive

From there I was able to cook up a script to run these matches and look for new seeds, or potentially opponents. It's rough but here's how it works.

import axelrod
from axelrod import load_matches


def verify_match_outcomes(match, expected_actions1, expected_actions2, attrs):
    # Test expected sequence of plays from the match is as expected.
    player1, player2 = match.players
    for (play, expected_play) in zip(player1.history, expected_actions1):
        if play != expected_play:
            # print(play, expected_play)
            return False
    for (play, expected_play) in zip(player2.history, expected_actions2):
        # print(play, expected_play)
        if play != expected_play:
            return False
    # Test final player attributes are as expected
    if attrs:
        for attr, value in attrs.items():
            if getattr(player1, attr) != value:
                return False
    return True


def run_matches():
    match_configs = list(load_matches())
    for match_config in match_configs:
        try:
            match = match_config()
        except AttributeError:
            continue
        player, coplayer = match.players
        if isinstance(player, axelrod.Human) or isinstance(coplayer, axelrod.Human):
            continue
        print(match_config)
        seed = match_config.match_parameters.seed
        attrs = match_config.expected_outcome.player_attributes
        player_actions = match_config.expected_outcome.player_actions
        coplayer_actions = match_config.expected_outcome.coplayer_actions
        if seed is None:
            match.play()
            print(verify_match_outcomes(match, player_actions, coplayer_actions, attrs))
        else:
            # Search for a seed
            for seed in range(1, 200000):
                match.set_seed(seed)
                # axelrod.seed(seed)
                match.play()
                if verify_match_outcomes(match, player_actions, coplayer_actions, attrs):
                    print("Seed found:", seed)
                    break
        print()


if __name__ == "__main__":
    run_matches()
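The script relies on load_matches, which is not shown in the thread. As a rough sketch only, the records it yields could be modeled with dataclasses that mirror the YAML keys above. The names PlayerSpec, MatchParameters, ExpectedOutcome, MatchRecord and from_dict are made up for illustration and are not part of axelrod; the real objects also appear to be callable to build a match, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# All class names below are hypothetical; they only mirror the YAML keys.
@dataclass
class PlayerSpec:
    name: str
    init_kwargs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MatchParameters:
    game: Optional[Any] = None
    noise: Optional[float] = None
    prob_end: Optional[float] = None
    seed: Optional[int] = None
    turns: Optional[int] = None


@dataclass
class ExpectedOutcome:
    player_actions: str
    coplayer_actions: str
    player_attributes: Optional[Dict[str, Any]] = None


@dataclass
class MatchRecord:
    player: PlayerSpec
    coplayer: PlayerSpec
    match_parameters: MatchParameters
    expected_outcome: ExpectedOutcome

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "MatchRecord":
        # `raw` is one YAML document after parsing into plain dicts.
        return cls(
            player=PlayerSpec(**raw["player"]),
            coplayer=PlayerSpec(**raw["coplayer"]),
            match_parameters=MatchParameters(**raw["match_parameters"]),
            expected_outcome=ExpectedOutcome(**raw["expected_outcome"]),
        )


# The first YAML document from the thread, written out as a dict.
raw = {
    "coplayer": {"init_kwargs": {}, "name": "Cooperator"},
    "expected_outcome": {
        "coplayer_actions": "CCCCCCCCCCCCCC",
        "player_actions": "CCCCCCDDDDDDDD",
        "player_attributes": None,
    },
    "match_parameters": {
        "game": None,
        "noise": None,
        "prob_end": None,
        "seed": None,
        "turns": None,
    },
    "player": {"init_kwargs": {"initial_plays": None}, "name": "Adaptive"},
}
record = MatchRecord.from_dict(raw)
```

With a shape like this, record.match_parameters.seed and record.expected_outcome.player_actions line up with the attribute accesses in run_matches.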
I'd like to do the same thing for full tournaments and Moran processes.
Sounds good to me.
In the course of finding new seeds for many tests for #1288 it occurs to me that we can probably organize these tests in a more useful way. There are many tests in the library like the following:
where the player is given by self.Player() from TestPlayer. Sometimes these tests use MockPlayer as an opponent rather than an actual opponent. Searching for a new seed involves manually extracting the info (opponent, histories, etc.) and looping over seeds until a new one producing the same behavior is found.

I think a better approach would be to have a large file of expected matches, encoding the range of expected behaviors of every strategy, with rows of the form
Player1Class, Player2Class, expected_history_1, expected_history_2, seed, other params (like noise), ...
e.g.
Essentially it's a dataframe of tests. We could include a description of the test and other metadata. Maybe some other format would be better but hopefully you get the idea.
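To make the row idea concrete, here is a minimal sketch that stores expected matches as CSV and reads them with the standard library. The column names and the load_rows helper are assumptions for illustration; the two rows reuse the Adaptive histories quoted elsewhere in this thread, and an empty seed or noise cell is read as "not set".

```python
import csv
import io

# Hypothetical file contents; the columns follow the row format proposed above.
EXPECTED_MATCHES_CSV = """\
player,coplayer,expected_history_1,expected_history_2,seed,noise
Adaptive,Cooperator,CCCCCCDDDDDDDD,CCCCCCCCCCCCCC,,
Adaptive,Defector,CCCCCCDDDDDDDD,DDDDDDDDDDDDDD,,
"""


def load_rows(text):
    """Parse expected-match rows, converting empty cells to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["seed"] = int(row["seed"]) if row["seed"] else None
        row["noise"] = float(row["noise"]) if row["noise"] else None
        rows.append(row)
    return rows


rows = load_rows(EXPECTED_MATCHES_CSV)
for row in rows:
    print(row["player"], "vs", row["coplayer"], "seed:", row["seed"])
```

A flat file like this is easy to scan from an auxiliary script, though a nested format may suit init kwargs and other metadata better.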
Such a structure has a few benefits:
The file can encode all expected behaviors of a strategy in a simple way, as the histories it should at some point yield, rather than having them scattered across various tests as they currently are. This removes a lot of redundant code (regardless of whether a seed is required).
It's easier to find new seeds when we need them. An auxiliary script can easily scan for new seeds if we change something about how seeding works, or how a strategy works, etc. Right now there's no easy way to extract all the expected tests to systematically find new seeds because the necessary data is hard-coded into functions, and often there is more than one "row" per test function.
Similarly, when adding a new strategy, generic search functions can look for an opponent, a seed, etc. that generates a specific sequence of outcomes. I think we're all currently doing these as one-offs, with MockPlayers, etc.
The associated tests will be more single-issue, unlike some of the test_strategy functions we have that test several different things at once. This will show all the failures at once, rather than one at a time as each subtest of a compound test fails.
The collection of expected matches might itself be useful somehow.
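As an illustration of the single-issue point, a data-driven test can run every row under unittest's subTest so that each failing row is reported separately. This is only a sketch: the strategies are toy functions defined here, not the axelrod classes, and the play helper and row layout are assumptions.

```python
import unittest


# Toy stand-ins for strategies: each takes (own_history, opponent_history).
def cooperator(own, opp):
    return "C"


def defector(own, opp):
    return "D"


def tit_for_tat(own, opp):
    return opp[-1] if opp else "C"


STRATEGIES = {
    "Cooperator": cooperator,
    "Defector": defector,
    "TitForTat": tit_for_tat,
}

# Rows: player, coplayer, expected player history, expected coplayer history.
EXPECTED_MATCHES = [
    ("TitForTat", "Cooperator", "CCCC", "CCCC"),
    ("TitForTat", "Defector", "CDDD", "DDDD"),
    ("Defector", "TitForTat", "DDDD", "CDDD"),
]


def play(player, coplayer, turns):
    """Play a deterministic match and return both histories as strings."""
    h1, h2 = "", ""
    for _ in range(turns):
        a1, a2 = player(h1, h2), coplayer(h2, h1)
        h1 += a1
        h2 += a2
    return h1, h2


class TestExpectedMatches(unittest.TestCase):
    def test_rows(self):
        for name1, name2, exp1, exp2 in EXPECTED_MATCHES:
            # Each row is its own subtest, so one failure does not hide others.
            with self.subTest(player=name1, coplayer=name2):
                actual = play(STRATEGIES[name1], STRATEGIES[name2], len(exp1))
                self.assertEqual(actual, (exp1, exp2))
```

Running this with python -m unittest reports every mismatching row in one pass instead of stopping at the first.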
Similarly, we have a lot of example tournaments and Moran processes with expected outputs that are seed dependent; perhaps they could be encoded in a similar manner. Not every test can be written this way, but I would guess that the majority could. This bumps up against #421, having a way to configure a tournament in a code-free way.
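The same pattern could in principle cover any seed-dependent process: record the expected output and let a generic helper search for a seed that reproduces it. The sketch below uses a toy neutral-drift process as a stand-in; toy_fixation and find_seed are hypothetical names, and nothing here uses the axelrod tournament or Moran process APIs.

```python
import random


def toy_fixation(seed, population=("C", "C", "D", "D")):
    """Toy stand-in for a seeded process: neutral drift until one type remains."""
    rng = random.Random(seed)
    pop = list(population)
    while len(set(pop)) > 1:
        parent = rng.randrange(len(pop))
        replaced = rng.randrange(len(pop))
        pop[replaced] = pop[parent]
    return pop[0]


def find_seed(run, expected, seeds):
    """Return the first seed for which run(seed) equals expected, else None."""
    for seed in seeds:
        if run(seed) == expected:
            return seed
    return None


# A declarative record of the expected outcome; the seed is filled in by search.
spec = {"process": "toy_fixation", "expected_winner": "D", "seed": None}
spec["seed"] = find_seed(toy_fixation, spec["expected_winner"], range(1, 1000))
print(spec)
```

Because find_seed only needs a callable that maps a seed to an outcome, the same helper could wrap a match, a tournament, or a population process.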
Thoughts?