---
colorSchema: light
routerMode: hash
layout: cover
theme: neversink
neversink_slug: "Goals in RL"
mdc: true
---
Guy Davidson & Todd M. Gureckis
New York University
- McCarthy's definition: intelligence is "the computational part of the ability to achieve goals in the world."
- The reward hypothesis: all goals can be thought of as maximizing expected cumulative reward.
- Goals as preferences over state-action histories [Bowling et al., 2023].
- Insufficient to express constraints or risks [Bellemare et al., 2023].
- Goal-conditioned RL (more on this later).
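As a minimal sketch of the goal-conditioned idea (every name below is hypothetical, not from this talk): the reward function takes a goal as an extra argument, so a single policy conditioned on that goal can pursue many goals with one set of parameters.

```python
# Hypothetical sketch of goal-conditioned RL: the reward depends on a
# goal g passed alongside the state, so one policy pi(a | s, g) can be
# trained to reach many goals rather than one hard-coded reward.

def goal_conditioned_reward(state, goal):
    """Sparse goal-reaching reward: 1 when the goal is achieved, else 0."""
    return 1.0 if state == goal else 0.0

def rollout(env, policy, goal, max_steps=100):
    """Run one episode with the policy conditioned on the goal.

    Assumes a hypothetical env exposing reset() -> state and
    step(action) -> next_state.
    """
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        action = policy(state, goal)   # the goal is part of the policy input
        state = env.step(action)
        episode_return += goal_conditioned_reward(state, goal)
        if episode_return > 0:         # stop once the goal is reached
            break
    return episode_return
```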
::title::
::left::
- Act safely
- Don't get stuck
- Avoid collisions
- Be efficient
- Don't run out of battery
::right::
- Children (and adults) create playful goals
- These goals help us learn how to structure problem spaces and find solutions
[Chu & Schulz, 2020; Molinaro & Collins, 2023; Chu et al., 2024]
::title::
::content::
If we want to develop agents that accomplish diverse tasks across different environments, we need agents that can propose and pursue rich, complex, and creative goals.
[Oudeyer et al., 2007; Colas et al., 2022]
- Abstraction (abstract goals, abstracting goal components)
- Temporal extension
- Compositionality
- Grounding
::left::
- Implicit (directly encoded as reward functions)
- Goal states (e.g. target manipulator positions)
- Image-based observations
- Natural language-based goals
- Represented as programs?
```lisp
(preference throwBallToBin
  (exists (?d - dodgeball ?h - hexagonal_bin)
    (then
      (once (agent_holds ?d))
      (hold (and (not (agent_holds ?d)) (in_motion ?d)))
      (once (and (not (in_motion ?d)) (in ?h ?d)))
    )))
```
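As a rough illustration (a sketch, not the DSL's actual evaluator), the temporal modals above can be read as a staged matcher over a trajectory: `once` consumes a single state satisfying its predicate, `hold` consumes a run of consecutive states. The state encoding below is hypothetical.

```python
# Hypothetical sketch of the once/hold/once semantics, checked against a
# trajectory represented as a list of state dicts.

def once(pred):
    """Match exactly one state satisfying pred."""
    def stage(states, i):
        return i + 1 if i < len(states) and pred(states[i]) else None
    return stage

def hold(pred):
    """Match one or more consecutive states satisfying pred."""
    def stage(states, i):
        j = i
        while j < len(states) and pred(states[j]):
            j += 1
        return j if j > i else None
    return stage

def then(*stages):
    """Match the stages in order; True if the whole sequence succeeds."""
    def matcher(states):
        i = 0
        for stage in stages:
            i = stage(states, i)
            if i is None:
                return False
        return True
    return matcher

# throwBallToBin over a hypothetical state encoding:
throw_ball_to_bin = then(
    once(lambda s: s["agent_holds_ball"]),
    hold(lambda s: not s["agent_holds_ball"] and s["ball_in_motion"]),
    once(lambda s: not s["ball_in_motion"] and s["ball_in_bin"]),
)

states = [
    {"agent_holds_ball": True,  "ball_in_motion": False, "ball_in_bin": False},
    {"agent_holds_ball": False, "ball_in_motion": True,  "ball_in_bin": False},
    {"agent_holds_ball": False, "ball_in_motion": False, "ball_in_bin": True},
]
assert throw_ball_to_bin(states)
```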
::title::
::content::
Prevalent non-language approaches facilitate grounding at the expense of the other desiderata.
Language (and programs) offer those benefits at the cost of grounding complexity.
::title::
::left::
- Reward functions: Challenging reward engineering effort
- Observations: Borderline impossible?
- Language: Easy to express, hard to ground
- Programs: Open question
::right::
- Reward functions: (Less) challenging reward engineering effort
- Observations: Requires embedding abstractions
- Language: Abstraction is natural, grounding is hard
- Programs: Easy for abstractions defined in program grammar
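As a small illustration of that last point (hypothetical names throughout), a quantifier in the grammar gives type-level abstraction for free, mirroring the `(exists (?d - dodgeball ...))` form in the program earlier:

```python
# Hypothetical sketch: a quantifier in the goal grammar yields type-level
# abstraction, so a goal ranges over any object of a type.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    type: str

def exists_of_type(objects, type_name, pred):
    """Existential quantifier over every object of the given type."""
    return lambda state: any(
        pred(state, o) for o in objects if o.type == type_name
    )

# An abstract goal over *any* dodgeball, not one specific ball:
objects = [Obj("ball-1", "dodgeball"), Obj("ball-2", "dodgeball")]
any_ball_in_bin = exists_of_type(
    objects, "dodgeball",
    lambda state, o: o.name in state["in_bin"],  # hypothetical state encoding
)
assert any_ball_in_bin({"in_bin": {"ball-2"}})
```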
- Reward functions: Compose mathematically, not semantically
- Observations: Composing images to represent general properties is hard
- Language: Inherently compositional, but grounding is hard
- SuccessVQA near chance on held-out goals [Du et al., 2023a]
- Hill et al. [2019] generalize to held-out objects, but not negations
- Programs: Compose by default, as defined by their grammar (sketched below)
- The structured LTL-based approach of Leon et al. [2022] composes negation successfully
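A minimal sketch of the same point, with hypothetical predicate names: because goal programs are just functions, conjunction and negation fall out of the grammar:

```python
# Hypothetical sketch: goal predicates are functions from states to
# booleans, so composition operators come directly from the grammar.

def goal_and(*goals):
    """Conjunction of goal predicates."""
    return lambda state: all(g(state) for g in goals)

def goal_not(goal):
    """Negation of a goal predicate."""
    return lambda state: not goal(state)

ball_in_bin = lambda s: s["ball_in_bin"]
agent_holds_ball = lambda s: s["agent_holds_ball"]

# "The ball is in the bin and the agent is not holding it" composes
# immediately, including the negation that image- and language-based
# detectors above struggle to generalize to:
composed = goal_and(ball_in_bin, goal_not(agent_holds_ball))
assert composed({"ball_in_bin": True, "agent_holds_ball": False})
```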
::left::
- Takeaway 1: We need agents that can propose and pursue rich, complex, and creative goals.
- Takeaway 2: This requires richer goal representations and developing methods to ground them.
- Where do goals (and their representations) fall on the agent-environment boundary?
- What important desiderata are we missing?
- How do temporally extended goals play with the Markov assumption?
- Can program-based goals scale to diverse environments?
- What can building agents that propose and pursue rich goals teach us about human goal-setting?
Guy Davidson & Todd M. Gureckis
New York University