---
colorSchema: light
routerMode: hash
layout: cover
theme: neversink
neversink_slug: "Goals in RL"
mdc: true
---
Guy Davidson & Todd M. Gureckis
New York University
- McCarthy's definition: intelligence is "the computational part of the ability to achieve goals in the world."
- The reward hypothesis: all goals can be thought of as maximizing expected cumulative reward.
- Goals as preferences over state-action histories [Bowling et al., 2023].
- Insufficient to express constraints or risks [Bellemare et al., 2023].
- Goal-conditioned RL (more on this later).
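As a minimal sketch of the goal-conditioned idea (every name below is hypothetical, not from this talk): the reward function takes a goal as an extra argument, so a single policy conditioned on that goal can pursue many goals with one set of parameters.

```python
# Hypothetical sketch of goal-conditioned RL: the reward depends on a
# goal g passed alongside the state, so one policy pi(a | s, g) can be
# trained to reach many goals rather than one hard-coded reward.

def goal_conditioned_reward(state, goal):
    """Sparse goal-reaching reward: 1 when the goal is achieved, else 0."""
    return 1.0 if state == goal else 0.0

def rollout(env, policy, goal, max_steps=100):
    """Run one episode with the policy conditioned on the goal.

    Assumes a hypothetical env exposing reset() -> state and
    step(action) -> next_state.
    """
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        action = policy(state, goal)   # the goal is part of the policy input
        state = env.step(action)
        episode_return += goal_conditioned_reward(state, goal)
        if episode_return > 0:         # stop once the goal is reached
            break
    return episode_return
```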
::title::
::left::
- Act safely
- Don't get stuck
- Avoid collisions
- Be efficient
- Don't run out of battery
::right::
- Children (and adults) create playful goals
- These goals help us learn how to structure problem spaces and find solutions
[Chu & Schulz, 2020; Molinaro & Collins, 2023; Chu et al., 2024]
::title::
::content::
If we want to develop agents that accomplish diverse tasks across different environments, we need agents that can propose and pursue rich, complex, and creative goals.
[Oudeyer et al., 2007; Colas et al., 2022]
- Abstraction (abstract goals, abstracting goal components)
- Temporal extension
- Compositionality
- Grounding
::left::
- Implicit (directly encoded as reward functions)
- Goal states (e.g. target manipulator positions)
- Image-based observations
- Natural language-based goals
- Represented as programs?
```lisp
(preference throwBallToBin
  (exists (?d - dodgeball ?h - hexagonal_bin)
    (then
      (once (agent_holds ?d))
      (hold (and (not (agent_holds ?d)) (in_motion ?d)))
      (once (and (not (in_motion ?d)) (in ?h ?d)))
    )))
```
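As a rough illustration (a sketch, not the DSL's actual evaluator), the temporal modals above can be read as a staged matcher over a trajectory: `once` consumes a single state satisfying its predicate, `hold` consumes a run of consecutive states. The state encoding below is hypothetical.

```python
# Hypothetical sketch of the once/hold/once semantics, checked against a
# trajectory represented as a list of state dicts.

def once(pred):
    """Match exactly one state satisfying pred."""
    def stage(states, i):
        return i + 1 if i < len(states) and pred(states[i]) else None
    return stage

def hold(pred):
    """Match one or more consecutive states satisfying pred."""
    def stage(states, i):
        j = i
        while j < len(states) and pred(states[j]):
            j += 1
        return j if j > i else None
    return stage

def then(*stages):
    """Match the stages in order; True if the whole sequence succeeds."""
    def matcher(states):
        i = 0
        for stage in stages:
            i = stage(states, i)
            if i is None:
                return False
        return True
    return matcher

# throwBallToBin over a hypothetical state encoding:
throw_ball_to_bin = then(
    once(lambda s: s["agent_holds_ball"]),
    hold(lambda s: not s["agent_holds_ball"] and s["ball_in_motion"]),
    once(lambda s: not s["ball_in_motion"] and s["ball_in_bin"]),
)

states = [
    {"agent_holds_ball": True,  "ball_in_motion": False, "ball_in_bin": False},
    {"agent_holds_ball": False, "ball_in_motion": True,  "ball_in_bin": False},
    {"agent_holds_ball": False, "ball_in_motion": False, "ball_in_bin": True},
]
assert throw_ball_to_bin(states)
```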
::title::
::content::
Prevalent non-language approaches facilitate grounding at the expense of the other desiderata.
Language (and programs) offer those benefits at the cost of grounding complexity.
::title::
::left::
- Reward functions: Challenging reward engineering effort
- Observations: Borderline impossible?
- Language: Easy to express, hard to ground
- Programs: Open question
::right::
- Reward functions: (Less) challenging reward engineering effort
- Observations: Requires embedding abstractions
- Language: Abstraction is natural, grounding is hard
- Programs: Easy for abstractions defined in program grammar
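As a small illustration of that last point (hypothetical names throughout), a quantifier in the grammar gives type-level abstraction for free, mirroring the `(exists (?d - dodgeball ...))` form in the program earlier:

```python
# Hypothetical sketch: a quantifier in the goal grammar yields type-level
# abstraction, so a goal ranges over any object of a type.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    type: str

def exists_of_type(objects, type_name, pred):
    """Existential quantifier over every object of the given type."""
    return lambda state: any(
        pred(state, o) for o in objects if o.type == type_name
    )

# An abstract goal over *any* dodgeball, not one specific ball:
objects = [Obj("ball-1", "dodgeball"), Obj("ball-2", "dodgeball")]
any_ball_in_bin = exists_of_type(
    objects, "dodgeball",
    lambda state, o: o.name in state["in_bin"],  # hypothetical state encoding
)
assert any_ball_in_bin({"in_bin": {"ball-2"}})
```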
- Reward functions: Compose mathematically, not semantically
- Observations: Composing images to represent general properties is hard
- Language: Inherently compositional, but grounding is hard
- SuccessVQA near chance on held-out goals [Du et al., 2023a]
- Hill et al. [2019] generalize to held-out objects, but not negations
- Programs: Compose by default, as defined by their grammar (sketched below)
- The structured LTL-based approach of Leon et al. [2022] composes negation successfully
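A minimal sketch of the same point, with hypothetical predicate names: because goal programs are just functions, conjunction and negation fall out of the grammar:

```python
# Hypothetical sketch: goal predicates are functions from states to
# booleans, so composition operators come directly from the grammar.

def goal_and(*goals):
    """Conjunction of goal predicates."""
    return lambda state: all(g(state) for g in goals)

def goal_not(goal):
    """Negation of a goal predicate."""
    return lambda state: not goal(state)

ball_in_bin = lambda s: s["ball_in_bin"]
agent_holds_ball = lambda s: s["agent_holds_ball"]

# "The ball is in the bin and the agent is not holding it" composes
# immediately, including the negation that image- and language-based
# detectors above struggle to generalize to:
composed = goal_and(ball_in_bin, goal_not(agent_holds_ball))
assert composed({"ball_in_bin": True, "agent_holds_ball": False})
```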
::left::
- Takeaway 1: We need agents that can propose and pursue rich, complex, and creative goals.
- Takeaway 2: This requires richer goal representations and developing methods to ground them.
- Where do goals (and their representations) fall on the agent-environment boundary?
- What important desiderata are we missing?
- How do temporally extended goals play with the Markov assumption?
- Can program-based goals scale to diverse environments?
- What can building agents that propose and pursue rich goals teach us about human goal-setting?
Guy Davidson & Todd M. Gureckis
New York University