---
colorSchema: light
routerMode: hash
layout: cover
theme: neversink
neversink_slug: Goals in RL
mdc: true
---

# Toward Complex and Structured Goals in Reinforcement Learning

Finding the Frame @ RLC 2024

Guy Davidson & Todd M. Gureckis
New York University


---
layout: default
---

# Goals in Reinforcement Learning

- McCarthy's definition of intelligence and goals.
- The reward hypothesis.
- Goals as preferences over state-action histories [Bowling et al., 2023] (see the sketch below).
  - Insufficient to express constraints or risks [Bellemare et al., 2023].
- Goal-conditioned RL (more on this later).
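As a quick formal sketch (our notation, nothing beyond the cited papers): the reward hypothesis frames a goal as maximizing expected discounted return, while Bowling et al. [2023] instead take a goal to be a binary preference relation $\succeq$ over state-action histories $h = (s_0, a_0, s_1, a_1, \dots)$:

$$
J(\pi) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t \, r(s_t, a_t) \right]
\qquad \text{vs.} \qquad
h \succeq h'
$$

Their result characterizes when such preferences can be realized by maximizing a scalar Markov reward; expressing constraints or risk-sensitivity is where this can break down [Bellemare et al., 2023].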

---
layout: two-cols-title
---

:: title ::

# Straining these definitions of goals

::left::

Goals as intended behaviors

- Act safely
- Don't get stuck
- Avoid collisions
- Be efficient
- Don't run out of battery

::right::

Creativity and richness of human goals

- Children (and adults) create playful goals
- These goals help us learn how to structure problem spaces and find solutions
  [Chu & Schulz, 2020; Molinaro & Collins, 2023; Chu et al., 2024]

---
layout: side-title
align: rm-lm
---

:: title ::

# Core idea

:: content ::

If we want to develop agents that accomplish diverse tasks across different environments, we need agents that can propose and pursue rich, complex, and creative goals.

[Oudeyer et al., 2007; Colas et al., 2022]


---

# Desiderata for Goal Representations

- Abstraction (abstract goals, abstracting goal components)
- Temporal extension
- Compositionality
- Grounding

---
layout: two-cols-header
---

# Surveyed Goal Representation Approaches

::left::

- Implicit (directly encoded as reward functions)
- Goal states (e.g., target manipulator positions)
- Image-based observations
- Natural language-based goals
- Represented as programs?

::right::
```lisp
; A goal program: throw a dodgeball so that it lands in the hexagonal bin.
(preference throwBalltoBin
  (exists (?d - dodgeball ?h - hexagonal_bin)
    (then
      ; the agent starts out holding the ball...
      (once (agent_holds ?d))
      ; ...the ball is then in flight, no longer held...
      (hold (and (not (agent_holds ?d)) (in_motion ?d)))
      ; ...and finally comes to rest inside the bin.
      (once (and (not (in_motion ?d)) (in ?h ?d))))))
```
<style>
.image img {
  display: block;
  margin-left: auto;
  margin-right: auto;
  margin-top: 0.5em;
}
</style>

---
layout: side-title
align: rm-lm
---

:: title ::

# Key takeaway

:: content ::

Prevalent non-language approaches facilitate grounding at the expense of other desiderata

Language (and programs) offer benefits at the cost of grounding complexity


---
layout: two-cols-title
---

::title::

# Example argument: abstraction

::left::

Representing abstract goals

- Reward functions: Challenging reward engineering effort
- Observations: Borderline impossible?
- Language: Easy to express, hard to ground
- Programs: Open question

::right::

Abstracting goal components

- Reward functions: (less) Challenging reward engineering effort
- Observations: Requires embedding abstractions
- Language: Abstraction is natural, grounding is hard
- Programs: Easy for abstractions defined in the program grammar (see the sketch below)
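A minimal sketch of that last point, reusing the throwing example: assuming the goal grammar's type hierarchy includes a `ball` supertype (our assumption here), abstracting over which ball is thrown is a one-token change to the program.

```lisp
; Hypothetical abstracted variant of the earlier preference: any ball, not
; just a dodgeball. Assumes the grammar defines a `ball` supertype.
(preference throwAnyBallToBin
  (exists (?b - ball ?h - hexagonal_bin)   ; was: ?d - dodgeball
    (then
      (once (agent_holds ?b))
      (hold (and (not (agent_holds ?b)) (in_motion ?b)))
      (once (and (not (in_motion ?b)) (in ?h ?b))))))
```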

---
disabled: true
---

# Example argument: compositionality

- Reward functions: Compose mathematically, not semantically
- Observations: Composing images to represent general properties is hard
- Language: Inherently compositional, but grounding is hard
  - SuccessVQA near chance on held-out goals [Du et al., 2023a]
  - Hill et al. [2019] generalize to held-out objects, but not to negations
- Programs: Compose by default, as defined by their grammar (see the sketch after this list)
  - The structured LTL-based approach of Leon et al. [2022] composes negation successfully
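To make the contrast concrete, a hypothetical sketch in the same goal grammar: composing a negated component into the throwing goal yields a new goal whose semantics follow directly from the grammar, whereas summing two reward functions only composes their values (the argmax of `r_throw + r_no_bounce` need not satisfy both components).

```lisp
; Hypothetical composition: the same throw, plus a negated "no bounce"
; component. Assumes `touch` and `floor` exist in the grammar; the
; composition itself is just nesting `not`/`and` in the temporal structure.
(preference throwBallToBinNoBounce
  (exists (?d - dodgeball ?h - hexagonal_bin)
    (then
      (once (agent_holds ?d))
      (hold (and (not (agent_holds ?d))
                 (in_motion ?d)
                 (not (touch floor ?d))))   ; negation composes directly
      (once (and (not (in_motion ?d)) (in ?h ?d))))))
```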

---
layout: two-cols-title
---

:: left ::

Takeaways 

- Takeaway 1: We need agents that can propose and pursue rich, complex, and creative goals.
- Takeaway 2: This requires richer goal representations and methods to ground them.

:: right ::

Questions

- Where do goals (and their representations) fall on the agent-environment boundary?
- What important desiderata are we missing?
- How do temporally extended goals interact with the Markov assumption?
- Can program-based goals scale to diverse environments?
- What can building agents that propose and pursue rich goals teach us about human goal-setting?

---
layout: center
---

# Thank you!


Toward Complex and Structured Goals in Reinforcement Learning

Finding the Frame @ RLC 2024


Guy Davidson & Todd M. Gureckis
New York University