[BC]Weighted bc training #416

tvmarino · 2025-01-15T22:02:29Z

Training code for the BC-Max algorithm. It includes the TensorFlow
code to train a new policy and save it as a tf-policy, and the code to
compute the re-weighting for the supervised learning problem. This
required updates to SequenceExampleFeatureNames in
generate_bc_trajectories_lib.

mtrofin

lgtm with some nits, but please get Alekh's take on the algo side of things which I wouldn't be familiar with.

compiler_opt/rl/imitation_learning/weighted_bc_trainer.py

mtrofin · 2025-01-16T16:28:03Z

+    int_labels = tf.cast(labels, tf.int32)
+    return tf.gather(weights_arr, int_labels)
+
+  def _loss_fn(self, y_true, y_pred, labels, weights_arr):


_get_loss_fn?
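For readers following the thread: the two added lines in the hunk map each example's label to a weight by indexing into `weights_arr`. Below is a minimal, framework-free sketch of that lookup and of how such weights might scale a per-example loss. The names `gather_weights` and `weighted_mean_loss` are hypothetical, and normalizing by the total weight is an assumption, not necessarily what `weighted_bc_trainer.py` does.

```python
from typing import List, Sequence


def gather_weights(weights_arr: Sequence[float],
                   labels: Sequence[float]) -> List[float]:
  """Pure-Python analog of tf.gather(weights_arr, tf.cast(labels, tf.int32))."""
  # Labels may arrive as floats (e.g. 2.0); cast before using them as indices.
  return [weights_arr[int(label)] for label in labels]


def weighted_mean_loss(per_example_loss: Sequence[float],
                       weights: Sequence[float]) -> float:
  """Weighted average of per-example losses (assumed normalization)."""
  total_weight = sum(weights)
  weighted_sum = sum(l * w for l, w in zip(per_example_loss, weights))
  return weighted_sum / total_weight


# One weight per label value; each example picks up the weight of its label.
weights = gather_weights([0.5, 1.0, 2.0], [0.0, 2.0, 1.0, 2.0])
loss = weighted_mean_loss([1.0, 1.0, 2.0, 3.0], weights)
```

Here `weights` has one entry per example, so it can be multiplied element-wise with any per-example loss before reduction.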

tvmarino added 14 commits December 19, 2024 17:42

Adding configs for collecting imitation learning trajectories for inlining for size.

ffbe8c1

Removing values from imitation_learning.gin

97eb067

Removed unused flags for gin bindings and gin configs.

2f3f86c

Merge branch 'google:main' into keep_temps_update

588bed7

Makes sure that either both of base_path and keep_temps are set or neither of them is set. Further, passes keep_temps to `compilation_runner.get_workdir_context` and ensures that the flag and argument are not set at the same time.

8dd9070

yapf

4174e57

yapf

1018316

yapf with updated deps

293970c

Addressing @mtrofin and @boomanaiden154 comments.

1f725b5

yapf and pytype

fa79cff

yapf

110aaff

Set explicit_temps_dir to persistent_objects_path+'/temp_dirs' whenever explicit_temps_dir is not set and persistent_objects_path is set.

1b65f9d

Merge branch 'google:main' into weighted_bc_training

96d9e7d
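Two of the commits above describe argument handling for temporary directories: `base_path` and `keep_temps` must be set together or not at all, and `explicit_temps_dir` falls back to `persistent_objects_path+'/temp_dirs'`. A rough sketch of those two rules follows. The helper names are hypothetical, and treating "not set" as `None` is an assumption; the actual checks in the repository may be structured differently.

```python
from typing import Optional


def check_both_or_neither(base_path: Optional[str],
                          keep_temps: Optional[str]) -> None:
  """Raises if exactly one of the two values is set (assumed: unset == None)."""
  if (base_path is None) != (keep_temps is None):
    raise ValueError(
        'base_path and keep_temps must be set together or not at all.')


def resolve_explicit_temps_dir(
    explicit_temps_dir: Optional[str],
    persistent_objects_path: Optional[str]) -> Optional[str]:
  """Applies the default described in the commit message."""
  if explicit_temps_dir is None and persistent_objects_path is not None:
    # Mirrors the commit text: persistent_objects_path + '/temp_dirs'.
    return persistent_objects_path + '/temp_dirs'
  # Either an explicit value was given, or there is nothing to derive it from.
  return explicit_temps_dir
```

An explicitly provided `explicit_temps_dir` always wins, so the default only applies when the caller supplied a persistent path and nothing else.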

tvmarino requested review from mtrofin and boomanaiden154 January 15, 2025 22:02

tvmarino added 3 commits January 15, 2025 22:08

Fixing duplicate flag definitions.

3548268

Suppressing pylint dangerous default value warnings.

0e34287

yapf

dd1ace4

mtrofin approved these changes Jan 16, 2025

mtrofin requested a review from alekh January 16, 2025 16:30

Nits and a change to weights computation after @alekh reviewed the algorithm for training.

0e9a163

tvmarino merged commit 9915a6d into google:main Jan 17, 2025
11 checks passed

tvmarino deleted the weighted_bc_training branch January 17, 2025 15:38
