Update: SparseGPT recipes (#1142)
#### **Issue**  
The following test was failing:  
```
FAILED tests/e2e/vLLM/test_vllm.py::TestvLLM_0_tests_e2e_vLLM_configs_sparse2of4_fp8_dynamic_yaml::test_vllm  
ValueError: There is no module or parameter named 'lm_head.bitmask' in LlamaForCausalLM
```
This issue arose from recent improvements to `SparseGPTModifier` that changed its default behavior: previously, `lm_head` was silently ignored, but it is no longer excluded automatically.

#### **Fix**  
The fix updates the affected recipes to explicitly include the parameter:
```yaml
ignore: ["re:.*lm_head"]
```
whenever all `Linear` layers are targeted. This ensures that `lm_head` is properly excluded and prevents the failure.
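For example, one of the updated test recipes in this commit (shown in the diff below) now reads in full:
```yaml
pruning_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      targets: ["Linear"]
      ignore: ["re:.*lm_head"]
```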

#### **Example Change**  
Previously, recipes used a regex pattern that targeted only the decoder layers, which implicitly excluded `lm_head`. The updated configuration targets `Linear` modules directly and explicitly ignores `lm_head`:
```diff
-    "targets": ["re:model.layers.\\d+$"],
+    "targets": ["Linear"],
+    "ignore": ["re:.*lm_head"]
```
This is more explicit and avoids model-specific regex patterns in `targets`.

#### **Additional Fixes & Improvements**  
- **Removed** the deprecated `sequential_update` argument.
- **Updated recipes** to use `"targets": ["Linear"]` instead of regex matching, for better clarity and maintainability (see the combined before/after sketch below).
- **Raise a warning** when `lm_head` is explicitly targeted. Contributed by @kylesayrs.
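
Taken together, a typical `SparseGPTModifier` stanza changes as follows (a sketch based on the updated `examples/finetuning/example_alternating_recipe.yaml` in this commit):
```diff
 SparseGPTModifier:
   sparsity: 0.5
   block_size: 128
-  sequential_update: False
   percdamp: 0.01
   mask_structure: "0:0"
-  targets: [
-    "re:model.layers.\\d+$"
-  ]
+  targets: ["Linear"]
+  ignore: ["re:.*lm_head"]
```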

---
- To see the specific tasks where the Asana app for GitHub is being
used, see below:
  - https://app.asana.com/0/0/1209368043192615

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
rahul-tuli and kylesayrs authored Feb 12, 2025
1 parent 6377c30 commit 98a7ae6
Showing 22 changed files with 44 additions and 35 deletions.
12 changes: 4 additions & 8 deletions examples/finetuning/example_alternating_recipe.yaml
@@ -4,12 +4,10 @@ initial_sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [
"re:model.layers.\\d+$"
]
targets: ["Linear"]
ignore: ["re:.*lm_head"]
initial_training_stage:
run_type: train
pruning_modifiers:
@@ -22,12 +20,10 @@ next_sparsity_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [
"re:model.layers.\\d+$"
]
targets: ["Linear"]
ignore: ["re:.*lm_head"]
next_training_stage:
run_type: train
pruning_modifiers:
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
finetuning_stage:
run_type: train
finetuning_modifiers:
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
finetuning_stage:
run_type: train
finetuning_modifiers:
3 changes: 2 additions & 1 deletion src/llmcompressor/modifiers/obcq/base.py
@@ -33,9 +33,10 @@ class SparseGPTModifier(SparsityModifierMixin, Modifier):
| SparseGPTModifier:
| sparsity: 0.5
| mask_structure: "2:4"
| sequential_update: True
| dampening_frac: 0.001
| block_size: 128
| targets: ['Linear']
| ignore: ['re:.*lm_head']
Lifecycle:
- on_initialize
24 changes: 20 additions & 4 deletions src/llmcompressor/modifiers/obcq/sgpt_mixin.py
@@ -139,10 +139,26 @@ def on_initialize(self, state: "State", **kwargs) -> bool:

for name, module in get_prunable_layers(layer).items():
name = f"{layer_name}.{name}"
if not match_targets(name, self.ignore)[0]:
self._module_names[module] = name
self._module_sparsities[module] = layer_sparsity
self.register_hook(module, self.calibrate_module, "forward")

if match_targets(name, self.ignore)[0]:
continue

# HACK: previously, embeddings were not quantized because they were not
# accessible by the layer compressor. For now, we manually ignore it,
# but in the FUTURE this should be ignored by the user
if isinstance(module, torch.nn.Embedding):
continue

if name.endswith("lm_head"):
logger.warning(
"`lm_head` was previously auto-ignored by SparseGPT and Wanda "
"modifiers and is not advised. Please add `re:.*lm_head` to "
"your ignore list if this was unintentional"
)

self._module_names[module] = name
self._module_sparsities[module] = layer_sparsity
self.register_hook(module, self.calibrate_module, "forward")

# infer and run pipeline
model_name = state.model.__class__.__name__
3 changes: 2 additions & 1 deletion tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml
@@ -3,4 +3,5 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:
3 changes: 2 additions & 1 deletion tests/e2e/vLLM/recipes/WNA16_2of4/2of4_w4a16_recipe.yaml
@@ -4,7 +4,8 @@ sparsity_stage:
SparseGPTModifier:
sparsity: 0.5
mask_structure: "2:4"
sequential_update: false
targets: ["Linear"]
ignore: ["re:.*lm_head"]
quantization_stage:
run_type: oneshot
quantization_modifiers:
1 change: 0 additions & 1 deletion tests/e2e/vLLM/test_vllm.py
@@ -130,7 +130,6 @@ def test_vllm(self):
session.reset()

if SKIP_HF_UPLOAD.lower() != "yes":

logger.info("================= UPLOADING TO HUB ======================")

stub = f"{HF_MODEL_HUB_NAME}/{self.save_dir}-e2e"
@@ -2,6 +2,6 @@ pruning_stage:
obcq_modifiers:
SparseGPTModifier:
sparsity: 0.5
sequential_update: true
mask_structure: "2:4"
targets: ['re:model.layers.\d*$']
targets: ["Linear"]
ignore: ["re:.*lm_head"]
@@ -2,9 +2,9 @@ pruning_stage:
obcq_modifiers:
SparseGPTModifier:
sparsity: 0.5
sequential_update: true
mask_structure: "2:4"
targets: ['re:model.layers.\d*$']
targets: ["Linear"]
ignore: ["re:.*lm_head"]
quant_stage:
quant_modifiers:
QuantizationModifier:
@@ -3,10 +3,10 @@ test_oneshot_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
target_ids: ["attention_mask", "position_ids"]
targets: ["Linear"]
ignore: ["re:.*lm_head"]
test_train_stage:
pruning_modifiers:
ConstantPruningModifier:
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
sequential_update: True
percdamp: 0.01
mask_structure: "0:0"
targets: ["model.layers.0"]
@@ -11,7 +11,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.7
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [
@@ -18,7 +18,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: ["model.layers.0"]
1 change: 0 additions & 1 deletion tests/llmcompressor/transformers/obcq/recipes/sparse.yaml
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.3
block_size: 128
sequential_update: False
percdamp: 0.01
targets: ["model.layers.0", "model.layers.1"]
mask_structure: "0:0"
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "2:4"
targets: [
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
percdamp: 0.01
mask_structure: "0:0"
targets: [
@@ -3,7 +3,6 @@ test_stage:
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]
@@ -9,7 +9,6 @@ recipe: |
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]
@@ -10,7 +10,6 @@ recipe: |
SparseGPTModifier:
sparsity: 0.5
block_size: 128
sequential_update: False
targets: [
're:model.layers.3.mlp.gate_proj.weight'
]
