
vllm use 'OPEA/QwQ-32B-Preview-int4-sym-mixed-awq-inc' report KeyError: 'layers.5.mlp.down_proj.weight' #422

Open
xiezhipeng-git opened this issue Jan 19, 2025 · 2 comments

Comments

@xiezhipeng-git

xiezhipeng-git commented Jan 19, 2025

import torch
from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
import vllm

# Download the model
model_dir = snapshot_download('OPEA/QwQ-32B-Preview-int4-sym-mixed-awq-inc')
llm = vllm.LLM(model=model_dir, device="cuda")
KeyError                                  Traceback (most recent call last)
Cell In[19], line 59
     57     initialize_models(parallel_models_num=parallel_num, mode="eval")  # set to inference mode
     58 else:
---> 59     llm, tokenizer = initialize_llm_and_tokenizer(model_config)

Cell In[13], line 36
     34 def initialize_llm_and_tokenizer(model_config):
     35     """Initialize the LLM and tokenizer"""
---> 36     llm = vllm.LLM(**model_config)
     37     tokenizer = llm.get_tokenizer()
     38     return llm, tokenizer

File ~/anaconda3/lib/python3.12/site-packages/vllm/utils.py:986, in deprecate_args.<locals>.wrapper.<locals>.inner(*args, **kwargs)
    979             msg += f" {additional_message}"
    981         warnings.warn(
    982             DeprecationWarning(msg),
    983             stacklevel=3,  # The inner function takes up one level
    984         )
--> 986 return fn(*args, **kwargs)

File ~/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/llm.py:230, in LLM.__init__(self, model, tokenizer, tokenizer_mode, skip_tokenizer_init, trust_remote_code, allowed_local_media_path, tensor_parallel_size, dtype, quantization, revision, tokenizer_revision, seed, gpu_memory_utilization, swap_space, cpu_offload_gb, enforce_eager, max_seq_len_to_capture, disable_custom_all_reduce, disable_async_output_proc, hf_overrides, mm_processor_kwargs, task, override_pooler_config, compilation_config, **kwargs)
    227 self.engine_class = self.get_engine_class()
    229 # TODO(rob): enable mp by default (issue with fork vs spawn)
--> 230 self.llm_engine = self.engine_class.from_engine_args(
    231     engine_args, usage_context=UsageContext.LLM_CLASS)
    233 self.request_counter = Counter()

File ~/anaconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py:517, in LLMEngine.from_engine_args(cls, engine_args, usage_context, stat_loggers)
    515 executor_class = cls._get_executor_cls(engine_config)
    516 # Create the LLM engine.
--> 517 engine = cls(
    518     vllm_config=engine_config,
    519     executor_class=executor_class,
    520     log_stats=not engine_args.disable_log_stats,
    521     usage_context=usage_context,
    522     stat_loggers=stat_loggers,
    523 )
    525 return engine

File ~/anaconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py:273, in LLMEngine.__init__(self, vllm_config, executor_class, log_stats, usage_context, stat_loggers, input_registry, mm_registry, use_cached_outputs)
    269 self.input_registry = input_registry
    270 self.input_processor = input_registry.create_input_processor(
    271     self.model_config)
--> 273 self.model_executor = executor_class(vllm_config=vllm_config, )
    275 if self.model_config.runner_type != "pooling":
    276     self._initialize_kv_caches()

File ~/anaconda3/lib/python3.12/site-packages/vllm/executor/executor_base.py:36, in ExecutorBase.__init__(self, vllm_config)
     34 self.prompt_adapter_config = vllm_config.prompt_adapter_config
     35 self.observability_config = vllm_config.observability_config
---> 36 self._init_executor()

File ~/anaconda3/lib/python3.12/site-packages/vllm/executor/gpu_executor.py:35, in GPUExecutor._init_executor(self)
     33 self.driver_worker = self._create_worker()
     34 self.driver_worker.init_device()
---> 35 self.driver_worker.load_model()

File ~/anaconda3/lib/python3.12/site-packages/vllm/worker/worker.py:155, in Worker.load_model(self)
    154 def load_model(self):
--> 155     self.model_runner.load_model()

File ~/anaconda3/lib/python3.12/site-packages/vllm/worker/model_runner.py:1096, in GPUModelRunnerBase.load_model(self)
   1094 logger.info("Starting to load model %s...", self.model_config.model)
   1095 with DeviceMemoryProfiler() as m:
-> 1096     self.model = get_model(vllm_config=self.vllm_config)
   1098 self.model_memory_usage = m.consumed_memory
   1099 logger.info("Loading model weights took %.4f GB",
   1100             self.model_memory_usage / float(2**30))

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py:12, in get_model(vllm_config)
     10 def get_model(*, vllm_config: VllmConfig) -> nn.Module:
     11     loader = get_model_loader(vllm_config.load_config)
---> 12     return loader.load_model(vllm_config=vllm_config)

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py:366, in DefaultModelLoader.load_model(self, vllm_config)
    363     model = _initialize_model(vllm_config=vllm_config)
    365 weights_to_load = {name for name, _ in model.named_parameters()}
--> 366 loaded_weights = model.load_weights(
    367     self._get_all_weights(model_config, model))
    368 # We only enable strict check for non-quantized models
    369 # that have loaded weights tracking currently.
    370 if model_config.quantization is None and loaded_weights is not None:

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:506, in Qwen2ForCausalLM.load_weights(self, weights)
    499 def load_weights(self, weights: Iterable[Tuple[str,
    500                                                torch.Tensor]]) -> Set[str]:
    501     loader = AutoWeightsLoader(
    502         self,
    503         skip_prefixes=(["lm_head."]
    504                        if self.config.tie_word_embeddings else None),
    505     )
--> 506     return loader.load_weights(weights)

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:237, in AutoWeightsLoader.load_weights(self, weights, mapper)
    234 if mapper is not None:
    235     weights = mapper.apply(weights)
--> 237 autoloaded_weights = set(self._load_module("", self.module, weights))
    238 return autoloaded_weights

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:198, in AutoWeightsLoader._load_module(self, base_prefix, module, weights)
    194         logger.debug("Skipping module %s", prefix)
    196         continue
--> 198     yield from self._load_module(prefix,
    199                                  child_modules[child_prefix],
    200                                  child_weights)
    201 elif child_prefix in child_params:
    202     if self._can_skip(prefix):

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/utils.py:175, in AutoWeightsLoader._load_module(self, base_prefix, module, weights)
    173 module_load_weights = getattr(module, "load_weights", None)
    174 if callable(module_load_weights):
--> 175     loaded_params = module_load_weights(weights)
    176     if loaded_params is None:
    177         logger.warning(
    178             "Unable to collect loaded parameters "
    179             "for module %s", module)

File ~/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py:393, in Qwen2Model.load_weights(self, weights)
    391 if is_pp_missing_parameter(name, self):
    392     continue
--> 393 param = params_dict[name]
    394 weight_loader = getattr(param, "weight_loader",
    395                         default_weight_loader)
    396 weight_loader(param, loaded_weight)

KeyError: 'layers.5.mlp.down_proj.weight'
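The failing frame indexes vLLM's `params_dict` with each checkpoint tensor name. For a quantized linear layer, the model registers AWQ parameters (`qweight`, `qzeros`, `scales`) rather than a plain `.weight`, so a layer that was left in full precision in the checkpoint has no matching key. A minimal sketch of this mismatch, with illustrative names not taken from the actual checkpoint:

```python
# Sketch of why a mixed-precision AWQ checkpoint trips vLLM's weight loader.
# Parameter names below are illustrative, not read from the real model.

def load_weights(params_dict, checkpoint_names):
    """Mimic the loader: every checkpoint tensor name must exist in the
    model's parameter dict, or a KeyError is raised."""
    loaded = set()
    for name in checkpoint_names:
        param = params_dict[name]  # raises KeyError on a name mismatch
        loaded.add(name)
    return loaded

# A quantized layer registers AWQ tensors instead of a plain weight.
params_dict = {
    "layers.5.mlp.down_proj.qweight": None,
    "layers.5.mlp.down_proj.qzeros": None,
    "layers.5.mlp.down_proj.scales": None,
}

# The checkpoint kept this layer in full precision, so it ships a
# `.weight` tensor that the quantized module never registered.
checkpoint_names = ["layers.5.mlp.down_proj.weight"]

try:
    load_weights(params_dict, checkpoint_names)
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'layers.5.mlp.down_proj.weight'
```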

@wenhuach21
Contributor

Thank you for reporting this issue. It may be related to mixed precision support or a configuration compatibility issue with vLLM. As noted in the model card, three layers, including layers.5.mlp.down_proj, were excluded from quantization.

If vLLM supports mixed precision, we can adjust the configuration to align with vLLM. Otherwise, there is nothing we can do at the moment.
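One way to confirm which layers a mixed-precision AWQ export excluded is to read the checkpoint's quantization config. A minimal sketch, assuming the AutoAWQ/transformers-style `modules_to_not_convert` field (the exact schema of this particular checkpoint is not confirmed here):

```python
import json

# Sketch: report which modules an AWQ checkpoint left unquantized.
# `modules_to_not_convert` follows the AutoAWQ / transformers AwqConfig
# convention; this model's config.json may use a different schema.

def unquantized_modules(cfg: dict) -> list:
    """Return module names excluded from quantization, if recorded."""
    qcfg = cfg.get("quantization_config", {})
    return qcfg.get("modules_to_not_convert") or []

# Example fragment resembling such a config; with a real checkpoint you
# would load it with json.load() from f"{model_dir}/config.json".
example = {
    "quantization_config": {
        "quant_method": "awq",
        "bits": 4,
        "modules_to_not_convert": ["layers.5.mlp.down_proj"],
    }
}

print(unquantized_modules(example))  # ['layers.5.mlp.down_proj']
```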

@WeiweiZhang1, could you please take a look?

@WeiweiZhang1
Collaborator

Hello zhipeng, I made a quick attempt, but unfortunately I couldn't resolve it. This model is a standard AWQ-format int4 model. For the issue of loading mixed-precision models in vLLM, I suggest opening an issue in the vLLM repository.
