System Info
transformers version: 4.46.3

Who can help?
@gante

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
I am currently rewriting the generate_progressively function for my custom model class. My goal is to enable the model to generate results progressively by concatenating the initial input_ids with each element of the compress_outputs sequence in turn. Specifically:
In the first iteration, the model generates results by concatenating input_ids with the first element of compress_outputs.
In the second iteration, it concatenates input_ids with the first and second elements of compress_outputs (the first two elements) to generate results.
This process continues until the last element of the compress_outputs sequence is included.
To improve efficiency, I want to leverage caching, as the majority of the concatenated input in each iteration has already been used to compute past_key_values. Below is the code snippet for the function I implemented. In this context, self.model refers to mistral-7b-chat-v0.2.
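In outline, the function follows this pattern (a simplified sketch, not the exact implementation; apart from input_ids, compress_outputs, and the underlying model, the names and details below are placeholders):

import torch

def generate_progressively(model, input_ids, compress_outputs, **gen_kwargs):
    # Embed the fixed prompt once; compress_outputs is assumed to already be a
    # sequence of embedding tensors of shape (batch, seg_len, hidden).
    prompt_embeds = model.get_input_embeddings()(input_ids)
    results = []
    past_key_values = None
    for i in range(len(compress_outputs)):
        # Iteration i uses the prompt plus the first i + 1 compressed segments.
        inputs_embeds = torch.cat([prompt_embeds, *compress_outputs[: i + 1]], dim=1)
        out = model.generate(
            inputs_embeds=inputs_embeds,
            # Reusing the cache from earlier iterations: this combination of
            # inputs_embeds + a non-empty past_key_values is what triggers the
            # errors described below.
            past_key_values=past_key_values,
            return_dict_in_generate=True,
            **gen_kwargs,
        )
        past_key_values = out.past_key_values
        results.append(out.sequences)
    return results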
When I execute this code, it throws an error at line 393 of transformers/generation/utils.py, inside the prepare_inputs_for_generation function.
The problematic line of code is:
if inputs_embeds is not None and cache_position[0] == 0:
The error message is: IndexError: index 0 is out of bounds for dimension 0 with size 0.
I traced the execution of the code, and here is a detailed breakdown of the issue:
The error occurs in transformers/generation/utils.py. Initially, the program enters the self._sample function and then proceeds to the self._get_initial_cache_position function.
Within this function, the following lines:
if not is_torchdynamo_compiling():
    cache_position = cache_position[past_length:]
cause the cache_position slice to become empty, which leads to the IndexError in the subsequent steps.
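Concretely, the slicing fails in the following way (hypothetical sizes, shown only to illustrate the mechanism):

import torch

# The cache already holds past_length tokens, but cache_position was built
# only from the length of the inputs_embeds that were passed in, so the
# slice comes back empty.
cache_position = torch.arange(4)               # tensor([0, 1, 2, 3])
past_length = 20                               # tokens already stored in past_key_values
cache_position = cache_position[past_length:]  # tensor([], dtype=torch.int64)
cache_position[0]                              # IndexError: index 0 is out of bounds for dimension 0 with size 0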
Even if I manage to fix the issue with cache_position, another problem arises later in the self.prepare_inputs_for_generation function.
The relevant code is as follows:
if not self.config.is_encoder_decoder:
    if inputs_embeds is not None and cache_position[0] == 0:
        model_inputs[input_ids_key] = None
        model_inputs["inputs_embeds"] = inputs_embeds
    else:
        model_inputs[input_ids_key] = input_ids.clone(memory_format=torch.contiguous_format)
        model_inputs["inputs_embeds"] = None
In my case, I provide only inputs_embeds and past_key_values. Since cache_position[0] is not 0, the code takes the else branch and tries to set model_inputs[input_ids_key] from input_ids; because input_ids is None at this point, the input_ids.clone(...) call fails.
Under the current implementation of the generate function in transformers, is it possible to use only inputs_embeds and past_key_values for generation? How can I modify my implementation to achieve progressive generation with caching as intended? Are there specific guidelines for correctly managing cache_position and ensuring compatibility with inputs_embeds?
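For reference, the analogous pattern with plain input_ids (which, as I understand it, does support continuing from a pre-filled cache) would look roughly like this; the checkpoint name and variable names below are placeholders, and this is only a sketch of the caching behaviour I am trying to reproduce with inputs_embeds:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

# Placeholder checkpoint; the actual model is the Mistral chat model mentioned above.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prefix_ids = tokenizer("Shared prefix prompt", return_tensors="pt").input_ids.to(model.device)

# Pre-fill a cache on the shared prefix once.
cache = DynamicCache()
with torch.no_grad():
    cache = model(prefix_ids, past_key_values=cache, use_cache=True).past_key_values

# Continue generation: pass token ids for the FULL sequence so far plus the cache;
# generate() then only has to encode the new segment.
segment_ids = tokenizer(" first appended segment", add_special_tokens=False, return_tensors="pt").input_ids.to(model.device)
full_ids = torch.cat([prefix_ids, segment_ids], dim=-1)
outputs = model.generate(full_ids, past_key_values=cache, max_new_tokens=64)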
Expected behavior
My primary objective is to progressively generate outputs by leveraging caching (past_key_values) to improve efficiency.