Refactored codebase by defining different classes for different operations and much more #444
base: main
…erations and implemented better type safety
Well done bro.
@mowentian Please review this PR
This seems interesting
Please split the PR into smaller (ideally atomic) pieces. As it stands, it is very hard to review and more error-prone.
I am testing the original and refactored code and will upload the results soon, which should make it easier for anyone to review.
Here is the link to a repo you can clone to run the tests that compare the original and refactored files: https://github.com/Pratiyankkumar/Deepseek-Tests. The script only runs on a machine with sufficient resources and, more importantly, an NVIDIA GPU, because Triton (developed by OpenAI), which DeepSeek uses, requires one. I was unable to run the script myself, so could someone with an NVIDIA GPU please try it? If you hit an error other than that, feel free to open an issue on my repo, and if the test succeeds, please share the output here. You can read more about Triton here: https://github.com/triton-lang/triton
@mowentian Please review this PR 🙏🏻🙏🏻
Code Improvements Summary
convert.py
Here’s a detailed breakdown of the enhancements made to the codebase to improve clarity, robustness, and maintainability.
1. Type Hints
- Added type hints, including the TensorMapping and StateDict types.
2. Error Handling
- Added try-except blocks.
- Added a check that the experts split evenly across model-parallel ranks (n_experts % mp == 0).
3. Code Organization
- process_tensor_name: handles tensor name processing.
- shard_tensor: manages tensor sharding for model parallelism.
- convert_checkpoint: main logic for checkpoint conversion.
- Moved the mapping dictionary to a module-level constant (TENSOR_MAPPING).
4. Path Handling
- Replaced os.path with pathlib.Path for more robust and modern path handling.
5. Documentation
- Documentation for the TensorMapping and StateDict types.
6. Best Practices
- Constant naming (TENSOR_MAPPING).
- Clearer variable names (mp_idx, mp_count).
- tqdm progress bars for better visibility during execution.
7. Structure
- Moved the conversion logic into the convert_checkpoint function for better modularity.
- Added a main() function with argument parsing for cleaner execution flow.
8. Safety
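The convert.py helpers listed above (TENSOR_MAPPING, process_tensor_name, shard_tensor, the n_experts % mp == 0 check and pathlib paths) might fit together roughly as in the sketch below. It is a minimal, simplified illustration, not the PR's code: the TENSOR_MAPPING entries are an illustrative subset, tensors are modeled as nested Python lists so no torch is needed, check_expert_split and output_path are hypothetical helpers, and the output file name pattern is an assumption.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Checkpoint key -> (renamed key, dimension to shard along, or None to replicate).
# Illustrative subset only (assumption); the real table has more entries.
TensorMapping = Dict[str, Tuple[str, Optional[int]]]
TENSOR_MAPPING: TensorMapping = {
    "embed_tokens": ("embed", 0),
    "o_proj": ("wo", 1),
    "norm": ("norm", None),
}

Matrix = List[List[float]]  # stand-in for a 2-D tensor so the sketch needs no torch


def process_tensor_name(name: str) -> Tuple[str, Optional[int]]:
    """Strip the 'model.' prefix and rename the second-to-last name component."""
    if name.startswith("model."):
        name = name[len("model."):]
    parts = name.split(".")
    key = parts[-2]
    if key not in TENSOR_MAPPING:
        raise KeyError(f"unmapped tensor key {key!r} in {name!r}")
    new_key, dim = TENSOR_MAPPING[key]
    parts[-2] = new_key
    return ".".join(parts), dim


def shard_tensor(t: Matrix, dim: Optional[int], mp_idx: int, mp_count: int) -> Matrix:
    """Return slice mp_idx of mp_count equal slices along dim (0 = rows, 1 = columns)."""
    if dim is None:
        return t  # replicated on every rank
    size = len(t) if dim == 0 else len(t[0])
    if size % mp_count != 0:
        raise ValueError(f"dim {dim} of size {size} is not divisible by {mp_count}")
    step = size // mp_count
    lo, hi = mp_idx * step, (mp_idx + 1) * step
    return t[lo:hi] if dim == 0 else [row[lo:hi] for row in t]


def check_expert_split(n_experts: int, mp: int) -> int:
    """Hypothetical helper: experts must divide evenly across ranks; returns experts per rank."""
    if n_experts % mp != 0:
        raise ValueError(f"n_experts={n_experts} is not divisible by mp={mp}")
    return n_experts // mp


def output_path(save_dir: str, mp_idx: int, mp_count: int) -> Path:
    """Hypothetical helper: pathlib instead of os.path.join (file naming is an assumption)."""
    return Path(save_dir) / f"model{mp_idx}-mp{mp_count}.safetensors"
```

For example, process_tensor_name("model.layers.0.self_attn.o_proj.weight") returns ("layers.0.self_attn.wo.weight", 1), so that weight would be split column-wise across ranks by shard_tensor.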
fp8_cast_bf16.py
🔄 Major Structural Changes
- WeightConverter class
📝 New Classes & Methods
- WeightConverter: __init__, _load_model_index, _get_tensor, _manage_memory, convert
🛠 Key Improvements
🔍 Functionality
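A minimal sketch of how the WeightConverter methods listed above could fit together. It assumes the checkpoint ships a model.safetensors.index.json whose "weight_map" maps each tensor name to its shard file, and that each FP8 weight has a companion "<name>_scale_inv" tensor. The shard loader and the dequantization function are injected as hypothetical parameters so the sketch runs without torch or safetensors, and _manage_memory is modeled as a small least-recently-used cache of loaded shards.

```python
import json
from collections import OrderedDict
from pathlib import Path
from typing import Any, Callable, Dict

Shard = Dict[str, Any]  # tensor name -> tensor (kept abstract in this sketch)


class WeightConverter:
    """Sketch: resolve tensor names to shard files via the index and cache loaded shards."""

    def __init__(self, ckpt_dir: str, load_shard: Callable[[Path], Shard], max_cached: int = 2):
        self.ckpt_dir = Path(ckpt_dir)
        self.load_shard = load_shard  # hypothetical injected loader (e.g. wrapping safetensors)
        self.max_cached = max_cached
        self.cache: "OrderedDict[str, Shard]" = OrderedDict()
        self.weight_map = self._load_model_index()

    def _load_model_index(self) -> Dict[str, str]:
        index_file = self.ckpt_dir / "model.safetensors.index.json"
        with index_file.open() as f:
            return json.load(f)["weight_map"]

    def _manage_memory(self) -> None:
        # Evict the least recently used shards once the cache exceeds its budget.
        while len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)

    def _get_tensor(self, name: str) -> Any:
        file_name = self.weight_map[name]
        if file_name in self.cache:
            self.cache.move_to_end(file_name)  # mark as most recently used
        else:
            self.cache[file_name] = self.load_shard(self.ckpt_dir / file_name)
            self._manage_memory()
        return self.cache[file_name][name]

    def convert(self, dequant: Callable[[Any, Any], Any]) -> Dict[str, Any]:
        """Dequantize every weight that has a scale; pass the rest through unchanged.
        Returns one dict for brevity; a real converter would more likely write
        one output file per input shard."""
        out: Dict[str, Any] = {}
        for name in self.weight_map:
            if name.endswith("_scale_inv"):
                continue  # consumed together with its weight
            tensor = self._get_tensor(name)
            scale_name = f"{name}_scale_inv"
            if scale_name in self.weight_map:
                out[name] = dequant(tensor, self._get_tensor(scale_name))
            else:
                out[name] = tensor
        return out
```

Keeping only a couple of shards resident bounds peak memory even when a weight and its scale are looked up in quick succession.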
generate.py
Text Generator Refactoring
🔄 Major Structural Changes
📝 New Classes & Methods
- TokenSampler: handles token sampling logic
- TextGenerator: core generation functionality
- DistributedEnvironment: manages distributed setup
- ChatSession: handles chat interactions
- GenerationConfig: configuration management
🛠 Key Improvements
🔍 Functionality
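The TokenSampler and GenerationConfig responsibilities listed above could look like the minimal sketch below. The config fields, the greedy path for temperature <= 0, and the use of the exponential-race trick (argmax of p_i / E_i with E_i ~ Exp(1), which selects index i with probability p_i) are assumptions for illustration, and logits are plain Python floats instead of tensors.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GenerationConfig:
    """Hypothetical fields; the PR's GenerationConfig may differ."""
    max_new_tokens: int = 100
    temperature: float = 1.0


class TokenSampler:
    """Temperature sampling over one row of logits."""

    def __init__(self, temperature: float, rng: Optional[random.Random] = None):
        self.temperature = temperature
        self.rng = rng or random.Random()

    def probs(self, logits: List[float]) -> List[float]:
        scaled = [x / self.temperature for x in logits]
        m = max(scaled)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, logits: List[float]) -> int:
        if self.temperature <= 0:
            return max(range(len(logits)), key=logits.__getitem__)  # greedy decoding
        p = self.probs(logits)
        # Exponential race: argmax of p_i / E_i with E_i ~ Exp(1) draws i with probability p_i.
        # The tiny floor guards against expovariate returning exactly 0.
        race = [p_i / max(self.rng.expovariate(1.0), 1e-300) for p_i in p]
        return max(range(len(race)), key=race.__getitem__)
```

Usage might be TokenSampler(GenerationConfig().temperature).sample(logits) for each decoding step; passing a seeded random.Random makes sampling reproducible in tests.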
kernel.py
FP8 Operations Refactoring
🔄 Major Structural Changes
📝 New Classes & Methods
- QuantizationKernels: handles quantization operations
- MatrixMultKernels: matrix multiplication operations
- TensorOps: high-level interface
- BlockConfig: configuration management
🛠 Key Improvements
🔍 Functionality
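QuantizationKernels and BlockConfig point to block-wise FP8 quantization. Because the Triton kernels need an NVIDIA GPU (as noted earlier in this thread), a plain-Python reference such as the sketch below could be used to sanity-check results on any machine. It assumes one scale per block, chosen so the block's absolute maximum maps to 448 (the largest finite float8 e4m3 value), and a default block size of 128. It only scales values into range and does not perform the actual FP8 cast.

```python
from dataclasses import dataclass
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3 value


@dataclass(frozen=True)
class BlockConfig:
    block_size: int = 128  # assumed default


def block_quant(x: List[float], cfg: BlockConfig) -> Tuple[List[float], List[float]]:
    """Scale each block so its absmax maps to FP8_E4M3_MAX; returns (scaled values, scales)."""
    bs = cfg.block_size
    if len(x) % bs != 0:
        raise ValueError("length must be a multiple of block_size")
    y: List[float] = []
    scales: List[float] = []
    for start in range(0, len(x), bs):
        block = x[start:start + bs]
        s = max(abs(v) for v in block) / FP8_E4M3_MAX
        s = s if s > 0 else 1.0  # all-zero block: any nonzero scale works
        scales.append(s)
        y.extend(v / s for v in block)
    return y, scales


def block_dequant(y: List[float], scales: List[float], cfg: BlockConfig) -> List[float]:
    """Invert block_quant by multiplying each block back by its scale."""
    bs = cfg.block_size
    if len(y) != len(scales) * bs:
        raise ValueError("scale count does not match the number of blocks")
    out: List[float] = []
    for i, s in enumerate(scales):
        out.extend(v * s for v in y[i * bs:(i + 1) * bs])
    return out
```

Comparing block_dequant(*block_quant(x, cfg), cfg) against x gives a quick round-trip check, and the same per-block scales are what a block-scaled matrix multiplication would multiply back in.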