
Refactored/codebase By defining different classes for different operations and much more #444

Open
wants to merge 3 commits into main
Conversation

Pratiyankkumar

Code Improvements Summary

convert.py

Here’s a detailed breakdown of the enhancements made to the codebase to improve clarity, robustness, and maintainability.


1. Type Hints

  • Added comprehensive type annotations for better code clarity and IDE support.
  • Used type definitions for complex data structures (e.g., TensorMapping, StateDict).
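
For illustration, the aliases might look roughly like this (only the names TensorMapping and StateDict come from the PR; the exact definitions are assumptions):

```python
from typing import Dict, Optional, Tuple

import torch

# Source tensor name -> (target tensor name, shard dimension or None if replicated).
TensorMapping = Dict[str, Tuple[str, Optional[int]]]

# A checkpoint's parameter dictionary: tensor name -> tensor.
StateDict = Dict[str, torch.Tensor]
```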

2. Error Handling

  • Added proper exception handling using try-except blocks.
  • Included validation checks for inputs (e.g., n_experts % mp == 0).
  • Improved error messages for better debugging and user feedback.
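
As an example of the kind of check described above (the helper name and exact messages are assumptions):

```python
def validate_expert_split(n_experts: int, mp: int) -> None:
    """Ensure the expert count divides evenly across model-parallel ranks."""
    if mp <= 0:
        raise ValueError(f"Model-parallel size must be positive, got {mp}")
    if n_experts % mp != 0:
        raise ValueError(
            f"n_experts ({n_experts}) must be divisible by the "
            f"model-parallel size mp ({mp})"
        )
```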

3. Code Organization

  • Split functionality into smaller, focused functions:
    • process_tensor_name: Handles tensor name processing.
    • shard_tensor: Manages tensor sharding for model parallelism.
    • convert_checkpoint: Main logic for checkpoint conversion.
  • Moved the mapping dictionary to a module-level constant (TENSOR_MAPPING).
  • Separated tensor processing and sharding logic into dedicated functions.
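
A rough sketch of that split, assuming the module-level TENSOR_MAPPING constant mentioned above (signatures and bodies are illustrative, not the exact code in the PR):

```python
from typing import Optional

import torch

# Placeholder for the module-level mapping constant described above.
TENSOR_MAPPING = {"embed_tokens.weight": ("embed.weight", 0)}

def process_tensor_name(name: str) -> Optional[str]:
    """Translate a source tensor name via TENSOR_MAPPING; return None to skip it."""
    key = name.replace("model.", "", 1)
    mapped = TENSOR_MAPPING.get(key)
    return mapped[0] if mapped is not None else None

def shard_tensor(tensor: torch.Tensor, dim: int, mp_idx: int, mp_count: int) -> torch.Tensor:
    """Return the slice of `tensor` along `dim` owned by model-parallel rank `mp_idx`."""
    if tensor.size(dim) % mp_count != 0:
        raise ValueError(
            f"Dimension {dim} of size {tensor.size(dim)} is not divisible by {mp_count}"
        )
    shard_size = tensor.size(dim) // mp_count
    return tensor.narrow(dim, mp_idx * shard_size, shard_size).contiguous()

def convert_checkpoint(src_path, dst_path, mp_count: int) -> None:
    """Load the source checkpoint, rename/shard every tensor, and save one file per rank."""
    ...  # orchestrates process_tensor_name + shard_tensor, then saves per-rank files
```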

4. Path Handling

  • Replaced os.path with pathlib.Path for more robust and modern path handling.
  • Added checks for file/directory existence to ensure valid inputs.
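
A minimal sketch of the pathlib-based checks (the helper name check_paths is assumed):

```python
from pathlib import Path
from typing import Tuple

def check_paths(ckpt_dir: str, save_dir: str) -> Tuple[Path, Path]:
    """Validate the input checkpoint directory and create the output directory."""
    ckpt_path = Path(ckpt_dir)
    if not ckpt_path.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {ckpt_path}")
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)
    return ckpt_path, save_path
```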

5. Documentation

  • Added detailed docstrings for functions, including:
    • Args: Descriptions of function arguments.
    • Raises: List of exceptions that may be raised.
  • Improved comments for complex operations to enhance readability.
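
The docstrings follow roughly this Args/Raises pattern (the wording here is illustrative):

```python
def shard_tensor(tensor, dim, mp_idx, mp_count):
    """Slice a tensor along `dim` for one model-parallel rank.

    Args:
        tensor: The full (unsharded) parameter tensor.
        dim: Dimension along which to shard.
        mp_idx: Index of the current model-parallel rank.
        mp_count: Total number of model-parallel ranks.

    Raises:
        ValueError: If the sharded dimension is not divisible by `mp_count`.
    """
```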

6. Best Practices

  • Used module-level constants instead of magic values (e.g., TENSOR_MAPPING).
  • Used more descriptive variable names throughout the code (e.g., mp_idx, mp_count).
  • Added progress descriptions to tqdm bars for better visibility during execution.
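
For example (the file list and label here are placeholders):

```python
from tqdm import tqdm

shard_files = ["model-00001.safetensors", "model-00002.safetensors"]  # placeholder list

# The desc= label tells the user which stage the progress bar refers to.
for shard_file in tqdm(shard_files, desc="Converting checkpoint shards"):
    pass  # process one shard here
```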

7. Structure

  • Separated the main logic into the convert_checkpoint function for better modularity.
  • Created a proper main() function with argument parsing for cleaner execution flow.
  • Better organization of related operations (e.g., tensor processing, sharding, and saving).
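
A sketch of the entry point, reusing convert_checkpoint from the sketch above (the flag names are assumptions):

```python
import argparse

def main() -> None:
    """Parse CLI arguments and run the conversion."""
    parser = argparse.ArgumentParser(description="Convert a checkpoint for model parallelism")
    parser.add_argument("--ckpt-path", type=str, required=True)
    parser.add_argument("--save-path", type=str, required=True)
    parser.add_argument("--model-parallel", type=int, default=1)
    args = parser.parse_args()
    convert_checkpoint(args.ckpt_path, args.save_path, args.model_parallel)

if __name__ == "__main__":
    main()
```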

8. Safety

  • Added validation for tensor dimensions to ensure compatibility with model parallelism.
  • Added checks for missing files to prevent runtime errors.
  • Improved error messages to aid in debugging and troubleshooting.

fp8_cast_bf16.py

🔄 Major Structural Changes

  1. Created WeightConverter class
  2. Added type hints throughout
  3. Split main function into focused methods

📝 New Classes & Methods

  • WeightConverter
    • __init__
    • _load_model_index
    • _get_tensor
    • _manage_memory
    • convert
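
A sketch of how the class might be laid out, assuming the standard Hugging Face model.safetensors.index.json layout; only the class and method names come from the PR, the bodies are illustrative:

```python
import json
from pathlib import Path
from typing import Dict

import torch
from safetensors.torch import load_file

class WeightConverter:
    """Converts FP8-quantized safetensors shards to BF16."""

    def __init__(self, input_path: str, output_path: str) -> None:
        self.input_path = Path(input_path)
        self.output_path = Path(output_path)
        self.weight_map = self._load_model_index()
        self.loaded_files: Dict[str, Dict[str, torch.Tensor]] = {}

    def _load_model_index(self) -> Dict[str, str]:
        """Read model.safetensors.index.json and return the tensor -> file map."""
        index_file = self.input_path / "model.safetensors.index.json"
        with open(index_file) as f:
            return json.load(f)["weight_map"]

    def _get_tensor(self, name: str) -> torch.Tensor:
        """Load (and cache) the shard that contains `name`, then return the tensor."""
        file_name = self.weight_map[name]
        if file_name not in self.loaded_files:
            self.loaded_files[file_name] = load_file(str(self.input_path / file_name))
        return self.loaded_files[file_name][name]

    def _manage_memory(self, keep_last: int = 2) -> None:
        """Evict all but the most recently loaded shards to bound memory use."""
        while len(self.loaded_files) > keep_last:
            self.loaded_files.pop(next(iter(self.loaded_files)))
        torch.cuda.empty_cache()

    def convert(self) -> None:
        """Walk every shard, dequantize FP8 weights to BF16, and save the results."""
        ...  # mirrors the original fp8_cast_bf16.py logic, shard by shard
```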

🛠 Key Improvements

  1. Better encapsulation of conversion logic
  2. Proper memory management
  3. Enhanced error handling
  4. Type safety with hints

🔍 Functionality

  • Maintained exact same conversion process
  • Same CLI interface
  • Identical output format

generate.py

Text Generator Refactoring

🔄 Major Structural Changes

  1. Created separate classes:
    • TokenSampler
    • TextGenerator
    • DistributedEnvironment
    • ChatSession
  2. Added GenerationConfig dataclass

📝 New Classes & Methods

  • TokenSampler: Handle token sampling logic
  • TextGenerator: Core generation functionality
  • DistributedEnvironment: Manage distributed setup
  • ChatSession: Handle chat interactions
  • GenerationConfig: Configuration management
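
A sketch of how these pieces might fit together; only the class names and the GenerationConfig dataclass come from the PR, the fields and method bodies are assumptions:

```python
from dataclasses import dataclass

import torch

@dataclass
class GenerationConfig:
    """Sampling knobs; the field names here are illustrative."""
    max_new_tokens: int = 200
    temperature: float = 1.0

class TokenSampler:
    """Turns a row of logits into the next token id."""

    def __init__(self, config: GenerationConfig) -> None:
        self.config = config

    def sample(self, logits: torch.Tensor) -> torch.Tensor:
        if self.config.temperature <= 0:
            return logits.argmax(dim=-1)  # greedy decoding
        probs = torch.softmax(logits / self.config.temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).squeeze(-1)

class TextGenerator:
    """Owns the model and runs the decode loop; ChatSession and
    DistributedEnvironment would wrap and configure this class."""

    def __init__(self, model, sampler: TokenSampler) -> None:
        self.model = model
        self.sampler = sampler
```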

🛠 Key Improvements

  1. Better separation of concerns
  2. Improved configuration management
  3. Enhanced distributed processing
  4. Clearer session handling
  5. Better type safety

🔍 Functionality

  • Same generation capabilities
  • Identical distributed processing
  • Same interactive and batch modes

kernel.py

FP8 Operations Refactoring

🔄 Major Structural Changes

  1. Created classes:
    • QuantizationKernels
    • MatrixMultKernels
    • TensorOps
  2. Added BlockConfig dataclass

📝 New Classes & Methods

  • QuantizationKernels: Handle quantization operations
  • MatrixMultKernels: Matrix multiplication operations
  • TensorOps: High-level interface
  • BlockConfig: Configuration management
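
A sketch of the grouping; only the class names and the BlockConfig dataclass come from the PR, the method names, fields, and kernel references are assumptions:

```python
from dataclasses import dataclass

import torch

@dataclass(frozen=True)
class BlockConfig:
    """Block size shared by the quantization and GEMM kernels (value is illustrative)."""
    block_size: int = 128

class QuantizationKernels:
    """Groups the FP8 quantize/dequantize Triton kernels and their launchers."""

    @staticmethod
    def dequantize(x: torch.Tensor, scale: torch.Tensor, config: BlockConfig) -> torch.Tensor:
        ...  # launches the block-wise weight dequantization kernel with config.block_size

class MatrixMultKernels:
    """Groups the block-scaled FP8 GEMM Triton kernel and its launcher."""

    @staticmethod
    def gemm(a, a_scale, b, b_scale, config: BlockConfig) -> torch.Tensor:
        ...  # launches the FP8 GEMM kernel

class TensorOps:
    """High-level entry point that dispatches to the kernel classes."""

    def __init__(self, config: BlockConfig = BlockConfig()) -> None:
        self.config = config

    def dequantize(self, x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return QuantizationKernels.dequantize(x, scale, self.config)
```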

🛠 Key Improvements

  1. Better organization of kernels
  2. Improved configuration handling
  3. Enhanced type safety
  4. Clearer operation grouping
  5. Better documentation

🔍 Functionality

  • Same quantization operations
  • Identical matrix multiplication
  • Same performance characteristics

@Pratiyankkumar Pratiyankkumar changed the title Refactor/codebase By defining different classes for different operations Refactored/codebase By defining different classes for different operations and much more Jan 29, 2025
@danial-qamar

Well done bro.

@Pratiyankkumar
Author

@mowentian Please review this PR


@Ndegwadavid Ndegwadavid left a comment


This seems interesting

@z-a-f

z-a-f commented Jan 29, 2025

Please split the PR into smaller (ideally atomic) pieces. This is very hard to review and more error-prone.

@Pratiyankkumar
Author

Pratiyankkumar commented Jan 30, 2025

I am testing the original and refactored code and will upload the results soon, so it should then be easier for anyone to review.

@Pratiyankkumar
Author

Pratiyankkumar commented Jan 30, 2025

Here is the link to a repo you can clone to run the tests that compare the original and refactored files: https://github.com/Pratiyankkumar/Deepseek-Tests

The script only runs on a machine with decent specs and, more importantly, an Nvidia GPU, because Triton (developed by OpenAI), which DeepSeek uses, requires one. I was unable to run the script myself, so please try it if you have an Nvidia GPU. If you hit an error beyond that, feel free to open an issue on my repo, and if the test succeeds please share the output here.

You can refer to this repo to read more about Triton: https://github.com/triton-lang/triton

@Pratiyankkumar
Author

@mowentian Please review this PR 🙏🏻🙏🏻
