
Refactored/codebase By defining different classes for different operations and much more #444

Open
wants to merge 3 commits into main
Conversation

Pratiyankkumar

Code Improvements Summary

convert.py

Here’s a detailed breakdown of the enhancements made to the codebase to improve clarity, robustness, and maintainability.


1. Type Hints

  • Added comprehensive type annotations for better code clarity and IDE support.
  • Used type definitions for complex data structures (e.g., TensorMapping, StateDict).
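
For illustration, the aliases might look roughly like this (only the names TensorMapping and StateDict come from the PR; the exact definitions are assumptions):

```python
from typing import Dict, Optional, Tuple

import torch

# Source tensor name -> (target tensor name, shard dimension or None if replicated).
TensorMapping = Dict[str, Tuple[str, Optional[int]]]

# A checkpoint's parameter dictionary: tensor name -> tensor.
StateDict = Dict[str, torch.Tensor]
```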

2. Error Handling

  • Added proper exception handling using try-except blocks.
  • Included validation checks for inputs (e.g., n_experts % mp == 0).
  • Improved error messages for better debugging and user feedback.
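
As an example of the kind of check described above (the helper name and exact messages are assumptions):

```python
def validate_expert_split(n_experts: int, mp: int) -> None:
    """Ensure the expert count divides evenly across model-parallel ranks."""
    if mp <= 0:
        raise ValueError(f"Model-parallel size must be positive, got {mp}")
    if n_experts % mp != 0:
        raise ValueError(
            f"n_experts ({n_experts}) must be divisible by the "
            f"model-parallel size mp ({mp})"
        )
```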

3. Code Organization

  • Split functionality into smaller, focused functions:
    • process_tensor_name: Handles tensor name processing.
    • shard_tensor: Manages tensor sharding for model parallelism.
    • convert_checkpoint: Main logic for checkpoint conversion.
  • Moved the mapping dictionary to a module-level constant (TENSOR_MAPPING).
  • Separated tensor processing and sharding logic into dedicated functions.
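
A rough sketch of that split, assuming the module-level TENSOR_MAPPING constant mentioned above (signatures and bodies are illustrative, not the exact code in the PR):

```python
from typing import Optional

import torch

# Placeholder for the module-level mapping constant described above.
TENSOR_MAPPING = {"embed_tokens.weight": ("embed.weight", 0)}

def process_tensor_name(name: str) -> Optional[str]:
    """Translate a source tensor name via TENSOR_MAPPING; return None to skip it."""
    key = name.replace("model.", "", 1)
    mapped = TENSOR_MAPPING.get(key)
    return mapped[0] if mapped is not None else None

def shard_tensor(tensor: torch.Tensor, dim: int, mp_idx: int, mp_count: int) -> torch.Tensor:
    """Return the slice of `tensor` along `dim` owned by model-parallel rank `mp_idx`."""
    if tensor.size(dim) % mp_count != 0:
        raise ValueError(
            f"Dimension {dim} of size {tensor.size(dim)} is not divisible by {mp_count}"
        )
    shard_size = tensor.size(dim) // mp_count
    return tensor.narrow(dim, mp_idx * shard_size, shard_size).contiguous()

def convert_checkpoint(src_path, dst_path, mp_count: int) -> None:
    """Load the source checkpoint, rename/shard every tensor, and save one file per rank."""
    ...  # orchestrates process_tensor_name + shard_tensor, then saves per-rank files
```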

4. Path Handling

  • Replaced os.path with pathlib.Path for more robust and modern path handling.
  • Added checks for file/directory existence to ensure valid inputs.
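
A minimal sketch of the pathlib-based checks (the helper name check_paths is assumed):

```python
from pathlib import Path
from typing import Tuple

def check_paths(ckpt_dir: str, save_dir: str) -> Tuple[Path, Path]:
    """Validate the input checkpoint directory and create the output directory."""
    ckpt_path = Path(ckpt_dir)
    if not ckpt_path.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {ckpt_path}")
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)
    return ckpt_path, save_path
```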

5. Documentation

  • Added detailed docstrings for functions, including:
    • Args: Descriptions of function arguments.
    • Raises: List of exceptions that may be raised.
  • Improved comments for complex operations to enhance readability.
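
The docstrings follow roughly this Args/Raises pattern (the wording here is illustrative):

```python
def shard_tensor(tensor, dim, mp_idx, mp_count):
    """Slice a tensor along `dim` for one model-parallel rank.

    Args:
        tensor: The full (unsharded) parameter tensor.
        dim: Dimension along which to shard.
        mp_idx: Index of the current model-parallel rank.
        mp_count: Total number of model-parallel ranks.

    Raises:
        ValueError: If the sharded dimension is not divisible by `mp_count`.
    """
```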

6. Best Practices

  • Used module-level constants instead of magic values (e.g., TENSOR_MAPPING).
  • Used more descriptive variable names throughout the code (e.g., mp_idx, mp_count).
  • Added progress descriptions to tqdm bars for better visibility during execution.
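
For example (the file list and label here are placeholders):

```python
from tqdm import tqdm

shard_files = ["model-00001.safetensors", "model-00002.safetensors"]  # placeholder list

# The desc= label tells the user which stage the progress bar refers to.
for shard_file in tqdm(shard_files, desc="Converting checkpoint shards"):
    pass  # process one shard here
```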

7. Structure

  • Separated the main logic into the convert_checkpoint function for better modularity.
  • Created a proper main() function with argument parsing for cleaner execution flow.
  • Better organization of related operations (e.g., tensor processing, sharding, and saving).
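
A sketch of the entry point, reusing convert_checkpoint from the sketch above (the flag names are assumptions):

```python
import argparse

def main() -> None:
    """Parse CLI arguments and run the conversion."""
    parser = argparse.ArgumentParser(description="Convert a checkpoint for model parallelism")
    parser.add_argument("--ckpt-path", type=str, required=True)
    parser.add_argument("--save-path", type=str, required=True)
    parser.add_argument("--model-parallel", type=int, default=1)
    args = parser.parse_args()
    convert_checkpoint(args.ckpt_path, args.save_path, args.model_parallel)

if __name__ == "__main__":
    main()
```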

8. Safety

  • Added validation for tensor dimensions to ensure compatibility with model parallelism.
  • Added checks for missing files to prevent runtime errors.
  • Improved error messages to aid in debugging and troubleshooting.

fp8_cast_bf16.py

🔄 Major Structural Changes

  1. Created WeightConverter class
  2. Added type hints throughout
  3. Split main function into focused methods

📝 New Classes & Methods

  • WeightConverter
    • __init__
    • _load_model_index
    • _get_tensor
    • _manage_memory
    • convert
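
A sketch of how the class might be laid out, assuming the standard Hugging Face model.safetensors.index.json layout; only the class and method names come from the PR, the bodies are illustrative:

```python
import json
from pathlib import Path
from typing import Dict

import torch
from safetensors.torch import load_file

class WeightConverter:
    """Converts FP8-quantized safetensors shards to BF16."""

    def __init__(self, input_path: str, output_path: str) -> None:
        self.input_path = Path(input_path)
        self.output_path = Path(output_path)
        self.weight_map = self._load_model_index()
        self.loaded_files: Dict[str, Dict[str, torch.Tensor]] = {}

    def _load_model_index(self) -> Dict[str, str]:
        """Read model.safetensors.index.json and return the tensor -> file map."""
        index_file = self.input_path / "model.safetensors.index.json"
        with open(index_file) as f:
            return json.load(f)["weight_map"]

    def _get_tensor(self, name: str) -> torch.Tensor:
        """Load (and cache) the shard that contains `name`, then return the tensor."""
        file_name = self.weight_map[name]
        if file_name not in self.loaded_files:
            self.loaded_files[file_name] = load_file(str(self.input_path / file_name))
        return self.loaded_files[file_name][name]

    def _manage_memory(self, keep_last: int = 2) -> None:
        """Evict all but the most recently loaded shards to bound memory use."""
        while len(self.loaded_files) > keep_last:
            self.loaded_files.pop(next(iter(self.loaded_files)))
        torch.cuda.empty_cache()

    def convert(self) -> None:
        """Walk every shard, dequantize FP8 weights to BF16, and save the results."""
        ...  # mirrors the original fp8_cast_bf16.py logic, shard by shard
```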

🛠 Key Improvements

  1. Better encapsulation of conversion logic
  2. Proper memory management
  3. Enhanced error handling
  4. Type safety with hints

🔍 Functionality

  • Maintained exact same conversion process
  • Same CLI interface
  • Identical output format

generate.py

Text Generator Refactoring

🔄 Major Structural Changes

  1. Created separate classes:
    • TokenSampler
    • TextGenerator
    • DistributedEnvironment
    • ChatSession
  2. Added GenerationConfig dataclass

📝 New Classes & Methods

  • TokenSampler: Handle token sampling logic
  • TextGenerator: Core generation functionality
  • DistributedEnvironment: Manage distributed setup
  • ChatSession: Handle chat interactions
  • GenerationConfig: Configuration management
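
A sketch of how these pieces might fit together; only the class names and the GenerationConfig dataclass come from the PR, the fields and method bodies are assumptions:

```python
from dataclasses import dataclass

import torch

@dataclass
class GenerationConfig:
    """Sampling knobs; the field names here are illustrative."""
    max_new_tokens: int = 200
    temperature: float = 1.0

class TokenSampler:
    """Turns a row of logits into the next token id."""

    def __init__(self, config: GenerationConfig) -> None:
        self.config = config

    def sample(self, logits: torch.Tensor) -> torch.Tensor:
        if self.config.temperature <= 0:
            return logits.argmax(dim=-1)  # greedy decoding
        probs = torch.softmax(logits / self.config.temperature, dim=-1)
        return torch.multinomial(probs, num_samples=1).squeeze(-1)

class TextGenerator:
    """Owns the model and runs the decode loop; ChatSession and
    DistributedEnvironment would wrap and configure this class."""

    def __init__(self, model, sampler: TokenSampler) -> None:
        self.model = model
        self.sampler = sampler
```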

🛠 Key Improvements

  1. Better separation of concerns
  2. Improved configuration management
  3. Enhanced distributed processing
  4. Clearer session handling
  5. Better type safety

🔍 Functionality

  • Same generation capabilities
  • Identical distributed processing
  • Same interactive and batch modes

kernel.py

FP8 Operations Refactoring

🔄 Major Structural Changes

  1. Created classes:
    • QuantizationKernels
    • MatrixMultKernels
    • TensorOps
  2. Added BlockConfig dataclass

📝 New Classes & Methods

  • QuantizationKernels: Handle quantization operations
  • MatrixMultKernels: Matrix multiplication operations
  • TensorOps: High-level interface
  • BlockConfig: Configuration management
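
A sketch of the grouping; only the class names and the BlockConfig dataclass come from the PR, the method names, fields, and kernel references are assumptions:

```python
from dataclasses import dataclass

import torch

@dataclass(frozen=True)
class BlockConfig:
    """Block size shared by the quantization and GEMM kernels (value is illustrative)."""
    block_size: int = 128

class QuantizationKernels:
    """Groups the FP8 quantize/dequantize Triton kernels and their launchers."""

    @staticmethod
    def dequantize(x: torch.Tensor, scale: torch.Tensor, config: BlockConfig) -> torch.Tensor:
        ...  # launches the block-wise weight dequantization kernel with config.block_size

class MatrixMultKernels:
    """Groups the block-scaled FP8 GEMM Triton kernel and its launcher."""

    @staticmethod
    def gemm(a, a_scale, b, b_scale, config: BlockConfig) -> torch.Tensor:
        ...  # launches the FP8 GEMM kernel

class TensorOps:
    """High-level entry point that dispatches to the kernel classes."""

    def __init__(self, config: BlockConfig = BlockConfig()) -> None:
        self.config = config

    def dequantize(self, x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return QuantizationKernels.dequantize(x, scale, self.config)
```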

🛠 Key Improvements

  1. Better organization of kernels
  2. Improved configuration handling
  3. Enhanced type safety
  4. Clearer operation grouping
  5. Better documentation

🔍 Functionality

  • Same quantization operations
  • Identical matrix multiplication
  • Same performance characteristics

@Pratiyankkumar Pratiyankkumar changed the title Refactor/codebase By defining different classes for different operations Refactored/codebase By defining different classes for different operations and much more Jan 29, 2025
@danial-qamar

Well done bro.

@Pratiyankkumar
Author

@mowentian Please review this PR


@Ndegwadavid Ndegwadavid left a comment


This seems interesting

@z-a-f

z-a-f commented Jan 29, 2025

Please split the PR into smaller (ideally atomic) pieces. This is very hard to review and more error-prone.

@Pratiyankkumar
Author

Pratiyankkumar commented Jan 30, 2025

I am testing the original and refactored code and will upload the results soon, so it should then be easier for anyone to review.

@Pratiyankkumar
Author

Pratiyankkumar commented Jan 30, 2025

Here is the link to a repo you can clone to run the tests that compare the original and refactored files: https://github.com/Pratiyankkumar/Deepseek-Tests

The script only runs on a machine with decent specs and, more importantly, an Nvidia GPU, because Triton (developed by OpenAI), which DeepSeek uses, requires one. I was unable to run the script myself, so please try it if you have an Nvidia GPU. If you hit an error beyond that, feel free to open an issue on my repo, and if the test succeeds please share the output here.

You can refer to this repo to read more about Triton: https://github.com/triton-lang/triton

@Pratiyankkumar
Author

@mowentian Please review this PR 🙏🏻🙏🏻
