[RoadMap] Mooncake Roadmap Q1 & Q2 2025 #44

Open
2 of 40 tasks
stmatengss opened this issue Dec 18, 2024 · 8 comments

Labels
Roadmap Future roadmap or plan for new features

Comments

@stmatengss
Collaborator

stmatengss commented Dec 18, 2024

We categorized our roadmap into two major themes: New Component (Mooncake Managed Object Store) and New Features of Mooncake. As we are seeing more.


New Component: Mooncake Managed Object Store

25Q1

  • Object Store Interfaces
  • Object Store Features
    • Support M-to-N (XpYd) KVCache Sharing between P/D instances
      • Support building multiple connections between Xp and Yd
      • Object Store Master to manage metadata of KVCache
      • Asynchronous KV cache transfer with Layer-by-layer pipeline

25Q2+

  • Advanced Features
    • Cluster reconfiguration
    • Better cache eviction strategy
    • Multi-tier caching

New Features of Mooncake

Transfer Engine

  • 25Q1
  • 25Q2+
    • Transfer data from GPU memory
      • More features integrated into GDS, including direct RDMA-based VRAM-to-DRAM transfers
      • VRAM-to-VRAM fast transfers based on NVLink
      • Support for GPUs from other vendors (domestic...)
    • Path Selection
      • Automatically generate Topology Matrix according to hardware conditions
      • Optimize multi-NIC scheduling strategy based on request load and hardware characteristics
    • Performance
      • Working-set based QP management, avoiding RNIC cache thrashing
    • Security Enhancement
      • Remote memory protection based on RDMA keys
      • Auth

P2P Store

  • 25Q1
    • Support Python interface
    • Network bandwidth allocation

LLM Framework Integration

  • 25Q1
    • ZeroCopy from vLLM to RDMA memory
    • Support more LLM frameworks
      • TensorRT-LLM
    • Support Asynchronous KV cache transfer for vLLM
      • Layer-by-layer pipeline
      • API: transferAsync in class VLLMAdaptor
    • Support KVCache Prefetch
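
To illustrate the layer-by-layer pipeline idea, here is a minimal sketch in plain Python `asyncio`. It only shows the overlap pattern: the transfer of layer i is started without waiting, so it can proceed while layer i+1 is being computed. The names `compute_layer`, `transfer_layer`, and `prefill_with_pipeline` are hypothetical stand-ins; this is not the `transferAsync` API of `VLLMAdaptor`, and no Mooncake or vLLM calls are used.

```python
import asyncio


async def compute_layer(layer: int) -> bytes:
    """Hypothetical stand-in for producing one layer's KV cache."""
    await asyncio.sleep(0)  # yield control, as an async kernel launch might
    return f"kv-{layer}".encode()


async def transfer_layer(layer: int, kv: bytes, received: list) -> None:
    """Hypothetical stand-in for an asynchronous network transfer."""
    await asyncio.sleep(0)
    received.append((layer, kv))


async def prefill_with_pipeline(num_layers: int) -> list:
    received = []
    pending = []
    for layer in range(num_layers):
        kv = await compute_layer(layer)
        # Start the transfer without awaiting it, so it overlaps with
        # the computation of the next layer.
        task = asyncio.create_task(transfer_layer(layer, kv, received))
        pending.append(task)
    # Wait for all outstanding transfers before declaring prefill done.
    await asyncio.gather(*pending)
    return sorted(received)


if __name__ == "__main__":
    print(asyncio.run(prefill_with_pipeline(4)))
```

In a real system the receiver would also need to know when every layer has arrived before decoding starts; the final `gather` plays that role here.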

If an item you want is not on the roadmap, your suggestions and contributions are still welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.

@VegetaPn

The roadmap looks fantastic! What does "Cluster reconfiguration" mean? Does it refer to dynamically adjusting the P/D role type?

@SkylarKBKB

Hello, I am interested in the Transport support CXL/shared memory and maybe I can do this in one month.

@stmatengss
Collaborator Author

> The roadmap looks fantastic! What does "Cluster reconfiguration" mean? Does it refer to dynamically adjusting the P/D role type?

Yes, it has two meanings. Firstly, any GPU server can freely join or leave the Mooncake KVCache pool. Secondly, a Prefill or Decoding Instance can change its role type.
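
As a rough illustration of these two meanings, here is a minimal Python sketch of a membership table that supports joining, leaving, and role switching. Every name in it (`Role`, `KVCachePool`, `join`, `leave`, `switch_role`) is hypothetical and chosen for this example only; it does not reflect Mooncake's actual design or API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    PREFILL = "prefill"
    DECODE = "decode"


@dataclass
class KVCachePool:
    """Hypothetical membership table: server id -> current role."""
    members: dict = field(default_factory=dict)

    def join(self, server_id: str, role: Role) -> None:
        # Meaning 1: a GPU server can join the pool at any time.
        self.members[server_id] = role

    def leave(self, server_id: str) -> None:
        # Meaning 1: a GPU server can also leave at any time.
        self.members.pop(server_id, None)

    def switch_role(self, server_id: str) -> Role:
        # Meaning 2: a Prefill or Decoding instance changes its role.
        current = self.members[server_id]
        new = Role.DECODE if current is Role.PREFILL else Role.PREFILL
        self.members[server_id] = new
        return new

    def servers_with(self, role: Role) -> list:
        return sorted(s for s, r in self.members.items() if r is role)


if __name__ == "__main__":
    pool = KVCachePool()
    pool.join("gpu-0", Role.PREFILL)
    pool.join("gpu-1", Role.DECODE)
    pool.switch_role("gpu-0")
    print(pool.servers_with(Role.DECODE))
```

A real implementation would also have to migrate or invalidate the KVCache held by a server that leaves or changes role, which this sketch omits.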

@stmatengss
Collaborator Author

> Hello, I am interested in the Transport support CXL/shared memory and maybe I can do this in one month.

Thank you for your contribution. I look forward to seeing the pull request on GitHub.

@doujiang24
Contributor

> Check & revise error handling (problems from device/connection/software)

Hello, I'm happy to take this one in Q1.

@alogfans
Collaborator

> Check & revise error handling (problems from device/connection/software)
> Hello, I'm happy to take this one in Q1.

Thank you for your contribution! Looking forward to seeing the pull request on GitHub.

@ANormalMan12

ANormalMan12 commented Jan 6, 2025

I think it is quite hard to implement "ZeroCopy from vLLM to RDMA memory" and "Layer-by-layer pipeline" without heavily modifying core components of vLLM. Is there an easier way to implement "ZeroCopy" and "Layer-by-layer Pipeline"?

@cherhh

cherhh commented Jan 10, 2025

Is the functionality provided by this Mooncake Managed Object Store similar to that of an in-memory database like Redis?
