Refactor audio crate #1327

Open: wants to merge 4 commits into base: main

@EzraEllette (Contributor) commented Feb 11, 2025

description

  • Moved pcm_to_mel into screenpipe and switched it to run on the tokio runtime.
  • Reapplied the LRU cache for speaker embeddings.
  • ONNX sessions are now lazy-loaded.
  • STT is now fully async.
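
For reviewers, the lazy-loading item above can be pictured roughly as follows. This is a std-only sketch and not the code in this PR: `Session`, `LazySession`, and the model path are hypothetical stand-ins, and the real session type would come from the ONNX runtime bindings the crate uses.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for an ONNX inference session (assumption, not the crate's type).
pub struct Session {
    pub model_path: String,
}

impl Session {
    /// Placeholder for the expensive model load.
    fn load(model_path: &str) -> Session {
        Session { model_path: model_path.to_string() }
    }
}

/// Holds a session that is created on first use and reused afterwards.
pub struct LazySession {
    model_path: String,
    cell: OnceLock<Session>,
}

impl LazySession {
    pub fn new(model_path: &str) -> Self {
        LazySession { model_path: model_path.to_string(), cell: OnceLock::new() }
    }

    /// Returns the session, loading it only on the first call.
    pub fn get(&self) -> &Session {
        self.cell.get_or_init(|| Session::load(&self.model_path))
    }

    /// True once the session has been loaded.
    pub fn is_loaded(&self) -> bool {
        self.cell.get().is_some()
    }
}
```

Nothing is loaded until `get` is first called, so a model that is never used never occupies memory. `OnceLock` is thread-safe, so the same holder can be shared across tasks.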

related issue: #1318

how to test

Test using Instruments.

Update your profile (~/.zshrc)

alias instruments="open /Applications/Xcode.app/Contents/Applications/Instruments.app"
get-task-allow () {
	codesign -s - -v -f --entitlements ~/.get-task-allow.ent "$1"
}

Create .get-task-allow.ent in your home directory (~) with the following contents:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>com.apple.security.get-task-allow</key>
        <true/>
    </dict>
</plist>

Update the [profile.release] section of screenpipe/Cargo.toml (optional)

[profile.release]
debug = true

Testing (TEST THOROUGHLY)

  1. Build screenpipe with --release (optional)
  2. Run get-task-allow <path_to_screenpipe_build>
  3. Open Instruments using the instruments alias
  4. Select the Leaks template and choose your screenpipe build as the target.
  5. Edit the target to add any command-line arguments.
  6. Press the record button and fix any permission issues. It helps to remove screenpipe completely from the system permissions and add it back when prompted.
  7. Use screenpipe normally.

Screenshots

Before: [screenshot]

After: [screenshot]

Ran for 13:33 and memory usage stayed under 400 MB.

/claim #1318

- Improve audio segment handling with more efficient buffer management
- Implement lazy-loading and caching for Pyannote and Whisper model sessions
- Add LRU cache for speaker embeddings
- Enhance async handling of STT and embedding extraction
- Optimize memory usage in audio processing components
- Update log_mel_spectrogram_ and pcm_to_mel functions to be async
- Replace thread::scope with tokio::task::spawn for parallel processing
- Remove Arc and thread synchronization in favor of async task spawning
- Add .await calls to async functions in processing pipeline
- Upgrade Candle libraries from 0.7.2 to 0.8.2
- Update tokenizers from 0.20.0 to 0.21.0
- Add num-traits dependency to screenpipe-audio
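
As a rough illustration of the LRU cache for speaker embeddings mentioned in the list above, here is a minimal std-only sketch. It is not the implementation in this PR (which may well use a dedicated crate): `EmbeddingCache`, the `u64` speaker key, and the `Vec<f32>` embedding type are assumptions made for the example.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache keyed by a hypothetical speaker id (assumption).
pub struct EmbeddingCache {
    capacity: usize,
    map: HashMap<u64, Vec<f32>>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<u64>,
}

impl EmbeddingCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        EmbeddingCache { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used position.
    fn touch(&mut self, key: u64) {
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
    }

    /// Returns the cached embedding and marks it as recently used.
    pub fn get(&mut self, key: u64) -> Option<&Vec<f32>> {
        if self.map.contains_key(&key) {
            self.touch(key);
        }
        self.map.get(&key)
    }

    /// Inserts an embedding, evicting the least recently used entry when full.
    pub fn put(&mut self, key: u64, embedding: Vec<f32>) {
        if !self.map.contains_key(&key) && self.map.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key, embedding);
        self.touch(key);
    }

    /// Number of embeddings currently cached.
    pub fn len(&self) -> usize {
        self.map.len()
    }
}
```

The point of the bound is that memory held by embeddings stays capped at `capacity` entries no matter how long the process runs. The `VecDeque` bookkeeping is O(n) per access, which is acceptable for a small cache but is a simplification compared with a linked-list-backed LRU.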

