
VoxRay Real Estate Example #49

Open: wants to merge 27 commits into base: main

Changes from all commits:
- `1ca215c` Initial Voxray server implementation (cweems, Jul 19, 2024)
- `1f37424` Add voice SDK for testing (cweems, Aug 2, 2024)
- `c31a9cb` VoxRay Real Estate Example (twilio-cfeehan, Aug 14, 2024)
- `bc1a3c9` Added 'enableInterimResult=true' as <VoxRay> param (twilio-cfeehan, Aug 14, 2024)
- `0c9a42e` Support Streaming and Non-Streaming modes, prompts, mockdatabase, too… (twilio-cfeehan, Aug 16, 2024)
- `6e8c8fc` Added Personaliztion and SMS Confirmations (twilio-cfeehan, Aug 16, 2024)
- `916e601` Added Live Agent Handoff capability (twilio-cfeehan, Aug 17, 2024)
- `5a3cc8f` Fixed streaming issues. Added multi tool calling to non-streaming (twilio-cfeehan, Aug 19, 2024)
- `1306a41` Remove personalization.js from repository tracking (twilio-cfeehan, Aug 19, 2024)
- `690eac1` Fixed Time Normalization, Improved SMS handling, Fixed Streaming with… (twilio-cfeehan, Aug 21, 2024)
- `c1dfe89` Remove personalization.js from repository tracking (twilio-cfeehan, Aug 22, 2024)
- `9edb8a2` Cleanse repo for Voxray-only code (twilio-cfeehan, Aug 22, 2024)
- `d395aa0` Fixed bugs in streaming code (twilio-cfeehan, Sep 3, 2024)
- `e2ebbb9` Updates to streaming and logging (twilio-cfeehan, Oct 9, 2024)
- `4f589c1` added date utilities to mock-database (pBread, Nov 5, 2024)
- `26b7c72` removed deprecated deepgram & elevenlabs references (pBread, Nov 5, 2024)
- `d1fafb9` changed default voice (pBread, Nov 5, 2024)
- `2a47374` changed tts defaults (pBread, Nov 5, 2024)
- `fc1ab49` updated config naming to match env (pBread, Nov 5, 2024)
- `3b7c62b` updated TWIML from Voxray => ConversationRelay, moved env to config (pBread, Nov 5, 2024)
- `5ee9010` added standard urlencoded & json express middleware (pBread, Nov 5, 2024)
- `12948c3` added callSid to log (pBread, Nov 5, 2024)
- `e5b72be` moved env reference to cfg (pBread, Nov 5, 2024)
- `0a2de32` removed deepgram sdk from dependencies (pBread, Nov 5, 2024)
- `96bdb8d` removed package scripts that aren't used (pBread, Nov 5, 2024)
- `195a530` minor, sorted package scripts (pBread, Nov 5, 2024)
- `aef7377` Merge pull request #1 from pBread/voxray (twilio-cfeehan, Nov 6, 2024)
Binary file added .DS_Store
Binary file not shown.
10 changes: 7 additions & 3 deletions .env.example
@@ -12,10 +12,14 @@ SERVER='myserver.website.com'
 
 # Service API Keys
 OPENAI_API_KEY=
-DEEPGRAM_API_KEY=
 
-# Deepgram voice model, see more options here: https://developers.deepgram.com/docs/tts-models
-VOICE_MODEL=aura-asteria-en
+# Supported TTS Providers and Voices:
+# all of these: https://www.twilio.com/docs/voice/twiml/say/text-speech#available-voices-and-languages
+# plus these...
+# Google: en-US-Journey-D, en-US-Journey-F, en-US-Journey-O, en-IN-Journey-D, en-IN-Journey-F, en-GB-Journey-D, en-GB-Journey-F, de-DE-Journey-D, de-DE-Journey-F
+# Amazon: Amy-Generative, Matthew-Generative, Ruth-Generative
+TTS_PROVIDER='amazon'
+TTS_VOICE='Danielle-Neural'
 
 # Call Recording
 # Important: Legal implications of call recording
64 changes: 25 additions & 39 deletions .eslintrc.js
@@ -1,47 +1,33 @@
 module.exports = {
-  'env': {
-    'browser': true,
-    'commonjs': true,
-    'es2021': true
-  },
-  'extends': 'eslint:recommended',
-  'overrides': [
-    {
-      'env': {
-        'node': true
-      },
-      'files': [
-        '.eslintrc.{js,cjs}'
-      ],
-      'parserOptions': {
-        'sourceType': 'script'
-      }
-    }
-  ],
-  'globals' : {
-    'expect': 'writeable',
-    'test': 'writeable',
-    'process': 'readable'
-  },
-  'parserOptions': {
-    'ecmaVersion': 'latest'
-  },
-  'rules': {
-    'indent': [
-      'error',
-      2
-    ],
-    'linebreak-style': [
-      'error',
-      'unix'
-    ],
-    'quotes': [
-      'error',
-      'single'
-    ],
-    'semi': [
-      'error',
-      'always'
-    ]
-  }
+  env: {
+    browser: true,
+    commonjs: true,
+    es2021: true,
+  },
+  extends: "eslint:recommended",
+  overrides: [
+    {
+      env: {
+        node: true,
+      },
+      files: [".eslintrc.{js,cjs}"],
+      parserOptions: {
+        sourceType: "script",
+      },
+    },
+  ],
+  globals: {
+    expect: "writeable",
+    test: "writeable",
+    process: "readable",
+  },
+  parserOptions: {
+    ecmaVersion: "latest",
+  },
+  rules: {
+    indent: "off", // Turns off indent enforcement
+    "linebreak-style": "off", // Turns off linebreak enforcement
+    quotes: "off", // Turns off quote enforcement
+    semi: "off", // Turns off semicolon enforcement
+  },
 };
18 changes: 18 additions & 0 deletions .github/workflows/fly-deploy.yml
@@ -0,0 +1,18 @@
# See https://fly.io/docs/app-guides/continuous-deployment-with-github-actions/

name: Fly Deploy
on:
  push:
    branches:
      - main
jobs:
  deploy:
    name: Deploy app
    runs-on: ubuntu-latest
    concurrency: deploy-group # optional: ensure only one action runs at a time
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
4 changes: 4 additions & 0 deletions .gitignore
@@ -2,6 +2,10 @@
# Edit at https://www.toptal.com/developers/gitignore?templates=node

### Node ###

#personalization
data/personalization.js

# Logs
logs
*.log
88 changes: 44 additions & 44 deletions README.md
@@ -5,12 +5,13 @@ Wouldn't it be neat if you could build an app that allowed you to chat with ChatGPT…
Twilio gives you a superpower called [Media Streams](https://twilio.com/media-streams). Media Streams provides a Websocket connection to both sides of a phone call. You can get audio streamed to you, process it, and send audio back.

This app serves as a demo exploring two services:
- [Deepgram](https://deepgram.com/) for Speech to Text and Text to Speech

- [OpenAI](https://openai.com) for GPT prompt completion

These services combine to create a voice application that is remarkably better at transcribing, understanding, and speaking than traditional IVR systems.

Features:

- 🏁 Returns responses with low latency, typically around 1 second, by utilizing streaming.
- ❗️ Allows the user to interrupt the GPT assistant and ask a different question.
- 📔 Maintains chat history with GPT.
@@ -19,21 +20,25 @@ Features:
## Setting up for Development

### Prerequisites

Sign up for the following services and get an API key for each:
- [Deepgram](https://console.deepgram.com/signup)

- [OpenAI](https://platform.openai.com/signup)

If you're hosting the app locally, we also recommend using a tunneling service like [ngrok](https://ngrok.com) so that Twilio can forward audio to your app.

### 1. Start Ngrok

Start an [ngrok](https://ngrok.com) tunnel for port `3000`:

```bash
ngrok http 3000
```

Ngrok will give you a unique URL, like `abc123.ngrok.io`. Copy the URL without http:// or https://. You'll need this URL in the next step.

### 2. Configure Environment Variables

Copy `.env.example` to `.env` and configure the following environment variables:

```bash
SERVER="yourserverdomain.com"

# Service API Keys
OPENAI_API_KEY="sk-XXXXXX"
DEEPGRAM_API_KEY="YOUR-DEEPGRAM-API-KEY"

# Configure your Twilio credentials if you want
# to make test calls using '$ npm test'.
TO_NUMBER='+13334445555'
```

### 3. Install Dependencies with NPM

Install the necessary packages:

```bash
npm install
```

### 4. Start Your Server in Development Mode

Run the following command:

```bash
npm run dev
```

This will start your app using `nodemon` so that any changes to your code automatically refreshes and restarts the server.

### 5. Configure an Incoming Phone Number
You can also use the Twilio CLI:
```bash
twilio phone-numbers:update +1[your-twilio-number] --voice-url=https://your-server.ngrok.io/incoming
```

This configuration tells Twilio to send incoming call audio to your app when someone calls your number. The app responds to the incoming call webhook with a [Stream](https://www.twilio.com/docs/voice/twiml/stream) TwiML verb that will connect an audio media stream to your websocket server.
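As a rough sketch of what that webhook response looks like, the handler can return `<Connect><Stream>` TwiML pointing at the app's WebSocket server. The `/connection` WebSocket path below is an assumption for illustration; use whatever path your server actually exposes:

```javascript
// Build the TwiML returned from the /incoming webhook. The `server` value
// comes from the SERVER entry in .env (your ngrok or deployed domain).
function incomingCallTwiml(server) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <Stream url="wss://${server}/connection" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

console.log(incomingCallTwiml("abc123.ngrok.io"));
```

Twilio fetches this TwiML when the call arrives, then opens a WebSocket to the `url` in the `<Stream>` noun and starts streaming call audio over it.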

## Application Workflow

CallGPT coordinates the data flow between multiple different services including Deepgram, OpenAI, and Twilio Media Streams:
![Call GPT Flow](https://github.com/twilio-labs/call-gpt/assets/1418949/0b7fcc0b-d5e5-4527-bc4c-2ffb8931139c)


## Modifying the ChatGPT Context & Prompt

Within `gpt-service.js` you'll find the settings for the GPT's initial context and prompt. For example:

```javascript
this.userContext = [
  // ...
{ "role": "assistant", "content": "Hello! I understand you're looking for a pair of AirPods, is that correct?" },
],
```

### About the `system` Attribute

The `system` attribute is background information for the GPT. As you build your use-case, play around with modifying the context. A good starting point would be to imagine training a new employee on their first day and giving them the basics of how to help a customer.

There are some context prompts that will likely be helpful to include by default. For example:
These context items help shape a GPT so that it will act more naturally in a phone call.
The `•` symbol context in particular helps the app break sentences into natural chunks, which speeds up text-to-speech processing so that users hear audio faster.
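A minimal sketch of that chunking (the function name is ours, not from the app's source):

```javascript
// Split a completion into speakable chunks on the "•" symbol so each piece
// can be handed to text-to-speech as soon as it is complete.
function splitIntoChunks(completionText) {
  return completionText
    .split("•")
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

const chunks = splitIntoChunks(
  "Thanks for calling! • Are you interested in the Pro or the Max model?"
);
// chunks[0] can go to TTS immediately while the rest of the reply streams in.
```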

### About the `content` Attribute

This attribute is your default conversation starter for the GPT. However, you could make it more complex and customized based on personalized user data.

In this case, our bot will start off by saying, "Hello! I understand you're looking for a pair of AirPods, is that correct?"
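For instance, a sketch of personalizing that opener — the customer lookup and field names here are hypothetical, not part of the app:

```javascript
// Build a personalized conversation starter for the assistant's first turn.
// The `customer` object is a stand-in for your own data source.
function buildOpener(customer) {
  const name = customer.name ? ` ${customer.name}` : "";
  return `Hello${name}! I understand you're looking for a pair of ${customer.interestedIn}, is that correct?`;
}

const opener = buildOpener({ name: "Sam", interestedIn: "AirPods" });
// "Hello Sam! I understand you're looking for a pair of AirPods, is that correct?"
```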

## Using Function Calls with GPT

You can use function calls to interact with external APIs and data sources. For example, your GPT could check live inventory, check an item's price, or place an order.

### How Function Calling Works

Function calling is handled within the `gpt-service.js` file in the following sequence:

1. `gpt-service` loads `function-manifest.js` and requires (imports) all functions defined there from the `functions` directory. Our app will call these functions later when GPT gives us a function name and parameters.

```javascript
tools.forEach((tool) => {
const functionName = tool.function.name;
  // ...
});
```

```javascript
const stream = await this.openai.chat.completions.create({
  model: "gpt-4",
  messages: this.userContext,
  tools, // <-- function-manifest definition
  stream: true,
});
```
3. When the GPT responds, it will send us a stream of chunks for the text completion. The GPT will tell us whether each text chunk is something to say to the user, or if it's a tool call that our app needs to execute. This is indicated by the `deltas.tool_calls` key:

```javascript
if (deltas.tool_calls) {
  // handle function calling
}
```

4. Once we have gathered all of the stream chunks about the tool call, our application can run the actual function code that we imported during the first step. The function name and parameters are provided by GPT:

```javascript
const functionToCall = availableFunctions[functionName];
const functionResponse = functionToCall(functionArgs);
```

5. As the final step, we add the function response data into the conversation context like this:

```javascript
this.userContext.push({
  role: "function",
  name: functionName,
  content: functionResponse,
});
```

We then ask the GPT to generate another completion including what it knows from the function call. This allows the GPT to respond to the user with details gathered from the external data source.
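The accumulation described in steps 3 and 4 can be sketched like this. The chunk shape follows OpenAI's streaming tool-call deltas (name sent once, JSON arguments in fragments); the helper name and variables are ours:

```javascript
// Accumulate streamed tool-call deltas into a complete function name plus
// parsed arguments. Argument JSON arrives in fragments, so we concatenate
// until the stream ends, then parse once.
function collectToolCall(deltaChunks) {
  let functionName = "";
  let argsJson = "";
  for (const deltas of deltaChunks) {
    if (!deltas.tool_calls) continue;
    const call = deltas.tool_calls[0];
    if (call.function.name) functionName = call.function.name;
    if (call.function.arguments) argsJson += call.function.arguments;
  }
  return { functionName, functionArgs: JSON.parse(argsJson) };
}

const { functionName, functionArgs } = collectToolCall([
  { tool_calls: [{ function: { name: "checkInventory", arguments: '{"mo' } }] },
  { tool_calls: [{ function: { arguments: 'del": "airpods pro"}' } }] },
]);
// functionName === "checkInventory", functionArgs.model === "airpods pro"
```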

### Adding Custom Function Calls

You can have your GPT call external data sources by adding functions to the `/functions` directory. Follow these steps:

1. Create a function (e.g. `checkInventory.js` in `/functions`)
@@ -200,19 +223,23 @@ Example function manifest entry:

```javascript
// ...
  },
}
```

#### Using `say` in the Function Manifest

The `say` key in the function manifest allows you to define a sentence for the app to speak to the user before calling a function. For example, if a function will take a long time to call, you might say, "Give me a few moments to look that up for you..."
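For instance, a hypothetical manifest entry with `say` — the `checkPrice` function and its schema below are illustrative, not part of the repo:

```javascript
// Hypothetical function-manifest.js entry. The `say` sentence is spoken to
// the caller before the (potentially slow) function actually runs.
const checkPriceTool = {
  type: "function",
  function: {
    name: "checkPrice",
    say: "Give me a few moments to look that up for you...",
    description: "Look up the current price of a product.",
    parameters: {
      type: "object",
      properties: {
        model: { type: "string", description: "The product model name." },
      },
      required: ["model"],
    },
  },
};
```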

### Receiving Function Arguments

When ChatGPT calls a function, it will provide an object with multiple attributes as a single argument. The parameters included in the object are based on the definition in your `function-manifest.js` file.

In the `checkInventory` example above, `model` is a required argument, so the data passed to the function will be a single object like this:

```javascript
{
  model: "airpods pro"
}
```

For our `placeOrder` function, the arguments passed will look like this:

```javascript
{
  model: "airpods pro",
  quantity: 10
}
```

### Returning Arguments to GPT
Your function should always return a value: GPT will get confused when the function returns nothing, and may continue trying to call the function expecting an answer. If your function doesn't have any data to return to the GPT, you should still return a response with an instruction like "Tell the user that their request was processed successfully." This prevents the GPT from calling the function repeatedly and wasting tokens.

Any data that you return to the GPT should match the expected format listed in the `returns` key of `function-manifest.js`.
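A sketch of a "nothing substantive to return" function following that advice — the function name and wording are illustrative:

```javascript
// Even a fire-and-forget function should hand something back to the GPT;
// otherwise it may keep re-calling the function waiting for an answer.
function cancelOrder({ orderId }) {
  // ...imagine the real cancellation API call happening here...
  return JSON.stringify({
    status: "success",
    instruction: "Tell the user that their request was processed successfully.",
  });
}

const reply = JSON.parse(cancelOrder({ orderId: "A1001" }));
// reply.status === "success"
```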

## Utility Scripts for Placing Calls

The `scripts` directory contains two files that allow you to place test calls:

- `npm run inbound` will place an automated call from a Twilio number to your app and speak a script. You can adjust this to your use-case, e.g. as an automated test.
- `npm run outbound` will place an outbound call that connects to your app. This can be useful if you want the app to call your phone so that you can manually test it.

## Using Eleven Labs for Text to Speech
Replace the Deepgram API call and array transformation in `tts-service.js` with the following call to Eleven Labs. Note that Eleven Labs will sometimes hit a rate limit (especially on the free trial) and return 400 errors with no audio (or a clicking sound).

```javascript
try {
  const response = await fetch(
    `https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM/stream?output_format=ulaw_8000&optimize_streaming_latency=3`,
    {
      method: 'POST',
      headers: {
        'xi-api-key': process.env.XI_API_KEY,
        'Content-Type': 'application/json',
        accept: 'audio/wav',
      },
      body: JSON.stringify({
        model_id: process.env.XI_MODEL_ID,
        text: partialResponse,
      }),
    }
  );

  if (response.status === 200) {
    const audioArrayBuffer = await response.arrayBuffer();
    this.emit(
      'speech',
      partialResponseIndex,
      Buffer.from(audioArrayBuffer).toString('base64'),
      partialResponse,
      interactionCount
    );
  } else {
    console.log('Eleven Labs Error:');
    console.log(response);
  }
} catch (err) {
  console.error('Error occurred in XI LabsTextToSpeech service');
  console.error(err);
}
```


## Testing with Jest

Repeatedly calling the app is a time-consuming way to test your tool function calls. This project contains example unit tests that can help you test your functions without relying on the GPT to call them.

Simple example tests are available in the `/test` directory. To run them, simply run `npm run test`.

## Deploy via Fly.io

Fly.io is a hosting service similar to Heroku that simplifies the deployment process. Because Twilio Media Streams are sent and received from us-east-1, it's recommended to choose Fly's Ashburn, VA (IAD) region.

> Deploying to Fly.io is not required to try the app, but can be helpful if your home internet speed is variable.

Modify the app name in `fly.toml` to a unique value (app names must be globally unique).

Deploy the app using the Fly.io CLI:

```bash
fly launch

fly deploy
```

Import your secrets from your .env file to your deployed app:

```bash
fly secrets import < .env
```