Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation: App Layer Rate Limiting #137

Open
12 tasks
k-macmillan opened this issue Jan 10, 2025 · 0 comments
Open
12 tasks

Validation: App Layer Rate Limiting #137

k-macmillan opened this issue Jan 10, 2025 · 0 comments
Labels
Dev Reviewed Reviewed by Tech Lead Notify Board trigger PM Reviewed Reviewed by Product Manager QA Reviewed Reviewed by Quality Assurance QA Issue requires QA collaboration

Comments

@k-macmillan
Copy link
Member

k-macmillan commented Jan 10, 2025

User Story - Business Need

We will use redis to handle app-layer rate limiting.

Investigated alternatives, memcached it not really supported by the python community, whereas redis has async support.

  • Sync with Kyle when ticket is picked up.
  • Ticket is understood, and QA has been contacted (if the ticket has a QA label).

User Story(ies)

As a VA Notify reliability dev
I want to limit API requests
So that one client is not capable of affecting another

Additional Info and Resources

We likely want to use:

pool = redis.ConnectionPool.from_url(<url>)
EnpState.redis_client = redis.Redis.from_pool(pool)

and set that as part of our EnpState with lifespan (setup/shutdown), making sure to close it with the shutdown.

Redis Keys should be rate-limit-<service_id>-<api_key_id> of an authenticated request because some clients have sub-groups using different keys.

Example taken (and tweaked) from Redis' Youtube account:

async def is_rate_limited(redis_client: Redis, key: str, observation_period: int, limit: int) -> bool:
    """Decrements the value of a given key and returns if the key is rate limited or not.

       Key expires after observation_period seconds, which allows for small bursts within the given timeframe.
    """
    limited: bool = True
    # Set the key and value if the key does not exist
    if await redis_client.setnx(key, limit):
        # Expire after the observation_period seconds
        await redis_client.expire(key, observation_period)
    request_count = await redis_client.get(key)
    # Each call reduces the count by one, resets after the key is expired
    if request_count and int(request_count) > 0:
        await redis_client.decrby(key, 1)
        limited = False
    return limited

Acceptance Criteria

This AC is based on the suggestion in the Additional Info. If the dev would like to move in a different direction please discuss with @k-macmillan so we can update the AC.

  • redis.asyncio used and added to the project dependencies (pypi page)
  • Environment variable for the redis url. Defaults to local
  • Environment variable to set the limit (value for keys). Default = 1800 (1800/60 = 30 req/s)
  • Environment variable to set the observation_period (how long until the key expires). Default = 60
  • key is remaining-rate-limit-<service_id>-<api_key_id>
  • Redis added to docker-compose file
  • Environment variables for docker-compose set in ci/.env.local
  • Used as a Depends on sms/email/ POST routes and notification GET route (for any that are implemented)
  • Runs after auth is verified. If necessary, group the two Depends into a single method and call each within that method
  • This work is added to the sprint review slide deck (key win bullet point and demo slide)

QA Considerations

Multiple services and api keys should all work as expected. Could be tested by automating a load test to hammer it, or extending the observation_period and reducing the limit. There are some mild concerns with async behaving correctly, so the preference is to hammer the endpoint(s). May be good to try both (start slow, finish with the hammer) .

Potential Dependencies

Out of Scope

Determining limits/observation periods. We will fine tune that based on performance/reliability evaluations later.

There may be some race conditions due to decrby creating the key value pair. If we find it in testing we'll address with transactions and modified flow. Docs aren't clear enough.

@k-macmillan k-macmillan added Dev Reviewed Reviewed by Tech Lead Notify Board trigger QA Issue requires QA collaboration labels Jan 10, 2025
@cris-oddball cris-oddball added the QA Reviewed Reviewed by Quality Assurance label Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dev Reviewed Reviewed by Tech Lead Notify Board trigger PM Reviewed Reviewed by Product Manager QA Reviewed Reviewed by Quality Assurance QA Issue requires QA collaboration
Projects
None yet
Development

No branches or pull requests

4 participants