Split visits and downloads into two apps. Support pip compile. Add pydantic-settings #10

Open: wants to merge 2 commits into `main`

48 changes: 34 additions & 14 deletions README.md
@@ -10,22 +10,23 @@ Currently, there are the following endpoints that are used:
### `/version`

- Information about the latest available release
  - MultiQC uses this to print a log message advising if the current version is out of date, with information about
    how to upgrade.
- _[Planned]_: Broadcast messages
  - Can be used to announce arbitrary information, such as critical changes.
  - No usage is currently anticipated; this is mostly a future-proofing tool.
- _[Planned]_: Module-specific warnings
  - Warnings scoped to module and MultiQC version
  - Will allow MultiQC to notify end users via the log if the module that they are running has serious bugs or errors.
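
The "out of date" check in the first bullet boils down to comparing the running version with the latest release returned by `/version`. A minimal sketch, assuming plain dotted numeric versions such as `1.21`; the function names and the message wording are hypothetical, not MultiQC's actual code:

```python
from typing import Optional


def parse_version(version: str) -> tuple:
    """Turn a dotted numeric version such as "1.21" into a comparable tuple (1, 21)."""
    # Simplification (assumption): pre-release or dev suffixes like "1.22.dev0" are not handled
    return tuple(int(part) for part in version.split("."))


def is_outdated(current: str, latest: str) -> bool:
    """Return True if the running version is older than the latest release."""
    # Tuples compare element by element, so (1, 9) < (1, 10) as expected
    return parse_version(current) < parse_version(latest)


def upgrade_message(current: str, latest: str) -> Optional[str]:
    """Build a hypothetical log message advising an upgrade, or None if up to date."""
    if not is_outdated(current, latest):
        return None
    return f"A newer MultiQC version is available: v{latest} (installed: v{current})"
```

Comparing tuples rather than strings avoids the classic `"1.9" > "1.10"` pitfall. For anything beyond plain numeric versions, `packaging.version.Version` from the `packaging` library handles pre-releases and other PEP 440 forms.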

### `/downloads`

- MultiQC package downloads across multiple sources, and, when available, different versions:
  - [PyPI](https://pypi.org/project/multiqc) (additionally, split by version)
  - [BioConda](https://bioconda.github.io/recipes/multiqc) (additionally, split by version)
  - [DockerHub](https://hub.docker.com/r/ewels/multiqc)
  - [GitHub clones](https://github.com/ewels/MultiQC/graphs/traffic)
  - [BioContainers (AWS mirror)](https://api.us-east-1.gallery.ecr.aws/getRepositoryCatalogData)

## Logged metrics

@@ -40,11 +41,15 @@ Currently, it reports:
- _[Planned]_: Installation method (pip|conda|docker|unknown)
- _[Planned]_: CI environment (GitHub Actions|none)
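
The planned CI environment metric (`GitHub Actions|none`) could be derived on the client from environment variables. A minimal sketch, assuming detection via the `GITHUB_ACTIONS` variable, which GitHub Actions sets to `true` in workflow steps; the function name is hypothetical and this is not necessarily how MultiQC will implement it:

```python
import os
from typing import Mapping, Optional


def detect_ci_environment(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CI environment label: "GitHub Actions" or "none"."""
    # Default to the real process environment; tests can pass a plain dict instead
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true for every workflow step
    if env.get("GITHUB_ACTIONS") == "true":
        return "GitHub Actions"
    return "none"
```

Only the coarse label would be sent, not the contents of the environment.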

No identifying information is collected. No IP addresses are logged, no information about what MultiQC is being used
for or where is recorded, and no sample data or metadata is transferred. All code in both MultiQC and this API is open
source and can be inspected.

This version check can be disabled by adding `no_version_check: true` to your MultiQC config
(see [docs](https://multiqc.info/docs/getting_started/config/#checks-for-new-versions)).

The request uses a very short timeout (2 seconds) and fails silently if MultiQC has no internet connection or an
unexpected response is returned.
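
Taken together, the opt-out flag and the 2-second, fail-silent request amount to something like the following client-side sketch. It is not MultiQC's actual code: the function name and the exact endpoint URL are assumptions, and it uses only Python's standard library:

```python
import http.client
import json
import urllib.request
from typing import Any, Mapping, Optional

# Assumed endpoint URL, for illustration only
VERSION_URL = "https://api.multiqc.info/version"


def fetch_version_info(
    config: Mapping[str, Any],
    url: str = VERSION_URL,
    timeout: float = 2.0,
) -> Optional[dict]:
    """Query the version endpoint; return the parsed JSON, or None on any failure."""
    # Respect the opt-out flag before making any network request
    if config.get("no_version_check", False):
        return None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # No connection, timeout, HTTP error status, bad URL or non-JSON body: fail silently
        return None
    # Any unexpected response shape is treated the same as a failure
    return data if isinstance(data, dict) else None
```

Returning `None` for every failure mode keeps the caller simple: it skips the log message and carries on.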

## Production deployment

@@ -56,6 +61,8 @@ ghcr.io/multiqc/apimultiqcinfo:latest

## Development

### Local build

> **Note:**
> These instructions are intended for local development work, not a production deployment.

@@ -74,10 +81,23 @@ docker compose up

The API should now be available at <http://localhost:8008/>

To send requests to the API, I recommend something like
[Postcode](https://marketplace.visualstudio.com/items?itemName=rohinivsenthil.postcode) (VSCode extension),
[httpie](https://httpie.io/), or similar.
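
The same requests can also be made from Python's standard library. A minimal sketch, assuming the service at that address answers `/` with the same JSON shape as the `index()` handler in `app/app_downloads.py` (a `message` plus an `available_endpoints` list of `{"path", "name"}` objects); the helper names are hypothetical:

```python
import json
import urllib.request
from typing import Any


def get_json(url: str, timeout: float = 5.0) -> Any:
    """Send a GET request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def endpoint_paths(index_payload: dict) -> list:
    """Extract the endpoint paths from the root endpoint's response."""
    return [endpoint["path"] for endpoint in index_payload.get("available_endpoints", [])]
```

With `docker compose up` running, `endpoint_paths(get_json("http://localhost:8008/"))` would list the exposed routes.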

When you're done, press <kbd>Ctrl</kbd>+<kbd>C</kbd> to exit, then clean up:

```bash
docker compose down
```

### Dependencies

To add a dependency, add it to the `pyproject.toml` file and then compile the requirements:

```sh
uv pip compile pyproject.toml -o requirements.txt
uv pip compile pyproject.toml --extra dev -o requirements-dev.txt
```


125 changes: 125 additions & 0 deletions app/app_downloads.py
@@ -0,0 +1,125 @@
import asyncio
import datetime
import logging
from contextlib import asynccontextmanager
from typing import cast

import uvicorn
from fastapi import BackgroundTasks, FastAPI, HTTPException, status
from fastapi.responses import PlainTextResponse
from fastapi.routing import APIRoute
from sqlalchemy.exc import ProgrammingError

from app import __version__, db
from app.downloads import daily  # provides collect_daily_download_stats(); module path assumed

logger = logging.getLogger("multiqc_app_downloads")

logger.info("Starting MultiQC API download scraping service")

# Add timestamp to the uvicorn logger
for h in logging.getLogger("uvicorn.access").handlers:
    h.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))


@asynccontextmanager
async def lifespan(_: FastAPI):
    # Keep a reference to the background task so it is not garbage-collected,
    # and cancel it when the app shuts down
    task = asyncio.create_task(update_downloads())
    yield
    task.cancel()


async def update_downloads():
    """
    Repeated task to update the daily download statistics
    """
    while True:
        try:
            # Scraping is blocking I/O: run it in a worker thread so the event loop stays responsive
            await asyncio.to_thread(_update_download_stats)
        except Exception:
            # Log and keep looping, so one failed scrape does not stop future updates
            logger.exception("Failed to update the download stats")
        await asyncio.sleep(60 * 60 * 24)  # 24 hours


app = FastAPI(
    title="MultiQC download scraper service",
    description="MultiQC API service, providing run-time information about available updates.",
    version=__version__,
    license_info={
        "name": "Source code available under the MIT Licence",
        "url": "https://github.com/MultiQC/api.multiqc.info/blob/main/LICENSE",
    },
    # Without this, lifespan() never runs and the daily update task is never started
    lifespan=lifespan,
)

db.create_db_and_tables()


@app.get("/")
async def index(_: BackgroundTasks):
    """
    Root endpoint for the API.
    Returns a list of available endpoints.
    """
    routes = [cast(APIRoute, r) for r in app.routes]
    return {
        "message": "Welcome to the MultiQC downloads scraping service",
        "available_endpoints": [
            {"path": route.path, "name": route.name} for route in routes if route.name != "swagger_ui_redirect"
        ],
    }


@app.get("/health")
async def health():
    """
    Health check endpoint. Checks if the visits table contains records
    in the past 15 minutes.
    """
    try:
        visits = db.get_visit_stats(start=datetime.datetime.now() - datetime.timedelta(minutes=15))
    except Exception as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))
    if not visits:
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="No recent visits found")
    return PlainTextResponse(content=str(len(visits)))


@app.post("/update_downloads")
async def update_downloads_endpoint(background_tasks: BackgroundTasks):
    """
    Endpoint to manually update the daily download statistics
    """
    try:
        background_tasks.add_task(_update_download_stats)
        msg = "Queued updating the download stats in the DB"
        logger.info(msg)
        return PlainTextResponse(content=msg)
    except Exception as e:
        msg = f"Failed to update the download stats: {e}"
        # status has no INTERNAL_SERVER_ERROR attribute; the constant is HTTP_500_INTERNAL_SERVER_ERROR
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=msg) from e


def _update_download_stats():
    """
    Update the daily download statistics in the database
    """
    try:
        existing_downloads = db.get_download_stats()
    except ProgrammingError:
        logger.error("The table does not exist, will create and populate with historical data")
        existing_downloads = []
    if len(existing_downloads) == 0:  # first time, populate historical data
        logger.info("Collecting historical downloads data...")
        df = daily.collect_daily_download_stats()
        logger.info(f"Adding {len(df)} historical entries to the table...")
        db.insert_download_stats(df)
        logger.info(f"Successfully populated {len(df)} historical entries")
    else:  # recent days only
        n_days = 4
        logger.info(f"Updating downloads data for the last {n_days} days...")
        df = daily.collect_daily_download_stats(days=n_days)
        logger.info(f"Adding {len(df)} recent entries to the table. Will update existing entries at the same date")
        db.insert_download_stats(df)
        logger.info(f"Successfully updated {len(df)} new daily download statistics")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
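
The download service combines two ideas: a background loop, started as a task from `lifespan`, that refreshes the stats every 24 hours; and a refresh that backfills the full history on the first run but afterwards re-fetches only the last few days, overwriting rows for dates that already exist. A minimal, dependency-free sketch of both, assuming an in-memory dict in place of the database and a hypothetical `fetch` callable in place of `daily.collect_daily_download_stats`; the interval and number of runs are parameters so the loop can be exercised quickly:

```python
import asyncio
import datetime
from typing import Callable, Dict, Optional

# Hypothetical scraper signature: returns {date: downloads} for the last `days`
# days, or the full history when `days` is None
FetchFn = Callable[[Optional[int]], Dict[datetime.date, int]]


def update_stats(store: Dict[datetime.date, int], fetch: FetchFn, recent_days: int = 4) -> int:
    """Backfill on the first run, then refresh only recent days; returns rows written."""
    if not store:
        rows = fetch(None)  # first run: full history
    else:
        rows = fetch(recent_days)  # later runs: recent days only
    # Upsert: rows for dates that already exist are overwritten
    store.update(rows)
    return len(rows)


async def refresh_periodically(
    store: Dict[datetime.date, int],
    fetch: FetchFn,
    interval_s: float,
    runs: Optional[int] = None,
) -> None:
    """Call update_stats every `interval_s` seconds, forever if `runs` is None."""
    done = 0
    while runs is None or done < runs:
        # The blocking fetch runs in a worker thread so the event loop stays free
        await asyncio.to_thread(update_stats, store, fetch)
        done += 1
        await asyncio.sleep(interval_s)
```

Overwriting the most recent days on every run lets late-arriving counts from the upstream sources correct earlier partial values without re-scraping the whole history.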