In this example we will use Multinode to solve the classic travelling salesman problem (TSP) using the simulated annealing (SA) algorithm.
TSP is a well known problem, where, given a list of cities and the distances between them, the goal is to find the shortest possible route that visits each city exactly once, and returns to the origin city.
The classic SA algorithm works as follows:
- Initialize the
temperature
parameter to some large value. - Initialize the
cooling_rate
parameter to a value between 0 and 1 (usually very close to 1). - Initialize
current_route
to a random route. - Repeat
iterations
times (e.g.,iterations=10_000
):- Perturb the current route to get a new candidate route (e.g., swap order of two random cities - if Los Angeles was 2nd on the route and New York was 5th, make Los Angeles 5th and New York 2nd).
- Accept the new candidate route as the current route with probability:
p = exp((current_route_length - new_route_length)/temperature)
. (If the new route is shorter, the probability will be > 1, so it will definitely be accepted. Otherwise, the probability will be between 0 and 1 and: (i) the higher the temperature, the higher the probability, (ii) the higher the difference between the two route lengths, the lower the probability.) - Reduce the temperature by multiplying by the cooling rate:
temperature = temperature * cooling_rate
.
A visualization of simulated annealing applied to the travelling salesman problem in a setting with 125 cities arranged in 5 clusters:
Rather than implementing the classic simulated annealing algorithm, we will implement a variant that takes advantage of distributed parallel compute, thus showcasing the full power of Multinode.
- Initialize the
temperature
parameter to some large value. - Initialize the
cooling_rate
parameter to a value between 0 and 1 (usually very close to 1). - Initialize
current_route
to a random route. - Repeat
top_level_iterations
times (e.g.,top_level_iterations=5
):- Run
n_workers
workers (e.g.,n_workers=4
), where each worker independently runsworker_iterations
iterations of the classic SA algorithm, starting from thecurrent_route
and the currenttemperature
. (Each iteration of the classic SA algorithm involves perturbing the route to get a new candidate, accepting or rejecting the candidate, and reducing the temperature.) Crucially, each worker is seeded with a different randomseed
. - Get the final route from each worker and select the shortest one as the new
current_route
.
- Run
You can visualize this workflow as follows:
At any given moment, we have not one but n_workers
machines working on the problem at
the same time. At regular intervals, the machines share their results and pick the
best route so far to continue with. This way, we arrive at the solution much faster than
by running code locally or on a single cloud machine.
Our application consists of 3 files:
This is the implementation of the distributed SA TSP solver, which we will register with the Multinode control plane. It follows the structure described in the definitions API reference.
tasks/main.py
defines two Multinode functions: solve_tsp
and simulated_annealing
.
The solve_tsp
function orchestrates the entire distributed SA algorithm end-to-end.
@mn.function(cpu=0.1)
def solve_tsp(
n_cities: int,
distances: np.ndarray,
top_level_iterations: int = 5,
worker_iterations: int = 500_000,
n_workers: int = 4,
initial_temperature: float = 20.0,
cooling_rate: float = 0.999997,
):
...
In each iteration of the algorithm, solve_tsp
dispatches n_workers
workers
running the simulated_annealing
function:
@mn.function(cpu=4)
def simulated_annealing(
curr_route: np.ndarray,
distances: np.ndarray,
n_iterations: int = 1000,
temp: float = 1000,
cooling_rate: float = 0.999,
seed: int = 0,
):
...
This is the list of Python dependencies that we want to install on each Multinode worker.
This is a FastAPI app that allows users to trigger the travelling salesman calculation via API calls.
We will run this app locally, but it is also possible to run it on any machine with access to the internet - for example, an EC2 instance, your own ECS cluster, Google Cloud Run, Heroku, etc.
application/main.py
accesses solve_tsp
via the get_deployed_function
method:
from multinode import get_deployed_function
solve_tsp = get_deployed_function(
project_name="tsp-solver",
function_name="solve_tsp"
)
application/main.py
defines 3 API endpoints:
POST /tsp
- usessolve_tsp.start(np.arange(50), np.random.rand(50, 50))
to start a new invocation ofsolve_tsp
. Returns theinvocation_id
of this invocation. The input tosolve_tsp
consists of 50 cities numbered from 0 to 49, with random distances between each pair.GET /tsp/{invocation_id}
- usessolve_tsp.get(invocation_id)
to get the status of the invocation with IDinvocation_id
, and its latest result.PUT /tsp/{invocation_id}/cancel
- usessolve_tsp.cancel(invocation_id)
to cancel the invocation with IDinvocation_id
.
To run this example, make sure that:
- you have Python, Pip, and Docker installed on your system
- you have installed Multinode in your AWS account
You also need to install additional Python dependencies:
pip install multinode fastapi uvicorn numpy
And if you have not already done so, you need to authenticate with the Multinode control plane:
multinode login
You can now register the solve_tsp
and simulated_annealing
functions
with the Multinode control plane, under the project name tsp-solver
:
multinode deploy tasks/ --project-name tsp-solver
(This assumes that you are currently in the example-project directory.)
tsp-solver
should now appear under the list of projects:
multinode list
And solve_tsp
and simulated_annealing
should appear as functions registered under the project tsp-solver
.
multinode describe --project-name tsp-solver
All that remains is to start the FastAPI app, so that users can trigger the travelling salesman calculation via API calls:
uvicorn application.main:app --reload
And you're done!
As a user, you can send requests to the API using the curl
command.
First, dispatch a TSP solver task:
curl -X POST http://127.0.0.1:8000/tsp
Take note of the INVOCATION_ID
in the response.
Then, periodically, check the status of the task:
curl http://127.0.0.1:8000/tsp/INVOCATION_ID
You should notice the response changing over time:
- Initially, the response is
Provisioning resources for the invocation.
- When cloud resources have been provisioned, the response changes to
Invocation is running.
- Once the first intermediate results are in, the response becomes
Invocation is running. Best distance so far: <some value>
. The value will keep changing after each top-level iteration. - Eventually, when the task finishes, the response becomes
Invocation finished. Best distance: <some value>
.
If at any point you decide to cancel the invocation, you can run:
curl -X PUT http://127.0.0.1:8000/tsp/INVOCATION_ID/cancel
This is of course only a toy example of using Multinode deployed functions as a part of a larger application. In practice, you can trigger Multinode functions in any way you want - from a Python API, a cronjob, a Lambda function, or even your frontend!