Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(graphql): added support for graphql subscriptions to work for actions #6904

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

psihius
Copy link
Contributor

@psihius psihius commented Jan 9, 2025

Q A
Branch? 4.0 or main
Tickets None that I found
License MIT
Doc PR

This PR makes create and delete operations issue mercure updates for GraphQL subscriptions.
It also adds the ability to configure "restrictive updates" - https://api-platform.com/docs/core/mercure/#dispatching-restrictive-updates-security-mode - implementation follows the same idea, just one that fits GraphQL subscriptions.

Some things that need feedback/work/ideas:

  • Private fields config option probably should also use expression language same way topics do
  • Caching item of the subscriptionCollection is going to become quite sizeable when you have a lot of segmentation based on owners of the data and I don't have a good idea how to segment that - operation context is very restricted in the doctrine listener and I don't have a good knowledge of the internals to properly construct one, but I did my best. Best idea I have is to read the object's data and use it as part of the cache key (the private fields option). One problem is when deleting data - when in publishUpdate, the object data is already gone since it constructs and passes and stdObject with very limited data, so I do not have context at that point.
  • Mercure payload format probably needs updating - with it now containing all 3 types of operations, I think it needs a field that contains what type of operation it was - create, update or deleted so mercure consumers can decide what to do
  • Deleted stdObject contains too little information, kind'a want ability to retain some of the data in there, like for example the parent relation field that points to the parent ID the record was deleted for. No ideas as of now how to implement that besides adding more configurations to mercure and modifying publish listener's deleteObject if branch to retain extra fields from the resource, kind'a like i did with private_fields in the SubscriptionManager and graphql object context.

This is a WIP because I need feedback on the direction taken and suggestions for improving some aspects of it.

This code is being used in production, works for our use case, but it's obviously far from ideal and I want to improve it and make it standard part of ApiPlatform.

The goal for this is to make managing items on frontend easy purely via mercure updates via graphql subscriptions. In our case it's a chat like frontend that needs to not only receive updates, but also new data (new messages, new chats) and data removal messages. This puts GraphQL API on par with functionality of the REST API where you can have restrictive data updates AND you can subscribe to collections which include all 3 types of data events.

@psihius
Copy link
Contributor Author

psihius commented Jan 15, 2025

@soyuka @dunglas could you provide some feedback please, so I can decide how I want to continue working on this? :)

Copy link
Member

@soyuka soyuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Private fields config option probably should also use expression language same way topics do

Not sure to understand this functionality, did I miss something?

  • Caching item of the subscriptionCollection is going to become quite sizeable when you have a lot of segmentation based on owners of the data and I don't have a good idea how to segment that

I don't think we should send collection updates, only items inside a collection should be sufficient.

When a data is deleted you can still send an update on its IRI (as mercure topic) with a null value? I'm not sure why data should be part of a cache key, instead we should use IRIs just as in REST? Or is it that you want to send a mercure update with only the fields selected in the GraphQl Query?

  • Mercure payload format probably needs updating - with it now containing all 3 types of operations, I think it needs a field that contains what type of operation it was - create, update or deleted so mercure consumers can decide what to do

IMO, we should always consider a mercure update as a modified element, it's a new one if you don't have it in your referential, or else it's updated. You can detect it's deleted when the value is null.

  • Deleted stdObject contains too little information, kind'a want ability to retain some of the data in there, like for example the parent relation field that points to the parent ID the record was deleted for.

Not sure why we would need this. It makes things really complicated when we look for data associations, when do you stop looking for relations?

src/GraphQl/Subscription/SubscriptionManager.php Outdated Show resolved Hide resolved
/** @var ResolveInfo $info */
$info = $context['info'];
$fields = $info->getFieldSelection(\PHP_INT_MAX);
$this->arrayRecursiveSort($fields, 'ksort');
$iri = $operation ? $this->getIdentifierFromOperation($operation, $context['args'] ?? []) : $this->getIdentifierFromContext($context);
if (null === $iri) {
if (empty($iri)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid a function call here if this can return an empty array we should fix the return type of the above functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also returns an empty string if you provide a value, but it's not a valid URI. That's why I made empty call instead of doing null === $iri || '' === $iri || [] === $iri.
Those 2 methods called above are part of a IdentifierTrait trait, means all call sides will need modification.

foreach ($privateFieldsConfig as $privateField) {
$privateFieldData['__private_field_'.$privateField] = $this->getResourceId($privateField, $object);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this logic twice, could you use a function to remove a few lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Trait then? This logic is used in 2 different files.

@@ -22,5 +22,5 @@ interface SubscriptionManagerInterface
{
public function retrieveSubscriptionId(array $context, ?array $result): ?string;

public function getPushPayloads(object $object): array;
public function getPushPayloads(object $object, string $type): array;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a public interface, the added argument should be optional to cover the BC layer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

src/GraphQl/Subscription/SubscriptionManager.php Outdated Show resolved Hide resolved
@@ -50,6 +50,7 @@ final class PublishMercureUpdatesListener
'topics' => true,
'data' => true,
'private' => true,
'private_fields' => true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are private_fields? couldn't find that in the specification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are fields that are also read into the fields array based on witch the sha256 hash for the subscription id is generated. In the REST side of things this is called "Issuing restrictive updates": https://api-platform.com/docs/core/mercure/#dispatching-restrictive-updates-security-mode
There you just generate a topic prefix, but since topics are not part of GraphQL subscription, we are incorporating value of the field so it generates unique subscription id that works only for that specific combo of private fields.

private function getCollectionIri(string $iri): string
{
return substr($iri, 0, strrpos($iri, '/'));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too hackish we need to find a better way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all ears, I haven't found one :)

Comment on lines +153 to +160
private function getResourceId(mixed $privateField, object $previousObject): string
{
$id = $previousObject->{'get' . ucfirst($privateField)}()->getId();
if ($id instanceof \Stringable) {
return (string)$id;
}
return $id;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is usually using identifiersExtractor could you explain why you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that extract the value from an object filed that is used for restrictive updates to be limited to specific context (in my case to a specific owner_id).

It also is extracting a defined field, not an ID, so the name for the method should be changed.
I am all for doing it in a better way :)

@@ -23,7 +23,21 @@ final class SubscriptionIdentifierGenerator implements SubscriptionIdentifierGen
public function generateSubscriptionIdentifier(array $fields): string
{
unset($fields['mercureUrl'], $fields['clientSubscriptionId']);
$fields = $this->removeTypename($fields);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Contributor Author

@psihius psihius Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When executing a subscription GraphQL query, the __typename key is not present in fields in the SubscriptionProcessor, but when you get fields in doctrine subscriber to publish updates, the __typename is present in the fields. That results in different sha256 hashes as subscription id, which breaks publishing.

@psihius
Copy link
Contributor Author

psihius commented Jan 17, 2025

  • Private fields config option probably should also use expression language same way topics do

Not sure to understand this functionality, did I miss something?

https://api-platform.com/docs/core/mercure/#dispatching-restrictive-updates-security-mode - this, but for the graphql subscription. It's a way to restrict data pushing to specific user/resource/whatever way you want to restrict access based on resource field values. I use it to restrict access between accounts in SaaS service.

  • Caching item of the subscriptionCollection is going to become quite sizeable when you have a lot of segmentation based on owners of the data and I don't have a good idea how to segment that

I don't think we should send collection updates, only items inside a collection should be sufficient.
This already does not happen. When a resource already exists, it hits the individual resource URI cache, so it doesn't access the collection cache item.

$iri = $this->iriConverter->getIriFromResource($object);
// Find based on IRI if there's an item already cached. If resource is not being created - this always returns the subscription array, meaning the collection iri is never accessed.
$subscriptions = $this->getSubscriptionsFromIri($iri);
if ($subscriptions === []) {
    // Get subscriptions from collection Iri
    $subscriptions = $this->getSubscriptionsFromIri($this->getCollectionIri($iri));
}

As you can see here, it first tries to get subscriptions from full IRI. If it's an update operation - the item is already in cache, if not - it goes to collection cache (which would happen only for create operation).
As I understand, the underlying cache is system cache, so data is permanently in storage, meaning it will access collection cache only for create action, never for update or delete.

When a data is deleted you can still send an update on its IRI (as mercure topic) with a null value? I'm not sure why data should be part of a cache key, instead we should use IRIs just as in REST? Or is it that you want to send a mercure update with only the fields selected in the GraphQl Query?
Because that data is used to find the correct subscription id that has exactly the same field set with the same values in the private fields section.

if ($private && $privateFields && $previousObject) {
    foreach ($options['private_fields'] as $privateField) {
        $fields['__private_field_'.$privateField] = $this->getResourceId($privateField, $previousObject);
    }
}
$subscriptionsCacheItem = $this->subscriptionsCache->getItem($this->encodeIriToCacheKey($iri));
$subscriptions = [];
if ($subscriptionsCacheItem->isHit()) {
    $subscriptions = $subscriptionsCacheItem->get();
    foreach ($subscriptions as [$subscriptionId, $subscriptionFields, $subscriptionResult]) {
        if ($subscriptionFields === $fields) {
            return $subscriptionId;
        }
    }
}

See that second foreach there? It filters out based on field combo subscriptions that are not addressed to the specific combo of fields. The _private_field$field item makes sure with it's value that you send updates to a subscription id that is restricted to the same field value. IF you use owner_id aka the user who owns it, then you effectively restrict pushing the update only to subscriptions that were created by that user only.

There's the corresponding change in SubscriptionManager::retrieveSubscriptionId which gets called in SubscriptionProcessor which is where the subscription id is created for a given IRI, that subscription id is created from fields that were selected as return data for the subscription. In there I also inject those private field and their data from the resource in question, which affects the value of generated subscription id. The subscription id always is generated as a resource wide subscription, it's the same id regardless what IRI you pass as long as it's the same resource type. The cache just was stored per resource IRI (aka per item), so I just added also the collection level cache item so you can push updates for "create" operation, since that adds a new item and you need to know what subscriptions you need to push that update into.

  • Mercure payload format probably needs updating - with it now containing all 3 types of operations, I think it needs a field that contains what type of operation it was - create, update or deleted so mercure consumers can decide what to do

IMO, we should always consider a mercure update as a modified element, it's a new one if you don't have it in your referential, or else it's updated. You can detect it's deleted when the value is null.

I did modify the output for the "delete" operation to contain operationType: deleted - the one receiving mercure update can check for that. It just makes things easier, but you can totally figure out that it's a delete operation just by the fact it sends you IRI only and that's it, no other data.
The current data format was designed only for Update operation, that's why I think it might need updating.

  • Deleted stdObject contains too little information, kind'a want ability to retain some of the data in there, like for example the parent relation field that points to the parent ID the record was deleted for.

Not sure why we would need this. It makes things really complicated when we look for data associations, when do you stop looking for relations?
Yeah, I think it's not worth it. In my case it was when a chat message is deleted, I kind'a wanted to know what chat it belongs to. But I also receive an update for chat itself that something change, so I can re-query messages. But my idea was that if I know what chat id it is, I don't need to maintain the whole chat message => chat mapping array, which might get very big.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants