feat(fabric): switch to maps for view rows #4984
Conversation
Force-pushed from c225e75 to ced2e74.
That looked a lot smaller than expected! Well done. Just a few minor suggestions and cleanup as review comments.

I noticed you opted not to use the `{view_row, #{}}` shape, like in the original PR, and just went straight for a map as a row. That's probably simpler. I guess I could see it work either way. If we opt to send execution stats with a complete call, that could look like `{complete, #{stats => Stats}}` or something of that nature. Let's keep it as a simple map for now.
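For concreteness, here is a minimal Erlang sketch of the two row shapes under discussion. The module, the function names, and the simplified `#view_row{}` definition are illustrative assumptions; the real record lives in the CouchDB sources.

```erlang
%% Sketch only (not from the PR): contrasts the old record-based row
%% with the proposed bare map row. Field names follow the
%% transform_row/1 clauses touched by this PR; the real #view_row{}
%% definition in CouchDB has its own include file.
-module(row_shape_sketch).
-export([as_record/4, as_map/4]).

-record(view_row, {id, key, value, doc}).

%% Old wire format: adding a field means changing the record definition
%% and every function that matches on it.
as_record(Id, Key, Value, Doc) ->
    #view_row{id = Id, key = Key, value = Value, doc = Doc}.

%% New wire format: extra fields (e.g. stats) can be added later
%% without touching the existing receive clauses.
as_map(Id, Key, Value, Doc) ->
    #{id => Id, key => Key, value => Value, doc => Doc}.
```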
src/fabric/src/fabric_view.erl
Outdated
```diff
-    {row, [{id, Id}, {key, Key}, {value, Value}, {doc, Doc}]}.
+    {row, [{id, Id}, {key, Key}, {value, Value}, {doc, Doc}]};
+transform_row(#{} = Row) ->
+    transform_row(fabric_util:to_view_row_record(Row)).
```
This makes the PR smaller, but eventually when we remove the `view_row` it will be a bit more work, which is fine.
Sure. The rolling execution stats will already need to expand this body further to inject the `stats` field.

I'd prefer the …

Thinking about it some more, I agree as well. It will make the receive clauses a bit clearer. Another suggestion: investigate whether it would make sense to make a separate …

I have no strong feelings against using the …
Force-pushed from ced2e74 to d5756cd.
For the record, the current version works fine in a mixed-version environment, in old-coordinator/new-worker and new-coordinator/old-worker configurations. This was tested with a database of 100K documents, 3 nodes, with 1 coordinator and 2 workers of the respective versions, for 7 minutes on 2 threads. I used …
Force-pushed from 82cbd9e to 6d33573.
Looks very nice! Lots of tests, and the row wrapper module turned out decently, I thought.

I made a few notes inline: a few style nits and a few notes about the merge function.
Since we're eventually trying to report the work done by a worker up until the row it emitted, would it make sense to have the `completed` messages return the extra metadata/stats object as well, if the `view_row_map` flag is set? Then we'd have both a `{view_row, #{}}` and a `{completed, #{}}`, and both could report all the work or stats done on that worker node in one response.
src/fabric/src/fabric_view.erl
Outdated
```erlang
detach_partition(Row) ->
    case fabric_view_row:get_key(Row) of
        {p, _Partition, Key} -> fabric_view_row:set_key(Row, Key);
        _PrimitiveKey -> Row
```
Primitive is a bit of an odd name; to avoid introducing new terminology, maybe just use `_`?
Sure. Or we could just use `_Key`.
src/fabric/src/fabric_view_row.erl
Outdated
```erlang
merge(Dir, Row, Rows) ->
    lists:merge(
        fun(RowA, RowB) ->
            IdA = get_id(RowA),
            IdB = get_id(RowB),
            case Dir of
                fwd -> IdA =< IdB;
                rev -> IdA > IdB
            end
        end,
        [Row],
        Rows
    ).
```
I think this is specifically merging rows by ID. We also merge view rows by key and ID, with and without a keydict, and with or without a Unicode collator. I wonder if it would make sense to move all merging to `fabric_view_row` then, or move this back to the all_docs handler. Mainly so it looks consistent.
Yes, that is a good idea. In light of this, I think it makes sense to move this version of merge rows over to `fabric_view_row` as an alternative to `merge`, as `merge/5`.
Aw, scratch that. Although `couch_replicator_fabric` and `fabric_view` use exactly the same merge function, it looks odd to include this in `fabric_view_row` as it does not do anything special related to view rows. The merge functions already use `fabric_view_row` to access the required properties, so they could be considered abstract.
src/fabric/src/fabric_view_row.erl
Outdated
```erlang
fwd -> IdA =< IdB;
rev -> IdA > IdB
```
Hmm, I wonder if it should be `A < B` and `A > B`?

The Erlang lists module expects the comparison operator to basically be `=<` or `>=`. However, our btrees use `less` (strictly `<`). I don't know if that's an issue here, since we don't expect two equal document IDs to come through once we start streaming for the all_docs case. But it might be good to stick to `<` and `>` just to emphasize that our sorting function on the btrees is a strictly-less-than function, and we want to make sure we use the same sorting function in the workers as we do in the coordinator.
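A tiny sketch of the distinction, for reference. `lists:merge/3` is real OTP; the module and the input lists are made up for the example:

```erlang
%% Sketch: lists:merge/3 documents its ordering fun as returning true
%% when the first argument compares less than or equal to the second.
%% With a strict `<` fun, only the relative order of *equal* elements
%% changes, which is moot when document IDs are unique in the stream.
-module(merge_order_sketch).
-export([demo/0]).

demo() ->
    NonStrict = lists:merge(fun(A, B) -> A =< B end, [1, 3, 5], [2, 4, 6]),
    Strict = lists:merge(fun(A, B) -> A < B end, [1, 3, 5], [2, 4, 6]),
    %% With no duplicate elements both merges produce the same list.
    NonStrict =:= Strict.
```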
I have neither a deeper understanding of this part nor strong preferences, so I am fine with the suggestion.
Force-pushed from 6d33573 to 6b97660.
I am not sure about that. Originally, the problem that this PR wants to solve is about the case when streaming of view rows is "abruptly" terminated in response to the …

On the other hand, the change focuses on making the structure of the view rows more flexible, and it does not know anything about how this is utilized. It is not necessarily statistics that could be sent along with the streamed rows, but anything that the view callback may consider meaningful.

If statistics are sent along with the standard view row data, I think there is no need to return a closing message with metadata. Statistical information could be sent as either deltas or absolute values; this is subject to the implementation of the given view. They are both sufficient to keep track of statistics about the view rows. I have tested this in the Mango rolling execution statistics PR (one of the future consumers of this change) and it worked without the need for extending the …

That said, I do not see any explicit need for this. But it could be added later on, if any related use case surfaces.
Now we emit a separate execution stats message and a callback message (see couchdb/src/mango/src/mango_cursor_view.erl, lines 419 to 422 at 5c6aeab). The `complete` message could also carry the extra metadata. That way we wouldn't have to send a separate execution_stats message at all; `complete` and the view rows' extra field would have a place to send back stats as well.

However, updating all the message handlers which … (see couchdb/src/fabric/src/fabric_view_changes.erl, lines 288 to 291 at 5c6aeab)
I would not call those cases "abruptly terminated" because there is a chance to gracefully send out all the required messages. The easy extension of view rows is a future-proof alternative to creating another …

Yes, this leaves us with separate …

That is one of the reasons why I am rather hesitant to take on this advice.

I would say this is out of scope for this PR. I assume this part of the code does not have any consistency or performance issues; introducing extended …
Yeah, definitely. In fact, I am not sure if "abruptly terminated" exactly applies here in general. That part works like that by design. In the majority of cases, the …
Exactly. That would be more of a nice-to-have to make things more consistent. And since, when we do that refactor later, we'd need to introduce another …

Also, in theory, I think it could affect correctness. We are handling and processing execution stats messages out of band, without going through the merge row semantics. Notice, we call the … (see couchdb/src/fabric/src/fabric_view_all_docs.erl, lines 263 to 266 at 5c6aeab).

Yup, agree, let's not worry about it in this PR.

[1] This is according to our new semantics as discussed in the previous PR: we want to account exactly for the work which happened on each worker prior to its emitted row, if that row was included as part of the response, as opposed to having a probabilistic kind of thing where we account for the approximate work, as I suggested initially.
Probably "abrupt" is a strong word here. That is just my false expectation about it.

Yeah, I have noticed the piggybacking. I am a fan of consistent code changes, but this is slightly more than that to me. As I mentioned, I consider this "view row map" operation mode oblivious to sending any metadata; we could even say that it is only a coincidence that the next field we may want to add is about that. It can be combined with "extensible complete", which may have its own scope of application. I see these as two orthogonal building blocks for hosting other features. (Compositionality.) Tackling this problem in a separate PR can help with limiting the scope of this one, mitigating the associated (deployment) risks, and shipping the dependent feature quicker.
Mind that we are switching to the discussion of the Mango rolling execution statistics here, which is only a possible application. Note that the …

In the Mango execution statistics, currently we have "keys examined", "docs examined", and "results returned". Views are created from key ranges of (the selected) indexes. Walking the whole view, independently of whether the corresponding row matches the selector (filter), should always implicitly bump the count of keys examined. (This is the broadest set.) On the contrary, docs examined are counted only if a document was accessed. (This might be narrower.) And then, if the document matches the selector, it is counted as a result returned. (The narrowest set.) The associated calculations are always performed in response to the view rows. The …
It is related to correctness; that's why I mentioned the other PR. I was just pointing out that, based on the discussion in the previous PR and the new semantics, on the coordinator we want to account only for rows which participate in emitting the response to the user. With that view, having separate messages versus one message can lead to unexpected behavior, because it may be possible for another worker to insert their message in between the two messages when streaming is about to conclude. In general, there could be a good amount of difference between the work done prior to emitting each row sent to the client and all the work done by each worker.
This is a side effect of Mango using the fabric all_docs and view query as a layer. That extra handling of execution stats in all_docs and view query was another impetus for having generic completion metadata. Though that would also require a bit more work to make sure it passes through the merge sort mechanism.
I am sure you are right about that. But I think we should carry on with the related discussion in #4958 then. Regarding this PR, could you please make a case for something other than the Mango rolling execution statistics? The "view row map" capability only makes it possible to get rid of the rigidity of the record-based communication. We could use it to send non-metadata or anything else; I just do not have any other case in mind. But in theory, that is entirely possible!
Right. But is this a real problem here? The folding of Mango execution statistics hinges upon addition, which is a commutative and associative operation that even lends itself to parallelization. The collection of the pieces happens locally, without any interaction between the workers. Probably I am missing something here.
It is if we want to make any correctness guarantees. It doesn't have as much to do with the commutativity and associativity of the operations as with which statistics should contribute to the result. We agreed to two things as far as I recall: …

And I am pointing out that having separate, interleaved stats messages based on the random arrival order from workers seems to conflict with the correctness goal. Towards the conclusion of the request, we could end up accounting for stats messages out of band, so to speak, from workers which did work that didn't end up contributing to the final response. However, as we discussed, the chance of this is probably low and maybe not worth worrying about. I am merely clarifying that it does impact correctness.
Ah, now I get it! Thanks for your patience in explaining this to me. I was sure that I was missing something, and I was confused that the comments about #4958 got included in this discussion. Please let me think about those in the scope of #4958 and leave them out of this PR.
Force-pushed from 6b97660 to 1046c4b.
+1, nicely done @pgj!
I like all the tests and I think the accessor module turned out pretty neat.
Force-pushed from 1046c4b to eae19bd.
The `#view_row{}` record that is used for capturing messages with row data is not flexible enough to be extended easily. If one wanted to introduce a fresh field, the change would have to be propagated through many functions and modules. Especially if support for mixed-version clusters is a concern, this would come with some degree of duplication. Leverage Erlang/OTP's built-in maps to mitigate this issue and offer the view callbacks the `view_row_map` Boolean key in `#mrargs.extra` to request this communication format. This way, the old record-based format remains in use unless requested otherwise. This facilitates the smooth interoperability of old coordinators and new workers. In parallel to that, the new coordinator can still receive view rows from old workers.
Force-pushed from eae19bd to 0fd8c1b.
I noticed this CI failure in a full CI run. It seems to be in the test code in the replicator:
```erlang
?debugVal(Error, 100),
?debugVal(binary_to_list(Reason), 100);
```
Did you intend to leave this debugging here?
Yes. Sorry, I did not put a comment there about it, but I mentioned it to @nickva so he was aware of the intention. The reason is that, basically, I had trouble finding out the reason for the failure. Initially I thought the timeout should be increased but, with the help of these extensions, it became clear that something else needed to be done. So I left the debugging calls there to facilitate similar investigations in the future. Note that this is only visible in case of failures.
Oh, the good old CentOS 8. I will check on it later.
It might just be exposing a flaky test. I'll re-run the CI and we'll see if we get the same error on the same OS/arch combination. The difference from the previous case is that we used to wait longer if we didn't get the expected docs, so it could hit the error and then take longer, possibly fixing itself during the 10-15 second wait. Currently, however, we exit on the first error, but then the rest of the code expects the docs to be there.
I was trying to reproduce this locally (with the CouchDB CI CentOS 8 container) but I could not. Hence it seems to be a flaky case. But I think there is an error in the code that tries to provide debug information: it is not prepared for one (or more) of the possible outcomes, and it breaks when that happens.
Indeed, it seems to be a flaky test. After re-running the CI I got the same failure on Ubuntu 22.04: https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FFullPlatformMatrix/detail/main/1042/pipeline/233. I wonder if it's worth reverting the behavior to let it keep retrying until it gets an actual doc, or did it always time out and crash when testing the PR, even with the current timeout?
I am now trying to add more debugging around this …
Based on its shape, there is a chance it's coming from a remote worker node, so it's an error that was serialized and sent back as a string.
Opened #5006 for investigation.
The `#view_row{}` record that is used for capturing messages with row data is not flexible enough to be extended easily. If one wanted to introduce a fresh field, the change would have to be propagated through many functions and modules. Especially if support for mixed-version clusters is a concern, this would come with some degree of duplication.

Leverage Erlang/OTP's built-in maps to mitigate this issue and offer the view callbacks the `view_row_map` Boolean key in `#mrargs.extra` to request this communication format. This way, the old record-based format remains in use unless requested otherwise. This facilitates the smooth interoperability of old coordinators and new workers. In parallel to that, the new coordinator can still receive view rows from old workers.

Testing recommendations
The unit and integration tests should cover all the changes. There are no user-facing changes; the change is internal to the communication between the coordinator and the workers over fabric. Backwards compatibility is maintained.
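As a sketch of how a caller might opt into the map-based format: the `view_row_map` Boolean key in `#mrargs.extra` is from this PR, but the simplified record and the helper function below are illustrative assumptions, not the actual CouchDB API.

```erlang
%% Illustrative only: opting a query into the map-based row format via
%% the view_row_map key in #mrargs.extra, as introduced by this PR.
%% The #mrargs{} record is simplified here; the real one has more fields.
-module(view_row_map_sketch).
-export([with_view_row_map/1]).

-record(mrargs, {extra = []}).

with_view_row_map(#mrargs{extra = Extra} = Args) ->
    %% Workers that understand view_row_map reply with map rows; old
    %% workers ignore the flag and keep sending #view_row{} records.
    Args#mrargs{extra = [{view_row_map, true} | Extra]}.
```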
Unit test coverage improvements:

- `couch_replicator`: 85% ⟶ 86%
- `fabric`: 72% ⟶ 79%

Related Issues or Pull Requests
This is to set the stage for #4958.
Checklist