This is a follow-up question that remains after the fix (#17570) for the reported ORC decoding bug (#17155).
So far, we know that:
The bug can occur with ORC timestamp data in v0.12, which is decoded with the RLEv2 decode function. This data consists of two streams: DATA, which encodes the "seconds" component, and SECONDARY, which encodes the "nanoseconds" component.
The bug is not expected to occur with timestamp data in v0.11, which is decoded with the RLEv1 decode function. In v0.11 the run length is stored in a 7-bit field for both "runs" and "literals". A single header therefore covers at most 130 repeated values or 128 literals, far below the 512-value maximum in v0.12. Because the initial limit of "max data to be consumed" is the same 1024 values in both versions, the decoder can always consume enough runs from the SECONDARY stream to keep its progress ahead of the DATA stream's, and never the reverse.
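The run-length limits behind this reasoning can be checked against the header layouts in the ORC specification.

The sketch below does three things:
- It computes the largest number of values one header can cover in RLEv1 and in RLEv2.
- It plugs those maxima into a deliberately simplified model: each stream decodes whole runs until the next run would push it past the 1024-value limit.
- It reports how far short of the limit a stream can stop under that model.

The helper names and the model itself are assumptions for illustration, not the actual decoder logic. Under the model, a stream can stop up to 129 values short of the limit with RLEv1, but up to 511 values short with RLEv2. That much larger gap is what would let the two streams drift apart.

```python
# Sketch: values covered by one ORC integer-RLE header (per the ORC spec's header
# layouts), and how that bounds a stream's progress under a simplified,
# HYPOTHETICAL per-batch budget model (not the real decoder's logic).


def rlev1_run_length(header: int) -> int:
    """Values covered by an RLEv1 header byte (0..255)."""
    if header < 0x80:
        # 0x00..0x7f: run of repeated values, stored as length - 3 (3..130)
        return header + 3
    # 0x80..0xff: signed byte holding -(number of literals), i.e. 1..128
    return 256 - header


def rlev2_run_length(first: int, second: int) -> int:
    """Values covered by an RLEv2 run, from its first two header bytes."""
    if first >> 6 == 0:
        # SHORT_REPEAT: low 3 bits store repeat count - 3 (3..10)
        return (first & 0x07) + 3
    # DIRECT / PATCHED_BASE / DELTA: a 9-bit (length - 1) split across the
    # lowest bit of the first byte and all of the second byte (1..512)
    return (((first & 0x01) << 8) | second) + 1


def values_decoded(run_lengths: list[int], budget: int = 1024) -> int:
    """Hypothetical model: decode whole runs until the next one would exceed the budget."""
    total = 0
    for n in run_lengths:
        if total + n > budget:
            break
        total += n
    return total


def min_progress(max_run: int, budget: int = 1024) -> int:
    """Fewest values a stream can decode under the model while data remains:
    it only stops once the room left in the budget is smaller than the next run."""
    return budget - (max_run - 1)


V1_MAX = max(rlev1_run_length(h) for h in range(256))
V2_MAX = max(rlev2_run_length(f, s) for f in range(256) for s in range(256))
print(V1_MAX, V2_MAX)                              # 130 512
print(min_progress(V1_MAX), min_progress(V2_MAX))  # 895 513
print(values_decoded([512, 512]), values_decoded([10, 512, 512]))  # 1024 522
```

The last line contrasts two streams under the model. A stream whose runs are two full 512-value runs reaches the limit exactly. A stream that starts with a short 10-value run stops at 522 values, because its next full run no longer fits.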
Our current question is:
Are other data types composed of more than one stream, such as string, char, varchar, binary, and decimal, subject to the same "desync" bug as timestamp v0.12?
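As a starting point, the stream layouts that the ORC specification defines for these types can be written down as data. The table and helper names below are hypothetical. The layouts follow the specification, with PRESENT streams omitted.

The sketch shows that timestamp is the only one of these types where two per-row streams both go through the integer RLE decoder:
- Dictionary-encoded strings also have two RLE streams, but their LENGTH stream is indexed by dictionary entry rather than by row.
- Decimal pairs a varint DATA stream with an RLE SECONDARY stream.
- String-like types and binary pair RLE lengths with raw bytes.

For those types, the open question is how the decoder keeps the non-RLE stream aligned with the RLE one. This table alone does not answer that.

```python
# Sketch: non-PRESENT streams for the types in question, following the ORC
# specification. "int-rle" = integer RLE (v1 or v2 depending on column encoding),
# "bytes" = raw byte contents, "varint" = unbounded base-128 varints.
# char and varchar share the string layouts. Names here are hypothetical.

ORC_STREAMS = {
    "timestamp": {"DATA": "int-rle", "SECONDARY": "int-rle"},  # seconds, nanoseconds
    "string/char/varchar (DIRECT)": {"DATA": "bytes", "LENGTH": "int-rle"},
    "string/char/varchar (DICTIONARY)": {
        "DATA": "int-rle",         # dictionary ids, one per row
        "DICTIONARY_DATA": "bytes",
        "LENGTH": "int-rle",       # lengths of dictionary entries, not of rows
    },
    "binary": {"DATA": "bytes", "LENGTH": "int-rle"},
    "decimal": {"DATA": "varint", "SECONDARY": "int-rle"},  # SECONDARY holds the scale
}


def rle_streams(column_type: str) -> list[str]:
    """Names of this type's streams that go through the integer RLE decoder."""
    return [name for name, enc in ORC_STREAMS[column_type].items() if enc == "int-rle"]


for column_type in ORC_STREAMS:
    print(f"{column_type}: {rle_streams(column_type)}")
```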
PS: A sketch illustrating the difference between v0.12 (marked as V2 in the diagram) and v0.11 (marked as V1): https://sketchtoy.com/71353765