ByteArray encoding type 33 #4

Open
BradyAJohnston opened this issue Aug 6, 2023 · 8 comments
Labels
question Further information is requested

Comments


BradyAJohnston commented Aug 6, 2023

I'm working on my own parser, and it successfully imports the example .bcif data from py-mmcif as well as the CellPack .bcif files from molstar.org/dev/me. It parses everything for the structures without problems, but when extracting the symmetry operations from the CellPack files, I am coming across a ByteArray type that doesn't make sense.

[[{'kind': 'Delta', 'origin': 1, 'srcType': 3},
  {'kind': 'RunLength', 'srcType': 3, 'srcSize': 767},
  {'kind': 'IntegerPacking', 'byteCount': 1, 'isUnsigned': True, 'srcSize': 4},
  {'kind': 'ByteArray', 'type': 4}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}],
 [{'kind': 'ByteArray', 'type': 33}]]

Is 33 something special that isn't explicitly mentioned in the spec, or have I gotten something wrong earlier in my pipeline?

Member

arose commented Aug 6, 2023

33 is for Float64 arrays

arose added the question (Further information is requested) label on Aug 6, 2023
@BradyAJohnston
Author

This is my first time doing this kind of raw byte decoding, so I might be missing something obvious here, but why is this the case? Does 33 mean not 33? Are the data types specified as below?

ByteArray {
    kind = "ByteArray"
    type: Int8 | Int16 | Int32 | Uint8 | Uint16 | Uint32 | Float32 | Float64
 #  type:   1  |   2   |   3   |   4   |    5   |   6    |    7    |   33
}

Member

arose commented Aug 7, 2023

Not sure it is in the spec. Here is the normative implementation... https://github.com/molstar/molstar/blob/master/src/mol-io/common/binary-cif/encoding.ts#L60-L72
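
For anyone landing here with the same question: going by the linked encoding.ts, the integer type codes run from 1 to 6 and the float codes are 32 (Float32) and 33 (Float64), so Float32 is not 7 as guessed in the table above. Below is a minimal sketch of a ByteArray decoder under that reading, using only the standard library. The names `BYTE_ARRAY_FORMATS` and `decode_byte_array` are made up for illustration, and little-endian byte order is assumed, which is how I understand the spec's description of ByteArray.

```python
import struct

# Type codes as I read them from the Mol* encoding.ts linked above (assumption).
# Values are struct format characters; "<" below forces little-endian, standard sizes.
BYTE_ARRAY_FORMATS = {
    1: "b",   # Int8
    2: "h",   # Int16
    3: "i",   # Int32
    4: "B",   # Uint8
    5: "H",   # Uint16
    6: "I",   # Uint32
    32: "f",  # Float32
    33: "d",  # Float64
}


def decode_byte_array(data: bytes, encoding: dict) -> list:
    """Decode raw bytes according to a {'kind': 'ByteArray', 'type': N} entry."""
    type_code = encoding["type"]
    if type_code not in BYTE_ARRAY_FORMATS:
        raise ValueError(f"unknown ByteArray type code: {type_code}")
    fmt = BYTE_ARRAY_FORMATS[type_code]
    item_size = struct.calcsize("<" + fmt)
    count, remainder = divmod(len(data), item_size)
    if remainder:
        raise ValueError("byte length is not a multiple of the element size")
    return list(struct.unpack(f"<{count}{fmt}", data))


# Hypothetical input: three Float64 values packed by hand, not data from a real file.
raw = struct.pack("<3d", 1.0, 0.0, -0.5)
print(decode_byte_array(raw, {"kind": "ByteArray", "type": 33}))  # [1.0, 0.0, -0.5]
```

A numpy version would presumably map the same codes to little-endian dtypes such as `<f8` for 33; the lookup table is the only part that really matters here.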

@BradyAJohnston
Author

Okay, thanks for the additional clarification. Should this be specified in the spec, given that it's the official implementation?

Member

arose commented Aug 7, 2023

Yeah, it should. The spec could definitely use an overhaul.

@BradyAJohnston
Author

Are there any other little 'gotchas' you can think of while I'm tackling this?

Member

arose commented Aug 7, 2023

I'd look at the molstar implementation or this minimal Python implementation: https://gist.github.com/dsehnal/b06f5555fa9145da69fe69abfeab6eaf
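
One gotcha worth spelling out for the first column in the original post: the encoding list is in the order the encodings were applied, so decoding walks it in reverse (ByteArray, then IntegerPacking, then RunLength, then Delta). The sketch below is a rough illustration of that chain and not the code from the gist. All function names are made up, only Uint8 ByteArrays are handled, and the seven input bytes are a hypothetical payload chosen to match the metadata shown above (`srcSize` 4 and 767, `origin` 1), not bytes taken from the actual CellPack file.

```python
def decode_integer_packing(packed, byte_count, is_unsigned):
    """Undo IntegerPacking: values equal to a limit spill into the next element."""
    if is_unsigned:
        upper, lower = (1 << (8 * byte_count)) - 1, None
    else:
        upper = (1 << (8 * byte_count - 1)) - 1
        lower = -upper - 1
    out, value = [], 0
    for t in packed:
        value += t
        if t == upper or t == lower:
            continue  # limit value: keep accumulating into the same output
        out.append(value)
        value = 0
    return out


def decode_run_length(pairs):
    """Undo RunLength: input is a flat list of (value, count) pairs."""
    out = []
    for value, count in zip(pairs[0::2], pairs[1::2]):
        out.extend([value] * count)
    return out


def decode_delta(deltas, origin):
    """Undo Delta: running sum of the differences, starting from origin."""
    out, current = [], origin
    for d in deltas:
        current += d
        out.append(current)
    return out


def decode_column(data, encodings):
    """Apply the decoders in reverse order of the encoding list."""
    values = data
    for enc in reversed(encodings):
        kind = enc["kind"]
        if kind == "ByteArray":
            if enc["type"] != 4:
                raise NotImplementedError("this sketch only handles Uint8")
            values = list(values)  # each byte is one unsigned 8-bit value
        elif kind == "IntegerPacking":
            values = decode_integer_packing(values, enc["byteCount"], enc["isUnsigned"])
        elif kind == "RunLength":
            values = decode_run_length(values)
        elif kind == "Delta":
            values = decode_delta(values, enc["origin"])
        else:
            raise ValueError(f"unsupported encoding kind: {kind}")
        if "srcSize" in enc and len(values) != enc["srcSize"]:
            raise ValueError(f"{kind}: expected {enc['srcSize']} values, got {len(values)}")
    return values


chain = [
    {"kind": "Delta", "origin": 1, "srcType": 3},
    {"kind": "RunLength", "srcType": 3, "srcSize": 767},
    {"kind": "IntegerPacking", "byteCount": 1, "isUnsigned": True, "srcSize": 4},
    {"kind": "ByteArray", "type": 4},
]
raw = bytes([0, 1, 1, 255, 255, 255, 1])  # hypothetical payload, see note above
ids = decode_column(raw, chain)
print(ids[:3], ids[-1], len(ids))  # [1, 2, 3] 767 767
```

With this input the packed bytes expand to `[0, 1, 1, 766]`, run-length decoding turns that into one 0 followed by 766 ones, and the delta step produces the IDs 1 through 767.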

Author

BradyAJohnston commented Aug 7, 2023

Ah, many thanks. I was also writing a minimal numpy implementation, and the example you linked does exactly what I am after, but much more cleanly than what I had come up with. I wish I had googled a bit harder; it would have saved me a weekend of tinkering.

It would be useful to have that minimal implementation linked in the README as well.
