Dictionaries #125
Comments
That's a good question and indeed this is one of those types on the edge. Some reasons why it seems like we should hold off for now:
Thus, as with #56, while I can see the use case, I'd like to see if we can get by without these in an MVP release. |
I would of course tend towards JS semantics here, so records having unique strings and ordering supported (ie exactly a I can certainly get by with a |
I suppose records & tuples alignment in JS would imply unordered record objects these days, so perhaps sorting might be preferable, it's hard to know. Either way, agreed as with #56 that this is nice-to-have but not necessarily MVP. |
Agreed. And just explaining a bit more about the design challenges: It makes sense that when passing in a host value, the host can already have guarantees of uniqueness; the complexity comes when the Canonical ABI lifts a dictionary from linear memory since the host can't trust anything and thus must verify things dynamically every time. For ordering: requiring ordered keys (in the Lastly, we could have I'm not sure what the right answer is here, just that it'd be nice to punt on this in the MVP. |
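To make the verification cost concrete, here is a minimal Rust sketch of the two dynamic checks a host might run on keys it has just lifted from untrusted linear memory. The function names are hypothetical and nothing here is taken from the Canonical ABI itself. The assumption is that keys arrive as a flat list: if the ABI required sorted keys, one pass comparing neighbours would prove uniqueness, while unordered keys need an auxiliary set.

```rust
use std::collections::HashSet;

/// Returns true when the keys are strictly increasing.
/// Strictly increasing implies unique, so one pass with no allocation is enough.
fn keys_strictly_sorted(keys: &[&str]) -> bool {
    keys.windows(2).all(|pair| pair[0] < pair[1])
}

/// Returns true when no key repeats, regardless of order.
/// This needs a set as large as the key list.
fn keys_unique(keys: &[&str]) -> bool {
    let mut seen = HashSet::with_capacity(keys.len());
    // `insert` returns false when the key was already present.
    keys.iter().all(|key| seen.insert(*key))
}
```

Either check has to run on every lift, since the host cannot cache trust in guest memory.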
Could we take inspiration from |
I did not polish this text with AI. The main idea: there are refinement types (https://en.wikipedia.org/wiki/Refinement_type) and dependent types (https://en.wikipedia.org/wiki/Dependent_type), which ultimately allow encoding something like JSON Schema or XSD and proving things about composability. Which of these types and properties can most modern languages express, at least to some degree?
Some people tend to reject the idea that ordering has any relevance to uniqueness, so https://www.ietf.org/archive/id/draft-devault-bare-11.html#section-2.2 does not feel right. Specifically, the author asks to raise an error on non-unique input, but that is a DoS risk and cannot be part of a general standard. So from this, maybe we need to think about We can have So what to do with So having So what can be done?

/// ordered
/// unique
type ordered_set<T> = list<T>

or I can make the docs structured:

/// refinements:
/// - ordered
/// - unique
type ordered_set<T> = list<T>

or I can think of

#[refinements(ordered, unique)]
type ordered_set<T> = list<T>

Is a safe type subset possible? How to handle elements that are not fixed, like strings? So maybe having structured tags or custom attributes is the way to get a dictionary? |
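As an illustration of how a guest binding might surface such refinements, here is a hedged Rust sketch. `OrderedSet` and `try_from_list` are hypothetical names, not part of WIT or wit-bindgen. The assumption is that `ordered` plus `unique` together mean "strictly increasing", checked once at the boundary.

```rust
/// Hypothetical wrapper: a list whose elements are strictly increasing,
/// i.e. both `ordered` and `unique`.
#[derive(Debug, PartialEq)]
pub struct OrderedSet<T>(Vec<T>);

impl<T: Ord> OrderedSet<T> {
    /// Checks the refinements once; hands the list back unchanged on failure.
    pub fn try_from_list(items: Vec<T>) -> Result<Self, Vec<T>> {
        if items.windows(2).all(|pair| pair[0] < pair[1]) {
            Ok(OrderedSet(items))
        } else {
            Err(items)
        }
    }

    /// Read-only view of the validated elements.
    pub fn as_slice(&self) -> &[T] {
        &self.0
    }
}
```

A language without such a wrapper could ignore the annotation and keep using the plain list, which is roughly what treating the refinement as documentation or an attribute would permit.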
I agree with the solution of using refinement types to enhance The guest language can choose the mapping based on its support for refined dict.

type dict<K, V>; // HashMap<K, V>
@predicate(key_sorted)
type sorted_string_dict = dict<string, string> // BTreeMap<String, String> |
This is the wrong way to think about dictionary types. If the key space is finite, records are always preferred. Dictionaries should support the separate use case where the key space is "infinite", e.g. "a phone book", HTTP headers, JSON blobs, etc. Using the latter to emulate the former always turns into tech debt (I know JS does exactly that, but that's exactly why TS and JSON schemas are so popular), so we should think of them as disjoint use cases.
This assumes a B-tree or similar implementation. Hash tables are amortized Θ(n) to lift and lower and can enforce uniqueness, and they're more broadly supported in standard libraries. The canonical ABI for

use std::collections::HashMap;
use std::hash::Hash;

fn lift<K: Eq + Hash, V>(entries: Vec<(K, V)>) -> HashMap<K, V> {
    // Capacity is known up front, so the table never resizes while lifting.
    let mut dictionary = HashMap::with_capacity(entries.len());
    for (key, value) in entries {
        dictionary.insert(key, value); // Overwrite on key collision. Uniqueness enforced. 👍
    }
    dictionary
}

I suppose the alternative would be to trap on key collisions, but I don't see why that's necessary. Guest languages could lift them into B-trees, but that would involve sorting. Sorted maps are not as widely supported as general associative maps and would be better represented as lists anyway, so I think the basic

Having the capacity known ahead of time (as per the canonical list ABI) means no resizing during lifting, and hashing tends to be very performant (especially for strings). If the hash function is silly, a malicious component could engineer hash collisions for unequal keys to induce O(n^2), but even MD5 would be difficult to exploit in that way at scale. Happy to help push this boulder up the hill any way that I can. |
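For comparison, the trapping alternative mentioned above could be sketched as follows. This is an illustration only: `lift_strict` is a hypothetical name, and returning `Err` stands in for whatever trap mechanism a real implementation would use.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Strict variant: reports the first duplicate key instead of overwriting it.
fn lift_strict<K: Eq + Hash, V>(entries: Vec<(K, V)>) -> Result<HashMap<K, V>, K> {
    let mut dictionary = HashMap::with_capacity(entries.len());
    for (key, value) in entries {
        if dictionary.contains_key(&key) {
            // A real ABI would trap here; the sketch returns the offending key.
            return Err(key);
        }
        dictionary.insert(key, value);
    }
    Ok(dictionary)
}
```

The loop does the same amount of hashing work per entry as the overwriting version, so the difference is one of semantics (reject versus last-write-wins) rather than asymptotic cost.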
I've also been thinking that One thought in the meantime is that, Another thought is that the lazy lowering ABI may end up having a significant impact on the implementation, so to avoid churn, we might want to sequence this feature after that? Although if anyone wanted to do any prototyping in wit-bindgen in the short term, that'd be great too. Having heard a number of use cases and interest in this since the issue was opened, I do think this feature makes sense to include in 1.0 if folks are willing to implement it. |
Are dictionaries on the roadmap for the component model? At the moment it seems this requires representation via list<tuple<string, ...>>, where a dictionary could be a relatively straightforward host-level representation over such a datatype.
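A rough sketch of that workaround from the host side, in Rust: a map is flattened into the list-of-pairs shape before crossing the boundary. The function name `lower` and the decision to sort by key are assumptions made for illustration, not anything the component model prescribes.

```rust
use std::collections::HashMap;

/// Flattens a map into the list-of-pairs shape that `list<tuple<string, ...>>` implies.
fn lower<V>(dictionary: HashMap<String, V>) -> Vec<(String, V)> {
    let mut entries: Vec<(String, V)> = dictionary.into_iter().collect();
    // HashMap iteration order is unspecified, so sort by key for a stable result.
    entries.sort_by(|a, b| a.0.cmp(&b.0));
    entries
}
```

The receiving side then has to rebuild its own map from the list and decide for itself what to do about duplicate keys, which is the gap a first-class dictionary type would close.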