Question about hash of BucketInteger #12015

Open
xicm opened this issue Jan 21, 2025 · 0 comments
Labels
question Further information is requested


Query engine

No response

Question

  private static class BucketInteger extends Bucket<Integer>
      implements SerializableFunction<Integer, Integer> {

    private BucketInteger(int numBuckets) {
      super(numBuckets);
    }

    @Override
    protected int hash(Integer value) {
      return BucketUtil.hash(value);
    }
  }

Why does BucketInteger hash the value with Murmur3 (via BucketUtil.hash) instead of using the int value itself?

Test code:

// 10 buckets over an int column
Bucket bucket = Bucket.get(Types.IntegerType.get(), 10);
List<Integer> list = new ArrayList<>();        // buckets for sequential values 0..9,999,999
List<Integer> randomList = new ArrayList<>();  // buckets for random values in [0, 10000)

Random rand = new Random();

for (int i = 0; i < 10000000; i++) {
    int num = rand.nextInt(10000);
    randomList.add(bucket.apply(num));
    list.add(bucket.apply(i));
}

// Print the number of values that landed in each bucket
System.out.println("natural  list");
list.stream()
        .collect(Collectors.groupingBy(item -> item, Collectors.counting()))
        .forEach((key, value) -> System.out.println(key + ": " + value));

System.out.println("random list");
randomList.stream()
        .collect(Collectors.groupingBy(item -> item, Collectors.counting()))
        .forEach((key, value) -> System.out.println(key + ": " + value));

Result with Murmur3:

natural  list
0: 1000996
1: 999922
2: 1000005
3: 1000139
4: 1001529
5: 998785
6: 999291
7: 999777
8: 999204
9: 1000352
random list
0: 998404
1: 1005602
2: 1021090
3: 1021668
4: 996615
5: 1046078
6: 1023192
7: 997321
8: 923906
9: 966124

Result using the original int value (no hash):

natural  list
0: 1000000
1: 1000000
2: 1000000
3: 1000000
4: 1000000
5: 1000000
6: 1000000
7: 1000000
8: 1000000
9: 1000000
random list
0: 1001911
1: 997982
2: 999024
3: 1000518
4: 999166
5: 999954
6: 1001863
7: 1000033
8: 997981
9: 1001568
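
Two things may help when reading the numbers above. First, rand.nextInt(10000) produces at most 10,000 distinct values, each repeated roughly 1,000 times, so the "random list" totals mostly reflect how many distinct keys fall into each bucket. Second, both inputs in the test are spread evenly modulo 10, which is the best case for using the raw value. The sketch below is a rough, standalone way to probe both points without Iceberg on the classpath. It is an assumption-laden illustration, not Iceberg's code: BucketSkewSketch and its methods are hypothetical names, murmur3Long is a hand transcription of the public MurmurHash3 x86 32-bit algorithm (seed 0) applied to the value widened to a long, and the (hash & Integer.MAX_VALUE) % n step mirrors how I understand the bucket transform to be defined.

```java
import java.util.Arrays;

class BucketSkewSketch {

  // MurmurHash3 x86 32-bit, seed 0, over the 8 bytes of a long.
  // Hand-transcribed from the public algorithm; not Iceberg's BucketUtil.
  static int murmur3Long(long value) {
    int h1 = 0;
    h1 = mixH1(h1, mixK1((int) value));           // low 4 bytes
    h1 = mixH1(h1, mixK1((int) (value >>> 32)));  // high 4 bytes
    h1 ^= 8;                                      // input length in bytes
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }

  static int mixK1(int k1) {
    k1 *= 0xcc9e2d51;
    k1 = Integer.rotateLeft(k1, 15);
    k1 *= 0x1b873593;
    return k1;
  }

  static int mixH1(int h1, int k1) {
    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    return h1 * 5 + 0xe6546b64;
  }

  // Bucket chosen from the hashed value (assumed shape of the bucket transform).
  static int hashedBucket(int value, int numBuckets) {
    return (murmur3Long(value) & Integer.MAX_VALUE) % numBuckets;
  }

  // Bucket chosen from the raw int value, as proposed in the question.
  static int identityBucket(int value, int numBuckets) {
    return (value & Integer.MAX_VALUE) % numBuckets;
  }

  // Buckets the values 0, step, 2*step, ... (count of them) and returns per-bucket counts.
  static long[] histogram(int count, int step, int numBuckets, boolean hashed) {
    long[] counts = new long[numBuckets];
    for (int i = 0; i < count; i++) {
      int value = i * step;
      int b = hashed ? hashedBucket(value, numBuckets) : identityBucket(value, numBuckets);
      counts[b]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    int n = 10;
    // Distinct keys 0..9999 per bucket: unevenness here is multiplied by the
    // number of repeats per key in the "random list" experiment.
    System.out.println("distinct keys, hashed:     "
        + Arrays.toString(histogram(10_000, 1, n, true)));
    // Strided input (ids that are all multiples of 10).
    System.out.println("multiples of 10, identity: "
        + Arrays.toString(histogram(100_000, 10, n, false)));
    System.out.println("multiples of 10, hashed:   "
        + Arrays.toString(histogram(100_000, 10, n, true)));
  }
}
```

With the raw value, every multiple of 10 lands in bucket 0, so the identity scheme is only as uniform as the input happens to be modulo the bucket count. The hashed variant should not depend on such patterns in the input, which may be part of the motivation for hashing.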