From 824244300580809e18e7b769b81cc38f6aa41d77 Mon Sep 17 00:00:00 2001
From: Doug Davis
Date: Tue, 16 Aug 2022 10:19:20 -0500
Subject: [PATCH 1/4] skeleton

---
 _posts/2022-08-xx-histogramming-with-dask.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 _posts/2022-08-xx-histogramming-with-dask.md

diff --git a/_posts/2022-08-xx-histogramming-with-dask.md b/_posts/2022-08-xx-histogramming-with-dask.md
new file mode 100644
index 00000000..03198c9f
--- /dev/null
+++ b/_posts/2022-08-xx-histogramming-with-dask.md
@@ -0,0 +1,11 @@
+---
+layout: post
+title: Histogramming with Dask
+author: Doug Davis
+tags: [histogram, array, dask]
+theme: twitter
+---
+
+{% include JB/setup %}
+
+Histograms are ...

From 9beda75b2f2f85b6aac10b097792ce71d0e96fd3 Mon Sep 17 00:00:00 2001
From: Doug Davis
Date: Wed, 24 Aug 2022 10:00:18 -0500
Subject: [PATCH 2/4] working

---
 _posts/2022-08-xx-histogramming-with-dask.md | 36 +++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/_posts/2022-08-xx-histogramming-with-dask.md b/_posts/2022-08-xx-histogramming-with-dask.md
index 03198c9f..632e7e45 100644
--- a/_posts/2022-08-xx-histogramming-with-dask.md
+++ b/_posts/2022-08-xx-histogramming-with-dask.md
@@ -8,4 +8,38 @@ theme: twitter
 
 {% include JB/setup %}
 
-Histograms are ...
+## Primer on Histogramming in Python
+
+[Histograms][wiki] are a core data type for many uses of statistics.
+Countless scientific computing libraries provide an interface for
+binning data ("histogramming") in any number of dimensions. Python has
+an embarrassment of riches with regard to histogramming, and we won't
+go into all existing libraries. In the author's opinion, the best
+avenues for histogramming in Python (as of August 2022) are NumPy and
+[boost-histogram][bh]. NumPy provides a number of functions associated
+with binning data; the main three are [`histogram`][nphist],
+[`histogram2d`][nphist2d], and [`histogramdd`][nphistdd].
+Boost-histogram provides an object-oriented API for binning data, with
+multiple types of [axes][bhaxes], [storages][bhstorage], and
+[accumulators][bhaccum]. For simple cases NumPy is great and gets the
+job done, but boost-histogram is a much more feature-rich library,
+and it is also [more performant][perf].
+
+## Enter Dask
+
+Since `dask.array` implements a subset of the NumPy interface, it's
+natural for `dask.array` to implement the histogramming functions that
+exist in NumPy. That's exactly what we do!
+
+The [dask-histogram][dh] project...
+
+[bh]: https://boost-histogram.readthedocs.io/en/latest/
+[wiki]: https://en.wikipedia.org/wiki/Histogram
+[nphist]: https://numpy.org/doc/stable/reference/generated/numpy.histogram.html
+[nphistdd]: https://numpy.org/doc/stable/reference/generated/numpy.histogramdd.html
+[nphist2d]: https://numpy.org/doc/stable/reference/generated/numpy.histogram2d.html
+[bhaxes]: https://boost-histogram.readthedocs.io/en/latest/user-guide/axes.html
+[bhaccum]: https://boost-histogram.readthedocs.io/en/latest/user-guide/accumulators.html
+[bhstorage]: https://boost-histogram.readthedocs.io/en/latest/user-guide/storage.html
+[perf]: https://boost-histogram.readthedocs.io/en/latest/notebooks/PerformanceComparison.html
+[dh]: https://dask-histogram.readthedocs.io/en/stable/

From 7ad439f54004995b87a49dc0e58fa893a10a31e0 Mon Sep 17 00:00:00 2001
From: Doug Davis
Date: Wed, 24 Aug 2022 10:17:23 -0500
Subject: [PATCH 3/4] rename

---
 ...ramming-with-dask.md => 2022-08-24-histogramming-with-dask.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename _posts/{2022-08-xx-histogramming-with-dask.md => 2022-08-24-histogramming-with-dask.md} (100%)

diff --git a/_posts/2022-08-xx-histogramming-with-dask.md b/_posts/2022-08-24-histogramming-with-dask.md
similarity index 100%
rename from _posts/2022-08-xx-histogramming-with-dask.md
rename to _posts/2022-08-24-histogramming-with-dask.md

From 86ed22cbaf3d3914f08fcb9d42e27a422260602e Mon Sep 17 00:00:00 2001
From: Doug Davis
Date: Wed, 24 Aug 2022 10:24:05 -0500
Subject: [PATCH 4/4] link

---
 _posts/2022-08-24-histogramming-with-dask.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_posts/2022-08-24-histogramming-with-dask.md b/_posts/2022-08-24-histogramming-with-dask.md
index 632e7e45..4d194d79 100644
--- a/_posts/2022-08-24-histogramming-with-dask.md
+++ b/_posts/2022-08-24-histogramming-with-dask.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title: Histogramming with Dask
-author: Doug Davis
+author: Doug Davis (Anaconda)
 tags: [histogram, array, dask]
 theme: twitter
 ---