anatomy of an array: multidimensional array start and stop from view #108
Comments
Good question and I don't remember exactly :) I think the last -1 is to have the offset stop just at the start of the last byte and not at the byte after. This would need to be tested with a very simple array, checking the value of the two bounds.
Many thanks for your reply. Your online book has been a great help; I
realise that I don't even know 10% of numpy!
I am trying to demonstrate that if we reshape an array it is faster to go
through its elements by preserving index
precedence. Unfortunately I am not having much luck and performance is
pretty much equal (I have copied the code below). Is there some other way
to show that index precedence matters for performance?
```
import numpy as np
import timeit

a = np.arange(1, 37).astype(np.int16).reshape((3, 2, 2, 3))

def without_precedence(a):
    # Innermost loop runs over axis 0, i.e. the largest stride.
    total, n = 0, 0
    for l in range(a.shape[3]):
        for k in range(a.shape[2]):
            for j in range(a.shape[1]):
                for i in range(a.shape[0]):
                    total += a[i, j, k, l]
                    n += 1
    return total, n

def with_precedence(a):
    # Innermost loop runs over the last axis, i.e. the smallest stride.
    total, n = 0, 0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                for l in range(a.shape[3]):
                    total += a[i, j, k, l]
                    n += 1
    return total, n

if __name__ == "__main__":
    print(timeit.timeit("with_precedence(a)",
                        "from __main__ import with_precedence, a", number=100000))
    print(timeit.timeit("without_precedence(a)",
                        "from __main__ import without_precedence, a", number=100000))
```
Why do you think it would make a difference?
I am thinking in terms of latency and the way cache memory operates in the CPU. Since the array is stored as a one-dimensional block of memory, if the program has to use a stride of multiple bytes to move along the axis=0 direction, it has to restart the read operation, which typically costs extra clock cycles. The nice thing about the axis=1 direction is that as soon as the program starts reading values, every next memory block is consecutive, so there is no latency. Then there is the cache size (this effect is more debatable): for an array larger than the level-2 cache, the system stores successive blocks of the array, and again, if I use a stride of a few thousand bytes to move along axis=0, I will not be able to reuse the values already held in the more efficient level-2 cache. This will generate cache misses and the system will have to retrieve values from less efficient memory. By the way, I realise that my theory might be missing additional SIMD enhancements that summing along an axis could benefit from (I have no idea whether numpy actually does this).
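To make the stride argument concrete, here is a minimal sketch of my own (not from the book); the exact numbers will depend on the machine and on how large the array is relative to the caches:
```
# Inspect the strides of a C-ordered array, then time an operation that reads
# the data sequentially against one that is forced to read it column by column.
import numpy as np
import timeit

a = np.random.rand(4096, 4096)   # C-ordered float64: each row is contiguous
print(a.strides)                 # (32768, 8): one step along axis=0 jumps 32 KB

print(timeit.timeit(lambda: a.copy(),   number=10))  # sequential reads
print(timeit.timeit(lambda: a.T.copy(), number=10))  # strided, column-order reads
```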
Oh right, cache! I'm not sure the difference is measurable, since Python loops would eat most of the CPU time. You might have a better chance with numba, which will reduce the Python overhead. Or you can try to use nditer.
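For instance, here is a rough sketch along those lines, assuming numba is installed (not from the thread; the exact gap will depend on the array size and dtype). With the Python loop overhead compiled away, the innermost-loop axis determines the memory access pattern:
```
import timeit
import numpy as np
from numba import njit

@njit
def sum_last_axis_inner(a):      # innermost loop walks the contiguous axis
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            total += a[i, j]
    return total

@njit
def sum_first_axis_inner(a):     # innermost loop walks the strided axis
    total = 0.0
    for j in range(a.shape[1]):
        for i in range(a.shape[0]):
            total += a[i, j]
    return total

if __name__ == "__main__":
    a = np.random.rand(4096, 4096)
    sum_last_axis_inner(a)       # first calls trigger compilation
    sum_first_axis_inner(a)
    print(timeit.timeit(lambda: sum_last_axis_inner(a),  number=10))
    print(timeit.timeit(lambda: sum_first_axis_inner(a), number=10))
```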
At the end of the section "Anatomy of an array" you give an example of how, given a view of a multidimensional array, we can infer the start:stop:step structure of each index. I have run the code and it works fine. But there is a line of code that I have trouble following:
offset_stop = (np.byte_bounds(view)[-1] - np.byte_bounds(base)[-1]-1)//itemsize
Following the one-dimensional array example, we are trying to find the offset at the end of the two arrays; for this reason we use the address of the last byte in the view array minus the address of the last byte in the base array. This difference is converted into a difference in terms of memory blocks by dividing by itemsize.
However, there is a '-1' in the expression; initially I thought it was a mistake, but if I remove it the code does not work. Still, I cannot make sense of it, since it does not correspond to an offset in memory blocks. I am sure I am missing something trivial that would be immediately recognized by someone with more experience. Would you be able to help?
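Checking with a very simple array (a minimal sketch of my own): np.byte_bounds returns the address of the first byte and the address just past the last byte of the buffer, and the extra -1 seems to be there to compensate for that "one past the end", so that the offset lands on the last element of the view rather than the element after it:
```
import numpy as np

base = np.arange(8, dtype=np.int16)   # 8 items, itemsize = 2 bytes
view = base[1:5]                      # elements at flat indices 1..4
itemsize = base.itemsize

offset_stop = (np.byte_bounds(view)[-1] - np.byte_bounds(base)[-1] - 1) // itemsize
print(offset_stop)                    # -4
print(base.size + offset_stop)        # 4: the flat index of the last element of the view

# Dropping the -1 gives -6 // 2 = -3, hence base.size - 3 = 5: the element
# just after the view ("the byte after"), which is why the code breaks.
```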