Skip to content

BUG: core: fix ilp64 blas dot/vdot/... for strides > int32 max#17121

Merged
mattip merged 2 commits into
numpy:masterfrom
pv:fix-ilp64-stride
Aug 22, 2020
Merged

BUG: core: fix ilp64 blas dot/vdot/... for strides > int32 max#17121
mattip merged 2 commits into
numpy:masterfrom
pv:fix-ilp64-stride

Conversation

@pv

@pv pv commented Aug 20, 2020

Copy link
Copy Markdown
Member

Fix overlooked int cast when HAVE_BLAS_ILP64 is defined.
It was supposed to cast to CBLAS_INT, not int.
Also add a regression test.

Fixes #17111

Comment thread numpy/core/tests/test_regression.py Outdated

@charris charris Aug 20, 2020

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My machine requires more than this for the allocation.

EDIT: Needs 32 GiB

Comment thread numpy/core/tests/test_regression.py Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should leave out the multiplication by itemsize here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use int32_max + 2 instead,

@pv pv Aug 21, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think no: the BLAS incx = stride//itemsize, and that needs to overflow to trigger this bug.
I can use itemsize*(int32_max + 2) but the test does fail also with the +1.

Ah crap, obviously you're right. No idea what I was thinking.

Comment thread numpy/core/src/multiarray/common.h Outdated
@pv pv force-pushed the fix-ilp64-stride branch from 510c4fc to f241d5f Compare August 21, 2020 19:48
Fix overlooked int cast when HAVE_BLAS_ILP64 is defined.
It was supposed to cast to CBLAS_INT, not int.
Also add a regression test.

Move blas_stride() to npy_cblas.h
Replace npy_is_aligned by modulo; we're going to call BLAS so no need to
micro-optimize integer division here.
@pv pv force-pushed the fix-ilp64-stride branch from f241d5f to 9cebb29 Compare August 21, 2020 19:49
@pv

pv commented Aug 21, 2020

Copy link
Copy Markdown
Member Author

Updated. Obvious mistakes fixed, and moved blas_strides to npy_cblas.h.

Comment thread numpy/core/src/multiarray/common.h Outdated

/*
* Define a chunksize for CBLAS. CBLAS counts in integers.
*/

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this definition move file too, since it's also blas-related?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like its used in exactly one file, and that file uses CBLAS_INT to store the result.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I thought the chunking was mostly related to our *_dot loops, so I left it here.

@eric-wieser eric-wieser Aug 21, 2020

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move it, if you're happy to make another update.

Uses: https://github.com/numpy/numpy/search?q=NPY_CBLAS_CHUNK&unscoped_q=NPY_CBLAS_CHUNK

Note CBLAS_INT on the same line in all cases - so the callers are already including both headers (and the headers are internal not public)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved + fixed the definition. No need to have it differ from intp max except when that's larger than CBLAS_INT max

Comment thread numpy/core/src/common/npy_cblas.h Outdated
@pv pv force-pushed the fix-ilp64-stride branch from b895334 to 9ecf482 Compare August 21, 2020 20:24
Comment thread numpy/core/src/common/npy_cblas.h Outdated
Comment on lines 85 to 94

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be clear - this is a change for when HAVE_BLAS_ILP64 is defined and NPY_MAX_INTP > INT_MAX.

  • Previously: NPY_CBLAS_CHUNK = (NPY_MAX_INT64 / 2 + 1)
  • Now: NPY_CBLAS_CHUNK = NPY_MAX_INTP

I don't have any understanding of the purpose of the /2 + 1, which makes me worry about removing it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it changes it to depend only on the CBLAS_INT maximum value. Switching to the smaller threshold for INTP>INT with ilp64 I think is a bug (but with no consequences, nobody has arrays of 2^62 elements). With 32-bit BLAS, there is no change here.

I suspect the INT_MAX/2 + 1 is chosen instead of INT_MAX because some 32-bit BLAS implementations have bugs that makes them fail if the number of elements gets too close to the maximum integer. However, for 64-bit blas, this is a theoretical concern as nobody can run BLAS with that large n.

@pv pv Aug 21, 2020

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, 46dd681 seems to clarify it's because that makes the chunksize a power of two.
So I think this is OK. (Restored the comment saying so back.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think is a bug (but with no consequences, nobody has arrays of 2^62 elements)

However, for 64-bit blas, this is a theoretical concern as nobody can run BLAS with that large n.

I think this was what I was missing. I'd add that the compiler can optimize out the comparison if we use NPY_MAX_INTP, which is an argument for leaving the bug in.

If you could pull in the power of two comment from that commit, that would be great. Either way, I'm no longer worried by this.

@pv pv force-pushed the fix-ilp64-stride branch from 9ecf482 to 51bb217 Compare August 21, 2020 20:56
@charris

charris commented Aug 21, 2020

Copy link
Copy Markdown
Member

The error is unrelated. @mattip We have been getting web errors trying to fetch pypy, could you take a look?

@mattip

mattip commented Aug 22, 2020

Copy link
Copy Markdown
Member

@charris: seems to have been a temporary outage.

@mattip mattip merged commit 97f9fcb into numpy:master Aug 22, 2020
@mattip

mattip commented Aug 22, 2020

Copy link
Copy Markdown
Member

Thanks @pv

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Sep 3, 2020
@charris charris removed this from the 1.19.2 release milestone Sep 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

blas_stride has the wrong return type with HAVE_BLAS_ILP64

4 participants