[Python] Compatibility with boto 1.36 #45305

Open
h-vetinari opened this issue Jan 20, 2025 · 16 comments
Comments
@h-vetinari
Contributor

h-vetinari commented Jan 20, 2025

The failures described below were traced down to minio/minio#20845, which was worked around in #45311.

This issue is now about removing

# Not a direct dependency of s3fs, but needed for our s3fs fixture
# (temporary upper bound because of GH-45305)
boto3<1.36

once a compatible minio release has happened.

Previously:

Describe the enhancement requested

Related to but independent from #45304, the following errors appear in the python test suite with boto3 1.36.1 (still with aws-sdk 1.1.458):

=========================== short test summary info ============================
FAILED pyarrow/tests/test_fs.py::test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
FAILED pyarrow/tests/test_fs.py::test_copy_file[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
FAILED pyarrow/tests/test_fs.py::test_delete_file[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_dataset.py::test_read_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_dataset.py::test_read_directory_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_dataset.py::test_read_partitioned_directory_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_with_partitions_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_dataset.py::test_write_to_dataset_no_partitions_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_metadata.py::test_write_metadata_fs_file_combinations - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/parquet/test_parquet_writer.py::test_parquet_writer_filesystem_s3fs - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_get_file_info[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_copy_file[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_delete_file[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_input_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-None-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_input_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-64-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_input_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-None-compress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_input_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-256-compress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_input_file[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_output_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-None-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_output_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-64-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_output_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-None-decompress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_output_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-256-decompress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_append_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-None-identity-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_append_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-None-64-identity-identity] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_append_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-None-compress-decompress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_append_stream[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))-gzip-256-compress-decompress] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.
ERROR pyarrow/tests/test_fs.py::test_open_output_stream_metadata[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] - OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.

Things run fine when constrained to boto3/botocore 1.35.88.
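The working/broken boundary described above can be captured in a small version check. This is an illustrative sketch only; the helper name is hypothetical and not part of the pyarrow test suite:

```python
def boto3_needs_pin(version: str) -> bool:
    """True for boto3 >= 1.36, the release series that changed the
    default checksum behavior and triggers MissingContentMD5 against
    older Minio builds."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 36)

# boto3_needs_pin("1.35.88") -> False; boto3_needs_pin("1.36.1") -> True
```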

One question from the feedstock's POV is whether the breakage is bad enough to encode boto3 <1.36 in the package metadata itself (until this issue is fixed), or whether to just add that constraint to the test requirements. That decision also needs to be made for all maintenance branches (not necessarily in the same way as for main), which are equally affected.

CC @raulcd @pitrou @kou @assignUser

Component(s)

Packaging, Python

@pitrou
Member

pitrou commented Jan 20, 2025

PyArrow does not depend on boto3. These tests only exercise our fsspec bridge.

@pitrou
Member

pitrou commented Jan 20, 2025

The only matching result in a Google search is aws/aws-sdk-cpp#1337 (comment), but it doesn't involve boto3 at all...

@pitrou
Member

pitrou commented Jan 20, 2025

Can you post the exception traceback(s)?

@h-vetinari
Contributor Author

h-vetinari commented Jan 20, 2025

PyArrow does not depend on boto3. These tests only exercise our fsspec bridge.

AFAIU there's some optional support; at least, if boto3 is present, more things get tested than without it. This is what I meant above (so if we want to reflect this at the pyarrow level, it would be a run_constrained entry, not a run dependency).

Can you post the exception traceback(s)?

Sure:

Example FAILURE
=================================== FAILURES ===================================
_ test_get_file_info_with_selector[PyFileSystem(FSSpecHandler(s3fs.S3FileSystem()))] _

func = <bound method ClientCreator._create_api_method.<locals>._api_call of <aiobotocore.client.S3 object at 0x7f31a4d940b0>>

    async def _error_wrapper(func, *, args=(), kwargs=None, retries):
        if kwargs is None:
            kwargs = {}
        for i in range(retries):
            try:
>               return await func(*args, **kwargs)

s3fs/core.py:114: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <aiobotocore.client.S3 object at 0x7f31a4d940b0>
operation_name = 'DeleteObjects'
api_params = {'Bucket': 'pyarrow-filesystem', 'Delete': {'Objects': [{'Key': 'selector-dir'}, {'Key': 'selector-dir/test_dir_a'}, {...ir/test_dir_a/test_file_c'}, {'Key': 'selector-dir/test_file_a'}, {'Key': 'selector-dir/test_file_b'}], 'Quiet': True}}

    async def _make_api_call(self, operation_name, api_params):
        operation_model = self._service_model.operation_model(operation_name)
        service_name = self._service_model.service_name
        history_recorder.record(
            'API_CALL',
            {
                'service': service_name,
                'operation': operation_name,
                'params': api_params,
            },
        )
        if operation_model.deprecated:
            logger.debug(
                'Warning: %s.%s() is deprecated', service_name, operation_name
            )
        request_context = {
            'client_region': self.meta.region_name,
            'client_config': self.meta.config,
            'has_streaming_input': operation_model.has_streaming_input,
            'auth_type': operation_model.resolved_auth_type,
            'unsigned_payload': operation_model.unsigned_payload,
        }
    
        api_params = await self._emit_api_params(
            api_params=api_params,
            operation_model=operation_model,
            context=request_context,
        )
        (
            endpoint_url,
            additional_headers,
            properties,
        ) = await self._resolve_endpoint_ruleset(
            operation_model, api_params, request_context
        )
        if properties:
            # Pass arbitrary endpoint info with the Request
            # for use during construction.
            request_context['endpoint_properties'] = properties
        request_dict = await self._convert_to_request_dict(
            api_params=api_params,
            operation_model=operation_model,
            endpoint_url=endpoint_url,
            context=request_context,
            headers=additional_headers,
        )
        resolve_checksum_context(request_dict, operation_model, api_params)
    
        service_id = self._service_model.service_id.hyphenize()
        handler, event_response = await self.meta.events.emit_until_response(
            f'before-call.{service_id}.{operation_name}',
            model=operation_model,
            params=request_dict,
            request_signer=self._request_signer,
            context=request_context,
        )
    
        if event_response is not None:
            http, parsed_response = event_response
        else:
            maybe_compress_request(
                self.meta.config, request_dict, operation_model
            )
            apply_request_checksum(request_dict)
            http, parsed_response = await self._make_request(
                operation_model, request_dict, request_context
            )
    
        await self.meta.events.emit(
            f'after-call.{service_id}.{operation_name}',
            http_response=http,
            parsed=parsed_response,
            model=operation_model,
            context=request_context,
        )
    
        if http.status_code >= 300:
            error_info = parsed_response.get("Error", {})
            error_code = error_info.get("QueryErrorCode") or error_info.get(
                "Code"
            )
            error_class = self.exceptions.from_code(error_code)
>           raise error_class(parsed_response, operation_name)
E           botocore.exceptions.ClientError: An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.

aiobotocore/client.py:412: ClientError

The above exception was the direct cause of the following exception:

fs = <pyarrow._fs.PyFileSystem object at 0x7f319ca9aa30>
pathfn = <method-wrapper '__add__' of str object at 0x7f31cd41e030>

    def test_get_file_info_with_selector(fs, pathfn):
        base_dir = pathfn('selector-dir/')
        file_a = pathfn('selector-dir/test_file_a')
        file_b = pathfn('selector-dir/test_file_b')
        dir_a = pathfn('selector-dir/test_dir_a')
        file_c = pathfn('selector-dir/test_dir_a/test_file_c')
        dir_b = pathfn('selector-dir/test_dir_b')
    
        try:
            fs.create_dir(base_dir)
            with fs.open_output_stream(file_a):
                pass
            with fs.open_output_stream(file_b):
                pass
            fs.create_dir(dir_a)
            with fs.open_output_stream(file_c):
                pass
            fs.create_dir(dir_b)
    
            # recursive selector
            selector = FileSelector(base_dir, allow_not_found=False,
                                    recursive=True)
            assert selector.base_dir == base_dir
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+('s3', 's3a')":
                # s3fs only lists directories if they are not empty
                len(infos) == 4
            else:
                assert len(infos) == 5
    
            for info in infos:
                if (info.path.endswith(file_a) or info.path.endswith(file_b) or
                        info.path.endswith(file_c)):
                    assert info.type == FileType.File
                elif (info.path.rstrip("/").endswith(dir_a) or
                      info.path.rstrip("/").endswith(dir_b)):
                    assert info.type == FileType.Directory
                else:
                    raise ValueError('unexpected path {}'.format(info.path))
                check_mtime_or_absent(info)
    
            # non-recursive selector -> not selecting the nested file_c
            selector = FileSelector(base_dir, recursive=False)
    
            infos = fs.get_file_info(selector)
            if fs.type_name == "py::fsspec+('s3', 's3a')":
                # s3fs only lists directories if they are not empty
                assert len(infos) == 3
            else:
                assert len(infos) == 4
    
        finally:
>           fs.delete_dir(base_dir)

pyarrow/tests/test_fs.py:782: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/_fs.pyx:625: in pyarrow._fs.FileSystem.delete_dir
    check_status(self.fs.DeleteDir(directory))
pyarrow/error.pxi:89: in pyarrow.lib.check_status
    RestorePyError(status)
pyarrow/_fs.pyx:1529: in pyarrow._fs._cb_delete_dir
    handler.delete_dir(frombytes(path))
pyarrow/fs.py:366: in delete_dir
    self.fs.rm(path, recursive=True)
fsspec/asyn.py:118: in wrapper
    return sync(self.loop, func, *args, **kwargs)
fsspec/asyn.py:103: in sync
    raise return_result
fsspec/asyn.py:56: in _runner
    result[0] = await coro
s3fs/core.py:2052: in _rm
    out = await _run_coros_in_chunks(
fsspec/asyn.py:268: in _run_coros_in_chunks
    result, k = await done.pop()
fsspec/asyn.py:245: in _run_coro
    return await asyncio.wait_for(coro, timeout=timeout), i
../asyncio/tasks.py:520: in wait_for
    return await fut
s3fs/core.py:2026: in _bulk_delete
    out = await self._call_s3(
s3fs/core.py:371: in _call_s3
    return await _error_wrapper(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = <bound method ClientCreator._create_api_method.<locals>._api_call of <aiobotocore.client.S3 object at 0x7f31a4d940b0>>

    async def _error_wrapper(func, *, args=(), kwargs=None, retries):
        if kwargs is None:
            kwargs = {}
        for i in range(retries):
            try:
                return await func(*args, **kwargs)
            except S3_RETRYABLE_ERRORS as e:
                err = e
                logger.debug("Retryable error: %s", e)
                await asyncio.sleep(min(1.7**i * 0.1, 15))
            except ClientError as e:
                logger.debug("Client error (maybe retryable): %s", e)
                err = e
                wait_time = min(1.7**i * 0.1, 15)
                if "SlowDown" in str(e):
                    await asyncio.sleep(wait_time)
                elif "reduce your request rate" in str(e):
                    await asyncio.sleep(wait_time)
                elif "XAmzContentSHA256Mismatch" in str(e):
                    await asyncio.sleep(wait_time)
                else:
                    break
            except Exception as e:
                logger.debug("Nonretryable error: %s", e)
                err = e
                break
    
        if "'coroutine'" in str(err):
            # aiobotocore internal error - fetch original botocore error
            tb = err.__traceback__
            while tb.tb_next:
                tb = tb.tb_next
            try:
                await tb.tb_frame.f_locals["response"]
            except Exception as e:
                err = e
        err = translate_boto_error(err)
>       raise err
E       OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.

s3fs/core.py:146: OSError
Example ERROR
==================================== ERRORS ====================================
_____________________ ERROR at teardown of test_read_s3fs ______________________

func = <bound method ClientCreator._create_api_method.<locals>._api_call of <aiobotocore.client.S3 object at 0x7f31a4d940b0>>

    async def _error_wrapper(func, *, args=(), kwargs=None, retries):
        if kwargs is None:
            kwargs = {}
        for i in range(retries):
            try:
>               return await func(*args, **kwargs)

s3fs/core.py:114: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <aiobotocore.client.S3 object at 0x7f31a4d940b0>
operation_name = 'DeleteObjects'
api_params = {'Bucket': 'test-s3fs', 'Delete': {'Objects': [{'Key': 'dc75fe8b0862473190a303f8c8ff48a1'}, {'Key': 'dc75fe8b0862473190a303f8c8ff48a1/test.parquet'}], 'Quiet': True}}

    async def _make_api_call(self, operation_name, api_params):
        operation_model = self._service_model.operation_model(operation_name)
        service_name = self._service_model.service_name
        history_recorder.record(
            'API_CALL',
            {
                'service': service_name,
                'operation': operation_name,
                'params': api_params,
            },
        )
        if operation_model.deprecated:
            logger.debug(
                'Warning: %s.%s() is deprecated', service_name, operation_name
            )
        request_context = {
            'client_region': self.meta.region_name,
            'client_config': self.meta.config,
            'has_streaming_input': operation_model.has_streaming_input,
            'auth_type': operation_model.resolved_auth_type,
            'unsigned_payload': operation_model.unsigned_payload,
        }
    
        api_params = await self._emit_api_params(
            api_params=api_params,
            operation_model=operation_model,
            context=request_context,
        )
        (
            endpoint_url,
            additional_headers,
            properties,
        ) = await self._resolve_endpoint_ruleset(
            operation_model, api_params, request_context
        )
        if properties:
            # Pass arbitrary endpoint info with the Request
            # for use during construction.
            request_context['endpoint_properties'] = properties
        request_dict = await self._convert_to_request_dict(
            api_params=api_params,
            operation_model=operation_model,
            endpoint_url=endpoint_url,
            context=request_context,
            headers=additional_headers,
        )
        resolve_checksum_context(request_dict, operation_model, api_params)
    
        service_id = self._service_model.service_id.hyphenize()
        handler, event_response = await self.meta.events.emit_until_response(
            f'before-call.{service_id}.{operation_name}',
            model=operation_model,
            params=request_dict,
            request_signer=self._request_signer,
            context=request_context,
        )
    
        if event_response is not None:
            http, parsed_response = event_response
        else:
            maybe_compress_request(
                self.meta.config, request_dict, operation_model
            )
            apply_request_checksum(request_dict)
            http, parsed_response = await self._make_request(
                operation_model, request_dict, request_context
            )
    
        await self.meta.events.emit(
            f'after-call.{service_id}.{operation_name}',
            http_response=http,
            parsed=parsed_response,
            model=operation_model,
            context=request_context,
        )
    
        if http.status_code >= 300:
            error_info = parsed_response.get("Error", {})
            error_code = error_info.get("QueryErrorCode") or error_info.get(
                "Code"
            )
            error_class = self.exceptions.from_code(error_code)
>           raise error_class(parsed_response, operation_name)
E           botocore.exceptions.ClientError: An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.

aiobotocore/client.py:412: ClientError

The above exception was the direct cause of the following exception:

s3_server = {'connection': ('127.0.0.1', 39255, 'arrow', 'apachearrow'), 'process': <Popen: returncode: None args: ['minio', '--compat', 'server', '--quiet', '-...>, 'tempdir': local('/tmp/pytest-of-conda/pytest-0')}
s3_bucket = 'test-s3fs'

    @pytest.fixture
    def s3_example_s3fs(s3_server, s3_bucket):
        s3fs = pytest.importorskip('s3fs')
    
        host, port, access_key, secret_key = s3_server['connection']
        fs = s3fs.S3FileSystem(
            key=access_key,
            secret=secret_key,
            client_kwargs={
                'endpoint_url': 'http://{}:{}'.format(host, port)
            }
        )
    
        test_path = '{}/{}'.format(s3_bucket, guid())
    
        fs.mkdir(test_path)
        yield fs, test_path
        try:
>           fs.rm(test_path, recursive=True)

pyarrow/tests/parquet/conftest.py:87: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
fsspec/asyn.py:118: in wrapper
    return sync(self.loop, func, *args, **kwargs)
fsspec/asyn.py:103: in sync
    raise return_result
fsspec/asyn.py:56: in _runner
    result[0] = await coro
s3fs/core.py:2052: in _rm
    out = await _run_coros_in_chunks(
fsspec/asyn.py:268: in _run_coros_in_chunks
    result, k = await done.pop()
fsspec/asyn.py:245: in _run_coro
    return await asyncio.wait_for(coro, timeout=timeout), i
../asyncio/tasks.py:520: in wait_for
    return await fut
s3fs/core.py:2026: in _bulk_delete
    out = await self._call_s3(
s3fs/core.py:371: in _call_s3
    return await _error_wrapper(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = <bound method ClientCreator._create_api_method.<locals>._api_call of <aiobotocore.client.S3 object at 0x7f31a4d940b0>>

    async def _error_wrapper(func, *, args=(), kwargs=None, retries):
        if kwargs is None:
            kwargs = {}
        for i in range(retries):
            try:
                return await func(*args, **kwargs)
            except S3_RETRYABLE_ERRORS as e:
                err = e
                logger.debug("Retryable error: %s", e)
                await asyncio.sleep(min(1.7**i * 0.1, 15))
            except ClientError as e:
                logger.debug("Client error (maybe retryable): %s", e)
                err = e
                wait_time = min(1.7**i * 0.1, 15)
                if "SlowDown" in str(e):
                    await asyncio.sleep(wait_time)
                elif "reduce your request rate" in str(e):
                    await asyncio.sleep(wait_time)
                elif "XAmzContentSHA256Mismatch" in str(e):
                    await asyncio.sleep(wait_time)
                else:
                    break
            except Exception as e:
                logger.debug("Nonretryable error: %s", e)
                err = e
                break
    
        if "'coroutine'" in str(err):
            # aiobotocore internal error - fetch original botocore error
            tb = err.__traceback__
            while tb.tb_next:
                tb = tb.tb_next
            try:
                await tb.tb_frame.f_locals["response"]
            except Exception as e:
                err = e
        err = translate_boto_error(err)
>       raise err
E       OSError: [Errno 5] An error occurred (MissingContentMD5) when calling the DeleteObjects operation: Missing required header for this request: Content-Md5.

s3fs/core.py:146: OSError

@pitrou
Member

pitrou commented Jan 20, 2025

So, it's a bit weird that this is surfaced by a boto3 upgrade, but this is a Minio vs. AWS SDK compatibility issue, see upstream aws/aws-sdk-java-v2#5805 and minio/minio#20845

According to minio/minio#20845 (comment) , Minio will be fixed to handle new AWS SDK behavior.
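For anyone hitting this outside the test suite: the 1.36-era SDKs document settings to revert to the previous checksum behavior. A hedged sketch of the shared AWS config opting out, using the documented `request_checksum_calculation` / `response_checksum_validation` keys (verify against your SDK version before relying on this):

```ini
# ~/.aws/config — possible client-side workaround until Minio is fixed
[default]
request_checksum_calculation = when_required
response_checksum_validation = when_required
```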

@h-vetinari
Contributor Author

So in other words, this issue ends up being a consequence of #45304, which is itself a consequence of the minio incompatibility. Once a compatible minio is out, things should go back to normal...

@pitrou
Member

pitrou commented Jan 20, 2025

No, it seems to me that #45304 is a different error. It occurs on a different request (PutObject vs. DeleteObjects) and the error message is different. So we'll have to investigate that one separately.

pitrou added a commit to pitrou/arrow that referenced this issue Jan 20, 2025
pitrou added a commit that referenced this issue Jan 20, 2025
Until Minio gets fixed.

* GitHub Issue: #45305

Authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
@pitrou pitrou added this to the 20.0.0 milestone Jan 20, 2025
@pitrou
Member

pitrou commented Jan 20, 2025

Issue resolved by pull request #45311

@pitrou pitrou closed this as completed Jan 20, 2025
@h-vetinari
Contributor Author

h-vetinari commented Jan 20, 2025

So #45311 works around this on main, but we still have to decide how to handle the maintenance branches:

One question from the feedstock's POV is whether the breakage is bad enough to encode boto3 <1.36 in the package metadata itself (until this issue is fixed), or whether to just add that constraint to the test requirements. That decision also needs to be made for all maintenance branches (not necessarily in the same way as for main), which are equally affected.

@pitrou
Member

pitrou commented Jan 20, 2025

This is a problem in Minio together with boto3. Most users of PyArrow:

  • will use PyArrow for S3 access, not boto3
  • will not use Minio

So it would be bizarre to add a boto3 version cap to the PyArrow recipe.

@h-vetinari
Contributor Author

Actually, even for main, the issue should IMO stay open, until

# Not a direct dependency of s3fs, but needed for our s3fs fixture
# (temporary upper bound because of GH-45305)
boto3<1.36

can be removed. I mean, I don't mind opening a separate issue, but this has all the required context already - could someone reopen?

@pitrou
Member

pitrou commented Jan 20, 2025

Well, let's reopen then.

@pitrou pitrou reopened this Jan 20, 2025
@h-vetinari
Contributor Author

So it would be bizarre to add a boto3 version cap to the PyArrow recipe.

conda has a concept of run_constrained, where the dependency does not get installed when installing pyarrow, but if it is present, the solver has to abide by the constraint. That would be an option for older packages (or we just ignore this corner case and make our CI green by adding boto3 <1.36 only to the test requirements).
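As a sketch, a run_constrained entry in a conda recipe would look like this (hypothetical meta.yaml fragment for illustration, not the actual pyarrow recipe):

```yaml
# meta.yaml fragment (illustrative)
requirements:
  run:
    - python
  run_constrained:
    # boto3 is never pulled in by pyarrow itself, but if the user
    # installs it, the solver must keep it below 1.36
    - boto3 <1.36
```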

@pitrou
Member

pitrou commented Jan 20, 2025

Well, the test suite here is failing because of an incompatibility between two third-party packages: boto3 and Minio. If Minio is packaged in conda-forge, perhaps its recipe can use the proposed run_constrained :)

@bollard

bollard commented Jan 20, 2025

FYI cross posting for visibility as I do indeed use PyArrow, S3FS (i.e. boto3) and Minio: fsspec/s3fs#931
