Support different formats as outputs and integrate with Explorer #40

philss · 2024-09-10T13:31:02Z

This should close #36

req-athena-lb-recording.mp4

The idea is to make it easier to compose the query based on options.

The "UNLOAD" command allows us to specify the format of the output, so we can tell Athena to save our results as Parquet. This will be useful for loading them with Explorer.
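To illustrate the idea, here is a minimal sketch of wrapping a SELECT in Athena's UNLOAD statement. It is not ReqAthena's actual query builder: the `UnloadSketch` module, its `build/3` function, the option names, and the bucket path are all hypothetical, and the sketch does no escaping of its inputs.

```elixir
defmodule UnloadSketch do
  @moduledoc false

  # Hypothetical helper, not part of ReqAthena.
  # Athena's UNLOAD accepts these format names (among others, such as ORC and AVRO).
  @formats %{parquet: "PARQUET", json: "JSON", textfile: "TEXTFILE"}

  # Wraps `select` in an UNLOAD statement that writes its results to `location`.
  # Inputs are interpolated as-is; a real builder would need to escape them.
  def build(select, location, opts \\ []) do
    format = Map.fetch!(@formats, Keyword.get(opts, :format, :parquet))

    properties =
      [{"format", format}, {"compression", Keyword.get(opts, :compression)}]
      # Drop properties the caller did not set.
      |> Enum.reject(fn {_name, value} -> is_nil(value) end)
      |> Enum.map_join(", ", fn {name, value} -> "#{name} = '#{value}'" end)

    "UNLOAD (#{select}) TO '#{location}' WITH (#{properties})"
  end
end

# The bucket and prefix below are placeholders.
IO.puts(UnloadSketch.build("SELECT * FROM planet", "s3://my-bucket/results/", compression: "gzip"))
```

With Parquet as the output format, the files written to the S3 location can then be read back as a data frame.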

See: #36

Using decode_body: false is going to return the output location only.

aleDsz · 2024-09-10T14:01:32Z

test/integration_test.exs

-    req = Req.new(http_errors: :raise) |> ReqAthena.attach(opts)
-    response = Req.post!(req, athena: @create_table)
-    result = response.body
+  # TODO: check why it's not working only with "workgroup"


Is this needed to ship a new version, or can we address it later?

I'm not sure yet. I ran the version from main and the error is the same:

Response body: "{\"__type\":\"InvalidRequestException\",\"AthenaErrorCode\":\"INVALID_INPUT\",\"ErrorCode\":\"INVALID_INPUT\",\"Message\":\"No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input.\"}"

Since this is happening on main, I will ignore it for now.

philss · 2024-09-10T23:50:00Z

Thanks for the reviews!
I'm holding this until I can apply the changes needed for kino_db.

wojtekmach

Looks good to me. I left some minor comments and one big one which is out of the scope of this PR.

lib/req_athena.ex

lib/req_athena/s3.ex

test/req_athena_test.exs

wojtekmach · 2024-09-11T08:51:44Z

lib/req_athena.ex

@@ -19,8 +21,13 @@ defmodule ReqAthena do
    athena
    output_location
    cache_query
+    format
+    decode_body
+    output_compression


The following is out of the scope of this PR. Sorry to hijack it, but I have a Req design question that I think this PR showcases perfectly, and I also have a design question about ReqAthena as a whole.

The full list of options is the following:

    access_key_id
    secret_access_key
    token
    workgroup
    region
    database
    athena
    output_location
    cache_query
    format
    decode_body
    output_compression

That's a lot, isn't it? A vanilla Req has a ton already:

    iex> Req.new().registered_options |> Enum.count()
    41

I'm not sure if this would be a huge (if any) improvement, but in cases like this I feel like grouping under a single option name, :athena, could maybe be an improvement:

    opts = [
      athena: [
        access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
        secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
        region: System.fetch_env!("AWS_REGION"),
        database: "default"
      ]
    ]

    req = Req.new() |> ReqAthena.attach(opts)
    Req.post!(req, athena: [query: "SELECT * FROM planet"])

(I think I can change Req so it correctly merges these options for :athena key).
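As a rough illustration of what merging nested options could look like, the sketch below combines attach-time and request-time keyword lists so that a nested list such as `:athena` is merged key by key rather than replaced. This is not how Req merges options today; `NestedOptionsSketch` and its `merge/2` are hypothetical names, and only `Keyword.merge/3` and `Keyword.keyword?/1` come from Elixir's standard library.

```elixir
defmodule NestedOptionsSketch do
  @moduledoc false

  # Hypothetical helper, not part of Req.
  # Merges `overrides` into `base`. When both sides hold a keyword list under
  # the same key, the lists are merged recursively; otherwise the override wins.
  def merge(base, overrides) do
    Keyword.merge(base, overrides, fn _key, left, right ->
      if Keyword.keyword?(left) and Keyword.keyword?(right) do
        merge(left, right)
      else
        right
      end
    end)
  end
end

# Options given when attaching the plugin (values are placeholders).
attached = [athena: [region: "us-east-1", database: "default"]]
# Options given for a single request.
request = [athena: [query: "SELECT * FROM planet"]]

IO.inspect(NestedOptionsSketch.merge(attached, request))
```

Here the resulting `:athena` list carries the region and database from attach time together with the query from the request, which is the behaviour the grouped-option design would rely on.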

I'm not 100% sure; looking at the snippet above, it doesn't feel like an improvement. But the idea is to move towards this across the board. For example, instead of:

retry: :transient, retry_delay: 1000, max_retries: 3, retry_log_level: :debug

we could have:

retry: [on: :transient, delay: 1000, max: 3, log_level: :debug]

I actually started with the above and then switched to flat options for an unrelated reason, which I have always regretted a bit.

About ReqAthena as a whole: I think this is by far the most complicated Req plugin, with by far the most options and the most complex internals (including making more than one request internally). Looking back through its history, is the API being Req.attach+Req.post! an asset or a liability? I'm personally not sure. I suppose one benefit is that any step or option we set will be used to make all Athena requests? There was a fair amount of activity in ReqS3 recently: handling the s3:// scheme happened through Req steps, but we also added distinct functions, ReqS3.presign_url/1 and ReqS3.presign_form/1. The latter is, I think, a good example that we don't need to shoehorn everything into the Req API (steps etc.). I think it'd be totally fine for req_x to not be a Req plugin, you know? And so perhaps it'd be more straightforward to instead do:

    options = [
      aws_access_key_id: ...,
      query: ...,
      ...
    ]

    ReqAthena.query!(options)

and if we want to customise the underlying Req, it is:

    req = Req.new()
    ReqAthena.query!(req, options)

I haven't fully thought this through but just throwing this out there, again, out of the scope of this PR for sure.

> feel like grouping under a single option name, :athena, could maybe be an improvement

I agree with you. I'm afraid one of the options could conflict between plugins.

> I think it'd be totally fine for req_x to not be a Req plugin, you know?

Yeah, I see. I think for more "product" (or API) specific things, we may not want to have it as a plugin. Your example with ReqAthena.query!/2 makes a lot of sense to me :)

For most plugins, I think it would be beneficial to effectively scope all options, as is the case for retry_delay, retry_log_level, and then a nested keyword list just makes it more ergonomic.

> I think it'd be totally fine for req_x to not be a Req plugin, you know?

I agree. I think plugins make the most sense if they have some level of generality. In the case of req_athena, everything is designed towards making the query, so we effectively use Req step options as query options, which is additional indirection. One way to look at it would be to ask: once a plugin is attached, what kind of requests do we end up making? In this case I think the answer is basically Req.post!(req, athena: {"...", []}).

> including making more than one request internally

That alone is a pretty solid argument for ReqAthena.query!.

One factor that may have contributed to the design is that the primary use case was the "Database connection" + "SQL query" cells, which require a separation between "prepare" and "query". If the "connection" options were passed directly to ReqAthena.query!, this would be a bit more tricky (unless the options are a dedicated struct altogether). cc @josevalim

Co-authored-by: Wojtek Mach <[email protected]>

philss added 15 commits August 20, 2024 00:00

Refactor handle of query to its own module

283fefd

The idea is to make it easier to compose the query based on options.

Introduce "UNLOAD" command to the query builder

42f1f61

The "UNLOAD" command allows us to specify the format of the output, so we can tell Athena to save our results as Parquet. This will be useful for loading them with Explorer.

Add initial integration with Explorer

670a179

See: #36

WIP: make the default output be the decoded API result

94ef79b

Add tests for both cases that work

dde6c2b

Add "format: :explorer" decoding body to Explorer DFs

3f68631

Add support for the ":csv" format

996706d

Using decode_body: false is going to return the output location only.

Fix supported formats

c66782d

Add JSON format support and fix Explorer result handle

22056ee

Fix docs

5853765

Add test cases for JSON and CSV formats - integration tests

059646c

Refactor to isolate S3 interactions to a module

762443b

Fix integration tests with new responses

5f86fea

Update docs and make Explorer optional

fe5d7b4

Fix readme

888f506

philss requested review from josevalim and aleDsz September 10, 2024 13:31

philss added 2 commits September 10, 2024 10:37

Remove unused Result struct

8dba414

Fix output at README

cb5308c

aleDsz reviewed Sep 10, 2024

aleDsz approved these changes Sep 10, 2024

Hide ReqAthena.S3

1eefe50

josevalim approved these changes Sep 10, 2024

wojtekmach approved these changes Sep 11, 2024

philss and others added 2 commits September 11, 2024 12:21

Apply suggestions from code review

10f7eca

Co-authored-by: Wojtek Mach <[email protected]>

Update test/req_athena_test.exs

30593d5

Co-authored-by: Wojtek Mach <[email protected]>

philss mentioned this pull request Sep 12, 2024

Prepare for ReqAthena v0.2 livebook-dev/kino_db#78

Merged

philss merged commit 70828c0 into main Sep 12, 2024
2 checks passed

philss deleted the ps-explorer-integration branch September 12, 2024 14:53

This was referenced Nov 7, 2024

Support http proxy configuration via env variables livebook-dev/livebook#2850

Merged

Add basic query feature livebook-dev/req_ch#1

Merged

philss Sep 11, 2024

jonatanklosko Sep 23, 2024
