Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support different formats as outputs and integrate with Explorer #40

Merged
merged 20 commits into from
Sep 12, 2024

Conversation

philss
Copy link
Contributor

@philss philss commented Sep 10, 2024

This should close #36

req-athena-lb-recording.mp4

@philss philss requested review from josevalim and aleDsz September 10, 2024 13:31
req = Req.new(http_errors: :raise) |> ReqAthena.attach(opts)
response = Req.post!(req, athena: @create_table)
result = response.body
# TODO: check why it's not working only with "workgroup"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed to ship a new version or we can address it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet. I ran the version from main and the error is the same:

Response body: "{\"__type\":\"InvalidRequestException\",\"AthenaErrorCode\":\"INVALID_INPUT\",\"ErrorCode\":\"INVALID_INPUT\",\"Message\":\"No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input.\"}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is happening on main, I will ignore it for now.

@philss
Copy link
Contributor Author

philss commented Sep 10, 2024

Thanks for the reviews!
I'm holding this until I can apply the changes needed for kino_db.

Copy link
Contributor

@wojtekmach wojtekmach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I left some minor comments and one big one which is out of the scope of this PR.

lib/req_athena.ex Outdated Show resolved Hide resolved
lib/req_athena.ex Outdated Show resolved Hide resolved
lib/req_athena/s3.ex Outdated Show resolved Hide resolved
test/req_athena_test.exs Outdated Show resolved Hide resolved
test/req_athena_test.exs Outdated Show resolved Hide resolved
@@ -19,8 +21,13 @@ defmodule ReqAthena do
athena
output_location
cache_query
format
decode_body
output_compression
Copy link
Contributor

@wojtekmach wojtekmach Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is out of the scope of this PR. Sorry I hijack this PR but I have a Req design question that this PR is I think perfect example to showcase and I also have a design question about ReqAthena as a whole.

The full list of options is the following:

    access_key_id
    secret_access_key
    token
    workgroup
    region
    database
    athena
    output_location
    cache_query
    format
    decode_body
    output_compression

That's a lot, isn't it? A vanilla Req already has a ton already:

iex> Req.new().registered_options |> Enum.count()
41

I'm not sure if this would be a huge (if any) improvement but in cases like this I feel like grouping under an a single option name, :athena, could maybe be an improvement:

opts = [
  athena: [
    access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
    secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
    region: System.fetch_env!("AWS_REGION"),
    database: "default"
  ]
]

req = Req.new() |> ReqAthena.attach(opts)
Req.post!(req, athena: [query: "SELECT * FROM planet"])

(I think I can change Req so it correctly merges these options for :athena key).

I'm not 100%, looking at the snippet above it doesn't feel like an improvement. But the idea is to move towards this across the board, for example instead of:

retry: :transient, retry_delay: 1000, max_retries: 3, retry_log_level: :debug

we could have:

retry: [on: :transient, delay: 1000, max: 3, log_level: :debug]

I actually started with above and then switched to flat options for unrelated reason which I always regretted a bit.

About ReqAthena as a whole, I think this is by far the most complicated Req plugin, by far the most options and the most complex internals (including making more than one request internally). Looking back throughout its history, is the API being Req.attach+Req.post! an asset or a liability? I'm personally not sure. I suppose one benefit is any step or option we set will be used to make all Athena requests? There was a fair amount of activity in ReqS3 recently and handling s3:// scheme happened through Req steps but we also added distinct functions, ReqS3.presign_url/1 and ReqS3.presign_form/1. The latter is I think a good example that we don't need to shoehorn everything into Req API (steps etc). I think it'd be totally fine for req_x to not be a Req plugin, you know? And so perhaps it'd be more straightforward to instead do:

options = [
  aws_access_key_id: ...,
  query: ...,
  ...
]

ReqAthena.query!(options)

and if we want to customise the underlying Req, it is:

req = Req.new()
ReqAthena.query!(req, options)

I haven't fully thought this through but just throwing this out there, again, out of the scope of this PR for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel like grouping under an a single option name, :athena, could maybe be an improvement

I agree with you. I'm afraid one of the options could conflict between plugins.

I think it'd be totally fine for req_x to not be a Req plugin, you know?

Yeah, I see. I think for more "product" (or API) specific things, we may not want to have it as a plugin. Your example with ReqAthena.query!/2 makes a lot of sense to me :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most plugins, I think it would be beneficial to effectively scope all options, as is the case for retry_delay, retry_log_level, and then a nested keyword list just makes it more ergonomic.

I think it'd be totally fine for req_x to not be a Req plugin, you know?

I agree. I think plugins make most sense if they have some level of generality. In case of req_athena, everything is designed towards making the query, so we effectively use req step options as query options, which is additional indirection. One way to look at it would be to ask: once a plugin is attached, what kind of requests do we end up making; and in this case I think the answer is basically Req.post!(req, athena: {"...", []}).

including making more than one request internally

That alone is a pretty solid argument for ReqAthena.query!.

One factor that may have contributed to the design, is that the primary use case was the "Database connection" + "SQL query" cells, which requires a separation between "prepare" and "query". If the "connection" options were passed directly to ReqAthena.query!, this is a bit more tricky (unless the options are a dedicated struct altogether). cc @josevalim

philss and others added 2 commits September 11, 2024 12:21
@philss philss merged commit 70828c0 into main Sep 12, 2024
2 checks passed
@philss philss deleted the ps-explorer-integration branch September 12, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider supporting and returning Explorer results directly
5 participants