-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support different formats as outputs and integrate with Explorer #40
Conversation
The idea is to make easier to compose the query based on options.
The "UNLOAD" command allows us to specify the format of the output, so we can tell Athena to save our results as Parquet. This will be useful for loading them with Explorer.
Using decode_body: false is going to return the output location only.
req = Req.new(http_errors: :raise) |> ReqAthena.attach(opts) | ||
response = Req.post!(req, athena: @create_table) | ||
result = response.body | ||
# TODO: check why it's not working only with "workgroup" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed to ship a new version or we can address it later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure yet. I ran the version from main
and the error is the same:
Response body: "{\"__type\":\"InvalidRequestException\",\"AthenaErrorCode\":\"INVALID_INPUT\",\"ErrorCode\":\"INVALID_INPUT\",\"Message\":\"No output location provided. An output location is required either through the Workgroup result configuration setting or as an API input.\"}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is happening on main, I will ignore it for now.
Thanks for the reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I left some minor comments and one big one which is out of the scope of this PR.
@@ -19,8 +21,13 @@ defmodule ReqAthena do | |||
athena | |||
output_location | |||
cache_query | |||
format | |||
decode_body | |||
output_compression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following is out of the scope of this PR. Sorry I hijack this PR but I have a Req design question that this PR is I think perfect example to showcase and I also have a design question about ReqAthena as a whole.
The full list of options is the following:
access_key_id
secret_access_key
token
workgroup
region
database
athena
output_location
cache_query
format
decode_body
output_compression
That's a lot, isn't it? A vanilla Req already has a ton already:
iex> Req.new().registered_options |> Enum.count()
41
I'm not sure if this would be a huge (if any) improvement but in cases like this I feel like grouping under an a single option name, :athena, could maybe be an improvement:
opts = [
athena: [
access_key_id: System.fetch_env!("AWS_ACCESS_KEY_ID"),
secret_access_key: System.fetch_env!("AWS_SECRET_ACCESS_KEY"),
region: System.fetch_env!("AWS_REGION"),
database: "default"
]
]
req = Req.new() |> ReqAthena.attach(opts)
Req.post!(req, athena: [query: "SELECT * FROM planet"])
(I think I can change Req so it correctly merges these options for :athena key).
I'm not 100%, looking at the snippet above it doesn't feel like an improvement. But the idea is to move towards this across the board, for example instead of:
retry: :transient, retry_delay: 1000, max_retries: 3, retry_log_level: :debug
we could have:
retry: [on: :transient, delay: 1000, max: 3, log_level: :debug]
I actually started with above and then switched to flat options for unrelated reason which I always regretted a bit.
About ReqAthena as a whole, I think this is by far the most complicated Req plugin, by far the most options and the most complex internals (including making more than one request internally). Looking back throughout its history, is the API being Req.attach+Req.post! an asset or a liability? I'm personally not sure. I suppose one benefit is any step or option we set will be used to make all Athena requests? There was a fair amount of activity in ReqS3 recently and handling s3://
scheme happened through Req steps but we also added distinct functions, ReqS3.presign_url/1
and ReqS3.presign_form/1
. The latter is I think a good example that we don't need to shoehorn everything into Req API (steps etc). I think it'd be totally fine for req_x to not be a Req plugin, you know? And so perhaps it'd be more straightforward to instead do:
options = [
aws_access_key_id: ...,
query: ...,
...
]
ReqAthena.query!(options)
and if we want to customise the underlying Req, it is:
req = Req.new()
ReqAthena.query!(req, options)
I haven't fully thought this through but just throwing this out there, again, out of the scope of this PR for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel like grouping under an a single option name, :athena, could maybe be an improvement
I agree with you. I'm afraid one of the options could conflict between plugins.
I think it'd be totally fine for req_x to not be a Req plugin, you know?
Yeah, I see. I think for more "product" (or API) specific things, we may not want to have it as a plugin. Your example with ReqAthena.query!/2
makes a lot of sense to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most plugins, I think it would be beneficial to effectively scope all options, as is the case for retry_delay
, retry_log_level
, and then a nested keyword list just makes it more ergonomic.
I think it'd be totally fine for req_x to not be a Req plugin, you know?
I agree. I think plugins make most sense if they have some level of generality. In case of req_athena, everything is designed towards making the query, so we effectively use req step options as query options, which is additional indirection. One way to look at it would be to ask: once a plugin is attached, what kind of requests do we end up making; and in this case I think the answer is basically Req.post!(req, athena: {"...", []})
.
including making more than one request internally
That alone is a pretty solid argument for ReqAthena.query!
.
One factor that may have contributed to the design, is that the primary use case was the "Database connection" + "SQL query" cells, which requires a separation between "prepare" and "query". If the "connection" options were passed directly to ReqAthena.query!
, this is a bit more tricky (unless the options are a dedicated struct altogether). cc @josevalim
Co-authored-by: Wojtek Mach <[email protected]>
Co-authored-by: Wojtek Mach <[email protected]>
This should close #36
req-athena-lb-recording.mp4