Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upper/Lower Case support for Databricks column names #293

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

diveyseek
Copy link

@diveyseek diveyseek commented Dec 16, 2024

Note

DRAFT: Waiting on additional change to sat_v1 macro to also set case here too

Description

  • Currently all columns default to uppercase, this change allows setting the default case for column names on the Databricks adapter to either upper or lower case based on a variable (Matches Fabric Functionality)

Existing Fabric Functionality - https://github.com/ScalefreeCOM/datavault4dbt/blob/main/macros/internal/metadata_processing/escape_column_names.sql#L171-L187

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

  • Tests you ran

Reproducing the functionality

  • Created standard Databricks model (Staging Table)
  • Run the Model
  • All column will default to be upper case (As expected)
  • Set the datavault4dbt.set_casing to lower
  • Run the Model
  • All columns will now be lower case

Test Configuration:

  • datavault4dbt-Version: 1.8.1
  • dbt-Version: 1.8.8/core
  • dbt-adapter-Version: 1.8.1

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation or included information that needs updates (e.g. in the Wiki)

- Allows setting case default for Databricks adapter (Matches Fabric Functionality)
@diveyseek diveyseek marked this pull request as draft December 16, 2024 23:11
@kafkaguru
Copy link

Hi @diveyseek, anything I can help with on this PR? I'd love to be able to get this in as we are looking just starting out with Databricks and having our column names lowercase would match our requirements nicely. Thanks!

@tkiehn
Copy link
Collaborator

tkiehn commented Jan 7, 2025

Hi @diveyseek and thanks for the contribution!

I see a problem in your implementation tho, if the set_casing variable is neither lower(case) or upper(case) it will return nothing because of a missing else-case.

My suggestion is to use the same logic like in the fabric implementation:
First escape the column name and then conditionally change it to upper or lower using the jinja-filter.

The next step is to ensure the escape_column_names-macro is called everywhere where columns are selected in a model.
Because as of right now this is not the case for all macros.

For a start I would try to add this in the last CTE or SELECT-statement of each macro in the databricks folder, this should be sufficient already.
This could be a great thing to look into for you @Brianbrifri if you want to contribute to the PR.

Feel free to reach out if you have any questions or comments!

Best regards
Theo

@kafkaguru
Copy link

kafkaguru commented Jan 8, 2025

@tkiehn Thanks for the information.
You mentioned that escape_column_names-macro is not called everywhere. Is there a global variable that can set what the escape characters are? I tried setting

  datavault4dbt.escape_char_left: '`'
  datavault4dbt.escape_char_right: '`'

but it didn't work when column names contained # in them.
I also noticed for databricks it defaults to an empty string:
https://github.com/ScalefreeCOM/datavault4dbt/blob/8a738187fa536ec446caf71506547f0ecebac46e/macros/internal/metadata_processing/escape_column_names.sql#L190C1-L199C17
but databricks escape character is a backtick `. Should this change be part of this PR or should there be a different PR addressing this issue?

@tkiehn
Copy link
Collaborator

tkiehn commented Jan 9, 2025

Hi @kafkaguru,

I have reviewed the macro and observed the syntax that was implemented for the variables is missing the datavault4dbt.-prefix, maybe that is the reason why you had no success in setting the escape characters?

You are right that we should add the backtick as default-escape-character on databricks to ensure broader compatibility.

Since we have to do the same thing(ensuring that the escape_column_names-macro is called everywhere where necessary) for both topics(implement set_casing and appropriate escaping) this can happen in the same PR, imho.
What do you think @tkirschke?

Best Regards
Theo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants