
Describe the problem

I'm currently populating a Hive metastore with Delta tables created by Databricks.

These tables have table properties, including Databricks-specific ones such as delta.autoOptimize.autoCompact, which open-source Delta can be told to accept via the spark.databricks.delta.allowArbitraryProperties.enabled configuration since Delta 2.0.0. So far so good!
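
For context, a minimal sketch of the session setup this relies on (the extension/catalog settings are the standard open-source Delta configuration; the app name and master are placeholders, not the actual job):

    import org.apache.spark.sql.SparkSession

    // Standard open-source Delta session, plus the flag mentioned above so that
    // Databricks-specific table properties are not rejected as unrecognized.
    val spark = SparkSession.builder()
      .appName("populate-hive-metastore")   // placeholder
      .master("local[*]")                   // placeholder
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .config("spark.databricks.delta.allowArbitraryProperties.enabled", "true")
      .getOrCreate()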

Unfortunately, when trying to create the table in the metastore, I get the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false
== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false

This happens because the Delta code treats the property keys as case-insensitive (the specified keys end up lower-cased, as the error output above shows), but it reads the existing properties into a plain Map and compares the two in a case-sensitive fashion.
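
A hypothetical illustration of the mismatch (not the actual Delta implementation): with the two property maps from the error message, plain Map equality fails, while a comparison that normalizes key case would succeed.

    // Not the actual Delta code: just showing why a plain Map comparison of
    // the two property sets fails, and how a case-insensitive key comparison
    // would behave instead.
    def propertiesMatch(
        specified: Map[String, String],
        existing: Map[String, String]): Boolean = {
      def lowerKeys(m: Map[String, String]): Map[String, String] =
        m.map { case (k, v) => k.toLowerCase -> v }
      lowerKeys(specified) == lowerKeys(existing)
    }

    val specified = Map(
      "delta.autooptimize.autocompact" -> "true",
      "redacted.public" -> "false")
    val existing = Map(
      "delta.autoOptimize.autoCompact" -> "true",
      "redacted.public" -> "false")

    println(specified == existing)                // false: keys differ only in case
    println(propertiesMatch(specified, existing)) // true: case-insensitive match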

Steps to reproduce

  • Create a Databricks Delta table with table properties whose keys include upper-case letters
  • Try to register/read the table with open-source Delta (a minimal sketch follows this list)
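
A minimal sketch of the registration step, using the io.delta.tables.DeltaTableBuilder API that appears in the traceback below (the table name is a placeholder; the actual call in SparkSessionHelpers.kt may differ):

    import io.delta.tables.DeltaTable

    // Register a table over data already written by Databricks, carrying over
    // a property whose key contains upper-case letters. verifyTableMetadata
    // then compares these properties against the ones in the existing log.
    DeltaTable.createIfNotExists(spark)
      .tableName("redacted_table")          // placeholder
      .location("s3a://redacted/redacted/redacted/table/0000")
      .property("delta.autoOptimize.autoCompact", "true")
      .property("redacted.public", "false")
      .execute()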

Observed results

Failing due to case sensitivity on the table properties.

Expected results

Not failing :)

Further details

Full traceback:

    You are setting a property: delta.autooptimize.autocompact that is not recognized by this version of Delta
    Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
    == Specified ==
    delta.autooptimize.autocompact=true
    redacted.public=false
    == Existing ==
    delta.autoOptimize.autoCompact=true
    redacted.public=false
    	at org.apache.spark.sql.delta.DeltaAnalysisException$.apply(DeltaSharedExceptions.scala:57)
    	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException(DeltaErrors.scala:1161)
    	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException$(DeltaErrors.scala:1157)
    	at org.apache.spark.sql.delta.DeltaErrors$.createTableWithDifferentPropertiesException(DeltaErrors.scala:2264)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.verifyTableMetadata(CreateDeltaTableCommand.scala:324)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.createTransactionLogOrVerify$1(CreateDeltaTableCommand.scala:186)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.$anonfun$run$2(CreateDeltaTableCommand.scala:193)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordFrameProfile(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134)
    	at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    	at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:110)
    	at org.apache.spark.sql.delta.catalog.DeltaCatalog.org$apache$spark$sql$delta$catalog$DeltaCatalog$$createDeltaTable(DeltaCatalog.scala:163)
    	at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:212)
    	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:42)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
    	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
    	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
    	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
    	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:133)
    	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
    	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
    	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
    	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
    	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
    	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101)
    	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    	at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:357)
    	at com.redacted.spark.SparkSessionHelpersKt.createDeltaTable(SparkSessionHelpers.kt:57)
    	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:80)
    	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:40)
    	at com.redacted.spark.jobs.configuration.CliktRun.run(Configuration.kt:49)
    	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
    	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
    	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:395)
    	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:392)
    	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:410)
    	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:435)
    	at com.redacted.spark.jobs.schema.MainKt.main(Main.kt:21)
    

Environment information

  • Delta Lake version: 2.0.0
  • Spark version: 3.2.1
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
  • (PR coming!)
