
Describe the problem

I'm currently populating a Hive metastore with Delta tables created by Databricks.

These tables have table properties, including Databricks-specific ones such as delta.autoOptimize.autoCompact, which open-source Delta can be told to accept via the spark.databricks.delta.allowArbitraryProperties.enabled configuration since Delta 2.0.0. So far so good!
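
For context, a minimal sketch of the session setup this relies on (the extension/catalog settings are the standard open-source Delta configuration; the app name and master are placeholders, not the actual job):

    import org.apache.spark.sql.SparkSession

    // Standard open-source Delta session, plus the flag mentioned above so that
    // Databricks-specific table properties are not rejected as unrecognized.
    val spark = SparkSession.builder()
      .appName("populate-hive-metastore")   // placeholder
      .master("local[*]")                   // placeholder
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .config("spark.databricks.delta.allowArbitraryProperties.enabled", "true")
      .getOrCreate()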

Unfortunately, when trying to create the table in the metastore, I get the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
== Specified ==
delta.autooptimize.autocompact=true
redacted.public=false
== Existing ==
delta.autoOptimize.autoCompact=true
redacted.public=false

This happens because the Delta code treats the property keys as case-insensitive (the specified keys end up lower-cased, as the error output above shows), but it reads the existing properties into a plain Map and compares the two in a case-sensitive fashion.
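
A hypothetical illustration of the mismatch (not the actual Delta implementation): with the two property maps from the error message, plain Map equality fails, while a comparison that normalizes key case would succeed.

    // Not the actual Delta code: just showing why a plain Map comparison of
    // the two property sets fails, and how a case-insensitive key comparison
    // would behave instead.
    def propertiesMatch(
        specified: Map[String, String],
        existing: Map[String, String]): Boolean = {
      def lowerKeys(m: Map[String, String]): Map[String, String] =
        m.map { case (k, v) => k.toLowerCase -> v }
      lowerKeys(specified) == lowerKeys(existing)
    }

    val specified = Map(
      "delta.autooptimize.autocompact" -> "true",
      "redacted.public" -> "false")
    val existing = Map(
      "delta.autoOptimize.autoCompact" -> "true",
      "redacted.public" -> "false")

    println(specified == existing)                // false: keys differ only in case
    println(propertiesMatch(specified, existing)) // true: case-insensitive match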

Steps to reproduce

  • Create a Databricks Delta table with table properties whose keys include upper-case letters
  • Try to register/read the table with open-source Delta (a minimal sketch follows this list)
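
A minimal sketch of the registration step, using the io.delta.tables.DeltaTableBuilder API that appears in the traceback below (the table name is a placeholder; the actual call in SparkSessionHelpers.kt may differ):

    import io.delta.tables.DeltaTable

    // Register a table over data already written by Databricks, carrying over
    // a property whose key contains upper-case letters. verifyTableMetadata
    // then compares these properties against the ones in the existing log.
    DeltaTable.createIfNotExists(spark)
      .tableName("redacted_table")          // placeholder
      .location("s3a://redacted/redacted/redacted/table/0000")
      .property("delta.autoOptimize.autoCompact", "true")
      .property("redacted.public", "false")
      .execute()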

Observed results

Failing due to case sensitivity on the table properties.

Expected results

Not failing :)

Further details

Full traceback:

    You are setting a property: delta.autooptimize.autocompact that is not recognized by this version of Delta
    Exception in thread "main" org.apache.spark.sql.AnalysisException: The specified properties do not match the existing properties at s3a://redacted/redacted/redacted/table/0000.
    == Specified ==
    delta.autooptimize.autocompact=true
    redacted.public=false
    == Existing ==
    delta.autoOptimize.autoCompact=true
    redacted.public=false
    	at org.apache.spark.sql.delta.DeltaAnalysisException$.apply(DeltaSharedExceptions.scala:57)
    	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException(DeltaErrors.scala:1161)
    	at org.apache.spark.sql.delta.DeltaErrorsBase.createTableWithDifferentPropertiesException$(DeltaErrors.scala:1157)
    	at org.apache.spark.sql.delta.DeltaErrors$.createTableWithDifferentPropertiesException(DeltaErrors.scala:2264)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.verifyTableMetadata(CreateDeltaTableCommand.scala:324)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.createTransactionLogOrVerify$1(CreateDeltaTableCommand.scala:186)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.$anonfun$run$2(CreateDeltaTableCommand.scala:193)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordFrameProfile(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:134)
    	at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
    	at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordOperation(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:133)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:123)
    	at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:111)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.recordDeltaOperation(CreateDeltaTableCommand.scala:49)
    	at org.apache.spark.sql.delta.commands.CreateDeltaTableCommand.run(CreateDeltaTableCommand.scala:110)
    	at org.apache.spark.sql.delta.catalog.DeltaCatalog.org$apache$spark$sql$delta$catalog$DeltaCatalog$$createDeltaTable(DeltaCatalog.scala:163)
    	at org.apache.spark.sql.delta.catalog.DeltaCatalog.createTable(DeltaCatalog.scala:212)
    	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:42)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
    	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
    	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
    	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
    	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106)
    	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93)
    	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:91)
    	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:128)
    	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:133)
    	at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:130)
    	at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:148)
    	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:166)
    	at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73)
    	at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:163)
    	at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101)
    	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
    	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
    	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    	at io.delta.tables.DeltaTableBuilder.execute(DeltaTableBuilder.scala:357)
    	at com.redacted.spark.SparkSessionHelpersKt.createDeltaTable(SparkSessionHelpers.kt:57)
    	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:80)
    	at com.redacted.spark.jobs.schema.MainKt$main$$inlined$withConfiguration$1.invoke(Configuration.kt:40)
    	at com.redacted.spark.jobs.configuration.CliktRun.run(Configuration.kt:49)
    	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
    	at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
    	at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:395)
    	at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:392)
    	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:410)
    	at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:435)
    	at com.redacted.spark.jobs.schema.MainKt.main(Main.kt:21)
    

Environment information

  • Delta Lake version: 2.0.0
  • Spark version: 3.2.1
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
  • (PR coming!)
