
Bootstrap scripts allow you to install and configure components in Azure HDInsight programmatically.

There are three ways to set configuration file values when your HDInsight cluster is created:

  • Use Azure PowerShell
  • Use .NET SDK
  • Use Azure Resource Manager template

For example, using these programmatic methods, you can configure options in these files:

  • clusterIdentity.xml
  • core-site.xml
  • gateway.xml
  • hbase-env.xml
  • hbase-site.xml
  • hdfs-site.xml
  • hive-env.xml
  • hive-site.xml
  • mapred-site.xml
  • oozie-site.xml
  • oozie-env.xml
  • tez-site.xml
  • webhcat-site.xml
  • yarn-site.xml
  • server.properties (kafka-broker configuration)

    For information on installing more components on an HDInsight cluster at creation time, see Customize HDInsight clusters using Script Action (Linux).

    Prerequisites

  • If using PowerShell, you need the Az Module.

    Use Azure PowerShell

    The following PowerShell code customizes an Apache Hive configuration:

    Important

    The parameter Spark2Defaults may need to be used with Add-AzHDInsightConfigValue. You can pass empty values to the parameter, as shown in the following code example.

    # hive-site.xml configuration
    $hiveConfigValues = @{ "hive.metastore.client.socket.timeout"="90s" }
    $config = New-AzHDInsightClusterConfig `
             -ClusterType "Spark"  `
        | Set-AzHDInsightDefaultStorage `
            -StorageAccountResourceId "$storageAccountResourceId" `
            -StorageAccountKey $defaultStorageAccountKey `
        | Add-AzHDInsightConfigValue `
            -HiveSite $hiveConfigValues `
            -Spark2Defaults @{}
    New-AzHDInsightCluster `
        -ResourceGroupName $resourceGroupName `
        -ClusterName $hdinsightClusterName `
        -Location $location `
        -ClusterSizeInNodes 2 `
        -Version "4.0" `
        -HttpCredential $httpCredential `
        -SshCredential $sshCredential `
        -Config $config
    

    A complete working PowerShell script can be found in the Appendix.

    To verify the change:

  • Navigate to https://CLUSTERNAME.azurehdinsight.cn/ where CLUSTERNAME is the name of your cluster.
  • From the left menu, navigate to Hive > Configs > Advanced.
  • Expand Advanced hive-site.
  • Locate hive.metastore.client.socket.timeout and confirm the value is 90s.
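
As a scripted alternative to the portal steps above, the value can also be read through the Ambari REST API. The following is a minimal sketch and not part of the original procedure. It assumes that `$hdinsightClusterName` and `$httpCredential` hold the cluster name and cluster login used at creation time, that the cluster is reachable at the azurehdinsight.cn endpoint shown above, and that the Ambari cluster name matches `$hdinsightClusterName` exactly (the name in the URI is case-sensitive).

```powershell
# Assumed inputs (not defined here): $hdinsightClusterName, $httpCredential
$baseUri = "https://$hdinsightClusterName.azurehdinsight.cn/api/v1/clusters/$hdinsightClusterName"

# Ask Ambari which configuration tag is currently applied to hive-site
$resp = Invoke-WebRequest `
    -Uri ($baseUri + "?fields=Clusters/desired_configs") `
    -Credential $httpCredential `
    -UseBasicParsing
$desired = $resp.Content | ConvertFrom-Json
$tag = $desired.Clusters.desired_configs.'hive-site'.tag

# Read the hive-site properties stored under that tag
$resp = Invoke-WebRequest `
    -Uri "$baseUri/configurations?type=hive-site&tag=$tag" `
    -Credential $httpCredential `
    -UseBasicParsing
$hiveSite = $resp.Content | ConvertFrom-Json

# Show the value that the bootstrap configuration should have set
$hiveSite.items[0].properties.'hive.metastore.client.socket.timeout'
```

If the bootstrap setting was applied, the last line should print the same value you see in the Advanced hive-site section.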

    Some more samples on customizing other configuration files:

    # hdfs-site.xml configuration
    $HdfsConfigValues = @{ "dfs.blocksize"="64m" } #default is 128MB in HDI 3.0 and 256MB in HDI 2.1
    # core-site.xml configuration
    $CoreConfigValues = @{ "ipc.client.connect.max.retries"="60" } #default 50
    # mapred-site.xml configuration
    $MapRedConfigValues = @{ "mapreduce.task.timeout"="1200000" } #default 600000
    # oozie-site.xml configuration
    $OozieConfigValues = @{ "oozie.service.coord.normal.default.timeout"="150" }  # default 120
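
These hashtables take effect only once they are passed to Add-AzHDInsightConfigValue. A minimal sketch of that wiring follows. The parameter names -Hdfs, -Core, -MapRed, and -OozieSite are assumed here from the Az.HDInsight cmdlet reference, not taken from this article, so confirm them with `Get-Help Add-AzHDInsightConfigValue -Parameter *` before relying on them. The default-storage step from the earlier example is omitted for brevity.

```powershell
# Values from the samples above
$HdfsConfigValues   = @{ "dfs.blocksize" = "64m" }
$CoreConfigValues   = @{ "ipc.client.connect.max.retries" = "60" }
$MapRedConfigValues = @{ "mapreduce.task.timeout" = "1200000" }
$OozieConfigValues  = @{ "oozie.service.coord.normal.default.timeout" = "150" }

# Assumed parameter names: -Hdfs, -Core, -MapRed, -OozieSite
# -Spark2Defaults @{} mirrors the Hive example earlier in this article
$config = New-AzHDInsightClusterConfig -ClusterType "Spark" `
    | Add-AzHDInsightConfigValue `
        -Hdfs $HdfsConfigValues `
        -Core $CoreConfigValues `
        -MapRed $MapRedConfigValues `
        -OozieSite $OozieConfigValues `
        -Spark2Defaults @{}
```

The resulting `$config` object is then passed to New-AzHDInsightCluster through -Config, as in the Hive example.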
    

    Use .NET SDK

    See Azure HDInsight SDK for .NET.

    Use Resource Manager template

    You can use bootstrap in a Resource Manager template:

    "configurations": {
        "hive-site": {
            "hive.metastore.client.connect.retry.delay": "5",
            "hive.execution.engine": "mr",
            "hive.security.authorization.manager": "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider"
        }
    }
    

    The following sample Resource Manager template snippet changes settings in spark2-defaults so that event logs are periodically cleaned up from storage:

    "configurations": {
        "spark2-defaults": {
            "spark.history.fs.cleaner.enabled": "true",
            "spark.history.fs.cleaner.interval": "7d",
            "spark.history.fs.cleaner.maxAge": "90d"
        }
    }
    

    See also

  • Create Apache Hadoop clusters in HDInsight provides instructions on how to create an HDInsight cluster by using other custom options.
  • Develop Script Action scripts for HDInsight
  • Install and use Apache Spark on HDInsight clusters
  • Install and use Apache Giraph on HDInsight clusters

    Appendix: PowerShell sample

    This PowerShell script creates an HDInsight cluster and customizes a Hive setting. Be sure to enter values for $nameToken, $httpPassword, and $sshPassword.

    ####################################
    # Service names and variables
    ####################################
    $nameToken = "<ENTER AN ALIAS>"
    $namePrefix = $nameToken.ToLower() + (Get-Date -Format "MMdd")
    $resourceGroupName = $namePrefix + "rg"
    $hdinsightClusterName = $namePrefix + "hdi"
    $defaultStorageAccountName = $namePrefix + "store"
    $defaultBlobContainerName = $hdinsightClusterName
    $location = "China East"
    ####################################
    # Connect to Azure
    ####################################
    Write-Host "Connecting to your Azure subscription ..." -ForegroundColor Green
    $sub = Get-AzSubscription -ErrorAction SilentlyContinue
    if(-not($sub))
    {
        Connect-AzAccount -Environment AzureChinaCloud
    }
    # If you have multiple subscriptions, set the one to use
    #$context = Get-AzSubscription -SubscriptionId "<subscriptionID>"
    #Set-AzContext $context
    ####################################
    # Create a resource group
    ####################################
    Write-Host "Creating a resource group ..." -ForegroundColor Green
    New-AzResourceGroup `
        -Name  $resourceGroupName `
        -Location $location
    ####################################
    # Create a storage account and container
    ####################################
    Write-Host "Creating the default storage account and default blob container ..."  -ForegroundColor Green
    New-AzStorageAccount `
        -ResourceGroupName $resourceGroupName `
        -Name $defaultStorageAccountName `
        -Location $location `
        -SkuName Standard_LRS `
        -Kind StorageV2 `
        -EnableHttpsTrafficOnly 1
    $defaultStorageAccountKey = (Get-AzStorageAccountKey `
                                    -ResourceGroupName $resourceGroupName `
                                    -Name $defaultStorageAccountName)[0].Value
    $defaultStorageContext = New-AzStorageContext `
                                    -StorageAccountName $defaultStorageAccountName `
                                    -StorageAccountKey $defaultStorageAccountKey
    New-AzStorageContainer `
        -Name $defaultBlobContainerName `
        -Context $defaultStorageContext #use the cluster name as the container name
    ####################################
    # Create a configuration object
    ####################################
    $hiveConfigValues = @{"hive.metastore.client.socket.timeout"="90s"}
    $storageAccountResourceId = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $defaultStorageAccountName).Id
    $config = New-AzHDInsightClusterConfig `
              -ClusterType "Spark"  `
        | Set-AzHDInsightDefaultStorage `
            -StorageAccountResourceId "$storageAccountResourceId" `
            -StorageAccountKey $defaultStorageAccountKey `
        | Add-AzHDInsightConfigValue `
            -HiveSite $hiveConfigValues `
            -Spark2Defaults @{}
    ####################################
    # Set Ambari admin username/password
    ####################################
    $httpUserName = "admin"  #HDInsight cluster username
    $httpPassword = '<ENTER A PASSWORD>'
    $httpPW = ConvertTo-SecureString -String $httpPassword -AsPlainText -Force
    $httpCredential = New-Object System.Management.Automation.PSCredential($httpUserName,$httpPW)
    ####################################
    # Set ssh username/password
    ####################################
    $sshUserName = "sshuser" #HDInsight ssh user name
    $sshPassword = '<ENTER A PASSWORD>'
    $sshPW = ConvertTo-SecureString -String $sshPassword -AsPlainText -Force
    $sshCredential = New-Object System.Management.Automation.PSCredential($sshUserName,$sshPW)
    ####################################
    # Create an HDInsight cluster
    ####################################
    New-AzHDInsightCluster `
        -ResourceGroupName $resourceGroupName `
        -ClusterName $hdinsightClusterName `
        -Location $location `
        -ClusterSizeInNodes 2 `
        -Version "4.0" `
        -HttpCredential $httpCredential `
        -SshCredential $sshCredential `
        -Config $config
    ####################################
    # Verify the cluster
    ####################################
    Get-AzHDInsightCluster `
        -ClusterName $hdinsightClusterName