Small Volume Hadoop Deployment Guide

To provision our Small Volume Hadoop in your AppAgile PaaS environment, you have to perform a few steps:

  • Make sure your environment is based on OpenShift 3.3 or newer.
  • Create a change request to your SDM (Service Delivery Manager) to push the initial environment.
    • This is necessary because of the complexity of the Hadoop microservice environment. Some initial tasks have to be performed to provide a working project with Hadoop inside. These tasks create a new pod plus a couple of containers with pre-configured environments that provide a ready-to-go, scalable Hadoop environment. We are working hard to automate these steps as well, to provide full self-service capabilities.
  • Capacity recommendation:

    Small Volume initial capacity: capacity for a “complete” environment to start with. The PCU price depends on the environment/cloud used; scale out as needed.

    • Complete environment to start with: 20 PCU, 500 GB storage
    • Slave servers: minimum 2 slave servers, each with at least 4 PCU
    • Master node: 6 PCU, 250 GB persistent storage; add more depending on the use case
    • 2 PCU – 512 GB RAM
    • 2 PCU – 512 MB RAM, 100 GB persistent storage
    • Further services like Spark, Hive and Hue per demand: 2–8 PCU
  • Especially if Spark, which is an in-memory batch processing engine, is to be tested, more memory inside the slaves is helpful.


  • Once you get feedback from your SDM, you can use the following deployment guide to use the environment:



As part of the initial provisioning, your own project “Hadoop Project” has been created. This is just the display name; the actual name of the project is hadoop-xxxxx, where xxxxx = 5 random characters out of a-z, 0-9.

You have to perform all subsequent deployments within this project.

Please don’t delete it in order to clean up for a redeployment (see later chapters). Within this project, security tokens, service accounts and other resources have been created to simplify the deployment for you. E.g. the project contains the secret token information to pull images from AppAgile’s central Docker registry.

If you delete this project, you have to ask the Operating team to create a new one for you.

So when you log in, you’ll see your project. Click on it, press “Add to Project” and enter “appagile” in the filter textbox.
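If you prefer the command line, the login and project selection can be sketched with the oc client; the API URL below is a placeholder for your AppAgile endpoint, and the project name is the generated one described above:

```shell
# Placeholder URL -- use the API endpoint of your AppAgile environment.
oc login https://<appagile-api-url>:8443
# Switch to your generated Hadoop project (display name "Hadoop Project").
oc project hadoop-xxxxx
```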


You first have to deploy Hadoop via the deployer. Press the “Select” button for the “appagile-hadoop-deployer” template.


You’ll be presented with a menu showing lots of parameters for the deployer pod template. Most of them can be left at their defaults.

For a start the most important are:

  • Deploy mode, pod should perform: Leave at “1”. This will deploy the Hadoop containers.
  • Number of Workers: The number of worker containers running Hadoop DataNodes and Spark workers.
  • Persistent Storage: For the master, worker and Hive DB containers you can specify the amount of persistent storage to be used.
  • Storage class: If your OpenShift environment supports automatic provisioning of storage, you can enter a storage class; otherwise leave it empty.
  • CPU and Memory Limits: Change these only if the defaults are not sufficient or you have been told otherwise.

Scroll to the bottom and press “Create”. The deployer will start as an Openshift Pod and will create resources for you.

Comment: OTC-Storage class: appao3-prod-sas
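The same deployment can be sketched with the oc client. The parameter names below (DEPLOY_MODE, WORKER_COUNT, STORAGE_CLASS) are assumptions for the labels shown in the web form; list the real ones first before using them:

```shell
# Sketch only: DEPLOY_MODE, WORKER_COUNT and STORAGE_CLASS are assumed
# parameter names -- list the real ones with:
#   oc process --parameters appagile-hadoop-deployer -n <your project>
oc new-app --template=appagile-hadoop-deployer \
  -p DEPLOY_MODE=1 \
  -p WORKER_COUNT=2 \
  -p STORAGE_CLASS=appao3-prod-sas
```

The storage class value is the OTC class from the comment above; on other clouds, use the class your environment provides or leave it empty.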



For the Hue deployment, a template has been provided in your Hadoop Project. So in the OpenShift web console, press “Add to Project”, enter “appagile” into the filter textbox and select the “appagile-hue” template.

You’ll be prompted with lots of parameters. Again, the most important are:

  • HOSTNAME_HUE_HTTP: The hostname of the Hue http(s) OpenShift route. If left empty, a default will be created.
  • Storage class for Database claim: If your OpenShift environment supports automatic provisioning of storage, you can enter a storage class; otherwise leave it empty.
  • HUE_APP_BLACKLIST: The apps that are not shown in Hue.

Scroll to the end and press “Create”. The template will then start deploying resources.

Comment: OTC-Storage class: appao3-prod-sas
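The Hue deployment can likewise be sketched via the oc client. HOSTNAME_HUE_HTTP and HUE_APP_BLACKLIST are the template parameters described above; the hostname and blacklist values here are placeholders only:

```shell
# Parameters not passed keep their template defaults
# (e.g. an empty HOSTNAME_HUE_HTTP yields a generated route hostname).
oc new-app --template=appagile-hue \
  -p HOSTNAME_HUE_HTTP=hue.apps.example.com \
  -p HUE_APP_BLACKLIST=impala,security
```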

To remove resources before a redeployment

If you would like to remove the resources, please never delete the hadoop project. This project has been created for you and contains additional information, like secrets to access AppAgile’s central docker registry.

When you delete the project you cannot continue, and you have to contact the Operating team to provide a new project to you!

Hadoop Containers via deployer pod

The hadoop deployer pod is also able to remove all resources related to these containers:

  • Master
  • Dataworkers
  • Hive DB
  • Persistent Volume Claims
  • ConfigMap with Hive DB credentials

To proceed, just run the hadoop deployer on the AppAgile Openshift Console as you did before to install the containers. But use a different Deploy Mode:


  • 3 to delete all resources

There is also a Deploy Mode “4”, which leaves the Persistent Volume Claims in place. But this is only recommended if you have been explicitly asked to use it.

  1. So enter “3” in “Deploy mode, pod should perform”. All other parameters are ignored.
  2. Scroll down and press the “Create” button. All resources labeled as “…” will be removed.

Hue Containers via web console

Hue is installed via an OpenShift template, so you have to remove its resources manually.

Deployment Configurations



  1. On the Openshift console, select “Applications/Deployments” on the left panel
  2. Double-click into the “filter by label” box and select
    1. app
    2. in
    3. hadoop-hue-quickstart
  3. You will see two deployment configurations
    1. appagile-hue
    2. mysql-hue
  4. For each of the two above:
    1. Press on the link name of the deployment, and
    2. on the next page on the right side press “Actions>Delete”
    3. on the pop up box confirm delete.
    4. Your deployment configuration is marked for deletion.
    5. You will be navigated back to the deployments overview
    6. You might have to select the filter again to find the next deployment



  1. In the same way as the deployments, delete the two services
    1. appagile-hue
    2. mysql-hue
  2. Select “Applications>Services”
  3. Filter for “app in hadoop-hue-quickstart”
  4. Delete each service by pressing its link, then “Actions>Delete” on the service overview page, and confirm the delete

Image Stream


Delete the image stream “appagile-hue”.

  1. Select “Builds>Images” on the left menu panel
  2. Filter for “app in hadoop-hue-quickstart”
  3. Delete the image stream by selecting “appagile-hue”, pressing “Actions>Delete” on its overview page, and confirming the delete




There is just one route you also have to delete:

  1. Select “Applications>Route”
  2. Filter for “app in hadoop-hue-quickstart”
  3. Select the Route “hue” and on the overview page select “Actions>Delete” and confirm delete on the popup.

Storage Claim


For the Hue MySQL DB a persistent volume claim has been created. Delete it as follows:

  1. Select “Applications>Storage”
  2. Filter for “app in hadoop-hue-quickstart”
  3. Select the claim link “mysql-hue-claim” and again on the overview page delete via “Actions>Delete” and confirm on the popup

Replication Controller

The OpenShift web console does not perform any cascading deletes for deployment configs and replication controllers. So in the steps above, only the deployment configs have been deleted.

For this task you might need some background on deployment configs and replication controllers:

For every deployment configuration, and for every change to it, a new replication controller is created.

It is assumed that you have made no changes to the existing deployment configurations, so the replication controllers are the ones related to the first deployment.

  1. So first scale down the remaining pods. Select “Overview” on the left menu panel. You will see replication controllers for:
    1. mysql-hue-1
    2. appagile-hue-1
  2. Every replication controller has a Pod (described as a circle and the number of replicas). So for every replication controller
    1. press the down arrow right beside the pod circle (“scale down”) until the pod’s replica count is 0
    2. Select the replication controller link and on the overview page select “Actions>Delete” and confirm
    3. You’ll get an error message, because the related deployment configuration has already been deleted in a previous step
  3. If done for every replication controller, switch back to the Overview page.
    1. The named replication controllers should now be removed
    2. Press “Applications>Pods”.
    3. You should see no pods for hue and hue mysql
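The scale-down and delete steps above can also be sketched with the oc client; the replication controller names assume the first deployment (“-1” suffix), as stated earlier:

```shell
# Scale each remaining replication controller down to 0 pods, then delete it.
for rc in mysql-hue-1 appagile-hue-1; do
  oc scale rc "$rc" --replicas=0 -n <your project>
  oc delete rc "$rc" -n <your project>
done
```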

Remove Hue via oc client

As described above, the web console cannot perform cascading deletes for deployment configs and replication controllers.

If you would like to remove hue via the oc client:

  1. oc login…
  2. Use oc projects to list your projects and find the hadoop project, then switch to it with oc project <name>
  3. Please ensure that you have the right project name because you are now deleting resources by label
  4. Delete all resources: oc delete all -l “app=hadoop-hue-quickstart” -n <your project>
    $ oc delete all -l "app=hadoop-hue-quickstart" -n <your project>
    imagestream "appagile-hue" deleted
    deploymentconfig "appagile-hue" deleted
    deploymentconfig "mysql-hue" deleted
    route "hue" deleted
    service "hadoop-hue" deleted
    service "mysql-hue" deleted


  5. For safety, “oc delete all” does not include volume claims; delete them separately: oc delete pvc -l “app=hadoop-hue-quickstart” -n <your project>
    $ oc delete pvc -l "app=hadoop-hue-quickstart" -n <your project>
    persistentvolumeclaim "mysql-hue-claim" deleted
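After both delete commands, a quick verification sketch that nothing labeled for Hue remains:

```shell
# Both commands should report no resources once cleanup is complete.
oc get all -l "app=hadoop-hue-quickstart" -n <your project>
oc get pvc -l "app=hadoop-hue-quickstart" -n <your project>
```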