Thursday, August 18, 2016

Quick 1..2 on how to create an EMR cluster with a custom boostrap action (BA)

This is more a reference for me than for those reading this. I know I'm going to have to do it again in the future and I really don't want to run into the problems I had in getting this to work.

So, the AWS cli command to create the cluster:

myBA='--bootstrap-action Path="s3://< my-s3-bucket >/shell/install_profile.sh"'; aws emr create-cluster --release-label emr-5.0.0 --name testBA-10 --applications {Name="Hive",Name="Spark",Name="Zeppelin",Name="Ganglia"} --ec2-attributes my-Keypair --region us-east-1 --use-default-roles --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge ${myBA}

The environmental variable myBA is the path to the S3 bucket that stores my script for the BA. It's worth noting that the script should exit with a 0 exit value if you don't want your BA to fail. Here's my script: #!/usr/bin/env bash set -x set -e HADOOP_HOME=/home/hadoop # My profile aws s3 cp s3://< my S3 bucket >/AWSStuff/std/myfuncs ${HADOOP_HOME} echo "About to modify my .bashrc login" echo -e "\n. ~/myfuncs\ngetNewProfile\n. .hbw" >> ${HADOOP_HOME}/.bashrc # Install byobu sudo /usr/bin/yum -y --enablerepo=epel install byobu # And we're ready to rock exit 0 Bingo. We're ready to roll.