Configuring SSH Tunnel to EMR Master Node for Zeppelin in Chrome

I recently needed to set up an SSH tunnel through a proxy so that I could open Apache Zeppelin in my local browser and work with my Apache Spark cluster on Amazon EMR. Amazon’s documentation on getting this running in Chrome is slightly out of date, so I thought I would share the setup instructions here for posterity and my future self.

Set up SSH Tunnel

First, add a host profile to your ssh config, which will by default be located at ~/.ssh/config. Mine looks like this, and it defines the path to my proxy server and my final destination server at AWS:

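Host <tunnel-name>
    # Hop through the proxy server and relay the connection with netcat
    ProxyCommand ssh <proxy server address> exec nc %h %p
    Hostname <AWS address>
    # PEM key used to authenticate against the cluster
    IdentityFile <path to pem file>
    User <user name>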

Replace the text in angle brackets with your own addresses and names. With this configuration in place, you can simply type ssh -ND 8157 <tunnel-name> to open your tunnel connection to AWS. The -N flag tells ssh not to run a remote command, and -D 8157 opens a local SOCKS proxy on port 8157 that forwards traffic through the tunnel. The IdentityFile option should point to the path of the AWS PEM key provided to you by AWS when you created your cluster. You can shorten this command further by adding the following to your shell rc:

alias <alias name>="ssh -ND 8157 <tunnel-name>"

Configure proxy extension

Now, that’s the easy part. Well, it’s all pretty easy, but what threw me was the configuration necessary for the proxy connection via my browser. The Amazon documentation describes using a Chrome extension called FoxyProxy, but it appears this extension is no longer available (at least not for free). There is an alternative extension called SwitchyOmega, but the configuration it requires is not described in the Amazon docs. Configuring this extension lets your browser filter URLs automatically based on text patterns, so that the proxy is used only for domains that match the form of your master node’s public DNS name and all other traffic goes out directly.

So, here’s how to configure it. Install the extension and open its options menu. Delete the default profiles and create a new profile, choosing the PAC profile type. Copy and paste the following code snippet into the PAC Script field. The script sends only URLs that match the AWS wildcard patterns through the proxy on localhost:8157 (the port opened by the tunnel) and leaves everything else as a direct connection.

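// Regex helper kept from the original snippet; the rules below do not call it.
function regExpMatch(url, pattern) {
  try { return new RegExp(pattern).test(url); } catch(ex) { return false; }
}

// Route AWS hostnames through the SOCKS tunnel on port 8157; all else goes direct.
function FindProxyForURL(url, host) {
    // Public DNS names of EC2 instances
    if (shExpMatch(url, "*ec2*.amazonaws.com*")) return 'SOCKS5 localhost:8157';
    // Private DNS names ending in compute.internal
    if (shExpMatch(url, "*.compute.internal*") || shExpMatch(url, "*://compute.internal*")) return 'SOCKS5 localhost:8157';
    // Private DNS names ending in ec2.internal
    if (shExpMatch(url, "*ec2.internal*")) return 'SOCKS5 localhost:8157';
    return 'DIRECT';
}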

Save the profile. Finally, fire up the tunnel using your alias, make sure the new proxy profile is active in the SwitchyOmega extension, and navigate to your AWS master public DNS name on port 8890 (the default for Zeppelin servers). Master DNS names should have the following format: ec2-###-##-##-###.compute-1.amazonaws.com, and you specify the port by appending :8890.
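
If a page does not load, it can help to check the PAC rules outside the browser. The following is a rough sketch for Node.js, not part of the SwitchyOmega setup itself. It assumes a simplified stand-in for the PAC built-in shExpMatch (browsers provide the real one, Node.js does not), and the sample hostnames are made up for illustration.

```javascript
// Simplified stand-in for the PAC built-in shExpMatch (an assumption for
// testing only). It supports just the "*" and "?" wildcards.
function shExpMatch(str, shexp) {
  // Escape regex metacharacters except the two shell wildcards
  const escaped = shexp.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const pattern = "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$";
  return new RegExp(pattern).test(str);
}

// Same rules as the PAC script above
function FindProxyForURL(url, host) {
  if (shExpMatch(url, "*ec2*.amazonaws.com*")) return "SOCKS5 localhost:8157";
  if (shExpMatch(url, "*.compute.internal*") || shExpMatch(url, "*://compute.internal*")) return "SOCKS5 localhost:8157";
  if (shExpMatch(url, "*ec2.internal*")) return "SOCKS5 localhost:8157";
  return "DIRECT";
}

// Hypothetical sample URLs: a master public DNS name, a private DNS name,
// and an unrelated site
const samples = [
  "http://ec2-12-34-56-78.compute-1.amazonaws.com:8890/",
  "http://ip-10-0-0-1.ec2.internal:8088/",
  "https://www.example.com/",
];

for (const url of samples) {
  console.log(url, "->", FindProxyForURL(url, new URL(url).hostname));
}
```

Run with node, the first two sample URLs should report SOCKS5 localhost:8157 and the last one DIRECT. If your own master DNS name comes back as DIRECT, the patterns need adjusting before the browser will send it through the tunnel.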


© 2018. All rights reserved.