How I want to use Apache ZooKeeper

First off, why am I looking into Apache ZooKeeper? What is that anyway?

My problem is managing my Hadoop & Spark installation. My goal is to set up a quickly deployable Hadoop cluster with Spark running in YARN client mode, so that I can work on my data analysis using spark-notebook. Okay, that sounds easy enough (well, once you’ve got those details down!). However, I forgot to mention that by ‘quickly deployable’ I mean an Amazon AWS EC2 cluster that should spawn quickly, which means in under 5 minutes.

I can set up all of this really quickly, since once you get how Hadoop works, it really is a piece of cake. I wonder, however, if there is a more elegant way. I usually do things differently from what the “smart folks” would suggest. The “smart folks” really are smart: they do only what is necessary and so reach their goal quickly and well. But I am not one of those. I need to *understand*, and oh boy, that may take a while! By “understand”, though, I usually don’t mean that I can explain everything in every detail; I mean that I start feeling comfortable.

So first I start using Hadoop, understand the concepts, install it like 7 times from scratch and go every mile that needs to be gone. Then I start thinking about how I could get this done more elegantly. The “smart folks” would just read a magazine or two, understand that it is completely ridiculous to do all that stuff 7 times, and go directly to the right technology. But if you aren’t one of those, you might find this helpful!


To be continued...
