Configure Linux: Hadoop & Spark

Daemons everywhere

In order to have fun and close any kind of terminal while still running hadoop, spark, etc. under linux, you should use some sort of background job. I must admit I am not entirely a linux freak, and thus my solutions aren’t the standard ones. While I admire those who are true linux admins, I shudder when confronted with this kind of arcane art. Primarily because I think we have evolved (sorry to all who read this and hate me now) and have far better ways of making things happen. For one, I personally hate that Windows never really cared to provide a neat and easy way to install a service in a cool way. That was true until I came across NSSM (the Non-Sucking Service Manager).

Linux is by far more developed, but it has many “issues” with methodologies that aren’t considered cool anymore. I mean, C++ still has its greatness, but honestly, if you have ever worked with modern languages you are forced to shake your head in disbelief at the atrocities you are facing.

I am getting carried away by ranting, sorry. So I googled a bit and found a nice and easy way to “daemonize” my spark-notebook. It goes like this:

nohup /opt/spark-notebook-0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-parquet/bin/spark-notebook -Dconfig.file=/opt/spark-notebook-0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-parquet/conf/application.conf 0<&- &> /var/opt/spark-notebook-0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-parquet/stdout.log &

Aaaah, isn’t that a beauty? I mean, I could dramatically shorten this line to


nohup $SPARK_NOTEBOOK_HOME/bin/spark-notebook -Dconfig.file=$SPARK_NOTEBOOK_HOME/conf/application.conf 0<&- &> /var$SPARK_NOTEBOOK_HOME/stdout.log &
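For that shorter version to work, the variable has to be set somewhere first, e.g. in `~/.profile`. This is just a sketch; the variable name `SPARK_NOTEBOOK_HOME` is my own convention and the path is the install directory used above:

```shell
# Set once (e.g. in ~/.profile) so every script can refer to the install
# directory without repeating that monstrous versioned path.
export SPARK_NOTEBOOK_HOME=/opt/spark-notebook-0.6.0-scala-2.10.4-spark-1.4.1-hadoop-2.7.1-with-parquet
```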

Did I say anything bad about linux? Forget it! This is great! You might wonder what all this means… well, just visit Stack Overflow.
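Or, to save you the trip: here is the same pattern as a safe dry run, with spark-notebook swapped for a harmless `echo` (and a made-up log path in `/tmp`), so each piece of the incantation is spelled out:

```shell
# A safe dry run of the daemonize pattern with a harmless stand-in command.
nohup echo "pretend I am spark-notebook" 0<&- &> /tmp/daemon-demo.log &
# nohup      : ignore SIGHUP, so closing the terminal won't kill the job
# 0<&-       : close stdin (fd 0); the job can never block waiting for input
# &> file    : send BOTH stdout and stderr to the log (bash shorthand)
# trailing & : run in the background and hand the prompt back immediately

wait                      # let the background job finish
cat /tmp/daemon-demo.log  # the "daemon's" output landed in the log
```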

By the way, to stop this monster: it provides a `RUNNING_PID` file, so you can simply write a little script:

kill $(cat $SPARK_NOTEBOOK_HOME/RUNNING_PID)
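Fleshed out a little, that one-liner becomes a reusable helper. A hedged sketch: `stop_by_pidfile` is my own name, not part of spark-notebook, and it assumes the Play-style `RUNNING_PID` file sits in the install directory:

```shell
# Stop a daemonized app via its PID file (Play apps like spark-notebook
# write RUNNING_PID on startup).
stop_by_pidfile() {
    pid_file="$1"
    if [ -f "$pid_file" ]; then
        kill "$(cat "$pid_file")"   # SIGTERM: ask the app to shut down cleanly
        rm -f "$pid_file"           # a stale file would block the next start
    else
        echo "no PID file at $pid_file -- is the app running?" >&2
        return 1
    fi
}

# Usage, e.g.: stop_by_pidfile "$SPARK_NOTEBOOK_HOME/RUNNING_PID"
```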

