We are proud to announce that ESG GmbH has extended our engagement at their site in Garching!
Unlike our last project, this one is almost purely research. The question is how we can use big-data technologies to run analytics on car measurement traces and to predict failures or errors.
We are delving into the world of Apache Hadoop, specifically HDFS and YARN. We plan to use Spark to run analytics on the data.
But before we can do anything, we have been asked to plan and install a cluster consisting of 1 NameNode and 3 DataNodes. Each node is a 19″ rack H2 server with about 32 GB of RAM and 8 to 16 Xeon CPU cores, connected by at least 1 Gbit networking.
Before the hardware arrives, as a first test so to speak, we set up 6 desktop office PCs running Oracle VirtualBox with CentOS guests, and used Cloudera’s CDH to set up one NameNode and five DataNodes.
We have had difficulty choosing a workable data format for storing trace data. We are looking into ASCII files that represent the binary data. This is certainly ‘funny’, but it may just be the right thing. On the other hand, we still have to find out how to make HDFS recognize custom line breaks, so that binary files can be split into blocks at record boundaries. How YARN will work with binary files is still an open question. We may have updates next week.
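To make the "ASCII representing binary" idea concrete, here is a minimal sketch in Python. It is not our actual trace format: the record layout (timestamp, signal id, value) and all function names are assumptions made up for illustration. Each binary record is hex-encoded into one line of text, so an ordinary newline becomes a safe record boundary even when the raw bytes happen to contain a newline byte.

```python
import binascii
import io
import struct

# Hypothetical record layout (an assumption, not the real trace format):
# timestamp in microseconds (uint64), signal id (uint16), value (float64).
# "<" selects little-endian with no padding, so every record is 18 bytes.
RECORD = struct.Struct("<QHd")


def encode_record(timestamp_us, signal_id, value):
    """Pack one sample to binary and return it as one hex-encoded ASCII line."""
    raw = RECORD.pack(timestamp_us, signal_id, value)
    return binascii.hexlify(raw).decode("ascii")


def decode_record(line):
    """Inverse of encode_record: hex text back to (timestamp, id, value)."""
    raw = binascii.unhexlify(line.strip())
    return RECORD.unpack(raw)


def write_trace(samples, stream):
    """Write one encoded record per line to a text stream."""
    for timestamp_us, signal_id, value in samples:
        stream.write(encode_record(timestamp_us, signal_id, value) + "\n")


def read_trace(stream):
    """Read all records back, skipping empty lines."""
    return [decode_record(line) for line in stream if line.strip()]


if __name__ == "__main__":
    samples = [(1000, 10, 0.5), (2000, 11, 13.25)]
    buffer = io.StringIO()
    write_trace(samples, buffer)
    buffer.seek(0)
    print(read_trace(buffer))
```

The price is size: hex encoding doubles every record, while Base64 would add roughly a third. For the other route, keeping real binary and teaching the reader about record boundaries, Hadoop's text input format has a `textinputformat.record.delimiter` property that may be worth a look, though we have not tested it yet.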