word count

Hadoop MapReduce wordcount example in Java. Introduction to Hadoop job.

In this article we are going to review the classic Hadoop word count example, customizing it a little bit. As usual, I suggest using Eclipse with Maven to create a project that can be modified, compiled, and easily executed on the cluster. First of all, download the Maven boilerplate project from here: https://github.com/H4ml3t/maven-hadoop-java-wordcount-template

$ git clone git@github.com:H4ml3t/maven-hadoop-java-wordcount-template.git

If you want to compile it directly, you can run:

$ cd maven-hadoop-java-wordcount-template
$ mvn package

The resulting fat jar will be in the target folder, named “maven-hadoop-java-wordcount-template-0.0.1-SNAPSHOT-jar-with-dependencies.jar”.

Alternatively, if you want to modify the code (as we are about to do now), open Eclipse and go to [File] -> [Import] -> [Existing maven project] -> Browse for the directory …Continue reading →


HelloWorld Spark? Smart (selective) wordcount Scala example!

In the previous post I showed how to build a Spark Scala jar and submit a job using spark-submit. Now let’s customize our main Scala Spark object a little. You can find the project for the following example here on GitHub.

Let’s imagine we’ve collected a series of messages about football (tweets or whatever) and we want to count words, though not every word: only those that are of interest. Say we have a “dictionary” of football players’ names, and we want to see which of them appear most often in those messages.

Example time!

Imagine we have a file (called names) with a list of names (one per line):



And in another file (called messages) we have a list of messages:

I’m obviously with Harry Kane (Hurricane) today… Let’s go Tottenham!!!
Kid Kane with Arsenal jersey, lol
Another top save from Sirigu to keep Toulouse out. PSG seconds away from top spot.
Why Sirigu and Verratti are laughing so much?
Francesco Totti Scores With Flying Kung Fu Kick, Celebrates With Selfie
What would Roma do without Totti? See his great goal & celebration HERE! #Totti #Goals
Wenger wasn’t the Arsenal coach when Totti started to play in Serie A

Analyzing these lines should produce (Kane, 2), (Sirigu, 2), (Totti, 3). To achieve this, …continue reading →
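
To make the idea of a selective count concrete, here is a minimal plain-Java sketch. It is not the Scala Spark code from the post: it runs on in-memory collections, and the class name SelectiveWordCount, the method countNames, and the two-name dictionary are all hypothetical. It also assumes that matching is case-sensitive and that a message is split on every character that is not a letter or a digit. How punctuation and hashtags are treated changes the totals, so that tokenization rule is an assumption too.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of a "selective" word count (not the post's Spark job).
class SelectiveWordCount {

    // Counts only the tokens that appear in the dictionary of names.
    static Map<String, Integer> countNames(Set<String> names, List<String> messages) {
        // TreeMap keeps the result sorted by name, which makes the output stable.
        Map<String, Integer> counts = new TreeMap<>();
        for (String message : messages) {
            // Assumption: split on any run of characters that are not letters or digits,
            // so "Sirigu," and "#Sirigu" both yield the token "Sirigu".
            for (String token : message.split("[^\\p{L}\\p{N}]+")) {
                if (names.contains(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; the real names file may contain other entries.
        Set<String> names = new HashSet<>(Arrays.asList("Kane", "Sirigu"));
        List<String> messages = Arrays.asList(
                "Kid Kane with Arsenal jersey, lol",
                "Why Sirigu and Verratti are laughing so much?",
                "Another top save from Sirigu to keep Toulouse out.");
        // Verratti is ignored because it is not in the dictionary.
        System.out.println(countNames(names, messages)); // prints {Kane=1, Sirigu=2}
    }
}
```

In a Spark job the same three steps (tokenize, filter against the dictionary, sum per name) would be expressed as transformations on a distributed collection rather than as nested loops.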