
Working with JSON in Scala using the json4s library (Part one).

In this brilliant article, you can find a comparison of Scala JSON libraries in terms of parsing speed. One of the best results was achieved by the json4s library. In this first part I will describe the library and its main functions, while in the second part I’ll go deeper, showing some more detailed examples. As usual, let’s create a Maven Scala project with Eclipse, adding the following dependency to the Maven pom.xml file:

<dependency>
  <groupId>org.json4s</groupId>
  <artifactId>json4s-native_${scala.version}</artifactId>
  <version>3.2.10</version>
</dependency>

Substitute ${scala.version} with your version of Scala (2.10 for example). If you don’t know how to create a Maven project with Scala in Eclipse, follow this article (just the first part, which shows how to set up Eclipse with the Scala plugin). At the time of writing I’ve found some problems with version 3.2.11 (the latest one), but the previous one works smoothly. Now let’s create a Scala object with a main function to run:
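
For example, the property can be declared once in the pom.xml, so every artifact that needs it picks it up from there (the value 2.10 below is only illustrative; set it to the Scala version you actually use):

<properties>
  <!-- Example value only: use your own Scala version here -->
  <scala.version>2.10</scala.version>
</properties>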

package com.nosqlnocry.test
import org.json4s._
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
object Json4sTest {
  def main(args: Array[String]) {
    ...
  }
}

Before starting, we have to take a look at how the json4s library models JSON. Looking at the box below, we can see that it uses an AST (Abstract Syntax Tree). …Continue reading →
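
As a minimal taste of that AST, here is a sketch (the JSON content is just an example) of parsing a string into a tree of JValue nodes and selecting a field:

package com.nosqlnocry.test
import org.json4s._
import org.json4s.jackson.JsonMethods._
object Json4sTest {
  def main(args: Array[String]) {
    // parse() builds the AST: a tree of JValue nodes (JObject, JField, JString, JInt, ...)
    val json: JValue = parse("""{ "name": "joe", "age": 15 }""")
    // The \ operator selects a field inside the tree
    println(json \ "name")          // prints: JString(joe)
    // render() + compact() turn the AST back into a JSON string
    println(compact(render(json)))  // prints: {"name":"joe","age":15}
  }
}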


Setup Eclipse to start developing in Spark Scala and build a fat jar

I suggest two ways to get started developing Spark in Scala, both with Eclipse: one is to download (from the site scala-ide.org) the full pre-configured Eclipse which already includes the Scala IDE; the other consists of updating your existing Eclipse by adding the Scala plugin (detailed instructions below). This basically allows you to start Scala projects and run them locally. In either case, at the end of the procedure, in order to start developing in Spark, you have to import into Eclipse, as an “existing Maven project”, a project template (which you can find linked at the bottom of this article).

Now I’ll illustrate how to integrate the Scala plugin into your existing Eclipse installation. In this example I used an Eclipse Kepler EE. From the site http://scala-ide.org/download/current.html copy the link to the latest version for Kepler or, if it is not present, follow the “Older versions” link on the page and choose the right Scala version for you. I copied the link for an older stable version for Scala 2.10.4 (which is the version available in the cluster I’m using at the moment), precisely this: http://download.scala-ide.org/sdk/lithium/e38/scala211/stable/site.

[Screenshot: Scala IDE download page, “Older versions” section]

Make sure you have Java JDK 1.7 installed and that Eclipse is pointing at it. Click on [Window] -> [Preferences] -> (in the left menu) [Java] -> (click on) [Installed JREs] and check that a JDK 1.7 installation is selected. If not, use …Continue reading →

How to build a Spark fat jar in Scala and submit a job

Are you looking for a ready-to-use solution to submit a job in Spark? These are short instructions on how to start creating a Spark Scala project, in order to build a fat jar that can be executed in a Spark environment. I assume you already have Maven (and a Java JDK) and Spark installed (locally or on a real cluster); you can either compile the project from your shell (as I’ll show here) or “import an existing Maven project” with Eclipse and build it from there (read this other article to see how).

Requirements: Maven installation, Spark installation.

Simply download the following Maven project from GitHub: https://github.com/H4ml3t/spark-scala-maven-boilerplate-project

If you have git installed, you can clone the repository:

git clone git@github.com:H4ml3t/spark-scala-maven-boilerplate-project.git
cd spark-scala-maven-boilerplate-project

or, if you don’t have git, download the zip from here: https://github.com/H4ml3t/spark-scala-maven-boilerplate-project/archive/master.zip (to open it use: unzip master.zip)
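
Note that the git@github.com:… address requires an SSH key configured on your GitHub account; if you don’t have one, cloning over HTTPS works as well:

git clone https://github.com/H4ml3t/spark-scala-maven-boilerplate-project.git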

Here is the Maven pom.xml file: …Continue reading →
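
To anticipate the final steps, building and submitting look roughly like the sketch below; the main class and jar names are placeholders, since the real ones depend on what the boilerplate pom defines:

# Build the fat jar (typically produced as a *-jar-with-dependencies.jar by an assembly/shade plugin)
mvn clean package
# Submit it to Spark; --class and the jar path below are hypothetical examples
spark-submit --class com.example.MainExample --master local[2] target/example-jar-with-dependencies.jar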