Apache Hadoop can be installed in three modes: Standalone, Pseudo-Distributed, or Fully-Distributed. This article walks through the steps to install Hadoop on Mac OS X in Standalone mode.
Standalone mode is Hadoop's default installation mode. In this mode, Hadoop is configured to run in a non-distributed fashion as a single Java process.
Though it doesn't simulate a distributed environment, this mode is very useful for debugging.
Java is a prerequisite for installing Hadoop. Hadoop 2.7 and later require at least Java 7 (i.e., Java 1.7), and Hadoop 3.x requires at least Java 8.
Recommended Java versions are listed on the Hadoop Java Versions page.
One can check the installed Java version by running java -version in the terminal.
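As a minimal sketch of that check, the snippet below first confirms that a java binary is on the PATH and then captures its version report. The variable name java_report is only illustrative.

```shell
# Check whether a java binary is on the PATH before asking for its version.
if command -v java >/dev/null 2>&1; then
  # java -version writes its report to stderr, so redirect it to capture it.
  java_report="$(java -version 2>&1)"
else
  java_report="java not found on PATH"
fi
echo "$java_report"
```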
I have Java 1.8 installed on my machine, so the command reported a 1.8 version.
Download And Install Hadoop
Once Java has been installed, download the latest stable binary distribution of Hadoop from the Apache Hadoop site.
At the time of writing, the latest version is Hadoop 3.2.1.
Move the downloaded hadoop-<version>.tar.gz file to the desired location (I want my install to be in the Install/Hadoop dir under the home dir).
Unpack the downloaded file, either by double-clicking the tar.gz file or by running tar xvfz on it.
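The move and unpack steps might look roughly like the sketch below. It assumes the archive was saved to ~/Downloads, which is a guess about the browser's default, and uses the 3.2.1 version mentioned above. Adjust both to match your setup.

```shell
HADOOP_VERSION="3.2.1"                                    # version used in this article
ARCHIVE="$HOME/Downloads/hadoop-$HADOOP_VERSION.tar.gz"   # assumed download location
INSTALL_DIR="$HOME/Install/Hadoop"                        # desired install location

# Create the install directory if it does not exist yet.
mkdir -p "$INSTALL_DIR"

if [ -f "$ARCHIVE" ]; then
  # Move the archive into place, then unpack it inside the install directory.
  mv "$ARCHIVE" "$INSTALL_DIR/"
  tar xvfz "$INSTALL_DIR/hadoop-$HADOOP_VERSION.tar.gz" -C "$INSTALL_DIR"
else
  echo "archive not found: $ARCHIVE"
fi
```

Unpacking this way should leave a hadoop-3.2.1 directory under Install/Hadoop.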
At this stage we have successfully installed Hadoop in Standalone mode. The install location should now contain the directories of the unpacked distribution, such as bin, etc, and share.
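A quick way to sanity-check the unpacked layout is sketched below. The directory names are the ones a Hadoop 3.x binary distribution typically ships with, and the HADOOP_HOME default is an assumption based on the install location used in this article.

```shell
# Assumed install location; adjust to wherever the archive was unpacked.
HADOOP_HOME="${HADOOP_HOME:-$HOME/Install/Hadoop/hadoop-3.2.1}"

# Count how many of the usual top-level directories are absent.
missing=0
for dir in bin etc include lib libexec sbin share; do
  if [ -d "$HADOOP_HOME/$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir"
    missing=$((missing + 1))
  fi
done
echo "$missing expected directories missing under $HADOOP_HOME"
```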
To verify the installation, one can run a sample MapReduce job provided in the install directory over an input directory containing some files.
Once run, the job creates an output directory containing a count of the occurrences of each string matching the regex dfs[a-z.]+
The hadoop jar command runs a program contained in a JAR file. Users can bundle their MapReduce code in a JAR file and execute it using this command.
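The verification run could be sketched as follows, modeled on the standalone example in the Hadoop single-node setup guide. The HADOOP_HOME default and the examples JAR file name assume the 3.2.1 release unpacked under Install/Hadoop, so adjust them for other versions or locations.

```shell
# Assumed install location and examples JAR for the 3.2.1 release.
HADOOP_HOME="${HADOOP_HOME:-$HOME/Install/Hadoop/hadoop-3.2.1}"
EXAMPLES_JAR="share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar"

if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  (
    # Work from the install directory so the relative paths resolve.
    cd "$HADOOP_HOME" &&
    # Use the bundled XML config files as sample input.
    mkdir -p input &&
    cp etc/hadoop/*.xml input &&
    # Run the example grep job; the output directory must not exist yet.
    bin/hadoop jar "$EXAMPLES_JAR" grep input output 'dfs[a-z.]+' &&
    # Print the matched strings and their counts.
    cat output/*
  )
else
  echo "hadoop not found under $HADOOP_HOME"
fi
```

Hadoop refuses to overwrite an existing output directory, so remove output before running the job a second time.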