Installing Hadoop on Mac requires a bit of computer knowledge or else users entangle in a never-ending puzzle of errors. Luckily, you have arrived at the right place. Here I’ll be briefly explaining you How to install Hadoop on Mac. Firstly, I’ll provide needed information about what Hadoop actually is and what it is capable of doing and then we’ll move onward to the simplified step-wise guide for installing Hadoop.
What is Hadoop?
Hadoop is basically an open-source java-based programming software framework for storing data and running applications on clusters of commodity hardware. It provides huge storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs. This makes Hadoop a must have for people meeting any of those requirements. Hadoop is a part of the Apache project sponsored by the Apache Software Foundation. It was invented by computer scientists Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine.
Quick Fun Fact: Name “Hadoop” was taken from the name of a yellow toy elephant owned by the son of one of its inventors, Mr. Cutting.
Why Use Hadoop ?
There are plenty of reasons as why Hadoop is a must have. Nonetheless, the major ones being:
- Hadoop’s distributed computing model processes big data very fast.
- Data and application processing are protected against hardware failure so you don’t have to worry about them.
- Hadoop has the ability to store and process massive amounts of any kind of data in a quick manner.
- It has great flexibility as you don’t have to preprocess data before storing it, unlike traditional databases.
- You can easily grow your system to handle more data simply by adding nodes.
- Hadoop is completely free to use and uses commodity hardware to store large amounts of data.
Components of Hadoop
- H uses Hadoop Common as a kernel to provide the framework’s essential libraries.
- Hadoop Distributed File System (HDFS) is capable of storing data across thousands of commodity servers to achieve high bandwidth between nodes.
- The Hadoop MapReduce provides the programming model used to tackle large distributed data processing, i.e. mapping data and then reducing it to a result.
- Hadoop Yet Another Resource Negotiator (YARN) provides resource management and scheduling for user applications.
These were the major components of Hadoop. Along with these, there are several other projects that can improvise and extend Hadoop’s basic capabilities like Apache Flume, Apache HBase, Cloudera Impala, Apache Oozie, Apache Phoenix, Apache Pig, Apache Sqoop, Apache Spark, Apache Storm and Apache ZooKeeper.
How to Install Hadoop on Mac
Now let’s move further to the procedure of installation of Hadoop on Mac OS X. Installing Hadoop on Mac is not as simple as typing a single line command in Terminal to perform an action. It requires a mix of knowledge, concentration and patience. However, you don’t need to worry about not knowing everything. Just follow the steps that I tell you and you will succeed without a delay.
Guide to Install Hadoop on Mac OS
Run this command before everything in order to check if Java is already installed on your system: $ java –version . If Java is installed, move forward with the guide but if it isn’t, download it from here. Follow these steps accurately in order to install Hadoop on your Mac operating system:
Part-1
Step-1: Firstly, you have to install HomeBrew. You can download and install it from here. Alternatively, you can also install Hadoop by simply pasting the following command in Terminal:
$ ruby -e “$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)”
Step-2: Secondly, you have to install Hadoop. You can do so by pasting the following script in the Terminal:
$ brew install hadoop
Part-2
Step-3: Now you have to configure Hadoop ( Hadoop is installed in the following directory
/usr/local/Cellar/hadoop). Do the following to configure Hadoop:
- Locate hadoop-env.sh at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hadoop-env.sh ( 2.6.0 is the version and it could be different in your case). Now
and
- Then locate Core-site.xml at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/core-site.xml and edit it in the same manner:
<
property
>
<
name
>hadoop.tmp.dir</
name
>
<
value
>/usr/local/Cellar/hadoop/hdfs/tmp</
value
>
<
description
>A base for other temporary directories.</
description
>
</
property
>
<
property
>
<
name
>fs.default.name</
name
>
<
value
>hdfs://localhost:9000</
value
>
</
property
>
- Locate mapred-site.xml at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/mapred-site.xml and edit in the same way:
<
configuration
>
<
property
>
<
name
>mapred.job.tracker</
name
>
<
value
>localhost:9010</
value
>
</
property
>
</
configuration
>
- Then locate hdfs-site.xml at /usr/local/Cellar/hadoop/2.6.0/libexec/etc/hadoop/hdfs-site.xml edit it too:
<
configuration
>
<
property
>
<
name
>dfs.replication</
name
>
<
value
></
value
>
</
property
>
</
configuration
>
- Edit ~/.profile using the editor you like such a vim, etc, and add the following two commands ( ~/.profile may not exist by default):
$ source ~/.profile
- Now you have to execute the following in Terminal to update:
$ source ~/.profile
Step-4: Before using Hadoop, you must formate HDFS. You can do so by using this:
$ hdfs namenode -format
Step-5: Check for the existance of ~/.ssh/id_rsa and the~/.ssh/id_rsa.pub files to verify the existence of ssh localhost keys. If these exist move forward, if they don’t, execute this in Terminal:
$ ssh-keygen -t rsa
Step-6: Enable Remote Login by navigating the following path :“System Preferences” -> “Sharing”. Check “Remote Login” . You can do so by using this:
$ ssh-keygen -t rsa
Step-7: Now you have to Authorize SSH Keys to make the system aware of the keys that will be used so that it accepts login. You can do this by using this:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Part-3
Step-8: Finally, you can try to login by following this:
$ ssh localhost
Last login: Fri Mar 6 20:30:53 2015
$ exit
and start Hadoop using this:
$ hstart
and then, stop Hadoop using:
$ hstop
In this way, you have successfully installed working Hadoop on your Mac. Now you can use it anytime you desire and anywhere you need.
Summary
This was the simplest guide to learn how to install Hadoop on Mac. You now know the basic info, installation, and working of Hadoop. You can also install Hadoop by going through this file.
In this way, you have also learned that you can even do the complex things by paying close attention. Hadoop has a humongous base of possibilities when it comes to data processing and storage, you are now capable of exploring them based on your expertise. Sometimes you may also encounter certain errors while using Hadoop, I will cover them too in the coming topics.Feel free to contact us for any of the issues. If you have faced any issue regarding this topic, feel free to comment down in the comments section below.