Hadoop Cluster Automation with Ansible!!

4 min read · Apr 6, 2021

What is Ansible?

Ansible is an automation tool. What do I mean by automation? It can be infrastructure automation, cloud automation, security automation, application automation, and more. With Ansible you can configure routers, clusters, web servers, applications, Docker, and much else. We may already know how to configure many services or clusters by hand; Ansible lets us apply all of that configuration in a single run. We don't have to tell it how or where to make each change, only what the end result should be. That's what Ansible is.

What is Hadoop?

Apache Hadoop is software that works in a master-slave topology. It addresses big data problems by distributing storage across its slave nodes. The master is known as the NameNode and each slave as a DataNode.

1. NameNode

The NameNode is the centerpiece of the HDFS file system in Hadoop. It keeps the directory tree of all files in the file system and tracks where in the cluster each file's data is stored. It does not store the data of these files itself.
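To make the "metadata only" idea concrete, here is a small Python sketch. It is a toy model, not Hadoop code: the class `ToyNameNode`, the block ids, and the datanode names are all made up for illustration. It only shows the two kinds of bookkeeping described above: which blocks make up a file, and which nodes hold each block.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical model: holds metadata only, never the file bytes."""

    # file path -> ordered list of block ids
    namespace: dict[str, list[str]] = field(default_factory=dict)
    # block id -> names of the datanodes holding a copy
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, blocks: dict[str, set[str]]) -> None:
        """Record a file and the datanodes that hold each of its blocks."""
        self.namespace[path] = list(blocks)
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = set(nodes)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Answer a client lookup: block ids in order, with sorted locations."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]


namenode = ToyNameNode()
namenode.add_file(
    "/logs/app.log",
    {"blk_1": {"datanode1", "datanode2"}, "blk_2": {"datanode2", "datanode3"}},
)
print(namenode.locate("/logs/app.log"))
```

A client asking for `/logs/app.log` gets back a list of block locations and then reads the bytes from the datanodes directly. Nothing in the toy NameNode ever contains file content.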

2. DataNode

DataNodes store the data in a Hadoop cluster; DataNode is also the name of the daemon that manages that data. File data is replicated on multiple DataNodes for reliability and so that computation can run close to the data. Within a cluster, the DataNodes should be uniform.
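Why does replication help reliability? The sketch below is a simplified illustration in Python, not how HDFS actually places blocks (real HDFS uses a rack-aware policy). The function names, node names, and the round-robin placement are assumptions made for this example; the replication factor of 3 matches the HDFS default.

```python
from itertools import cycle, islice


def place_replicas(block_ids, datanodes, replication=3):
    """Assign each block to `replication` distinct datanodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Start one node further along for each block to spread the load.
        placement[block_id] = list(islice(cycle(datanodes), i, i + replication))
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one copy on a live datanode."""
    return all(
        any(node != failed_node for node in nodes) for nodes in placement.values()
    )


placement = place_replicas(
    ["blk_1", "blk_2"], ["dn1", "dn2", "dn3", "dn4"], replication=3
)
print(placement)
print(readable_after_failure(placement, "dn2"))
```

With three copies of every block, losing any single datanode still leaves each block available somewhere else, so the file stays readable.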

💢Task Completion💢

I’m going to create two files: a playbook, and a variable file used for both the master node and the slave nodes. Snapshots of the files and their execution are attached below.