Automatic Repair in Windows 10 Operating System

The Windows 10 operating system sometimes runs into errors while booting. The system may shut down automatically, or the computer may fail to start correctly; in that case, a screen like the following appears when the operating system starts:

Automatic Repair:

Your PC did not start correctly

Press “Restart” to restart your PC, which can sometimes fix the problem. You can also press “Advanced options” to try other options to repair your PC.

Once this issue appears in Windows, here is how to resolve it and boot the Windows machine normally.

Simple Solution:

Step 1: Start your Windows machine; the Automatic Repair screen will appear. Click “Advanced options”.

Step 2: On the next screen, choose the “Troubleshoot” option, which shows more options for the next step.

Step 3: Under Troubleshoot, click “Advanced options” again.

Step 4: Under “Advanced options”, click “Startup Settings” to configure how Windows boots.

Step 5: On the Startup Settings screen, simply click “Restart”.

Step 6: After your Windows 10 machine reboots, the screen shows a list of startup options. Press a number to choose from the options below.

Use number keys or function keys F1 to F9:

1) Enable debugging
2) Enable boot logging
3) Enable low-resolution video
4) Enable Safe Mode
5) Enable Safe Mode with Networking
6) Enable Safe Mode with Command Prompt
7) Disable driver signature enforcement
8) Disable early launch anti-malware protection
9) Disable automatic restart after a failure

Step 7: Press F4 to boot into Safe Mode. In Safe Mode, Windows starts with a minimal set of drivers, and you can open your desktop and use it normally.

Step 8: Press F6 to boot into Safe Mode with Command Prompt, where you can run commands to troubleshoot the machine before opening the desktop.

Summary: Automatic Repair is triggered when the Windows system shuts down unexpectedly, for example during an update, a malware incident, or a software installation. The steps above are a simple way to resolve the Automatic Repair screen in the Windows operating system.

Apache Flume in Hadoop


Apache Flume is a data-ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log files, from various sources to a centralized data store. It is a distributed system that gets logs from their source and aggregates them to where you want to process them. Flume is a highly reliable, distributed, and configurable tool.

Advantages of Flume:

  1. Using Apache Flume we can store the data in any centralized store, such as HDFS or HBase.
  2. Flume provides the feature of contextual routing.
  3. Flume acts as a mediator between data producers and the centralized stores and provides a basic flow of data between them.

Features of Flume :

  • Using Flume, we can get data from multiple servers into Hadoop immediately.
  • Apache Flume supports a large set of source and destination types.
  • Flume supports multi-hop flows, contextual routing, etc.
  • Flume can be scaled horizontally.

Core Concepts in Flume :


Event:

An Event is the fundamental unit of data transported by Flume from its point of origination to its final destination.

Headers are specified as an unordered collection of string key-value pairs. Headers are used for contextual routing.


Client:

A client is an entity that generates events and sends them to one or more Agents.


Agent:

An Agent is a container for hosting sources, channels, sinks, and other components that enable the transportation of events from one place to another.
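In practice, these components are wired together in an agent configuration file. Below is a minimal sketch; the agent name, component names, file paths, and HDFS location here are illustrative assumptions, not taken from the source:

```properties
# Hypothetical agent "agent1" with one source, one channel, and one sink
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: tail a log file using the exec source
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

# Channel: buffer events in memory between source and sink
agent1.channels.ch1.type = memory

# Sink: deliver events to the centralized store (HDFS)
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode/flume/events
agent1.sinks.sink1.channel = ch1
```

The source writes events into the channel, and the sink drains the channel into the centralized store, which is the basic event flow described above.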


Flume Streaming :

In general, a large amount of the data to be analyzed is produced by various data sources such as application servers, social networking websites, and cloud-related servers. This data is in the form of log files and events.

Log File:

A log file is a file that lists actions that have occurred in an operating system. Log files can be used to analyze:

  • Application performance, and to locate various software and hardware failures.
  • User behavior, and to derive better business decisions.

Basic Terminology in Hadoop

Bigdata Solutions:

1. NoSQL databases (non-relational databases) – only for structured and semi-structured data

2. Hadoop – an implementation for structured, semi-structured, and unstructured data

3. Hadoop ecosystem and its components – for everything


Hadoop is a parallel system for large-scale data storage and processing. It is a solution for Bigdata.

For storage, it uses HDFS – the Hadoop Distributed File System.

For processing, it simply uses MapReduce.

In Hadoop, some keywords are very important for learning scope.

Hadoop Basic Terminology:

1. Cluster

2. Clustered Node

3. Hadoop Cluster Node

4. Hadoop Cluster

5. Hadoop Cluster Size

1. Cluster:

A cluster is a group of nodes that all belong to one common network.

2. Clustered Node:

A clustered node is an individual machine belonging to a cluster.

3. Hadoop Cluster Node:

A Hadoop Cluster Node is a node used for the basic storage and processing purposes of a Hadoop cluster.

For storage purpose, we are using the Hadoop Distributed File System.

For processing purpose, we are using MapReduce

4. Hadoop Cluster:

A Hadoop Cluster is a collection of Hadoop Cluster Nodes in a common network.

5. Hadoop Cluster Size:

The Hadoop cluster size is the total number of nodes in a Hadoop cluster.

Hadoop Ecosystem:

1. Apache Pig     – Processing        – Pig scripting

2. Hive           – Processing        – HiveQL (query language similar to SQL)

3. Sqoop          – Integration tool  – Import and export data

4. ZooKeeper      – Coordination      – Distributed coordinator

5. Apache Flume   – Streaming         – Log data streaming

6. Oozie          – Scheduling        – Open-source job scheduler

7. HBase          – Random access     – Hadoop + dataBASE

8. NoSQL          – Not Only SQL      – MongoDB, Cassandra

9. Apache Kafka   – Messaging         – Distributed messaging

10. YARN          – Resource manager  – Yet Another Resource Negotiator

Note: Apache Spark is not part of Hadoop, but it is commonly included nowadays. It is used for data processing, and Spark can be up to 100 times faster than Hadoop MapReduce.

Compatible Operating Systems for Hadoop Installation:

1. Linux

2. Mac OS

3. Sun Solaris


Hadoop Versions:

Hadoop 1.x

Hadoop 2.x

Hadoop 3.x

Different Distributions of Hadoop:

1. Cloudera Distribution for Hadoop (CDH)

2. Hortonworks Data Platform (HDP)

3. MapR



Resilient Distributed Datasets(RDD) in Spark


A Resilient Distributed Dataset represents a collection of partitioned data elements that can be operated on in parallel. The RDD is the primary data-abstraction mechanism in Spark, defined as an abstract class in the Spark library. It is similar to a Scala collection, and it supports lazy evaluation.

Characteristics of RDD:

1.Immutable :

An RDD is an immutable data structure. Once created, it cannot be modified in place; an operation that modifies an RDD instead returns a new RDD.


2. Partitioned:

In an RDD, data is split into partitions. These partitions are generally distributed across a cluster of nodes. When Spark is running on a single machine, all the partitions are on that machine.


RDD Operations :

Applications in Spark process data using the methods defined in the RDD class; these methods are referred to as operations.

RDD operations are of two types: transformations and actions.



1. Transformations:

A transformation method of an RDD creates a new RDD by performing a computation on the source RDD.

RDD transformations are conceptually similar to Scala collection methods.

The key difference is that Scala collection methods operate on data that can fit in the memory of a single machine, whereas RDD methods can operate on data distributed across a cluster of nodes. Another difference is that RDD transformations are lazy, whereas Scala collection methods are strict.

A) map:

The map method is a higher-order method that takes a function as input and applies it to each element in the source RDD to create a new RDD.

B) filter:

The filter method is a higher-order method that takes a Boolean function as input and applies it to each element in the source RDD to create a new RDD. A Boolean function takes an input and returns true or false. filter returns a new RDD formed by selecting only those elements for which the input Boolean function returned true, so the new RDD contains a subset of the elements in the original RDD.
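The map and filter semantics can be sketched in plain Python (this is an illustrative analogy, not actual Spark API code; in Spark the same operations run lazily across partitioned data):

```python
# A plain list stands in for a source RDD in this sketch.
source = [1, 2, 3, 4, 5]

# map: apply a function to every element, producing a new collection
mapped = [x * 10 for x in source]

# filter: keep only elements for which the Boolean function returns True
filtered = [x for x in source if x % 2 == 0]

print(mapped)    # [10, 20, 30, 40, 50]
print(filtered)  # [2, 4]
```

Note that `source` itself is never modified, mirroring RDD immutability: each operation yields a new collection.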

C) flatMap:

The flatMap method is a higher-order method that takes an input function which returns a sequence for each input element passed to it. The flatMap method returns a new RDD formed by flattening this collection of sequences.
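The flattening behavior can be sketched in plain Python (an illustrative analogy, not Spark API code; the sample strings are made up for the example):

```python
from itertools import chain

# Each element maps to a sequence (here, the words of a line);
# flatMap then flattens the sequences into one collection.
lines = ["hello world", "apache flume"]   # stands in for a source RDD
flat_mapped = list(chain.from_iterable(line.split() for line in lines))

print(flat_mapped)  # ['hello', 'world', 'apache', 'flume']
```

Compare with map, which would have produced a list of two lists instead of one flat list.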

D) mapPartitions:

mapPartitions is a higher-order method that allows you to process data at the partition level. Instead of passing one element at a time to its input function, mapPartitions passes a partition in the form of an iterator. The input function to the mapPartitions method takes an iterator as input and returns an iterator as output.
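A plain-Python sketch of the iterator-in, iterator-out contract (illustrative only, not Spark code; the two nested lists stand in for an RDD with two partitions):

```python
# Hypothetical RDD with two partitions
partitions = [[1, 2, 3], [4, 5]]

def sum_partition(iterator):
    # Receives an iterator over one whole partition and yields results,
    # rather than being called once per element.
    yield sum(iterator)

result = [value
          for part in partitions
          for value in sum_partition(iter(part))]

print(result)  # [6, 9]
```

Processing a partition at a time is useful when per-partition setup (e.g. opening a connection) would be too expensive to repeat per element.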


E) intersection:

The intersection method takes an RDD as input and returns a new RDD that contains the intersection of the elements in the source RDD and the RDD passed to it as input.


F) union:

The union method takes an RDD as input and returns a new RDD that contains the union of the elements in the source RDD and the RDD passed to it as input.


G) subtract:

The subtract method takes an RDD as input and returns a new RDD that contains the elements present in the source RDD but not in the input RDD.



H) parallelize:

Parallelized collections are created by calling SparkContext's parallelize method on an existing collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel.



I) distinct:

The distinct method of an RDD returns a new RDD containing the distinct elements in the source RDD.


J) groupBy:

groupBy is a higher-order method that groups the elements of an RDD according to user-specified criteria. It takes as input a function that generates a key for each element in the source RDD, applies it to all the elements in the source RDD, and returns an RDD of pairs.
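The grouping semantics can be sketched in plain Python (illustrative only, not Spark code; the modulo key function is a made-up example of user-specified criteria):

```python
source = [1, 2, 3, 4, 5, 6]   # stands in for a source RDD

groups = {}
for element in source:
    key = element % 2          # user-specified key function (even/odd here)
    groups.setdefault(key, []).append(element)

print(groups)  # {1: [1, 3, 5], 0: [2, 4, 6]}
```

Each key maps to the list of elements that produced it, matching the "RDD of pairs" result described above.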


K) sortBy:

The sortBy method is a higher-order method that returns an RDD with the elements from the source RDD sorted. It takes two input parameters: the first is a function that generates a key for each element in the source RDD, and the second specifies ascending or descending sort order.
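The two-parameter idea maps directly onto Python's built-in sorted (a plain-Python analogy, not Spark code; the word list is invented for the example):

```python
words = ["spark", "rdd", "scala", "transformation"]

# First input: a key function (element length); second input: sort direction.
ascending = sorted(words, key=len)
descending = sorted(words, key=len, reverse=True)

print(ascending)   # ['rdd', 'spark', 'scala', 'transformation']
print(descending)  # ['transformation', 'spark', 'scala', 'rdd']
```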



L) coalesce:

The coalesce method reduces the number of partitions in an RDD. It takes an integer as input and returns a new RDD with the specified number of partitions.



M) groupByKey:

The groupByKey method returns an RDD of pairs, where the first element in a pair is a key from the source RDD and the second element is a collection of all the values that have that key. It is similar to the groupBy method; the major difference is that groupBy is a higher-order method that takes an input function returning a key for each element, whereas groupByKey operates on an RDD that already consists of key-value pairs.


N) reduceByKey:

The higher-order reduceByKey method takes an associative binary operator as input and reduces values with the same key to a single value using the specified binary operator.
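The per-key reduction can be sketched in plain Python (illustrative only, not Spark code; the pair list and the use of addition as the binary operator are assumptions for the example):

```python
from operator import add

# Key-value pairs standing in for a pair RDD
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

reduced = {}
for key, value in pairs:
    # Combine values sharing a key with the associative operator (add)
    reduced[key] = add(reduced[key], value) if key in reduced else value

print(reduced)  # {'a': 4, 'b': 6}
```

Because the operator is associative, Spark can apply it within each partition first and then combine partial results, which is why reduceByKey scales better than collecting all values per key.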


2. Actions:

Actions are RDD methods that return a value to a driver program.


A) collect:

The collect method returns the elements in the source RDD as an array. This method should be used with caution, since it moves data from all the workers to the driver program.


B) count:

The count method returns a count of the elements in the source RDD.

C) countByValue:

The countByValue method returns a count of each unique element in the source RDD. It returns an instance of the Map class containing each unique element and its count as a key-value pair.
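In plain Python the same result is what collections.Counter produces (an illustrative analogy, not Spark code; the sample data is invented):

```python
from collections import Counter

source = ["a", "b", "a", "c", "b", "a"]   # stands in for a source RDD

# Map each unique element to its count, as key-value pairs
counts = dict(Counter(source))

print(counts)  # {'a': 3, 'b': 2, 'c': 1}
```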


D) first:

The first method returns the first element in the source RDD.


E) max:

The max method returns the largest element in the RDD.


F) min:

The min method returns the smallest element in the RDD.


G) top:

The top method takes an integer N as input and returns an array containing the N largest elements in the source RDD.


H) reduce:

The higher-order reduce method aggregates the elements of the source RDD using an associative and commutative binary operator provided to it.


I) countByKey:

The countByKey method counts the occurrences of each unique key in the source RDD. It returns a Map of key-count pairs.

Neural networks overview


An Artificial Neural Network (ANN) is an information processing model that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this model is the novel structure of the information processing system.

A neural network is a group of a huge number of highly interconnected processing elements, or neurons.

Artificial Neural Networks, also called connectionist systems, are computing systems inspired by the biological neural networks that constitute brains.

For example, in image recognition, they might learn to identify images that contain cows by analyzing example images that have been manually labeled “cow” or “no cow” and using the results to identify cows in other images.

Why use neural networks?

Neural networks, with their extraordinary ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques.

A trained neural network can be thought of as an “expert” in the category of information it has been given to analyze. This expert can then be used to provide projections in new situations of interest and to answer “what if” questions.


Advantages include:

Adaptive learning: the ability to learn how to do tasks based on the data given for training or initial experience.

Self-organization: an Artificial Neural Network can create its own organization or representation of the information it receives during learning time.

Real-time operation: Artificial Neural Network computations can be carried out in parallel, and special hardware devices are being designed and manufactured that take advantage of this capability.

Fault tolerance: partial damage to a network degrades its performance, but some network capabilities may be retained even with major damage.

Summary: An Artificial Neural Network is an information processing model used for recognition tasks such as identifying images, with neuron-like processing elements for complicated problems. ANNs have several advantages, including adaptive learning, self-organization, fault tolerance, and real-time operation.

Latest Hadoop Admin Interview Questions with Answers


1. What is Edge Node? Why choose two edge nodes in a cluster?

Edge nodes basically serve end-user connectivity, acting as an interface between the cluster and clients.

A single edge node is a single point of failure; if one edge node goes down, clients can connect through the other edge node. That is why we use two edge nodes.

2. If you have four master nodes, what services are installed on them?

On master node 1: NameNode, Secondary NameNode, Hive Server, Resource Manager, and one ZooKeeper

On master node 2: HBase Master and Oozie Server

On master node 3: Hue, Spark, and three ZooKeepers

On master node 4: High-availability services

3. What are the default block sizes of Hadoop and Unix?

The default block size of HDFS is 128 MB.

The default block size of Unix is 4 KB.

4. What security measures are implemented in a Hadoop cluster?

LDAP for first-level authentication

Kerberos for second-level authentication

Sentry for role-based authorization to data and metadata stored on the Hadoop cluster

Knox, which acts as a gateway controlling who accesses the cluster

Ranger, which provides security across the Hadoop ecosystem for folder access and data authorization

5. How do you secure data transmitted over the network (data in transit)?

By encrypting data transmitted over the network, and by using SSL certificates, HTTPS, and other secure protocols.

6. What types of accounts are used in a Hadoop cluster?

Service account: created in the Active Directory; used within the Hadoop cluster to run jobs and applications.

Technical account: used for application access from outside clients, for example a Java client accessing Hive.

Business user account: belongs to business users who want to access the Hadoop cluster.

Admin account: a highly privileged account used to grant credentials to users from the Active Directory.

Local account: a Unix-based account for Active Directory principals.


What is MapR?

MapR is one of the Big Data distributions. It is a complete enterprise distribution of Apache Hadoop, designed to improve Hadoop's reliability, performance, and ease of use.

Why MapR?

1. High Availability:

MapR provides high-availability features such as self-healing, which means there is no NameNode architecture.

It also has JobTracker high availability and NFS support. MapR achieves this by distributing its file system metadata.

2. Disaster Recovery:

MapR provides a mirroring facility which allows users to enable policies and mirror data automatically, within a multi-node or single-node cluster, and between on-premise and cloud infrastructure.

3. Record Performance:

MapR holds a world performance record, at a cost of only $9 compared to the earlier cost of $5M, at a speed of 54 seconds. It also handles large clusters of up to 2,200 nodes.

4.Consistent Snapshots:

MapR is the only Big Data distribution that provides consistent, point-in-time recovery, because of its unique read-write storage architecture.

5. Complete Data Protection:

MapR has its own security system for data protection at the cluster level.


6. Automatic Compression:

MapR provides automatic, behind-the-scenes compression of data. It applies compression automatically to files in the cluster.

7. Unbiased Open Source:

MapR is a completely unbiased open-source distribution.

8. Real Multitenancy, Including YARN

9.Enterprise-grade NoSQL

10. Read-and-Write File System:

MapR has a read-and-write file system.

MapR Ecosystem Packs (MEP):

The “MapR Ecosystem” is the set of open source projects included in the MapR Platform, and a “pack” is a bundled set of MapR Ecosystem projects with specific versions.

MapR Ecosystem Packs are mostly released quarterly, with yearly releases as well.

A single version of MapR may support multiple MEPs, but only one at a time.

As in the familiar Hadoop case, open-source Hadoop ecosystem components such as Spark and Hive are included in the MapR Ecosystem Packs.


MapR Vs Cloudera Vs Hortonworks

Three Big Data distributions are the most familiar in the present market: Cloudera, Hortonworks, and MapR.





Cloudera and HDP (Hortonworks Data Platform) are open source, with enterprise editions also available, but MapR is a complete enterprise distribution of Apache Hadoop, designed to improve Hadoop's reliability, performance, and ease of use.

                     Hortonworks     Cloudera             MapR

Management Tools     Ambari          Cloudera Manager     MapR CS

Volume Support       No              No                   Yes

Heat map, Alarms     Yes             Yes                  Yes

Alerts               Yes             Yes                  Yes

REST API             Yes             Yes                  Yes

High Availability:

Hortonworks  - Single failure recovery
Cloudera     - Single failure recovery
MapR         - Self healing across multiple failures


Replication:

Hortonworks - Data
Cloudera    - Data
MapR        - Data + Metadata

Disaster Recovery:

Hortonworks - No
Cloudera    - File Copy Scheduling
MapR        - Mirroring


Upgrades:

Hortonworks - Planned downtime
Cloudera    - Rolling upgrades
MapR        - Rolling upgrades

Summary: Nowadays, Big Data and analytics are among the most emerging technologies. The main Big Data distributions are Cloudera, HDP, and MapR, each with special features and open-source and enterprise editions. MapR is used mostly in the banking and finance sectors; Cloudera is used everywhere, in both enterprise and open-source forms; and Hortonworks is much the same as Cloudera.

85 Dangerous Apps Removed by Google from the Play Store

Recently, Google removed approximately 85 dangerous apps from the Google Play store for Android devices.

Why were these apps removed from the Google Play store?

These apps were found to have security issues: they were “capable of displaying full-screen ads, monitoring a device's screen unlocking functionality, and running in the background of Android mobile devices.”

Here are the 85 malware apps removed from the Google Play store; mostly sports TV, remote control, and game apps were removed.


Prado Parking Simulator 3D

Parking Game

City Extremepolis 100

3d Monster Truck

Idle Drift

Bus Driver

America Muscle Car

Prado Parking City

Pirate Story

Prado Car 10

Extreme Trucks

GA Player

3D Racing

Real Drone Simulator

Police Chase


Remote Controls:

TV Remote

A/C Remote

Remote Control

Garage Door Remote



World TV

Brasil TV

Movie Stickers

Christmas Stickers

Trump Stickers

Nigeria TV

Spanish TV

Love Stickers

Televisao do Brasil

TV of the World

Summary: The remaining apps are similar to those above, along with other fake apps that exhibited different kinds of ad-showing behavior. Some malware apps would disappear after showing a buffering screen and finally crash Android devices.




Python Loops

What is a Loop?

A loop is a sequence of instructions that is repeated until a certain condition is reached.

Types of loops in Python:

1. for loop

2. while loop

3. nested loop

for loop:

for (i = 0; i < n; i++)  —–> this C-style syntax is not implemented in Python

>>> for i in range(3):
...     print(i)        # iterates over 0, 1, 2
>>> for i in range(1, 4):
...     print(i)        # iterates over 1, 2, 3


1. To perform an operation on each and every element of a list:

a = [1, 2, 3]
for i in a:
    print(i)

2. To build a new list:

b = [(i**2) for i in a]

for with if:

student_marks = [10,36,53,28,90]
for data in student_marks:
    if data % 2 == 0:
        print(data, "is even number")
    else:
        print(data, "is odd number")

for loop with else clause:

numbers = [10,20,30,40,50,60]
for i in numbers:
    print(i)
else:
    print("Loop completed")

Loop control statement: a statement that changes the execution of a loop from its designated sequence is called a loop control statement. The best example is break.


To break out of a loop, we can use the break statement.


The general syntax is:

for variable_name in sequence:
    if condition:
        break

>>> list = [10,20,30,40,50]
>>> for i in list:
...     if i == 30:
...         break       # stop looping when 30 is reached
...     print(i)

Continue statement:

The continue statement is used in Python to jump to the next iteration of a loop.

list = [10,20,30,40,50]
for i in list:
    if i == 30:
        continue        # skip 30 and move on to the next iteration
    print(i)

While Loop:

A while loop is used to execute a number of statements as long as the condition passed to it holds; once the condition is false, control comes out of the loop.
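A minimal sketch of a while loop (the summing task is an invented example):

```python
# Sum the numbers 1 through 5 with a while loop.
count = 1
total = 0
while count <= 5:       # condition is checked before every iteration
    total += count
    count += 1          # without this update the loop would never end

print(total)  # 15
```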






A while loop whose condition never becomes false is an infinite loop.

while else loop:

a = int(input("Enter an integer less than 100\n"))
while a < 100:
    print(a)
    a = a + 10
else:
    print("a is no longer less than 100")


Summary: In the Python programming language, loops are very useful when writing programs, and working with loops also improves our logical thinking. Python loops are very simple to learn. Here we covered only the for and while loops, because these two play the major role in looping in Python.