Analytics & BI

Data Analytics, Big Data, Data Storage and Business Intelligence.


spark scala parquet

Write and Read Parquet Files in Spark/Scala

102 views   2 comments last modified about 3 months ago

On this page, I’m going to demonstrate how to write and read Parquet files in Spark/Scala by using the Spark SQLContext class. Reference What is Parquet format? Go to the following project site to understand more about Parquet. ...

teradata

Useful DBC (Data Base Computer) System Views in Teradata

6 views   0 comments last modified about 2 days ago

This page summarizes some of the commonly used views in Teradata. Conventions In all the views in the following sections, X views are also available, though they only return rows that contain information on objects that the requesting database user owns, created, granted privilege on,...

lite-log hadoop hdfs

Resolve Hadoop RemoteException - Name node is in safe mode

10 views   0 comments last modified about 8 days ago

In Safe Mode, the HDFS cluster is read-only. After the block replication maintenance activity completes, the name node leaves safe mode automatically. If you try to delete files in safe mode, the following exception may be raised: org.apache.hadoop.ipc.RemoteException(org.apac...

hadoop hdfs parquet sqoop

Configure Sqoop in an Edge Node of Hadoop Cluster

52 views   0 comments last modified about 8 days ago

This page continues from the following documentation about configuring a Hadoop multi-node cluster by adding a new edge node on which to configure administration or client tools. ...

hadoop yarn

Configure YARN and MapReduce Resources in Hadoop Cluster

6 views   0 comments last modified about 8 days ago

When configuring YARN and MapReduce in a Hadoop cluster, it is very important to configure the memory and virtual processors correctly. If the configurations are incorrect, the nodes may not be able to start properly and the applications may not be able to run successfully. For example...

hadoop yarn hdfs

Configure Hadoop 3.1.0 in a Multi Node Cluster

261 views   0 comments last modified about 8 days ago

Previously, I summarized the steps to install Hadoop on a single-node Windows machine. Install Hadoop 3.0.0 in Windows (Single Node) In this page, I...

lite-log

Install Big Data Tools (Spark, Zeppelin, Hadoop) in Windows for Learning and Practice

134 views   2 comments last modified about 15 days ago

Are you a Windows/.NET developer who wants to learn big data concepts and tools on Windows? If so, you can follow the links below to install them on your PC. The installations are usually easier on Linux/UNIX, but they are not difficult on Windows either since the...

hadoop yarn hdfs

Default Ports Used by Hadoop Services (HDFS, MapReduce, YARN)

36 views   0 comments last modified about 22 days ago

This page summarizes the default ports used by Hadoop services. It is useful when configuring network interfaces in a cluster. Hadoop 3.1.0 HDFS The secondary namenode http/https server address and port. ...

sql server spark hdfs parquet sqoop

Load Data into HDFS from SQL Server via Sqoop

33 views   0 comments last modified about 28 days ago

This page shows how to import data from SQL Server into Hadoop via Apache Sqoop. Prerequisites Please follow the link below to install Sqoop on your machine if you don’t have an environment ready. ...

sqoop

Install Apache Sqoop in Windows

28 views   0 comments last modified about 28 days ago

This page summarizes the steps required to install Apache Sqoop (v1.4.7) in a Windows 10 environment. What is Sqoop Sqoop is an ETL tool for Hadoop, which is designed to efficiently transfer data between structured (RDBMS), semi-structured (Cassandra, HBase, etc.) and unstructured ...
