The field of Computer Science is experiencing a transition from computation-intensive to data-intensive problems, in which massive amounts of data are produced by large sensor networks, new data acquisition techniques, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets requires a new generation of scalable algorithms as well as new data management technologies. Business and Big Data Analytics is about harnessing the power of data for new insights.

This book covers the breadth of activities, methods, and tools that Data Scientists and Analysts use. The content focuses on concepts, principles, and practical applications that apply to any industry and technology environment, and the material is supported by examples that you can replicate using open-source software. Across the volumes of this book, we explore key data analysis and management techniques which, when applied to massive datasets, are the cornerstone of real-time decision-making in distributed environments, business intelligence on the web, and large-scale scientific discovery. The book provides readers with the combination of technical and managerial coursework needed to deal with future challenges in a technology- and data-driven global environment. The coursework includes the use of various software tools and their applications in Finance, Marketing, Logistics and Supply Chain, Human Resources, Operations, Manufacturing, Information Systems, and Economics.

“Big Data is the new frontier for innovation, competition and productivity” – McKinsey & Co. Big Data is a reality today. Companies that want to be innovative and maximize their revenue potential must have the right solution in place to harness the volume, velocity, and variety of data coming into their organization and to derive actionable insight from that data.
Today’s phenomenal growth of data requires companies not only to understand big data well enough to decipher the information that counts but also, more importantly, to recognize what they can do with it using big data analytics. An MBA/MS/M.Sc. in Business Analytics and Big Data is intended for hands-on Big Data Specialists, for people in leadership roles working with Big Data, and for Entrepreneurs. The curriculum of such a program is highly multidisciplinary, combining courses in A (Analytics), B (Business), and C (Computer Science) with projects that practice and implement the integration of these three areas. Students graduate with an understanding of how to manage industry, product, consumer behaviour, and social media data and how to deliver it in formats that support decision-making. The book is written with the syllabi in view of the various universities and institutions that offer “Business and Big Data Analytics” as a subject of study, specialization, or research in programs such as MBA, MMS, M.Mgt, MCA, ME/BE (CSE, CE, IT), M.Tech./B.Tech. (CSE, CE, IT), M.Sc. (Business Analytics), MS (Management Science), BBA, PGDM, PGPBA, CPBAE, PGDBA, Certificate Programs, PG Certificate in Big Data Engineering, and others.
This is Volume I of the Business and Big Data Analytics book, comprising five units. Unit I describes Business and Big Data Analytics with Data Science, Data Warehousing, Data Mining, Data Lake, Data Vault, and Web Analytics: Concepts, Technologies and Applications. Unit II discusses what Business and Big Data Analytics is. Unit III answers the question of why Business and Big Data Analytics is important. Unit IV is on Apache Hadoop – Big Data Technologies and Solutions. Unit V discusses MapReduce – a processing technique and a programming model for data-intensive computing.
This is Volume II of the Business and Big Data Analytics book, comprising five units. Unit I is on HDFS – Hadoop’s Lower Levels. Unit II is on HBase – Distributed Column-Oriented Database for Improving Data Access. Unit III is on Cassandra – Highly Scalable, High-Performance, Distributed NoSQL Database. Unit IV is on CouchDB – Document Storage NoSQL Database. Unit V is on MongoDB – An Open-Source NoSQL Database.
This is Volume III of the Business and Big Data Analytics book, comprising four units. Unit I is on Sqoop and Flume – Improving Data Access. Unit II is on Pig and Hive – Improving Programmability. Unit III is on ZooKeeper and Oozie – Coordination, Management and Workflow in Distributed Applications. Unit IV is on Mahout – Creating Scalable Machine Learning Algorithms.
This is Volume IV of the Business and Big Data Analytics book, comprising five units. Unit I is on Big Data Processing with Apache Spark. Unit II is on Avro – Language-Neutral Data Serialization System. Unit III is on Kafka – The Messaging System. Unit IV is on Apache Storm – A Distributed Real-Time Big Data Processing System. Unit V is on Impala – A Massively Parallel Processing SQL Query Engine.
This is Volume V of the Business and Big Data Analytics book, comprising four units. Unit I is on Apache HCatalog – Table Storage Management Tool for Hadoop. Unit II is on Apache Tajo – An Open-Source Distributed Data Warehouse Framework for Hadoop. Unit III is on Apache Solr – Building Search Applications. Unit IV is on Talend – ETL Tool for Data Integration.
Contents
VOLUME I
1. Business and Big Data Analytics with Data Science, Data Warehousing, Data Mining, Data Lake, Data Vault, Web Analytics: Concepts, Technologies and Applications
2. What is Business and Big Data Analytics?
3. Why is Business and Big Data Analytics Important?
4. Apache Hadoop – Big Data Technologies and Solutions
5. MapReduce – A Processing Technique and a Programming Model for Data-Intensive Computing
VOLUME II
1. HDFS – Hadoop’s Lower Levels
2. HBase – Distributed Column-Oriented Database for Improving Data Access
3. Cassandra – Highly Scalable, High-Performance, Distributed NoSQL Database
4. CouchDB – Document Storage NoSQL Database
5. MongoDB – An Open-Source NoSQL Database
VOLUME III
1. Sqoop and Flume – Improving Data Access
2. Pig and Hive – Improving Programmability
3. ZooKeeper and Oozie – Coordination, Management and Workflow in Distributed Applications
4. Mahout – Creating Scalable Machine Learning Algorithms
VOLUME IV
1. Big Data Processing with Apache Spark
2. Avro – Language-Neutral Data Serialization System
3. Kafka – The Messaging System
4. Apache Storm – A Distributed Real-Time Big Data Processing System
5. Impala – A Massively Parallel Processing SQL Query Engine
VOLUME V
1. Apache HCatalog – Table Storage Management Tool for Hadoop
2. Apache Tajo – An Open-Source Distributed Data Warehouse Framework for Hadoop
3. Apache Solr – Building Search Applications
4. Talend – ETL Tool for Data Integration