Monday, February 5, 2018

Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hadoop Training | Edureka

Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hadoop Training | Edureka from Edureka!

  1. www.edureka.co/big-data-and-hadoop | EDUREKA HADOOP CERTIFICATION TRAINING. Agenda: (1) Evolution of Data; (2) What Is Big Data?; (3) Big Data as an Opportunity; (4) Problems in Encashing the Opportunity; (5) Hadoop as a Solution; (6) Hadoop Ecosystem; (7) Edureka Big Data & Hadoop Training.
  2. Evolution of Technology: telephone to mobile; desktop to cloud; car to smart car. Factors driving data growth: (1) Evolution of Technology; (2) IoT; (3) Social Media; (4) Other Factors.
  3. IoT: 50 billion devices by 2020.
  4. Social Media: 4,166,667 likes and 200,000 photos; 347,222 tweets; 300 hours of video uploaded; 204,000,000 emails; 1,736,111 Instagram pictures.
  5. Other Factors
  6. What Is Big Data?
  7. What Is Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database system tools or traditional data processing applications.
  8. 5 V’s of Big Data
  9. Volume (V 1): By 2020, the accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes. [Chart: data volume in exabytes by year, 2009 to 2020.]
  10. Variety (V 2): Different kinds of data are being generated from various sources. Categories: structured, semi-structured, unstructured. Examples shown: table, audio, video, image, log, XML, CSV, TSV, JSON, e-mail.
  11. Velocity (V 3): Data is being generated at an alarming rate. Every 60 seconds: 100,000+ tweets; 695,000+ status updates; 11,000,000+ instant messages; 698,445 Google searches; 168,000,000+ emails; 1,820 TB of data created; 217+ new mobile users. Progression: mainframe, client/server, Internet, mobile, social media, cloud, …
  12. Value (V 4): The mechanism to bring the correct meaning out of the data.
  13. Veracity (V 5): Uncertainty and inconsistencies in the data.
  14. 5 V’s of Big Data: Volume; Variety (different kinds of data are being generated from various sources); Velocity (data is being generated at an alarming rate); Value (the mechanism to bring the correct meaning out of the data); Veracity (uncertainty and inconsistencies in the data). The V’s associated with Big Data may grow with time.
  15. Big Data as an Opportunity
  16. Big Data as an Opportunity (Big Data Analytics). Cost Reduction: cost-effective storage system for huge data sets. Faster and Better Decision Making: provides ways to analyze information quickly and make decisions. Improved Services or Products: evaluation of customer needs and satisfaction. Next Generation Products: automated cars, healthcare, etc. Many more opportunities.
  17. IBM Big Data Analytics
  18. Big Data Collected by Smart Meters: Earlier, data was collected once a month; now, data is collected every 15 minutes, which amounts to 96 million reads per day for every million meters. Managing the large volume and velocity of information generated by short-interval reads of smart meter data can overwhelm existing IT resources.
  19. Problem with Smart Meter Big Data: To manage and use this information to gain insight, utility companies must be capable of high-volume data management (store) and advanced analytics (analyze) designed to transform data into actionable insights.
  20. How Smart Meter Big Data Is Analysed (before and after analyzing Big Data): During peak load, users require more energy; during off-peak times, users require less energy. Energy utilization and billing have increased. Time-of-use pricing encourages cost-savvy retail customers, such as those operating heavy industrial machines, to run them at off-peak times.
  21. IBM Smart Meter Solution: IBM offers an integrated suite of products designed to enable IT to leverage big data in a variety of ways that can contribute to the success of energy companies. IBM solution components: data analysis, data mining, data warehousing, user data security, reporting. Uses: (1) managing smart meter data; (2) monitoring the distribution grid; (3) optimizing unit commitment; (4) optimizing energy trading; (5) forecasting and scheduling loads.
  22. ONCOR Using IBM Smart Meter Solution: Oncor Electric Delivery has incorporated the IBM Smart Meter service. (1) Instrumented: utilizes smart electricity meters to accurately measure the electricity usage of a household. (2) Interconnected: unprecedented access to detailed information about their electricity use. (3) Intelligent: consumers monitor and control their electricity usage through near-real-time readings of electricity meters. During the company’s biggest energy saver contest last year, customers in Oncor’s service territory showed that, by using the information from Oncor’s advanced meters, users reduced their electric usage and bills by 25 percent or more.
  23. Problems with Encashing the Opportunity
  24. Problems with Big Data. Problem 1: Storing exponentially growing huge datasets. Data generated in the past 2 years is more than in all previous history combined. By 2020, total digital data will grow to approximately 44 zettabytes. By 2020, about 1.7 MB of new information will be created every second for every person.
  25. Problems with Big Data. Problem 2: Processing data having complex structure. Structured: organized data format; data schema is fixed; e.g. RDBMS data. Semi-structured: partially organized data; lacks the formal structure of a data model; e.g. XML and JSON files. Unstructured: unorganized data; unknown schema; e.g. multimedia files.
  26. Problems with Big Data. Problem 3: Processing data faster. Data is growing at a much faster rate than disk read/write speed. Bringing a huge amount of data to the computation unit becomes a bottleneck. [Diagram: one master node with slave nodes A to E. Data source: Tom’s Hardware.]
  27. Hadoop as a Solution
  28. Hadoop: Solution to Big Data Problems. Hadoop is a framework that allows us to store and process large data sets in a parallel and distributed fashion. HDFS (storage): allows us to dump any kind of data across the cluster. MapReduce (processing): allows parallel processing of the data stored in HDFS.
  29. Hadoop Distributed File System
  30. Hadoop Distributed File System: HDFS creates a level of abstraction over the resources, from which we can see the whole of HDFS as a single unit. HDFS has two core components, the NameNode (master) and the DataNodes (slaves), which together form the Hadoop cluster. The NameNode is the main node and contains metadata about the data stored. Data is stored on the DataNodes, which are commodity hardware in the distributed environment.
  31. Storing Data (Solution). Problem 1: Storing exponentially growing huge datasets. Solution: HDFS. It is the storage unit of Hadoop; it is a distributed file system; it divides files (input data) into smaller chunks and stores them across the cluster; it is scalable as per requirement. Example: a 512 MB file is divided into four 128 MB chunks.
  32. Store Different Kinds of Data (Solution). Problem 2: Storing unstructured data. Solution: HDFS. It allows us to store any kind of data, be it structured, semi-structured or unstructured; it follows WORM (Write Once, Read Many); no schema validation is done while dumping data.
  33. Processing Data Faster (Solution). Problem 3: Processing data faster. Solution: Hadoop MapReduce. It provides parallel processing of data present in HDFS; it allows data to be processed locally, i.e. each node works on the part of the data that is stored on it. [Illustration: (1) 4 hr.; (2) 1 hr.]
  34. Hadoop Ecosystem
  35. Hadoop Ecosystem
  36. Hadoop Ecosystem: Hadoop provides a scalable solution to store and process huge data sets in a parallel and distributed fashion. Apache Hive is a data warehousing tool that allows us to perform big data analytics using Hive Query Language, which is very similar to SQL. Apache Pig is a platform used to analyze large data sets by representing them as data flows. Apache Spark is an in-memory data processing engine that allows us to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. Apache HBase is a NoSQL database that allows us to store unstructured and semi-structured data with ease and provides real-time read/write access.
  37. Big Data & Hadoop Certification Training
  38. Big Data Hadoop Certification Training
  39. Some Big Data & Hadoop Projects @ Edureka. Project #1: Analyze social bookmarking sites (industry: social media). Project #2: Customer Complaints Analysis (industry: retail). Project #3: Tourism Data Analysis (industry: tourism).
  40. Some Big Data & Hadoop Projects @ Edureka. Project #4: Airline Data Analysis (industry: aviation). Project #5: Analyze Loan Dataset (industry: banking and finance). Project #6: Analyze Movie Ratings (industry: media).
  41. Session in a Minute: How Data Evolved as Big Data; 5 V’s of Big Data; Big Data as an Opportunity; Problems with Big Data; Hadoop as a Solution; Big Data & Hadoop Training by Edureka.
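
Two of the figures quoted in the slides can be checked with simple arithmetic: the "44 trillion gigabytes" on slide 9 and the "96 million reads per day for every million meters" on slide 18. The sketch below is only an illustration. It assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and one read per meter per 15-minute interval; the function names are hypothetical and do not come from the slides.

```python
# Assumption: decimal (SI) units, which the slides do not state explicitly.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9
MINUTES_PER_DAY = 24 * 60


def zettabytes_to_gigabytes(zettabytes: int) -> int:
    """Convert zettabytes to gigabytes using decimal units."""
    return zettabytes * BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE


def reads_per_day(num_meters: int, interval_minutes: int = 15) -> int:
    """Total meter reads per day, assuming one read per meter per interval."""
    reads_per_meter = MINUTES_PER_DAY // interval_minutes  # 96 for 15 minutes
    return num_meters * reads_per_meter


print(zettabytes_to_gigabytes(44))  # 44 followed by 12 zeros: 44 trillion GB
print(reads_per_day(1_000_000))     # 96 million reads per day
```

A 15-minute interval gives 96 reads per meter per day, compared with a single read per month under the earlier scheme, which is the growth in volume and velocity that slide 18 points to.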
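
Slides 30 and 31 describe HDFS as splitting a file into chunks stored on DataNodes, with the NameNode holding metadata about where the data lives. The following is a toy model of that idea, not the Hadoop API. The round-robin placement, the file name `input.dat`, and the node names `dn1` to `dn3` are assumptions made for illustration; real HDFS also replicates each block and uses its own placement policy, both of which are omitted here.

```python
from itertools import cycle

BLOCK_SIZE_MB = 128  # chunk size used in the slide's 512 MB example


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file is divided into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_blocks(file_name, file_size_mb, data_nodes):
    """Toy NameNode metadata: map each block to a DataNode, round-robin."""
    nodes = cycle(data_nodes)
    return {
        f"{file_name}#block{index}": {"size_mb": size, "data_node": next(nodes)}
        for index, size in enumerate(split_into_blocks(file_size_mb))
    }


# Hypothetical file and node names, chosen only for this example.
metadata = place_blocks("input.dat", 512, ["dn1", "dn2", "dn3"])
for block_id, info in metadata.items():
    print(block_id, info)
```

The dictionary plays the role of the NameNode: it stores no file contents, only which node holds which block. A file whose size is not a multiple of 128 MB simply gets a smaller final block.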
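
Slide 33 says that MapReduce lets each node work on the part of the data stored on it. A minimal single-process sketch of the map, shuffle and reduce steps is shown below, using word count as the example task. It is plain Python rather than Hadoop code: the two "partitions" stand in for data held on two nodes, the sample sentences are invented, and the steps run sequentially here, whereas a real cluster would run the map step on each node in parallel.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit (word, 1) pairs for the lines held by one node."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle(mapped_outputs):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Each inner list stands in for the data stored on one node (made-up text).
partitions = [
    ["big data is big"],
    ["hadoop stores big data"],
]

mapped = [map_phase(partition) for partition in partitions]  # one map per node
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each map call only touches its own partition, the map step needs no data movement; only the much smaller (word, count) pairs are brought together in the shuffle.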