DTS 302 · Data Science · Computing

Big Data Computing

2 Units · Status: C · 300 Level · Semester 1 · LH 15 · PH 45 · Emerging Tech

Learning outcomes

At the end of the course, students should be able to:
1. identify Big Data;
2. identify some of the foundational tools, systems, and platforms used in working with Big Data across several domains;
3. install Big Data working tools on a computer; and
4. analyse Big Data contents.

Course contents

Installation: Cloudera VM and Jupyter server. Big Data retrieval and relational querying: Postgres databases, NoSQL data stores (MongoDB, Aerospike), and Pandas for data aggregation and working with data frames. Big Data integration: Splunk and Datameer. Big Data processing: Hadoop and Apache Spark (Spark Core, Spark MLlib, and GraphX). Big Data applications: graph processing. Big Data streaming platforms for fast data. Lab work: analysing Twitter data using Spark and MongoDB. Big Data analytics skills. Practical procedure for crafting an enterprise-scale, cost-efficient Big Data and machine learning solution that uncovers insights and value from data. Practical exercises that bridge the gap between the theory of the technology and the practical reality of building corporate Big Data and data science platforms. Hands-on exposure to Hadoop and Spark (or any other Big Data tool): building machine learning dashboards using R and R Shiny, and creating web-based apps using NoSQL databases. Practical assignment on Big Data security.
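As an illustration of the kind of processing the Twitter lab work involves, the following is a minimal sketch of the map, shuffle, and reduce pattern that Hadoop popularised, written in plain Python so that it runs without a cluster. It is not the course's lab solution and does not use Spark or MongoDB; the sample tweets and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `count_hashtags`) are hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample tweets, invented for illustration only.
TWEETS = [
    "Learning #Spark and #Hadoop this semester",
    "#spark makes batch jobs faster",
    "Storing tweets in #MongoDB",
]

HASHTAG = re.compile(r"#(\w+)")


def map_phase(tweet):
    """Emit one (hashtag, 1) pair per hashtag in the tweet, lower-cased."""
    return [(tag.lower(), 1) for tag in HASHTAG.findall(tweet)]


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values to get one count per hashtag."""
    return {key: sum(values) for key, values in groups.items()}


def count_hashtags(tweets):
    """Run the three phases in sequence over an in-memory list of tweets."""
    pairs = [pair for tweet in tweets for pair in map_phase(tweet)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    for tag, count in sorted(count_hashtags(TWEETS).items()):
        print(tag, count)
```

On a real cluster the same three steps would be distributed across machines by the framework; here they run in a single process purely to show the data flow.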
