Hivi Technology

Spark Up Your Skills: Dive into PySpark for Big Data Exploration

PySpark Training at Hivi Technology

PySpark Module

Module Outline:

 

  1. Introduction to Apache Spark and PySpark

    • Overview of Apache Spark and its features.
    • Introduction to PySpark, the Python API for Apache Spark.
    • Understanding the Spark ecosystem and its components.
  2. Setting Up a PySpark Environment

    • Installation and configuration of Apache Spark and PySpark.
    • Setting up a SparkSession for Python.
    • Using the PySpark shell (pyspark) and Jupyter notebooks for interactive development.
  3. Resilient Distributed Datasets (RDDs)

    • Understanding RDDs and their role in PySpark.
    • Creating RDDs from various data sources.
    • Basic RDD transformations and actions.
  4. Data Transformations and Actions in PySpark

    • Advanced RDD transformations and actions.
    • Working with key-value pairs in RDDs.
    • Lazy evaluation and the Spark lineage graph.
  5. Introduction to Spark SQL

    • Overview of Spark SQL and the DataFrame API.
    • Creating DataFrames from RDDs and external data sources.
    • Performing SQL queries and DataFrame operations.
  6. Working with External Data Sources

    • Reading and writing data from/to external sources.
    • Integrating PySpark with big data platforms like Hadoop and Hive.
    • Handling structured and semi-structured data formats.
  7. Introduction to PySpark MLlib

    • Overview of MLlib, Spark's machine learning library, as used from PySpark.
    • Building machine learning pipelines for classification, regression, and clustering.
    • Training and evaluating machine learning models.
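
As a small taste of topic 4, the idea behind lazy evaluation and lineage can be sketched in plain Python, with no Spark installation needed. This is only a toy model under stated assumptions: `ToyRDD` and its methods are hypothetical names made up for illustration and are not the PySpark API. Real RDDs are distributed and far more capable. The sketch only shows that transformations record steps, and that nothing is computed until an action is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical; not the PySpark API)."""

    def __init__(self, source, lineage):
        self._source = source    # callable returning a fresh iterator of elements
        self.lineage = lineage   # tuple of step names recorded so far

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: iter(items), ("parallelize",))

    def map(self, fn):
        # Transformation: record the step, compute nothing yet.
        return ToyRDD(lambda: (fn(x) for x in self._source()),
                      self.lineage + ("map",))

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyRDD(lambda: (x for x in self._source() if pred(x)),
                      self.lineage + ("filter",))

    def collect(self):
        # Action: only now is the recorded chain actually executed.
        return list(self._source())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


rdd = ToyRDD.parallelize([1, 2, 3, 4])
evens = rdd.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = len(calls)  # still 0: no element has been processed
result = evens.collect()          # the action triggers the computation

print(evens.lineage)
print(calls_before_action, result)
```

In this toy, `square` runs only when `collect()` is called, and the `lineage` tuple plays the role of the recorded chain of steps that Spark can use to recompute lost data.
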
  • Duration: 1 Month
  • Weekdays: Mon to Fri (1 hr/day)
  • Weekends: 2 hrs/day
  • Flexible Timings
  • Free Session Videos
  • Course Completion Certificate
  • Lifetime Customer Support
  • Job Search Assistance
  • Resume Preparation