COURSE CONTENT
Week #1: Hashing, Bloom Filters, Internet Caching Protocols, Distributed Hash Tables (DHTs).
Week #2: Decentralized Data Structures and P2P Systems, Decentralized Systems Based on Distributed Hashing (Chord).
Week #3: Blockchain Data Structures and Decentralized Applications (DApps).
Week #4: Hadoop and the Hadoop Distributed File System (HDFS), the MapReduce Programming Model, NoSQL Databases, Cluster Architectures, Data Flow Systems, Spark and Resilient Distributed Datasets (RDDs).
Week #5: Python Language Overview for Decentralized Data Technologies.
(Practical Part: Basic Steps of Data Manipulation with Python and PySpark).
Week #6: Data Storage and Processing in Decentralized Systems.
(Practical Part: Batch Processing with PySpark).
Week #7: Data Storage and Processing in Decentralized Systems (Cont.).
(Practical Part: Batch Processing with PySpark).
Week #8: Large-Scale Machine Learning with PySpark.
(Practical Part: Implementing a Simple Machine Learning Model Using Python’s scikit-learn).
Week #9: Large-Scale Machine Learning with PySpark (Cont.).
(Practical Part: Implementing a Simple Machine Learning Model Using Python’s scikit-learn).
Week #10: Large-Scale Machine Learning with PySpark (Cont.).
(Practical Part: Implementing a Simple Machine Learning Model Using PySpark’s MLlib).
Week #11: Large-Scale Machine Learning with PySpark (Cont.).
(Practical Part: Implementing a Simple Machine Learning Model Using PySpark’s MLlib).
Week #12: Advanced Topics and Case Studies.
(Practical Part: Implementation of a Large Project, or Several Smaller Ones, Combining All Previous Topics).
Week #13: Advanced Topics and Case Studies (Cont.).
(Practical Part: Implementation of a Large Project, or Several Smaller Ones, Combining All Previous Topics).