Data Analysis Using Hadoop/Spark

Host Site:

Texas Advanced Computing Center

Host site URL:

https://www.tacc.utexas.edu

This course introduces existing libraries and tools available within the Hadoop and Spark ecosystem. We will cover general practices for using a Hadoop/Spark cluster on practical analysis problems, such as running batch jobs under different cluster deployment modes and running interactive jobs. Existing analysis libraries and applications will be introduced during the class, including Hadoop streaming, MLlib, SparkSQL and GraphX. We will also show how to use a Hadoop/Spark cluster with other programming languages, including R and Python. Participants should have basic coding knowledge and experience and be comfortable writing code. Participants are also expected to be familiar with the Hadoop cluster system and concepts of parallelism, and to be able to work on computing resources at TACC.

More information: https://www.tacc.utexas.edu

Sessions:

Webcast

05/04/2017 13:00 - 05/04/2017 16:30 CDT (SESSION HAS ENDED)
Registration CLOSED
Registration open date: 03/29/2017 09:00 CDT
Registration close date: 05/02/2017 17:00 CDT
Class size restriction: 50 registrants

(0 spots left)

Waitlist

66 registrants

Contact Information
Contact: Jason Allison
Contact phone: 5124759238
Contact email: jasona@tacc.utexas.edu
Location
Name: Texas Advanced Computing Center
Phone: 5124759238
URL: https://www.tacc.utexas.edu
Posted: 03/29/2017 16:00 UTC