Data Science Program
Majors
Data Science Major, Bachelor of Science
Minors
Classes
CS 4433/DS 4433: Big Data Management and Analytics
This course introduces the emerging techniques and infrastructures for big data management and analytics, including parallel and distributed database systems, MapReduce, Spark, NoSQL infrastructures, data stream processing systems, scalable analytics and mining, and cloud-based computing. Query processing and optimization, access methods, and storage layouts developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies.
Knowledge of database systems at the level of CS 4432 and programming experience are assumed.
CS 4804: Data Visualization
This course trains students in data visualization, the graphical communication of data and information for presentation, confirmation, and exploration. Students learn the stages of the visualization pipeline, including data characterization, mapping data attributes to graphical attributes, user task abstraction, visual display techniques, tools, paradigms, and perceptual issues. Students evaluate the effectiveness of visualizations for specific data, task, and user types. Students implement visualization algorithms and undertake projects involving the use of commercial and public-domain visualization tools.
DS 1010: Data Science I: Introduction to Data Science
This course provides an introduction to the core concepts in Data Science. It covers a broad range of methodologies for working with and making informed decisions based on real-world data. Core topics introduced in this course include basic statistics, data exploration, data cleaning, data visualization, business intelligence, and data analysis. Students will utilize various techniques and tools to explore, understand and visualize real-world data sets from various domains and learn how to communicate data results to decision makers.
No prior background is assumed.
DS 2010: Data Science II: Statistical Modeling and Analysis
This course focuses on model- and data-driven approaches in Data Science. It covers methods from applied statistics, optimization, and machine learning to analyze and make predictions and inferences from real-world data sets. Topics covered in this course include a brief overview of statistics and linear algebra, followed by introductory machine learning methods such as linear and nonlinear regression, classification, decision trees, and dimension reduction techniques. Data exploration, data cleaning, feature engineering, and the bias-variance tradeoff will also be covered. Students will utilize various techniques and tools to explore and understand real-world data sets from various domains.
DS 3010: Data Science III: Computational Methods
This course covers a broad range of computational methods to make informed decisions on large and/or high-dimensional data sets following the data science pipeline. Core topics include collecting data via APIs, processing and managing large-scale data, cloud computing, and applying machine learning and deep learning toolkits to extract insights. The goal is to aid decision-making in different domains. Students will learn these skills by working on projects using real-world data sets.
Data science basics equivalent to DS 1010, data analysis principles and modeling equivalent to DS 2010, knowledge of basic statistics equivalent to MA 2611 and MA 2612, the ability to program equivalent to (CS 1004, CS 1101, or CS 1102) and (CS 2102, CS 2103, or CS 2119), and an understanding of databases equivalent to CS 3431 or MIS 3720 are assumed.
DS 4099: Special Topics in Data Science
Instances of this course will explore advanced and emerging topics in Data Science that are not covered by the current regular Data Science offerings. Content and format will vary to suit the interests and needs of the faculty and students. This course may be repeated by students for credit as topics change.
DS 4635/MA 4635: Data Analytics and Statistical Learning
The focus of this class will be on statistical learning: the intersection of applied statistics and modeling techniques used to analyze and to make predictions and inferences from complex real-world data. Topics covered include regression, classification/clustering, sampling methods (bootstrap and cross-validation), and decision tree learning. Students may not receive credit for both MA 463X and MA 4635.