Data Science
E. A. RUNDENSTEINER, PROGRAM DIRECTOR
PROFESSORS: E. A. Rundensteiner, C. Ruiz, D. M. Strong, S. A. Zekavat
ASSOCIATE PROFESSORS: M. Y. Eltabakh, L. T. Harrison, X. Kong, K. Lee, Y. Li, X. Liu, R. Paffenroth, A. Trapp, J. Zou
ASSISTANT PROFESSORS: N. Kordzadeh, O. Mangoubi, R. Shraga
TEACHING PROFESSOR: F. Emdad
ASSISTANT TEACHING PROFESSORS: T. Ghoshal, C. K. Ngan
Mission Statement
Data Science prepares WPI undergraduates with the skills to understand, apply, and develop models, algorithms, and statistical techniques to gather huge amounts of data, draw new insights from it, and formulate appropriate action plans. Through courses and hands-on project work, students in the Data Science program will master foundational and advanced topics, including state-of-the-art data analytic technologies like machine/deep learning, artificial intelligence, and big data. This prepares students to work in interdisciplinary teams with diverse perspectives on the most critical data challenges of this increasingly digital world, from climate change and self-driving cars to digital healthcare and social justice. In addition to being a discipline in and of itself, Data Science complements many of the existing undergraduate majors at WPI. Disciplines from the sciences to engineering increasingly grapple with large data sets using computational and statistical techniques and tools.
Students interested in Data Science, both majors and minors, should check with the Data Science program as early as possible in their academic career to develop a plan of study. Students will be assigned a Data Science advisor after completing a major/minor declaration form.
Program Educational Objectives
In support of its goals and mission, the WPI Data Science undergraduate program’s educational objectives are to graduate students who will:
 Bring together a community of diverse disciplinary backgrounds and experiential perspectives to promote creative solutions to critical real-world problems and advance knowledge at the cutting edge
 Achieve professional success due to their mastery of Data Science theory and practice
 Conduct impactful research and project work in data science tackling the world’s most challenging problems
 Engage in discovery through purpose-driven, project-based learning
 Collaborate with partners both internally and externally in interdisciplinary projects
 Become leaders in business, academia, and society due to a broad preparation in data science, computational thinking, mathematics, science & engineering, communication, and social issues
 Pursue lifelong learning and continuing professional development
 Use their understanding of the impact of data science on society for the benefit of humankind
Theme:
“Gather Information, Form Insights, Impact the World!”
Program Outcomes
Students graduating with a Bachelor of Science degree in Data Science:
 Have mastered foundational studies in business, computer science, and mathematical sciences
 Have mastered advanced principles and techniques in at least one of the three disciplines
 Can apply computational and mathematical knowledge to the solution of big data problems
 Can communicate effectively across disciplines both verbally and in writing
 Can locate, read, and interpret primary literature in data science
 Can function effectively as members of an interdisciplinary team
 Have an understanding of accepted standards of ethical and professional behavior
 Have the ability to be a lifelong independent learner
Majors

Data Science Major, Bachelor of Science
Minors
Classes
CS 4433/DS 4433: Big Data Management and Analytics
This course introduces the emerging techniques and infrastructures for big data management and analytics, including parallel and distributed database systems, MapReduce, Spark, and NoSQL infrastructures, data stream processing systems, scalable analytics and mining, and cloud-based computing. Query processing and optimization, access methods, and storage layouts developed on these infrastructures will be covered. Students are expected to engage in hands-on projects using one or more of these technologies.
Knowledge of database systems at the level of CS 4432 and programming experience are assumed.
CS 4804: Data Visualization
This course trains students in data visualization, the graphical communication of data and information for presentation, confirmation, and exploration. Students learn the stages of the visualization pipeline, including data characterization, mapping data attributes to graphical attributes, user task abstraction, visual display techniques, tools, paradigms, and perceptual issues. Students evaluate the effectiveness of visualizations for specific data, task, and user types. Students implement visualization algorithms and undertake projects involving the use of commercial and public-domain visualization tools.
Knowledge equivalent to CS 2102 or CS 2103, and CS 2223, is assumed.
DS 1010: Data Science I: Introduction to Data Science
This course provides an introduction to the core concepts in Data Science. It covers a broad range of methodologies for working with and making informed decisions based on real-world data. Core topics introduced in this course include basic statistics, data exploration, data cleaning, data visualization, business intelligence, and data analysis. Students will utilize various techniques and tools to explore, understand, and visualize real-world data sets from various domains and learn how to communicate data results to decision makers.
No prior background is assumed.
DS 2010: Data Science II: Modeling and Data Analysis
This course focuses on model- and data-driven approaches in Data Science. It covers methods from applied statistics (regression), optimization, and machine learning to analyze and make predictions and inferences from real-world data sets. Topics introduced in this course include basic statistics (regression), analytics (explanatory and predictive), basics of machine learning (classification and clustering), eigenvalues and singular matrices, data exploration, data cleaning, data visualization, and business intelligence. Students will utilize various techniques and tools to explore and understand real-world data sets from various domains.
Data science basics equivalent to DS 1010, applied statistics and regression equivalent to MA 2611 and MA 2612, and the ability to write computer programs in a scientific language equivalent to a CS programming course at the CS 1000 or CS 2000 level are assumed.
DS 3010: Data Science III: Computational Data Intelligence
This course introduces core methods in Data Science. It covers a broad range of methodologies for working with large and/or high-dimensional data sets and making informed decisions based on real-world data. Core topics introduced in this course include the data life cycle from collection through use, data management of large-scale data, cloud computing, machine learning, and deep learning. Students will acquire experience with big data problems through hands-on projects using real-world data sets.
Data science basics equivalent to DS 1010, data analysis principles and modeling equivalent to DS 2010, knowledge of basic statistics equivalent to MA 2611 and MA 2612, the ability to program equivalent to (CS 1004, CS 1101, or CS 1102) and (CS 2102, CS 2103, or CS 2119), and an understanding of databases equivalent to CS 3431 or MIS 3720 are assumed.
DS 4635/MA 4635: Data Analytics and Statistical Learning
The focus of this class will be on statistical learning, the intersection of applied statistics and modeling techniques used to analyze and to make predictions and inferences from complex real-world data. Topics covered include: regression; classification/clustering; sampling methods (bootstrap and cross-validation); and decision tree learning. Students may not receive credit for both MA 463X and MA 4635.
Linear Algebra (MA 2071 or equivalent), Applied Statistics and Regression (MA 2612 or equivalent), and Probability (MA 2631 or equivalent) are assumed, as is the ability to write computer programs in a scientific language.