Introduction To Data Mining By Tan Steinbach And Kumar

Madox Valdivia

Aug 5, 2024, 3:25:09 AM

to sienonwhito

The financial industry in particular, and most companies in general, have been accumulating data for years and mining it to drive their financial decisions. Data volumes are now extremely large and are expected to keep growing exponentially, which makes them prohibitive for traditional machine learning and data mining methods. Mining data is said to be more valuable than mining oil.

This course introduces standard machine learning and data mining algorithms with financial applications and prepares students to work with large data sets. In the first part, students learn pre-processing, supervised learning algorithms such as logistic regression, naive Bayes, k-nearest neighbors, decision trees, neural networks, and SVMs, and unsupervised learning algorithms such as k-means clustering, agglomerative clustering, and association rule mining. Building on neural networks, the course gives an overview of convolutional neural networks (CNNs) and typical architectures, recurrent neural networks (RNNs), unidirectional and bidirectional long short-term memory (LSTM) networks, unidirectional and bidirectional gated recurrent unit (GRU) networks, and neural transfer learning. Recommender systems are introduced, including content-based filtering and collaborative filtering. Reinforcement learning is also introduced. Students learn and practice manipulating data with resilient distributed datasets (RDDs) and DataFrames, and building models with MLlib on Google Cloud Platform (GCP). A brief introduction to Hive and Pig for data analysis is also provided.
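The sketch below illustrates k-means clustering, one of the unsupervised methods listed above, using Lloyd's algorithm in plain Python. It is not taken from the course. Several choices are assumptions made for brevity: the function name `kmeans`, the restriction to 2-D points, and seeding with k randomly sampled points. At the data sizes the course targets, a library implementation such as the k-means in Spark MLlib would normally be used instead.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) sketch on a list of 2-D (x, y) tuples.

    Assumptions: 2-D points only; initial centroids are k distinct points
    sampled at random (k-means++ seeding is a common alternative).
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Example: two well-separated groups of points.
# centroids, labels = kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], k=2)
```

The result depends on the initial centroids, so practical implementations usually run several restarts or use a smarter seeding scheme.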

The course uses external educational materials such as books, code, videos, and websites to support teaching and accelerate student learning. Students are expected to spend a significant amount of time digesting the assigned materials.

A textbook is not required. The following books are recommended for hands-on practice. These books are available for reading on using UConn credentials. You need to be on the campus network or use vpn.uconn.edu from off-campus.

An introduction to data mining, including data cleaning, the application of statistical and machine learning techniques to discover patterns in data, and the analysis of the quality and meaning of results. Machine learning topics may include algorithms for discovering association rules, classification, prediction, and clustering. Lab assignments provide practice applying specific techniques and analyzing results. An independent project gives students the opportunity to guide a project from data selection and cleaning through to the presentation of results. Prerequisites: CSCI 362 and statistics (MATH 235, MATH 333, or MATH 335), or permission of the instructor.
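Association rules are usually judged by two measures. The support of an itemset X is the fraction of transactions that contain every item in X. The confidence of a rule X → Y is support(X ∪ Y) / support(X).

The sketch below computes both measures by brute force over a toy set of market-basket transactions. The helper names `support` and `rules` and the threshold values are hypothetical. Enumerating every itemset is exponential in the number of items, so this only illustrates the definitions. Algorithms such as Apriori find the same frequent itemsets far more efficiently by pruning.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of items) that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules(transactions, min_support, min_confidence):
    """Hypothetical brute-force miner: return (lhs, rhs, support, confidence) tuples.

    Every itemset of size >= 2 is enumerated, which is only feasible for toy data.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue  # not frequent enough to generate rules
            # Split the frequent itemset into antecedent (lhs) and consequent (rhs).
            for r in range(1, size):
                for lhs in combinations(itemset, r):
                    conf = s / support(lhs, transactions)  # support(lhs) >= s > 0
                    if conf >= min_confidence:
                        rhs = tuple(i for i in itemset if i not in lhs)
                        found.append((lhs, rhs, s, conf))
    return found


# Toy transactions (illustrative data only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
```

With these baskets, {milk, diapers, beer} appears in 2 of 5 transactions (support 0.4), and {milk, diapers} in 3 of 5. The rule {diapers, milk} → {beer} therefore has confidence 0.4 / 0.6 ≈ 0.67.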

Data mining is the analysis of (often large) observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data analyst (Hand, Mannila and Smyth: Principles of Data Mining).

The goal of this course is to provide an introduction to the main topics in data mining, including frequent-itemset mining, clustering, classification, link-analysis ranking, and dimensionality reduction, among others. The course focuses on algorithmic issues as well as on applications of data mining to real-world problems. Students will be required to solve small written and programming assignments that will help them better understand the material covered.
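PageRank is one well-known link-analysis ranking method; the description above does not say which methods the course covers. The sketch below computes PageRank by power iteration on a small link graph, given as a dict that maps each page to the pages it links to. The function name `pagerank`, the damping factor 0.85, the tolerance, and the choice to spread the rank of dangling pages (pages with no out-links) uniformly are common conventions assumed here, not details from the course.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank sketch on {page: [pages it links to]}.

    Assumptions: uniform teleportation, and dangling pages (no out-links)
    redistribute their rank evenly over all pages. Ranks sum to 1.
    """
    # Include pages that only appear as link targets.
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Total rank sitting on dangling pages, shared evenly among all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each page splits its damped rank equally among its out-links.
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        # Stop once the L1 change between iterations is below the tolerance.
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


# Example: A links to B and C, B links to C, C links back to A.
# ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In the example graph, C receives links from both A and B, so it should end up with the highest rank.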