CAS MA 881: Topics in High Dimensional Data Analysis

Fall 2007

Instructor: Dr. Surajit Ray

Department of Mathematics and Statistics

MCS 222, 111 Cummington Street, Boston, MA 02215

Phone: (617) 353-5209, Fax: (617) 353-8100

Meeting time: Tue 3:30-6:30 pm [Break 5:00-5:15 pm]

Class Room: MCS B23

**LECTURE NOTES**

Course Description:

This course focuses on the challenges presented by data of large magnitude, both in
the dimension of the data vectors and in the number of vectors. Both classical and
modern statistical techniques, and their application
to exploration, regression, testing, visualization, and clustering in Euclidean and non-Euclidean
spaces, will be discussed. Computing challenges in high-dimensional data will be a
focus throughout the course.
Special attention will be given to evaluating the appropriateness of classical statistical methods such as Principal Component Analysis, regression analysis, and clustering in the high-dimensional, low-sample-size setting (n << p).
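
As a small illustration of why the n << p setting can strain distance-based methods such as clustering, the sketch below compares the spread of pairwise Euclidean distances among a handful of points in low and high dimension. It is only a minimal sketch under stated assumptions: the data are independent standard Gaussian draws, and the function name `distance_spread`, the sample size of 20, and the dimensions 2 and 1000 are hypothetical choices made for this example, not part of the course material.

```python
import math
import random


def distance_spread(n_points, dim, seed=0):
    """Ratio of the largest to the smallest pairwise Euclidean distance.

    Assumption: points are i.i.d. standard Gaussian vectors of length `dim`.
    """
    rng = random.Random(seed)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return max(dists) / min(dists)


if __name__ == "__main__":
    # n = 20 points in p = 2 versus p = 1000 dimensions (illustrative values).
    for p in (2, 1000):
        ratio = distance_spread(20, p)
        print(f"p = {p:4d}: max/min pairwise distance = {ratio:.2f}")
```

With p much larger than n, the ratio should come out close to 1: all points are nearly equidistant, so the nearest and farthest neighbors become hard to tell apart.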

Grading:

Term paper and class presentation

Topics:

- Genesis of high dimensional data.
- Microarray Gene Expression Analysis.
- Medical Imaging.
- Immunoinformatics.
- Participants describe their encounters with high dimensional problems.
- Challenges in high dimensional data

- Distances for high dimensional data
- Computational Challenges.
- Distances between probability distributions.
- Generalized Quadratic Distance | article
- Model Selection using quadratic distances.

- Clustering in high dimensional data
- Parametric vs Non-parametric clustering of high-dimensional data.
- Topography of Mixtures in high dimensions | article
- Modal Clustering and modal inference | article | slides
- Visualization of clustered data | article
- Web Resources: High-Dimensional Shapes from their low dimensional projection

- The high dimensional low sample size (HDLSS) framework
- Limitations of Classical Statistical techniques in HDLSS.
- High dimensional low sample size geometry.
- Regression in HDLSS
- Phase Transition in linear models
- Phase Transition in Principal Component Analysis (PCA) | article

- Elements from Random Matrix Theory
- Application of Random Matrix Theory to Multivariate Statistics | arXiv download page
- Limiting distribution of largest eigenvalues | pdf
- High Dimensional Statistical Inference and Random Matrices
- Clustering using Random Projection

- Analysis of data embedded in a manifold.
- PCA vs Principal Geodesic Analysis
- Structure Covariance Matrix

Please Note:

You are responsible for knowing,
and abiding by, the provisions of the GRS Academic Conduct Code,
which is posted at
http://www.bu.edu/grs/academics/resources/adp.html

Violations of the code are punishable by sanctions including expulsion from the University.