BU PROBABILITY AND STATISTICS SEMINAR

Organized by Solesne Bourguin and Ting Zhang at Boston University, on Thursdays from 4:00pm to 5:00pm in room MCS 148 unless otherwise specified. Tea served from 3:45pm to 4:00pm in room MCS 144.

FALL SEMESTER 2017

- Th, Sep 14th, Annie Qu (UIUC),
*TBA*.

TBA

- Th, Sep 21st, TBA,
*TBA*.

TBA

- Th, Sep 28th, TBA,
*TBA*.

TBA

- Th, Oct 5th, TBA,
*TBA*.

TBA

- Th, Oct 12th, TBA,
*TBA*.

TBA

- Th, Oct 19th, TBA,
*TBA*.

TBA

- Th, Oct 26th, Alexandra Chronopoulou (UIUC),
*TBA*.

TBA

- Th, Nov 2nd, TBA,
*TBA*.

TBA

- Th, Nov 9th, TBA,
*TBA*.

TBA

- Th, Nov 16th, TBA,
*TBA*.

TBA

- Th, Nov 30th, TBA,
*TBA*.

TBA

- Th, Dec 7th, TBA,
*TBA*.

TBA

SPRING SEMESTER 2017

- Th, Jan 12th, Yu Gu (Stanford),
*Scaling limits of fluctuations in stochastic homogenization*.

Equations with small scales abound in physics and applied science. When the coefficients vary on microscopic scales, the local fluctuations average out under certain assumptions, giving rise to the so-called homogenization phenomenon. In this talk, I will try to explain some probabilistic approaches we use to obtain the first-order random fluctuations in stochastic homogenization. If homogenization is viewed as a law-of-large-numbers-type result, here we are looking for a central limit theorem. The tools we use include the Kipnis-Varadhan method, a quantitative martingale central limit theorem and Stein's method. Based on joint work with Jean-Christophe Mourrat.

- Th, Jan 19th, Fei Lu (Berkeley),
*Data-driven stochastic model reduction*.

The need to infer reduced computational models of complex systems from discrete partial observations arises in many scientific and engineering applications, for example in climate prediction, materials science, and biology. The challenges come mainly from memory effects due to unresolved scales, from nonlinear interactions between resolved and unresolved scales, and from the difficulty in drawing inferences from discrete partial data. We address these challenges by a discrete-time stochastic parametrization method, and demonstrate by examples that the resulting stochastic reduced models can capture the key statistical dynamical features of the full system and make accurate short-term predictions. The examples include the Lorenz 96 system (which is a simplified model of the atmosphere) and the Kuramoto-Sivashinsky equation that describes spatiotemporally chaotic dynamics.

- Fr, Jan 20th, Sanchayan Sen (McGill),
*Random discrete structures: Phase transitions, scaling limits, and universality*.

The aim of this talk is to give an overview of some recent results in two interconnected areas: a) Random graphs and complex networks: The last decade of the 20th century saw significant growth in the availability of empirical data on networks, and in their relevance to our daily lives. This stimulated activity in a multitude of fields to formulate and study models of network formation and dynamic processes on networks in order to understand real-world systems. One major conjecture in probabilistic combinatorics, formulated by statistical physicists using non-rigorous arguments and extensive simulations in the early 2000s, is as follows: for a wide array of random graph models on n vertices with degree exponent \tau > 3, typical distances, both within maximal components in the critical regime and on the minimal spanning tree of the giant component in the supercritical regime, scale like n^{\frac{\tau\wedge 4 - 3}{\tau\wedge 4 - 1}}. In other words, the degree exponent determines the universality class to which the random graph belongs. The mathematical machinery available at the time was insufficient to provide a rigorous justification of this conjecture. More generally, recent research has provided strong evidence that several objects, including (i) components under critical percolation, (ii) the vacant set left by a random walk, and (iii) the minimal spanning tree, constructed on a wide class of random discrete structures, converge, when viewed as metric measure spaces, to random fractals in the Gromov-Hausdorff-Prokhorov sense, and that these limiting objects are universal under some general assumptions. We report on recent progress in proving these conjectures. b) Stochastic geometry: In contrast, less precise results are known in the case of spatial systems.
We discuss a recent result concerning the length of spatial minimal spanning trees that answers a question raised by Kesten and Lee in the 1990s, the proof of which relies on a variation of Stein's method and a quantification of the classical Burton-Keane argument in percolation theory. Based on joint work with Louigi Addario-Berry, Shankar Bhamidi, Nicolas Broutin, Sourav Chatterjee, Remco van der Hofstad, and Xuan Wang.

- Mo, Jan 23rd, Michael Salins (BU),
*Uniform large deviations principle for a general class of stochastic partial differential equations*. (Unusual location: MCS B21)

A rare weather event is sometimes called a "100 year storm" if the event is so unlikely that it happens on average only about once per century. As this phrase suggests, there is a strong connection between the probabilities of rare events and the time it takes those events to occur. The theory of large deviations was developed in the 1960s by Varadhan, Freidlin, Wentzell and others to quantify both the decay rates of probabilities of rare events for finite-dimensional stochastic differential equations and the growth rates of the so-called exit times, the amount of time it takes for those events to occur. The exit time problems require the large deviations principles to be uniform with respect to initial conditions in bounded sets. Over the past few decades, researchers have proven uniform large deviations principles for many examples of stochastic partial differential equations, but the methods tend to be equation-specific and dependent on the chosen topology of the function space. In this talk, I demonstrate how to use a weak convergence approach and the uniform Laplace principle to prove large deviations principles that are uniform with respect to initial conditions in bounded sets. This is a needed improvement over the previous formulations, which could only be used to prove uniformity over compact sets. The method works for a large class of semilinear Banach-space-valued stochastic differential equations whose linear part generates a compact semigroup.

- We, Jan 25th, Martina Hofmanova (TU Berlin),
*Randomness in convection-diffusion problems*.

In this talk, I will consider quasilinear parabolic PDEs subject to stochastic or rough perturbation and explain how various assumptions on the coefficients and on the roughness of the noise naturally call for different notions of solution, with different regularity properties and different proof techniques. On the one hand, the problems under consideration will be stochastic second-order parabolic PDEs with noise smooth in space, either with a possible degeneracy in the leading-order operator, where only low regularity holds, or under the uniform ellipticity assumption, where arbitrarily high regularity can be proved under suitable assumptions on the coefficients. On the other hand, I will discuss a rough pathwise approach to these problems based on tools from paracontrolled calculus.

- Th, Jan 26th, Andrey Sarantsev (UCSB),
*Competing Brownian particles*.

We study finite and infinite rank-based systems of Brownian particles on the real line, with drift and diffusion coefficients of a particle depending on the current rank relative to other particles. These systems have applications in financial modeling, exclusion processes, and other areas.

- Th, Feb 16th, Afonso Bandeira (NYU),
*On Phase Transitions for Spiked Random Matrix and Tensor Models*.

A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structure. In this talk we discuss fundamental limits on the ability of statistical methods to perform these tasks, as well as methods that outperform PCA at them. Emphasis will be given to low rank structures arising in synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed. Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.

- Th, Feb 23rd, Yao Li (UMass Amherst),
*Polynomial convergence rate to nonequilibrium steady-state*.

In this talk I will present my recent result about the ergodic properties of nonequilibrium steady-state (NESS) for a stochastic energy exchange model. The energy exchange model is numerically reduced from a billiards-like deterministic particle system that models the microscopic heat conduction in a 1D chain. By using a technique called the induced chain method, I proved the existence, uniqueness, polynomial speed of convergence to the NESS, and polynomial speed of mixing for the stochastic energy exchange model. All of these are consistent with the numerical simulation results of the original deterministic billiards-like system.

- Th, Mar 2nd, Jun Yan (UConn),
*Stagewise generalized estimating equations with grouped variables*.

Forward stagewise estimation is a revived slow-brewing approach for model building that is particularly attractive in dealing with complex data structures for both its computational efficiency and its intrinsic connections with penalized estimation. Under the framework of generalized estimating equations, we study general stagewise estimation approaches that can handle clustered data and non-Gaussian/non-linear models in the presence of prior variable grouping structure. As the grouping structure is often not ideal in that even the important groups may contain irrelevant variables, the key is to simultaneously conduct group selection and within-group variable selection, i.e., bi-level selection. We propose two approaches to address the challenge. The first is a bi-level stagewise estimating equations (BiSEE) approach, which is shown to correspond to the sparse group lasso penalized regression. The second is a hierarchical stagewise estimating equations (HiSEE) approach to handle more general hierarchical grouping structure, in which each stagewise estimation step itself is executed as a hierarchical selection process based on the grouping structure. Simulation studies show that BiSEE and HiSEE yield competitive model selection and predictive performance compared to existing approaches. We apply the proposed approaches to study the association between the suicide-related hospitalization rates of the 15–19 age group and the characteristics of the school districts in the State of Connecticut.

- Th, Mar 16th, Han Xiao (Rutgers),
*On the cross correlations under high dimension*.

As an initial step before modeling high dimensional time series, it is of interest to check whether the component series are correlated. We propose performing the test based on the sample cross correlations of the original series, in the presence of temporal dependence. We consider test statistics based on the maximum sample cross correlation, the maximum of the pairwise portmanteau-type statistics, and some other variants. Asymptotics are developed in the high dimensional setting where the dimension p can grow either as a power of the sample size T, or as an exponential function of T. We employ the moving blocks bootstrap method to calibrate the sizes of the tests for finite samples. Extensions to nonstationary time series are also considered.

- Th, Mar 23rd, Zhengjun Zhang (University of Wisconsin-Madison),
*ATM: autoregressive tail-index model for maxima in financial time series*.

Classical generalized extreme value (GEV) models have been widely used in the practice of financial risk management for the modeling of extreme observations such as intra-day maximum loss from high-frequency trading or maximum daily loss across a large number of assets in a given portfolio. However, due to the time dependency of financial time series, the classical GEV model, as a static model, cannot adequately capture the time-varying behavior of extreme observations. In this paper we integrate the classical GEV with a dynamic modeling approach to introduce a novel dynamic GEV framework. Specifically, an autoregressive tail-index model (ATM) is proposed to capture the time-varying tail risk of financial markets. Probabilistic properties of the model are studied and an irregular maximum likelihood estimator is used for model estimation, with its asymptotic properties investigated. Finite sample performance is illustrated by simulations. The results of two real data examples in which ATM is used for market tail risk monitoring and VaR calculation are presented, where significant improvement over classical GEV has been observed.

- Th, Mar 30th, Tyler McCormick (University of Washington),
*TBA*.

TBA

- Th, Apr 6th, Brent Nelson (Berkeley),
*TBA*.

TBA

- Th, Apr 13th, Jan Rosinski (University of Tennessee),
*TBA*.

TBA

- Th, Apr 20th, Simon Campese (University of Luxembourg),
*TBA*.

TBA

- Th, Apr 27th, Brian Caffo (Johns Hopkins),
*TBA*.

TBA

FALL SEMESTER 2016

- Th, Sep 22nd, David Lipshutz (Brown University),
*Sensitivity analysis for the invariant measure of reflected Brownian motion*.

Reflected Brownian motions (RBMs) in polyhedral cones arise in a variety of applications ranging from queueing theory to mathematical finance. The invariant measure of an RBM (assuming it exists) is often used to approximate the long-time behavior of the RBM, and depends on the parameters that describe the RBM, namely the drift vector, the covariance matrix and the directions of reflection. The focus of this talk is to understand the sensitivities of the invariant measure to these parameters. In particular, we show that sensitivities of the invariant measure can be characterized using the invariant measure of a joint process which consists of an RBM and its so-called pathwise derivative. One of the main challenges is to establish existence and uniqueness for the invariant measure of this joint process.

- Th, Sep 29th, Soumendra Lahiri (NCSU),
*A frequency domain empirical likelihood method for irregularly spaced spatial data*.

In this talk, we consider empirical likelihood methodology for irregularly spaced spatial data in the frequency domain. The main result of the paper shows that, up to a suitable (and nonstandard) scaling, Wilks' phenomenon holds for the logarithm of the empirical likelihood ratio in the sense that it is asymptotically distribution-free and has a chi-squared limit. As a result, the proposed spatial FDEL method can be used to build nonparametric, asymptotically correct confidence regions and tests for a class of spectral parameters that are defined through spectral estimating equations. A major advantage of the method is that, unlike the more common studentization approach, it does not require explicit estimation of the standard error, which itself is a difficult problem due to intricate interactions among several unknown quantities, including the spectral density of the spatial process, the spatial sampling density and the spatial asymptotic structure. Applications of the methodology to some important inference problems for spatial data are given. Joint work with Soutir Bandyopadhyay and Dan Nordman.

- Th, Oct 6th, Harry Crane (Rutgers University),
*The edge exchangeable framework for network modeling*.

Most of the statistical networks literature focuses on theory and methods for inference for data from one of a few default models. For several reasons, these default models, e.g., stochastic blockmodels and graphon models, fail to possess basic statistical properties, raising questions about the soundness of inferences based on these models. I will outline a general framework that clarifies the major issues of statistical network modeling and lends some insight into resolving them. Within this framework, I introduce the class of edge exchangeable network models, which addresses the longstanding problem of modeling sparse network structures in a way that permits sound inference. This is joint work with Walter Dempsey, U. Michigan.

- Fr, Oct 13th, Kung-Sik Chan (University of Iowa),
*Inference for threshold diffusions*.

The threshold diffusion model assumes the underlying diffusion process to have a piecewise linear drift term and a piecewise smooth diffusion term, which makes it useful for analyzing nonlinear continuous-time processes. In practice, the functional form of the diffusion term is often unknown. We develop a quasi-likelihood approach for testing and estimating a threshold diffusion model by employing a constant working diffusion term, which amounts to a least squares approach. Large-sample properties of the proposed methods are derived under mild regularity conditions. Unlike the discrete-time case, the threshold estimate admits a closed-form asymptotic distribution. We apply the threshold model to examine the nonlinearity in the term structure of a long time series of US interest rates.

- Th, Oct 20th, Han Liang Gan (Northwestern),
*Dirichlet approximation of genetic drift models*.

A genetic drift model studies how gene variants and their frequencies evolve in time. However, even for a relatively innocuous-looking model, the exact distribution is often intractable, so approximate distributions may be useful. The Dirichlet distribution is supported on vectors in K-dimensional space with nonnegative entries summing to 1, which makes it a natural candidate for approximating genetic drift models. In this talk we will discuss various genetic drift models (such as the Wright-Fisher model) and their approximating Dirichlet distributions, and present explicit error bounds for the approximations. If time permits, we will cover the Stein's method framework used to derive these bounds.

- Th, Nov 3rd, David Degras (UMass Boston),
*A high dimensional group fused lasso*.

Group fused lasso (GFL) is a powerful approach to sparse linear regression problems subject to structural constraints. It is widely used in machine learning, signal processing, and bioinformatics for tasks such as prediction, signal recovery, segmentation, and change point detection. From a computational perspective, GFL is a nonsmooth convex optimization problem that can be solved by off-the-shelf methods such as proximal algorithms and subgradient methods. In high dimensions, however, these methods require intensive computations and may only approximately enforce structural constraints. To address these concerns, I present a new GFL method that combines block coordinate descent, which is fast but has no convergence guarantees, with subgradient descent, which is slower but provably converges to a global solution. The proposed method is compared to the state of the art in a numerical experiment. It is also applied to resting-state fMRI data to investigate dynamic brain connectivity. Open questions of parameter selection and statistical inference are set forth.

- Th, Nov 17th, Aidong Ding (Northeastern),
*A robust-equitable dependence measure for feature selection*.

Dependence measures play an important role in filter-based feature selection. To correctly identify important features with complex relationships in large data sets, we would like the measure to be equitable (Reshef et al., Science, 2011), that is, to treat all types of functional relationships, linear and nonlinear, equally. We provide a theoretical treatment of equitability, including the self-equitability definition (Kinney and Atwal, PNAS 2014) and a new robust-equitability definition. The robust copula dependence (RCD) measure, based on the L1-distance of the copula density, is shown to be equitable under all equitability definitions. We also provide theoretical justification that RCD can be fundamentally easier to estimate than mutual information (MI), the self-equitable measure recommended by Kinney and Atwal. Numerical examples on synthetic and real data sets illustrate the effect of equitability on feature ranking and selection. In particular, selection based on RCD can be more robust to varying sample size than selection through MI and other measures.

- Th, Dec 1st, Dean Eckles (MIT),
*Massive meta-analysis using regularized instrumental variables, with an application to peer effects*.

The widespread adoption of randomized experiments (i.e. A/B tests) in the Internet industry means that there are often numerous well-powered experiments on a given product. Individual experiments are often simple "bake-off" evaluations of a new intervention: they allow us to estimate effects of that particular intervention on outcomes of interest, but they are often not informative about the mechanisms for these effects or what other interventions might do. We consider what else we can learn from a large set of experiments. In particular, we use many experiments to learn about the effects of the various endogenous variables (or mechanisms) via which the experiments affect outcomes. This involves treating the experiments as instrumental variables, so this setting is similar to, but somewhat different from, "many instrument" settings in econometrics and biostatistics. Motivated by the distribution of experiment first-stage effects, we present and evaluate regularization methods for improving on standard IV estimators. Joint work with Alex Peysakhovich (Facebook AI Research).

- Th, Dec 8th, Scott Robertson (BU),
*Robust asymptotic growth in the presence of stability*.

In this talk, we revisit the problem considered in "Robust Asymptotic Growth" (Kardaras, Robertson 2012, Annals of Applied Probability), where the investor seeks to maximize the growth rate of her portfolio when there is uncertainty in the drift of asset prices. In this setting, while the instantaneous covariance matrix and domain of the underlying asset prices are known, the precise drift is unknown, beyond the qualitative statement that asset prices do not "explode" to the boundary over the investment horizon. Therein, robust growth optimal portfolios are constructed via the generalized principal eigenfunction for a degenerate elliptic operator, and such portfolios are seen as the long horizon limit of the functionally generated finite horizon relative arbitrage portfolios introduced by Fernholz and Karatzas in their work on stochastic portfolio theory. In the present work, we seek to extend the robust growth optimal analysis to the situation where, in addition to knowing that asset prices do not explode to the boundary of the state space, the investor also knows that asset prices are stable over time. Such a setting naturally arises in the study of rank-based portfolios, where optimal policies are driven not by the asset prices themselves, but rather by the ranked relative market capitalizations. In this setting, we provide simple conditions on the domain, covariance matrix and limiting invariant density under which growth optimal portfolios may be constructed. Here, the answer is relatively easy to obtain when an associated diffusion is symmetric (reversible), but requires a very delicate analysis in the non-symmetric case. Growth optimal portfolios are governed by a solution to a variational problem in the space of functions which are locally in W^{1,2}, the space of square-integrable functions with square-integrable weak derivatives.
After presenting the results for the case when asset prices do not exhibit local time behavior on the boundary of the state space, the case containing local times will be considered, as this is the natural setting for rank-based diffusions, which are the primary example of interest. This is joint work with Kostas Kardaras, of the London School of Economics.

SPRING SEMESTER 2016

- Tu, Jan 19th (in MCS B21), Daniel Sussman (Harvard),
*Adjacency spectral embedding for random graphs*.

The eigendecomposition of an adjacency matrix provides a way to embed a graph as points in finite dimensional Euclidean space. This embedding allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for graph inference. Our work analyzes this embedding, a graph version of principal component analysis, in the context of various random graph models with a focus on the impact for subsequent inference. We show that for a particular model this embedding yields a consistent estimate of its parameters and that these estimates can be used to accurately perform a variety of inference tasks including vertex clustering, vertex classification as well as estimation and hypothesis testing about the parameters.

- Th, Jan 21st, Colin B. Fogarty (UPenn),
*Leveraging multiple outcomes in matched observational studies*.

In order to bridge the gap between association and causation in observational studies, Fisher advocated for the testing of “elaborate theories.” One manner in which a causal theory can be made elaborate is through the prediction of a particular direction of effect for multiple outcome variables. When testing hypotheses on multiple outcomes, multiple comparisons must be taken into account. This is true not only when assuming no unmeasured confounding, but also when assessing how robust a study's findings are to unmeasured confounding in the subsequent sensitivity analysis. Concerns over a loss in power may lead practitioners to instead investigate the outcome variable they believe *a priori* will be most affected by the intervention, thus reducing the extent to which Fisher's advice is followed in practice. We demonstrate that when performing multiple comparisons in a sensitivity analysis, the loss in power from controlling the familywise error rate can be attenuated. This is because unmeasured confounding cannot have a different impact on the probability of assignment to treatment for a given individual depending on the outcome being analyzed. Existing methods for testing the overall truth of multiple hypotheses allow this to occur by combining the results of sensitivity analyses performed on individual outcomes. By solving a quadratically constrained linear program, we are able to perform a sensitivity analysis while avoiding this logical inconsistency. We show that this allows for uniform improvements in the power of a sensitivity analysis when compared to combining individual sensitivity analyses. This is true not only for testing the overall null across outcomes, but also for testing null hypotheses on specific outcome variables when using certain sequential rejection procedures. We illustrate our method through an example examining the impact of smoking on naphthalene levels in the body.

- Tu, Jan 26th, Veronika Rockova (UPenn),
*Fast Bayesian factor analysis via automatic rotations to sparsity*.

Rotational post-hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations and (c) better oriented sparse solutions. To avoid the pre-specification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian Buffet Process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the Spike-and-Slab LASSO prior, a two-component refinement of the Laplace prior (Rockova 2015). A companion criterion, motivated as an integral lower bound, is provided to effectively select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional gene expression data, which would render posterior simulation impractical.

- Th, Jan 28th, Sumanta Basu (UC Berkeley),
*Learning dynamics of complex systems from high-dimensional data*.

The problem of learning interrelationships among the components of large, complex systems from high-dimensional datasets is common in many areas of modern economic and biological sciences. Examples include macroeconomic policy making, financial risk management, gene regulatory network reconstruction and elucidating functional roles of epigenetic regulators driving cellular mechanisms. In addition to their inherent computational challenges, principled statistical analyses of these big data problems often face unique challenges emerging from temporal and cross-sectional dependence in the data and complex dynamics (heterogeneity, nonlinear and high-order interactions) among the system components. In this talk, I will present Network Granger causality - a unified framework for structure learning and forecasting of large dynamic systems using multivariate time series and panel data. The proposed framework relies on regularized estimation of high-dimensional vector autoregressive models (VAR), is flexible enough to incorporate grouping and latent structures, allows parallel implementation for large scale data sets and enjoys strong theoretical guarantees under high-dimensional scaling. I will demonstrate the advantage of the proposed methodology on a motivating application from financial econometrics - system-wide risk monitoring of U.S. financial sector before, during and after the crisis of 2007-2009. I will conclude with some of my ongoing work on learning nonlinear and potentially high-order interactions in high-dimensional, heterogeneous settings.

- Fr, Jan 29th, Tirthankar Dasgupta (Harvard),
*Designing experiments for new-generation scientific studies: some challenges and potential solutions*.

Many modern-day experiments conducted by researchers in the physical, social, behavioral, biomedical and management/business sciences involve complications such as (a) simultaneous study of multiple factors, (b) availability of multiple covariate measurements for each experimental unit, before or after conducting the experiment, necessitating a strategy for achieving covariate balance/adjustment across treatment groups, and (c) varying levels of randomization restrictions across factors. In this talk, we will present real-life examples of these complications from different fields of application, and discuss some ideas and research results that focus on the development of a unified approach for designing and analyzing experiments that address all of the above complexities.

- Th, Feb 4th, Christine B. Peterson (Stanford),
*Statistical approaches for making sense of high-throughput biological data*.

In this talk, I will discuss statistical approaches I have developed to gain insight into the complex networks of regulation and interaction that govern biological systems. Understanding these networks and how they are disrupted by disease is an important step in identifying potential targets for the treatment of disease. Firstly, I will describe my work on the inference of biological networks such as metabolic or protein interaction networks from high-throughput data. In particular, I will address graphical modeling methods I have proposed in the Bayesian framework for inferring such networks based on limited sample sizes, and illustrate the application of these approaches to highlight mechanisms underlying cancer progression. Secondly, I will address the problem of establishing the genetic basis of multivariate traits such as gene expression or other molecular profiling data. Here I propose a multi-stage multiple testing procedure which controls important error rates regarding the discovery of regulatory variants and the association of these variants to traits.

- Th, Mar 3rd, Karl Rohe (University of Wisconsin-Madison),
*Network driven sampling: a critical threshold for design effects*.

Web crawling and respondent-driven sampling (RDS) are two types of network driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This paper studies network driven sampling as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree, instead of a chain, allows sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy: if the design effect is $D$, then an estimator constructed from the novel design has $D$ times the variance it would have under a simple random sample of the same size. Under certain assumptions on the referral tree, the design effect of network driven sampling has a critical threshold that is a function of the referral rate $m$ and of the clustering structure in the social network, represented by the second eigenvalue $\lambda_2$ of the Markov transition matrix. If $m < 1/\lambda_2^2$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m > 1/\lambda_2^2$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent; it converges at the slower rate $n^{\log_m \lambda_2}$).
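To make the threshold concrete, here is a minimal Python sketch (an illustration, not code from the paper) that computes $\lambda_2$ for a given transition matrix and classifies a referral rate $m$. It assumes the chain is mixing, so that $0 < |\lambda_2| < 1$, and takes "second eigenvalue" to mean the second-largest eigenvalue in modulus; the function names and the two-community matrix used below are hypothetical.

```python
import numpy as np

def second_eigenvalue(P):
    """Second-largest eigenvalue modulus of a row-stochastic transition matrix P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])

def design_effect_regime(P, m):
    """Classify the referral rate m against the critical threshold 1 / lambda_2^2."""
    lam2 = second_eigenvalue(P)
    threshold = 1.0 / lam2**2
    if m < threshold:
        return lam2, threshold, "finite design effect: standard estimator is sqrt(n)-consistent"
    if m > threshold:
        # log_m(lambda_2) lies between -1/2 and 0 when 0 < lambda_2 < 1 and m > 1 / lambda_2^2
        rate_exponent = np.log(lam2) / np.log(m)
        return lam2, threshold, f"design effect grows with n: rate n^{rate_exponent:.3f}"
    return lam2, threshold, "boundary case m = 1 / lambda_2^2 (not covered by the stated dichotomy)"
```

For the hypothetical chain `P = np.array([[0.9, 0.1], [0.1, 0.9]])`, $\lambda_2 = 0.8$ and the threshold is $1/0.64 \approx 1.56$, so `design_effect_regime(P, 2)` falls in the growing regime with rate roughly $n^{-0.322}$ (slower than $n^{-1/2}$), while `m = 1.2` stays below the threshold.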

- Th, Mar 17th, Simon Campese (University of Rome Tor Vergata),
*Abstract fourth moment theorems*.

The classical Fourth Moment Theorem says that for a normalized sequence of multiple Wiener-Itô integrals, convergence of just the fourth moment suffices to ensure convergence in law towards a standard Gaussian random variable. Since its discovery, several proofs and extensions of this result have been found, all of them heavily exploiting the rich structure of multiple integrals. In an exciting new development, it has turned out that such Fourth Moment Theorems hold in much greater generality, namely for generic eigenfunctions of Markov diffusion generators with a certain chaotic property and target laws fulfilling some sufficient condition (examples being the Gaussian, Gamma and Beta distributions). We will present an overview of this new approach.
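The simplest non-trivial instance is the second Wiener chaos, where $F = \sum_i \lambda_i (X_i^2 - 1)$ with $X_i$ i.i.d. standard Gaussian. Because cumulants add over independent terms and $\kappa_4(X^2 - 1) = 48$, a variable normalized to $\operatorname{Var}(F) = 1$ has $\mathbb{E}[F^4] = 3 + 48\sum_i \lambda_i^4$. The Python sketch below (an example chosen here for illustration, not taken from the talk) computes this exactly and checks it by Monte Carlo; the function names are hypothetical.

```python
import numpy as np

def fourth_moment_exact(lams):
    """E[F^4] for F = sum_i lam_i (X_i^2 - 1), X_i iid N(0, 1), rescaled so Var(F) = 1."""
    lams = np.asarray(lams, dtype=float)
    lams = lams / np.sqrt(2.0 * np.sum(lams**2))  # Var(F) = 2 * sum(lam_i^2) = 1
    # Cumulants add over independent terms, and kappa_4(X^2 - 1) = 48.
    return 3.0 + 48.0 * np.sum(lams**4)

def fourth_moment_mc(lams, n_samples=400_000, seed=0):
    """Monte Carlo estimate of the same fourth moment."""
    rng = np.random.default_rng(seed)
    lams = np.asarray(lams, dtype=float)
    lams = lams / np.sqrt(2.0 * np.sum(lams**2))
    x = rng.standard_normal((n_samples, lams.size))
    f = (x**2 - 1.0) @ lams  # one draw of F per row
    return float(np.mean(f**4))
```

With equal weights over $n$ terms, `fourth_moment_exact(np.ones(n))` equals $3 + 12/n$, so the fourth moment tends to the Gaussian value 3 as $n$ grows, in line with the central limit theorem.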

- Th, Mar 24th, Nikolai Leonenko (Cardiff University),
*Limit theorems for weighted non-linear transformations of Gaussian processes with singular spectrum*.

The limit Gaussian distribution of multivariate weighted functionals of non-linear transformations of Gaussian stationary processes having multiple singular spectra is derived under very general conditions on the weight function. This work is motivated by applications to the estimation of harmonic components in non-linear regression models with singular spectrum, and to asymptotic inference on non-linear functionals of Gaussian stationary processes with singular spectra. It continues the pioneering results of Rosenblatt (1961), Taqqu (1975, 1979), and Dobrushin and Major (1979) on convergence to Gaussian and non-Gaussian distributions under long-range dependence, in terms of Hermite expansions, and of Breuer and Major (1983), Avram and Brown (1989), and Chambers and Slud (1989) on convergence to the Gaussian distribution by using diagram formulae or graphical methods. This line of research continues to be of interest today; see Berman (1992) for the m-dependent approximation approach, Ho and Hsing (1997) for the martingale approach, and Nualart and Peccati (2005) and Nourdin and Peccati (2009) for applications of Malliavin calculus and Stein's method, among others. This is joint work with A.V. Ivanov, M.D. Ruiz-Medina and I.N. Savich.

- Th, Mar 31st, Ivan Fernandez-Val (Boston University),
*The sorted effects method: discovering heterogeneous effects beyond their averages*.

The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous, as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups; see, e.g., Angrist and Pischke, 2008). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. To discover these effects more fully, we propose to estimate and report sorted effects: a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction, the sorted effect curves completely represent and help visualize all of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. We also provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects. We apply the sorted effects method to demonstrate several striking patterns of gender-based discrimination in wages, and of race-based discrimination in mortgage lending. Using differential geometry and functional delta methods, we establish that the estimated sorted effects are consistent for the true sorted effects, and derive asymptotic normality and bootstrap approximation results, enabling construction of pointwise confidence bands (pointwise with respect to percentile indices). We also derive functional central limit theorems and bootstrap approximation results, enabling construction of simultaneous confidence bands (simultaneous with respect to percentile indices). The derived statistical results in turn rely on establishing Hadamard differentiability of a multivariate sorting operator, a result of independent mathematical interest.
This is joint work with Victor Chernozhukov and Ye Luo.
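A minimal Python sketch of the sorted-effects idea, assuming a hypothetical interactive linear model $y = b_0 + b_1 d + b_2 z + b_3 d z$ whose partial effect with respect to $d$ is $b_1 + b_3 z_i$ for unit $i$. The sorted effect curve is read off as empirical percentiles of these unit-level effects; the function names are illustrative, and the standard errors and confidence bands described in the abstract are not reproduced here.

```python
import numpy as np

def partial_effects(beta, z):
    """Unit-level partial effect of d in the hypothetical model y = b0 + b1*d + b2*z + b3*d*z."""
    _, b1, _, b3 = beta
    return b1 + b3 * np.asarray(z, dtype=float)

def sorted_effects(effects, percentiles):
    """Sorted effect curve: empirical percentiles of the unit-level partial effects."""
    return np.quantile(np.asarray(effects, dtype=float), percentiles)

effects = partial_effects((0.0, 1.0, 0.3, 0.5), [0, 1, 2, 3])
curve = sorted_effects(effects, [0.0, 0.5, 1.0])
```

Here the unit-level effects are 1.0, 1.5, 2.0 and 2.5: the average partial effect is the single number 1.75, whereas the sorted effect curve evaluated at the 0th, 50th and 100th percentiles is 1.0, 1.75 and 2.5, exposing the spread that the average hides.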

- Th, Apr 14th, Kun Chen (UConn),
*Sequential estimation in sparse factor regression*.

Large-scale multivariate regression models are increasingly required and formulated in various fields. A sparse singular value decomposition of the regression component matrix is appealing for achieving dimension reduction and facilitating model interpretation. However, how to recover such a composition of sparse and low-rank structures remains a challenging problem. By exploring the connections between factor analysis and reduced-rank regression, we formulate the problem as a sparse factor regression and develop an efficient sequential estimation procedure. At each sequential step, a latent factor is constructed as a sparse linear combination of the observed predictors, for predicting the responses after accounting for the effects of the previously found latent factors. Compared with the complicated joint estimation approach, a prominent feature of our proposed sequential method is that each step reduces to a simple regularized unit-rank regression, in which the orthogonality requirement among the sparse factors becomes optional rather than necessary. The ideas of coordinate descent and Bregman iterative methods are utilized to ensure fast computation and algorithmic convergence, even in the presence of missing data and when exact orthogonality is desired. Theoretically, we show that the sequential estimators enjoy the oracle properties for recovering the underlying sparse factor structure. The efficacy of the proposed approach is demonstrated by simulation studies and two real applications in genetics.

- Th, Apr 21st, Daniel Schwarz (Carnegie Mellon),
*Integral representation of martingales in mathematical finance*.

In this talk we will present recent results concerning a class of integral representation theorems for martingales which lie at the heart of two fundamental problems in mathematical finance: the completion of financial markets with derivative securities and the existence of partial Radner equilibria. Some popular examples and open problems will be discussed.

- Th, Apr 28th, Xiaofeng Shao (University of Illinois at Urbana-Champaign),
*A new approach to dimension reduction for multivariate time series*.

In this talk, we introduce a new methodology to reduce the number of parameters in multivariate time series modeling. Our method is motivated by the consideration of optimal prediction and focuses on reducing the effective dimension of the conditional mean of the time series given the past information. In particular, we seek a contemporaneous linear transformation such that the transformed time series has two parts, one of which is conditionally mean independent of the past information. Our dimension reduction procedure is based on the eigen-decomposition of the so-called cumulative martingale difference divergence matrix, which encodes the number and form of linear combinations that are conditionally mean independent of the past. Interestingly, our dimension reduction framework admits a factor model representation, and our method can be further extended to reduce the dimension of the volatility matrix. We provide a simple way of estimating the number of factors and the factor loading space, and obtain some theoretical results about the estimators. The finite sample performance is examined via simulations in comparison with some existing methods.

FALL SEMESTER 2015

- Th, Sep 10th, Liliya Zax (Boston University),
*Statistics application in industry: financial institutions and tech companies*.

In my presentation I will share some aspects of my statistics-related experience in different industries, namely in financial and technology companies. We will discuss some specific statistical problems that are of interest to industry, what statistical tools are used to try to solve those problems, and what statistical challenges companies are facing. The goal of the presentation is to help students better understand how the knowledge and skills they acquire in their academic programs can later be applied if they choose to pursue a career in industry.

- Th, Sep 17th, Lei Guo (Boston University),
*The power of message networks: semantic network analysis of media effects in twittersphere during the 2012 U.S. presidential election*.

Do traditional news media still lead public opinion in this digital age? This talk will present a study that explores how media such as newspapers and television set the public agenda by constructing message networks. Semantic network analysis and big data analytics were used to examine a large dataset collected on Twitter during the 2012 U.S. presidential election.

- Th, Oct 1st, Philippe Rigollet (MIT),
*Batched bandits*.

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already yields regret bounds close to minimax optimal, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

- Th, Oct 8th, John Harlim (Penn State),
*Diffusion forecast: a nonparametric modeling approach*.

I will discuss a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution of the backward Kolmogorov equation of the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to evolve the probability distribution of non-trivial dynamical systems with equation-free modeling. I will also discuss nonparametric filtering methods, leveraging the diffusion forecast in a Bayesian framework to initialize the forecasting distribution given noisy observations.

- Th, Oct 15th, Pierre Jacob (Harvard),
*Estimation of the derivatives of functions that can only be evaluated with noise*.

Iterated Filtering methods have recently been introduced to perform maximum likelihood parameter estimation in state-space models, and they only require being able to simulate the latent Markov model according to its prior distribution. They rely on an approximation of the score vector for general statistical models based upon an artificial posterior distribution, and bypass the calculation of any derivative. We show here that this score estimator can be derived from a simple application of Stein’s lemma, and how an additional application of this lemma provides an original derivative-free estimator of the observed information matrix. These methods tackle the general problem of estimating the first two derivatives of a function that can only be evaluated point-wise with some noise. We compare these new methods with finite difference schemes and make connections with proximal mappings. In particular, we look at the bias and variance of these estimators, the effect of the variance of the noise, and the effect of the dimension of the parameter space.
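The following Python sketch shows how Stein's lemma yields derivative-free estimates of a gradient and Hessian from noisy point evaluations. It is a plain Gaussian-smoothing estimator with antithetic draws, used here as an illustration; it is not the Iterated Filtering estimator from the talk. The perturbation scale `sigma`, the sample size, the function names and the noisy quadratic test function are all assumptions. It estimates the derivatives of the Gaussian-smoothed function, which match those of $f$ exactly for a quadratic $f$ and up to an $O(\sigma^2)$ bias for a smooth $f$.

```python
import numpy as np

def stein_derivatives(f_noisy, theta, sigma=0.1, n_samples=200_000, seed=0):
    """Derivative-free gradient and Hessian estimates via Stein's lemma.

    For Z ~ N(0, I), Stein's lemma gives E[g(Z) Z] = E[grad g(Z)] and
    E[g(Z) (Z Z^T - I)] = E[Hess g(Z)]. With g(z) = f(theta + sigma z), these are
    sigma times the smoothed gradient and sigma^2 times the smoothed Hessian.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    z = rng.standard_normal((n_samples, d))
    f_plus = f_noisy(theta + sigma * z, rng)
    f_minus = f_noisy(theta - sigma * z, rng)
    f_zero = f_noisy(np.tile(theta, (n_samples, 1)), rng)
    # Antithetic combinations keep the expectations unchanged but cancel odd/even terms.
    odd = (f_plus - f_minus) / 2.0
    even = (f_plus + f_minus) / 2.0 - f_zero
    grad = (odd[:, None] * z).mean(axis=0) / sigma
    outer = z[:, :, None] * z[:, None, :] - np.eye(d)
    hess = (even[:, None, None] * outer).mean(axis=0) / sigma**2
    return grad, hess

def noisy_quadratic(x, rng):
    """Hypothetical test function -||x - 1||^2 observed with N(0, 0.01^2) noise."""
    return -np.sum((x - 1.0) ** 2, axis=1) + 0.01 * rng.standard_normal(x.shape[0])

grad, hess = stein_derivatives(noisy_quadratic, [0.0, 0.0])
```

At $\theta = 0$ the estimates should be close to the true gradient $(2, 2)$ and Hessian $-2I$. The antithetic pairing is a design choice that keeps the variance from blowing up as `sigma` shrinks; a naive one-sided version is unbiased for the same quantities but much noisier.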

- Th, Oct 22nd, Jian Zhou (WPI),
*Volatility inference using high-frequency financial data and efficient computations*.

The field of high-frequency finance has evolved rapidly over the past few decades. One focus is volatility modeling and analysis for high-frequency financial data, which plays a major role in finance and economics. In this talk, we focus on the statistical inference problem for large volatility matrices using high-frequency financial data, and propose a methodology to tackle this problem under various settings. We illustrate the methodology with high-frequency price data on stocks traded on the New York Stock Exchange in 2013. The theory and numerical results show that our approach performs well by pooling together the strengths of regularization and estimation from a high-frequency finance perspective.

- Th, Oct 29th, Markos Katsoulakis (UMass Amherst),
*Path-space information metrics for uncertainty quantification and coarse-graining of molecular systems*.

We present path-space, information theory-based sensitivity analysis, uncertainty quantification and variational inference methods for complex high-dimensional stochastic dynamics, including chemical reaction networks with hundreds of parameters, Langevin-type equations and lattice kinetic Monte Carlo. We establish their connections with goal-oriented methods in terms of new, sharp uncertainty quantification inequalities that scale appropriately both at long times and in high-dimensional state and parameter spaces. Combined, the proposed methodologies can (a) tackle non-equilibrium processes, typically associated with coupled physicochemical mechanisms or boundary conditions, such as reaction-diffusion problems, where even the steady states are unknown altogether (e.g., they do not have a Gibbs structure); (b) through path-wise information theory tools, yield a surprisingly simple, tractable and easy-to-implement approach to quantify and rank parameter sensitivities; and (c) provide reliable parameterizations for coarse-grained molecular systems based on fine-scale data, as well as rational model selection through path-space (dynamics-based) variational inference methods.

- Th, Nov 5th, Iddo Ben-Ari (UConn),
*The Bak-Sneppen model of biological evolution and related models*.

The Bak-Sneppen model is a Markovian model of biological evolution that was introduced as an example of Self-Organized Criticality. In this model, a population of size N evolves according to the following rule. The population is arranged on a circle or, more generally, on a connected graph. Each individual is assigned a random fitness, uniform on [0,1] and independent of the fitnesses of the other individuals. At each unit of time, the least fit individual and its neighbors are removed from the population and replaced by new individuals. Despite being extremely simple, the model is known to be very challenging, and the evidence for Self-Organized Criticality provided by Bak and Sneppen was obtained through numerical simulations. I will review the main rigorous results on this model, mostly due to R. Meester and his coauthors, and present some new results and open problems. I will then turn to recent, more tractable variants of the model in which, on the one hand, the spatial structure is relaxed while, on the other hand, the population size is random. I will focus on the functional central limit theorem for this model, which has a somewhat unusual form.
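The update rule described above is simple to simulate. The Python sketch below assumes the circle version with nearest neighbours and that each new individual receives a fresh Uniform[0,1] fitness; the function name, population size, number of steps and seed are arbitrary choices. It also records the minimum fitness at each step, a quantity commonly tracked in numerical studies of the model.

```python
import numpy as np

def bak_sneppen(n_sites=100, n_steps=50_000, seed=0):
    """Simulate the Bak-Sneppen model on a circle of n_sites individuals."""
    rng = np.random.default_rng(seed)
    fitness = rng.uniform(0.0, 1.0, size=n_sites)  # iid Uniform[0, 1] initial fitnesses
    min_history = np.empty(n_steps)
    for t in range(n_steps):
        i = int(np.argmin(fitness))  # least fit individual
        min_history[t] = fitness[i]
        # Replace it and its two neighbours on the circle with fresh uniform fitnesses.
        neighbourhood = np.array([i - 1, i, i + 1]) % n_sites
        fitness[neighbourhood] = rng.uniform(0.0, 1.0, size=3)
    return fitness, min_history

fitness, min_history = bak_sneppen()
```

In long runs the population drifts away from its uniform start: most fitnesses end up above a level close to 2/3 in this one-dimensional setting, in line with the simulation evidence mentioned above.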

- Th, Nov 12th, Mokshay Madiman (University of Delaware),
*Optimal concentration of information for log-concave distributions*.

It was shown by Bobkov and the speaker that for a random vector X in R^n drawn from a log-concave density e^{-V}, the information content per coordinate, namely V(X)/n, is highly concentrated about its mean. Their argument was nontrivial, involving the localization technique, and also gave suboptimal exponents, but it was sufficient to demonstrate that high-dimensional log-concave measures are in a sense close to uniform distributions on the annulus between two nested convex sets. We will present recent work that obtains an optimal concentration bound in this setting (optimal even in the constant terms, not just the exponent) using very simple techniques, and outline the proof. Applications that motivated the development of these results include high-dimensional convex geometry and random matrix theory, and we will also discuss these applications.

- Th, Nov 19th, Youssef M. Marzouk (MIT),
*Transport maps for Bayesian computation*.

We will discuss how transport maps, i.e., deterministic couplings between probability measures, can enable useful new approaches to Bayesian computation. A first use involves a combination of optimal transport and Metropolis correction; here, we use continuous transportation to transform typical MCMC proposals into adapted non-Gaussian proposals, both local and global. Second, we discuss a variational approach to Bayesian inference that constructs a deterministic transport map from a reference distribution to the posterior, without resorting to MCMC. Independent and unweighted posterior samples can then be obtained by pushing forward reference samples through the map. Making either approach efficient in high dimensions, however, requires identifying and exploiting low-dimensional structure. We present new results relating sparsity of transport maps to the conditional independence structure of the target distribution, and discuss how this structure can be revealed through the analysis of certain average derivative functionals. A connection between transport maps and graphical models yields many useful algorithms for efficient ordering and decomposition---here, generalized to the continuous and non-Gaussian setting. The resulting inference algorithms involve either the direct identification of sparse maps or the composition of low-dimensional maps and rotations. We demonstrate our approaches on Bayesian inference problems arising in spatial statistics and in partial differential equations.
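In one dimension a monotone transport map can be written down explicitly, which makes the step of pushing reference samples forward through the map easy to see. The Python sketch below maps a standard Gaussian reference to an Exponential target via $T = F^{-1} \circ \Phi$; the Exponential target stands in for a posterior purely for illustration, the function name is hypothetical, and the high-dimensional, sparse maps discussed in the talk are not attempted here.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def exp_transport_map(z, rate):
    """Monotone map T = F^{-1} o Phi pushing N(0, 1) forward to Exponential(rate).

    Uses 1 - Phi(z) = erfc(z / sqrt(2)) / 2 for numerical stability in the upper tail,
    and F^{-1}(u) = -log(1 - u) / rate for the Exponential quantile function.
    """
    survival = 0.5 * _erfc(np.asarray(z, dtype=float) / math.sqrt(2.0))
    return -np.log(survival) / rate

reference = np.random.default_rng(1).standard_normal(100_000)
target_samples = exp_transport_map(reference, 2.0)
```

Because the map is deterministic, i.i.d. reference draws give i.i.d., unweighted target draws with no accept/reject step; for instance, the reference median 0 maps to the target median $\ln 2 / \text{rate}$.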

- Th, Dec 3rd, Shuyang Bai (Boston University),
*Self-normalized resampling for time series*.

The inference procedures for the mean of a stationary time series are usually quite different depending on the strength of the dependence as well as the heavy-tailedness of the model. In this talk, combining the ideas of resampling and self-normalization, we introduce a unified procedure which is valid under various model assumptions. The procedure avoids estimation of any nuisance parameter and requires only the choice of one bandwidth. Simulation examples will be given to illustrate its performance, and the asymptotic theory will also be presented.

- Th, Dec 10th, Vidhu Prasad (UMass Lowell),
*Towers, codes and approximate conjugacy*.

Consider the following question about an irrational rotation $T$ of the unit circle and a mixing Markov chain: is there a partition of the circle (indexed by the state space of the Markov chain) so that the itinerary process given by $T$ and the partition has the distribution of the given Markov chain? The answer is yes; furthermore, this holds for any aperiodic measure-preserving transformation $T$, not just an irrational rotation: the existence of “tower structures” for $T$ is equivalent to the coding property above (the existence of a partition that is moved like the Markov chain by $T$), and the latter property is equivalent to an “almost conjugacy” property for $T$. The “tower property” is a generalization of one of the truly basic results in ergodic theory, the (Kakutani-)Rokhlin Lemma.
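To make the itinerary process concrete, the Python sketch below follows an orbit of $T x = x + \alpha \bmod 1$ and records which cell of a partition of the circle each point falls in. The function name, rotation number, cut points and starting point are illustrative assumptions; the sketch does not construct the Markov-chain partition whose existence the talk asserts, it only shows how a partition turns $T$ into a symbol sequence.

```python
import numpy as np

def rotation_itinerary(alpha, cut_points, x0=0.0, n_steps=10_000):
    """Cell labels of x0, T x0, T^2 x0, ... for the rotation T x = x + alpha mod 1.

    cut_points are sorted interior boundaries in (0, 1); with c_0 < c_1 < ... < c_{K-1},
    label 0 is [0, c_0), label k is [c_{k-1}, c_k), and label K is [c_{K-1}, 1).
    """
    orbit = (x0 + alpha * np.arange(n_steps)) % 1.0
    return np.searchsorted(np.asarray(cut_points, dtype=float), orbit, side="right")

golden = (5 ** 0.5 - 1) / 2  # an irrational rotation number
labels = rotation_itinerary(golden, [0.3])
```

With the two cells [0, 0.3) and [0.3, 1), label 0 appears with frequency close to 0.3, the length of its cell, because orbits of an irrational rotation are equidistributed; the coding question in the abstract asks for a partition whose label sequence additionally has the transition structure of a prescribed Markov chain.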