Boston University Probability and Statistics Seminar

Organized by Michael Salins and Daniel Sussman at Boston University (formerly organized by Solesne Bourguin, Kostas Spiliopoulos and Ting Zhang). Talks are held on Thursdays from 4:00pm to 5:00pm in room MCS 148 unless otherwise specified. Tea is served from 3:45pm to 4:00pm in room MCS 144. Below is an archive of past talks with their abstracts.

FALL SEMESTER 2018

Th, Sep 20th, Cencheng Shen (University of Delaware), Dependency discovery via multiscale generalized correlation.
Determining how certain properties are related to other properties is fundamental to scientific discovery; further investigations into the geometry of the relationship and future predictions are warranted only if two properties are significantly related. To better discover any type of relationship underlying paired sample data, we introduce the multiscale generalized correlation (MGC), which combines distance correlation, the locality principle, and a smoothed maximum to yield a new and superior correlation measure.
Th, Sep 27th, Guangqu Zheng (University of Kansas), Exchangeable pairs for weak convergence and a.s. convergence on chaoses.
This talk consists of two parts. The first part is a quick overview of recent advances on fourth moment phenomena via exchangeable pairs. The second part, which is the main focus of the talk, presents several new phenomena concerning a.s. convergence of homogeneous chaoses, including Gaussian Wiener chaoses and homogeneous sums in centered independent random variables; it is based on recent joint work with Guillaume Poly.
Th, Oct 4th, Gunduz Caginalp (University of Pittsburgh), Volatility maxima as a forecaster of trading price extrema.
This is joint work with Carey Caginalp. The relationship between price volatility and a market extremum is examined using a fundamental economics model of supply and demand. By examining randomness through a microeconomic setting, we obtain the implications of randomness in the supply and demand, rather than assuming that price has randomness on an empirical basis. Within a very general setting the volatility has a maximum that precedes the extremum of the price. A key issue is that randomness arises from the supply and demand, and the variance in the stochastic differential equation governing the logarithm of price must reflect this. Analogous results are obtained by further assuming that the supply and demand are dependent on the deviation from fundamental value of the asset. The supply/demand approach also shows that fat tails (in particular x^(-2) falloff) are endogenous to the trading mechanism.
Th, Oct 11th, Vince Lyzinski (UMass Amherst), Graph matching in edge-independent networks.
The graph matching problem seeks to find an alignment between the vertex sets of two graphs that best preserves common structure across graphs. Herein, we consider the closely related problem of graph matchability: given a latent alignment between the vertex sets of two graphs, under what conditions will the solution to the graph matching optimization problem recover this alignment in the presence of shuffled vertex labels? Working in a general class of correlated edge-independent network models, we establish limits on graph matchability in terms of the correlation across graphs when the graphs are of the same order, and derive analogous results when the graph orders differ significantly. While there are currently no efficient algorithms for solving the graph matching problem in general, these results nonetheless provide practical algorithmic guidance for approximately matching networks in both real and synthetic data applications.
Th, Oct 18th, Xiaojing Wang (UConn), Estimating Shape Constrained Functions Using Gaussian Processes.
Gaussian processes are a popular tool for nonparametric function estimation because of their flexibility and the fact that much of the ensuing computation is parametric Gaussian computation. Often, the function is known to be in a shape-constrained class, such as the class of monotonic or convex functions. Such shape constraints can be incorporated through the use of derivative processes, which are jointly Gaussian with the original process, as long as the conditions of mean square differentiability hold. The possibilities and challenges of introducing shape constraints through this device are explored and illustrated through simulations and two real data examples. Computation is carried out through a Gibbs sampling scheme. Joint work with Jim Berger, Duke University.
Th, Oct 25th, Dheeraj Nagaraj (MIT), Stein's method on Finite Spaces.
Stein's method is a powerful framework for studying distributional convergence of functions of dependent random variables. We formulate Stein's method for convergence to distributions on finite spaces using finite-state Markov chains, and use probabilistic machinery to understand and bound the solutions to the Stein equation. We use this formulation to show that the moments of the super-critical Ising model on expander graphs are similar, in an average sense, to the moments of the super-critical Curie-Weiss model at the same inverse temperature. We will also sketch applications to random graphs.
Th, Nov 1st, Xiaohui Chen (University of Illinois at Urbana-Champaign), Gaussian and bootstrap approximations of high-dimensional U-statistics with applications and extensions.
We shall first discuss the Gaussian approximation of high-dimensional and non-degenerate U-statistics of order two under the supremum norm. We propose a two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution. Subject to mild moment conditions on the kernel, we establish an explicit rate of convergence that decays polynomially in the sample size in a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective for the empirical, the randomly reweighted, and the multiplier bootstraps as randomly reweighted quadratic forms, all asymptotically valid and inferentially first-order equivalent in high dimensions. The bootstrap methods are applied to high-dimensional non-Gaussian data for: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap-selected tuning parameter, we show that Gaussian-like convergence rates can be achieved for heavy-tailed data, and that these rates are less conservative than those obtained by the Bonferroni technique, which ignores the dependency in the underlying data distribution. We also show that even for subgaussian distributions, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold.
Time permitting, we will discuss some extensions to the infinite-dimensional version (i.e., U-processes of increasing complexity) and to the randomized inference via the incomplete U-statistics whose computational cost can be made independent of the order.
Th, Nov 8th, Daniel Schwarz (University College London), Recent results on quadratic BSDEs arising in equilibrium models.
We present a system of fully coupled BSDEs with quadratic growth and a discontinuity in its driving term. The system is shown to characterise the price of a risky asset in an endogenous Radner equilibrium model of an exchange economy: agents trade in an incomplete market and prices are set to ensure that markets clear at all times. We show that this system admits a solution without any unnatural smallness assumptions on the norm of the data. The proof relies, in particular, on two techniques from the study of partial differential equations: unique continuation and backward uniqueness of solutions to differential inequalities. (Joint work with Hao Xing.)
Th, Nov 15th, Vasileios Maroulas (University of Tennessee), Distributions of Persistence Diagrams and Approximations.
In this talk, we introduce a nonparametric method to estimate the global probability density function of a random persistence diagram. We construct a kernel density function centered at a given persistence diagram with a given bandwidth. Our approach encapsulates the number of topological features and handles the appearance or disappearance of features near the diagonal in a stable fashion. In particular, the structure of our kernel individually tracks long persistence features, while treating features near the diagonal as a collective unit. Describing short persistence features as a group reduces computation time while retaining accuracy. Indeed, we prove that the associated kernel density estimate converges to the true distribution as the number of persistence diagrams increases and the bandwidth shrinks accordingly. Lastly, we present examples of kernel density estimation for typical underlying datasets.
Th, Nov 29th, Stefanie Jegelka (MIT), Negative Dependence and Sampling in Machine Learning.
Discrete probability distributions with strong negative dependencies (negative association) occur in a wide range of settings in machine learning, from probabilistic modeling to randomized algorithms for accelerating a variety of popular ML models. In addition, these distributions enjoy rich theoretical connections and properties. A prominent example is the class of determinantal point processes. In this talk, I will survey recent applications and developments, in particular efficient, fast-mixing Markov chains for sampling. The sampling results exploit connections with linear algebra and a specific use of classic quadrature and, importantly, close connections with matroid theory and the theory of real stable polynomials. The resulting algorithms have theoretical convergence guarantees and are also easily applicable in practice. This talk is based on joint work with Chengtao Li, Zelda Mariet and Suvrit Sra.
Th, Dec 6th, Zhongyang Li (UConn), Phase transitions and scaling limits in lattice models.
A perfect matching of a graph is a subset of its edges such that each vertex is incident to exactly one edge in the subset. It is a natural mathematical model for molecular structures, and can provide exact solutions to various other statistical mechanical models, including the celebrated Ising model and the 1-2 model. We will discuss the limit shape of the perfect matching when a rescaled graph approximates a certain simply-connected domain in the plane, as well as the frozen boundary, which separates the frozen region from the liquid region. A closely related model is the 1-2 model, a probability measure on subgraphs of the hexagonal lattice in which each vertex is incident to 1 or 2 edges. With the help of the dimer model, we can obtain a sharp phase transition result for the 1-2 model. We will also discuss an exact formula for the probability that a path occurs in a 1-2 model configuration, and the almost sure non-existence of an infinite path, proved with the help of the mass-transport principle.
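Distance correlation, the first ingredient of MGC named in the Sep 20th abstract, can be computed from double-centered pairwise distance matrices. The sketch below is a minimal illustration assuming NumPy and the standard biased (V-statistic) sample distance correlation of Székely, Rizzo and Bakirov; it omits MGC's locality and smoothed-maximum steps entirely, and the function names are hypothetical.

```python
import numpy as np

def _double_centered_distances(z):
    """Pairwise Euclidean distance matrix with row, column and grand means removed."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n observations of dimension 1
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Biased (V-statistic) sample distance correlation of paired samples x and y."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dcov2_xx = (a * a).mean()  # squared distance variance of x
    dcov2_yy = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    if denom == 0.0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
print(round(distance_correlation(x, 3 * x + 2), 6))               # linear relationship
print(round(distance_correlation(x, x ** 2), 3))                  # nonlinear, Pearson correlation near 0
print(round(distance_correlation(x, rng.uniform(size=500)), 3))   # independent samples
```

Unlike Pearson correlation, the population distance correlation is zero only under independence, which is why the `x ** 2` case gives a clearly positive value; the independent case is small but not exactly zero because the V-statistic is biased upward in finite samples.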

SPRING SEMESTER 2018

Th, Feb 1st, Antoine Jacquier (Imperial College London), Volatility options in rough volatility models.
We discuss the pricing and hedging of volatility options in some of the recently introduced rough volatility models. First, we develop efficient Monte Carlo methods and asymptotic approximations for computing option prices and hedge ratios in models where log-volatility follows a Gaussian Volterra process. While providing a good fit for European options, these models are unable to reproduce the VIX option smile observed in the market, and are thus not suitable for VIX products. To accommodate VIX options we therefore introduce modulated Volterra processes, and show that these models successfully capture the skew of VIX products. Joint work with Blanka Horvath (Imperial College London) and Peter Tankov (ENSAE, Paris).
Th, Feb 8th, Julio Castrillon (BU), Analytic regularity and stochastic collocation approximation for elliptic PDEs with random domain deformations.
In many physical processes, the practicing engineer or scientist encounters the problem of optimal design under uncertainty in the underlying domain. For example, in the lithographic process of semiconductor design, the exact geometries of the designed patterns are difficult to control due to uncertainties. Without a quantitative understanding of the domain uncertainty, such a design may have to be carried out by trial and error. However, to accelerate the design cycle, it is essential to quantify the influence of this uncertainty on Quantities of Interest. Another example is graphene sheet nanofabrication. In this talk we consider the problem of approximating the statistics of a given Quantity of Interest (QoI) that depends on the solution of a linear elliptic PDE defined over a random domain parameterized by $N$ random variables. The elliptic problem is remapped to a corresponding stochastic PDE with a fixed deterministic domain. We show that the solution can be analytically extended to a well-defined region in $\mathbb{C}^{N}$ with respect to the random variables. A sparse grid stochastic collocation method is then used to compute the mean and standard deviation of the QoI. Convergence rates are derived and compared to those obtained in numerical experiments.
Th, Feb 15th, Alexandros Gelastopoulos (BU), A probabilistic model for markets of informational goods.
In markets of informational goods, such as books, music albums or scientific papers, a great number of alternatives compete for consumer attention. People or content curators may use simple rules to direct attention to promising alternatives, effectively decreasing the costs of search. A widely implemented heuristic is to order the alternatives by popularity. We study what happens when agents with diverse yet correlated preferences search alternatives in order of popularity and settle on the first satisficing alternative. We develop a probabilistic model to study long-term popularity dynamics in the market and show that markets almost surely converge to some stable popularity order. However, more than one stable popularity order may exist. Random fluctuations early in time are reinforced by rich-get-richer dynamics until the market settles on a popularity order that is, in general, sub-optimal for collective welfare.
Th, Feb 22nd, Keith Levin (University of Michigan), A central limit theorem for an omnibus embedding of random dot product graphs.
Performing statistical inference on collections of graphs is of import to many disciplines. Graph embedding, in which the vertices of a graph are mapped to vectors in a low-dimensional Euclidean space, has gained traction as a basic tool for graph analysis. In this talk, I will present an omnibus embedding in which multiple graphs on the same vertex set are jointly embedded into a single space with a distinct representation for each graph. I will show a central limit theorem for this omnibus embedding, and show that this simultaneous embedding into a common space allows comparison of graphs without the need to perform pairwise alignments of graph embeddings. I will present experimental results demonstrating that the omnibus embedding improves upon existing methods, allowing better power in multiple-graph hypothesis testing and yielding better estimation in a latent position model. If time allows, I will discuss preliminary work applying the omnibus embedding to brain imaging data.
Th, Mar 1st, Natallia Katenka (University of Rhode Island), Increasing feedback from Generation Z: students' attitudes and achievement in an introductory biostatistics course.
The Millennial Generation is phasing out of undergraduate classes and being replaced by the technologically savvy, visual learners of Generation Z. To better understand the learning needs and attitudes of this new population of students, we collected survey and grade data in an introductory biostatistics course over two semesters (Spring 2016, Spring 2017) at the University of Rhode Island. For the Spring 2016 data, our purpose was three-fold. First, we increased the amount of immediate feedback collected from students by implementing weekly quizzes. These quizzes were analyzed using longitudinal mean response profiles and generalized linear mixed models, revealing a significant effect of time on student performance, but not of grade incentives. Next, we analyzed students' attitudes towards statistics with hierarchical linear models to determine how starting attitudes affected performance, finding a significant effect of starting affect and cognitive competence on students' final grades. Finally, regression trees were used to identify groups of learners whose attitudes improved over the semester, depending on their starting attitude and final grade. In addition to the attitude component and students' final grades, the follow-up study collected information on students' learning and teaching preferences, as well as their collaboration throughout the semester. A preliminary analysis of the new data using principal component analysis, clustering, structural equation modeling, and network data modeling revealed suggestive grouping patterns among students who share similar teaching/learning preferences and attitudes toward the subject.
Th, Mar 15th, Jeff Miller (Harvard), Robust inference using power posteriors: calibration and inference.
Small departures from model assumptions can lead to misleading inferences, especially as data sets grow large. Recent work has shown that robustness to small perturbations can be obtained by using a power posterior, which is proportional to the likelihood raised to a certain fractional power, times the prior. In many models, inference under a power posterior can be implemented via minor modifications of standard algorithms; however, mixture models present a particular challenge, requiring new algorithms. We have found a simple and scalable algorithm that yields results very similar to the power posterior for mixture models, by modifying the standard Gibbs sampling algorithm to use power likelihoods for only the mixture parameter updates. Another challenge in the practical implementation of power posteriors is how to choose the power appropriately. We present a data-driven technique for choosing the power in an objective way to obtain robustness to small perturbations. We illustrate with real and simulated data, including an application to flow cytometry clustering.
Th, Mar 22nd, Ilwoo Cho (St Ambrose University), Semicircular elements: free probabilistic approaches.
In this talk, we briefly review the combinatorial free probability theory of Speicher, and consider the semicircular law from a free probabilistic point of view. The (free probabilistic) semicircular law is the free distribution of a so-called semicircular element, which is a free random variable (or an operator) of a topological *-probability space (resp., of a topological *-algebra) having free moments determined by the Catalan numbers. We are interested in how to construct such semicircular elements for fixed mutually-orthogonal integer-many projections in a C*-algebra. As an application, we discuss how to construct semicircular elements from p-adic analysis.
Th, Mar 29th, Yichen Qin (University of Cincinnati), Penalized maximum tangent likelihood estimation and robust variable selection.
We introduce a new class of mean regression estimators, penalized maximum tangent likelihood estimation, for high-dimensional regression estimation and variable selection. We first explain the motivations for the key ingredient, maximum tangent likelihood estimation (MTE), and establish its asymptotic properties. We further propose a penalized MTE for variable selection and show that it is root-n-consistent and enjoys the oracle property. The proposed class of estimators includes penalized L2 distance, penalized exponential squared loss, penalized least trimmed squares and penalized least squares as special cases, and can be regarded as a mixture of minimum Kullback-Leibler distance estimation and minimum L2 distance estimation. Furthermore, we consider the proposed class of estimators in the high-dimensional setting, where the number of variables d can grow exponentially with the sample size n, and show that the entire class of estimators (including the aforementioned special cases) can achieve the optimal rate of convergence of order sqrt(ln(d)/n). Finally, simulation studies and real data analysis demonstrate the advantages of the penalized MTE.
Th, Apr 5th, Mingchu Gao (Louisiana College), Multidimensional free Poisson distribution limits in free stochastic integral algebras.
We will review the development and current trends of the fourth moment theory in free probability, and present our work in this area. For a bi-indexed sequence of free stochastic integrals in the free Wigner algebra or the free Poisson algebra, we prove that, under mild technical hypotheses, such a sequence converges in distribution to a free sequence of free Poisson random variables if and only if the moments of the sequence of order at most four converge to the corresponding moments of the limit sequence. Similar four-moment theorems hold when the limit sequence is not free, but has a multidimensional free Poisson distribution with parameters and a real number sequence.
Th, Apr 12th, Erol Peköz (BU), Wealth exchange, Bitcoin, Stein’s method, and dueling bandits.
We discuss several probability models and give some results and open questions surrounding them: a wealth exchange model from Econophysics (and its asymptotic analysis via Stein’s method) that explains why some wealth distributions arise in a society, a model for block generation in the Bitcoin blockchain that illustrates the pros and cons of reducing mining time variance, and a user preference model that can improve some content recommendation systems via Thompson sampling.
Th, Apr 19th, Chanmin Kim (BU), Bayesian Methods for Causal Inference in the Analysis of Power Plant Emission Controls.
Emission control technologies installed on power plant smokestacks are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution, and human health, many of these relationships have never been estimated or empirically verified amid the realities of actual regulatory implementation. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new statistical methods for settings with multiple intermediate mediating factors that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all mediating emissions jointly, each pair of emissions, and each emission individually. Both approaches are anchored to the exact same model for the observed data, which we specify with flexible Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required to conduct a causal mediation analysis relying on natural direct and indirect effects. 
The principal stratification and causal mediation analyses are interpreted in tandem to provide the first comprehensive empirical investigation of the presumed causal pathways that motivate a variety of air quality regulatory policies. Further extensions will be discussed.
Th, Apr 26th, Henry Lam (Columbia), Assessing solution quality in stochastic optimization with limited data.
We study methods to assess the optimality gap, as a measurement of the quality of solutions, in stochastic optimization under limited-data situations. We demonstrate how viewing an optimistic bound for these problems through classical symmetric statistics leads to bagging-based approaches that are statistically more efficient than existing ones. We discuss the theoretical guarantees and computational requirements of our methods, and some extensions of our investigation to optimization problems where feasibility is also of interest.
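The power posterior in the Mar 15th abstract (likelihood raised to a fractional power, times the prior) has a closed form in conjugate models, which makes its effect easy to see. A minimal sketch, assuming a Bernoulli likelihood with a Beta(a, b) prior (an illustrative model of our choosing, not one from the talk): tempering the likelihood θ^s (1 − θ)^(n − s) by a power ζ and multiplying by the prior gives Beta(a + ζ s, b + ζ (n − s)), so ζ behaves like a reduced effective sample size ζ n. The names `bernoulli_power_posterior` and `BetaPosterior` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

def bernoulli_power_posterior(data, power, a=1.0, b=1.0):
    """Power posterior for 0/1 data under a Beta(a, b) prior (illustrative model).

    p_power(theta | data) is proportional to
    [theta^s * (1 - theta)^(n - s)]^power * Beta(theta; a, b),
    which is again a Beta distribution with tempered success/failure counts.
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power should lie in [0, 1]")
    n = len(data)
    s = sum(data)  # number of successes
    return BetaPosterior(a + power * s, b + power * (n - s))

data = [1] * 70 + [0] * 30
for power in (1.0, 0.1, 0.0):
    post = bernoulli_power_posterior(data, power)
    print(power, post, round(post.mean, 4))
```

With power 1 this is the ordinary posterior, with power 0 it is the prior, and intermediate powers widen the posterior as if fewer observations had been seen; choosing that power in a data-driven way is the calibration question the abstract addresses.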

FALL SEMESTER 2017

Th, Sep 14th, Annie Qu (UIUC), Individualized Multilayer Tensor Learning with an Application in Imaging Analysis.
This work is motivated by breast cancer imaging data produced by a multimodal multiphoton optical imaging technique. One unique aspect of breast cancer imaging is that different individuals might have breast imaging at different locations, which also creates a technical difficulty in that the imaging background could vary across individuals. We develop a multilayer tensor learning method to predict disease status effectively by utilizing subject-wise imaging information. In particular, we construct an individualized multilayer model which leverages an additional layer of individual structure of imaging in addition to employing a high-order tensor decomposition shared by populations. In addition, to incorporate multimodal imaging data profiling the tissue, cellular and molecular levels, we propose a higher order tensor representation to combine multiple sources of information at different modalities, so that important features associated with disease status and clinical outcomes can be extracted effectively. One major advantage of our approach is that we are able to capture the spatial information of microvesicles observed in certain modalities of optical imaging by combining multimodal imaging data. This has medical and clinical significance since microvesicles are more frequently observed among cancer patients than healthy ones, and identification of microvesicles enables us to provide an effective diagnostic tool for early-stage cancer detection. This is joint work with Xiwei Tang and Xuan Bi.
Th, Sep 21st, Minh Tang (Johns Hopkins), Limit theorems for eigenvectors of the normalized Laplacian for random graphs.
We prove a central limit theorem for the components of the eigenvectors corresponding to the d largest eigenvalues of the normalized Laplacian matrix of a finite-dimensional random dot product graph. As a corollary, we show that for stochastic blockmodel graphs, the rows of the spectral embedding of the normalized Laplacian converge to multivariate normals and furthermore the mean and the covariance matrix of each row are functions of the associated vertex's block membership. Together with prior results for the eigenvectors of the adjacency matrix, we then compare, via the Chernoff information between multivariate normal distributions, how the choice of embedding method impacts subsequent inference. We demonstrate that neither embedding method dominates with respect to the inference task of recovering the latent block assignments.
Th, Oct 5th, Yihong Wu (Yale), Optimal estimation of Gaussian mixtures via denoised method of moments.
The method of moments is one of the most widely used methods in statistics for parameter estimation, obtained by solving the system of equations that match the population and estimated moments. However, in practice, and especially for the important case of mixture models, one frequently needs to contend with the non-existence or non-uniqueness of statistically meaningful solutions, as well as the high computational cost of solving large polynomial systems. Moreover, theoretical analyses of the method of moments are mainly confined to asymptotic-normality-type results established under strong assumptions. In this talk I will present some recent results for estimating Gaussian location mixtures with known or unknown variance. To overcome the aforementioned theoretical and algorithmic hurdles, a crucial step is to denoise the moment estimates by projecting them onto the truncated moment space before executing the method of moments. Not only does this regularization ensure existence and uniqueness of solutions, it also yields fast solvers by means of Gauss quadrature. Furthermore, by proving new moment comparison theorems in Wasserstein distance via polynomial interpolation and majorization, we establish the statistical guarantees and optimality of the proposed procedure. These results can also be viewed as provable algorithms for the Generalized Method of Moments, which involves non-convex optimization. Extensions to multiple dimensions will be discussed. This is based on joint work with Pengkun Yang (Illinois).
Th, Oct 12th, Daniel Coombs (University of British Columbia), Stochastic approaches to mathematical modelling of HIV infection.
The overwhelming majority of mathematical models of viral infections are based on differential equations. These models provide a good approximation to the average behavior of the system when the numbers of infected cells and virions are high. However, during the first few days of infection, or during successful ongoing treatment that suppresses the viral load, this assumption is definitely violated. In this talk I will describe work with stochastic models (branching processes) that can give interesting insights into the population dynamics of HIV - for instance: the likelihood of extinction of HIV during long-term therapy, the window of opportunity for prophylactic treatment, and the duration of the gap between risky exposure and detectable infection.
Th, Oct 19th, Alan Izenman (Temple), On the Forensic Analysis of Latent Fingerprint Evidence.
Statistical thinking and practice can make a substantial contribution to the manner in which forensic science is handled in the laboratory and the courtroom. We first present some background on the history of fingerprint identification. Then, we describe the various types of impressions made by fingerprints and the automated techniques used to extract information. We set out the competing hypotheses used to compare a fingerprint found at a crime scene with a fingerprint from a potential suspect. We describe the ACE-V system of latent fingerprint identification, as well as identification errors that have been made. We describe fingerprint databases such as AFIS, IAFIS, and NGI. We then propose a new method of matching the minutiae of a pair of fingerprints by interpreting it as a two-sample problem in two dimensions. We adapt a graphical procedure that computes a nonparametric statistic R based upon a minimum spanning tree to the problem of matching a set of latent minutiae to a set of tenprint minutiae, and we apply the method to a set of fingerprint data. Suggestions are also made for estimating the standard error of R.
Th, Oct 26th, Alexandra Chronopoulou (UIUC), Fractional Stochastic Volatility Models: Statistical Inference & Hedging.
Long memory stochastic volatility (LMSV) models have been used to explain the persistence of volatility in the market, while rough stochastic volatility (RSV) models have been shown to reproduce statistical properties of low frequency financial data. In these two classes of models, the volatility process is often described by a fractional Ornstein-Uhlenbeck process with Hurst index H, where H>1/2 for LMSV models and H<1/2 for RSV models. In this talk, we focus on the long-range dependent case and propose a methodology for the estimation of the leverage effect (that is the correlation between the stock’s volatility and the stock returns), based on the discrete quadratic covariation of the processes. We also study the sensitivity of the option price with respect to the strike and determine when the option is underhedged, overhedged or perfectly hedged.
Th, Nov 2nd, Chen Kun (UConn), Dealing with uncertain suicidal deaths due to imperfect data integration: a first step towards a data-driven suicide prevention framework.
The concept of integrating data from disparate sources to accelerate scientific discovery has generated tremendous excitement in many fields. The potential benefits from data integration, however, may be compromised by the uncertainty due to imperfect record linkage. Motivated by a suicide risk study, we propose an approach for analyzing survival data with uncertain event records arising from data integration. Specifically, deaths identified from the hospital discharge records together with reported suicidal deaths determined by medical examiners may still not include all the death events of patients, and the missing deaths can be recovered from a complete database of death records. Since the hospital discharge data can only be linked to the death record data by matching basic patient characteristics, a patient with a censored death time from the first dataset could be linked to multiple potential event records in the second dataset. We develop an integrative Cox proportional hazards regression (iCox), in which the uncertainty in the matched event times is modeled probabilistically. The estimation procedure combines the ideas of profile likelihood and the expectation conditional maximization (ECM) algorithm. Simulation studies demonstrate that under realistic settings of imperfect data linkage, iCox outperforms several competing approaches, including multiple imputation. A marginal screening analysis using iCox is performed to identify risk factors associated with death following suicide-related hospitalization in Connecticut. The identified diagnostic codes provide several new insights on suicide risk prediction and prevention. This study is only a first step towards a data-driven suicide prevention framework. We will discuss other aspects of our proposal, including data unification, data fusion, and joint feature construction, selection and predictive modeling.
Th, Nov 9th, Tamara Broderick (MIT), Fast Quantification of Uncertainty and Robustness with Variational Bayes.
In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. These choices may be somewhat subjective and reasonably vary over some range. Thus, we wish to measure the sensitivity of posterior estimates to variation in these choices. While the field of robust Bayes has been formed to address this problem, its tools are not commonly used in practice. We demonstrate that variational Bayes (VB) techniques are readily amenable to fast robustness analysis. Since VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters. We use this insight to develop local prior robustness measures for mean-field variational Bayes (MFVB), a particularly popular form of VB due to its fast runtime on large data sets. However, MFVB has a well-known major failing: it can severely underestimate uncertainty and provides no information about covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for MFVB, both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB).
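For a Gaussian target, the MFVB failure and its linear-response fix can be checked exactly. The sketch assumes a target with precision matrix Lambda. MFVB reports variances 1/Lambda_ii, which never exceed the true marginal variances. The linear-response formula (I - V H)^{-1} V, restricted here to the variational means, recovers the full covariance Lambda^{-1}. This illustrates the idea under those assumptions and is not the speaker's implementation.

```python
import numpy as np

def mfvb_variances(precision):
    """Gaussian target with precision Lambda: each mean-field factor has variance 1 / Lambda_ii."""
    return 1.0 / np.diag(precision)

def lrvb_covariance(precision):
    """Linear-response correction (I - V H)^{-1} V for the Gaussian case, where V is the MFVB
    covariance of the means and H is the Hessian of the expected log density in the variational
    means, i.e. minus the off-diagonal part of the precision matrix."""
    v = np.diag(mfvb_variances(precision))
    h = -(precision - np.diag(np.diag(precision)))
    identity = np.eye(precision.shape[0])
    return np.linalg.solve(identity - v @ h, v)

lam = np.array([[2.0, 0.9], [0.9, 1.0]])
print(mfvb_variances(lam))            # 1 / diag(lam): too small
print(np.diag(lrvb_covariance(lam)))  # matches the diagonal of inv(lam)
```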
Th, Nov 16th, Ruoyu Wu (Brown), Large Deviation Principle for the Exploration Process of the Configuration Model.
The configuration model is a sequence of random graphs constructed such that in the large network limit the degree distribution converges to a pre-specified probability distribution. The component structure of such random graphs can be obtained from an infinite dimensional Markov chain referred to as the exploration process. We establish a large deviation principle for the exploration process associated with the configuration model. Proofs rely on a representation of the exploration process as a system of stochastic differential equations driven by Poisson random measures and variational formulas for moments of nonnegative functionals of Poisson random measures. Uniqueness results for certain controlled systems of deterministic equations play a key role in the analysis. Applications of the large deviation results, for studying asymptotic behavior of the degree sequence in large components of the random graphs, are discussed.
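Below is a minimal sketch of the configuration model and of exploring its components, given a prescribed degree sequence. Half-edges are paired uniformly at random, with self-loops and multi-edges kept, and components are revealed one vertex at a time by breadth-first search. This queue-based exploration only stands in for the Markov-chain exploration process analyzed in the talk.

```python
import random
from collections import deque

def configuration_model(degrees, seed=0):
    """Pair half-edges uniformly at random; returns an adjacency list (self-loops and multi-edges kept)."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]  # one entry per half-edge
    random.Random(seed).shuffle(stubs)
    adjacency = [[] for _ in degrees]
    for a, b in zip(stubs[0::2], stubs[1::2]):  # consecutive shuffled stubs form an edge
        adjacency[a].append(b)
        adjacency[b].append(a)
    return adjacency

def explore_components(adjacency):
    """Breadth-first exploration; returns component sizes in the order they are discovered."""
    seen = [False] * len(adjacency)
    sizes = []
    for root in range(len(adjacency)):
        if seen[root]:
            continue
        seen[root] = True
        queue, size = deque([root]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adjacency[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        sizes.append(size)
    return sizes

sizes = explore_components(configuration_model([1] * 500 + [3] * 500, seed=1))
print(max(sizes), len(sizes))  # size of the largest component, number of components
```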
Th, Nov 30th, Vince Lyzinski (UMass Amherst), On consistent vertex nomination schemes.
Given a vertex of interest in a network G1, the vertex nomination problem seeks to find the corresponding vertex of interest (if it exists) in a second network G2. Although the vertex nomination problem and related tasks have attracted much attention in the machine learning literature, with applications to social and biological networks, the framework has so far been confined to a comparatively small class of network models, and the concept of statistically consistent vertex nomination schemes has been only shallowly explored. In this work, we extend the vertex nomination problem to a very general statistical model of graphs. Further, drawing inspiration from the long-established classification framework in the pattern recognition literature, we provide definitions for the key notions of Bayes optimality and consistency in our extended vertex nomination framework, including a derivation of the Bayes optimal vertex nomination scheme. In addition, we prove that no universally consistent vertex nomination schemes exist. Illustrative examples are provided throughout.
Th, Dec 7th, David Nualart (KU), Functional central limit theorem for the self-intersection local time of the fractional Brownian motion.
The purpose of this talk is to discuss some recent results on the asymptotic properties of the self-intersection local time of the multidimensional fractional Brownian motion. We will present a functional version of the central limit theorem for the renormalized self-intersection local time of the d-dimensional fractional Brownian motion with Hurst parameter H satisfying 3/(2d) < H < 3/4. The tightness property is proved using techniques of Malliavin calculus. On the other hand, when the Hurst parameter H is greater than 3/4, we establish the convergence in mean square to a sum of independent Rosenblatt-type processes.

SPRING SEMESTER 2017

Th, Jan 12th, Yu Gu (Stanford), Scaling limits of fluctuations in stochastic homogenization.
Equations with small scales abound in physics and applied science. When the coefficients vary on microscopic scales, the local fluctuations average out under certain assumptions and we have the so-called homogenization phenomenon. In this talk, I will try to explain some probabilistic approaches we use to obtain the first order random fluctuations in stochastic homogenization. If homogenization is viewed as a law of large numbers type result, then here we are looking for a central limit theorem. The tools we use include the Kipnis-Varadhan method, a quantitative martingale central limit theorem and Stein's method. Based on joint work with Jean-Christophe Mourrat.
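In one dimension the homogenization limit is explicit. For cells of equal length in series, the effective coefficient is the harmonic mean of the local coefficients. By the law of large numbers it converges to 1/E[1/a], with fluctuations of order n^(-1/2). The sketch assumes i.i.d. coefficients uniform on [1, 2], a hypothetical choice for which the limit is 1/ln 2.

```python
import math
import random

def effective_coefficient(coeffs):
    """Cells of equal length in series: the effective (homogenized) coefficient is the harmonic mean."""
    return len(coeffs) / sum(1.0 / a for a in coeffs)

def sample_effective_coefficient(n_cells, seed=0):
    """Hypothetical random medium: i.i.d. coefficients uniform on [1, 2]."""
    rng = random.Random(seed)
    return effective_coefficient([rng.uniform(1.0, 2.0) for _ in range(n_cells)])

A_STAR = 1.0 / math.log(2.0)  # homogenized limit 1 / E[1/a] for a ~ Uniform[1, 2]

for n in (100, 10_000, 1_000_000):
    # Deviations from A_STAR shrink roughly like 1/sqrt(n) (the CLT-scale fluctuations).
    print(n, sample_effective_coefficient(n) - A_STAR)
```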
Mo, Jan 19th, Fei Lu (Berkeley), Data-driven stochastic model reduction.
The need to infer reduced computational models of complex systems from discrete partial observations arises in many scientific and engineering applications, for example in climate prediction, materials science, and biology. The challenges come mainly from memory effects due to unresolved scales, from nonlinear interactions between resolved and unresolved scales, and from the difficulty in drawing inferences from discrete partial data. We address these challenges by a discrete-time stochastic parametrization method, and demonstrate by examples that the resulting stochastic reduced models can capture the key statistical dynamical features of the full system and make accurate short-term predictions. The examples include the Lorenz 96 system (which is a simplified model of the atmosphere) and the Kuramoto-Sivashinsky equation that describes spatiotemporally chaotic dynamics.
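For reference, here is a sketch of the single-scale Lorenz 96 system, dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F with cyclic indices, integrated by a classical fourth-order Runge-Kutta step. The forcing F = 8, the 40 variables, and the step size are common illustrative choices assumed here, not values taken from the talk.

```python
def lorenz96_rhs(x, forcing=8.0):
    """dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F with cyclic indices."""
    n = len(x)
    # Python's negative indexing wraps x[k - 1] and x[k - 2]; the modulus wraps x[k + 1].
    return [(x[(k + 1) % n] - x[k - 2]) * x[k - 1] - x[k] + forcing for k in range(n)]

def rk4_step(x, dt, forcing=8.0):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], forcing)
    k3 = lorenz96_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], forcing)
    k4 = lorenz96_rhs([xi + dt * ki for xi, ki in zip(x, k3)], forcing)
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# Perturb the unstable equilibrium X_k = F and let the chaotic dynamics develop.
state = [8.0] * 40
state[0] += 0.01
for _ in range(1000):
    state = rk4_step(state, 0.01)
print([round(v, 2) for v in state[:5]])
```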
Fr, Jan 20th, Sanchayan Sen (McGill), Random discrete structures: Phase transitions, scaling limits, and universality.
The aim of this talk is to give an overview of some recent results in two interconnected areas: a) Random graphs and complex networks: The last decade of the 20th century saw significant growth in the availability of empirical data on networks, and in their relevance to our daily lives. This stimulated activity in a multitude of fields to formulate and study models of network formation and dynamic processes on networks to understand real-world systems. One major conjecture in probabilistic combinatorics, formulated by statistical physicists using non-rigorous arguments and extensive simulations in the early 2000s, is as follows: for a wide array of random graph models on n vertices with degree exponent tau > 3, typical distances, both within maximal components in the critical regime and on the minimal spanning tree of the giant component in the supercritical regime, scale like n^{\frac{\tau\wedge 4 -3}{\tau\wedge 4 -1}}. In other words, the degree exponent determines the universality class the random graph belongs to. The mathematical machinery available at the time was insufficient for providing a rigorous justification of this conjecture. More generally, recent research has provided strong evidence to believe that several objects, including (i) components under critical percolation, (ii) the vacant set left by a random walk, and (iii) the minimal spanning tree, constructed on a wide class of random discrete structures converge, when viewed as metric measure spaces, to some random fractals in the Gromov-Hausdorff-Prokhorov sense, and these limiting objects are universal under some general assumptions. We report on recent progress in proving these conjectures. b) Stochastic geometry: In contrast, less precise results are known in the case of spatial systems.
We discuss a recent result concerning the length of spatial minimal spanning trees that answers a question raised by Kesten and Lee in the 1990s, the proof of which relies on a variation of Stein's method and a quantification of the classical Burton-Keane argument in percolation theory. Based on joint work with Louigi Addario-Berry, Shankar Bhamidi, Nicolas Broutin, Sourav Chatterjee, Remco van der Hofstad, and Xuan Wang.
Mo, Jan 23rd, Michael Salins (BU), Uniform large deviations principle for a general class of stochastic partial differential equations. (Unusual location: MCS B21)
A rare weather event is sometimes called a "100-year storm" if the event is so unlikely that it happens on average only about once per century. As this phrase suggests, there is a strong connection between the probabilities of rare events and the time it takes those events to occur. The theory of large deviations was developed in the 1960s by Varadhan, Freidlin, Wentzell and others to quantify both the decay rates of probabilities of rare events for finite-dimensional stochastic differential equations and the growth rates of the so-called exit times, the amount of time it takes for those events to occur. The exit time problems require the large deviations principles to be uniform with respect to initial conditions in bounded sets. Over the past few decades, researchers have proven uniform large deviations principles for many examples of stochastic partial differential equations, but the methods tend to be equation specific and dependent on the chosen topology of the function space. In this talk, I demonstrate how to use a weak convergence approach and the uniform Laplace principle to prove large deviations principles that are uniform with respect to initial conditions in bounded sets. This is a needed improvement over the previous formulations, which could only be used to prove uniformity over compact sets. The method works for a large class of semilinear Banach-space-valued stochastic differential equations whose linear part generates a compact semigroup.
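A worked example of the Freidlin-Wentzell rate function behind such decay rates. It assumes the one-dimensional small-noise Ornstein-Uhlenbeck equation dX = -X dt + sqrt(eps) dW rather than an SPDE.

- The action of a path phi on [0, T] is S_T(phi) = (1/2) * integral of |phi' - b(phi)|^2 dt.
- Rare-event probabilities decay roughly like exp(-inf S / eps).
- The time-reversed relaxation path phi(t) = exp(t - T) reaches x = 1 with action 1 - exp(-2T), close to the quasipotential V(1) = 1.
- A straight line from 0 to 1 costs ((1 + T)^3 - 1)/(6 T^2).

```python
import math

def fw_action(path, drift, horizon):
    """Approximate S_T(phi) = 1/2 * integral_0^T |phi'(t) - b(phi(t))|^2 dt for a path sampled on a
    uniform grid, using forward-difference velocities and the drift evaluated at interval midpoints."""
    n = len(path) - 1
    dt = horizon / n
    total = 0.0
    for k in range(n):
        velocity = (path[k + 1] - path[k]) / dt
        midpoint = 0.5 * (path[k] + path[k + 1])
        total += 0.5 * (velocity - drift(midpoint)) ** 2 * dt
    return total

def ou_drift(x):
    """Drift b(x) = -x of the small-noise Ornstein-Uhlenbeck equation."""
    return -x

T, N = 10.0, 100_000
grid = [T * k / N for k in range(N + 1)]
optimal = [math.exp(t - T) for t in grid]  # time-reversed relaxation path ending at x = 1
straight = [t / T for t in grid]           # naive straight line from 0 to 1
print(fw_action(optimal, ou_drift, T), fw_action(straight, ou_drift, T))
```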
We, Jan 25th, Martina Hofmanova (TU Berlin), Randomness in convection-diffusion problems.
In this talk, I will consider quasilinear parabolic PDEs subject to stochastic or rough perturbation and explain how various assumptions on coefficients and roughness of the noise naturally call for different notions of solution with different regularity properties and different techniques of proof. On the one hand, the problems under consideration will be stochastic second order parabolic PDEs with noise smooth in space, either with a possible degeneracy in the leading order operator, where only low regularity holds true, or under the uniform ellipticity assumption, where arbitrarily high regularity can be proved under suitable assumptions on the coefficients. On the other hand, I will discuss a rough pathwise approach towards these problems based on tools from paracontrolled calculus.
Th, Jan 26th, Andrey Sarantsev (UCSB), Competing Brownian particles.
We study finite and infinite rank-based systems of Brownian particles on the real line, with drift and diffusion coefficients of a particle depending on the current rank relative to other particles. These systems have applications in financial modeling, exclusion processes, and other areas.
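Here is a minimal Euler-scheme sketch of a finite rank-based system. It assumes the particle currently ranked k (k = 0 is the lowest) receives drift drifts[k] and volatility sigmas[k]. Choosing drifts = [g, 0, ..., 0] with g > 0 gives the Atlas model. Ties and collisions are handled only implicitly, by re-sorting at each step.

```python
import math
import random

def simulate_rank_based(x0, drifts, sigmas, dt, steps, seed=0):
    """Euler scheme in which the particle ranked k (k = 0 lowest) gets drift drifts[k] and volatility sigmas[k]."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        order = sorted(range(n), key=lambda i: x[i])  # order[k] = index of the particle with rank k
        new_x = x[:]
        for k, i in enumerate(order):
            new_x[i] = x[i] + drifts[k] * dt + sigmas[k] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = new_x
    return x

# Atlas model with three particles: only the current lowest particle is pushed upward.
final = simulate_rank_based([0.0, 0.5, 1.0], drifts=[1.0, 0.0, 0.0], sigmas=[1.0, 1.0, 1.0], dt=0.001, steps=5000)
print(sorted(final))
```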
Th, Feb 16th, Afonso Bandeira (NYU), On Phase Transitions for Spiked Random Matrix and Tensor Models.
A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector (or low rank structure) is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences, where the goal is often to recover or detect the planted low rank structure. In this talk we discuss fundamental limitations of statistical methods to perform these tasks, as well as methods that outperform PCA at them. Emphasis will be given to low rank structures arising in synchronization problems. Time permitting, analogous results for spiked tensor models will also be discussed. Joint work with Amelia Perry, Alex Wein, and Ankur Moitra.
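A quick numerical check of the classical spectral phase transition (Baik-Ben Arous-Péché type) for the spiked Wigner model W + lam v v^T. It assumes GOE noise normalized so that the bulk fills [-2, 2] and a unit-norm planted vector v. The top eigenvalue sticks to 2 when lam <= 1 and separates to lam + 1/lam when lam > 1. This shows only where PCA starts to see the spike; the finer detection limits from the talk are not reproduced.

```python
import numpy as np

def spiked_wigner_top_eigenvalue(n, lam, seed=0):
    """Largest eigenvalue of W + lam * v v^T with GOE noise W scaled so its spectrum fills [-2, 2]."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2.0 * n)  # off-diagonal entries have variance 1/n
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)            # unit-norm planted direction
    return np.linalg.eigvalsh(w + lam * np.outer(v, v))[-1]  # eigvalsh sorts ascending

def bbp_prediction(lam):
    """Large-n location of the top eigenvalue: an outlier appears only above the threshold lam = 1."""
    return lam + 1.0 / lam if lam > 1.0 else 2.0

for lam in (0.5, 3.0):
    print(lam, round(spiked_wigner_top_eigenvalue(1000, lam), 3), bbp_prediction(lam))
```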
Th, Feb 23rd, Yao Li (UMass Amherst), Polynomial convergence rate to nonequilibrium steady-state.
In this talk I will present my recent result about the ergodic properties of nonequilibrium steady-state (NESS) for a stochastic energy exchange model. The energy exchange model is numerically reduced from a billiards-like deterministic particle system that models the microscopic heat conduction in a 1D chain. By using a technique called the induced chain method, I proved the existence, uniqueness, polynomial speed of convergence to the NESS, and polynomial speed of mixing for the stochastic energy exchange model. All of these are consistent with the numerical simulation results of the original deterministic billiards-like system.
Th, Mar 2nd, Jun Yan (UConn), Stagewise generalized estimating equations with grouped variables.
Forward stagewise estimation is a revived slow-brewing approach for model building that is particularly attractive in dealing with complex data structures for both its computational efficiency and its intrinsic connections with penalized estimation. Under the framework of generalized estimating equations, we study general stagewise estimation approaches that can handle clustered data and non-Gaussian/non-linear models in the presence of prior variable grouping structure. As the grouping structure is often not ideal in that even the important groups may contain irrelevant variables, the key is to simultaneously conduct group selection and within-group variable selection, i.e., bi-level selection. We propose two approaches to address the challenge. The first is a bi-level stagewise estimating equations (BiSEE) approach, which is shown to correspond to the sparse group lasso penalized regression. The second is a hierarchical stagewise estimating equations (HiSEE) approach to handle more general hierarchical grouping structure, in which each stagewise estimation step itself is executed as a hierarchical selection process based on the grouping structure. Simulation studies show that BiSEE and HiSEE yield competitive model selection and predictive performance compared to existing approaches. We apply the proposed approaches to study the association between the suicide-related hospitalization rates of the 15-19 age group and the characteristics of the school districts in the State of Connecticut.
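For orientation, here is a sketch of plain incremental forward stagewise least squares. At each step, the coefficient of the predictor most correlated with the current residual moves by a small amount eps. It assumes ordinary least squares with columns on a common scale. BiSEE and HiSEE replace the least-squares score with generalized estimating equations and add group and within-group selection, both of which this sketch omits.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=5000):
    """Incremental forward stagewise least squares (assumes the columns of X share a common scale)."""
    beta = np.zeros(X.shape[1])
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_steps):
        corr = X.T @ resid                # inner product of each predictor with the residual
        j = int(np.argmax(np.abs(corr)))  # most correlated predictor
        step = eps * np.sign(corr[j])
        beta[j] += step                   # small moves tie the stagewise path to lasso-type paths
        resid -= step * X[:, j]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(200)
print(np.round(forward_stagewise(X, y, eps=0.001, n_steps=20000), 2))
```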
Th, Mar 16th, Han Xiao (Rutgers), On the cross correlations under high dimension.
As an initial step before modeling high dimensional time series, it is of interest to check whether the component series are correlated. We propose to perform the test based on the sample cross correlations of the original series, in the presence of temporal dependence. We consider test statistics based on the maximum sample cross correlations, the maximum of the pairwise portmanteau-type statistics, and some other variants. Asymptotics are developed in the high dimensional setting where the dimension p can grow either as a power of the sample size T or as an exponential function of T. We employ the moving blocks bootstrap method to calibrate the sizes of the tests for finite samples. Extensions to nonstationary time series are also considered.
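The simplest statistic in this family is the maximum absolute sample cross correlation over distinct component pairs and lags 0..max_lag, computed from standardized series. The sketch below implements it; the standardization and lag convention are assumptions. The moving blocks bootstrap calibration described in the abstract is not shown.

```python
import numpy as np

def max_cross_correlation(series, max_lag=0):
    """series: array of shape (T, p). Returns the maximum absolute sample cross correlation
    between distinct components i != j at lags h = 0..max_lag."""
    x = np.asarray(series, dtype=float)
    t_len, p = x.shape
    z = (x - x.mean(axis=0)) / x.std(axis=0)  # standardize each component
    best = 0.0
    for h in range(max_lag + 1):
        c = z[h:].T @ z[: t_len - h] / t_len  # c[i, j] = (1/T) sum_t z[t + h, i] z[t, j]
        np.fill_diagonal(c, 0.0)              # exclude autocorrelations
        best = max(best, float(np.max(np.abs(c))))
    return best

rng = np.random.default_rng(0)
print(max_cross_correlation(rng.standard_normal((2000, 3)), max_lag=2))  # small for independent noise
```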
Th, Mar 23rd, Zhengjun Zhang (University of Wisconsin-Madison), ATM: autoregressive tail-index model for maxima in financial time series.
Classical generalized extreme value (GEV) models have been widely used in the practice of financial risk management for the modeling of extreme observations such as intra-day maximum loss from high-frequency trading or maximum daily loss across a large number of assets in a given portfolio. However, due to the time dependency of financial time series, the classical GEV model, as a static model, cannot adequately capture the time-varying behavior of extreme observations. In this work we integrate the classical GEV with a dynamic modeling approach to introduce a novel dynamic GEV framework. Specifically, an autoregressive tail-index model (ATM) is proposed to capture the time-varying tail risk of financial markets. Probabilistic properties of the model are studied, and an irregular maximum likelihood estimator is used for model estimation, with its asymptotic properties investigated. Finite sample performance is illustrated by simulations. The results of two real data examples, in which ATM is used for market tail risk monitoring and VaR calculation, are presented, showing significant improvement over the classical GEV.
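The static building block is the GEV distribution function F(x) = exp(-(1 + xi (x - mu)/sigma)^(-1/xi)), which takes the Gumbel form exp(-exp(-(x - mu)/sigma)) when xi = 0. The sketch implements only this static CDF. The abstract does not give the ATM recursion for the time-varying tail index, so none is assumed.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Static GEV distribution function with location mu, scale sigma > 0 and shape (tail index) xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    s = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))  # Gumbel case
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

print(gev_cdf(2.0, xi=0.3), gev_cdf(2.0, xi=0.0), gev_cdf(2.0, xi=-0.3))
```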
Th, Mar 30th, Tyler McCormick (University of Washington), Estimating features of a social network using a sample.
An individual's social environment influences many economic and health behaviors. Social network data, consisting of interactions or relationships between individuals, provide a glimpse of this environment but are extremely arduous to obtain. Collecting network data via surveys is financially and logistically prohibitive in many circumstances, whereas online network data are often proprietary and only informative about a subset of possible relationships. Designing efficient sampling strategies, and corresponding inference paradigms, for social network data is, therefore, fundamental for scalable, generalizable network research in the social and behavioral sciences. This talk proposes methods that estimate network features (such as centrality or the fraction of a network made up of individuals with a given trait) using data that can be collected using standard surveys. These data, known as aggregated relational data (ARD), poll individuals about the number of connections they have with certain groups in the population, but do not measure any links in the graph directly. We demonstrate the utility of the proposed models using data from a savings monitoring experiment in India. This is joint work with Emily Breza, Arun Chandrasekhar, and Mengjie Pan.
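As a baseline for what ARD alone can reveal, here is a sketch of the classical network scale-up estimator of a respondent's degree: d_hat = N * sum_k y_k / sum_k N_k. Here y_k is the reported number of acquaintances in group k and N_k is that group's known size. This standard estimator is named only for illustration and is not the model proposed in the talk.

```python
def scale_up_degree(ard_counts, group_sizes, population_size):
    """Network scale-up estimate of one respondent's degree from aggregated relational data:
    d_hat = N * sum_k y_k / sum_k N_k, with y_k the reported count in group k and N_k its known size."""
    if len(ard_counts) != len(group_sizes):
        raise ValueError("one count per group is required")
    return population_size * sum(ard_counts) / sum(group_sizes)

# Hypothetical respondent: knows 2, 5 and 1 people in groups of sizes 1000, 2500 and 500.
print(scale_up_degree([2, 5, 1], [1000, 2500, 500], 100000))
```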
Th, Apr 6th, Brent Nelson (Berkeley), Free Stein kernels and an improvement of the free logarithmic Sobolev inequality.
In their 2015 paper, Ledoux, Nourdin, and Peccati use Stein kernels and Stein discrepancies to improve the classical logarithmic Sobolev inequality (relative to a Gaussian distribution). Simply put, Stein discrepancy measures how far a probability distribution is from the Gaussian distribution by looking at how badly it violates the integration by parts formula. In free probability (i.e. non-commutative probability), the analogue of the Gaussian distribution is the semicircle law, which arises as the joint distribution of certain natural self-adjoint operators. Moreover, the semicircle law is known to also satisfy an "integration by parts formula." Using this fact, one can define the non-commutative analogues of Stein kernels and Stein discrepancies and use them to produce an improvement of Biane and Speicher's free logarithmic Sobolev inequality from 2001. In this talk, we will address these ideas after providing a light introduction to free probability. This is based on joint work with Max Fathi.
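The following numerical check illustrates the free integration by parts formula, which plays the role of E[X f(X)] = E[f'(X)] for the Gaussian. For a standard semicircular X on [-2, 2] and an independent copy Y, E[X f(X)] = E[(f(X) - f(Y)) / (X - Y)]. With f(x) = x^3 the difference quotient is x^2 + xy + y^2, so both sides equal 2. The quadrature via x = 2 cos(theta) is an illustrative choice.

```python
import math

def semicircle_expectation(g, n=4000):
    """E[g(X)] for X with density sqrt(4 - x^2) / (2 pi) on [-2, 2].
    Substituting x = 2 cos(theta) gives (2/pi) * integral_0^pi g(2 cos theta) sin(theta)^2 dtheta,
    approximated by the midpoint rule (exact up to rounding for polynomial g of modest degree)."""
    h = math.pi / n
    total = sum(g(2.0 * math.cos((k + 0.5) * h)) * math.sin((k + 0.5) * h) ** 2 for k in range(n))
    return 2.0 / math.pi * total * h

m1 = semicircle_expectation(lambda x: x)
m2 = semicircle_expectation(lambda x: x ** 2)
lhs = semicircle_expectation(lambda x: x * x ** 3)  # E[X f(X)] with f(x) = x^3
rhs = 2 * m2 + m1 ** 2                              # E[X^2 + XY + Y^2] for independent X, Y
print(round(lhs, 10), round(rhs, 10))
```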
Th, Apr 13th, Jan Rosinski (University of Tennessee), Isomorphism identities for perturbed infinitely divisible random fields.
We consider infinitely divisible random fields perturbed by an additive independent noise. We investigate admissible perturbations under which the perturbed field, which need not be infinitely divisible, is absolutely continuous with respect to the unperturbed one, and establish the related isomorphism identities. Dynkin's celebrated isomorphism theorem is an example of such a phenomenon, where the local time of a Markov process is the perturbation.
Th, Apr 20th, Simon Campese (University of Luxembourg), A limit theorem for the moments in space of Brownian local time increments.
We present a limit theorem for moments in space of the increments of Brownian local time. Previous results for the second and third moments by Chen et al. and by Rosen, later reproven by Hu and Nualart and by Rosen, are included as special cases, and a conjecture of Rosen for the fourth moment is settled. In contrast to the previous methods of proof, we follow a fundamentally different approach by working exclusively in the space variable of the Brownian local time, which allows us to give a unified argument for arbitrary orders. The main ingredients are Perkins' semimartingale decomposition, the Kailath-Segall identity and an asymptotic Ray-Knight theorem by Pitman and Yor.
Th, Apr 27th, Brian Caffo (Johns Hopkins), Am I my connectome? Fingerprinting with repeated resting state functional MRI data.
In the context of resting state functional MRI (rs-fMRI), fingerprinting is the practice of matching a set of subjects to themselves using only rs-fMRI correlations. The quality of the matching is then validated using the subjects' IDs. Statistical inference on this matching is often performed using permutation tests. We discuss many aspects of this process in this talk. First, we discuss desired invariances in the matching process and distance metric. Second, we discuss matching statistics and strategies and the resulting null distributions they induce. Third, we discuss variations on the null hypothesis, which is typically left unspecified despite the calculation of a permutation-based null distribution. We discuss these topics in the context of the rich history of this problem, which goes back more than two centuries to Montmort's matching problem.
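Montmort's problem gives the exact null distribution when subjects are matched by a one-to-one assignment that, under the null, is a uniformly random permutation. In that case the number K of correct matches among n subjects has P(K = k) = (1/k!) * sum_{j=0}^{n-k} (-1)^j / j!, which tends to Poisson(1). The sketch below assumes that one-to-one setting. Matching by a per-row argmax is not a permutation and induces a different null.

```python
import math

def matching_null_pmf(n):
    """P(K = k), k = 0..n, for the number K of fixed points of a uniformly random permutation of n items."""
    return [sum((-1) ** j / math.factorial(j) for j in range(n - k + 1)) / math.factorial(k)
            for k in range(n + 1)]

def matching_p_value(n_correct, n):
    """One-sided p-value P(K >= n_correct) under the uniform-permutation null."""
    return sum(matching_null_pmf(n)[n_correct:])

print(matching_p_value(4, 30))  # chance of 4 or more correct matches among 30 subjects by luck alone
```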

FALL SEMESTER 2016

Th, Sep 22nd, David Lipshutz (Brown University), Sensitivity analysis for the invariant measure of reflected Brownian motion.
Reflected Brownian motions (RBMs) in polyhedral cones arise in a variety of applications ranging from queueing theory to mathematical finance. The invariant measure of an RBM (assuming it exists) is often used to approximate the long time behavior of the RBM, and depends on parameters that describe the RBM - namely, the drift vector, covariance matrix and directions of reflection. The focus of this talk is to understand sensitivities of the invariant measure to these parameters. In particular, we show that sensitivities of the invariant measure can be characterized using the invariant measure of a joint process which consists of an RBM and its so-called pathwise derivative. One of the main challenges is to establish existence and uniqueness for the invariant measure of this joint process.
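A one-dimensional sketch of the reflection mechanism, using the half-line [0, inf) instead of a polyhedral cone. The Skorokhod map sends a path X with X(0) >= 0 to Z = X + L, where L(t) = max(0, max_{s <= t} -X(s)) is the minimal pushing term. For this one-dimensional RBM with drift -mu < 0 and variance sigma^2, the invariant law is exponential with mean sigma^2/(2 mu), so its sensitivity to mu is -sigma^2/(2 mu^2). The multidimensional pathwise-derivative analysis from the talk is not reproduced.

```python
def skorokhod_reflect(path):
    """1-D Skorokhod map on [0, inf) for a sampled path with path[0] >= 0.
    Returns (Z, L) with Z = X + L and L(t) = max(0, max_{s <= t} -X(s))."""
    z, l = [], []
    push = 0.0
    for x in path:
        push = max(push, -x)  # minimal cumulative push needed to keep Z nonnegative
        l.append(push)
        z.append(x + push)
    return z, l

def invariant_mean(mu, sigma):
    """Mean of the exponential invariant law of 1-D RBM with drift -mu < 0 and variance sigma**2."""
    return sigma ** 2 / (2.0 * mu)

def invariant_mean_sensitivity(mu, sigma):
    """Derivative of invariant_mean with respect to mu."""
    return -sigma ** 2 / (2.0 * mu ** 2)

print(skorokhod_reflect([0.0, -1.0, 0.5, -2.0, -1.0]))
print(invariant_mean(1.0, 1.0), invariant_mean_sensitivity(1.0, 1.0))
```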
Tu, Sep 29th, Soumendra Lahiri (NCSU), A frequency domain empirical likelihood method for irregularly spaced spatial data.
In this talk, we consider empirical likelihood methodology for irregularly spaced spatial data in the frequency domain. The main result shows that, up to a suitable (and nonstandard) scaling, Wilks' phenomenon holds for the logarithm of the empirical likelihood ratio, in the sense that it is asymptotically distribution free and has a chi-squared limit. As a result, the proposed spatial frequency domain empirical likelihood (FDEL) method can be used to build nonparametric, asymptotically correct confidence regions and tests for a class of spectral parameters that are defined through spectral estimating equations. A major advantage of the method is that, unlike the more common studentization approach, it does not require explicit estimation of the standard error, which is itself a difficult problem due to intricate interactions among several unknown quantities, including the spectral density of the spatial process, the spatial sampling density and the spatial asymptotic structure. Applications of the methodology to some important inference problems for spatial data are given. Joint work with Soutir Bandyopadhyay and Dan Nordman.
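To build intuition for Wilks' phenomenon in empirical likelihood, the sketch below treats the simplest case: Owen's empirical likelihood for a scalar mean, not the frequency domain spatial version in the talk. The statistic is -2 log R(mu) = 2 * sum_i log(1 + lambda (x_i - mu)), where lambda solves sum_i (x_i - mu)/(1 + lambda (x_i - mu)) = 0. It is asymptotically chi-squared with one degree of freedom, so {mu : -2 log R(mu) <= 3.841} is an approximate 95% confidence interval that needs no explicit standard error.

```python
import math

def el_statistic(x, mu, iters=200):
    """-2 log R(mu) for Owen's empirical likelihood of a scalar mean."""
    d = [xi - mu for xi in x]
    if min(d) >= 0 or max(d) <= 0:
        return math.inf  # mu lies outside the convex hull of the data
    # lambda must keep every 1 + lambda * d_i positive; shrink the open interval slightly inward.
    lo = (-1.0 / max(d)) * (1 - 1e-10)
    hi = (-1.0 / min(d)) * (1 - 1e-10)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        score = sum(di / (1.0 + mid * di) for di in d)  # decreasing in lambda
        if score > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(el_statistic(data, m), 3) for m in (2.0, 3.0, 3.5, 4.0)])
```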
Th, Oct 6th, Harry Crane (Rutgers University), The edge exchangeable framework for network modeling.
Most of the statistical networks literature focuses on theory and methods for inference for data from one of a few default models. For several reasons, these default models, e.g., stochastic blockmodels and graphon models, fail to possess basic statistical properties, raising questions about the soundness of inferences based on these models. I will outline a general framework that clarifies the major issues of statistical network modeling and lends some insight for resolving them. Within this framework, I introduce the class of edge exchangeable network models, which addresses the longstanding problem of modeling sparse network structures in a way that permits sound inference. This is joint work with Walter Dempsey, U. Michigan.
Fr, Oct 13th, Kung-Sik Chan (University of Iowa), Inference for threshold diffusions.
The threshold diffusion model assumes the underlying diffusion process to have a piece-wise linear drift term and a piece-wise smooth diffusion term, which is useful for analyzing nonlinear continuous-time processes. In practice, the functional form of the diffusion term is often unknown. We develop a quasi-likelihood approach for testing and estimating a threshold diffusion model, by employing a constant working diffusion term, which amounts to a least squares approach. Large-sample properties of the proposed methods are derived under mild regularity conditions. Unlike the discrete-time case, the threshold estimate admits a closed-form asymptotic distribution. We apply the threshold model to examine the nonlinearity in the term structure of a long time series of US interest rates.
Th, Oct 20th, Han Liang Gan (Northwestern), Dirichlet approximation of genetic drift models.
A genetic drift model studies how gene variants and their frequencies evolve in time. However, even for a relatively innocuous-looking model, the exact distribution is often intractable. As a result, approximate distributions may be useful. The Dirichlet distribution takes values in the set of K-dimensional vectors with nonnegative entries summing to 1. This makes it a natural candidate for approximating genetic drift models. In this talk we will discuss various genetic drift models (such as the Wright-Fisher model) and their approximating Dirichlet distributions, and calculate explicit error bounds for the approximations. If time permits, we will cover the Stein's method framework used to derive the results and offer some insights regarding their derivation.
Th, Nov 3rd, David Degras (UMass Boston), A high dimensional group fused lasso.
Group fused lasso (GFL) is a powerful approach to sparse linear regression problems subject to structural constraints. It is widely used in machine learning, signal processing, and bioinformatics for tasks such as prediction, signal recovery, segmentation, and change point detection. From a computational perspective, GFL is a nonsmooth convex optimization problem that can be solved by off-the-shelf methods such as proximal algorithms and subgradient methods. In high dimensions, however, these methods require intensive computations and may only approximately enforce structural constraints. To address these concerns, I present a new GFL method that combines block coordinate descent, which is fast but has no convergence guarantees, with subgradient descent, which is slower but provably converges to a global solution. The proposed method is compared to the state of the art in a numerical experiment. It is also applied to resting-state fMRI data to investigate dynamic brain connectivity. Open questions of parameter selection and statistical inference are set forth.
Th, Nov 17th, Aidong Ding (Northeastern), A robust-equitable dependence measure for feature selection.
Dependence measures play an important role in filter-based feature selection. To correctly identify important features with complex relationships in large data sets, we would like the measure to be equitable (Reshef et al., Science 2011): treating all types of functional relationships, linear and nonlinear, equally. We provide a theoretical treatment of equitability, including the self-equitability definition (Kinney and Atwal, PNAS 2014) and a new robust-equitability definition. The robust copula dependence (RCD) measure based on the L1-distance of the copula density is shown to be equitable under all equitability definitions. We also provide theoretical justification that RCD can be fundamentally easier to estimate than mutual information (MI), the recommended self-equitable measure in Kinney and Atwal. Numerical examples on synthetic and real data sets illustrate the effect of equitability in feature ranking and selection. In particular, selection based on RCD can be more robust to varying sample size than selection through MI and other measures.
Th, Dec 1st, Dean Eckles (MIT), Massive meta-analysis using regularized instrumental variables, with an application to peer effects.
The widespread adoption of randomized experiments (i.e. A/B tests) in the Internet industry means that there are often numerous well-powered experiments on a given product. Individual experiments are often simple "bake-off" evaluations of a new intervention: They allow us to estimate effects of that particular intervention on outcomes of interest, but they are often not informative about the mechanisms for these effects or what other interventions might do. We consider what else we can learn from a large set of experiments. In particular, we use many experiments to learn about the effects of the various endogenous variables (or mechanisms) via which the experiments affect outcomes. This involves treating the experiments as instrumental variables, and so this setting is similar to, but somewhat different from, "many instrument" settings in econometrics and biostatistics. Motivated by the distribution of experiment first-stage effects, we present and evaluate regularization methods for improving on standard IV estimators. Joint work with Alex Peysakhovich (Facebook AI Research).
Th, Dec 8th, Scott Robertson (BU), Robust asymptotic growth in the presence of stability.
In this talk, we revisit the problem considered in "Robust Asymptotic Growth" (Kardaras, Robertson 2012, Annals of Applied Probability) where the investor seeks to maximize the growth rate of her portfolio when there is uncertainty in the drift of asset prices. In this setting, while the instantaneous covariance matrix and domain of the underlying asset prices are known, the precise drift is unknown, beyond the qualitative statement that asset prices do not "explode" to the boundary over the investment horizon. Therein, robust growth optimal portfolios are constructed via the generalized principal eigenfunction for a degenerate elliptic operator, and such portfolios are seen as the long horizon limit of the functionally generated finite horizon relative arbitrage portfolios introduced by Fernholz and Karatzas in their work on stochastic portfolio theory. In the present work, we seek to extend the robust growth optimal analysis to the situation where, in addition to knowing that asset prices do not explode to the boundary of the state space, the investor also knows that asset prices are stable over time. Such a setting naturally arises in the study of rank-based portfolios, where optimal policies are driven not by the asset prices themselves but by the ranked relative market capitalizations. In this setting, we provide simple conditions on the domain, covariance matrix and limiting invariant density under which growth optimal portfolios may be constructed. Here, the answer is relatively easy to obtain when an associated diffusion is symmetric, or reversible, but requires a very delicate analysis in the non-symmetric case. Growth optimal portfolios are governed by a solution to a variational problem in the space of functions which are locally in W^{1,2}, the space of square-integrable weakly differentiable functions.
After presenting the results for the case when asset prices do not exhibit local time behavior on the boundary of the state space, we will consider the case containing local times, as this is the natural setting for rank-based diffusions, which are the primary example of interest. This is joint work with Kostas Kardaras of the London School of Economics.

SPRING SEMESTER 2016

Tu, Jan 19th (in MCS B21), Daniel Sussman (Harvard), Adjacency spectral embedding for random graphs.
The eigendecomposition of an adjacency matrix provides a way to embed a graph as points in finite dimensional Euclidean space. This embedding allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for graph inference. Our work analyzes this embedding, a graph version of principal component analysis, in the context of various random graph models with a focus on the impact for subsequent inference. We show that for a particular model this embedding yields a consistent estimate of its parameters and that these estimates can be used to accurately perform a variety of inference tasks including vertex clustering, vertex classification as well as estimation and hypothesis testing about the parameters.
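A minimal sketch of the adjacency spectral embedding described above, assuming NumPy and an undirected graph without self-loops: it keeps the d eigenpairs of the adjacency matrix with the largest absolute eigenvalues and scales each eigenvector by the square root of |eigenvalue|. The two-block stochastic block model, its connection probabilities and the helper names (adjacency_spectral_embedding, sample_sbm) are illustrative assumptions, not the speaker's code.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed the vertices of an undirected graph as points in R^d.

    Keeps the d eigenpairs of the symmetric adjacency matrix A with the
    largest absolute eigenvalues and scales each eigenvector by sqrt(|eigenvalue|).
    """
    vals, vecs = np.linalg.eigh(A)                # symmetric A -> real spectrum
    top = np.argsort(np.abs(vals))[::-1][:d]      # indices of the d largest |eigenvalues|
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def sample_sbm(block_probs, sizes, rng):
    """Sample a symmetric adjacency matrix (no self-loops) from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = block_probs[np.ix_(labels, labels)]        # edge probability for every vertex pair
    upper = np.triu(rng.random(P.shape) < P, k=1)  # independent edges above the diagonal
    A = (upper | upper.T).astype(float)
    return A, labels

rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1], [0.1, 0.4]])   # hypothetical block connection probabilities
A, labels = sample_sbm(B, [100, 100], rng)
X = adjacency_spectral_embedding(A, d=2)
```

Rows of X approximate the vertices' latent positions up to an orthogonal transformation, so a standard clustering routine applied to the rows (for example k-means) is one way to carry out the vertex clustering mentioned in the abstract.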
Th, Jan 21st, Colin B. Fogarty (UPenn), Leveraging multiple outcomes in matched observational studies.
In order to bridge the gap between association and causation in observational studies, Fisher advocated for the testing of “elaborate theories.” One manner in which a causal theory can be made elaborate is through the prediction of a particular direction of effect for multiple outcome variables. When testing hypotheses on multiple outcomes, multiple comparisons must be taken into account. This is true not only when assuming no unmeasured confounding, but also when assessing how robust a study's findings are to unmeasured confounding in the subsequent sensitivity analysis. Concerns over a loss in power may lead practitioners to instead investigate the outcome variable they believe a priori will be most affected by the intervention, thus reducing the extent to which Fisher's advice is followed in practice. We demonstrate that when performing multiple comparisons in a sensitivity analysis, the loss in power from controlling the familywise error rate can be attenuated. This is because unmeasured confounding cannot have a different impact on the probability of assignment to treatment for a given individual depending on the outcome being analyzed. Existing methods for testing the overall truth of multiple hypotheses allow this to occur by combining the results of sensitivity analyses performed on individual outcomes. By solving a quadratically constrained linear program, we are able to perform a sensitivity analysis while avoiding this logical inconsistency. We show that this allows for uniform improvements in the power of a sensitivity analysis when compared to combining individual sensitivity analyses. This is true not only for testing the overall null across outcomes, but also for testing null hypotheses on specific outcome variables when using certain sequential rejection procedures. We illustrate our method through an example examining the impact of smoking on naphthalene levels in the body.
Tu, Jan 26th, Veronika Rockova (UPenn), Fast Bayesian factor analysis via automatic rotations to sparsity.
Rotational post-hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity-inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations and (c) better oriented sparse solutions. To avoid the pre-specification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian Buffet Process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the Spike-and-Slab LASSO prior, a two-component refinement of the Laplace prior (Rockova 2015). A companion criterion, motivated as an integral lower bound, is provided to effectively select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional gene expression data, which would render posterior simulation impractical.
Th, Jan 28th, Sumanta Basu (UC Berkeley), Learning dynamics of complex systems from high-dimensional data.
The problem of learning interrelationships among the components of large, complex systems from high-dimensional datasets is common in many areas of modern economic and biological sciences. Examples include macroeconomic policy making, financial risk management, gene regulatory network reconstruction and elucidating functional roles of epigenetic regulators driving cellular mechanisms. In addition to their inherent computational challenges, principled statistical analyses of these big data problems often face unique challenges emerging from temporal and cross-sectional dependence in the data and complex dynamics (heterogeneity, nonlinear and high-order interactions) among the system components. In this talk, I will present Network Granger causality, a unified framework for structure learning and forecasting of large dynamic systems using multivariate time series and panel data. The proposed framework relies on regularized estimation of high-dimensional vector autoregressive (VAR) models, is flexible enough to incorporate grouping and latent structures, allows parallel implementation for large scale data sets and enjoys strong theoretical guarantees under high-dimensional scaling. I will demonstrate the advantage of the proposed methodology on a motivating application from financial econometrics: system-wide risk monitoring of the U.S. financial sector before, during and after the crisis of 2007-2009. I will conclude with some of my ongoing work on learning nonlinear and potentially high-order interactions in high-dimensional, heterogeneous settings.
Fr, Jan 29th, Tirthankar Dasgupta (Harvard), Designing experiments for new-generation scientific studies: some challenges and potential solutions.
Many modern-day experiments conducted by researchers in the physical, social, behavioral, biomedical and management/business sciences involve complications like (a) simultaneous study of multiple factors, (b) availability of multiple covariate measurements for each experimental unit, before or after conducting the experiment, necessitating a strategy for achieving covariate balance/adjustment across treatment groups, and (c) varying levels of randomization restrictions across factors. In this talk, we will present real-life examples of these complications from different fields of application, and discuss some ideas and research results that focus on the development of a unified approach for designing and analyzing experiments that address all of the above complexities.
Th, Feb 4th, Christine B. Peterson (Stanford), Statistical approaches for making sense of high-throughput biological data.
In this talk, I will discuss statistical approaches I have developed to gain insight into the complex networks of regulation and interaction that govern biological systems. Understanding these networks and how they are disrupted by disease is an important step in identifying potential targets for the treatment of disease. Firstly, I will describe my work on the inference of biological networks such as metabolic or protein interaction networks from high-throughput data. In particular, I will address graphical modeling methods I have proposed in the Bayesian framework for inferring such networks based on limited sample sizes, and illustrate the application of these approaches to highlight mechanisms underlying cancer progression. Secondly, I will address the problem of establishing the genetic basis of multivariate traits such as gene expression or other molecular profiling data. Here I propose a multi-stage multiple testing procedure which controls important error rates regarding the discovery of regulatory variants and the association of these variants to traits.
Th, Mar 3rd, Karl Rohe (Univ of Wisconsin Madison), Network driven sampling: a critical threshold for design effects.
Web crawling and respondent-driven sampling (RDS) are two types of network driven sampling techniques that are popular when it is difficult to contact individuals in the population of interest. This paper studies network driven sampling as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree, instead of a chain, allows for the sampled units to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is D, then constructing an estimator from the novel design makes the variance of the estimator D times greater than it would be under a simple random sample. Under certain assumptions on the referral tree, the design effect of network driven sampling has a critical threshold that is a function of the referral rate m and the clustering structure in the social network, represented by the second eigenvalue \lambda_2 of the Markov transition matrix. If m < 1/\lambda_2^2, then the design effect is finite (i.e. the standard estimator is \sqrt{n}-consistent). However, if m > 1/\lambda_2^2, then the design effect grows with n (i.e. the standard estimator is no longer \sqrt{n}-consistent; its standard error converges at the slower rate of n^{\log_m \lambda_2}).
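As a quick arithmetic check of the critical threshold, the hypothetical helper below compares the referral rate m with 1/\lambda_2^2 and, above the threshold, reports the exponent \log_m \lambda_2, assuming the standard error then decays like n^{\log_m \lambda_2} rather than n^{-1/2}. It only evaluates the formula; it does not simulate a referral tree.

```python
import math

def design_effect_regime(m, lambda2):
    """Classify network driven sampling by the critical threshold 1 / lambda2**2.

    m       -- referral rate (mean number of referrals per sampled unit)
    lambda2 -- second eigenvalue of the Markov transition matrix, 0 < |lambda2| < 1
    Returns (regime, exponent), where the standard error of the standard
    estimator is assumed to decay like n ** exponent.
    """
    threshold = 1.0 / lambda2 ** 2
    if m < threshold:
        # Below the threshold: finite design effect, sqrt(n)-consistent estimator.
        return "finite design effect", -0.5
    # At or above the threshold (so m > 1): exponent log_m |lambda2| >= -1/2.
    return "design effect grows with n", math.log(abs(lambda2)) / math.log(m)

print(design_effect_regime(2, 0.5))   # threshold 1 / 0.25 = 4
print(design_effect_regime(3, 0.8))   # threshold 1 / 0.64 = 1.5625
```

For example, with \lambda_2 = 0.8 the threshold is 1/0.64 ≈ 1.56, so an average of 3 referrals per participant already puts the design in the second regime, with exponent \log_3 0.8 ≈ -0.20. The boundary case m = 1/\lambda_2^2 is not addressed in the abstract; the sketch simply sends it to the second branch, where the formula returns exactly -1/2.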
Th, Mar 17th, Simon Campese (University of Rome Tor Vergata), Abstract fourth moment theorems.
The classical Fourth Moment Theorem says that for a normalized sequence of multiple Wiener-Itô integrals, convergence of just the fourth moment suffices to ensure convergence in law towards a standard Gaussian random variable. Since its discovery, several proofs and extensions of this result have been found, all of them heavily exploiting the rich structure of multiple integrals. In an exciting new development, it turned out that such Fourth Moment Theorems hold in much greater generality, namely for generic eigenfunctions of Markov diffusion generators with a certain chaotic property and target laws fulfilling some sufficient condition (examples being the Gaussian, Gamma and Beta distributions). We will present an overview of this new approach.
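For orientation, the classical statement referred to above (Nualart and Peccati, 2005) can be written as follows for a fixed chaos order q ≥ 2, where I_q denotes the q-th multiple Wiener-Itô integral; the value 3 on the right is the fourth moment of a standard Gaussian.

```latex
F_n = I_q(f_n), \quad q \ge 2, \quad \mathbb{E}[F_n^2] \to 1 :
\qquad
F_n \xrightarrow{\ \text{law}\ } \mathcal{N}(0,1)
\iff
\mathbb{E}[F_n^4] \to 3 .
```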
Th, Mar 24th, Nikolai Leonenko (Cardiff University), Limit theorems for weighted non-linear transformations of Gaussian processes with singular spectrum.
The limit Gaussian distribution of multivariate weighted functionals of non-linear transformations of Gaussian stationary processes, having multiple singular spectra, is derived under very general conditions on the weight function. This work is motivated by applications to the estimation of harmonic components in non-linear regression models with singular spectrum, and to asymptotic inference on non-linear functionals of Gaussian stationary processes with singular spectra. It continues the pioneering results of Rosenblatt (1961), Taqqu (1975, 1979), Dobrushin and Major (1979) for convergence to Gaussian and non-Gaussian distributions, under long range dependence, in terms of Hermite expansions, and Breuer and Major (1983), Avram and Brown (1989), Chambers and Slud (1989) on convergence to the Gaussian distribution by using diagram formulae or graphical methods. This line of research continues to be of interest today; see Berman (1992) for the m-dependent approximation approach, Ho and Hsing (1997) for the martingale approach, and Nualart and Peccati (2005), Nourdin and Peccati (2009) for the application of Malliavin calculus and Stein's method, among others. This is joint work with A.V. Ivanov, M.D. Ruiz-Medina and I.N. Savich.
Th, Mar 31st, Ivan Fernandez-Val (Boston University), The sorted effects method: discovering heterogeneous effects beyond their averages.
The partial (ceteris paribus) effects of interest in nonlinear and interactive linear models are heterogeneous, as they can vary dramatically with the underlying observed or unobserved covariates. Despite the apparent importance of heterogeneity, a common practice in modern empirical work is to largely ignore it by reporting average partial effects (or, at best, average effects for some groups, see e.g. Angrist and Pischke, 2008). While average effects provide very convenient scalar summaries of typical effects, by definition they fail to reflect the entire variety of the heterogeneous effects. In order to discover these effects much more fully, we propose to estimate and report sorted effects: a collection of estimated partial effects sorted in increasing order and indexed by percentiles. By construction the sorted effect curves completely represent and help visualize all of the heterogeneous effects in one plot. They are as convenient and easy to report in practice as the conventional average partial effects. We also provide a quantification of uncertainty (standard errors and confidence bands) for the estimated sorted effects. We apply the sorted effects method to demonstrate several striking patterns of gender-based discrimination in wages, and of race-based discrimination in mortgage lending. Using differential geometry and functional delta methods, we establish that the estimated sorted effects are consistent for the true sorted effects, and derive asymptotic normality and bootstrap approximation results, enabling construction of pointwise confidence bands (pointwise with respect to percentile indices). We also derive functional central limit theorems and bootstrap approximation results, enabling construction of simultaneous confidence bands (simultaneous with respect to percentile indices). The derived statistical results in turn rely on establishing Hadamard differentiability of a multivariate sorting operator, a result of independent mathematical interest.
This is joint work with Victor Chernozhukov and Ye Luo.
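The basic object of the sorted effects method can be sketched in a few lines, assuming NumPy: compute each individual's partial effect, then report its empirical quantiles at a grid of percentile indices. The logit model, its coefficients and the simulated covariate are hypothetical, and the sketch omits the standard errors and confidence bands discussed in the abstract.

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def sorted_effects(partial_effects, percentiles):
    """Sorted effect curve: empirical quantiles of the individual partial
    effects at the requested percentile indices (values in [0, 100])."""
    return np.percentile(partial_effects, percentiles)

# Hypothetical logit model P(y = 1 | d, x) = logistic(b0 + b1 * d + b2 * x)
# with a binary treatment d and a covariate x; coefficients are illustrative only.
b0, b1, b2 = -1.0, 0.8, 1.5
rng = np.random.default_rng(1)
x = rng.normal(size=5000)

# Partial effect of switching d from 0 to 1, evaluated at each individual's x.
pe = logistic(b0 + b1 + b2 * x) - logistic(b0 + b2 * x)

u = np.arange(5, 100, 5)      # percentile indices 5, 10, ..., 95
curve = sorted_effects(pe, u)
ape = pe.mean()               # the conventional average partial effect
```

Plotting curve against u gives the sorted effect curve; comparing it with the single number ape shows how much heterogeneity the average partial effect hides.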
Th, Apr 14th, Kun Chen (UConn), Sequential estimation in sparse factor regression.
Large-scale multivariate regression models are increasingly required and formulated in various fields. A sparse singular value decomposition of the regression component matrix is appealing for achieving dimension reduction and facilitating model interpretation. However, how to recover such a composition of sparse and low-rank structures remains a challenging problem. By exploring the connections between factor analysis and reduced-rank regression, we formulate the problem as a sparse factor regression and develop an efficient sequential estimation procedure. At each sequential step, a latent factor is constructed as a sparse linear combination of the observed predictors, for predicting the responses after accounting for the effects of the previously found latent factors. Compared with the complicated joint estimation approach, a prominent feature of our proposed sequential method is that each step reduces to a simple regularized unit-rank regression, in which the orthogonality requirement among the sparse factors becomes optional rather than necessary. The ideas of coordinate descent and Bregman iterative methods are utilized to ensure fast computation and algorithmic convergence, even in the presence of missing data and when exact orthogonality is desired. Theoretically, we show that the sequential estimators enjoy the oracle properties for recovering the underlying sparse factor structure. The efficacy of the proposed approach is demonstrated by simulation studies and two real applications in genetics.
Th, Apr 21st, Daniel Schwarz (Carnegie Mellon), Integral representation of martingales in mathematical finance.
In this talk we will present recent results concerning a class of integral representation theorems for martingales which lie at the heart of two fundamental problems in mathematical finance: the completion of financial markets with derivative securities and the existence of partial Radner equilibria. Some popular examples and open problems will be discussed.
Th, Apr 28th, Xiaofeng Shao (University of Illinois at Urbana-Champaign), A new approach to dimension reduction for multivariate time series.
In this talk, we introduce a new methodology to reduce the number of parameters in multivariate time series modeling. Our method is motivated by the consideration of optimal prediction and focuses on the reduction of the effective dimension in the conditional mean of the time series given the past information. In particular, we seek a contemporaneous linear transformation such that the transformed time series has two parts, with one part being conditionally mean independent of the past information. Our dimension reduction procedure is based on the eigen-decomposition of the so-called cumulative martingale difference divergence matrix, which encodes the number and form of linear combinations that are conditionally mean independent of the past. Interestingly, there is a factor model representation for our dimension reduction framework, and our method can be further extended to reduce the dimension of the volatility matrix. We provide a simple way of estimating the number of factors and the factor loading space, and obtain some theoretical results about the estimators. The finite sample performance is examined via simulations in comparison with some existing methods.

FALL SEMESTER 2015

Th, Sep 10th, Liliya Zax (Boston University), Statistics application in industry: financial institutions and tech companies.
In my presentation I will share some aspects of my statistics-related experience in different industries, namely in financial and technology companies. We will discuss some specific statistical problems that are of interest to industry, what statistical tools companies use to try to solve those problems, and what statistical challenges they face. The goal of the presentation is to help students better understand how the knowledge and skills they gain in their academic programs can later be applied if they choose to continue their career in industry.
Th, Sep 17th, Leu Guo (Boston University), The power of message networks: semantic network analysis of media effects in twittersphere during the 2012 U.S. presidential election.
Do traditional news media still lead public opinion in this digital age? This talk will present a study that explores how media such as newspapers and television set the public agenda through constructing message networks. Semantic network analysis and big data analytics were used to examine a large dataset collected on Twitter during the 2012 U.S. presidential election.
Th, Oct 1st, Philippe Rigollet (MIT), Batched bandits.
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already gives close to minimax-optimal regret bounds, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
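To make the batch constraint concrete, here is a sketch of a batched successive-elimination policy, assuming NumPy and Bernoulli rewards: pulls are spread evenly over the surviving arms within each batch, and arms are only eliminated at batch ends. The geometric grid t_i ≈ T^{i/M}, the confidence radius sqrt(2 log T / n) and the helper names are illustrative assumptions, not necessarily the policy or grid analyzed in the talk.

```python
import math
import numpy as np

def geometric_grid(horizon, batches):
    """Hypothetical batch end times t_i = round(T ** (i / M)), with t_M = T."""
    grid = [int(round(horizon ** (i / batches))) for i in range(1, batches + 1)]
    grid[-1] = horizon
    return grid

def batched_elimination(means, horizon, grid, rng):
    """Successive elimination in which decisions are made only at batch ends.

    means -- true Bernoulli means (unknown to the policy, used to simulate rewards)
    grid  -- increasing batch end times, the last one equal to horizon
    Returns the arms still active after the last batch and the expected regret.
    """
    k = len(means)
    active = list(range(k))
    pulls, wins = np.zeros(k), np.zeros(k)
    regret, start = 0.0, 0
    for end in grid:
        # Within a batch: split the pulls evenly over active arms, no feedback used.
        budget = end - start
        for j, arm in enumerate(active):
            n = budget // len(active) + (1 if j < budget % len(active) else 0)
            wins[arm] += rng.binomial(n, means[arm])
            pulls[arm] += n
            regret += n * (max(means) - means[arm])
        start = end
        # At the batch end: drop arms whose upper bound falls below the best lower bound.
        n_act = np.maximum(pulls[active], 1)
        est = wins[active] / n_act
        radius = np.sqrt(2 * math.log(horizon) / n_act)
        best_lcb = np.max(est - radius)
        active = [a for a, e, r in zip(active, est, radius) if e + r >= best_lcb]
    return active, regret

rng = np.random.default_rng(0)
grid = geometric_grid(10_000, 4)   # [10, 100, 1000, 10000]
active, regret = batched_elimination([0.3, 0.7], 10_000, grid, rng)
```

With only four batches the policy still isolates the better arm; how much regret it incurs depends on whether the weaker arm is dropped at the third batch end or only at the last one, which is exactly the kind of trade-off the choice of grid controls.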
Th, Oct 8th, John Harlim (Penn State), Diffusion forecast: a nonparametric modeling approach.
I will discuss a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution of the backward Kolmogorov equation of the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to evolve the probability distribution of non-trivial dynamical systems with equation-free modeling. I will also discuss nonparametric filtering methods, leveraging the diffusion forecast in a Bayesian framework to initialize the forecasting distribution given noisy observations.
Th, Oct 15th, Pierre Jacob (Harvard), Estimation of the derivatives of functions that can only be evaluated with noise.
Iterated Filtering methods have recently been introduced to perform maximum likelihood parameter estimation in state-space models, and they only require being able to simulate the latent Markov model according to its prior distribution. They rely on an approximation of the score vector for general statistical models based upon an artificial posterior distribution, and bypass the calculation of any derivative. We show here that this score estimator can be derived from a simple application of Stein’s lemma, and how an additional application of this lemma provides an original derivative-free estimator of the observed information matrix. These methods tackle the general problem of estimating the first two derivatives of a function that can only be evaluated point-wise with some noise. We compare these new methods with finite difference schemes and make connections with proximal mappings. In particular, we look at the bias and variance of these estimators, the effect of the variance of the noise, and the effect of the dimension of the parameter space.
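A one-dimensional sketch of the Stein's-lemma idea, assuming NumPy: perturb the point with Gaussian noise, evaluate the noisy function there, and average. It estimates the derivatives of the Gaussian-smoothed function E f(theta + sigma Z), which coincide with f' and f'' when f is quadratic. The test function, noise level and sigma are hypothetical, and this generic estimator is not the iterated filtering construction from the talk.

```python
import numpy as np

def stein_derivative_estimates(noisy_f, theta, sigma, n_samples, rng):
    """Derivative-free estimates of the first two derivatives of a scalar function
    that can only be evaluated with noise, via Stein's lemma.

    With X = theta + sigma * Z and Z ~ N(0, 1), Stein's lemma gives
        d/dtheta     E f(X) = E[f(X) Z] / sigma
        d^2/dtheta^2 E f(X) = E[f(X) (Z**2 - 1)] / sigma**2
    and both expectations are replaced by Monte Carlo averages.
    """
    z = rng.standard_normal(n_samples)
    y = noisy_f(theta + sigma * z)          # one noisy evaluation per perturbed point
    grad = np.mean(y * z) / sigma
    hess = np.mean(y * (z ** 2 - 1.0)) / sigma ** 2
    return grad, hess

rng = np.random.default_rng(42)

def noisy_square(x):
    # Hypothetical test function f(x) = x**2 observed with N(0, 0.1**2) noise.
    return x ** 2 + 0.1 * rng.standard_normal(np.shape(x))

grad, hess = stein_derivative_estimates(noisy_square, theta=1.0, sigma=0.5,
                                        n_samples=200_000, rng=rng)
```

Here the exact values are f'(1) = 2 and f''(1) = 2; shrinking sigma reduces the smoothing bias for non-quadratic functions but inflates the Monte Carlo variance, which is the bias-variance trade-off the abstract refers to.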
Th, Oct 22nd, Jian Zhou (WPI), Volatility inference using high-frequency financial data and efficient computations.
The field of high-frequency finance has evolved rapidly over the past few decades. One focus is volatility modeling and analysis for high-frequency financial data, which plays a major role in finance and economics. In this talk, we focus on the statistical inference problem for large volatility matrices using high-frequency financial data, and propose a methodology to tackle this problem under various settings. We illustrate the methodology with high-frequency price data on stocks traded on the New York Stock Exchange in 2013. The theory and numerical results show that our approach performs well, pooling together the strengths of regularization and estimation from a high-frequency finance perspective.
Th, Oct 29th, Markos Katsoulakis (UMass Amherst), Path-space information metrics for uncertainty quantification and coarse-graining of molecular systems.
We present path-space, information theory-based, sensitivity analysis, uncertainty quantification and variational inference methods for complex high-dimensional stochastic dynamics, including chemical reaction networks with hundreds of parameters, Langevin-type equations and lattice kinetic Monte Carlo. We establish their connections with goal-oriented methods in terms of new, sharp, uncertainty quantification inequalities that scale appropriately both at long times and for high-dimensional state and parameter spaces. The combination of the proposed methodologies can (a) tackle non-equilibrium processes, typically associated with coupled physicochemical mechanisms or boundary conditions, such as reaction-diffusion problems, where even steady states are unknown altogether, e.g. they do not have a Gibbs structure. The path-wise information theory tools also (b) yield a surprisingly simple, tractable and easy-to-implement approach to quantify and rank parameter sensitivities, and (c) provide reliable parameterizations for coarse-grained molecular systems based on fine-scale data, as well as rational model selection through path-space (dynamics-based) variational inference methods.
Th, Nov 5th, Iddo Ben-Ari (UConn), The Bak-Sneppen model of biological evolution and related models.
The Bak-Sneppen model is a Markovian model for biological evolution that was introduced as an example of Self-Organized Criticality. In this model, a population of size N evolves according to the following rule. The population is arranged on a circle, or more generally a connected graph. Each individual is assigned a random fitness, uniform on [0,1], independent of the fitnesses of the other individuals. At each unit of time, the least fit individual and its neighbors are removed from the population and replaced by new individuals. Despite being extremely simple, the model is known to be very challenging, and the evidence for Self-Organized Criticality provided by Bak and Sneppen was obtained through numerical simulations. I will review the main rigorous results on this model, mostly due to R. Meester and his coauthors, and present some new results and open problems. I will then turn to recent, more tractable variants of the model, in which on the one hand the spatial structure is relaxed, while on the other hand the population size is random. I will focus on the functional central limit theorem for this model, which has a somewhat unusual form.
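The update rule described above is easy to simulate. The sketch below assumes NumPy, a circle of N sites and replacement of the least fit individual together with its two nearest neighbours; the parameter values are illustrative.

```python
import numpy as np

def bak_sneppen(n_sites, n_steps, rng):
    """Simulate the Bak-Sneppen model on a circle of n_sites individuals.

    At every step the least fit individual and its two neighbours on the
    circle receive fresh, independent Uniform[0, 1) fitness values.
    Returns the final fitness vector and the minimal fitness at each step.
    """
    fitness = rng.random(n_sites)
    minima = np.empty(n_steps)
    for t in range(n_steps):
        i = int(np.argmin(fitness))
        minima[t] = fitness[i]
        # Index i - 1 = -1 wraps to the last site; (i + 1) % n_sites wraps to site 0.
        for j in (i - 1, i, (i + 1) % n_sites):
            fitness[j] = rng.random()
    return fitness, minima

rng = np.random.default_rng(0)
fitness, minima = bak_sneppen(n_sites=100, n_steps=50_000, rng=rng)
```

In one dimension, numerical studies place the self-organized threshold near 0.667: after a transient, most fitness values sit above it, while the recorded minima mostly fluctuate below it.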
Th, Nov 12th, Mokshay Madiman (University of Delaware), Optimal concentration of information for log-concave distributions.
It was shown by Bobkov and the speaker that for a random vector X in R^n drawn from a log-concave density e^{-V}, the information content per coordinate, namely V(X)/n, is highly concentrated about its mean. Their argument was nontrivial, involving the localization technique, and also gave suboptimal exponents, but it was sufficient to demonstrate that high-dimensional log-concave measures are in a sense close to uniform distributions on the annulus between two nested convex sets. We will present recent work that obtains an optimal concentration bound in this setting (optimal even in the constant terms, not just the exponent), using very simple techniques, and outline the proof. Applications that motivated the development of these results include high-dimensional convex geometry and random matrix theory, and we will outline these applications.
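The Gaussian case gives a simple numerical illustration of this concentration, assuming NumPy: for X ~ N(0, I_n) the potential is V(x) = |x|^2/2 + (n/2) log(2 pi), so V(X)/n has mean (1/2) log(2 pi e) and standard deviation exactly 1/sqrt(2n). The sketch only checks this one log-concave example; it does not reproduce the general bound from the talk.

```python
import numpy as np

def information_per_coordinate(n, n_samples, rng):
    """Samples of V(X)/n for X ~ N(0, I_n), whose density is exp(-V) with
    V(x) = |x|^2 / 2 + (n / 2) * log(2 * pi)."""
    x = rng.standard_normal((n_samples, n))
    v = 0.5 * np.sum(x ** 2, axis=1) + 0.5 * n * np.log(2 * np.pi)
    return v / n

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    s = information_per_coordinate(n, 2000, rng)
    # Var V(X) = Var(chi^2_n) / 4 = n / 2, so the std of V(X)/n is 1 / sqrt(2n).
    print(n, s.mean(), s.std(), 1 / np.sqrt(2 * n))
```

The printed standard deviations shrink like 1/sqrt(2n) while the mean stays at (1/2) log(2 pi e) ≈ 1.419, which is the "information content per coordinate is highly concentrated" phenomenon in its simplest form.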
Th, Nov 19th, Youssef M. Marzouk (MIT), Transport maps for Bayesian computation.
We will discuss how transport maps, i.e., deterministic couplings between probability measures, can enable useful new approaches to Bayesian computation. A first use involves a combination of optimal transport and Metropolis correction; here, we use continuous transportation to transform typical MCMC proposals into adapted non-Gaussian proposals, both local and global. Second, we discuss a variational approach to Bayesian inference that constructs a deterministic transport map from a reference distribution to the posterior, without resorting to MCMC. Independent and unweighted posterior samples can then be obtained by pushing forward reference samples through the map. Making either approach efficient in high dimensions, however, requires identifying and exploiting low-dimensional structure. We present new results relating sparsity of transport maps to the conditional independence structure of the target distribution, and discuss how this structure can be revealed through the analysis of certain average derivative functionals. A connection between transport maps and graphical models yields many useful algorithms for efficient ordering and decomposition, here generalized to the continuous and non-Gaussian setting. The resulting inference algorithms involve either the direct identification of sparse maps or the composition of low-dimensional maps and rotations. We demonstrate our approaches on Bayesian inference problems arising in spatial statistics and in partial differential equations.
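For a Gaussian target, the monotone lower-triangular (Knothe-Rosenblatt) map from a standard normal reference is affine and given by the Cholesky factor of the covariance, which makes a convenient sanity check of the push-forward idea. This sketch assumes NumPy; the mean and covariance are hypothetical, and real posteriors need the non-Gaussian, nonlinear maps discussed in the talk.

```python
import numpy as np

def gaussian_triangular_map(mean, cov):
    """Lower-triangular map pushing N(0, I) forward to N(mean, cov).

    For a Gaussian target the monotone triangular (Knothe-Rosenblatt) map is
    affine, x = mean + L z, with L the Cholesky factor of cov
    (lower triangular with a positive diagonal).
    """
    L = np.linalg.cholesky(cov)
    return lambda z: mean + z @ L.T      # each row of z is one reference sample

# Hypothetical three-dimensional target.
mean = np.array([1.0, -2.0, 0.5])
cov = np.array([[2.0, 0.6, 0.0],
                [0.6, 1.0, 0.3],
                [0.0, 0.3, 1.5]])
T = gaussian_triangular_map(mean, cov)

# Independent, unweighted target samples obtained by pushing forward reference samples.
rng = np.random.default_rng(3)
z = rng.standard_normal((200_000, 3))
x = T(z)
```

Because the map is triangular, the k-th output coordinate depends only on the first k reference coordinates; for instance, x_1 depends on z_1 alone.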
Th, Dec 3rd, Shuyang Bai (Boston University), Self-normalized resampling for time series.
The inference procedures for the mean of a stationary time series are usually quite different depending on the strength of the dependence as well as the heavy tailedness of the model. In this talk, combining the ideas of resampling and self-normalization, we introduce a unified procedure which is valid under various different model assumptions. The procedure avoids estimation of any nuisance parameter, and requires only the choice of one bandwidth. Simulation examples will be given to illustrate its performance. The asymptotic theory will also be introduced.
Th, Dec 10th, Vidhu Prasad (UMass Lowell), Towers, codes and approximate conjugacy.
Consider the following question about an irrational rotation T of the unit circle and a mixing Markov chain: is there a partition of the circle (indexed by the state space of the Markov chain) so that the itinerary process given by T and the partition has the distribution of the given Markov chain? The answer is yes, and the same holds for any aperiodic measure preserving transformation (not just irrational rotations): the existence of “tower structures” for any T is equivalent to the coding property above (the existence of a partition which is moved like the Markov chain by T), and the latter property is equivalent to an “almost conjugacy” property for T. The “tower property” is a generalization of one of the truly basic results in ergodic theory: (Kakutani)-Rokhlin's Lemma.