Please check the weekly seminar schedule at http://www.bu.edu/stat/seminar/ for possible changes and updates. Below is a tentative schedule.

- Speaker:
**Luc Rey-Bellet, Department of Mathematics and Statistics, University of Massachusetts, Amherst, Thursday 13 Sep 2012**
ROOM CHANGE: B21 (BASEMENT)!

Title: Irreversibility, entropy production, and fluctuations.

Abstract: In the past 15 years there have been a number of results on the structure of non-equilibrium steady states in statistical mechanics models. From a mathematical point of view, non-equilibrium means lack of time-reversibility, or lack of detailed balance. In this talk we will explain what kind of general results one can obtain for irreversible (Markov or deterministic) processes and illustrate these results with a number of physical examples.

- Speaker:
**Konstantinos Spiliopoulos, Department of Mathematics and Statistics, Boston University, Thursday 20 Sep 2012**

Title: Escaping from an attractor: importance sampling and rest points

Abstract: Questions such as understanding transitions between metastable equilibrium states of stochastic dynamical systems and computing transition times have attracted a lot of attention in both the probability and applied mathematics communities, and at the same time are generic questions in disciplines such as chemical physics and biology. However, despite the substantial developments of the last five decades in both theory and algorithms, very little is known on how to design and rigorously analyze provably efficient Monte Carlo methods for rare event problems, such as the probability of escape from an equilibrium and transition to another one, when rest points play a key role. Even though several algorithms do exist, they have been applied only to specific systems and have not been rigorously analyzed. Therefore, it is unclear when they work and how one should efficiently design them. In this talk, I will discuss importance sampling schemes for the estimation of finite time exit probabilities of small noise diffusions that involve escape from an equilibrium. We build importance sampling schemes with provably good performance both pre-asymptotically, i.e., for fixed size of the noise, and asymptotically, i.e., as the size of the noise goes to zero, and that do not degrade as the time horizon gets large. Extensive simulation studies demonstrate the theoretical results.

- Speaker:
**Jing Zhang, Department of Statistics, Yale University, Thursday 27 Sep 2012**

Title: Detecting and understanding combinatorial mutation patterns responsible for HIV drug resistance

Abstract: We propose a systematic approach for a better understanding of how HIV viruses employ various combinations of mutations to resist drug treatments, which is critical to developing new drugs and optimizing the use of existing drugs. By probabilistically modeling mutations in the HIV-1 protease or reverse transcriptase (RT) isolated from drug-treated patients, we present a statistical procedure that first detects mutation combinations associated with drug resistance and then infers detailed interaction structures of these mutations. The molecular basis of our statistical predictions is further studied by using molecular dynamics simulations and free energy calculations. We have demonstrated the usefulness of this systematic procedure on three HIV drugs (Indinavir, Zidovudine, and Nevirapine), discovered unique interaction features between viral mutations induced by these drugs, and revealed the structural basis of such interactions. More advanced Bayesian models are also developed for transmitted drug resistance and cross-resistance for multiple drugs. This is joint work with Tingjun Hou, Wei Wang, and Jun S. Liu.

References:
1. Zhang, J., Hou, T., Wang, W., Liu, J.S. (2010) Detecting and understanding combinatorial mutation patterns responsible for HIV drug resistance. PNAS 107, 1321.
2. Zhang, J., Hou, T., Liu, Y., Chen, G., Yang, X., Liu, J.S., Wang, W. (2012) Systematic Investigation on Interactions for HIV Drug Resistance and Cross-Resistance among Protease Inhibitors. Accepted by Journal of Proteome Science & Computational Biology.

- Speaker:
**Andrew Papanicolaou, Department of Operations Research and Financial Engineering, Princeton University, Thursday 4 Oct 2012**

Title: Dimension reduction of the Bellman equations for maximum expected utility with partial information in discrete time

Abstract: The full availability of information in financial markets is something that is often assumed when working with models. However, parameters such as an asset's volatility and rate of return are not known and need to be estimated from past data. In this regard, the optimization of expected utility of wealth over a set of admissible trading strategies becomes a filtering problem, wherein the investor must use the filtration generated by past events to make the optimal decision for future returns. It turns out that this non-Markovian problem can be Markovianized once the dynamics of the filter are determined, but this Markovianized problem requires optimization over an infinite dimensional field. However, there is a class of perturbation models for which the Markovianized problem is well-approximated by an unperturbed finite dimensional problem. This approximation to the perturbed problem is analyzed, and there is found to be an information premium in the market.

- Speaker:
**Ivan Corwin, Clay Mathematics Institute, Department of Mathematics MIT and Microsoft Research, Thursday 11 Oct 2012**

Title: Beyond the Gaussian Universality Class

Abstract: The Gaussian central limit theorem says that for a wide class of stochastic systems, the bell curve (Gaussian distribution) describes the statistics for random fluctuations of important observables. In this talk I will look beyond this class of systems to a collection of probabilistic models which include random growth models, polymers, particle systems, matrices and stochastic PDEs, as well as certain asymptotic problems in combinatorics and representation theory. I will explain in what ways these different examples all fall into a single new universality class with a much richer mathematical structure than that of the Gaussian.

- Speaker:
**Philippe Rigollet, Department of Operations Research and Financial Engineering, Princeton University, Thursday 25 Oct 2012**

Title: Deviation optimal model selection using greedy algorithms

Abstract: A statistical problem of model selection for regression can be simply described as a stochastic optimization problem where the objective is quadratic and the domain finite or countable. To solve this problem it is now known that, contrary to the principle of empirical risk minimization, one should seek a solution in the convex hull of the domain. This idea is implemented by exponential weights, which are known to solve the problem in expectation but are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that consists in minimizing a penalized version of the original criterion, but for which the penalty vanishes at the points of interest. This approach leads to efficient greedy algorithms in the spirit of Frank-Wolfe, but for which stronger bounds can be derived.

- Speaker:
**Lee Jones, UMass Lowell, Thursday 1 November 2012**

Title: Order statistics probability rates and some new results for statistical inference from transactional data in queuing systems

Abstract: Efficient algorithms were initially developed for computing the probability that the order statistics of n i.i.d. uniform random variables lie in a given n-dimensional rectangular region in order to calculate the cumulative distribution of the Kolmogorov statistic. These algorithms were rediscovered and used to find expected queue length (and other queue performance measures) in a queuing system from the set of recorded start/stop service data in a time interval in the interior of which each server who became free was immediately reengaged by a waiting customer. With most practical data there are time gaps between the recorded service completion and the recorded start of service with a waiting customer. These may be due to customer delay in engaging a free server, to server delay in availability to the next in queue, or to both. We propose models for the various delays. By generalizing the order statistics probability computational problem and developing feasible algorithms for its solution, we can give confidence intervals for queue performance measures for practical transactional data.

- Speaker:
**Bud Mishra, The Courant Institute of Mathematical Sciences, NYU, Thursday 8 November 2012---CANCELLED**

Title: Towards Cancer Hybrid Automata

Abstract: Recently, we introduced Cancer Hallmark Automata, a formalism to model the progression of cancers through discrete phenotypes (so-called “hallmarks”). The classification of various cancers using stages and hallmarks has become common in the biology literature, but primarily as an organizing principle, and not as an executable formalism. The precise computational model developed here aims to exploit this untapped potential, namely, through automatic verification of progression models (e.g., consistency, causal connections, etc.), classification of unreachable or unstable states (e.g., “anti-hallmarks”) and computer-generated (individualized or universal) therapy plans. This talk builds on a phenomenological approach, and as such does not need to model the biochemistry underlying the progression. Rather, it abstractly models transition timings between hallmarks as well as the effects of drugs and clinical tests, and thus allows formalization of temporal statements about the progression as well as notions of timed therapies. The model proposed here is ultimately based on hybrid automata (with multiple clocks), for which relevant verification and planning algorithms exist in the literature. "Towards Cancer Hybrid Automata" (with L. Olde Loohuis and A. Witzel), First International Workshop on Hybrid Systems and Biology: HSB 2012, Newcastle upon Tyne, UK, September 3, 2012.

- Speaker:
**Samuel Kou, Department of Statistics, Harvard University, Thursday 15 November 2012**

Title: Optimal Shrinkage Estimation in Heteroscedastic Hierarchical Models

Abstract: Hierarchical models are powerful statistical tools widely used in scientific and engineering applications. The homoscedastic (equal variance) case has been extensively studied, and it is well known that shrinkage estimates, the James-Stein estimate in particular, have nice theoretical (e.g., risk) properties. The heteroscedastic (unequal variance) case, on the other hand, has received less attention, even though it frequently appears in real applications, and it is not clear how to construct an "optimal" shrinkage estimate. In this talk, we study this problem. We introduce a class of shrinkage estimates inspired by Stein's unbiased risk estimate. We will show that this class is asymptotically optimal in the heteroscedastic case. We apply the estimates to real examples and observe excellent numerical results. This talk is based on joint work with Lawrence Brown and Xianchao Xie.

- Speaker:
**Clayton Scott, Department of Electrical Engineering and Computer Science, University of Michigan, Thursday 29 November 2012**

Title: Classification with Asymmetric Label Noise

Abstract: In many real-world classification problems, the labels of training examples are randomly corrupted. That is, the set of training examples for each class is contaminated by examples of the other class. Existing approaches to this problem assume that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. We introduce a general framework for classification with label noise that eliminates these assumptions. In particular, we identify necessary and sufficient distributional assumptions for the existence of a consistent estimator of the optimal risk, with associated estimation strategies. We find that learning in the presence of label noise is possible even when the class-conditional distributions overlap and the label noise is not symmetric. A key to our approach is a universally consistent estimator of the maximal proportion of one distribution that is present in another, or equivalently, of the so-called "separation distance" between two distributions. The methodology is motivated by a problem in nuclear particle classification.

- Speaker:
**Erhan Bayraktar, Department of Mathematics, University of Michigan, Thursday 6 December 2012**

Title: Quickest Search over Brownian Channels

Abstract: In this paper we resolve an open problem proposed by Lai, Poor, Xin, and Georgiadis (2011, IEEE Transactions on Information Theory). Consider a sequence of Brownian motions with unknown drift equal to one or zero, which may be observed one at a time. We give a procedure for finding, as quickly as possible, a process which is a Brownian motion with nonzero drift. This original quickest search problem, in which the filtration itself is dependent on the observation strategy, is reduced to a single-filtration impulse control and optimal stopping problem, which is in turn reduced to an optimal stopping problem for a reflected diffusion, which can be explicitly solved. Joint work with Ross Kravitz.

- Speaker:
**Ioannis Karatzas, Department of Mathematics and Department of Statistics, Columbia University, Thursday 21 February 2013**

Title: DIFFUSIONS WITH RANK-BASED CHARACTERISTICS

Abstract: Imagine you run two Brownian-like particles on the real line. At any given time, you assign drift g and dispersion \sigma to the laggard; and you assign drift -h and dispersion \rho to the leader. Here g, h, \rho and \sigma are given nonnegative constants with \rho^2 + \sigma^2 = 1 and g + h > 0. Is the martingale problem for the resulting infinitesimal generator \[ \mathcal{L} = \mathbf{1}_{\{x_1 > x_2\}} \left( \frac{\rho^2}{2}\,\frac{\partial^2}{\partial x_1^2} + \frac{\sigma^2}{2}\,\frac{\partial^2}{\partial x_2^2} - h\,\frac{\partial}{\partial x_1} + g\,\frac{\partial}{\partial x_2} \right) \] well-posed? If so, what is the probabilistic structure of the resulting two-dimensional diffusion process? What are its transition probabilities? What does it look like when time is reversed? Questions like these arise in the context of systems of diffusions interacting through their ranks; see, for instance, [1], [6], [8]. They become a lot more interesting if one poses them for several particles instead of just two. The construction we carry out involves features of Brownian motion with "bang-bang" drift [7], as well as of "skew Brownian motion" [4], [2]. Surprises are in store when one sets up a system of stochastic differential equations for this planar diffusion and then tries to decide questions of strength and/or weakness (cf. [2] for a one-dimensional analogue); also when one looks at the time-reversal of the diffusion. There are also very strong connections with the recent work [9] on the so-called "perturbed Tanaka equations". I'll try to explain what we know about all this, then pose a few open questions. (This talk covers joint work with E. Robert Fernholz, Tomoyuki Ichiba, Vilmos Prokaj and Mykhaylo Shkolnikov.)

- Speaker:
**Ping Li, Department of Statistical Science, Cornell University, joint with the Hariri Institute, Thursday 7 March 2013**

Title: Exact Sparse Recovery with L0 Projections

Abstract: Many applications concern sparse signals; for example, detecting anomalies from the differences between consecutive images taken by surveillance cameras. In general, anomaly events are sparse. This talk focuses on the problem of recovering a K-sparse signal in N dimensions (coordinates). Classical theories in compressed sensing say the required number of measurements is M = O(K log N). In our most recent work on L0 projections, we show that an idealized algorithm needs about M = 5K measurements, regardless of N. In particular, 3 measurements suffice when K = 2 nonzeros. Practically, our method is very fast, accurate, and very robust against measurement noises. Even when there are not sufficient measurements, the algorithm can still accurately reconstruct a significant portion of the nonzero coordinates, without catastrophic failures (unlike popular methods such as linear programming). This is joint work with Cun-Hui Zhang at Rutgers University. Paper URL: http://stat.cornell.edu/~li/Stable0CS/Stable0CS.pdf

- Speaker:
**Herold Dehling, Department of Mathematics, Ruhr-Universität Bochum, Thursday 21 March 2013**

Title: Empirical Process CLT for Markov Chains and Dynamical Systems

Abstract: In our talk we present some recent developments concerning the empirical process central limit theorem for dependent data that do not satisfy any of the classical mixing conditions. Our results are applicable, e.g., to Markov chains and certain dynamical systems. As a special example, we can prove the empirical process CLT for ergodic torus automorphisms. (Joint work with Olivier Durieu, Marco Tusche and Dalibor Volny.)

- Speaker:
**Luke W. Miratrix, Department of Statistics, Harvard University, Thursday 28 March 2013**

Title: An introspection on using sparse regression techniques to analyze text

Abstract: In this talk, I propose a general framework for topic-specific summarization of large text corpora, and illustrate how it can be used for analysis in two quite different contexts: legal decisions on workers' compensation claims (to understand relevant case law) and an OSHA database of occupation-related accident reports (to search for high risk circumstances). Our summarization framework, built on sparse classification methods, is a lightweight and flexible tool that offers a compromise between simple word frequency based methods currently in wide use and more heavyweight, model-intensive methods such as Latent Dirichlet Allocation (LDA). For a particular topic of interest (e.g., emotional disability, or chemical gas), we automatically label documents as being either on- or off-topic, and then use sparse classification methods to predict these labels from the high-dimensional counts of all the other words and phrases in the documents. The resulting small set of phrases found to be predictive is then harvested as the summary. Using a branch-and-bound approach, this method can be extended to allow for phrases of arbitrary length, which allows for potentially rich summarization. I further discuss how focus on specific aspects of the corpus and the purpose of the summaries can inform choices of regularization parameters and constraints on the model. Overall, I argue that sparse methods have much to offer text analysis, and hope that this work opens the door for a new branch of research in this important field.

- Speaker:
**Manfred Denker, Department of Mathematics, Penn State University, Thursday 4 April 2013**

Title: Von Mises statistics for a measure preserving transformation.

Abstract: Let $T$ be a measure preserving transformation on a probability space. I will present three theorems on the almost sure and weak convergence of sums of the form $$\sum_{0 \le i_k

- Speaker:
**Hongzhe Li, Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Thursday 11 April 2013**

Title: Robust Segment Identification in Next-Generation Sequencing Data

Abstract: Copy number variants (CNVs) are alterations of the DNA of a genome that result in the cell having fewer or more than two copies of segments of the DNA. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in a long linear sequence of data with an unspecified noise distribution. We propose a computationally efficient method that provides a robust and near-optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the segment signals and show that the method near-optimally estimates the signal segments whenever it is possible to detect their existence. Simulation studies are carried out to demonstrate the efficiency of the method under different noise distributions. We present results from a CNV analysis of a HapMap Yoruban sample to further illustrate the theory and the methods.

- Speaker:
**Tanya Berger-Wolfe, Department of Computer Science, University of Illinois, Thursday 18 April 2013**

Title: Analysis of Dynamic Interaction Networks

Abstract: From gene interactions and brain activity to high school friendships and zebras grazing together, large, noisy, and highly dynamic networks of interactions are everywhere. Unfortunately, in this domain, our ability to analyze data lags substantially behind our ability to collect it. In this talk I will show how computational approaches can be part of every stage of the scientific process of understanding how entities interact, from data collection (by using our network sampling framework, which results in representative samples for many network problems) to hypothesis formulation (using unique clustering and pattern discovery methods), leading to novel scientific insights.

- Speaker:
**Kavita Ramanan, Division of Applied Mathematics, Brown University, Thursday 25 April 2013**

Title: Asymptotic analysis of a class of stochastic networks

Abstract: Finite-dimensional diffusions have been successfully used as tractable approximations to gain insight into a certain class of queueing systems. On the other hand, we show that many classes of queueing systems, including many-server queues with general service distributions, are more naturally modeled by measure-valued processes. We describe asymptotic limit theorems for these measure-valued processes and describe the insight they provide into the performance of the original networks.
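As a toy illustration of what a measure-valued description of a queue looks like (this is not the speaker's model; the arrival rate, service distribution, and horizon below are hypothetical), one can track the point measure of residual service times in a simulated infinite-server queue:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy infinite-server queue: Poisson arrivals at rate lam on [0, T],
# i.i.d. service times uniform on [0.5, 1.5] (so E[S] = 1).
lam, T = 5.0, 200.0
n_arr = rng.poisson(lam * T)
arrivals = np.sort(rng.uniform(0.0, T, size=n_arr))
services = rng.uniform(0.5, 1.5, size=n_arr)

def residual_measure(t):
    """Measure-valued state at time t: the finite point measure (here,
    an array of atoms) of residual service times of jobs still in service."""
    in_sys = (arrivals <= t) & (arrivals + services > t)
    return arrivals[in_sys] + services[in_sys] - t

# The scalar queue length is the total mass of this measure; its long-run
# average should be close to lam * E[S] = 5 (an M/G/infinity fact).
grid = np.linspace(10.0, T, 500)
avg_len = np.mean([residual_measure(t).size for t in grid])
```

Limit theorems of the kind described in the abstract are then statements about rescaled versions of this measure-valued trajectory, rather than about the scalar queue length alone.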

- Speaker:
**Evan Johnson, Computational Biomedicine, Boston University, Thursday 12 September 2013**

Title: Adaptive factor analysis models for assessing drug sensitivity and pathway activation in individual patient samples

Abstract: The development of personalized treatment regimes is an active area of current research in genomics. The focus of our research is to investigate core biological components that contribute to disease prognosis and development, and to develop latent variable models to accurately determine optimal therapeutic regimens for individual patients. To accomplish this aim, we have developed an adaptive Bayesian factor analysis model that integrates in vitro experimental data into our models while still allowing for the refinement and adaptation of drug or pathway profiles within each patient cohort and individual, efficiently accounting for cell-type specific pathway differences or any “rewiring” due to cancer deregulation. Our modeling approach serves an essential role in our attempts to develop a comprehensive and integrated set of relevant, biologically interpretable computational tools for genomic studies in personalized medicine. We are currently working on a variety of applications using data from cancer and pulmonary disease, with the potential to be extremely important in treating patients with these diseases.

- Speaker:
**Jiashun Jin, Department of Statistics, Carnegie Mellon University, Thursday 26 September 2013**

Title: Fast Network Community Detection by SCORE

Abstract: Consider a network where the nodes split into K different communities. The community labels for the nodes are unknown, and it is of major interest to estimate them (i.e., community detection). The Degree Corrected Block Model (DCBM) is a popular network model. How to detect communities under the DCBM is an interesting problem, where the main challenge lies in the degree heterogeneity.

We propose Spectral Clustering On Ratios-of-Eigenvectors (SCORE) as a new approach to community detection. Compared to existing spectral methods, the main innovation is to use the entry-wise ratios between the first few leading eigenvectors for community detection. The central surprise is that the effect of degree heterogeneity is largely ancillary, and can be effectively removed by taking such entry-wise ratios. We have applied SCORE to the well-known web blogs data and to the statistics co-author network data which we have collected very recently. We find that SCORE is competitive both in computation and in performance. On top of that, SCORE is conceptually simple and has the potential for extensions in various directions. Additionally, we have identified several interesting communities of statisticians, including what we call the "Object Bayesian community", the "Theoretic Machine Learning Community", and the "Dimension Reduction Community".
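A minimal sketch of the ratio idea for K = 2, on a synthetic Degree Corrected Block Model (the parameters below are hypothetical, and a simple median split stands in for the clustering step of the full SCORE procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-community DCBM: edge probability theta_i * theta_j * B[c_i, c_j]
n = 60
labels_true = np.repeat([0, 1], n // 2)
theta = rng.uniform(0.4, 1.0, size=n)                  # degree heterogeneity
B = np.array([[0.9, 0.1], [0.1, 0.9]])
P = np.outer(theta, theta) * B[np.ix_(labels_true, labels_true)]
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                            # symmetric, no self-loops

# SCORE for K = 2: entry-wise ratio of the second leading eigenvector to
# the first; the degree factors theta_i approximately cancel in the ratio.
vals, vecs = np.linalg.eigh(A)
order = np.argsort(-np.abs(vals))
ratio = vecs[:, order[1]] / vecs[:, order[0]]
labels_est = (ratio > np.median(ratio)).astype(int)

# estimated labels are only defined up to a global flip
accuracy = max(np.mean(labels_est == labels_true),
               np.mean(labels_est != labels_true))
```

For general K, SCORE forms the n x (K-1) matrix of ratios against the leading eigenvector and runs k-means on its rows; the two-community median split above is the simplest instance of that step.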

We develop a theoretic framework in which we show that, under mild regularity conditions, SCORE stably yields consistent community detection. At the core of the analysis are recent developments in Random Matrix Theory (RMT), where the matrix-form Bernstein inequality is especially helpful.

- Speaker:
**Soumyadip Ghosh, IBM Research, Thursday 3 October 2013**

Title: Optimal Sampling in Stochastic Recursions

Abstract: We refer to classical iterative algorithms such as quasi-Newton recursions, trust-region methods, and fixed-point recursions as "stochastic" recursions when they involve quantities (functions, their gradients, Hessians, etc.) that can only be estimated using a simulation oracle. The primary motivating settings are the Stochastic Root Finding problem, which seeks the zero of a simulation-estimated function, and the closely related Simulation Optimization problem, which seeks a minimum. The estimation quality of the simulation oracle depends on the effort expended in the simulation: in a typical scenario where a Central Limit Theorem applies, estimation error drops to zero at the canonical $\sqrt{n}$ rate with sample size $n$. We address the central question that arises in the practical context where the primary computational burden in the stochastic recursion is the Monte Carlo sampling procedure: how should sampling proceed within stochastic recursion iterates in order to ensure that the identified candidate solutions remain consistent to the true solution, and more importantly, when can we ensure that sampling is efficient, that is, converges at the fastest possible rate? The answer involves a trade-off between the two types of error inherent in the iterates: the deterministic error due to the recursion algorithm and the "stochastic" component due to sampling. We characterize the relationship between sample sizing and convergence rates, and demonstrate that consistency and efficiency are intimately coupled with the speed of the underlying recursion, with faster algorithms yielding a wider regime of "optimal" sampling rates.

- Speaker:
**Stephan Sturm, Department of Mathematical Sciences, Worcester Polytechnic Institute, Thursday 10 October 2013**

Title: From Smile Wings to Market Risk Measures

Abstract: The left tail of the implied volatility skew, coming from quotes on out-of-the-money put options, can be thought to reflect the market's assessment of the risk of a huge drop in stock prices. We analyze how this market information can be integrated into the theoretical framework of convex monetary measures of risk. In particular, we make use of indifference pricing by dynamic convex risk measures, which are given as solutions of backward stochastic differential equations (BSDEs), to establish a link between these two approaches to risk measurement. We derive a characterization of the implied volatility in terms of the solution of a nonlinear PDE and provide a small time-to-maturity expansion. This procedure allows us to choose convex risk measures in a conveniently parametrized class, distorted entropic dynamic risk measures, such that the asymptotic volatility skew under indifference pricing can be matched with the market skew. This is joint work with Ronnie Sircar.

- Speaker:
**Yu Gu, Department of Applied Mathematics and Physics, Columbia University, Thursday 17 October 2013**

Title: Weak Convergence Approach to a Parabolic Equation with Large Random Potential

Abstract: Solutions to partial differential equations with highly oscillatory, large random potential have been shown to converge either to homogenized, deterministic limits or to stochastic limits, depending on the statistical properties of the potential. We obtain the convergence rate in the homogenization setting. The derivations are based on a Feynman-Kac representation, an invariance principle for Brownian motion in random scenery, and a quantitative version of the martingale CLT. Joint work with Guillaume Bal.

- Speaker:
**Marvin K. Nakayama, Computer Science Department, New Jersey Institute of Technology, Thursday 24 October 2013**

Title: Efficient Simulation of Risk and its Error: Confidence Intervals for Quantiles When Using Variance-Reduction Techniques

Abstract: The p-quantile of a continuous random variable is the constant for which exactly a fraction p of the mass of its distribution lies to the left of the quantile; e.g., the median is the 0.5-quantile. Quantiles are widely used to assess risk. For example, a project manager may want to determine a time T such that the project has a 95% chance of completing by T, which is the 0.95-quantile. In finance, where a quantile is known as a value-at-risk, analysts frequently measure risk with the 0.99-quantile of a portfolio’s loss. For complex stochastic models, analytically computing a quantile often is not possible, so simulation is employed. In addition to providing a point estimate for a quantile, we also want to measure the simulation estimate's error, and this is typically done by giving a confidence interval (CI) for the quantile. Indeed, the U.S. Nuclear Regulatory Commission requires that licensees of nuclear power plants demonstrate compliance using a “95/95 criterion,” which entails ensuring (with 95% confidence) that a 0.95-quantile lies below a mandated limit.

In this talk we present some methods for constructing CIs for a quantile estimated via simulation. Unfortunately, crude Monte Carlo often produces wide CIs, so analysts often apply variance-reduction techniques (VRTs) in simulations to decrease the error. The first approach we discuss forms a CI using a finite difference; the second applies a procedure known as sectioning, which is closely related to batching. The asymptotic validity of both CIs follows from a so-called Bahadur representation, which shows that a quantile estimator can be approximated by a linear transformation of a probability estimator. We have established Bahadur representations for a broad class of VRTs, including antithetic variates, control variates, replicated Latin hypercube sampling, and importance sampling. We present some empirical results comparing the different CIs.
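The sectioning idea can be sketched roughly as follows, for a crude Monte Carlo quantile estimate (a simplified reading of batching, not the speaker's exact procedure; the constant 2.262 is the 0.975 Student-t quantile with 9 degrees of freedom, so it assumes 10 batches and a 95% level):

```python
import numpy as np

rng = np.random.default_rng(7)

def sectioning_ci(samples, p, num_batches=10, t_crit=2.262):
    """CI for the p-quantile via sectioning: center the interval at the
    overall quantile estimate, and get its spread from the variability of
    the per-batch quantile estimates around that overall estimate."""
    samples = np.asarray(samples)
    q_hat = np.quantile(samples, p)                 # overall point estimate
    q_batch = np.array([np.quantile(b, p)
                        for b in np.array_split(samples, num_batches)])
    s2 = np.sum((q_batch - q_hat) ** 2) / (num_batches - 1)
    half = t_crit * np.sqrt(s2 / num_batches)
    return q_hat, q_hat - half, q_hat + half

# crude Monte Carlo example: 0.95-quantile of a standard normal (about 1.645)
q_hat, lo, hi = sectioning_ci(rng.standard_normal(10_000), 0.95)
```

Note that, unlike plain batching, sectioning centers the interval at the overall estimator rather than at the mean of the batch estimators, which reduces the effect of the per-batch quantile bias.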

This work is supported by NSF grants CMMI-0926949, CMMI-1200065, and DMS-1331010.

- Speaker:
**Luke Bornn, Department of Statistics, Harvard University, Thursday 31 October 2013**

Title: Towards the Derandomization of Markov chain Monte Carlo for Bayesian Inference

Abstract: In this talk, I will explore the current trend towards conducting Bayesian inference through Markov chain Monte Carlo (MCMC) algorithms which converge at a rate faster than $n^{-1/2}$ by derandomizing components of the algorithm. For instance, herded Gibbs sampling (Bornn et al., 2013) can be shown to converge in certain settings at an $n^{-1}$ rate. These algorithms exhibit remarkable similarity to existing MCMC algorithms; as an example, herded Gibbs sampling is equivalent to the Wang-Landau algorithm with various specified tuning parameters, and with the random sampling replaced with an argmax step. We demonstrate that many such MCMC algorithms lie in a middle ground between vanilla Gibbs samplers and deterministic algorithms by using clever auxiliary variable schemes to induce negatively correlated samples as well as to force exploration of the parameter space. Based on this observation, we propose several new algorithms which exploit elements of both MCMC and deterministic algorithms to improve exploration and convergence.

- Speaker:
**Peter I. Frazier, School of Operations Research and Information Engineering, Cornell University, NOTE: Friday 8 November 2013 (Joint seminar with CISE) 3:00 PM to 4:00 PM 8 St. Mary's Street, Room 211 Refreshments served at 2:45.**

Title: Bayesian Methods for Simulation Optimization

Abstract: We consider simulation optimization, in which we wish to solve an optimization problem whose objective function can only be evaluated using stochastic simulation. When the simulator is large and time-consuming, the time to solve a simulation optimization problem is gated by the number of simulation replications required. One increasingly popular approach to algorithm development for such problems is to place a Bayesian prior distribution on the underlying objective function, and to value potential function evaluations, or collections of function evaluations, according to the probability distribution of the improvement they would provide. We provide an overview of this class of algorithms, discussing links to decision theory and Markov decision processes, and present an application to the design of cardiovascular bypass grafts. - Speaker:
**Benjamin Kedem, Department of Mathematics, University of Maryland College Park, Thursday 14 November 2013**

Title: Estimation of Small Tail Probabilities in Food Safety and Bio-Surveillance

Abstract: In food safety and bio-surveillance it is often desired to estimate the probability that a contaminant, such as an insecticide or pesticide, exceeds unsafe, very high thresholds. The probability in question is then very small. To estimate such a probability we need information about large values. However, in many cases the data do not contain information about exceedingly large contamination levels, which ostensibly makes the problem impossible to solve. A solution is provided whereby more information about small tail probabilities is obtained by combining the real data with computer-generated data. The method provides short but reliable interval estimates from moderately large samples. Examples are given in terms of DDT derivatives and chlorpyrifos found in fish, mussel, and sediments, and in terms of mercury levels obtained from males and females of all ages from 1 to 150 years. - Speaker:
**David F. Anderson, Department of Mathematics, University of Wisconsin-Madison, Thursday 21 November 2013**

Title: Stochastic analysis of biochemical reaction networks with absolute concentration robustness

Abstract: It has recently been shown that structural conditions on the reaction network, rather than a fine-tuning of system parameters, often suffice to impart "absolute concentration robustness" on a wide class of biologically relevant, deterministically modeled mass-action systems [Shinar and Feinberg, Science, 2010]. Many biochemical networks, however, operate on a scale insufficient to justify the assumptions of the deterministic mass-action model, which raises the question of whether the long-term dynamics of the systems are being accurately captured when the deterministic model predicts stability. I will discuss recent results showing that fundamentally different conclusions about the long-term behavior of such systems are reached if the systems are instead modeled with stochastic dynamics and a discrete state space. Specifically, we characterize a large class of models which exhibit convergence to a positive robust equilibrium in the deterministic setting, whereas trajectories of the corresponding stochastic models are necessarily absorbed by a set of states that reside on the boundary of the state space (i.e. an extinction event). The results are proved with a combination of methods from stochastic processes and chemical reaction network theory. - Speaker:
**Matthew T Harrison, Division of Applied Mathematics, Brown University, Thursday 5 December 2013**

Title: Robust inference for nonstationary spike trains

Abstract: The coordinated spiking activity of simultaneously recorded neurons can reveal clues about the dynamics of neural information processing, about the mechanisms of brain disorders, and about the underlying anatomical microcircuitry. Statistical models and methods play an important role in these investigations. In cases where the scientific questions require disambiguating dependencies across multiple spatial and temporal scales, conditional inference can be used to create procedures that are strikingly robust to nonstationarity, model misspecification, and incidental parameters problems, which are common neurostatistical challenges. Examples include testing for cell assembly dynamics in human epilepsy data and learning putative anatomical networks from spike train data in behaving rodents.
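As a hedged sketch of the flavor of conditional inference described above, the following toy jitter test for fine-timescale synchrony between two binned spike trains resamples spikes within short windows, conditioning on each train's coarse firing rate so that slow nonstationarity cannot masquerade as synchrony; the window width, statistic, and function names are illustrative assumptions, not the speaker's method:

```python
# Hedged sketch: a jitter (conditional permutation) test for spike synchrony.
import numpy as np

def sync_count(a, b):
    """Number of time bins in which both binary spike trains fire."""
    return int(np.sum(a * b))

def jitter_null(train, width, rng):
    """Redistribute each window's spikes uniformly within that window,
    preserving the per-window spike count (the coarse firing rate)."""
    out = np.zeros_like(train)
    for start in range(0, len(train), width):
        w = train[start:start + width]
        k = int(w.sum())
        idx = rng.choice(len(w), size=k, replace=False)
        out[start:start + width][idx] = 1
    return out

def jitter_test(a, b, width=25, n_null=999, seed=0):
    rng = np.random.default_rng(seed)
    observed = sync_count(a, b)
    null = [sync_count(jitter_null(a, width, rng), b) for _ in range(n_null)]
    # One-sided p-value: fraction of null statistics >= the observed one.
    p = (1 + sum(s >= observed for s in null)) / (n_null + 1)
    return observed, p
```

Because the null distribution is generated conditionally on the coarse rate profile, a slowly drifting firing rate alone does not produce small p-values.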

- Speaker:
**Mark van der Laan, Biostatistics and Statistics at UC Berkeley, Thursday 20 February 2014**

Title: Targeted Learning of Optimal Individualized Treatment Rules

Abstract: Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. We consider data adaptive estimation of an optimal dynamic multiple time-point treatment rule, defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to respond only to a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism. In addition, we provide a targeted minimum loss-based estimator of the mean outcome under the optimal rule, with corresponding statistical inference. Both estimation problems addressed contrast with the current literature, which relies on parametric assumptions. We also present cross-validated TMLE estimators of data adaptive target parameters, such as the mean outcome under a data adaptive fit of the optimal rule. Practical performance of the methods is demonstrated with some simulations. - Speaker:
**Jeremy Achin, DataRobot , Thursday 27 February 2014**

Title: Applied Data Science: Extracting Maximum Value from Real-World Data

Abstract: This talk is about extracting maximum value from real-world data using modern statistical and machine learning techniques. Real-world data is diverse, messy, and spread out across many data sources. Extracting maximum value equates to using the data to make the most accurate predictions possible on out-of-sample examples. The talk will focus on a single case study in which we predict diabetes in undiagnosed patients using their medical records. The dataset comes from a Kaggle competition sponsored by Practice Fusion: http://www.kaggle.com/c/pf2012-diabetes. - Speaker:
**Liming Feng, University of Illinois at Urbana-Champaign, Department of Industrial and Enterprise Systems Engineering, Thursday 6 March 2014**

Title: Hilbert Transform and Options Valuation

Abstract: Transform methods have been widely used for options valuation in models with explicit characteristic functions. We explore the analyticity of the characteristic functions and propose Hilbert transform based schemes for the valuation of European, American and path dependent options, and for Monte Carlo simulation from analytic characteristic functions. The schemes are based on sinc expansions of functions analytic in a horizontal strip in the complex plane. They are very easy to implement. Despite the simplicity, they are very accurate, with exponentially decaying errors. Numerical examples illustrate the effectiveness of these schemes. - Speaker:
**Lie Wang, Department of Mathematics, MIT, Tuesday 18 March 2014, NOTE: UNUSUAL DAY!**

Title: Multivariate Regression with Calibration

Abstract: We propose a new method named calibrated multivariate regression (CMR) for fitting high dimensional multivariate regression models. Compared to existing methods, CMR calibrates the regularization for each regression task with respect to its noise level, so that it is simultaneously tuning insensitive and achieves improved finite sample performance. We also develop an efficient smoothed proximal gradient algorithm to implement it. Theoretically, we prove that CMR achieves the optimal rate of convergence in parameter estimation. We illustrate the usefulness of CMR by thorough numerical simulations and show that CMR consistently outperforms existing multivariate regression methods. We also apply CMR to a brain activity prediction problem and find that CMR even outperforms handcrafted models created by human experts. - Speaker:
**Xinyun Chen, Applied Mathematics and Statistics, Stony Brook University, Thursday 20 March 2014**

Title: Perfect sampling and gradient simulation of Queueing Networks

Abstract: Perfect sampling is a Monte Carlo technique to generate samples from the stationary distribution of Markov processes without any bias. We develop a perfect sampling algorithm for a class of queueing models called stochastic fluid networks, which are used in communication networks and data processing systems. Our framework can be combined with infinitesimal perturbation analysis to simulate the gradient of the stationary queue length with no bias. Therefore, our perfect sampling algorithm can be used in sensitivity analysis and simulation optimization for resource allocation in the network. Finally, we will discuss the potential extension of our algorithm to reflected Brownian motion and generalized Jackson networks. - Speaker:
**Scott Robertson, Department of Mathematics, Carnegie Mellon University, Thursday 3 April 2014**

Title: Continuous Time Perpetuities and the Time Reversal of Diffusions. Joint work with Kostas Kardaras, LSE.

Abstract: In this talk we consider the problem of obtaining the distribution of a continuous time perpetuity, where the non-discounted cash flow rate is determined by an ergodic diffusion. Using results regarding the time reversal of diffusions, we identify the distribution of the perpetuity with the invariant measure associated to a certain (different) ergodic diffusion. This enables efficient estimation of the distribution via simulation and, in certain instances, an explicit formula for the distribution. Time permitting, we will talk about how large deviations principles and results concerning couplings of diffusions can be used to estimate rates of convergence, thus providing upper bounds on how long simulations must be run when obtaining the distribution. - Speaker:
**Harrison Zhou, Department of Statistics, Yale University, Thursday 10 April 2014**

Title: Asymptotic Normality and Efficiency In Estimation of High-dimensional Graphical Models

Abstract: In this talk we will first introduce an asymptotic normality and efficiency result for estimation of a high-dimensional Gaussian graphical model under a sparseness assumption, which is shown to be not only sufficient but also necessary. We will then present some preliminary analogous results for the Ising model. - Speaker:
**Ryan Adams, School of Engineering and Applied Science, Harvard University, Thursday 17 April 2014**

Title: Accelerating Exact MCMC with Subsets of Data

Abstract: One of the challenges of building statistical models for large data sets is balancing the correctness of inference procedures against computational realities. In the context of Bayesian procedures, the pain of such computations has been particularly acute, as it has appeared that algorithms such as Markov chain Monte Carlo necessarily need to touch all of the data at each iteration in order to arrive at a correct answer. Several recent proposals have been made to use subsets (or "minibatches") of data to perform MCMC in ways analogous to stochastic gradient descent. Unfortunately, these proposals have only provided approximations, although in some cases it has been possible to bound the error of the resulting stationary distribution. In this talk I will discuss two new, complementary algorithms for using subsets of data to perform faster MCMC. In both cases, these procedures yield stationary distributions that are exactly the desired target posterior distribution. The first of these, "Firefly Monte Carlo", is an auxiliary variable method that uses randomized subsets of data to achieve valid transition operators, with connections to recent developments in pseudo-marginal MCMC. The second approach I will discuss, parallel predictive prefetching, uses subsets of data to parallelize Markov chain Monte Carlo across multiple cores, while still leaving the target distribution intact. These methods have both yielded significant gains in wallclock performance in sampling from posterior distributions with millions of data points. - Speaker:
**Ofer Harel, Department of Statistics, University of Connecticut, Thursday 24 April 2014**

Title: Generating multiple imputation from multiple models to reflect missing data mechanism uncertainty: Application to a longitudinal clinical trial.

Abstract: We present a framework for generating multiple imputations for continuous variables when the missing data are assumed to be nonignorably missing. Imputations are generated from more than one imputation model in order to incorporate uncertainty regarding the missing data mechanism. Parameter estimates based on the different imputation models are combined using rules for nested multiple imputation. Through the use of simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal clinical trial of low-income women with depression where nonignorably missing data were a concern. We show that different assumptions regarding the missing data mechanism can have a substantial impact on inferences. Our method provides a simple approach for formalizing subjective notions regarding nonresponse so that they can be easily stated, communicated, and compared. This is a joint work with Juned Siddique and Catherine Crespi.
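The nested multiple-imputation combining rules referenced above can be sketched as follows, assuming m imputation models ("nests") with n imputations each, and a point estimate and complete-data variance per imputed data set; this follows the standard two-stage MI rules and is not the authors' code:

```python
# Hedged sketch: combining rules for nested (two-stage) multiple imputation.
import numpy as np

def nested_mi_combine(q, u):
    """q, u: (m, n) arrays of per-imputation point estimates and
    complete-data variances; m models, n imputations per model."""
    q, u = np.asarray(q, float), np.asarray(u, float)
    m, n = q.shape
    qbar_i = q.mean(axis=1)          # per-model means
    qbar = q.mean()                  # overall point estimate
    ubar = u.mean()                  # average complete-data variance
    # Between-model and within-model components of imputation variance.
    b = np.sum((qbar_i - qbar) ** 2) / (m - 1)
    w = np.sum((q - qbar_i[:, None]) ** 2) / (m * (n - 1))
    total_var = ubar + (1 + 1 / m) * b + (1 - 1 / n) * w
    return qbar, total_var
```

The between-model component b is what carries the uncertainty about the missing data mechanism; with a single model (m = 1) that term is unavailable, which is the motivation for imputing from several models.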

- Speaker:
**Jose Blanchet, IEOR, Columbia University, Thursday 11 September 2014**

Title: Strong Monte Carlo for Multidimensional SDEs via Rough Path Analysis

Abstract: The underlying object is a multidimensional SDE X(.) driven by Brownian motion. A strongly simulatable approximation to X(.) is a sequence of processes {X_n(.)} which are piecewise constant, with finitely many discontinuities for each n, and such that the uniform norm between X(.) and X_n(.) on the compact set [0,1] is less than 1/n with probability one. The probability one statement is crucial. Strong Monte Carlo approximations have been known essentially only for one-dimensional diffusions and related processes. We provide the first strongly simulatable approximations for multidimensional SDEs. The construction leverages the theory of rough paths, together with novel techniques for simulating times that look into the infinite future of a sequence of information often used to approximate SDEs. - Speaker:
**Georgios Tripodis, School of Public Health, Boston University, Thursday 18 September 2014. NOTE: Seminar takes place in MCS B21!!!**

Title: Predicting the cognitive status of an aging population

Abstract: Cognitive trajectories are characterized by tremendous heterogeneity in rates of change. We utilized a subset of the NACC dataset to estimate cognitive trajectories in order to investigate possible causes of differences in variability among normal controls.

We analyzed data from 298 cases that were free from any cognitive impairment for at least 2 visits from the National Alzheimer Coordinating Center (NACC). 149 cases remained normal for at least 2 visits following our observation period, while 149 cases were diagnosed subsequently with Mild Cognitive Impairment (MCI). For all cases, we consider only time points when their cognitive status was normal. The groups were matched by age, sex, education and total number of visits. We used an innovative statistical method of dynamic factor models developed by the authors on the NACC neuropsychological battery. Based on a large array of test scores (MMSE, logical memory: immediate and delayed, digits backward and forward, animals, vegetables, TRAILS A and B, Boston naming test and WAIS), we estimated one latent composite trajectory for each individual. We then used linear mixed effect models to compare differences between groups in their rate of cognitive decline. We hypothesized that there will be differences in the cognitive trajectory between the two groups during their normal state.

Factor analytic models are typically limited to cross-sectional datasets, ignoring any longitudinal or dynamic structure. The latent cognitive index is a weighted average of past and present scores of neuropsychological tests. These weights are a function of the between-subject variability as well as the correlation between tests. Measures that are highly correlated with other measures will get higher weight. Moreover, measures that show increased between-subject variability will receive higher weight. Current factor analytic methods do not use any information from within-subject variability over time. If we do not account for time variability we may over- or under-inflate the weights. Past observations of measures that are stable over time will be discounted. Tests with rates of change that are highly correlated with other tests' rates of change will receive more weight. The estimated cognitive trajectory shows significant differences in the rate of change (p-value=0.0003). The cases that remain in a normal cognitive status show significant improvement over time (estimate=-.06, p-value=0.01), indicating a probable learning effect. The cases that will convert to MCI show no improvement in their cognitive trajectory during the period in which they are assigned normal cognition (estimate=-.003, p-value=0.79). These data suggest that there is a probable learning effect in repeated testing only for those that remain in a normal cognitive status. For the cases that will convert to MCI in the future, there is no improvement in their cognitive trajectory. These differences may be used for a more timely diagnosis of MCI. - Speaker:
**Victor de la Pena, Department of Statistics, Columbia University, NOTE: Friday 26 September 2014 (Joint seminar with CISE) 3:00 PM to 4:00 PM 8 St. Mary's Street, Room 210 Refreshments served at 2:45.**

Title: Dependence Measures: A Perspective

Abstract: In recent years there has been an increasing interest in the development of new measures of dependence. In this talk I will provide an overview of some of these results, including work developed using copulas as well as the distance covariance. Finally, I will introduce a general framework that includes several of the known dependence measures. (Joint work with Y. Liu (Google) and T. Zheng (Columbia).) - Speaker:
**Neil Shephard, Department of Statistics and Department of Economics, Harvard University, Thursday, 2 October 2014**

Title: Low Latency Financial Data: Continuous Time Analysis of Fleeting Discrete Price Moves

Abstract: Computer-based automated trading dominates many of the most important financial markets. Extracting information from the order and trading flow in such markets is important for trading at high frequency, and for policy, regulation, and forensic finance. What is distinctive about this area is that the policy, regulation, policing, and trading focus is often on the very short term, frequently over time intervals of much less than a second. At very short time scales, for most important markets, such low latency data is dominated by three essential features: (i) prices are crucially discrete, due to the market's tick structure; (ii) prices change in continuous time; (iii) a high proportion of price changes are fleeting, reversed in a fraction of a second. But the econometrician's cupboard is practically bare, for there are nearly no models or techniques which focus on all of these features and put the role of time at center stage. In this paper we develop a novel continuous time framework which captures these types of low latency environments in an analytically tractable, semi-parametric manner where the role of calendar time is straightforward to calculate. - Speaker:
**Josh Reed, Stern School of Business, NYU, Friday 10 October 2014 at MCS148, NOTE: Special DAY!!**

Title: Series Expansions for the All-time Maximum of alpha-stable Random Walks

Abstract: We study random walks whose increments are alpha-stable distributions with shape parameter 1 < alpha < 2. Specifically, assuming a mean increment size which is negative, we provide series expansions in terms of the mean increment size for the probability that the all-time maximum of an alpha-stable random walk is equal to zero and, in the totally skewed to the left case of beta=-1, for the expected value of the all-time maximum of an alpha-stable random walk. Our proofs also cover the Gaussian case of alpha=2 and beta=0 for which previous results have already been obtained in the literature using different techniques. Key ingredients in our proofs are Spitzer's identity for random walks and Zolotarev's integral representation for the CDF of an alpha-stable random variable. We also discuss an application of our results to a problem arising in queueing theory. This is joint work with Cliff Hurvich. - Speaker:
**Michael Dietze, Earth and Environment, Boston University, Thursday 16 October 2014**

Title: Ecological Forecasting: An Emerging Challenge.

Abstract: Understanding how terrestrial ecosystems will respond to climate change is one of the most critical scientific questions of our time. This is not only because these ecosystems provide the natural resources and ecosystem services our species depends upon for survival, but because feedbacks from the terrestrial biosphere are one of the greatest sources of uncertainty in climate change projections. Reducing uncertainty requires not only a better understanding of the basic science involved, but also a systematic effort to synthesize existing knowledge, quantify uncertainties, and target measurements where they maximize new information. In this effort ecologists are increasingly being called upon to make quantitative, data-driven forecasts using sophisticated statistical tools and computer models. Such models are not only tools for forecasting but also represent a mathematical formalization of our current understanding of how ecosystems function. As such they provide a critical scaffold for assimilating a diverse array of data types on different spatial and temporal scales which cannot otherwise be directly compared. My work within the nascent field of ecological forecasting is heavily focused on the assimilation of data into terrestrial biosphere models as a means of quantifying, partitioning, and reducing uncertainty about how terrestrial ecosystems will respond to climate change. In this talk I will highlight work done in my lab to confront process-based ecosystem models with data and introduce some of the tools we have been developing to manage model-data fusion. I will also discuss the nature of the ecological forecasting problem, how it differs from other forecasting problems (e.g. weather forecasting), and some of the open statistical challenges in this emerging discipline. - Speaker:
**Nalini Ravishanker, Department of Statistics, University of Connecticut, Thursday 23 October 2014**

Title: Estimating Function Approach for Nonlinear Time Series.

Abstract: The framework of martingale estimating functions (Godambe, 1985) provides an optimal approach for developing inference for linear and nonlinear time series based on information on the first two conditional moments of the observed process. In situations where information about higher order conditional moments of the process is also available, combined (linear and quadratic) estimating functions are more informative. This approach is especially useful in practice when recursive estimates of model parameters can be derived, resulting in a fast computational estimation approach. The approach is illustrated for different classes of nonlinear time series models, such as generalized duration models and random coefficient autoregressive models with heavy-tailed errors, which are useful in financial data analysis. - Speaker:
**Gustavo A. Schwenkler, School of Management, Boston University, Thursday 30 October 2014**

Title: Simulated Likelihood Estimators for Discretely Observed Jump-Diffusions.

Abstract: This paper develops an unbiased Monte Carlo approximation to the transition density of a jump-diffusion process with state-dependent drift, volatility, jump intensity, and jump magnitude. The approximation is used to construct a likelihood estimator of the parameters of a jump-diffusion observed at fixed time intervals that need not be short. The estimator is asymptotically unbiased for any sample size. It has the same large-sample asymptotic properties as the true but uncomputable likelihood estimator. Numerical results illustrate its computational advantages. - Speaker:
**Vladas Pipiras, Department of Statistics and Operations Research, University of North Carolina, Thursday 6 November 2014**

Title: Quadratic programming in synthesis of stationary Gaussian fields

Abstract: Circulant matrix embedding is one of the most popular and efficient methods for the exact generation of a Gaussian stationary univariate series, given its autocovariance function. Although circulant matrix embedding has also been used for the generation of Gaussian stationary random fields, there are many practical covariance structures of random fields where the classical embedding method breaks down, in the sense that some of the eigenvalues of the covariance embedding are negative. In this talk, I will discuss several approaches to modifying the classical circulant matrix embedding so that all the eigenvalues are nonnegative. In one such approach, feasible circulant embeddings are constructed based on a quadratic optimization problem with linear inequality constraints, with an objective function measuring the distance of the covariance embedding to the targeted covariance structure over the domain of interest. A well-known interior point optimization strategy, the primal log-barrier method, can be suitably adapted to solve the quadratic problem faster than commercial solvers. The talk is based on joint work with S. Kechagias (University of North Carolina), H. Helgason (University of Iceland), and P. Abry (ENS Lyon). - Speaker:
**Lizhen Lin, Department of Statistics and Data Sciences, University of Texas, Thursday 13 November 2014**

Title: Robust and scalable inference using median posteriors.

Abstract: While theoretically justified and computationally efficient point estimators have been developed for many problems in robust estimation, robust Bayesian analogues are not sufficiently well understood. We propose a novel approach to Bayesian analysis that is provably robust to the presence of outliers in the data, and often has noticeable computational advantages over standard methods. Our approach is based on the idea of splitting the data into several non-overlapping subsets, evaluating the posterior distribution given each subset, and then combining the resulting subset posterior measures by taking their geometric median. The resulting final measure is called the median posterior, which is the ultimate object used for inference. We show several strong theoretical results for the median posterior, including concentration rates and provable robustness. We illustrate and validate the method through experiments on simulated and real data. [Joint work with Stas Minker, Sanvesh Srivastava and David Dunson] - Speaker:
**Vic Patragenarou, Department of Statistics, Florida State University, Thursday 20 November 2014**

Title: All about Statistics as far as Objects on Sample Spaces are concerned.

Abstract: Noncategorical observations, when regarded as points on a stratified space, lead to a nonparametric data analysis extending data analysis on manifolds. In particular, given a probability measure on a sample space with a manifold stratification, one may define the associated Fr\'echet function, Fr\'echet total variance and Cartan mean set. The sample counterparts of these parameters have more nuanced asymptotic behavior than in nonparametric data analysis on manifolds. This allows for the most inclusive data analysis known to date. Unlike the case of manifolds, Fr\'echet sample means on stratified spaces, such as graphs, may stick to a lower dimensional stratum, a new dimension reduction phenomenon. The downside of stickiness is that it yields a less meaningful interpretation of the analysis. To compensate for this, an extrinsic data analysis that is more sensitive to the input data is suggested. In this talk we explore analysis of data on low-dimensional stratified spaces via simulations. An example of extrinsic analysis on phylogenetic tree data is also given. This is joint work with Leif Ellingson (Texas Tech), Harrie Hendricks (Radboud University, Nijmegen, Netherlands) and Paul San Valentin (Florida State University). - Speaker:
**ShuYang (Ray) Bai, Department of Mathematics and Statistics, Boston University, Thursday 4 December 2014**

Title: Self-similar processes with stationary increments on Wiener chaos.

Abstract: Self-similar processes with stationary increments are important because they exhaust the scaling limits of sums of stationary sequences. In this talk, we introduce a broad class of such processes represented by multiple stochastic integrals, called the generalized Hermite processes. We show that the sums of some nonlinear long-memory stationary sequences scale to these generalized Hermite processes. We then look at one particular example of a generalized Hermite process represented by a double stochastic integral, and show some interesting limit phenomena of this process as its parameters approach critical values. Some of the tools used, involving recent developments connecting Malliavin calculus and Stein's method, will be briefly introduced along the way. - Speaker:
**Ramis Movassagh, Department of Mathematics, MIT and Northwestern University, Thursday 11 December 2014**

Title: Eigenvalues of Sums of Matrices from Free Probability Theory and Their Stochastic Dynamics

Abstract: The method of "Isotropic Entanglement" (IE), inspired by Free Probability Theory and Random Matrix Theory, predicts the eigenvalue distribution of quantum many-body systems with generic interactions. At the heart is a "Slider", which interpolates between two extrema by matching fourth moments. The first extreme treats the non-commuting terms classically and the second treats them isotropically. Isotropic means that the eigenvectors are in generic positions. We prove that the interpolation is universal. We then show that free probability theory also captures the density of states of the Anderson model with arbitrary disorder and with high accuracy. The theory will be illustrated by numerical experiments. Lastly, and time permitting, we shall present a very recent result applicable to non-Hermitian models. We prove that the complex conjugate eigenvalues of a real asymmetric matrix "attract" in response to additive real randomness. The motion of the eigenvalues can be seen as a many-body system; we derive their stochastic dynamics in the complex plane.
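A minimal numerical illustration of the final claim, under illustrative choices not taken from the talk (a 2x2 rotation-like matrix with eigenvalues +/- i, i.i.d. Gaussian real perturbations):

```python
# Hedged sketch: conjugate eigenvalue pairs of a real matrix move toward the
# real axis on average under additive real randomness ("attraction").
import numpy as np

rng = np.random.default_rng(1)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # eigenvalues are exactly +/- i
eps = 0.2                                  # illustrative noise scale
trials = 20_000

mean_abs_imag = np.mean([
    np.abs(np.linalg.eigvals(J + eps * rng.standard_normal((2, 2))).imag).max()
    for _ in range(trials)
])
# Unperturbed |Im lambda| = 1; averaged over real perturbations, the
# imaginary part shrinks, i.e. the conjugate pair has drawn closer together.
```

A complex (non-real) perturbation would not show this effect, which is why the realness of the randomness is emphasized in the abstract.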

- Speaker:
**Pierre Nyquist, Division of Applied Mathematics, Brown University, Thursday 26 February 2015**

Title: Min-max representations of viscosity solutions of Hamilton-Jacobi equations and applications in rare-event simulation

Abstract: The problem of rare-event sampling hinders the use of stochastic simulation in situations where one is interested in quantities determined mainly by events of small probability. One of the more successful ways to overcome this is importance sampling, a technique used to reduce the variance of standard Monte Carlo. In the last decade, through the works of Dupuis, Wang, and collaborators (2004 and onwards), it has been understood that the design of efficient simulation algorithms is intimately connected to subsolutions of the Hamilton-Jacobi equation associated with the underlying stochastic system. We will discuss a duality relation between the Mañé potential and a functional common in control theory, referred to as Mather's action functional in weak KAM theory, in the context of convex and state-dependent Hamiltonians. The duality is used to obtain min-max representations of viscosity solutions of first-order Hamilton-Jacobi equations. These representations suggest a way to construct viscosity subsolutions, which in turn are good candidates for designing efficient rare-event simulation algorithms. The application to rare-event simulation is illustrated by the problem of computing escape probabilities for small-noise diffusions and Markov jump processes with state-dependent jumps.

- Speaker:
**David Gamarnik, MIT Sloan School of Management, MIT, Friday 6 March 2015 (Joint seminar with CISE), 3:00 PM to 4:00 PM, 8 St. Mary's Street, Room 210. Refreshments served at 2:45.**

Title: Limits of Local Algorithms for Randomly Generated Constraint Satisfaction Problems

Abstract: We will discuss the problem of designing algorithms for solving randomly generated constraint satisfaction problems, such as the random K-SAT problem, the random coloring problem, and similar ones. We establish a fundamental barrier on the power of local algorithms to solve such problems, despite some conjectures put forward in the past. We show that a broad class of local algorithms, including the so-called Belief Propagation and Survey Propagation algorithms, cannot find satisfying assignments in a variant of the random K-SAT problem called the NAE-K-SAT problem above a certain asymptotic threshold, below which even simple algorithms succeed with high probability. Our negative results exploit the fascinating geometry of the solution space of random constraint satisfaction problems, which was first predicted heuristically by physicists and has now been confirmed by rigorous methods. According to this picture, the solution space exhibits a clustering property whereby feasible solutions tend to cluster with respect to the underlying Hamming distance. This clustering property creates a barrier for local algorithms.

- Speaker:
**Wei Biao Wu, Department of Statistics, University of Chicago, Thursday 19 March 2015**

Title: $L^2$ Asymptotic Theory for High-Dimensional Data

Abstract: I will present an asymptotic theory for $L^2$ norms of sample mean vectors of high-dimensional data. An invariance principle for the $L^2$ norm is derived under conditions that involve a delicate interplay between the dimension $p$, the sample size $n$, and the moment condition. Under proper normalization, central and non-central limit theorems are obtained. To perform the related statistical inference, I will propose a plug-in calibration method and a re-sampling procedure to approximate the distributions of the $L^2$ norms. The results will be applied to multiple testing and to inference of covariance matrix structures.

- Speaker:
**Ramon Van Handel, Department of Operations Research and Financial Engineering, Princeton University, Thursday 26 March 2015**

Title: How large is the norm of a random matrix?

Abstract: Understanding the spectral norm of random matrices is a problem of basic interest in several areas of pure mathematics (probability theory, functional analysis, combinatorics) and in applied mathematics, statistics, and computer science. While the spectral norm of classical random matrix models is well understood, existing methods almost always fail to be sharp in the presence of nontrivial structure. In this talk, I will discuss new bounds on the norm of random matrices with independent entries that are sharp under mild conditions. These bounds shed significant light on the nature of the problem, and make it possible to effortlessly address otherwise nontrivial problems, such as identifying the phase transition of the spectral edge of random band matrices.

- Speaker:
**Peter Bull, DrivenData (American Statistical Education Association), Thursday 2 April 2015**

Title: Using your powers for good: Data science in the social sector

Abstract: Just like every major corporation today, nonprofits and governments have more data than ever before. And just like those corporations, they are eager to tap into the power of their data. But the social sector doesn't have the same resources to attract talent. Jeff Hammerbacher, Chief Scientist at Cloudera, put it best: "The best minds of my generation are thinking about how to make people click ads. That sucks." At DrivenData our goal is to make the world suck a little less by empowering impact organizations to get the most from their data. Peter Bull, co-founder of DrivenData, will speak on the ways in which statistics, computer science, and machine learning can be applied to challenges in the social sector. The talk will address both the big-picture context of the data-for-good movement and an in-depth case study of the methods that won DrivenData's recent machine learning competition on smart school budgeting. It's an exciting time for people who love data: methods are improving, computational costs are decreasing, storage and transport are cheaper, and the talent pool is growing. It's up to the data geeks to use these powers for good.

- Speaker:
**Fan Zhuo, Department of Economics, Boston University, Thursday 9 April 2015**

Title: Likelihood Ratio Based Tests for Markov Regime Switching

Abstract: Regime switching models provide a flexible framework for modeling sudden and recursive shifts in dynamic relationships and have influenced thinking in both the economics and the finance literature. Although there has been persistent interest in applying likelihood ratio based tests to detect regime switches (e.g., Hansen, 1992 and Garcia, 1998), the asymptotic distributions of such tests have remained an enigma. This paper considers such tests and establishes their asymptotic distributions in the context of nonlinear models permitting multiple switching parameters. The analysis simultaneously addresses three difficulties: (i) some nuisance parameters are unidentified under the null hypothesis, (ii) the null hypothesis yields a local maximum, and (iii) conditional regime probabilities follow stochastic processes that can only be expressed recursively. The important work of Cho and White (2007) took on only the first two difficulties, while this paper shows that addressing the third can lead to substantially higher testing power when the regimes are serially dependent. Besides obtaining the tests' asymptotic distributions, this paper also obtains four sets of results that can be of independent interest: (1) a characterization of the conditional regime probabilities and their derivatives with respect to the model's parameters, (2) a high-order approximation to the log-likelihood ratio permitting multiple switching parameters, (3) a refinement of the asymptotic distribution that provides better approximations in finite samples, and (4) a unified algorithm to simulate the critical values. In linear models, all the elements needed for the algorithm can be computed analytically. Finally, the above results reveal that some bootstrap procedures can be inconsistent and that standard information criteria, such as AIC and BIC, can be sensitive to the hypothesis and the model's structure.

- Speaker:
**Natesh Pillai, Department of Statistics, Harvard University, Thursday 16 April 2015**

Title: Some aspects of shrinkage priors in high dimensions

Abstract: In this talk we explore some aspects of shrinkage priors in high-dimensional Bayesian inference. These prior distributions (constructed as an alternative to spike-and-slab priors) are popular because the corresponding MCMC algorithms mix very quickly. However, little is known about their statistical efficiency. We present some results in this direction and also give a new prior which is both statistically and computationally efficient. We will also discuss some open problems.

- Speaker:
**Francesco Mainardi, Department of Physics, University of Bologna, Thursday 23 April 2015**

Title: Brownian motion and anomalous diffusion revisited via a fractional Langevin equation

Abstract: In this talk Brownian motion is revisited on the basis of the fractional Langevin equation, which turns out to be a particular case of the generalized Langevin equation introduced by Kubo in 1966. The importance of this approach is that it models Brownian motion more realistically than the usual one based on the classical Langevin equation, in that it also takes into account the retarding effects due to hydrodynamic back-flow, i.e., the added mass and the Basset memory drag. We provide the analytical expressions of the correlation functions (both for the random force and the particle velocity) and of the mean squared particle displacement. The random force is shown to be represented by a superposition of the usual white noise with a "fractional" noise. The velocity correlation function is no longer a simple exponential but exhibits a slower decay, proportional to $t^{-3/2}$ for long times, which indeed is more realistic. Finally, the mean squared displacement is shown to maintain, for sufficiently long times, the linear behaviour typical of normal diffusion, with the same diffusion coefficient as in the classical case. However, the Basset history force induces a retarding effect in the establishment of the linear behaviour, which in some cases could appear as a manifestation of anomalous diffusion, to be correctly interpreted in experimental measurements.

- Speaker:
**Zsolt Pajor-Gyulai, Department of Mathematics, University of Maryland at College Park, Thursday 30 April 2015. NOTE: Seminar will be in MCS B25!**

Title: From averaging to homogenization in cellular flows - an exact description of the transition

Abstract: We consider a two-parameter averaging-homogenization type elliptic problem together with the stochastic representation of the solution. A limit theorem is derived for the corresponding diffusion process, and a precise description of the two-parameter limit behavior of the solution of the PDE is obtained. Joint work with M. Hairer and L. Koralov.
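As background on the setting (this toy simulation is illustrative and not part of the paper's construction), a cellular flow is the divergence-free velocity field generated by the periodic stream function $H(x,y) = \sin x \sin y$, and the associated small-noise diffusion can be simulated with a few lines of Euler-Maruyama:

```python
import math, random

def cellular_flow_path(eps, T, dt=1e-3, x0=0.5, y0=0.5, seed=0):
    """Euler-Maruyama for dX = v(X) dt + sqrt(2*eps) dW, where v = (H_y, -H_x)
    is the divergence-free cellular flow with H(x, y) = sin(x) sin(y)."""
    rng = random.Random(seed)
    x, y = x0, y0
    s = math.sqrt(2 * eps * dt)  # noise increment scale per step
    for _ in range(int(T / dt)):
        vx = math.sin(x) * math.cos(y)    # H_y
        vy = -math.cos(x) * math.sin(y)   # -H_x
        x += vx * dt + s * rng.gauss(0, 1)
        y += vy * dt + s * rng.gauss(0, 1)
    return x, y

# With small noise the tracer circulates inside a cell for a long time and only
# occasionally hops to a neighboring cell; on large scales the motion is diffusive.
print(cellular_flow_path(eps=0.1, T=50.0))
```

Averaging and homogenization correspond to the two ways the noise size and the observation scale can be sent to their limits in such a model.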

- Speaker:
**Liliya Zax, Department of Mathematics and Statistics, Boston University, Thursday 10 September 2015, 4-5 at MA B33**

Title: Statistics application in industry: financial institutions and tech companies

Abstract: In my presentation I will share some aspects of my statistics-related experience in different industries, namely in financial and technology companies. We will discuss some specific statistical problems that are of interest to industry, what statistical tools companies use to try to solve those problems, and what statistical challenges they face. The goal of the presentation is to help students better understand how the knowledge and skills they acquire in their academic programs can later be applied if they choose to continue their careers in industry.

- Speaker:
**Leu Guo, College of Communication, Boston University, Thursday 17 September 2015**

Title: The power of message networks: Semantic network analysis of media effects in Twittersphere during the 2012 U.S. presidential election.

Abstract: Do traditional news media still lead public opinion in this digital age? This talk will present a study that explores how media such as newspapers and television set the public agenda by constructing message networks. Semantic network analysis and big data analytics were used to examine a large dataset collected on Twitter during the 2012 U.S. presidential election.

- Speaker:
**Philippe Rigollet, Department of Mathematics, MIT, Thursday 1 October 2015**

Title: Batched Bandits

Abstract: Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches already gives regret bounds close to minimax optimal, and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits. [Joint work with V. Perchet, S. Chassang and E. Snowberg.]

- Speaker:
**John Harlim, Department of Mathematics, Penn State University, Thursday 8 October 2015**

Title: Diffusion Forecast: A Nonparametric Modeling Approach

Abstract: I will discuss a nonparametric modeling approach for forecasting stochastic dynamical systems on low-dimensional manifolds. In the limit of large data, this approach converges to a Galerkin projection of the semigroup solution of the backward Kolmogorov equation of the underlying dynamics on a basis adapted to the invariant measure. This approach allows one to evolve the probability distribution of non-trivial dynamical systems with equation-free modeling. I will also discuss nonparametric filtering methods that leverage the diffusion forecast in a Bayesian framework to initialize the forecasting distribution given noisy observations.

- Speaker:
**Pierre Jacob, Department of Statistics, Harvard University, Thursday 15 October 2015**

Title: Estimation of the Derivatives of Functions That Can Only Be Evaluated With Noise

Abstract: Iterated filtering methods have recently been introduced to perform maximum likelihood parameter estimation in state-space models, and they only require being able to simulate the latent Markov model according to its prior distribution. They rely on an approximation of the score vector for general statistical models, based upon an artificial posterior distribution, that bypasses the calculation of any derivative. We show here that this score estimator can be derived from a simple application of Stein's lemma, and that an additional application of this lemma provides an original derivative-free estimator of the observed information matrix. These methods tackle the general problem of estimating the first two derivatives of a function that can only be evaluated point-wise with some noise. We compare these new methods with finite difference schemes and make connections with proximal mappings. In particular, we look at the bias and variance of these estimators, the effect of the variance of the noise, and the effect of the dimension of the parameter space.

- Speaker:
**Jian Zhou, Department of Mathematical Sciences, Worcester Polytechnic Institute, Thursday 22 October 2015**

Title: Volatility Inference Using High-Frequency Financial Data and Efficient Computations

Abstract: The field of high-frequency finance has experienced rapid evolution over the past few decades. One focal point is volatility modeling and analysis for high-frequency financial data, which plays a major role in finance and economics. In this talk, we focus on the statistical inference problem for large volatility matrices using high-frequency financial data, and propose a methodology to tackle this problem under various settings. We illustrate the methodology with high-frequency price data on stocks traded on the New York Stock Exchange in 2013. The theory and numerical results show that our approach performs well, pooling together the strengths of regularization and estimation from a high-frequency finance perspective.

- Speaker:
**Markos Katsoulakis, Department of Mathematics and Statistics, UMass Amherst, Thursday 29 October 2015**

Title: Path-space information metrics for uncertainty quantification and coarse-graining of molecular systems

Abstract: We present path-space, information theory-based sensitivity analysis, uncertainty quantification, and variational inference methods for complex high-dimensional stochastic dynamics, including chemical reaction networks with hundreds of parameters, Langevin-type equations, and lattice kinetic Monte Carlo. We establish their connections with goal-oriented methods in terms of new, sharp uncertainty quantification inequalities that scale appropriately at both long times and for high-dimensional state and parameter spaces. The combination of the proposed methodologies is capable of (a) tackling non-equilibrium processes, typically associated with coupled physicochemical mechanisms or boundary conditions, such as reaction-diffusion problems, where even the steady states are unknown altogether, e.g., do not have a Gibbs structure; (b) yielding a surprisingly simple, tractable, and easy-to-implement approach to quantify and rank parameter sensitivities; and (c) providing reliable parameterizations for coarse-grained molecular systems based on fine-scale data, and rational model selection through path-space (dynamics-based) variational inference methods.

- Speaker:
**Iddo Ben-Ari, Department of Mathematics, University of Connecticut, Thursday 5 November 2015**

Title: The Bak-Sneppen Model of Biological Evolution and Related Models

Abstract: The Bak-Sneppen model is a Markovian model for biological evolution that was introduced as an example of Self-Organized Criticality. In this model, a population of size N evolves according to the following rule. The population is arranged on a circle, or more generally a connected graph. Each individual is assigned a random fitness, uniform on [0,1], independent of the fitnesses of the other individuals. At each unit of time, the least fit individual and its neighbors are removed from the population and are replaced by new individuals. Despite being extremely simple, the model is known to be very challenging, and the evidence for Self-Organized Criticality provided by Bak and Sneppen was obtained through numerical simulations. I will review the main rigorous results on this model, mostly due to R. Meester and his coauthors, and present some new results and open problems. I will then turn to recent, more tractable variants of the model, in which on the one hand the spatial structure is relaxed, while on the other hand the population size is random. I will focus on the functional central limit theorem for the model, which has a somewhat unusual form.

- Speaker:
**Mokshay Madiman, Department of Mathematical Sciences, University of Delaware, Thursday 12 November 2015**

Title: Optimal Concentration of Information for Log-Concave Distributions

Abstract: It was shown by Bobkov and the speaker that for a random vector X in R^n drawn from a log-concave density e^{-V}, the information content per coordinate, namely V(X)/n, is highly concentrated about its mean. Their argument was nontrivial, involving the localization technique, and gave suboptimal exponents, but it was sufficient to demonstrate that high-dimensional log-concave measures are in a sense close to uniform distributions on the annulus between two nested convex sets. We will present recent work that obtains an optimal concentration bound in this setting (optimal even in the constant terms, not just the exponent) using very simple techniques, and outline the proof. Applications that motivated the development of these results include high-dimensional convex geometry and random matrix theory, and we will outline these applications. Based on (multiple) joint works with Sergey Bobkov, Matthieu Fradelizi, and Liyao Wang.

- Speaker:
**Youssef M. Marzouk, Department of Aeronautics and Astronautics, MIT, Thursday 19 November 2015**

Title: Transport maps for Bayesian computation

Abstract: We will discuss how transport maps, i.e., deterministic couplings between probability measures, can enable useful new approaches to Bayesian computation. A first use involves a combination of optimal transport and Metropolis correction; here, we use continuous transportation to transform typical MCMC proposals into adapted non-Gaussian proposals, both local and global. Second, we discuss a variational approach to Bayesian inference that constructs a deterministic transport map from a reference distribution to the posterior, without resorting to MCMC. Independent and unweighted posterior samples can then be obtained by pushing forward reference samples through the map. Making either approach efficient in high dimensions, however, requires identifying and exploiting low-dimensional structure. We present new results relating the sparsity of transport maps to the conditional independence structure of the target distribution, and discuss how this structure can be revealed through the analysis of certain average derivative functionals. A connection between transport maps and graphical models yields many useful algorithms for efficient ordering and decomposition, here generalized to the continuous and non-Gaussian setting. The resulting inference algorithms involve either the direct identification of sparse maps or the composition of low-dimensional maps and rotations. We demonstrate our approaches on Bayesian inference problems arising in spatial statistics and in partial differential equations. This is joint work with Matthew Parno and Alessio Spantini.

- Speaker:
**Shuyang (Ray) Bai, Department of Mathematics and Statistics, Boston University, Thursday 3 December 2015**

Title: Self-normalized resampling for time series

Abstract: The inference procedures for the mean of a stationary time series are usually quite different depending on the strength of the dependence as well as the heavy-tailedness of the model. In this talk, combining the ideas of resampling and self-normalization, we introduce a unified procedure which is valid under various model assumptions. The procedure avoids estimation of any nuisance parameter and requires only the choice of one bandwidth. Simulation examples will be given to illustrate its performance, and the asymptotic theory will also be introduced. This is joint work with Murad S. Taqqu and Ting Zhang.

- Speaker:
**Vidhu Prasad, University of Massachusetts Lowell, Thursday 10 December 2015**

Title: Towers, Codes and Approximate Conjugacy

Abstract: Consider the following question about an irrational rotation $T$ of the unit circle and a mixing Markov chain: is there a partition of the circle (indexed by the state space of the Markov chain) so that the itinerary process given by $T$ and the partition has the distribution of the given Markov chain? Furthermore, this is true for any aperiodic measure-preserving transformation, not just an irrational rotation: the existence of "tower structures" for any $T$ is equivalent to the coding property above (the existence of a partition which is moved like the Markov chain by $T$), and the latter property is equivalent to an "almost conjugacy" property for $T$. The tower property is a generalization of one of the truly basic results in ergodic theory: the Kakutani-Rokhlin lemma.
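The itinerary process in the opening question is easy to experiment with numerically. The sketch below (a toy illustration, not part of the talk's construction) records the itinerary of the golden-ratio rotation with respect to a two-interval partition; by unique ergodicity of irrational rotations, each symbol's long-run frequency matches the length of its interval:

```python
import math

def itinerary(alpha, cuts, n, x0=0.0):
    """Symbolic itinerary of the circle rotation x -> x + alpha (mod 1).
    `cuts` lists the right endpoints of consecutive partition intervals of [0, 1)."""
    x, symbols = x0, []
    for _ in range(n):
        symbols.append(next(i for i, b in enumerate(cuts) if x < b))
        x = (x + alpha) % 1.0
    return symbols

alpha = (math.sqrt(5) - 1) / 2          # golden-ratio rotation number
seq = itinerary(alpha, [1 - alpha, 1.0], 10000)
# Symbol 1 marks visits to [1 - alpha, 1), an interval of length alpha,
# so its frequency converges to alpha.
print(sum(seq) / len(seq))  # close to alpha ~ 0.618
```

For this particular partition the resulting 0-1 sequence is a Sturmian word, a deterministic coding; the talk's question asks when a partition can instead be chosen so that the itinerary is distributed like a prescribed Markov chain.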