BU-KEIO 2016

## Probability and Statistics

### Boston University — August 15-19, 2016

Schedule:

Abstracts for contributed talks:

• Mohammadreza Aghajani (UCSD): Mean-Field Dynamics of Load-Balancing Networks with General Service Distributions

• We introduce a general framework for studying a class of randomized load balancing models in a system with a large number of servers that have generally distributed service times and use a first-come-first-served policy within each queue. Under fairly general conditions, we use an interacting measure-valued process representation to obtain hydrodynamic limits for these models, and establish a propagation of chaos result. Furthermore, we present a set of partial integro-differential equations (PDEs) whose solution can be used to approximate the transient behavior of such systems. We prove that these PDEs have a unique solution, use a numerical scheme to solve them, and demonstrate the efficacy of these approximations using Monte Carlo simulations. We also illustrate how the PDEs can be used to gain insight into network performance.

• Daniel Ahelegbey (BU): Sparse Graphical Vector Autoregression: A Bayesian Approach

• This paper considers a sparsity approach for inference in large vector autoregressive (VAR) models. The approach is based on a Bayesian procedure and a graphical representation of VAR models. We discuss a Markov chain Monte Carlo algorithm for sparse graph selection, parameter estimation, and equation-specific lag selection. We show the efficiency of our algorithm on simulated data and illustrate the effectiveness of our approach in forecasting macroeconomic time series and in measuring contagion risk among financial institutions.

• Fumiya Akashi (Waseda): Empirical likelihood and self-weighting approach for hypothesis testing of infinite variance processes and its applications

• In this talk, we apply the empirical likelihood method to the testing problem for a linear hypothesis of infinite variance time series models. In particular, we construct a self-weighted least absolute deviation (LAD)-based empirical likelihood ratio test statistic. It is shown that the proposed test statistic converges to a standard chi-square distribution even though we deal with infinite variance models. Therefore, we can carry out inference for infinite variance processes without estimating any unknown quantities of the underlying models. In other words, the proposed method is shown to simplify the testing procedure. The finite sample performance of the proposed test is investigated by simulation experiments. It is observed that the proposed test improves the power of the test compared with the classical LAD-based test.

• Takuji Arai (Keio): Local risk-minimization for Barndorff-Nielsen and Shephard models

• We obtain explicit representations of locally risk-minimizing strategies of call and put options for the Barndorff-Nielsen and Shephard models, which are Ornstein--Uhlenbeck-type stochastic volatility models. Moreover, some numerical experiments will be introduced.

• Atsushi Atsuji (Keio): Default functions and Liouville type theorems

• We encounter a default function when we ask if a local martingale is a true martingale. These functions appear in several problems in probability theory such as the theory of diffusion processes, mathematical finance, etc. In this talk we give simple remarks that the functions also play important roles in applications of stochastic calculus to geometric function theory, in particular, some Liouville type theorems for subharmonic functions and holomorphic maps.

• Prithwish Bhaumik (UT Austin): Bayesian high-dimensional quantile regression

• We consider a Bayesian high-dimensional quantile regression problem with a diverging number of predictors or covariates. The error distribution of the observations is assumed to be an asymmetric Laplace distribution, which may be different from the true error distribution. Sparse priors such as spike-and-slab type priors are imposed on the coefficients of the covariates, and inference is based on the posterior distribution. We prove a Bernstein-von Mises theorem for the posterior distribution of the coefficients.

• Luis Carvalho (BU): Bayesian Network Regularized Regression for Crime Modeling

• We present a new methodology for functional network regression on node attributes using a Laplacian operator based on edge similarities as a regularizer. We show how usual regularization penalties can be cast as prior distributions on regression coefficients under a Bayesian setup, and propose a computationally efficient EM fitting procedure. We discuss a specific application to modeling residential burglary in Boston using a hierarchical model with latent indicators for "hot zones" and a conditional zero-inflated negative binomial regression for crime rates. This is joint work with Liz Upton.

• Minwoo Chae (UT Austin): Bayesian Sparse Linear Regression with Unknown Symmetric Error

• We study full Bayesian procedures for sparse linear regression when errors have a symmetric but otherwise unknown distribution. The unknown error distribution is endowed with a symmetrized Dirichlet process mixture of normals prior. For the prior on the regression coefficients, a mixture of point masses at zero and Laplace distributions is considered. It is shown that the full posterior distribution is consistent in the mean Hellinger distance. The compatibility and restricted eigenvalue conditions yield the minimax convergence rate of the regression coefficients in $\ell_1$- and $\ell_2$-norms, respectively. In addition, model selection consistency and a semiparametric Bernstein-von Mises theorem are proved under stronger conditions.

• Aleksandrina Goeva (BU): Network Degree Distribution Inference Under Sampling

• Networks are widely used to model the relationships among elements in a system. Many empirical networks observed today can be viewed as samples of an underlying network, for example, large-scale online social networks. Hence, it is of fundamental interest to investigate the impact of the network sampling mechanism on the quality of characteristics estimated from the sampled network. We focus on the degree distribution as a fundamental feature. Under many popular sampling designs, this problem can be stated as a linear inverse problem characterized by an ill-conditioned matrix. This matrix relates the expectation of the sampled degree distribution to the true underlying degree distribution and depends entirely on the sampling design. We propose an approximate solution for the degree distribution by regularizing the solution of the ill-conditioned least squares problem corresponding to the naïve estimator. We then study the rate at which the approximate solution tends to the true solution as a function of network size and sampling rate. This provides theoretical understanding of the accuracy of the approximate solution, whose properties have previously been studied only numerically.
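
As a rough illustration of the inverse problem described in this abstract, the sketch below builds a toy sampling operator and compares the naïve inverse with a ridge (Tikhonov) penalized solution. Everything here is an assumption made for illustration: the binomial form of the operator (the observed degree of a sampled node of true degree j is taken to be Binomial(j, p), ignoring the node-inclusion factor), the ridge penalty itself, and the names `sampling_matrix`, `solve_linear`, and `ridge_solution`. The talk's actual regularization scheme is not specified in the abstract and may differ.

```python
import math

def sampling_matrix(max_degree, p):
    """Toy operator: entry (i, j) is the Binomial(j, p) probability of observing degree i."""
    size = max_degree + 1
    return [[math.comb(j, i) * p**i * (1 - p)**(j - i) if i <= j else 0.0
             for j in range(size)] for i in range(size)]

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def ridge_solution(p_matrix, observed, penalty):
    """Solve (P^T P + penalty * I) x = P^T y, the ridge-penalized normal equations."""
    n = len(observed)
    gram = [[sum(p_matrix[k][i] * p_matrix[k][j] for k in range(n))
             + (penalty if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(p_matrix[k][i] * observed[k] for k in range(n)) for i in range(n)]
    return solve_linear(gram, rhs)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical true degree counts for degrees 0..5 (illustrative numbers only).
true_counts = [0.0, 10.0, 30.0, 40.0, 15.0, 5.0]
P = sampling_matrix(max_degree=5, p=0.3)
expected_observed = [sum(P[i][j] * true_counts[j] for j in range(6)) for i in range(6)]

# A small deterministic perturbation stands in for sampling noise.
perturbation = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noisy_observed = [e + d for e, d in zip(expected_observed, perturbation)]

naive = solve_linear(P, noisy_observed)                        # inverts P directly
regularized = ridge_solution(P, noisy_observed, penalty=0.01)  # damped solution

print("naive norm:      ", norm(naive))
print("regularized norm:", norm(regularized))
```

Because the diagonal of the toy operator decays like p^j, direct inversion amplifies small perturbations in the high-degree entries, while the penalty damps that amplification. Whether the damped solution is closer to the truth depends on the penalty level, which this sketch does not tune.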

• Ryo Hayase (Keio): Analysis of Glycan Data using Non-negative Matrix Factorization

• Glycans are crucial for many key biological processes, and their alterations are often a hallmark of diseases. In recent years, glycans have been actively studied as tumor markers. In this paper, we apply non-negative matrix factorization (NMF) to glycan data to search for tumor marker candidates for several types of cancers.

• Kenichi Hayashi (Keio): Model evaluation based on sensitivity and specificity

• The focus of this talk is the comparison of two regression models with a binary response. Typical measures for this task are the difference of the areas under the ROC curve (AUC) and the integrated discrimination improvement (IDI). We discuss their problems and show that the IDI can be modified to have a desirable property. This is joint work with Dr. Eguchi (ISM).

• Yukitake Ito (Keio): Forecasting Mortality Rates by Using Spatio-Temporal Data

• The Lee-Carter model, proposed by Lee and Carter (1992), is a well-known classical model for forecasting mortality rates. In this presentation, we propose an extension of the Lee-Carter framework to spatio-temporal data. We then forecast mortality rates including regional effects.
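
For reference, the classical Lee-Carter model is usually written in the form below. The notation is the conventional one and is not taken from the talk: $m_{x,t}$ denotes the central death rate at age $x$ in year $t$, $a_x$ the average age profile of log mortality, $k_t$ the time-varying mortality index, $b_x$ the age-specific sensitivity to that index, and $\varepsilon_{x,t}$ an error term. The regional extension proposed in the talk is not shown.

$$
\log m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}, \qquad \sum_x b_x = 1, \qquad \sum_t k_t = 0 .
$$

The two constraints are the standard identifiability conditions. Forecasts are typically obtained by modeling $k_t$ as a time series, often a random walk with drift.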

• Ayato Kashiyama (Keio): Annual Maximum Rainfall Analysis Using Extreme Value Theory

• In recent years, natural disasters caused by extreme weather events have occurred frequently. Extreme value theory aims at modeling the maxima or minima of data, and in meteorology such data correspond to the occurrence of natural disasters. In this presentation, I will talk about extreme value theory and show an analysis of annual maximum daily rainfall data from a region of Japan.

• Kei Kobayashi (Keio): Statistical analysis by tuning curvature of data spaces

• For data points distributed on a connected manifold or a geodesic metric space, the Fréchet mean is a natural generalization of the ordinary Euclidean sample mean. However, uniqueness of the Fréchet mean depends on the curvature of the space. In this talk, we first explain how the curvature of the data space can play a role in data analysis. We next propose a class of transformations of the metric for clustering and other statistical analyses by focusing on the curvature. This is joint work with Henry Wynn.
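
The dependence of uniqueness on the geometry of the space can be seen in a small example. The sketch below is an illustration chosen here, not the method of the talk: it minimizes the Fréchet function (the sum of squared geodesic distances) on the unit circle with the arc-length metric, a simple geodesic space that is not CAT(0), by grid search. The function names and the grid resolution are assumptions.

```python
import math

def arc_distance(a, b):
    """Geodesic (arc-length) distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def frechet_minimizers(angles, grid_size=360, tol=1e-9):
    """Grid search for minimizers of the sum of squared arc distances."""
    grid = [2 * math.pi * k / grid_size for k in range(grid_size)]
    values = [sum(arc_distance(t, a) ** 2 for a in angles) for t in grid]
    best = min(values)
    # Keep every grid point whose value ties with the minimum up to tol.
    return [t for t, v in zip(grid, values) if v - best <= tol]

# Concentrated data: a single minimizer near the ordinary average of the angles.
concentrated = frechet_minimizers([0.1, 0.2, 0.3])

# Two antipodal points: the Frechet function has two minimizers, a quarter turn from each point.
antipodal = frechet_minimizers([0.0, math.pi])

print(concentrated)
print(antipodal)
```

For concentrated data the circle behaves locally like a line and the minimizer is unique. For the antipodal pair, both points a quarter turn away minimize the Fréchet function, so the Fréchet mean is not unique.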

• Takaaki Koike (Keio): Efficient Computation of Risk Contributions by using MCMC

• In most financial institutions, the risk of a portfolio is measured by its economic capital. For more detailed risk analysis, it is necessary to decompose the portfolio-wide economic capital into the sum of risk contributions by unit exposures. Despite high practical demand, computing the risk contributions is a challenging task in general, and no explicit solutions are available for most risk models. In this talk, we will introduce a Markov chain Monte Carlo (MCMC)-based estimator of risk contributions when economic capital is computed by Value-at-Risk. We will demonstrate that the estimator is applicable and performs well in a wide variety of risk models.

• Eric Kolaczyk (BU): Dynamic causal networks with multi-scale temporal structure

• I will discuss a novel method to model multivariate time series using dynamic causal networks. This method combines traditional multi-scale modeling and network based neighborhood selection, aiming at capturing the temporally local structure of the data while maintaining the sparsity of the potential interactions. Our multi-scale framework is based on recursive dyadic partitioning, which recursively partitions the temporal axis into finer intervals and allows us to detect local network structural changes at varying temporal resolutions. The dynamic neighborhood selection is achieved through penalized likelihood estimation, where the penalty seeks to limit the number of neighbors used to model the data. Theoretical and numerical results describing the performance of our method will be presented, as well as applications in financial economics and neuroscience. This is joint work with Xinyu Kang and Apratim Ganguly.

• Jun Li (BU): Hypothesis Testing For Multilayer Network Data

• There is a trend to analyze large collections of networks, e.g., collections of ego-centric subnetworks on Facebook. In recent work by our group, a formal notion of a space of network Graph Laplacians has been introduced and a central limit theorem has been developed based on it. Hypothesis testing is then implemented. However, in many natural and engineered systems multilayer networks arise naturally, e.g., in computational biology and neuroscience. In this project, we considered two useful classes of multilayer networks, differing from each other in the form of their inter-layer connections. Defining a corresponding space of supra-Laplacians for these networks, we established the necessary geometry of this space and a central limit theorem. These results then enabled us to develop tests of various classes of hypotheses relevant to multilayer networks. Simulations were used to illustrate the performance of our approach. Key words: Hypothesis Testing, Multilayer Network, Graph Laplacian, Supra-Laplacian, Network Time Series

• David Lipshutz (Brown): Differentiability of flows and sensitivity analysis of reflected Brownian motions

• Differentiability of flows and sensitivity analysis are classical topics in dynamical systems. However, the analysis of these properties for constrained processes, which arise in a variety of applications, is challenging due to the discontinuous dynamics at the boundary of the domain, and is further complicated when the boundary is non-smooth. We show that the study of both differentiability of flows and sensitivities of constrained processes in convex polyhedral domains can be largely reduced to the study of directional derivatives of an associated map, called the Skorokhod map, and we introduce an axiomatic framework to characterize these directional derivatives. In addition, we establish pathwise differentiability of a large class of reflected Brownian motions in convex polyhedral domains and show that they can be described in terms of certain constrained stochastic differential equations with time-varying domains and directions of reflection.

• Matthew Morse (BU): Bridging the Gap between Center and Tail for Multiscale Processes

• Processes characterized by two (or more) separated time scales appear naturally in many applications. Typical examples include protein folding, financial engineering, neural networks, and climate modeling. The behavior of the center of the probability distributions of these multiscale processes is governed by the central limit theorem. The tail of the distribution is governed by large deviations. We are interested in the gap between the center and the tail of the distribution. In particular, we study moderate deviations for multiscale diffusion processes. We derive the moderate deviations principle for general models and present specific examples. Applications of these results include the design of related provably efficient Monte Carlo methods.

• Daiki Nagata (Keio): The Evaluation of Catcher Framing using PITCHf/x data

• The PITCHf/x system was developed by Sportvision Inc., and it has been installed in every MLB stadium since around 2006. In this talk we introduce catcher framing techniques and the evaluation of framing using PITCHf/x data. Then we apply a logistic regression model with spline smoothing to PITCHf/x location data, and extend this model to include pitchers', batters' and umpires' contributions as random effects.

• Tomoshige Nakamura (Keio): The Problem of Treating Imputed Data as Observed Data When We Estimate the Effect of Exposure to Particulate Matter

• When we estimate the effects of exposure to particulate matter, community health survey data are often used. In such cases, the amount of exposure of each subject needs to be provided. However, in many cases the exposure of some subjects cannot be observed, so estimated values computed by some method are imputed, and the effects of particulate matter are analyzed as if the imputed values had been observed. In this talk, I will discuss problems of estimating the effect of exposure to particulate matter using imputed values as observed values, which have not received much attention from environmental epidemiologists.

• Atsunobu Oishi (Keio): Nonparametric Estimation for Optimal Dividend Barrier based on Empirical Process

• The dividend problem is an application of ruin theory for insurance companies. We suppose that an insurance company pays out the part of the surplus that exceeds a barrier as dividends to the shareholders. In this presentation, we introduce a nonparametric estimator of the optimal dividend barrier based on empirical processes.

• Hiroyuki Oka (Keio): Statistical Estimation of High-Dimensional Portfolio

• We introduce an estimator of Markowitz's mean-variance optimal portfolio from a $d \times n$ data matrix in a high-dimensional setting, where $d$ is the number of assets and $n$ is the sample size. When $d/n$ converges to a limit in $(0,1)$, we show the inconsistency of the traditional estimator and propose a consistent estimator.

• Masayuki Sakai (Keio): Analysis of Groundwater Level at a River Without Water

• Groundwater level, precipitation and other meteorological data have been observed daily at a river without water in Tochigi prefecture. We analyze these data to find a model that describes the daily change of groundwater levels using state space models.

• Michael Salins (BU): Local time and null-recurrent averaging

• We study a fast-slow system of stochastic differential equations where the fast motion is null-recurrent. We show that a rescaled version of the slow motion converges to a stochastic process that only moves when the fast motion crosses zero. The process is nontrivial, but it is constant on a set of times with full Lebesgue measure. For this reason, the limiting process cannot be described in terms of standard SDEs but it can be characterized in terms of the local time of the fast process.

• Hiroshi Shiraishi (Keio): Nonparametric Estimation for Optimal Dividend Barrier based on Laplace Transformation

• Dividends are defined as premium income whenever the insurance surplus attains a barrier level. With the aggregate claims process taken to be a compound Poisson model, the optimal dividend barrier is defined as the barrier level that maximizes the expectation of the discounted dividends until ruin. We derive the optimal dividend barrier as a solution of a function following Gerber and Shiu (1997, 1998). Then, we consider nonparametric estimation based on the empirical version of the Laplace transform.

• Kostas Spiliopoulos (BU): Irreversible Langevin Samplers and Variance Reduction: A Large Deviations Approach and Diffusion on Graphs

• Monte Carlo methods are very popular methods to sample from high-dimensional target distributions, which very often are of Gibbs type. Markov processes that have the target distribution as their invariant measure are used to approximate the equilibrium dynamics. In this talk, we explore performance criteria based on the related large deviations theory for random measures and we focus on the diffusion setting. We find that large deviations theory can not only adequately characterize the efficiency of the approximations, but it can also be used as a vehicle to design Markov processes whose time average optimally (in the sense of variance reduction) approximates the quantities of interest. We quantify the effect that added irreversibility has on the speed of convergence to a target Gibbs measure and on the asymptotic variance of the resulting estimator. One of our main findings is that adding irreversibility reduces the asymptotic variance of generic observables, and we give an explicit characterization, in terms of a nonlinear Poisson equation, of when observables do not see their variances reduced. Connections to averaging problems for Hamiltonian systems and diffusions on graphs will be given. Theoretical results are supplemented by simulations.

• Ryoichi Suzuki (Keio): Local risk-minimization for multidimensional Lévy markets

• The locally risk-minimizing hedging strategy (LRM, for short) is a well-known quadratic hedging method for contingent claims. In this talk, we use Malliavin calculus to obtain an explicit representation of the LRM in an incomplete financial market driven by a multidimensional Lévy process. The multidimensional setting is motivated by real markets, where investors sell an option and want to replicate its payoff by trading many stocks.

• Hiroshi Takahashi (Tokyo Gakugei University): Topics on multi-dimensional Brox's diffusions

• In this talk, we give a survey of the limiting behavior of diffusion processes in random environments. The diffusion process was introduced by Brox as a continuous-time analogue of the one-dimensional random walks in random environments studied by Sinai, and many of its properties have been studied. In the multi-dimensional case, though, there have been few results. After explaining Brox's one-dimensional diffusion process, we present results concerning recurrence and transience of multi-dimensional Brox's diffusion processes.

• Mengjie Wang (UT Austin): Dynamic Community Detection Using Dependent Latent Position Model

• Community detection in network analysis has drawn increasing attention recently in many areas. We perform dynamic community detection using a dependent latent position model that introduces dependence among latent positions across time. Clustering of nodes is done by clustering the corresponding latent positions in a model-based framework. The link probability between each pair of nodes is calculated through a logit link, and the latent positions are modeled using a dependent Dirichlet process mixture model. Efficient MCMC algorithms will be developed, and applications are considered for both simulated and real data sets.

• Miaoyan Wang (UPenn): Higher-order tensors and their multi-mode flattenings

• Higher-order tensors (also known as multi-way arrays) arise naturally in many fields across science and engineering. Compared to matrices, tensors provide greater flexibility in describing data, but they entail higher computational costs. Indeed, extending familiar matrix concepts such as SVDs to tensors is non-trivial, and the associated computational problems have proven to be NP-hard. One common approach to mitigate this problem is to unfold (or flatten) the tensor into a matrix and then apply classical methods developed for matrices. I will explore several aspects of this general topic. In particular, we establish general inequalities between the p-norms of any two tensor unfoldings, in which each unfolding is in one-to-one correspondence with a partition of {1,...,k}. For specially-structured tensors satisfying a generalized definition of orthogonal decomposability, we prove that the spectral norm remains invariant under specific subsets of unfolding operations. Time permitting, I will describe a new orthogonal tensor decomposition algorithm using two-mode flattening. The error bounds of the eigen-pair estimators will be given, which can be viewed as an analogue of Wedin's perturbation theorem for singular vectors of matrices.

• Michael Zhang (UT Austin): Robust and Parallel Bayesian Model Selection

• Effective and accurate model selection that takes into account model uncertainty is an important but challenging problem in modern data analysis. One of the major challenges is the computational burden required to perform inference on huge data sets which, in general, cannot be stored or processed on one machine. Moreover, in many real data modeling scenarios we may encounter outliers and contamination that will damage the quality of our model and variable selection. We can overcome both of these problems through a simple "divide and conquer" strategy in which we divide the observations of the full data set equally into subsets and perform inference and model selection independently on each subset. After local subset inference, we can aggregate the optimal subset model or aggregate the local model/variable selection criteria to obtain a final model. We show that by aggregating with the geometric median, we obtain results that are robust to outliers and contamination of an unknown nature.
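
The aggregation step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the talk: each subset is assumed to return a point estimate in R^d, the geometric median is computed with the standard Weiszfeld iteration, and the subset estimates are made-up numbers with one deliberately contaminated entry. The function name `geometric_median` is hypothetical.

```python
import math

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the point minimizing the sum of Euclidean distances."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weight_sum = 0.0
        numerator = [0.0] * dim
        for p in points:
            dist = math.dist(p, current)
            if dist < 1e-12:
                # Iterate coincides with a data point; skip it to avoid dividing by zero.
                continue
            w = 1.0 / dist
            weight_sum += w
            for k in range(dim):
                numerator[k] += w * p[k]
        if weight_sum == 0.0:
            return current
        updated = [x / weight_sum for x in numerator]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current

# Hypothetical estimates of a 2-dimensional parameter from five subsets;
# the last subset is contaminated by outliers.
subset_estimates = [(1.0, 2.0), (1.1, 1.9), (0.9, 2.1), (1.0, 2.05), (50.0, -40.0)]

median_estimate = geometric_median(subset_estimates)
mean_estimate = [sum(p[k] for p in subset_estimates) / len(subset_estimates)
                 for k in range(2)]

print("geometric median:", median_estimate)
print("coordinate mean: ", mean_estimate)
```

In this toy example the coordinate-wise mean is pulled far toward the contaminated subset, while the geometric median stays near the cluster formed by the other four estimates.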

• Ting Zhang (BU): An Introduction to Nonstationary Time Series Analysis

• In this talk, we will provide a brief introduction to nonstationary time series data, which appear quite frequently in many scientific disciplines. We will cover a framework that can be used to study this type of data, and sample one or more research problems that have been studied using this framework. The talk is designed to be very friendly to graduate students.