BU-KEIO 2016
BOSTON UNIVERSITY/KEIO UNIVERSITY WORKSHOP 2016
Probability and Statistics
Boston University — August 15-19, 2016
Abstracts for contributed talks:
- Mohammadreza Aghajani (UCSD):
Mean-Field Dynamics of Load-Balancing Networks with General Service
Distributions
-
We introduce a general framework for studying a class of randomized load
balancing models
in a system with a large number of servers that have generally distributed
service times
and use a first-come-first-served policy within each queue. Under fairly
general conditions,
we use an interacting measure-valued process representation to obtain
hydrodynamic limits
for these models, and establish a propagation of chaos result. Furthermore, we
present a set
of partial integro-differential equations (PDEs) whose solution can be used to
approximate
the transient behavior of such systems. We prove that these PDEs have a unique
solution,
use a numerical scheme to solve them, and demonstrate the efficacy of these
approximations
using Monte Carlo simulations. We also illustrate how the PDE can be used to
gain insight
into network performance.
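A toy Monte Carlo sketch of the kind of system described above (all parameters and modeling choices are my illustration, not the authors'): N parallel FCFS queues under "power-of-d" routing, where each arrival samples d servers uniformly at random and joins the shortest queue. Exponential service times are used here for brevity, although the talk's framework allows general service distributions.

```python
import heapq
import itertools
import random

# Toy power-of-d load-balancing simulation (assumptions mine).
random.seed(0)
N, d, lam, mu, horizon = 100, 2, 0.9, 1.0, 500.0
queues = [0] * N                      # jobs currently at each server
seq = itertools.count()               # tie-breaker for the event heap
events = [(random.expovariate(lam * N), next(seq), "arrival", None)]
samples = []
while events:
    t, _, kind, server = heapq.heappop(events)
    if t > horizon:
        break
    if kind == "arrival":
        # Power-of-d choices: join the least loaded of d sampled servers.
        i = min(random.sample(range(N), d), key=lambda j: queues[j])
        queues[i] += 1
        if queues[i] == 1:            # server was idle: begin service
            heapq.heappush(events, (t + random.expovariate(mu),
                                    next(seq), "done", i))
        heapq.heappush(events, (t + random.expovariate(lam * N),
                                next(seq), "arrival", None))
    else:                             # service completion at `server`
        queues[server] -= 1
        if queues[server] > 0:        # FCFS: start the next job in line
            heapq.heappush(events, (t + random.expovariate(mu),
                                    next(seq), "done", server))
    samples.append(sum(queues) / N)   # mean queue length per server

mean_queue = sum(samples) / len(samples)
```

Monte Carlo output from simulations like this is what the PDE approximations in the talk are checked against.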
- Daniel Ahelegbey (BU):
Sparse Graphical Vector Autoregression: A Bayesian Approach
-
This paper considers a sparsity approach for inference in large vector
autoregressive
(VAR) models. The approach is based on a Bayesian procedure and a graphical
representation
of VAR models. We discuss a Markov chain Monte Carlo algorithm for sparse
graph selection,
parameter estimation, and equation-specific lag selection. We show the
efficiency of our
algorithm on simulated data and illustrate the effectiveness of our approach
in forecasting
macroeconomic time series and in measuring contagion risk among financial
institutions.
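As a purely illustrative stand-in for the Bayesian graphical procedure above (the lasso below is my frequentist sketch, not the paper's method), sparse VAR estimation can be sketched equation by equation with an L1 penalty:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Simulate a VAR(1) with a sparse coefficient matrix, then recover it
# with equation-wise lasso regressions (illustration only).
rng = np.random.default_rng(5)
k, T = 5, 300
A = np.zeros((k, k))
np.fill_diagonal(A, 0.3)
A[0, 1], A[2, 3] = 0.5, -0.4            # a few true cross-lag effects
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = Y[t - 1] @ A.T + rng.normal(scale=0.5, size=k)

X, Z = Y[:-1], Y[1:]                    # lagged regressors and responses
A_hat = np.vstack([
    Lasso(alpha=0.02, fit_intercept=False).fit(X, Z[:, i]).coef_
    for i in range(k)
])                                      # row i: sparse equation for series i
```

The nonzero pattern of `A_hat` plays the role of the sparse graph in the paper's representation.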
- Fumiya Akashi (Waseda):
Empirical likelihood and self-weighting approach for hypothesis testing of
infinite variance processes and its applications
-
In this talk, we apply the empirical likelihood method to the testing problem
for a linear hypothesis of infinite variance time series models, and in
particular, a self-weighted least absolute deviation (LAD)-based empirical
likelihood ratio test statistic is constructed. It is shown that the proposed
test statistic converges to a standard chi-square distribution although we
deal with infinite variance models. Therefore, we can carry out inference for
infinite variance processes without estimating any unknown quantities of the
underlying models. In other words, the proposed method is shown to simplify
the procedure of the test. The finite sample performance of the proposed test
is investigated by simulation experiments. It is observed that the proposed
test improves the power of the test compared with the classical LAD-based test.
- Takuji Arai (Keio):
Local risk-minimization for Barndorff-Nielsen and Shephard models
-
We obtain explicit representations of locally risk-minimizing strategies of
call and put options for the Barndorff-Nielsen and Shephard models, which are
Ornstein-Uhlenbeck-type stochastic volatility models. Moreover, some
numerical results will be presented.
- Atsushi Atsuji (Keio):
Default functions and Liouville type theorems
-
We encounter a default function when we ask if a local martingale is a true
martingale. These functions appear in several problems in probability theory
such as the theory of diffusion processes, mathematical finance, etc. In this
talk we give simple remarks showing that these functions also play important roles in
applications of stochastic calculus to geometric function theory, in
particular, some Liouville type theorems for subharmonic functions and
holomorphic maps.
- Prithwish Bhaumik (UT Austin):
Bayesian
high-dimensional quantile regression
-
We consider a Bayesian high-dimensional quantile regression problem with
diverging number of predictors or covariates. The error distribution of the
observations is assumed to be an asymmetric Laplace distribution which may be
different from the true error distribution. Sparse priors such as spike and
slab type of priors are imposed on the coefficients of the covariates and
inference is based on the posterior distribution. We prove a Bernstein-von
Mises theorem for the posterior distribution of the coefficients.
- Luis Carvalho (BU):
Bayesian Network
Regularized Regression for Crime Modeling
-
We present a new methodology for functional network regression on node
attributes using a Laplacian operator based on edge similarities as
regularizer. We show how usual regularization penalties can be cast as prior
distributions on regression coefficients under a Bayesian setup, and propose
a computationally efficient EM fitting procedure. We discuss a specific
application to modeling residential burglary in Boston using a hierarchical
model with latent indicators for "hot zones" and a conditional zero-inflated
negative binomial regression for crime rates. This is joint work with Liz
Upton.
- Minwoo Chae (UT Austin):
Bayesian Sparse Linear Regression with Unknown Symmetric Error
-
We study full Bayesian procedures for sparse linear regression when errors
have a symmetric but otherwise unknown distribution. The unknown error
distribution is endowed with a symmetrized Dirichlet process mixture of normal
prior. For the prior of regression coefficients, a mixture of point masses at
zero and Laplace distributions is considered. It is shown that the full
posterior distribution is consistent in the mean Hellinger distance. The
compatibility and restricted eigenvalue conditions yield the minimax
convergence rate of the regression coefficients in $\ell_1$- and
$\ell_2$-norms, respectively. In addition, the model selection consistency and
semiparametric Bernstein-von Mises theorem are proved under stronger conditions.
- Aleksandrina Goeva (BU):
Network Degree Distribution Inference Under Sampling
-
Networks are widely used to model the relationships among elements in a
system. Many empirical networks observed today can be viewed as samples of an
underlying network, for example, large-scale online social networks. Hence, it
is of fundamental interest to investigate the impact of the network sampling
mechanism on the quality of characteristics estimated from the sampled
network. We focus on the degree distribution as a fundamental feature. Under
many popular sampling designs, this problem can be stated as a linear inverse
problem characterized by an ill-conditioned matrix. This matrix relates the
expectation of the sampled degree distribution to the true underlying degree
distribution and depends entirely on the sampling design. We propose an
approximate solution for the degree distribution by regularizing the solution
of the ill-conditioned least squares problem corresponding to the naïve
estimator. We then study the rate at which the approximate solution tends to
the true solution as a function of network size and sampling rate. This
provides theoretical understanding of the accuracy of the approximate
solution, whose properties have previously been studied only numerically.
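A minimal sketch of the linear inverse problem described above, under one concrete design assumption of mine (Bernoulli node sampling; the talk covers other designs, and its regularizer may differ): an original degree j is observed as k ~ Binomial(j, p), so the expected sampled degree distribution is a known ill-conditioned matrix times the true one.

```python
import numpy as np
from scipy.stats import binom

# Under Bernoulli(p) sampling, P[k, j] = C(j, k) p^k (1-p)^(j-k) maps the
# true degree distribution to the expected sampled one. Invert with ridge.
p, K = 0.7, 20
J = np.arange(K + 1)
P = np.array([[binom.pmf(k, j, p) for j in J] for k in J])

theta_true = np.exp(-0.3 * J)
theta_true /= theta_true.sum()                 # synthetic true degree dist
observed = P @ theta_true                      # expected sampled degree dist

lam = 1e-4                                     # ridge penalty
theta = np.linalg.solve(P.T @ P + lam * np.eye(K + 1), P.T @ observed)
theta = np.clip(theta, 0.0, None)
theta /= theta.sum()                           # project back to a distribution
```

The ridge term stabilizes the small singular directions of P, which here correspond to the poorly observed high-degree tail.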
- Ryo Hayase (Keio):
Analysis of Glycan
Data using Non-negative Matrix Factorization
-
Glycans are crucial for many key biological processes and their
alterations are often a hallmark of disease. Glycans have been actively
studied as tumor markers in recent years. In this talk, we apply
non-negative matrix factorization (NMF) to glycan data to search for
tumor marker candidates for several types of cancers.
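On a synthetic stand-in for such data (matrix shapes and rank are my assumptions), the NMF step can be sketched as follows: factor a nonnegative samples-by-glycans abundance matrix into sample loadings W and glycan signatures H.

```python
import numpy as np
from sklearn.decomposition import NMF

# Illustrative NMF on a synthetic nonnegative abundance matrix.
rng = np.random.default_rng(4)
V = rng.gamma(shape=2.0, scale=1.0, size=(30, 12))  # 30 samples x 12 glycans

model = NMF(n_components=3, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)   # 30 x 3: weight of each signature per sample
H = model.components_        # 3 x 12: glycan signatures
V_hat = W @ H                # low-rank nonnegative reconstruction
```

Signatures whose loadings separate cancer from control samples would be the candidate markers in an analysis of this type.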
- Kenichi Hayashi (Keio):
Model evaluation
based on sensitivity and specificity
-
The focus of this talk is comparison of two regression models with a binary
response. Typical measures for this task are the difference of the areas under
the ROC curve (AUC) and the integrated discrimination improvement (IDI). We
discuss their problems and show that the IDI can be modified to have a
desirable property. This is joint work with Dr. Eguchi (ISM).
- Yukitake Ito (Keio):
Forecasting
Mortality Rates by Using Spatio-Temporal Data
-
The Lee-Carter model, proposed by Lee and Carter (1992), is a classical and
widely used model for forecasting mortality rates. In this presentation, we
propose an extended model for spatio-temporal data within the Lee-Carter
framework. We then forecast mortality rates including regional effects.
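For reference, the classical Lee-Carter specification models the logarithm of the central death rate $m_{x,t}$ at age $x$ in year $t$ as

```latex
\log m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t},
```

where $a_x$ is the average age profile of mortality, $b_x$ the age-specific sensitivity to the period index $k_t$ (typically forecast as a random walk with drift), and $\varepsilon_{x,t}$ an error term.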
- Ayato Kashiyama (Keio):
Annual Maximum Rainfall Analysis Using Extreme Value Theory
-
In recent years, natural disasters caused by extreme weather events have
occurred frequently. Extreme value theory aims at modeling maximum or minimum
data; in meteorological data, such maxima often correspond to the occurrence
of natural disasters. In this presentation, I will talk about extreme value
theory and show an analysis of annual maximum daily rainfall data from a
region of Japan.
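The standard block-maxima analysis behind a study like this can be sketched on synthetic data (the numbers below are my illustration, not the Japanese rainfall series): fit a generalized extreme value (GEV) distribution to the annual maxima and read off return levels.

```python
import numpy as np
from scipy.stats import genextreme

# Simulate 50 years of synthetic annual maximum daily rainfall (mm).
rng = np.random.default_rng(0)
annual_maxima = genextreme.rvs(c=-0.1, loc=100, scale=30, size=50,
                               random_state=rng)

# Maximum likelihood estimates of the GEV shape, location, and scale.
shape, loc, scale = genextreme.fit(annual_maxima)

# 100-year return level: the 0.99 quantile of the fitted distribution.
return_level_100 = genextreme.ppf(0.99, shape, loc=loc, scale=scale)
```

The return level is the quantity of practical interest for disaster planning: the rainfall amount exceeded on average once per 100 years under the fitted model.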
- Kei Kobayashi (Keio):
Statistical
analysis by tuning curvature of data spaces
-
For data points distributed on a connected manifold or a geodesic metric
space, the Fréchet mean is a natural generalization of the ordinary Euclidean
sample mean. However, uniqueness of the Fréchet mean depends on the curvature
of the space. In this talk, we first explain how the curvature of a data space
can play a role in data analysis. We then propose a class of transformations of
the metric for clustering and other statistical analyses that focus on the
curvature. This is joint work with Henry Wynn.
- Takaaki Koike (Keio):
Efficient Computation of Risk Contributions by using MCMC
-
In most financial institutions, the risk of a portfolio is measured by
economic capital. For more detailed risk analysis, it is necessary to
decompose the portfolio-wide economic capital into a sum of risk
contributions by unit exposures. Despite high practical demand, computing
the risk contributions is a challenging task in general, and no explicit
solutions are available for most risk models. In this talk, we will introduce
a Markov chain Monte Carlo (MCMC)-based estimator of risk contributions when
economic capital is computed by Value-at-Risk. We will demonstrate that the
estimator is applicable and performs well in a wide variety of risk models.
- Eric Kolaczyk (BU):
Dynamic causal networks with multi-scale temporal structure
-
I will discuss a novel method to model multivariate time series using dynamic
causal networks. This method combines traditional multi-scale modeling and
network based neighborhood selection, aiming at capturing the temporally local
structure of the data while maintaining the sparsity of the potential
interactions. Our multi-scale framework is based on recursive dyadic
partitioning, which recursively partitions the temporal axis into finer
intervals and allows us to detect local network structural changes at varying
temporal resolutions. The dynamic neighborhood selection is achieved through
penalized likelihood estimation, where the penalty seeks to limit the number
of neighbors used to model the data. Theoretical and numerical results
describing the performance of our method will be presented, as well as
applications in financial economics and neuroscience. This is joint work with
Xinyu Kang and Apratim Ganguly.
- Jun Li (BU):
Hypothesis Testing For
Multilayer Network Data
-
There is a trend to analyze large collections of networks, e.g., collections
of ego-centric subnetworks on Facebook. In recent work by our group, a formal
notion of a space of network Graph Laplacians has been introduced and a
central limit theorem has been developed based on it. Hypothesis testing is
then implemented. However, multilayer networks arise naturally in many
natural and engineered systems, e.g., in computational biology and
neuroscience. In this project, we considered two useful classes of
multilayer networks,
differing from each other in the form of their inter-layer
connection. Defining a corresponding space of supra-Laplacians for these
networks, we established the necessary geometry of this space and a central
limit theorem. These results then enabled us to develop tests of various
classes of hypotheses relevant to multilayer networks. Simulations were used
to illustrate performance of our approach.
Key words: Hypothesis Testing, Multilayer Network, Graph Laplacian,
Supra-Laplacian, Network Time Series
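One common construction of a supra-Laplacian (the specific coupling below is my assumption; the talk considers two classes of inter-layer connection) stacks the intra-layer Laplacians block-diagonally and couples the two copies of each node with weight omega:

```python
import numpy as np

# Two-layer supra-Laplacian: block-diagonal layer Laplacians plus uniform
# inter-layer coupling between copies of the same node (illustration only).
def laplacian(A):
    return np.diag(A.sum(axis=1)) - A

A1 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)  # layer 1 adjacency
A2 = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], float)  # layer 2 adjacency
omega, n = 1.0, A1.shape[0]                              # coupling weight

supra = np.block([
    [laplacian(A1) + omega * np.eye(n), -omega * np.eye(n)],
    [-omega * np.eye(n), laplacian(A2) + omega * np.eye(n)],
])
```

Like an ordinary graph Laplacian, the result is symmetric positive semidefinite with zero row sums, which is what makes a geometry on the space of such matrices possible.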
- David Lipshutz (Brown):
Differentiability of flows and sensitivity analysis of reflected Brownian
motions
-
Differentiability of flows and sensitivity analysis are classical topics in
dynamical systems. However, the analysis of these properties for constrained
processes, which arise in a variety of applications, is challenging due to the
discontinuous dynamics at the boundary of the domain, and is further
complicated when the boundary is non-smooth. We show that the study of both
differentiability of flows and sensitivities of constrained processes in
convex polyhedral domains can be largely reduced to the study of directional
derivatives of an associated map, called the Skorokhod map, and we introduce
an axiomatic framework to characterize these directional derivatives. In
addition, we establish pathwise differentiability of a large class of
reflected Brownian motions in convex polyhedral domains and show that they can
be described in terms of certain constrained stochastic differential equations
with time-varying domains and directions of reflection.
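In one dimension on the half-line, the Skorokhod map mentioned above has the classical closed form Gamma(psi)(t) = psi(t) + sup_{s <= t} max(-psi(s), 0): the minimal amount of "pushing" that keeps the path nonnegative. A short sketch:

```python
import numpy as np

# One-dimensional Skorokhod map on [0, inf): add the running minimal
# pushing term so the constrained path never goes below zero.
def skorokhod_map(psi):
    pushing = np.maximum.accumulate(np.maximum(-psi, 0.0))
    return psi + pushing

rng = np.random.default_rng(6)
increments = rng.normal(0.0, np.sqrt(1 / 1000), size=1000)
bm = np.concatenate([[0.0], np.cumsum(increments)])   # Brownian path on [0, 1]
reflected = skorokhod_map(bm)                         # reflected at zero
```

Applying the map to a Brownian path yields a reflected Brownian motion; the multidimensional polyhedral case studied in the talk is far more delicate, which is why directional derivatives of the map are characterized axiomatically there.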
- Matthew Morse (BU):
Bridging the Gap between Center and Tail for Multiscale Processes
-
Processes characterized by two (or more) separated time scales appear
naturally in many applications. Typical examples include protein folding,
financial engineering, neural networks, and climate modeling. The behavior of
the center of the probability distributions of these multiscale processes is
governed by the central limit theorem. The tail of the distribution is
governed by large deviations. We are interested in the gap between the center
and the tail of the distribution. In particular, we study moderate deviations
for multiscale diffusion processes. We derive the moderate deviations
principle for general models and present specific examples. Applications of
these results include the design of related provably efficient Monte Carlo methods.
- Daiki Nagata (Keio):
The Evaluation of
Catcher Framing using PITCHf/x data
-
The PITCHf/x system was developed by Sportvision Inc. and has been installed
in every MLB stadium since around 2006. In this talk we introduce catcher
framing techniques and the evaluation of framing using PITCHf/x data. We then
apply a logistic regression model with spline smoothing to PITCHf/x location
data, and extend this model to include pitchers', batters' and umpires'
contributions as random effects.
- Tomoshige Nakamura (Keio):
The Problem of Treating Imputed Data as Observed Data When We Estimate the
Effect of Exposure to Particulate Matter
-
When we estimate the effects of exposure to particulate matter, community
health survey data are often used. In such cases, the amount of exposure of
each subject needs to be available. However, in many cases the exposure of
some subjects cannot be observed, so estimated values computed by some method
are imputed, and the effects of particulate matter are analyzed as if the
imputed values were observed. In this talk, I will discuss problems with
estimating the effect of exposure to particulate matter using imputed values
as observed values, an issue that has received little attention from
environmental epidemiologists.
- Atsunobu Oishi (Keio):
Nonparametric
Estimation for Optimal Dividend Barrier based on Empirical Process
-
The dividend problem is an application of ruin theory for insurance
companies. We suppose that an insurance company pays out, as dividends to
shareholders, the part of the surplus that exceeds a barrier. In this
presentation, we introduce a nonparametric estimator of the optimal dividend
barrier based on empirical processes.
- Hiroyuki Oka (Keio):
Statistical Estimation
of High-Dimensional Portfolio
-
We introduce an estimator of Markowitz's mean-variance optimal portfolio from
a d × n data matrix in a high-dimensional setting, where d is the number of
assets and n is the sample size. When d/n converges to a constant in (0,1), we
show inconsistency of the traditional estimator and propose a consistent
estimator.
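The traditional plug-in estimator whose inconsistency is discussed above can be sketched as follows (dimensions and return model are my illustration, not the talk's setting): estimate the mean vector and covariance matrix from the sample and plug them into the Markowitz solution.

```python
import numpy as np

# Plug-in Markowitz weights from synthetic return data (illustration only).
rng = np.random.default_rng(2)
d, n = 50, 200                          # assets, sample size (d/n = 0.25)
returns = rng.normal(0.001, 0.02, size=(n, d))

mu_hat = returns.mean(axis=0)               # sample mean vector
sigma_hat = np.cov(returns, rowvar=False)   # sample covariance matrix

# Tangency-style weights Sigma^{-1} mu, normalized to sum to one.
raw = np.linalg.solve(sigma_hat, mu_hat)
weights = raw / raw.sum()
```

When d grows proportionally to n, the sample covariance is a poor estimate of its population counterpart, which is the source of the inconsistency the talk addresses.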
- Masayuki Sakai (Keio):
Analysis of
Groundwater Level at a River Without Water
-
Groundwater level, precipitation and other meteorological data have been
observed daily at a river without water in Tochigi prefecture. We analyze
this data to find a model to describe daily change of groundwater levels
using state space models.
- Michael Salins (BU):
Local time and null-recurrent averaging
-
We study a fast-slow system of stochastic differential equations where the
fast motion is null-recurrent. We show that a rescaled version of the slow
motion converges to a stochastic process that only moves when the fast motion
crosses zero. The process is nontrivial, but it is constant on a set of times
with full Lebesgue measure. For this reason, the limiting process cannot be
described in terms of standard SDEs but it can be characterized in terms of
the local time of the fast process.
- Hiroshi Shiraishi (Keio):
Nonparametric Estimation for Optimal Dividend Barrier based on Laplace
Transformation
-
Dividends are paid out of premium income whenever the insurance surplus
attains a barrier level. With the aggregate claims process taken to be a
compound Poisson model, the optimal dividend barrier is defined as the barrier
level that maximizes the expected discounted dividends until ruin. We derive
the optimal dividend barrier as the solution of an equation following Gerber
and Shiu (1997, 1998). We then consider nonparametric estimation based on the
empirical version of the Laplace transform.
- Kostas Spiliopoulos (BU):
Irreversible Langevin Samplers and Variance Reduction: A Large Deviations
Approach and Diffusion on Graphs
-
Monte Carlo methods are very popular methods to sample from high-dimensional
target distributions, which very often are of Gibbs type. Markov processes
that have the target distribution as their invariant measure are used to
approximate the equilibrium dynamics. In this talk, we explore performance
criteria based on the related large deviations theory for random measures and
we focus on the diffusion setting. We find that large deviations theory can
not only adequately characterize the efficiency of the approximations, but it
can also be used as a vehicle to design Markov processes, whose time average
optimally (in the sense of variance reduction) approximates the quantities of
interest. We quantify the effect that added irreversibility has in the speed
of convergence to a target Gibbs measure and to the asymptotic variance of the
resulting estimator. One of our main findings is that adding irreversibility
reduces the asymptotic variance of generic observables and we give an explicit
characterization of when observables do not see their variances reduced in
terms of a nonlinear Poisson equation. Connections to averaging problems for
Hamiltonian systems and diffusion graphs will be given. Theoretical results
are supplemented by simulations.
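A minimal sketch of the irreversibility idea above, with all parameters my own choices: for a Gibbs target exp(-U(x)), adding an antisymmetric perturbation gamma * J @ grad U to the reversible Langevin drift leaves the invariant measure unchanged while it can reduce the asymptotic variance of time averages.

```python
import numpy as np

# Euler-Maruyama discretization of dX = -(I + gamma*J) grad U(X) dt
# + sqrt(2) dW for U(x) = |x|^2 / 2, whose Gibbs measure is N(0, I).
rng = np.random.default_rng(7)
J = np.array([[0.0, 1.0], [-1.0, 0.0]])    # antisymmetric matrix
gamma, dt, n_steps = 2.0, 0.01, 100_000
x = np.zeros(2)
samples = np.empty((n_steps, 2))
for i in range(n_steps):
    grad_u = x                              # gradient of |x|^2 / 2
    drift = -(np.eye(2) + gamma * J) @ grad_u
    x = x + drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=2)
    samples[i] = x
```

The time average of any observable over `samples` approximates its expectation under the same standard normal target for every gamma; the talk's large deviations analysis quantifies how gamma affects the speed of convergence and the estimator's variance.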
- Ryoichi Suzuki (Keio):
Local risk-minimization for multidimensional Lévy markets
-
The locally risk-minimizing hedging strategy (LRM) is a well-known method for
hedging contingent claims in a quadratic sense. In this talk, we obtain an
explicit representation of the LRM in an incomplete financial market driven by
a multidimensional Lévy process by using Malliavin calculus. This setting is
motivated by real markets, in which investors sell an option and want to
replicate its payoff by trading many stocks.
- Hiroshi Takahashi (Tokyo Gakugei University):
Topics on multi-dimensional Brox's diffusions
-
In this talk, we give a survey of limiting behavior of diffusion processes in
random environments. The diffusion process was introduced by Brox as a
continuous time analogue of one-dimensional random walks in random
environments by Sinai, and many properties have been studied. In the
multi-dimensional case, though, there have been few results. After explaining
Brox's one-dimensional diffusion process, we present results concerning
recurrence and transience of multi-dimensional Brox's diffusion processes.
- Mengjie Wang (UT Austin):
Dynamic Community
Detection Using Dependent Latent Position Model
-
Community detection in network analysis has recently drawn increasing
attention in many areas. We perform dynamic community detection using a
dependent latent position model, introducing dependence among latent positions
across time. Clustering of nodes is done via clustering the corresponding
latent positions in a model-based framework. The link probability between each
pair of nodes is calculated through a logit link and the latent positions are
modeled using a dependent Dirichlet Process mixture model. Efficient MCMC
algorithms will be developed and applications are considered for both
simulated and real data sets.
- Miaoyan Wang (UPenn):
Higher-order tensors and their multi-mode flattenings
-
Higher-order tensors (also known as multi-way arrays) arise naturally in many
fields across science and engineering. Compared to matrices, tensors provide
a greater flexibility in describing data, but they entail higher
computational costs. Indeed, extending familiar matrix concepts such as SVDs
to tensors is non-trivial and the associated computational complexity has
proven to be NP-hard.
One common approach to mitigate this problem is to unfold (or flatten) the
tensor into a matrix and then apply classical methods developed for
matrices. I will explore several aspects of this general topic. In particular,
we establish general inequalities between the p-norms of any two tensor
unfoldings, in which each unfolding is in one-to-one correspondence with the
partition of {1,...,k}. For specially-structured tensors satisfying a
generalized definition of orthogonal decomposability, we prove that the
spectral norm remains invariant under specific subsets of unfolding
operations. Time permitting, I will describe a new orthogonal tensor
decomposition algorithm using two-mode flattening. The error bounds of the
eigen-pair estimators will be given, which can be viewed as an analogue of
Wedin's perturbation theorem for singular vectors of matrices.
- Michael Zhang (UT Austin):
Robust and Parallel Bayesian Model Selection
-
Effective and accurate model selection that takes into account model
uncertainty is an important but challenging problem in modern data
analysis. One of the major challenges is the computational burden required to
infer huge data sets which, in general, cannot be stored or processed on one
machine. Moreover, in many real data modeling scenarios we may encounter the
presence of outliers and contaminations that will damage the quality of our
model and variable selection. We can overcome both of these problems through a
simple ``divide and conquer'' strategy in which we divide the observations of
the full data set equally into subsets and perform inference and model
selection independently on each subset. After local subset inference, we can
aggregate the optimal subset model or aggregate the local model/variable
selection criteria to obtain a final model. We show that by aggregating with
the geometric median, we obtain results that are robust to outliers and
contamination of an unknown nature.
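The geometric-median aggregation step can be sketched as follows (the per-subset estimator and Weiszfeld iteration below are my illustration, not the authors' exact procedure): fit a simple estimator on each subset, then aggregate the subset estimates robustly.

```python
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-8):
    """Weiszfeld iteration for the geometric median of the rows of `points`."""
    z = points.mean(axis=0)
    for _ in range(n_iter):
        dists = np.linalg.norm(points - z, axis=1)
        w = 1.0 / np.maximum(dists, eps)       # inverse-distance weights
        z = (w[:, None] * points).sum(axis=0) / w.sum()
    return z

rng = np.random.default_rng(3)
beta_true = np.array([1.0, -2.0])
X = rng.normal(size=(1000, 2))
y = X @ beta_true + rng.normal(size=1000)
y[:20] += 50.0       # gross outliers contaminating part of the data

# Divide: OLS on each of 10 subsets. Conquer: geometric median of estimates.
subsets = np.array_split(np.arange(1000), 10)
betas = np.array([np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
                  for idx in subsets])
beta_agg = geometric_median(betas)
```

Because the geometric median is insensitive to a minority of wildly wrong subset estimates, the contaminated subset barely affects `beta_agg`, which is the robustness property the abstract describes.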
- Ting Zhang (BU):
An Introduction to Nonstationary Time Series Analysis
-
In this talk, we will provide a brief introduction to nonstationary time
series data, which appears quite frequently in many scientific
disciplines. We will cover a framework that can be used to study this type of
data, and sample one or more research problems that have been studied using
this framework. The talk is designed to be very friendly to graduate
students.