Statistics and Probability Seminar Series -- Spring 2008
Thursday 4:00-5:00pm, Room MCS 149
(Tea served from 3:30-4:00pm, Room MCS 153)
January 24, 3:30-4:30pm, Room MCS B46
Department of Statistics,
North Carolina State University
Attachment loss (AL), the distance down a tooth's root that is no longer attached to surrounding bone by periodontal ligament, is a common measure of periodontal disease. In this paper, we develop a spatiotemporal model to monitor progression of AL. Our model is an extension of the conditionally autoregressive (CAR) prior, which spatially smooths estimates towards their neighbors. However, since AL often exhibits bursts of large values in space and time, we develop a non-stationary spatiotemporal CAR model that allows the degree of spatial and temporal smoothing to vary in different regions of the mouth. To do this, we assign each AL measurement site its own set of variance parameters and spatially smooth the variances with spatial priors. We propose a heuristic to measure the complexity of the site-specific variances, and use it to select priors that ensure parameters in the model are well-identified. In data from a clinical trial, this model improves the fit compared to the usual dynamic CAR model for 90 of 99 patients' AL measurements.
January 24, 4:30-5:30pm (Note unusual Time), JOINT TALK WITH CBD
Department of Mathematics,
University of Maryland
This work focuses on determining the response dynamics of the Leaky Integrate-and-Fire model (LIF). The LIF is the simplest neuron model that captures the essential properties of neuronal signaling: integration of inputs by a leaky, capacitive cell membrane, a voltage threshold leading to the generation of a stereotyped action potential, and a subsequent re-polarization of the voltage. As a first step toward characterizing the response dynamics, we compare the firing rate response of the LIF to modulations in the mean of the input and to modulations in the variance of the input, and make this comparison for a range of baseline mean and variance levels that span the two basic regimes of LIF behavior. When synapses are instantaneous, we find that the response properties for changes in the variance are quite different from those for changes in the mean. Additionally, the filtering properties of the model depend strongly on which input parameter is perturbed, as well as on the underlying regime of firing behavior. Finally, many of the differences between responses to perturbations in the variance and in the mean can be understood by noting that the ensemble firing rate depends on a multiplicative, and hence non-linear, interaction between separate underlying factors.
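The model described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the speaker's code: it assumes a standard dimensionless form of the LIF driven by white noise, and the function name `simulate_lif` and all parameter values (time constant, threshold, reset, step size) are hypothetical choices made for the example.

```python
import math
import random


def simulate_lif(mu, sigma, tau=0.02, v_th=1.0, v_reset=0.0,
                 dt=1e-4, t_max=5.0, seed=0):
    """Euler-Maruyama simulation of a leaky integrate-and-fire neuron.

    Assumed dynamics (a common textbook form, not taken from the talk):
        tau * dV = (mu - V) dt + sigma * sqrt(tau) dW
    A spike is counted when V reaches v_th, after which V is set to v_reset.
    Returns the mean firing rate in spikes per unit time.
    """
    rng = random.Random(seed)
    v = v_reset
    spikes = 0
    n_steps = int(t_max / dt)
    # Standard deviation of the noise increment over one time step.
    noise_scale = sigma * math.sqrt(dt / tau)
    for _ in range(n_steps):
        v += (mu - v) * dt / tau + noise_scale * rng.gauss(0.0, 1.0)
        if v >= v_th:
            spikes += 1
            v = v_reset
    return spikes / t_max


# Mean-driven regime: the mean input alone exceeds threshold.
rate_mean_driven = simulate_lif(mu=1.5, sigma=0.0)
# Fluctuation-driven regime: the mean is subthreshold and spikes need noise.
rate_noise_driven = simulate_lif(mu=0.5, sigma=1.0)
```

Varying `mu` with `sigma` fixed, or `sigma` with `mu` fixed, around a chosen baseline gives a crude numerical view of the two kinds of input modulation that the abstract compares.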
January 28, (Mon) 2-3pm, Room MCS B46 (Note unusual Time and Room)
Department of Statistics and Operations Research,
University of North Carolina
Stochastic processing networks arise commonly from applications in computers, telecommunications, and large manufacturing systems. Study of stability and control for such networks is an active and important area of research. In general the networks are too complex for direct analysis and therefore one seeks tractable approximate models. Heavy traffic limit theory yields one of the most useful collections of such approximate models. Typical results in the theory say that, when the network processing resources are roughly balanced with the system load, one can approximate such systems by suitable diffusion processes that are constrained to live within certain polyhedral domains (e.g., positive orthants). Stability and control problems for such diffusion models are easier to analyze and, once these are resolved, one can then infer stability properties and construct good control policies for the original physical networks. In this talk I will consider three related problems concerning stability and long time control for such networks and their diffusion approximations.
In the first part of the talk I will present results on long time asymptotic properties, in particular geometric ergodicity, for limit diffusion models obtained from heavy traffic analysis of stochastic networks. The results will address the rate of convergence to steady state, moment estimates for steady state, uniform in time moment estimates for the process and central limit type results for time averages of such processes. In the second part of the talk I will consider invariant distributions of an important subclass of stochastic networks, namely the generalized Jackson networks (GJN). It is shown that, under natural stability and heavy traffic conditions, the invariant distributions of GJN converge to the unique invariant probability distribution of the corresponding constrained diffusion model. The result leads to natural methodologies for approximation and simulation of steady state behavior of such networks. In the final part of the talk I will consider a rate control problem for stochastic processing networks with an ergodic cost criterion. It is shown that value functions and near optimal controls for limit diffusion models serve as good approximations for the same quantities for certain physical networks that are heavily loaded.
Department of Mathematics and
Department of Statistics,
Algebraic statistics advocates algebraic geometry as a useful language for discussing statistical and probabilistic problems. The starting point is the observation that many statistical models are described by algebraic constraints or parametrizations. I will try to illustrate this connection with some examples including Gaussian conditional independence models, log-linear models, and phylogenetic models.
February 25 (MONDAY) 10:00am-12:00pm
Department of Mathematics and Statistics,
Financial data are often assumed to be generated by diffusions. Using recent results of Fan et al. and a multiple comparisons procedure created by Benjamini and Hochberg, we develop a test for non-stationarity of a one-dimensional diffusion based on the time inhomogeneity of the diffusion function. Time homogeneity of the diffusion function does not necessarily imply stationarity of the underlying diffusion, a fact that is illustrated by Brownian motion. But time-inhomogeneity implies non-stationarity of the diffusion, and we use our test to infer non-stationarity when the test indicates the presence of a time-inhomogeneous diffusion function. The procedure uses a single sample path of the diffusion and involves two estimators, one temporal and one spatial. Since the procedure is based on multiple hypothesis tests, it has the advantage of indicating the degree to which the diffusion function is time-inhomogeneous.
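The multiple comparisons step mentioned above can be illustrated with a short sketch of the generic Benjamini-Hochberg step-up procedure. This is not the authors' test: how the p-values are obtained from the temporal and spatial estimators is not described in the abstract, so the inputs below are hypothetical, as is the function name `benjamini_hochberg`.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected while controlling the false discovery rate at level alpha
    (under the usual independence assumption on the tests).
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per time window of a sample path.
decisions = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.05)
n_rejected = sum(decisions)
```

In a setting like the one described, the number of rejections is one natural way to read off the degree of time-inhomogeneity that the abstract refers to, although the exact summary used in the talk is not stated here.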
We first apply the test to simulated data. We consider sample paths of the Ornstein-Uhlenbeck process, of a stationary diffusion with non-constant diffusion function, of a non-stationary diffusion, of the Cox-Ingersoll-Ross process, and of standard Brownian motion. The test works as expected on the sample paths of the simulated diffusions. After supplying some background information on the theory of purchasing power parity (and, in particular, the difference between absolute and relative purchasing power parity), we will apply our test to both interest rate data and real exchange rate data. The application to real exchange rate data is of particular interest, since a consequence of the law of one price (or relative purchasing power parity) is that real exchange rates should be stationary. With the exception of the GBP/USD real exchange rate, we find evidence that interest rates and real exchange rates are generally non-stationary.
These results are important for practitioners who implicitly use the stationarity assumption to estimate the drift and diffusion functions. Moreover, these results are important for economists and policy-makers who, under the assumption of stationarity, make decisions about foreign aid allocations and exchange rate regimes.
This is joint work with M.S. Taqqu.
March 17 (MONDAY)
Department of Statistics,
Carnegie Mellon University
One of the most important techniques in learning about the functioning of the brain has involved examining neuronal activity in laboratory animals under varying experimental conditions. Neural information is represented and communicated through series of action potentials, or spike trains, and the central scientific issue in many studies concerns the physiological significance that should be attached to a particular neuron firing pattern in a particular part of the brain. In addition, a major relatively new effort in neurophysiology involves the use of multielectrode recording, in which responses from dozens of neurons are recorded simultaneously. Among other things, this has made possible the construction of brain-controlled robotic devices, which could benefit people whose movement has been severely impaired.
A key statistical step is to formalize specific scientific questions in terms of point process intensity functions. In my talk I will very briefly outline some of the substantive problems my colleagues and I have examined, the progress that's been made, and the challenge of dealing with high-dimensionality of data sets.
Computer Science Department,
Carnegie Mellon University
Recent research has demonstrated that sparsity is a powerful technique in signal reconstruction and in statistical inference. Recent work shows that l1-regularized least squares regression can accurately estimate a sparse model from n noisy samples in p dimensions, even if p is much larger than n. My work in this area focuses on studying the role of sparsity in high-dimensional regression when the original noisy samples are compressed, and on structure estimation in Gaussian graphical models when the graphs evolve over time. In high-dimensional regression, the sparse object is a vector b in Y = X b + e, where X is an n by p matrix with n << p, b is in R^p, and e in R^n consists of i.i.d. random noise. In the classical setting, this problem is ill-posed for p > n even when e = 0. However, when the vector b is sparse, one can recover an estimate b hat whose support is consistent with that of the true b. In joint work with John Lafferty and Larry Wasserman, we studied the regression problem under the setting that the original n input variables are compressed by a random Gaussian ensemble to m examples in p dimensions, where m << n or p. A primary motivation for this compression procedure is to anonymize the data and preserve privacy by revealing little information about the original data. We established sufficient mutual incoherence conditions on X, under which a sparse linear model can be successfully recovered from the compressed data. We characterized the number of random projections that are required for l1-regularized compressed regression to identify the nonzero coefficients in the true model with probability approaching one. In addition, we showed that l1-regularized compressed regression asymptotically predicts as well as an oracle linear model, a property called "persistence". Finally, we established upper bounds on the mutual information between the compressed and uncompressed data that decay to zero.
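The compression and regression steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the projection matrix is taken to have i.i.d. N(0, 1/n) entries (the abstract says only "random Gaussian ensemble"), the l1-regularized fit uses plain coordinate descent, and the names `compress`, `lasso_cd`, and `soft_threshold` are hypothetical.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; the scalar solution of the l1 problem."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes b_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def compress(X, y, m, seed=0):
    """Project the n rows of X and y down to m rows with a Gaussian matrix.

    Assumption: entries of the m x n projection are i.i.d. N(0, 1/n).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    scale = n ** -0.5
    phi = [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]
    Z = [[sum(phi[r][i] * X[i][j] for i in range(n)) for j in range(p)]
         for r in range(m)]
    w = [sum(phi[r][i] * y[i] for i in range(n)) for r in range(m)]
    return Z, w
```

A compressed fit is then `Z, w = compress(X, y, m)` followed by `lasso_cd(Z, w, lam)`, so that only the projected data `Z` and `w` are ever passed to the regression step. The conditions under which the support of b is recovered are the subject of the talk and are not reproduced by this sketch.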
March 24 (Monday 2pm)
Department of Statistics,
University of Michigan
Max-stable stochastic processes arise in the limit of component-wise maxima of independent processes, under appropriate centering and normalization. In this talk, various representations of max-stable processes will be discussed. Then, in terms of these "spectral" representations, necessary and sufficient conditions for the ergodicity and mixing of stationary max-stable processes will be presented.
The large classes of moving maxima and mixed moving maxima processes are shown to be mixing. Other examples of ergodic doubly stochastic processes and non-ergodic processes will be given. The developed ergodicity and mixing conditions involve a certain measure of dependence. We will address the statistical problem of estimating this measure of dependence and discuss some open problems.
March 27 (Joint Seminar with BU Biostatistics)
Department of Biostatistics,
M. D. Anderson Cancer Center
We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. A prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM) and define an extension of the PPM that includes the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster.
We discuss implementations suitable for continuous, categorical, count and ordinal covariates.
April 7, Rm 135, 10am-12pm. (Note unusual Time and Room)
Department of Mathematics and Statistics
Let X(t), t = 0, ±1, ..., be a second-order stationary random sequence with spectral density function f(λ), λ in [-π, π]. Denote by σ_n^2(f) the best linear mean square one-step prediction error in predicting the random variable X(0) from the past of X(t) of length n, and let σ^2(f) = σ_∞^2(f) be the prediction error based on the entire past. Szegö's classical "weak" theorem states that the relative prediction error δ_n(f) = σ_n^2(f) - σ^2(f) is nonnegative and tends to zero as n → ∞.
In this talk we will present some (old and new) results that describe the rate of decrease of the relative prediction error δ_n(f) to zero as n → ∞, depending on the dependence structure of the underlying process X(t) and the smoothness properties of its spectral density function f(λ).
We will also discuss the inverse problem: for a given rate of decrease of the relative prediction error δ_n(f) to zero, describe the processes X(t) compatible with that rate, that is, specify the dependence structure of X(t) and the smoothness properties of its spectral density f(λ).
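The finite-past prediction errors discussed above can be computed numerically from the autocovariances with the Levinson-Durbin recursion. The sketch below is an illustration only: the function name `prediction_errors` is hypothetical, and the MA(1) sequence X(t) = e(t) + 0.5 e(t-1) with unit innovation variance is an example chosen here, not one taken from the talk. For that sequence the error based on the entire past equals 1, so the relative prediction error is the finite-past error minus 1.

```python
def prediction_errors(r, n_max):
    """Levinson-Durbin recursion for one-step linear prediction errors.

    r: autocovariances r[0], r[1], ..., r[n_max] of a stationary sequence.
    Returns [sigma_0^2, ..., sigma_{n_max}^2], where sigma_n^2 is the mean
    square error of the best linear predictor based on the n most recent
    past values (sigma_0^2 is simply the variance r[0]).
    """
    errors = [r[0]]
    a = []  # a[j - 1] is the predictor coefficient on the value j steps back
    for k in range(1, n_max + 1):
        acc = r[k] - sum(a[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / errors[-1]  # reflection (partial autocorrelation) coefficient
        a = [a[j] - kappa * a[k - 2 - j] for j in range(k - 1)] + [kappa]
        errors.append(errors[-1] * (1.0 - kappa * kappa))
    return errors


# Illustrative MA(1) example with theta = 0.5 and unit innovation variance:
# r(0) = 1 + theta^2, r(1) = theta, r(k) = 0 for k >= 2.
theta = 0.5
r = [1.0 + theta ** 2, theta] + [0.0] * 9
errors = prediction_errors(r, 10)
deltas = [e - 1.0 for e in errors]  # relative prediction errors
```

In this example the relative errors shrink geometrically in n, which is one of the simplest rates of decrease; the talk concerns how the rate depends on the dependence structure and on the smoothness of the spectral density in general.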
Department of Mathematics and Statistics,
University of Massachusetts, Amherst
MCMC samplers often have difficulty when the target distribution is multimodal or has sharp ridges. This talk will show how the difficulties may be ameliorated by keeping two distinct copies of the state vector -- a multiset -- instead of just one.
April 14, Rm 135, 10am-12pm. (Note unusual Time and Room)
Faculty of Industrial Engineering and Management and
Faculty of Electrical Engineering
Technion-Israel Institute of Technology
The three basic results of classical, Euclidean, Integral Geometry are the Kinematic Fundamental Formula, Crofton's Formula, and Steiner's (Weyl's) Formula.
After describing these results and their importance, I will describe new versions of them in Gauss space and in Gaussian function space, as well as touching briefly on some of the applications of the new results.
This is joint work with Jonathan Taylor.
Department of Statistics,
In a regression or classification problem, one often has many potential predictors (independent variables), and these predictors may interact with each other to exert non-additive effects. I will present a Bayesian approach to search for these interactions. We were motivated by the epistasis detection problem in population-based genetic association studies, i.e., the problem of detecting interactions among multiple genetic defects (mutations) that may be causal to a specific complex disease. Existing methods are either of low power or computationally infeasible when facing a large number of genetic markers, and sometimes also many quantitative traits. Aided by MCMC sampling techniques, our Bayesian method can efficiently detect interactions among many thousands of markers. We will discuss how to extend this method to deal with general classification problems. This can be viewed as an extension of the naive Bayes method.
Department of Statistics,
Data naturally represented in the form of a network, such as social and information networks, are being encountered increasingly often and have led to the development of new generative models (such as exponential random graphs and power law mechanisms) to attempt to explain the observed structure. Since it is usually prohibitively expensive to observe the entire network, sampling within the network is needed, using schemes such as snowball sampling. We will discuss strategies for both generation and sampling of networks (via importance sampling, MCMC, and iteratively), and how the two problems relate.
April 28, Rm 135, 10am-12pm. (Note unusual Time and Room)
Department of Math & Statistics,
If X is a Gaussian process, the diffusion equation characterizes its marginal probability density function. How about finite-dimensional distributions? For each n >= 1, we derive a system of partial differential equations which are satisfied by the probability density function of the vector (X(t1),...,X(tn)). We then show that these differential equations determine uniquely the finite-dimensional distributions of Gaussian processes. We also discuss situations where the system can be replaced by a single equation, which is either one member of the system, or an aggregate equation obtained by summing all the equations in the system.
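The opening claim about the marginal density can be checked directly in the simplest case. The following worked equation is a sketch under assumptions not stated in the abstract: X(t) has mean zero and a differentiable variance function, written v(t) here.

```latex
\begin{aligned}
p(t, x) &= \frac{1}{\sqrt{2\pi v(t)}}
           \exp\!\left(-\frac{x^{2}}{2 v(t)}\right),
\qquad v(t) = \operatorname{Var} X(t), \\[4pt]
\frac{\partial p}{\partial v}
  &= p \left( \frac{x^{2}}{2 v^{2}} - \frac{1}{2 v} \right)
   = \frac{1}{2}\,\frac{\partial^{2} p}{\partial x^{2}}, \\[4pt]
\frac{\partial p}{\partial t}
  &= \frac{v'(t)}{2}\,\frac{\partial^{2} p}{\partial x^{2}} .
\end{aligned}
```

For standard Brownian motion, v(t) = t and the last line reduces to the classical heat equation. The systems of equations for the joint densities of (X(t1),...,X(tn)) are the subject of the talk and are not derived here.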
Community Health and Center for Statistical Sciences,
In many neuroscience experiments, one of the key goals is to investigate the oscillatory behavior of brain signals as quantified by spectral analysis. First, we review some basic ideas of Fourier analysis of stationary time series and highlight its connection to analysis of variance. Second, we discuss current models and methods for analyzing non-stationary processes (i.e., processes whose spectral decomposition changes over time). Stochastic representations using localized basis functions will be discussed. The talk will conclude with some current investigations including spatio-temporal-spectral analysis and classification of biological signals. These methods will be illustrated using electroencephalograms (EEGs) and magnetoencephalograms (MEGs).