Research
My main research interests are in Bayesian inference for structured, often high-dimensional, discrete spaces, and in computational statistics.
- Bayesian Statistics: Statistical inference (point and interval estimation) on high-dimensional discrete spaces: characterization, algorithms, and applications. Centroid estimation.
- Objective Bayes: Variable selection from invariance-based priors.
- Computational Statistics: MCMC methods in discrete structures and constrained high-dimensional discrete spaces. Graphical models.
- Computational Biology: Bayesian statistical inference applied to sequence analysis, glyco-proteomics, genome-wide association studies (GWAS), and, more generally, systems biology.
- Networks: Community detection and inference in stochastic blockmodels. Network modeling, regression and regularization.
- Remote Sensing: Land cover classification and biomass assessment using satellite image data.
- Transportation Engineering: Origin-destination matrix estimation, link-count-based inference, traffic assignment.
Here is a long version of my CV.
Publications
I welcome any comments, reviews, critiques, or objections to these papers, especially those that are submitted or in preparation; please send them to my e-mail.
Recent and Selected Articles
- Wang, L. and Carvalho, L. E.,
Deviance Matrix Factorization,
Electronic Journal of Statistics, 17 (2), 3762–3810, 2023.
doi:10.1214/23-EJS2174.
- Reynolds, D. and Carvalho, L. E.,
A Latent Association Graph Model for Frequent Itemset Mining,
Computational Statistics and Data Analysis, 160, 107229, 2021.
doi:10.1016/j.csda.2021.107229.
- Pitombeira Neto, A. R., Loureiro, C. F. G., and Carvalho, L. E.,
A Dynamic Hierarchical Bayesian Model for the Estimation of Day-to-Day
Origin-Destination Flows in Transportation Networks,
Networks and Spatial Economics, 20, 499–527, 2020.
doi:10.1007/s11067-019-09490-5.
- Baccini, A., Walker, W., Carvalho, L. E., Farina, M., and Houghton, R. A.,
Response to Comment on "Tropical forests are a net carbon source based on
aboveground measurements of gain and loss",
Science, 363 (6423), eaat1205, 2019.
doi:10.1126/science.aat1205.
- Pitombeira Neto, A. R., Loureiro, C. F. G., and Carvalho, L. E.,
Bayesian Inference on Dynamic Linear Models of Day-to-Day
Origin-Destination Flows in Transportation Networks,
Urban Science, 2 (4), 117, 2018.
doi:10.3390/urbansci2040117.
- Klein, J., Carvalho, L. E., and Zaia, J.,
Application of Network Smoothing to Glycan LC-MS Profiling,
Bioinformatics, 34 (20), 3511–3518, 2018.
doi:10.1093/bioinformatics/bty397.
- Glanz, H. and Carvalho, L. E.,
An Expectation-Maximization Algorithm for the Matrix Normal Distribution
with an Application in Remote Sensing,
Journal of Multivariate Analysis, 167, 31–48, 2017.
doi:10.1016/j.jmva.2018.03.010.
- Baccini, A., Walker, W., Carvalho, L. E., Farina, M., Sulla-Menashe, D., and Houghton, R. A.,
Tropical Forests Are a Net Carbon Source Based on New Measurements of Gain and Loss,
Science, 358 (6360), 230–234, 2017.
doi:10.1126/science.aam5962.
- Johnston, I., Hancock, T., Mamitsuka, H., and Carvalho, L. E.,
Gene-Proximity Models for Genome-Wide Association Studies,
Annals of Applied Statistics, 10 (3), 1217–1244, 2016.
doi:10.1214/16-AOAS907.
- Peng, L. and Carvalho, L. E.,
Bayesian Degree-Corrected Stochastic Blockmodels for Community Detection,
Electronic Journal of Statistics, 10 (2), 2746–2779, 2016.
doi:10.1214/16-EJS1163.
- Pham, L. M., Carvalho, L. E., Schaus, S., and Kolaczyk, E. D.,
Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian Hierarchical Approach,
Journal of the American Statistical Association, 111 (513), 73–92, 2015.
doi:10.1080/01621459.2015.1110523.
Submitted
- Ahelegbey, D. F., Carvalho, L. E., and Kolaczyk, E., A Bayesian Covariance Graphical and Latent Position Model for Multivariate Financial Time Series. [ ArXiv ]
- Upton, E. and Carvalho, L. E., Bayesian Network Regularized Regression for Modeling Urban Crime Occurrences. [ ArXiv ]
Teaching
Fall 2022: Statistical Practicum 1 (MA 675) and Applied Statistical Modeling (MA 678).
Spring 2023: Statistical Practicum 2 (MA 676) and Data Science with R (MA 415/615).
Previous courses: Basic Statistics and Probability (MA 213), Applied Statistics (MA 214), Data Science in R (MA 415/615), Linear Models (MA 575), Generalized Linear Models (MA 576), Bayesian Statistics (MA 578), Computational Statistics (MA 589), Statistical Machine Learning (MA 751).
Students
Current
- Dan Cunha (PhD)
- Likun Chou (PhD)
- Man Huang (PhD)
Past
- Liang Wang (PhD 2023, currently Senior Data Scientist, Fidelity)
- Ryan Frost (PhD 2022, currently Applied Research Scientist, Ethos)
- David Reynolds (PhD 2021, currently Lecturer, Statistics, University of New Hampshire)
- Elizabeth Upton (PhD 2019, currently Assistant Professor, Statistics, Williams College)
- Ian Johnston (PhD 2015, currently VP Data Scientist Manager, Morgan Stanley)
- Lijun Peng (PhD 2015, currently Senior Staff Engineer, LinkedIn)
- Hunter Glanz (PhD 2014, currently Associate Professor, Statistics, Cal Poly San Luis Obispo)
Software
Here are my R packages:
- dmf implements deviance matrix factorizations, a generalization of the singular value decomposition to data matrices that follow exponential family distributions.
- gaussquadr is a simple wrapper around code in Netlib for Gauss-Legendre quadratures.
I am very fond of a powerful, fast, light scripting language called Lua:
- Numeric Lua is a numerical package for the Lua programming language. It includes support for complex numbers, multidimensional matrices, random number generation, and special functions.
- Simulua is a discrete-event simulation library for Lua, in the tradition and flavor of the SIMULA family of programming languages.
I have also developed a few extensions to PostgreSQL:
- PL/Lua is an implementation of Lua as a loadable procedural language for PostgreSQL: with PL/Lua you can use PostgreSQL functions and triggers written in the Lua programming language.
- PostBio is a set of bioinformatics extensions for PostgreSQL.
- PostStat is a set of statistics extensions for PostgreSQL.