## Probability and Statistics Seminar at Boston University: Seminar on Scaling Phenomena

Day: Tuesdays and/or Thursdays

Time: 10am-noon

Place: Room 135, Department of Mathematics, 111 Cummington St., Boston University

### Schedule for 1999-2000 (updated weekly)

This is a research-oriented seminar, coordinated by Professor Murad Taqqu. The main theme this year will be Scaling Phenomena, such as self-similarity, long-range dependence, wavelets, multifractals and applications to telecommunications. We will also have talks on other subjects, in particular Mathematical Finance.

This seminar has now become a regular feature of Boston University and is also attended by mathematicians, scientists and postdoctoral fellows in the greater Boston area. Announcements will be made by email and through my Web page.

There is a recommended text (for some of the lectures):

S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, 1998.

### FALL SEMESTER 1999

This semester the talks will usually be on Tuesdays (10-12). The first talk will be on September 28.

Tuesday, September 28, 1999:

Statistical self-similarity

In this introductory lecture, we start with basic concepts, and introduce (statistical) self-similarity and Fractional Brownian Motion (FBM).

No background knowledge is presupposed.

Tuesday, October 5, 1999:

Fractional Brownian Motion

We describe properties of Fractional Brownian Motion (FBM) and its increments, called fractional Gaussian noise (FGN).

It is not too late to join.
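As a small illustration of the relationship between FBM and FGN described for this talk, the sketch below uses the standard FGN autocovariance $\gamma(k) = \frac{\sigma^2}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$ and builds an FBM path by cumulatively summing simulated FGN. The function names, the Hurst parameter value, and the use of a Cholesky factorization (simple, but only practical for short paths) are assumptions made for illustration and are not taken from the lecture.

```python
import numpy as np

def fgn_autocovariance(lags, hurst, sigma2=1.0):
    """Autocovariance of fractional Gaussian noise at the given integer lags."""
    k = np.abs(np.asarray(lags, dtype=float))
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** two_h - 2 * k ** two_h + np.abs(k - 1) ** two_h)

def simulate_fbm(n, hurst, rng):
    """Simulate FBM at times 0, 1, ..., n by summing n steps of FGN.

    Uses an O(n^3) Cholesky factorization of the FGN covariance matrix,
    so this is a sketch for short paths only.
    """
    lags = np.arange(n)
    gamma = fgn_autocovariance(lags, hurst)
    cov = gamma[np.abs(lags[:, None] - lags[None, :])]  # Toeplitz covariance
    fgn = np.linalg.cholesky(cov) @ rng.standard_normal(n)
    return np.concatenate(([0.0], np.cumsum(fgn)))      # FBM starts at 0

# Hypothetical usage: a 256-step path with Hurst parameter 0.7.
rng = np.random.default_rng(0)
path = simulate_fbm(256, hurst=0.7, rng=rng)
```

For $H = 1/2$ the autocovariance vanishes at every nonzero lag, so the FGN reduces to white noise and the summed path to ordinary Brownian motion sampled at integer times.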

Tuesday, October 12, 1999:

Fractional Gaussian Noise

We describe properties of fractional Gaussian noise (FGN) and talk about the renormalization group.

Thursday, October 14, 1999 (special day):

Estimation using wavelets

We introduce wavelet-based estimation ideas.

No talk on Tuesday Oct 19.

Tuesday, October 26, 1999

Scale Invariance and Wavelets : a Review

Patrice Abry (CNRS - Physics Lab - Ecole Normale Superieure de Lyon - France)

* Scale Invariance

Self-similarity -- Long range dependence -- $1/f$ processes -- Fractal processes (and multifractal processes)

* Wavelet Transform

Definition -- Redundancy and discrete wavelet transform -- Multiresolution analysis and fast algorithm

* Wavelets and Scale Invariance

Wavelets and self similarity -- Wavelets and long range dependence -- Wavelets and $1/f$ processes -- Wavelets and (multi)fractal -- Wavelets and scale invariance

* Analysis and Estimation: The Logscale Diagram

The logscale diagram -- Estimation of the scaling parameter -- Statistical performance of the estimator -- Robustness against trends -- Comparison against other estimators

* Variations on a Theme

Joint estimation of scaling parameters -- Multifractional processes -- Multifractal processes -- $\alpha$-stable self similar processes

* Relations to other Tools

The Allan variance -- The aggregation procedure -- The Fano factor and point processes

* Applications to network traffic

Ethernet data -- Various models -- Robustness -- A single scaling parameter?

* Internet data

The biscaling regime -- Multifractality?

* Constancy of scaling?
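The logscale diagram item in the outline above can be sketched in a few lines. The example below is a minimal illustration, not the estimator presented in the talk: it assumes a hand-rolled orthonormal Haar wavelet, plots nothing, and fits an unweighted least-squares line to $\log_2$ of the mean squared detail coefficient against the octave $j$. The function names and the choice of octave range are hypothetical; a practical estimator would typically use weighted regression and wavelets with more vanishing moments.

```python
import numpy as np

def haar_logscale_diagram(x, max_octave):
    """Return (octaves, log2 of the mean squared Haar detail coefficient)."""
    approx = np.asarray(x, dtype=float)
    octaves, log_energy = [], []
    for j in range(1, max_octave + 1):
        if approx.size < 2:
            break
        approx = approx[: approx.size - approx.size % 2]  # drop an odd trailing sample
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # Haar detail coefficients at octave j
        approx = (even + odd) / np.sqrt(2.0)   # Haar approximation passed to octave j + 1
        octaves.append(j)
        log_energy.append(np.log2(np.mean(detail ** 2)))
    return np.array(octaves), np.array(log_energy)

def scaling_exponent(x, j_min, j_max):
    """Slope of the logscale diagram over octaves j_min..j_max (unweighted fit)."""
    octaves, log_energy = haar_logscale_diagram(x, j_max)
    keep = octaves >= j_min
    slope, _intercept = np.polyfit(octaves[keep], log_energy[keep], 1)
    return slope

# Hypothetical usage on synthetic data.
rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 14)
slope_noise = scaling_exponent(noise, 1, 8)
slope_walk = scaling_exponent(np.cumsum(noise), 3, 8)
```

For white noise the fitted slope should be close to 0, and for a random walk (ordinary Brownian motion sampled at integer times) close to 2; intermediate and larger slopes are what the scaling analyses in the outline are designed to detect.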

Tuesday, October 26, 1999 (Special time and place)

Patrick Flandrin (CNRS - Physics Lab - Ecole Normale Superieure de Lyon - France)

TIME: 4pm - 5:30 pm (tea at 3:30 in room 153)

PLACE: Room 149 (note special room)

We will introduce Wigner-type time-scale energy distributions, affine class, scale-invariant time-dependent spectra, and discuss their potential usefulness in scaling data analysis.

Thursday, October 28, 1999 (special day)

Wavelet Based Spectral Analysis of Discrete Time Series

Darryl Veitch (SERC, Carlton, Australia)

Strictly speaking, wavelet theory is concerned with the analysis of continuous-time processes or functions only. Some recent work will be presented showing how the spectral properties of discrete time series can nonetheless be rigorously studied through the wavelet framework. Numerical examples will be given and practical implementation issues will be discussed.

Tuesday, November 2, 1999

Modelling Search Behavior on the Internet: Caching Product Data and Search Profitability

Aviva Lev-Ari (Perotsystems)

Search for "Information" and Search for "Product/Service Data" are, after e-mail, the most fundamental activities currently conducted on the Internet. Our selected topic is Modeling Search Behavior for Product Data available in on-line product catalogues.

Why Model Search Behavior?

Models of Search Behavior are derived from data collected about "on-line customers" who conduct searches on Corporate Web sites and on Digital Marketplaces (DM). They are intended for applied use: they yield the rules and guidelines required to improve areas of corporate operations such as marketing, customer service, sales, process improvement, fraud detection, product development, and product segmentation. Among the types of customer-related data, Search Behavior is the most salient in its intrinsic potential for planning and predicting business activity on-line.

Search Behavior Modeling involves measurement and estimation of the following search-related parameters:

* The probability of searching for Product 'A' given that the user previously searched for Product 'B'

* The probability of buying Product 'A' given that the user searched for Product 'A'

* Six Prediction scenarios for "search-buy" seasonality, cyclicality, unexpected demand and existing vs. new customers

* Average Search Time for a Product searched if this Product data is or is not in Cache

* Sales in dollars generated by a Single Search by a Customer for a Product

* Average Sales in dollars per "Log on" Session

* Modeling Cache Assignment Profitability: optimal vs. random

The range of technological functionality of these algorithms extends to information technology system management domains: caching customer and product data to improve search response time, network caching to improve network performance, system resource management, and system scaling.
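The first two parameters in the list above are conditional probabilities, which can be estimated from session logs by simple counting. The sketch below is only an illustration under stated assumptions: the log format (each session an ordered list of `(action, product)` pairs), the function name, and the toy sessions are all hypothetical and are not taken from the talk.

```python
def conditional_probability(sessions, event, given):
    """Estimate P(event occurs later in a session | `given` occurred in it).

    Each session is an ordered list of (action, product) tuples.
    Returns None when no session contains `given`.
    """
    n_given = 0
    n_both = 0
    for events in sessions:
        if given in events:
            n_given += 1
            first = events.index(given)          # first occurrence of the condition
            if event in events[first + 1:]:      # event must come afterwards
                n_both += 1
    return n_both / n_given if n_given else None

# Hypothetical toy log with four sessions.
sessions = [
    [("search", "B"), ("search", "A"), ("buy", "A")],
    [("search", "A")],
    [("search", "B"), ("buy", "B")],
    [("search", "A"), ("buy", "A")],
]

p_search_a_after_b = conditional_probability(sessions, ("search", "A"), ("search", "B"))
p_buy_a_after_search = conditional_probability(sessions, ("buy", "A"), ("search", "A"))
```

In this toy log, one of the two sessions that search for 'B' goes on to search for 'A', and two of the three sessions that search for 'A' go on to buy it.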

Tuesday, November 9, 1999

An Introduction to Multifractals

What are multifractals, what are their main properties and how are they used in applications? This will be a very general talk aimed at providing the big picture. The following week Dr. Anna Gilbert of AT&T Labs-Research will talk about applications to computer network data.

Tuesday, November 16, 1999

Network inferences from scaling analysis

Anna Gilbert (AT&T Labs - Research)

In apparent contrast to the well-documented self-similar scaling behavior of measured local area network (LAN) traffic, recent studies suggest that wide area network (WAN) traffic exhibits more complex local scaling behavior consistent with multifractals. We focus on the qualitative aspects of the corresponding wavelet-based scaling tools and discuss the physical inferences one should draw from the qualitative interpretations of these tools. This work also illustrates the role of variability in user/session and network-specific characteristics.

Tuesday, November 23, 1999

The FARIMA models

We return to our systematic presentations and develop the FARIMA models for describing time series with short and long-range dependence.
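As a rough illustration of the simplest member of this family, FARIMA(0, d, 0), the sketch below computes the moving-average weights of $(1-B)^{-d}$ through the recursion $\psi_0 = 1$, $\psi_k = \psi_{k-1}\,(k-1+d)/k$, and simulates the series by a truncated moving average of Gaussian white noise. The function names and the truncation length are assumptions made for illustration; truncating the infinite moving average is an approximation, and the sketch is not the construction used in the lecture.

```python
import numpy as np

def farima_ma_weights(d, n_weights):
    """First n_weights moving-average coefficients of (1 - B)^(-d)."""
    psi = np.empty(n_weights)
    psi[0] = 1.0
    for k in range(1, n_weights):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_farima_0d0(n, d, rng, n_weights=1000):
    """Approximate FARIMA(0, d, 0) sample of length n via a truncated MA(infinity)."""
    psi = farima_ma_weights(d, n_weights)
    eps = rng.standard_normal(n + n_weights - 1)   # white-noise innovations
    return np.convolve(eps, psi, mode="valid")     # exactly n output values

# Hypothetical usage: d = 0.3 gives long-range dependence (0 < d < 1/2).
rng = np.random.default_rng(0)
x = simulate_farima_0d0(2000, d=0.3, rng=rng)
```

For $0 < d < 1/2$ the weights decay like $k^{d-1}$, which is slow enough that the autocovariances are not summable; this is the long-range dependence the abstract refers to.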

Tuesday, November 30, 1999

Integral representations

Integral representations are stochastic integrals with non-random integrands. We define them and discuss their applicability.

Tuesday, December 7, 1999

Integral representations in time series analysis

Integral representations are applied to time series analysis and linear sequences.

Tuesday, December 14, 1999

Integral representations of fractional Brownian motion

We obtain integral representations for fractional Brownian motion both in the time domain and in the spectral domain.
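For reference, the two representations most commonly given in the literature are written below up to normalizing constants $C_1(H)$ and $C_2(H)$, whose values depend on the chosen convention. The notation is an assumption and may differ from that used in the talk. In the time domain, for $0 < H < 1$,

$$B_H(t) = C_1(H) \int_{\mathbb{R}} \left( (t-s)_+^{H-1/2} - (-s)_+^{H-1/2} \right) dB(s),$$

and in the spectral domain,

$$B_H(t) = C_2(H) \int_{\mathbb{R}} \frac{e^{itx} - 1}{ix}\, |x|^{-(H-1/2)}\, d\widetilde{B}(x),$$

where $u_+ = \max(u, 0)$, $B$ is a standard Brownian motion, and $\widetilde{B}$ is a complex-valued Gaussian random measure satisfying $\widetilde{B}(-dx) = \overline{\widetilde{B}(dx)}$, so that the integral is real-valued.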

Note: This is the last talk of this semester. The seminar series will continue in the spring at the same time. Wait for the email announcement.

### SPRING SEMESTER 2000

Tuesday, January 18, 2000

Fractional Calculus and its connections to fractional Brownian motion 1

In this first talk, we will introduce some basic fractional integral and derivative operators.

Note: This is the first talk of this semester.

Thursday, January 27, 2000

Fractional Calculus and its connections to fractional Brownian motion 2

We consider fractional integrals and derivatives on the real line.

Note: FROM NOW ON THE TALKS WILL USUALLY BE ON THURSDAYS, WITH SOME OCCASIONAL TUESDAYS.

Tuesday, February 8, 2000

Fractional Calculus and its connections to fractional Brownian motion 3

We consider the Fourier aspects of fractional integrals and derivatives on the real line.

Thursday, February 10, 2000

Fractional Calculus and its connections to fractional Brownian motion 4

We use fractional integrals and derivatives to obtain representations of fractional Brownian motion.

Thursday, February 17, 2000

Estimating the Orey index of a Gaussian stochastic process with stationary increments: An application to a financial data set

Donna Salopek (York University, Toronto Ontario)

A new method of time series data analysis will be discussed. This method is based on a strong limit theorem for a Gaussian stochastic process with stationary increments. The method is used to analyse the local behaviour of a continuous time stochastic process observed at finitely many equidistant time points. In particular, the motivation behind the method is to estimate the maximal Hölder exponent of the sample functions of a stochastic process. This is joint work with R. Norvaisa.

Thursday, February 24, 2000

Fractional Calculus and its connections to fractional Brownian motion 5

We introduce various classes of deterministic integrands for fractional Brownian motion, compare them with each other and discuss some of their properties.

Thursday, March 16, 2000

Fractional Calculus and its connections to fractional Brownian motion 6

We further examine various classes of integrands for fractional Brownian motion, provide a number of examples and discuss some applications.

Tuesday, March 21, 2000

Stochastically Bounded Burstiness for Communication Networks

David Starobinski (Technion-Israel Institute of Technology and UC Berkeley)

NOTE: This talk is sponsored by Electrical & Computer Engineering, and will take place from 10:30 to 12 in the Photonics Building, Room 339.

A network calculus is developed for processes whose burstiness is stochastically bounded by general decreasing functions. This calculus makes it possible to derive statistical upper bounds on various performance measures, e.g., delay, at each buffer of a communication network. Our bounding methodology applies to a large class of exogenous arrival processes, including important new models of network traffic such as fractional Brownian motion. Moreover, it allows judicious capture of the salient features of real-time traffic, such as the "cell" and "burst" characteristics of multiplexed voice traffic. This new calculus is expected to be of special interest for the efficient implementation of network services providing statistical guarantees.

Biography: David Starobinski received his B.Sc., M.Sc. and Ph.D. degrees, all in Electrical Engineering, from the Technion-Israel Institute of Technology, in 1993, 1996 and 1999 respectively. From 1993 to 1999, he was a research assistant at the Technion and served also as a lecturer, teaching assistant and project supervisor. He spent summer 1996 at the research laboratories of Sun Microsystems Corp. in Mountain View, California. He received awards from Intel Corp. and the Gutwirth Foundation for outstanding academic achievements. Since September 1999, he has been a post-doctoral researcher at the EECS Department at UC Berkeley, where he is sponsored by a fellowship from the Swiss National Science Foundation. His interests are in the general areas of high-speed and wireless networking.

Thursday, March 23, 2000

Fractional Calculus and its connections to fractional Brownian motion 7

We continue discussing applications.

Tuesday, March 28, 2000

Fractional Calculus and its connections to fractional Brownian motion 8

The Girsanov formula for fractional Brownian motion is developed. This will be the last talk of the series.

Thursday, March 30, 2000 (Note change of day)

Learning Systems and Support Vector Machines 1

Diane Watson (Boston University)

A learning system (or learning machine) is a computer program that makes decisions based on the accumulated experience contained in successfully solved cases. Mathematically, this is the problem of finding a function based on sparse data. Different types of learning machines are actually different models for the form of the solution. Once a model is chosen, the learning machine is "trained", i.e., the data is used to find the parameters (or coefficients) that will specify the unknown function.

The problem is called a pattern recognition or classification problem if the range of the function is discrete, and a regression problem if the range is continuous. The support vector machine ('SVM') is a new type of learning machine for pattern recognition and regression problems which constructs its solution in terms of a subset of the training data, the support vectors. Statistical learning theory is the theory behind support vector machines.

For historical reasons, the two talks will focus mostly on the pattern recognition problem.

This first talk involves learning systems. We will survey traditional methods for solving the classification problem and discuss means for estimating their true performance. Methods discussed will include classical statistical methods and neural nets. To judge performance, we compare the apparent error rate versus the true error rate, and discuss various methods for estimating the true error rate.

Tuesday, April 4, 2000

Learning Systems and Support Vector Machines 2

Diane Watson (Boston University)

We focus in this second talk on Support Vector Machines (SVM). We describe statistical learning theory and the VC-dimension. We then introduce the process of structural risk minimization (the theory) and SVM (the method).
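To make the "training" step described in these two abstracts concrete, the sketch below fits a linear soft-margin SVM to a toy two-class problem. It is a deliberately simplified stand-in: it minimizes the regularized hinge loss by full-batch subgradient descent rather than solving the quadratic program usually associated with SVMs, and it uses no kernel. The function names, hyperparameter values, and synthetic data are assumptions made for illustration, not material from the talks.

```python
import numpy as np

def train_linear_svm(X, y, reg=0.01, lr=0.1, epochs=200):
    """Fit w, b by subgradient descent on reg/2 * |w|^2 + mean hinge loss.

    X has shape (n, p); y holds labels in {-1, +1}.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points inside or violating the margin
        grad_w = reg * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify each row of X as +1 or -1."""
    return np.where(X @ w + b >= 0, 1.0, -1.0)

# Hypothetical toy data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)),
               rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
apparent_accuracy = np.mean(predict(X, w, b) == y)
```

Only the points flagged `active` contribute to the data term of each update, which loosely mirrors the idea that the solution is determined by a subset of the training data. The accuracy computed here is on the training set, so it corresponds to the apparent error rate rather than the true one.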

Thursday, April 6, 2000

A class of stationary processes with multifractal paths

Ilkka Norros (VTT - Technical Research Centre, Finland)

Self-similarity of data traffic became a hot topic in teletraffic theory in the early 1990s. More recent analyses show that the statistics of traffic at small time scales are still more complicated and can be called multifractal. Aiming at developing new types of mathematical models for traffic, it is shown how stationary multifractal random measures can be constructed by multiplying independent smooth random densities with faster and faster variation. Fundamental work on this kind of measure was done by Kahane and Peyriere in the 1970s (the idea came from Mandelbrot's turbulence studies). The talk opens with data examples and explains some basic notions of multifractal analysis.

Thursday, April 13, 2000

An overview of linear orthogonal series based classifiers

Byron Shock (Boston University, Department of Cognitive and Neural Systems)

A method of pattern classification, mentioned in the literature but receiving little attention, is the focus of this talk. We introduce the general framework for orthogonal series based classification as developed by Specht (1971), Greblicki and Pawlak (1981; 1982; 1983; 1985), and others. Comparisons of this technique are made to (1) generalized linear classification and (2) orthogonal series based density estimation. Linear methods have a clear training speed advantage over generalized linear methods in that minimization of an objective function does not require an iterative optimization algorithm. Density estimation based on orthogonal series shares this speed advantage but often returns negative density values. Moreover, each class density estimate takes into account only exemplars of a single class, whereas the classification schemes that are the subject of this talk incorporate both target and non-target exemplars in constructing class decision models. We illustrate the comparative performance of these models on one- and two-dimensional data sets and discuss the potential advantages and pitfalls of all three methods. Lastly, we will discuss certain statistical properties of the orthogonal series based classification scheme. For example, when utilizing a Fourier basis, the method is strongly consistent under practical assumptions.
