Speaker: Joe Blitzstein
Department of Statistics
Data naturally represented in the form of a network, such as social and information networks, are encountered increasingly often and have led to the development of new generative models (such as exponential random graphs and power law mechanisms) that attempt to explain the observed structure. However, it is usually prohibitively expensive to observe the entire network, so sampling in the network is needed. Comparatively little attention has been given to the question of which network properties are stable under which sampling schemes. We will discuss some examples where valid inferences about the structure of the network can and cannot be drawn from the sample, depending on the generative model, the sampling method, and the quantity of interest.
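As an illustration of a quantity that is not stable under a sampling scheme, the sketch below is a minimal example and is not taken from the talk. It assumes an Erdos-Renyi random graph and uniform node sampling with the induced subgraph observed; all function names and parameter values are hypothetical. Under this scheme the naive sample mean degree shrinks by a factor of (n - 1)/(N - 1), so rescaling by the inverse factor recovers the full-network value on average.

```python
import random


def random_graph(num_nodes, edge_prob, rng):
    """Erdos-Renyi G(n, p) graph as a set of undirected edges (u, v) with u < v."""
    return {(u, v)
            for u in range(num_nodes)
            for v in range(u + 1, num_nodes)
            if rng.random() < edge_prob}


def mean_degree(num_nodes, edges):
    """Average degree: each edge contributes to the degree of two nodes."""
    return 2 * len(edges) / num_nodes


def induced_sample(num_nodes, edges, sample_size, rng):
    """Sample nodes uniformly without replacement; keep edges with both ends sampled."""
    kept = set(rng.sample(range(num_nodes), sample_size))
    return {(u, v) for (u, v) in edges if u in kept and v in kept}


def rescaled_mean_degree(num_nodes, sample_size, sampled_edges):
    """Correct the naive estimate by the factor (N - 1) / (n - 1)."""
    naive = mean_degree(sample_size, sampled_edges)
    return naive * (num_nodes - 1) / (sample_size - 1)


rng = random.Random(0)           # fixed seed so the run is repeatable
N, n = 400, 100                  # illustrative network size and sample size
edges = random_graph(N, 0.05, rng)
true_md = mean_degree(N, edges)

naive_runs, rescaled_runs = [], []
for _ in range(200):
    sampled = induced_sample(N, edges, n, rng)
    naive_runs.append(mean_degree(n, sampled))
    rescaled_runs.append(rescaled_mean_degree(N, n, sampled))

avg_naive = sum(naive_runs) / len(naive_runs)
avg_rescaled = sum(rescaled_runs) / len(rescaled_runs)
print(f"true mean degree:     {true_md:.2f}")
print(f"naive sample average: {avg_naive:.2f}")
print(f"rescaled average:     {avg_rescaled:.2f}")
```

The correction here depends only on the sampling design. Other quantities, such as the shape of the degree distribution, generally do not admit such a simple fix, and that is the kind of question the abstract raises.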
Time: Friday, April 25th 11:00 am - 12:00 noon
Location: Duques Hall 250
Speaker: Lotfi A. Zadeh
Department of EECS
University of California, Berkeley
Computation with imprecise probabilities is not an academic exercise; it is a bridge to reality. In the real world, imprecision of probabilities is the norm rather than the exception. In large measure, real-world probabilities are perceptions of likelihood. Perceptions are intrinsically imprecise. Imprecision of perceptions entails imprecision of probabilities. In applications of probability theory it is a common practice to ignore imprecision of probabilities and treat imprecise probabilities as if they were precise. A problem with this practice is that it leads to results whose validity is open to question. Publication of Peter Walley's seminal work "Statistical Reasoning with Imprecise Probabilities" in 1991 sparked a rapid growth of interest in imprecise probabilities. Today, there is a substantive literature. The approach described in this lecture is a radical departure from the mainstream. First, imprecise probabilities are dealt with not in isolation, as in the mainstream literature, but in an environment of imprecise events, imprecise relations and imprecise constraints. Second, imprecise probability distributions are assumed to be described in a natural language. The approach is based on the formalism of Computing with Words (CW) (Zadeh 1999, 2006). In the CW-based approach, the first step involves precisiation of information described in natural language. Precisiation is achieved through representation of the meaning of a proposition, p, as a generalized constraint. A generalized constraint is an expression of the form X isr R, where X is the constrained variable, R is a constraining relation and r is an indexical variable which defines the modality of the constraint, that is, its semantics. The primary constraints are possibilistic, probabilistic and veristic. Computation follows precisiation. In the CW-based approach the objects of computation are generalized constraints.
The CW-based approach to computation with imprecise probabilities enhances the ability of probability theory to deal with problems in fields such as economics, operations research, decision sciences, theory of evidence, analysis of causality and diagnostics.
Time: Friday, April 11th 3:30 - 4:30 pm
Location: Funger Hall 420
Speaker: Hamparsum Bozdogan, McKenzie Professor
Department of Statistics, Operations & Management Science
University of Tennessee
Support Vector Machines (SVMs) have been popular kernel methods in regression and classification applications. However, SVMs suffer from a number of limitations. In this paper, we propose a novel model selection approach for Bayesian Relevance Vector Machines (BRVMs) in regression and classification problems (Tipping, 2001). BRVM is a sparse kernel technique that improves on SVMs from the Bayesian learning point of view while avoiding the limitations that exist in SVMs. Unresolved model selection issues in BRVM regression and classification include: choosing the optimal form of the kernel function among a portfolio of kernel choices for a substantive data set; choosing the parameters of the kernel function; and selecting the best subset of predictor variables in regression and classification.
We introduce novel statistical modeling techniques based on the information-theoretic measure of complexity called the ICOMP criterion, developed by Bozdogan (1988, 1990, 1994, 2000, 2007), as the fitness function hybridized with the genetic algorithm (GA) as our optimizer to perform the model selection. ICOMP allows the identification of the best fitting kernel function(s) among a large portfolio of kernel functions. It measures both the lack-of-fit (LOF) and the complexity of the BRVM models. The GA enables the rapid computation of models that would otherwise be impossible in a reasonable amount of time for subset selection in data with a large number of predictor variables. We illustrate the advantages of this new approach on simulated and several benchmark data sets in regression and classification problems, including the early detection of the cause of heart attack. In conclusion, we discuss how to robustify BRVMs using general distributional models along with smooth and flexible priors to enforce a stronger sparsity in the model to achieve further Occam's Razor in regression and classification problems.
Time: Friday, April 4th 11:00 am - 12:00 noon
Location: Duques Hall 453
Speaker: Roger M. Cooke
Resources for the Future
Bayesian Belief Nets (BBNs) enjoy wide popularity in Europe as a decision support tool. The main attraction is that the directed acyclic graph provides a graphical model in which the problem owner recognizes his problem and which at the same time is a user interface for running and updating the model. Discrete BBNs support rapid updating algorithms, but involve exponential complexity that limits their use to toy problems. Continuous BBNs hold more promise. To date, only 'discrete normal' BBNs have been available. The user specifies a mean and conditional variance for each node, and the child nodes are regressed on their parents. Continuous nodes can have discrete parents but not discrete children, and all continuous nodes are normal. Overcoming the restriction to normality has opened new areas for applications. A large risk model for Schiphol airport involving some 300 probabilistic nodes and 300 functional nodes will be demonstrated. Updating is facilitated by the use of the 'normal copula'. This type of BBN can be used either in a probabilistic modeling mode (user supplies distributions) or in a data mining mode (a BBN is built to model multivariate data). The latter application will be demonstrated using fine particulate emission and collector data.
Time: Friday, March 28th 11:00 am - 12:00 noon
Location: Funger Hall 320
Speaker: Ruth Levy Guyer
Haverford College, Pennsylvania
Seven years ago, I became interested in how decisions are made for babies who are born at risk. These babies are sick at birth, or are born with genetic anomalies, or are born too early. The latter group--premature babies, or preemies--have been growing in number each year for the past 25 years; currently, 500,000 preemies are born each year in just the United States.
The problems for these babies and their families can be medical, social, financial, and legal. The consequences of their premature births and illnesses can be short-lived or lifelong. The children and their families may have financial, social, medical, educational, psychological, and legal needs. The lives of these children affect everyone, not just the babies and their families and those who care for them in the hospital and afterward. They live in the contexts of their families and their communities, and few communities (either local or state or federal) have adequately prepared for their complex and resource-demanding lives.
In 2006, I wrote the book whose themes I will be discussing: Baby at Risk: The Uncertain Legacies of Medical Miracles for Babies, Families and Society. I interviewed staff members of neonatal intensive care units, families whose babies had done well or had not, and many others. The parents are always young (that is, young enough to have babies) and typically have had little or no experience facing a medical ethics dilemma. They have no sense of the long-term outcomes for their newborn babies, and they are making decisions in a highly emotionally charged climate.
I will describe the roles of the therapeutic imperative and the technological imperative in decision making, the moral distress of nursing and medical staff members who care for these babies, and various other themes that I address in the book. I will talk about how medical and nursing staff members, women and their partners, community members, and policy makers might become better educated about what is medically appropriate and what is not. I will also discuss the role of the media (who have caused huge problems by hyping stories of "miracle babies") in raising expectations about what medicine and science can do. Many medical decisions today are also ethics decisions, and it is time for American society to grasp this concept and then more proactively help families whose babies are born at risk.
Time: Friday, March 7th 4:00 - 5:00 pm
Location: Duques Hall 451
Speaker: Edward I. George
University of Pennsylvania
Consider the canonical regression setup where one wants to learn about the relationship between y, a variable of interest, and x1, ..., xp, p potential predictor variables. Although one may ultimately want to build a parametric model to describe and summarize this relationship, preliminary analysis via flexible nonparametric models may provide useful guidance. For this purpose, we propose BART (Bayesian Additive Regression Trees), a flexible nonparametric ensemble Bayes approach for estimating f(x1, ..., xp) = E(Y | x1, ..., xp), for obtaining predictive regions for future y, for describing the marginal effects of subsets of x1, ..., xp and for model-free variable selection. Essentially, BART approximates f by a Bayesian "sum-of-trees" model where fitting and inference are accomplished via an iterative backfitting MCMC algorithm. By using a large number of trees, which yields a redundant basis for f, BART is seen to be remarkably effective at finding highly nonlinear relationships hidden within a large number of irrelevant potential predictors. BART also provides an omnibus test: the absence of any relationship between y and any subset of x1, ..., xp is indicated when BART posterior intervals for f reveal no signal.
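For readers unfamiliar with the term, the "sum-of-trees" model mentioned above is commonly written as follows. The notation is an assumption for illustration and does not appear in the abstract: m is the number of trees, T_j is the j-th tree structure, M_j is its set of terminal-node values, and the errors are taken to be normal with variance sigma squared.

```latex
Y = f(x_1, \ldots, x_p) + \varepsilon,
\qquad
f(x_1, \ldots, x_p) \approx \sum_{j=1}^{m} g(x_1, \ldots, x_p;\, T_j, M_j),
\qquad
\varepsilon \sim N(0, \sigma^2)
```

Here each g(x; T_j, M_j) returns the terminal-node value that tree T_j assigns to x, so each tree contributes only a small part of the fit and the sum approximates E(Y | x1, ..., xp).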
Time: Friday, February 15th 11:00 am - 12:00 noon
Location: Duques Hall 254
Speaker: Wolfgang Jank
Robert H Smith School of Business
University of Maryland
In this paper we propose a novel model for forecasting innovation success based on online virtual stock markets. In recent years, online virtual stock markets have been increasingly used as an economical and efficient information gathering tool for the online community. They have been used to forecast events ranging from presidential elections to sporting events and applied by major corporations such as HP and Google for internal forecasting. In this study, we demonstrate the predictive power of online virtual stock markets, as compared to several conventional methods, in forecasting demand for innovations in the context of the motion picture industry. In particular, we forecast the release weekend box office performance of movies, which serves as an important planning tool for allocating marketing resources, determining optimal release timing and advertising strategies, and coordinating production and distribution for different movies. We accomplish this forecasting task using novel statistical methodology from the area of functional data analysis. Specifically, we develop a forecasting model that uses the entire trading path rather than only its final value. We also employ trading dynamics, and we tease out differences between different trading paths using functional shape analysis. Our results show that the model has strong predictive power and improves tremendously over competing approaches.
Time: Friday, February 8th 3:00 - 4:00 pm
Location: Funger Hall 320
Speaker: José Mario Quintana
Bayesian Efficient Strategic Trading, LLC
Applications of the Bayesian approach to speculative trading strategies in derivative markets are discussed in the context of the parallel concepts of betting and investing, prices and expectations, and coherence and arbitrage freedom in the fields of Bayesian methodology and finance.
Time: Friday, February 1st 11:00 am - 12:00 noon
Location: Funger Hall 320