Pricing, Profits and Efficiency

We analyze pricing strategies for digital information goods, such as those increasingly available via the Internet. Because perfect copies of such goods can be created and distributed almost costlessly, any single positive price for copies is likely to be socially inefficient. However, we show that, under certain conditions, a monopolist selling information goods in large bundles instead of individually may nearly eliminate this inefficiency. In addition, the bundling strategy can extract as profits an arbitrarily large fraction of the area under the demand curve for the individual goods while commensurately reducing consumers' surplus.

The bundling strategy is particularly attractive when the marginal costs of the goods are very low, when the correlation in the demand for different goods is low, and when consumer valuations for the individual goods are of comparable magnitude. We also describe the optimal pricing strategies when these conditions do not hold; show how private incentives for bundling can diverge from social incentives; and describe a mechanism to recover information about the underlying demand for each individual good. The predictions of our analysis appear to be consistent with empirical observations of the markets for Internet and on-line content, cable television programming, and copyrighted music.

________________________________________

We thank Timothy Bresnahan, Frank Fisher, Michael Harrison, Paul Kleindorfer, Thomas Malone, Robert Pindyck, Nancy Rose, Richard Schmalensee, John Tsitsiklis, Hal Varian, Albert Wenger, Birger Wernerfelt, Robert Wilson and seminar participants at the University of California at Berkeley, MIT, New York University, Stanford University, University of Rochester, the Wharton School and the 1995 Workshop on Information Systems and Economics for many helpful suggestions, although we have not been able to implement all of them.

The pricing of information presents many difficulties for conventional markets. In particular, digital copies of information goods are indistinguishable from the originals and can be created and distributed almost costlessly via the emerging information infrastructure. What is the optimal price for each copy? Free (or nearly free) information would assure that all consumers whose marginal benefit is greater than the marginal cost would have access to the good. However, a zero price would not generate revenues to defray development costs and provide incentives for innovation.

Existing theory and practice fail to provide clear guidance on how digital information goods should be priced (Varian 1995), an issue of increasing importance as the Internet provides the infrastructure for a major marketplace for electronic information. Providers of on-line content have adopted varied and contradictory pricing strategies. Some firms charge users incrementally each time they access information, invoke a software subroutine, or download an image, as Quote.com is currently doing for certain types of investment information. Other firms, such as America Online, bundle large collections of content together and offer the bundle for a flat fee. Still other firms, such as Infoseek, have tried both strategies at various times and even concurrently. As of 1996, most information content on the Internet is offered at zero price to the user, and information providers hope to build a user base which they can eventually target for subscription charges, as the Wall Street Journal has done; to recover their production costs by selling advertising; or both.

This paper focuses on the strategy of bundling a large number of information goods for a fixed price, no matter how many goods are actually used by the buyer. We find that in a variety of circumstances, a multiproduct monopolist will extract substantially higher profits by offering a single bundle of information goods than by offering the same goods separately. As the number of information goods in the bundle increases, the seller may be able to appropriate nearly the entire value created by the provision of these goods. Furthermore, bundling can increase economic efficiency by reducing the deadweight loss created when goods are priced above their marginal costs.

The key intuition behind these results is that in many situations, consumers' valuation for a collection of goods has a probability distribution with a lower standard deviation per good compared to the valuations for the individual goods. For instance, consumer valuations for a stock quotation service, an on-line sports scoreboard, a news service, or a piece of software will vary. A monopolist selling these goods separately may maximize profits by charging a high price for each good, thereby excluding consumers with low valuations, rather than charging a low price and selling to most consumers. Alternatively, the seller could offer all the information goods as a bundle. Under reasonable assumptions about the distribution of valuations, the law of large numbers guarantees that the distribution of valuations for the bundle has proportionately more mass near the mean. The more goods included in the bundle, the less likely it is that any given consumer's valuation for the entire bundle will be very low or very high. As Schmalensee (1984) has argued, such a reduction in "buyer diversity" typically helps sellers extract higher profits while reducing the deadweight loss from non-zero prices, as more units are sold than if the goods were offered separately. The benefits of bundling are greatest when the marginal cost of the goods is very low, when the correlation in the demand for different goods is low, and when the valuations for individual goods are of comparable magnitude.
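The law-of-large-numbers intuition above can be illustrated with a small simulation. This is only a sketch: it assumes i.i.d. valuations drawn uniformly from [0, 1], and the function names, sample size, and seed are hypothetical choices made for the illustration.

```python
import random
import statistics


def per_good_bundle_valuations(n_goods, n_consumers, rng):
    """Draw each consumer's average valuation over a bundle of n_goods.

    Assumption (illustration only): valuations are i.i.d. Uniform[0, 1].
    """
    return [
        sum(rng.random() for _ in range(n_goods)) / n_goods
        for _ in range(n_consumers)
    ]


def spread_by_bundle_size(sizes=(1, 2, 20), n_consumers=20000, seed=0):
    """Return {bundle size: sample std. dev. of the per-good valuation}."""
    rng = random.Random(seed)
    return {
        n: statistics.pstdev(per_good_bundle_valuations(n, n_consumers, rng))
        for n in sizes
    }


if __name__ == "__main__":
    for n, sd in spread_by_bundle_size().items():
        # For this assumed distribution, theory gives sqrt(1/12) / sqrt(n).
        print(f"n={n:2d}  std of per-good valuation = {sd:.3f}")
```

Under these assumptions the standard deviation of the per-good valuation shrinks roughly in proportion to the inverse square root of the bundle size, which is the reduction in "buyer diversity" that the argument relies on.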

Our analysis of bundling contrasts with the conventional wisdom that the pricing of on-line information will be increasingly fine-grained because new technologies enable metering information in units as small as the article, datum, or bit. Charging very low prices for very small quantities of information seems to create more price discrimination opportunities for sellers, presumably leading to a more profitable and efficient allocation of information goods in the economy. Furthermore, many consumers seem to believe that it is wasteful to pay for access to goods that are not used and assume that a more fine-grained price system would result in savings. However, while such reasoning provides useful heuristics in markets for physical goods, it is misleading for information goods that have zero marginal cost. On the contrary, our analysis demonstrates that a single-price bundling strategy may be optimal, even if the costs of administering multiple prices or delivering separate information goods are zero.

There are many potential benefits of bundling, including cost savings in production and transaction costs, complementarities among the bundle components, and sorting consumers according to their valuations (Eppen, Hanson, et al. 1991). We focus on this last benefit of bundling, which was first discussed by Stigler (1963) in a paper showing how bundling could increase sellers' profits when consumer valuations for two goods were negatively correlated. Adams and Yellen (1976) introduced a two-dimensional graphical framework for analyzing bundling as a device for price discrimination. By considering a setting with a multiproduct monopolist, two goods, no reselling, independent and additive consumer valuations, and linear "unit demands" (i.e., consumers buy either zero or one unit) for these two goods, they compare unbundled sales to pure bundling (offering only the complete bundle) and mixed bundling (offering both the complete bundle and subsets of the bundle). Using stylized examples, they illustrate that the relative profitability and efficiency of these pricing strategies depend on the marginal costs and on the distribution of customers' reservation values.

The formal analyses by Schmalensee (1984), McAfee, McMillan and Whinston (1989) and Salinger (1995) also focused on bundles of two goods. Schmalensee assumes a bivariate Gaussian distribution of reservation prices, and through a combination of analytic derivation and numerical techniques extends the results of Stigler and Adams and Yellen. He finds that pure bundling reduces the diversity of the population of consumers because the standard deviation of the consumer valuations for the bundle is less than the sum of the standard deviations of valuations for its components, thus enabling sellers to extract more consumers' surplus. He demonstrates that this is true if the valuations of the two goods are negatively correlated (as suggested by Stigler and Adams and Yellen), but can also be true if the valuations are independent, or even positively but not perfectly correlated. Schmalensee also derives conditions under which bundling goods with Gaussian demand will be profitable and socially efficient in the sense of reducing deadweight loss.

McAfee, McMillan and Whinston analyze a setting with a multiproduct monopolist and a continuum of consumer valuations, similar to the one employed by Adams and Yellen. They show that mixed bundling will almost always strictly increase the seller's profits when the seller can enforce a price for the bundle that may exceed the sum of the prices of its components. They also derive a general condition under which mixed bundling dominates unbundled sales when the price of the bundle cannot exceed the sum of the prices of its components; this condition is always satisfied when the valuations for the two goods are independently distributed. Salinger develops a graphical framework to analyze the profitability and welfare implications of bundling two goods, primarily in the context of independent linear demand functions. He finds that bundling two goods tends to be profitable when consumer valuations are negatively correlated and high relative to marginal costs.

More recently, Armstrong (1996) shows that for a special class of cases, the optimal tariff in the multiproduct case can be determined using the techniques typically used in the single-product case. However, he does not explore the implications of increasing the number of goods and, because he focuses on heterogeneous consumers, concludes that optimal bundle pricing will almost always inefficiently exclude some low-demand consumers.

Information goods may be bundled together because of technological
complementarities in their production, consumption, search, or
distribution. Our analysis suggests that it is often desirable
to bundle information goods simply to take advantage of bundling
as a pricing strategy. Therefore, for simplicity of exposition,
we assume that there are __no__ technological advantages to
bundling; such technological complementarities would only strengthen
our results. In this paper we define an "information good"
as the smallest logical unit of information that does not exhibit
technological complementarities, such as a news story, a photograph,
or a song.

In contrast to previous work, we concentrate on bundling strategies
that may involve a large number of goods with very low marginal
costs of production. This approach is particularly suitable to
information goods, which typically have virtually zero marginal
cost of reproduction and can be sold in large bundles, with components
delivered on demand via the developing telecommunications infrastructure,
or distributed via mass storage media such as CD-ROM devices.
We focus on pure bundling, which is the typical pricing strategy
for bundles of information goods: in the rest of this paper,
unless otherwise specified, the term bundling refers to *pure*
bundling. As shown by Hanson and Martin (1990), price-setting
for mixed bundling of many goods is an NP-complete problem, requiring
the seller to determine a number of prices and quantities that
grows exponentially as the size of the bundle increases.

We find that some of the results in the literature for bundles
of two goods do not generalize to this setting. For instance,
Salinger (1995) shows that when consumers have independent linear demands,
bundling two goods increases consumers' surplus if marginal costs
are low enough for bundling to be profitable; we find that bundles
of more than two goods will always *reduce* consumers' surplus
when the goods have zero marginal cost and independent linear
demands. Other results from the bundling literature are strengthened
when the number of goods is large: bundling is profitable for
a broader set of conditions, and its effects on profitability,
consumers' surplus, and efficiency can be dramatically enhanced.

By focusing on large numbers of goods with low marginal costs we can formally model a multiproduct setting. This allows us to use well-developed techniques from the statistics literature and derive strong results without making strong assumptions about the initial distribution of consumer valuations. In addition, while valuations for individual goods do not typically conform to a Gaussian distribution, the central limit theorem guarantees that under relatively weak assumptions, the distribution of valuations for bundles of large numbers of goods does converge to a Gaussian distribution. As a result, Schmalensee's (1984) analytical apparatus and some of his results can be invoked to study large bundles of information goods, especially in providing criteria for evaluating the profitability and efficiency of "bundles of bundles."

Section 2 analyzes a simplified setting in which the marginal cost of all information goods is zero; buyers consume either zero or one unit of each information good; and buyers' valuations for all goods are independently and identically distributed. In sections 3 through 5, we relax these conditions. Section 3 shows that while our results are fairly robust for information goods, they are less likely to apply to physical goods because as the marginal cost of the component goods increases, the benefits of bundling eventually vanish. Thus, our model predicts that large bundles of unrelated physical goods should rarely be observed. In section 4, we allow the means, variances and covariance of the valuations for the components to vary, and investigate when adding a new product to a bundle will increase its profitability.

Section 5 considers more closely several types of covariance among components. Like earlier analyses of bundling, our model indicates that the increase in profits from bundling goods is greatest when the correlation in the valuations for the separate goods is small or negative. We also identify a special type of positive correlation for which the seller can capture nearly the entire value created by a set of goods, as long as the number of goods in the bundle is large enough and the correlation is less than perfect. In addition, we present discriminating mechanisms that significantly increase the benefits of bundling for goods with other types of correlated demands, provided the source of the underlying correlation can be identified, either directly, or indirectly through consumers' behavior. In particular, we show that mixed bundling can be more profitable than pure bundling when consumer valuations are not drawn from the same distribution, as it induces consumers to self-select.

Section 6 examines several extensions and implications. Because bundling destroys information about consumer valuations of individual goods, we describe a mechanism for recovering this information to an arbitrary degree of precision, while avoiding most of the deadweight loss associated with conventional single-good pricing. We also briefly characterize some of the ramifications of our analysis for market structure, including the potential for a "winner-take-all" equilibrium, and compare the implications of the model with some empirical evidence. Section 7 provides some concluding remarks.

We begin by considering a setting with a single seller
providing *n* information goods. For each *n*, let
consumer valuations for the goods be denoted by random variables
$v_{n1}, v_{n2}, \ldots, v_{nn}$ (such a collection is sometimes referred
to as a triangular array of random variables and can be denoted
by $\{v_{ni}\}$), and let $x_n = \frac{1}{n}\sum_{k=1}^{n} v_{nk}$
be the per-good valuation of the bundle of *n* information
goods. Let $p_n^*$, $q_n^*$,
and $\pi_n^*$ denote the profit-maximizing price
per good for a bundle of *n* goods, the corresponding sales
as a fraction of the population, and the seller's resulting profits
per good. Assume the following conditions hold:

A1: The marginal cost for copies of all information goods is zero to the seller.

A2: Each buyer can consume either 0 or 1 units of each information good.

A3: For all *n*, buyer valuations
are independent, identically distributed (i.i.d.) with continuous
density functions, non-negative support, and finite mean $\mu$
and variance $\sigma^2$.

A4: Resale is not permitted (or is unprofitable for buyers).

Under these conditions, we find that selling a bundle
of all *n* information goods can be remarkably superior to
selling the *n* goods separately. For the distributions
of valuations underlying many common demand functions, bundling
substantially reduces average deadweight loss and leads to higher
profits for the seller. As *n* increases, the seller captures
an increasing fraction of the total area under the demand curve,
correspondingly reducing both the deadweight loss and consumers'
surplus relative to selling the goods separately. More formally:

__Proposition 1__
Given assumptions A1, A2, A3, and A4,
as

__Proof__: All proofs are
in Appendix 1.

The intuition behind Proposition 1 is that as the number of information goods in the bundle increases, the distribution for the valuation of the bundle has more consumers with "moderate" valuations near the mean of the underlying distribution. Since the demand curve is derived from the cumulative distribution function for consumer valuations, it is more elastic near the mean, and less elastic away from the mean (Figure 1).

**Figure 1:** Demand
for bundles of 1, 2, and 20 information goods with i.i.d. valuations
uniformly distributed in [0,1] (linear demand case).

While Proposition 1 shows that for a sufficiently
large *n,* selling goods as a bundle can be significantly
more profitable than unbundled sales, and while McAfee, McMillan
and Whinston (1989) find that *mixed* bundling of two goods
always dominates unbundled sales when consumer valuations are
independent, pure bundling does not necessarily increase profits
for small *n*.
For a large number of goods and under the conditions for Proposition
1, however, pure bundling, using a single price, captures nearly
the entire value created by the information goods, so mixed bundling
cannot do substantially better.

The weak law of large numbers provides an upper bound for the number of goods in the bundle that are needed to enable the seller to capture a given fraction of the total area under the demand curve. Specifically, Corollary 1 follows from the weak law of large numbers as used in the proof of Proposition 1 by choosing :

__Corollary 1__ Given assumptions A1, A2, A3, and
A4, bundling
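The weak-law argument behind this kind of bound can be sketched numerically. If the bundle is priced at $(1-\varepsilon)\mu$ per good, Chebyshev's inequality says that at most a fraction $\sigma^2 / (n \varepsilon^2 \mu^2)$ of consumers value it below that price, so the seller captures at least a share $(1-\varepsilon)\left(1 - \sigma^2/(n\varepsilon^2\mu^2)\right)$ of the per-good area under the demand curve, which equals $\mu$ for non-negative valuations. The function names and the particular choice of $\varepsilon$ below are illustrative assumptions, not necessarily the ones used in the proof.

```python
def chebyshev_profit_share(n, mean, variance, eps):
    """Lower bound on the share of the area under the demand curve captured
    by pricing an n-good bundle at (1 - eps) * mean per good.

    Assumes i.i.d. non-negative valuations with the given mean and variance.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Chebyshev: P(|x_n - mean| >= eps * mean) <= variance / (n * (eps * mean)**2)
    non_buyers = min(1.0, variance / (n * (eps * mean) ** 2))
    return (1 - eps) * (1 - non_buyers)


def balanced_eps(n, mean, variance):
    """Illustrative choice of eps that equates the price discount with the
    Chebyshev bound on non-buyers, so the captured share is (1 - eps)**2."""
    return (variance / (mean ** 2 * n)) ** (1 / 3)


if __name__ == "__main__":
    mu, var = 0.5, 1 / 12  # Uniform[0, 1] valuations, assumed for illustration
    for n in (20, 200, 2000):
        eps = balanced_eps(n, mu, var)
        print(n, round(chebyshev_profit_share(n, mu, var, eps), 3))
```

The bound is loose (it uses only the mean and variance), but it rises toward 1 as the number of goods grows, which is the qualitative content of the limiting result.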

For slightly stronger assumptions about the distribution of consumer valuations, the theory of large deviations (e.g., Chernoff's theorem or Lyapounov's theorem for bounded sequences) provides better estimates of the number of goods needed for a seller to extract as profits a given fraction of the area under the demand curve. Thus, a useful heuristic for the desirability of bundling for a seller is to consider the seller's ability to convert potential surplus to profits; if this can be effectively accomplished by selling the good separately, bundling is less likely to increase profits, especially for a small number of goods.

To further study the behavior of bundles, we assume the following condition, which implies a kind of "single crossing" property for the per-good demands :

A5: The distribution of valuations is such that
for all *n* and *e*.

In this case, if it is more profitable to bundle a certain number of goods than to sell them separately, and if the optimal price per good for that bundle is less than the mean valuation, then bundling any larger number of goods will further increase profits. More formally:

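__Proposition 2__
Given assumptions A1, A2, A3, A4, and
A5, if bundling some number of goods is more profitable than selling
them separately, and the profit-maximizing price per good for that
bundle is less than the mean valuation,
then bundling any larger number of goods will
monotonically increase the seller's profits.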

Since bundling two goods with independent linear demands is profit maximizing for the seller (Salinger 1995), and the uniform distribution of valuations underlying linear demand satisfies Assumption A5, the following corollary follows from Proposition 2:

__Corollary 2a:__ With independent
linear demands for the individual goods, bundling any number of
goods with zero marginal cost increases the seller's profits.

Propositions 1 and 2 show that sellers may increase
profits and efficiency by *reducing* their strategy set:
a seller sets a single price and sells equal quantities for all
goods, instead of using *n* prices and quantities. In the
limit, this achieves nearly perfect price discrimination. This
is because the area under the demand curve for the bundle equals
the sum of the areas under the demand curves for the individual
goods (Salinger, 1995), but its shape is different, allowing the
seller to capture more of the potential surplus created by the
goods (Figure 2).

**Figure 2:** As *n*
increases, the area of the inscribed rectangle $p^* q^*$
that maximizes revenue and profits (normalized for *n*) increases,
and for *n*>2, the mean deadweight loss and mean consumers'
surplus also decrease.

As the number of i.i.d. goods in the bundle increases, total profit and profit per good increase. The profit-maximizing price per good for the bundle steadily increases, gradually approaching the per-good expected value of the bundle to the consumers, as shown in Figure 3 for the case of linear demand. The number of goods necessary to make bundling desirable, and the speed at which deadweight loss and profit converge to their limiting values, depend on the distribution of consumer valuations.

**Figure 3:** Profit as a
function of price per good for bundles of varying number of goods
*n* (steeper curves reflect larger *n*). The profit-maximizing
price is the point at the maximum of each curve. In the limit,
the price per good approaches the mean valuation 0.5.
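The profit curves for the linear demand case can be reproduced approximately with a short calculation. The sketch below assumes zero marginal cost and i.i.d. valuations uniform on [0, 1], uses the exact distribution of a sum of uniform variables (the Irwin-Hall distribution), and finds the best per-good price by grid search. The function names and grid resolution are hypothetical choices for this illustration.

```python
from fractions import Fraction
from math import comb, factorial, floor


def irwin_hall_cdf(x, n):
    """Exact P(sum of n independent Uniform[0, 1] draws <= x)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum(
        (-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1)
    )
    return total / factorial(n)


def best_bundle_price(n, grid=500):
    """Grid-search the per-good bundle price that maximizes per-good profit.

    With zero marginal cost, profit per good = price * fraction who buy.
    Returns (price_per_good, sales_fraction, profit_per_good) as floats.
    """
    best_p, best_q, best_profit = Fraction(0), Fraction(1), Fraction(0)
    for i in range(1, grid):
        p = Fraction(i, grid)
        # A consumer buys when the bundle valuation is at least n * p.
        q = 1 - irwin_hall_cdf(n * p, n)
        if p * q > best_profit:
            best_p, best_q, best_profit = p, q, p * q
    return float(best_p), float(best_q), float(best_profit)


if __name__ == "__main__":
    for n in (1, 2, 20):
        price, sales, profit = best_bundle_price(n)
        print(f"n={n:2d}  price={price:.3f}  sales={sales:.3f}  profit={profit:.4f}")
```

For a single good this recovers the familiar monopoly outcome of the linear demand case (price 0.5, profit 0.25 per good); for larger bundles the per-good profit rises toward the mean valuation of 0.5 while the fraction of consumers served grows.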

The potential efficiency gains from bundling a large number of goods that we identify contrast with the more limited benefits identified in previous work, principally as a result of our focus on bundles of more than two goods and on goods with zero marginal costs, conditions that favor bundling. In fact, an important implication of our analysis is that the benefits of bundling grow as the number of goods in the bundle increases. This implies an aspect of superadditivity to bundling: bigger bundles will be more profitable than smaller bundles, even when the goods involved are identical:

__Corollary 2b:__
Assuming that bundles of $n_1$ goods and $n_2$
goods are profitable (as per Proposition 2), then selling a bundle
of $n_1 + n_2$ goods is more profitable than selling
two separate bundles of $n_1$ and $n_2$
goods respectively.

When both bundle sizes are sufficiently large, the central limit theorem guarantees that A5 will hold for almost any initial demand function for the individual goods, making Corollary 2b fairly general.

Corollary 2b shows that bundling can create significant economies of scope distinct from economies in production, distribution, or consumption. Strikingly, profits under the bundling strategy can be an arbitrary multiple of the maximum profits obtainable when the same information goods are sold separately. To see this, assume that demand for the individual goods is approximated by a log-log (constant elasticity) function. For a sufficiently large number of goods, Proposition 1 shows that bundling can convert a large fraction of the area under the demand curve into profits. In contrast, if such goods are sold separately, total profits become an arbitrarily small fraction of the area under the demand curve as elasticity increases. An implication is that a monopolist selling an inferior good (one with lower mean valuation) as part of a bundle may enjoy higher profits and a greater market share than could be obtained by selling a superior good separately.

The attractiveness of selling large bundles of information goods depends critically on the assumption that their marginal cost is very low. For ordinary goods and services, whose marginal costs are non-trivial relative to consumer valuations, bundling is less likely to be attractive. Since consumers must buy all the goods in a bundle, the probability that a consumer will value any of the components of the bundle at less than their marginal cost is reduced if individual marginal costs are low. In general, if the marginal cost for certain goods is less than or equal to the lowest possible valuation for them, then Propositions 1 and 2 apply, and bundling these goods can increase profits and social surplus. Proposition 3 shows that, as expected, bundling goods with sufficiently high marginal costs is neither profitable nor socially efficient.

__Proposition 3__
Under assumptions A2, A3, and A4, there is a marginal cost
for each information good that renders
bundling less profitable than selling the goods separately.
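A small numerical check illustrates how a positive marginal cost erodes the advantage of bundling. The sketch assumes i.i.d. valuations uniform on [0, 1] and a common marginal cost per good; the function names, the bundle size of 20, and the two cost levels are hypothetical choices for this illustration, not values taken from the analysis.

```python
from fractions import Fraction
from math import comb, factorial, floor


def irwin_hall_cdf(x, n):
    """Exact P(sum of n independent Uniform[0, 1] draws <= x)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum(
        (-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1)
    )
    return total / factorial(n)


def bundle_profit_per_good(n, cost, grid=500):
    """Best per-good profit from a pure bundle of n goods, found by grid
    search over the per-good price, when every good costs `cost` to supply."""
    cost = Fraction(cost).limit_denominator(10 ** 6)
    best = Fraction(0)
    for i in range(1, grid):
        p = Fraction(i, grid)
        q = 1 - irwin_hall_cdf(n * p, n)  # fraction of consumers who buy
        best = max(best, (p - cost) * q)
    return float(best)


def separate_profit_per_good(cost):
    """Monopoly profit per good with Uniform[0, 1] valuations:
    price (1 + c) / 2, sales (1 - c) / 2, hence profit ((1 - c) / 2) ** 2."""
    return ((1 - cost) / 2) ** 2


if __name__ == "__main__":
    for c in (0.1, 0.45):
        print(c, bundle_profit_per_good(20, c), separate_profit_per_good(c))
```

In this example a 20-good bundle beats separate sales when the cost is 0.1 per good but not when it is 0.45, because the bundle forces the seller to supply goods to many consumers who value them below cost.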

Whenever consumers can freely dispose of goods or avoid consuming them, then their valuations will not be less than zero. In such cases, bundling a large number of goods is efficient and profitable if the goods have zero marginal cost. For information goods, disposal costs are typically insignificant and digital reproduction and transmission are bringing marginal costs close to zero as well. At mid-1996 prices for hard disk storage, a typical one-page news story can be stored indefinitely for a cost of approximately $0.0004, and can be transmitted over digital fiber in about one ten-thousandth of a second (20Kbits at 200Mbits/sec).

In contrast, if marginal costs are large, the seller will want
to increase, rather than decrease, the dispersion of valuations.
For example, if the marginal cost is greater than the mean valuation,
bundling will decrease profits because it decreases the fraction
of buyers with valuations far from the mean. Schmalensee (1984)
used numerical techniques to demonstrate that for a Gaussian distribution
of consumer valuations, bundling two goods will be less profitable
as long as $c > \mu - 1.253\sigma$,
where $c$ denotes the marginal cost, $\mu$
is the mean valuation for each good,
and $\sigma$
is the standard deviation of the distribution of valuations.
In general, the threshold at which bundling becomes less profitable
than unbundled sales depends on the form of the demand for the
individual goods and, in the absence of other factors such as
economies of distribution, never exceeds the mean valuation of
a homogeneous population. Proposition 4 derives the threshold
for uniformly distributed i.i.d. consumer valuations:

__Proposition 4__

If consumer valuations for information goods are i.i.d. and uniformly distributed in , and if the marginal cost is

Even with zero marginal costs, bundling may still be less profitable if the valuations of some goods are negative for some consumers. For instance, the availability of some information goods, such as pornography or articles espousing certain political views, may have negative valuations for some consumers. In addition, while technology is rapidly reducing the marginal costs of reproduction and transmission, the time and energy a user must spend to identify an information good can present a barrier to the limiting result of Proposition 1. Adding items to a bundle can make it more difficult for consumers to locate the items of value to them, thereby decreasing the value obtainable from all items. This will reduce the expected valuation of the bundle, may create inefficiencies by inducing consumers to settle for second best goods in the bundle, and eventually can make the bundle unusable (Bakos, in press). For example, the dominant cost of using a new software program or information service is often the cognitive cost of learning a new set of commands; the value of the specific features may actually be less important to the purchase decision than this cost (Brynjolfsson and Kemerer, 1996).

When any of the above conditions apply, the benefits from bundling will be limited. Consequently, the availability of increasingly sophisticated search and filtering mechanisms will increase the profitability of bundles of information goods both directly and indirectly. Such mechanisms will create value directly by allowing consumers of large bundles of information goods to find the goods they desire and eliminate the goods they wish to avoid; indirectly, they will reduce the marginal cognitive cost for coping with additional information goods in the bundle, thus increasing the optimal size of bundles and further reducing deadweight loss.

Bundling tends to increase the fraction of the population that purchases most goods (Proposition 1). If congestion costs increase with the number of consumers of the good, then the benefits of bundling could be tempered. Congestion is probably not an important concern, however, since bundling only requires access to, rather than the physical distribution of all goods in the bundle.

Goods with positive network externalities could be modeled as
having *negative* congestion costs, so welfare maximization
might require a subsidy to increase adoption by consumers, some
of whom might prefer not to purchase the goods even at a price
of zero (Farrell & Saloner, 1985). Consequently, bundling
can be especially beneficial for such goods: if the goods are
provided separately and consumer valuations are private information,
then universal adoption can be guaranteed only if the subsidy
is based on the lowest possible valuation of all consumers for
each such good. Bundling can reduce the cost of achieving nearly
universal adoption. With a large number of goods in the bundle,
nearly universal adoption can be achieved as long as the bundle
price approximates consumers' mean (private) valuation for the
goods with network externalities, which can reduce or eliminate
the need to subsidize consumers with low valuations for individual
goods. Profit-maximizing sellers might therefore find bundling
a cost-effective way to build a network of users for goods such
as Internet browser software and browser extensions.

Similarly, sellers of goods with high switching costs, or sellers of "experience goods" for which the consumer's valuation is only known after he or she tries the good, can use bundling to introduce their products to a broader set of consumers. A strategy of periodically updating the composition of the bundle so that any given consumer would face a stochastically attractive mix of both new and old goods could be more effective in overcoming buyer resistance than offering the goods separately. As Varian notes (1996a, p.11), information is fundamentally such an experience good, and this will also tend to favor bundling over unbundled sales as a way for sellers to leverage reputation effects.

When marginal costs are zero and consumers have non-negative valuations (or, equivalently, free disposal), bundling increases efficiency when it increases the fraction of consumers purchasing the bundle, since each purchase creates some benefit at no additional cost. As shown earlier, a monopolist who bundles a sufficiently large number of goods with i.i.d. valuations can capture an arbitrarily large fraction of the area under the demand curve, reducing mean deadweight loss and converting consumers' surplus into producers' surplus in the process.

However, even when bundling increases the fraction
of the population served, socially inefficient bundling may occur
if there are positive marginal costs associated with provision
of each good. Figure 4 shows why. Consider a collection of monopolistically
provided goods with independent linear demands and the same marginal
cost *c*, leading to a profit-maximizing price *P*\*
and quantity *Q*\* for each good when sold separately. When
marginal costs are positive, some consumers will value some of
the goods at less than their marginal costs. In particular, consumers
that fall between *Q'* and *Q"* for a given good
derive a total benefit from that good equal to area F, but cost
the seller G+F to service, resulting in a net social cost of G.

A multiproduct monopolist will pursue a bundling
strategy when it is more profitable than selling the goods separately.
For a sufficiently large number of goods *n*, the monopolist
can capture nearly the entire area under the demand curve by bundling
and selling to almost all consumers, but will also incur the marginal
costs of serving the entire population, so the net profit per
good is approximately equal to A+B+D-G. (Recall that the sum
of the areas under the unbundled goods' demand curves exactly
equals the area under the demand curve for the bundle, although
the shapes will differ.) The profit from selling the goods individually
is B. Thus, the monopolist will choose to bundle only if
A+B+D-G > B, that is, only if A+D > G. In contrast, a social planner
will consider consumers' surplus as well as producers' surplus and
thus will prefer bundling only if A+B+D-G > A+B, that is, only if
D > G. Therefore, when D<G<A+D, the seller's private
incentives will induce bundling that is socially inefficient.

__Example:__ With linear
demand (i.e. consumer valuations i.i.d. and uniformly distributed
in , D is smaller than G if and only if
the marginal cost *c* is greater than .
In this case, according to Proposition 4 the seller will benefit
from bundling if . Thus when ,
bundling will be beneficial to the seller but socially undesirable.
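The comparison of private and social incentives can be checked with a short calculation. The sketch below assumes, for illustration only, that valuations are uniform on [0, 1] and that the areas are labeled as we read Figure 4: A is consumers' surplus and B is profit under separate sales, D is the deadweight loss under separate sales, and G is the excess of cost over value for consumers who value a good at less than *c*. Under that assumption D < G exactly when *c* > 1/3, and A+D > G exactly when *c* < √2 - 1 (about 0.414).

```python
def linear_demand_areas(c):
    """Per-good areas for valuations uniform on [0, 1] (an assumption of
    this sketch) and constant marginal cost 0 <= c < 1."""
    a = (1 - c) ** 2 / 8  # A: consumers' surplus when sold separately
    b = (1 - c) ** 2 / 4  # B: monopoly profit when sold separately
    d = (1 - c) ** 2 / 8  # D: deadweight loss when sold separately
    g = c ** 2 / 2        # G: cost minus value for consumers valuing below c
    return a, b, d, g

def bundling_incentives(c):
    """Return (seller prefers bundling, planner prefers bundling) in the
    large-bundle limit, using the two inequalities from the text."""
    a, b, d, g = linear_demand_areas(c)
    seller_bundles = a + b + d - g > b
    planner_bundles = a + b + d - g > a + b
    return seller_bundles, planner_bundles

for c in (0.2, 0.35, 0.5):
    print(c, bundling_incentives(c))
```

For a marginal cost between the two thresholds, such as 0.35, the sketch reports that the seller prefers to bundle while the planner does not.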

When the goods are not i.i.d., the private incentives for bundling may diverge even further from the social incentives. For example, a multiproduct monopolist can increase profits by adding a new good to the bundle if that good has even a slight positive valuation for the marginal consumer of the bundle. If the new good has a significantly negative value to all inframarginal consumers, this will not diminish the private incentive to bundle as long as their reservation values do not drop below that of the marginal consumer. Thus, in principle, the loss of consumers' surplus from bundling can exceed the increase in seller's profits by an arbitrarily large amount.

In Proposition 1, all information goods are assumed to have identically distributed valuations. In practice, information goods will have different means or variances. Even the same information good may have different valuations at different times: a movie or a news story is likely to command higher valuations when first released than a year later. Because of the generality of the weak law of large numbers, relaxing the assumption of identically distributed in does not affect the results of Proposition 1, although may converge to more slowly. The following more general proposition directly follows from the proof of Proposition 1 and Lyapounov's theorem on stochastic convergence to the mean for bounded sequences:

__Proposition 1A__

The results of Proposition 1 hold if Assumptions
A1, A2, and A4 are satisfied, and buyer valuations
are independent and uniformly bounded with continuous density
functions and non-negative support.

Schmalensee (1984) points out that bundling increases a seller's profits "by reducing buyer diversity, thus facilitating the capture of consumers' surplus." In his paper, buyer diversity is indexed by the coefficient of variation of buyer valuations. As long as all goods are drawn from the same distribution, the weak law of large numbers guarantees that adding more goods to the bundle will reduce this coefficient of variation, while the central limit theorem implies that the distribution of valuations for the bundle will converge to the Gaussian distribution to which Schmalensee originally applied this criterion.

Although Proposition 1A implies that bundling generally increases seller's profits for large numbers of goods with zero marginal cost, it is not always optimal to add an additional information good to a bundle. Adding a good to a bundle can increase the sales and resulting profits from this good, especially if the demand curve for the individual good makes it difficult to extract a significant fraction of the potential surplus as profits, as is the case for goods with high and constant elasticity of demand. Conversely, if potential surplus can be effectively extracted as profits when a good is sold separately, there is little to be gained by adding it to a bundle, as is the case for goods with only two possible valuations, 0 and (see footnote 5).

Even when adding a good to a bundle does not affect
the good's own profitability, it may affect the seller's ability
to earn profits on the other goods in the bundle. For example,
when goods are asymmetric, the coefficient of variation of a bundle
does not necessarily decrease when an additional good is included
in the bundle. If a good with high variance is added to a bundle,
this may decrease the profitability of the bundle. Adding a new
information good *i* to an existing bundle *B* will
decrease the expected diversity of demand, as indexed by the coefficient
of variation, if and only if .

__Example:__ If the valuations
of *i* and *B* are uncorrelated and ,
the coefficient of variation will decrease if .
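The effect of an additional good on buyer diversity can be computed directly from the first two moments. The sketch below uses only the standard formulas for the mean and variance of a sum; the function name and the parameter `rho` (the correlation between the valuations of the good and the existing bundle) are illustrative, and the numbers are hypothetical.

```python
import math

def cv_after_adding(mean_b, sd_b, mean_i, sd_i, rho=0.0):
    """Coefficient of variation of bundle B after good i is added.

    mean_b, sd_b: mean and standard deviation of valuations for bundle B.
    mean_i, sd_i: mean and standard deviation of valuations for good i.
    rho: correlation between the two valuations (0 means uncorrelated).
    """
    mean = mean_b + mean_i
    variance = sd_b ** 2 + sd_i ** 2 + 2 * rho * sd_b * sd_i
    return math.sqrt(variance) / mean

cv_before = 1.0 / 10.0  # hypothetical bundle: mean 10, standard deviation 1
print(cv_after_adding(10, 1, 10, 1) < cv_before)  # similar good
print(cv_after_adding(10, 1, 1, 5) < cv_before)   # low mean, high variance
```

In this hypothetical case, adding a good that resembles the existing bundle lowers the coefficient of variation, while adding a good with a low mean and high variance raises it.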

The above discussion may explain why a typical cable TV bundle from providers like HBO or Cinemax offers access to hundreds of movies, but prize fights and other "special events" are typically offered on a "pay-per-view" basis. The cable companies may have established that valuations for the prize fight are concentrated among a small fraction of consumers willing to pay very high prices to watch the fight; thus, the potential surplus of these consumers can be effectively extracted by selling the prize fight outside the bundle, while including the fight in the regular bundle might increase the bundle's coefficient of variation.

While Proposition 1 assumes that valuations of information goods are independent, in practice they may be positively or negatively correlated. This section explores how such correlation affects the profit-maximizing strategy of a monopolist who bundles information goods.

There is evidence that consumers have implicit budget constraints for different categories of expenses (Thaler, 1990). For instance, while it is exceedingly difficult to predict which games, on-line services, and articles a particular consumer will purchase, one can predict that consumers are typically willing to spend about $30 per month on all types of on-line entertainment. As the cost of goods purchased approaches this "budget constraint," it becomes less likely that any additional goods will be purchased. Similarly, because human information processing capacity is finite, a time-budget constraint may prevent the consumption of a very large number of goods; even the most ardent football fan cannot watch all the games played on any given Sunday, and the most dedicated academic cannot read all the on-line publications that might be relevant. Such budget constraints create a negative correlation in the valuations of successive information purchases.

When there are explicit or implicit budget constraints, the average
variance of valuations for the bundle declines more rapidly as
new goods are added to the bundle. As a result, it is easier
for the seller to predict demand for the bundle, and the expected
size of the deadweight loss declines more rapidly. If the budget
constraint is "hard," the full efficiency benefits of
bundling may be achieved with a *finite* number of goods
*n*, which are selected to collectively exhaust a given budget
category.

Two distinct types of positive correlation need to be analyzed because they have different effects on the profitability and efficiency of bundling. In the first case, valuations for the information goods are positively correlated, but not to the same underlying variables. For example, a consumer with a high valuation for an article about the Boston Red Sox may also put high value on subsequent articles about baseball or sports in general. Similarly, a trader's valuations for a sequence of stock quotations may be serially correlated over time or across industries. If these correlations become lower the more distant one gets from the initial topic or item, eventually converging to zero, then the law of large numbers and the central limit theorem apply, and the limiting results obtained in earlier sections hold. For example, the following more general proposition directly follows from the proof of Proposition 1 and the law of large numbers for stationary (in the wide sense) sequences:

__Proposition 1B__

The results of Proposition 1 hold if Assumptions A1, A2, and A4
are satisfied, and the sequences of buyer valuations
are identically distributed, not perfectly correlated, and stationary
in the wide sense for all *n*, with continuous density functions,
non-negative support, and finite mean
and variance .

Thus, bundling of information goods can substantially increase profits even when the valuations of individual goods are highly correlated, but not to the same underlying variables. However, the distribution for the bundle converges more slowly to a Gaussian distribution, and the number of goods required to achieve a given level of profits and efficiency gains generally increases.

In the second type of positive correlation, the valuations for all goods are correlated to one or more underlying variables. For instance, if business users have higher valuations than home users for both a financial news story and a research report, they will also have a higher valuation for a bundle of both these goods. In this case, the distribution of consumer valuations for the bundle does not converge to a Gaussian distribution as more goods are added. Instead, the limiting distribution of valuations reflects the distribution of the underlying variable, in this example the probability that a consumer uses the computer for fun or profit. No matter how many goods are added to the bundle, the demand curve always reflects the difference in valuations by home and business users, preventing the seller from capturing the entire surplus with a single price. In general, when valuations are correlated with underlying variables, bundling does not necessarily reduce deadweight loss even for very large bundles, and a simple bundling strategy may not be the profit-maximizing strategy for sellers of information goods.

__Example:__ Suppose that consumers are equally divided between
home and business users, and that both types have either a high
or a low valuation for each information good, respectively denoted
by and ().
Home users value each good at with probability
and at with
probability , while business users value
each good at with probability
and at with probability .
Marginal costs are negligible. In this setting, consumer valuations
are positively correlated with consumer type ("business"
or "home" user). Without bundling, if
the seller will set a price equal to ,
and sell to the consumers for a profit
per consumer per good. If ,
the seller will price at , and sell to
all consumers for a profit of per consumer
per good. Bundling a large number of goods results in average
valuations of for business users and
for home users. The seller can price
the bundle either for the business users, resulting in a maximum
profit of per consumer per good, or sell
to everybody for a maximum profit of
per consumer per good. Thus, when , bundling
is strictly less profitable than unbundled sales.
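The comparison in this example can be reproduced with hypothetical numbers. The sketch below assumes two equally sized consumer types, two possible valuations per good, negligible marginal cost, and a bundle large enough that each type's per-good valuation equals its mean. The parameter names and the values 1.0, 0.1, 0.1, and 0.9 are illustrative and are not taken from the paper.

```python
def best_unbundled_profit(v_hi, v_lo, p_home, p_bus):
    """Best profit per consumer per good when goods are sold separately.

    p_home, p_bus: probability that a home or business user has the high
    valuation v_hi for a given good; otherwise the valuation is v_lo.
    """
    share_high = (p_home + p_bus) / 2  # types are equally likely
    return max(v_hi * share_high, v_lo)

def best_bundled_profit(v_hi, v_lo, p_home, p_bus):
    """Best single-price profit per consumer per good for a very large bundle."""
    mean_home = p_home * v_hi + (1 - p_home) * v_lo
    mean_bus = p_bus * v_hi + (1 - p_bus) * v_lo
    low, high = sorted((mean_home, mean_bus))
    # Either sell only to the higher-valuation half, or sell to everyone.
    return max(high / 2, low)

params = dict(v_hi=1.0, v_lo=0.1, p_home=0.1, p_bus=0.9)
print(best_unbundled_profit(**params), best_bundled_profit(**params))
```

With these hypothetical values, separate sales earn more per consumer per good than a single-price bundle, because the bundle price cannot serve both types without leaving surplus to business users.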

Similarly, when consumer valuations are correlated to the same underlying variable, mean deadweight loss may not be eliminated by bundling and may even increase, depending on the actual distribution of the valuations. Since consumers' preferences are often correlated with underlying variables, the strong results of Propositions 1 and 2 may be limited in practice.

The results of Proposition 1 can be restored if the market can be segmented according to the underlying variable. The strategy is to create submarkets defined by different values of the underlying variable, so that consumers' demands are i.i.d. conditional on a given value of the underlying variable.

For instance, while home and business users may have different
valuations for a bundle, valuations for individual information
goods may be i.i.d. *within* each category of users (as in
the previous example). By identifying a given consumer's market
segment *ex ante*, a seller can predict that consumer's expected
value for the bundle. The seller can maximize profits by offering
an appropriately priced bundle for each type of consumer, that is, by
third-degree price discrimination. For instance, it may be optimal
to offer an identical bundle of all goods to both types of users
while providing a rebate to home users or imposing a surcharge
on business users. In the previous example, the seller could
charge business users a price of per
bundled good, and home users a price of ,
allowing the seller to maximize profits while eliminating deadweight
loss and consumers' surplus.
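A small calculation shows why segmentation restores the benefits of bundling. The sketch assumes a very large bundle, so that every consumer in a segment values the bundle at the segment's mean per-good valuation; the segment names, means (0.91 and 0.19), and population shares are hypothetical.

```python
def single_price_profit(means, shares):
    """Best profit per consumer per good with one bundle price for everyone."""
    best = 0.0
    for price in means.values():
        # Segments whose mean valuation is at least the price will buy.
        buyers = sum(shares[s] for s in means if means[s] >= price)
        best = max(best, price * buyers)
    return best

def segmented_profit(means, shares):
    """Profit when each segment is charged its own mean valuation."""
    return sum(means[s] * shares[s] for s in means)

means = {"business": 0.91, "home": 0.19}   # hypothetical per-good means
shares = {"business": 0.5, "home": 0.5}    # hypothetical population shares
print(single_price_profit(means, shares), segmented_profit(means, shares))
```

In this illustration the segmented prices serve every consumer and extract each segment's full valuation, whereas a single price must either exclude home users or leave surplus with business users.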

Such price discrimination is common among software and information vendors (Varian, 1996b). For example, McAfee Associates, Inc. has separate price schedules for home and business users for identical collections of anti-virus software and updates of information about new computer viruses. Similarly, journals commonly charge different prices for collections of articles depending on the organizational affiliation of the subscriber (Varian 1996c).

In principle, demand might be segmented into an arbitrary number
of subcategories, with separate prices for each subcategory as
illustrated in Figure 5 and by Proposition 5. To demonstrate
this approach, we assume that each consumer is characterized by
a type parameter *w*, such as appetite for information goods,
computer literacy, or income, uniformly distributed in .
More formally, assume:

A6: Consumers are characterized by a type *w*. Given *w*,
the valuations for all information goods are i.i.d. and uniformly
distributed in . Consumer types *w*
are i.i.d. and uniformly distributed in .

If consumer valuations for individual goods are correlated to a common underlying variable such as consumer type, but are i.i.d. conditional on this variable, then bundling increases profits, reduces deadweight loss, and reduces consumers' surplus if the seller can segment the market through third-degree price discrimination. Proposition 5 illustrates this in the setting introduced above:

__Proposition 5 (Third-degree price discrimination in one variable)__

Given assumptions A1, A2, A4, and A6, bundling does not eliminate
deadweight loss in the limit as *n* → ∞. However
if third-degree price discrimination is possible, then the results
of Proposition 1 still hold in the limit, and the seller charges
each consumer of type

The third-degree price discrimination strategy can be generalized to multiple underlying variables. If a seller segments consumers using one variable, and then finds that consumer valuations remain correlated to a different common variable, the process can be repeated to remove this residual correlation. With a sufficiently large collection of variables, the distribution of valuations may become i.i.d., or nearly so, conditional on a given value for the set of underlying variables. For instance, it might be possible to segment consumers by business vs. home use, zip code, educational background, age, sex, credit rating, etc. Databases providing such demographic information are readily available, although legal and ethical issues may limit the use of some of this data for price discrimination. Third-degree price discrimination strategies will be facilitated by widespread computer networking and public key encryption and authentication technologies that enable the cost-effective delivery of non-transferable rebate coupons to individual consumers. The rebate amount can be a function of the underlying variables that are correlated with the targeted consumers' expected valuation for the bundle.

Third-degree price discrimination requires that any underlying variables correlated with consumer valuations be observable, so that the seller can segment the market based on these variables; this is often infeasible. However, if a consumer's type is correlated with an observable behavior, such as time spent on-line or willingness to wait for "stale" information, then this behavior can be used to segment the market. This enables a form of second-degree price discrimination (Varian, 1996b), in which consumers self-select by purchasing different versions of the bundle. In particular, the monopolist can pursue a mixed bundling strategy of offering several bundles, each including a subset of the available information goods; this menu of bundles will screen consumers by type.

To model this type of price discrimination, in the setting above assume there is a feature of the bundle (or an array of goods in the bundle) which is costless to the seller, and which has a value that is monotonically increasing with consumer type. More formally, assume that:

A7: There is a feature *d* of the bundle, costless to the
seller, that, without loss of generality, can take any value in
.

A8: A consumer of type would prefer to
"consume" of *d*; a
lower level of consumption would result
in a linear utility loss of , while no
benefit is derived from a consumption level higher than .

In this setting, the seller can use a bundling strategy to increase
profits and reduce the total deadweight loss, but the seller must
provide incentives to prevent consumers with high valuations from
mimicking low-valuation consumers. This need to maintain incentive
compatibility typically reduces the efficiency benefits of bundling (some
low-valuation consumers are inefficiently excluded from some goods) and
introduces some rent spillover (surplus is not completely extracted
from some high-valuation consumers; Wilson 1993, ch. 10). As Armstrong
(1996) shows, the inefficient exclusion of low-demand consumers
is also common in multidimensional mechanism design, when consumers'
private information cannot be captured in a single scalar variable.
In the above setting, where consumer valuations are correlated
with the underlying variable *w*, the following proposition
applies:

__Proposition 6:__

Given assumptions A1, A2, A4, A6, A7, and A8, for a large
enough , bundling results in higher seller's
profits, lower consumers' surplus, and a smaller deadweight loss.
The optimal price schedule is for
and otherwise.

Proposition 6 implies that a seller can price a bundle contingent
on the level of feature *d* chosen by each consumer (and
the corresponding implied type *w*), thereby making the bundling
strategy profitable even when consumers are not homogeneous.
The seller's strategy is similar to the third-degree price
discrimination strategy, except that the seller must satisfy incentive
compatibility constraints in setting the price schedule, because
consumers can strategically modify their behavior.

Strategies that may lead consumers to reveal their types include charging a lower price for delayed stock quotations, news stories or movies; for images with lower resolution; for less comprehensive search results; or for having access restricted to certain hours. For instance, Lexis/Nexis offers lower prices for access to a standard bundle of electronic data to users who do not need access during regular business hours. Another way to degrade the bundle is to leave certain items out; for example, to sell a "basic" bundle that is a subset of the "premium" bundle. Such a mixed bundling strategy forces consumers to signal their valuations by their choice of bundles. While the degraded bundles need not be any less expensive to create or provide, offering them can increase profits by reducing the rents that the seller does not capture from high-valuation consumers to induce them not to disguise their types (Deneckere and McAfee, 1994).

In summary, sellers of information goods will often find it advantageous to segment their markets based on observable characteristics or revealed behavior to reduce or eliminate the correlation of values across products. In practice, this may involve offering different bundles to different groups, a strategy that can be interpreted as mixed bundling. In this context, Proposition 6 can be interpreted as showing how mixed bundling can dominate pure bundling when consumer valuations are correlated to an underlying type (and thus consumers are heterogeneous), even if marginal costs are zero.

An interesting class of extensions would be to relax the assumption that the value (or production cost) of the bundle is equal to the sum of the values of the component goods. Not only may the goods be complements or substitutes, but there may also be costs and benefits associated with producing, distributing or consuming the bundle as a whole, such as economies of scale in creating a distribution channel, administering prices, and making consumers aware of each product's existence. For instance, technological complementarities affect the collective valuation of the millions of parts flying in close formation that comprise a Boeing 777, and economies of scale make it cheaper to distribute newspaper or journal articles in groups rather than individually. One implication is that, for certain types of dependencies, the distribution of the valuation of the bundle will not converge to a Gaussian. For instance, if adding a good (or feature) to the bundle has a multiplicative effect on the value of other goods in the bundle, as is commonly assumed in hedonic models of product valuation (Fisher and Shell, 1971), then the value of the bundle will approach a log-normal distribution.
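The log-normal limit under multiplicative effects can be illustrated by simulation. In the sketch below, each good multiplies the value of the bundle by an independent positive factor, so the logarithm of the bundle's value is a sum of independent terms. The factor distribution (uniform on [0.5, 1.5]), the bundle size, and the seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

def simulate_bundle_values(n_goods=30, n_consumers=5000, seed=1):
    """Bundle values when each good scales the value multiplicatively."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_consumers):
        value = 1.0
        for _ in range(n_goods):
            value *= rng.uniform(0.5, 1.5)  # hypothetical per-good factor
        values.append(value)
    return values

def skewness(xs):
    """Sample skewness: third central moment over the cubed std. deviation."""
    mean = statistics.fmean(xs)
    m2 = statistics.fmean((x - mean) ** 2 for x in xs)
    m3 = statistics.fmean((x - mean) ** 3 for x in xs)
    return m3 / m2 ** 1.5

values = simulate_bundle_values()
logs = [math.log(v) for v in values]
print(skewness(values), skewness(logs))
```

In this simulation the bundle values themselves are strongly right-skewed, while their logarithms are close to symmetric, which is the pattern one would expect from an approximately log-normal distribution.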

Complementarities and economies of scale in distribution can obviously create additional incentives for bundling, and if the savings are sufficiently large, can lead monopolists to bundle goods that would not otherwise be bundled (Eppen, Hanson, et al., 1991). Such economies underlie most large "bundles" of physical goods, and would tend to add to the advantages of bundling we have already identified. However, one of the effects of the emerging information infrastructure is to dramatically decrease distribution costs for goods that can be digitally encoded. As noted by Metcalfe (1995) and others, this may be enough to make it profitable to unbundle certain goods, such as magazine and journal articles, packaged software and songs, to the extent they were formerly bundled simply to reduce distribution costs.

A drawback of bundling is that information about the demand for individual components is lost, because only a single price and quantity are observed. This loss of information could impose a substantial cost: if total revenues are divided among the producers of the individual information goods without regard to how much value they each contributed, there will be significant underincentive for the development of new goods because of the "free rider" problem (Holmstrom, 1982). A multiproduct monopolist who makes the investment decision for all the components will face a similar problem: where should resources be allocated within the firm? One frequently mentioned benefit of the traditional price system is its ability to provide the information required for optimal production incentives.

To address this problem, information about the valuations for individual goods can be recovered by offering relatively small random samples of consumers access to selected information goods separately, rather than as part of a bundle. Under realistic conditions, this mechanism (bundling together information goods and statistically sampling consumers) can preserve most of the informational benefits of having separate prices for each good, while substantially increasing the profits and total surplus generated from a given set of information goods. Details are outlined in Appendix 2.

This approach decouples the two basic functions of prices: to allocate goods among consumers, and to allocate investment among producers. The price at which most consumers decide to buy a given information good need not be the same as the price that guides production decisions.

Our analysis shows that a multiproduct monopolist of information goods can often achieve higher profits and greater efficiency by using a bundling strategy than by selling the goods separately. If it would be difficult (or illegal) for a collection of single-good monopolists to coordinate on a unified bundling strategy and price, our analysis suggests that they may benefit from merging or from selling their information goods to a single firm. Thus, bundling creates non-technological economies of scope; for instance, an information good that is unprofitable (net of development costs) if sold separately could become profitable when sold as part of a larger bundle. These effects of bundling have implications for at least four different market structures.

First, the dynamics of bundling could create a winner-take-all market. Multiproduct firms that successfully sell a suite of information goods may find it more profitable to introduce new information goods than will single-product firms. Even if an information good introduced as part of a bundle by the multiproduct firm is intrinsically less valuable to consumers than a similar product that might be sold by single-product firms, our analysis shows that the multiproduct firm may, in equilibrium, achieve a higher market share and earn higher profits from that good. Bundling may therefore enable a multiproduct firm to charge lower prices while remaining profitable. Furthermore, a single-product firm may find it profitable to sell all rights to its product to the multiproduct firm to reap a share of the benefits from bundling. This suggests that an equilibrium with a single multiproduct monopolist will be stable in the face of the introduction of new information goods or even small bundles of new information goods. This winner-take-all effect from bundling is distinct from technological economies of scope or scale or learning (e.g. Spence, 1981), network externalities (e.g. Farrell and Saloner, 1985), or financial market imperfections (e.g. Bolton and Scharfstein, 1987).

A variety of alternative market structures might also emerge. Bundling could be implemented by a broker that remarkets goods produced by information "content" producers. This is essentially the strategy of on-line services like America Online. Alternatively, a consortium or club of consumers could purchase access to a variety of information goods and make them available to all members for a fixed fee. Some user groups or certain site licensing arrangements for software resemble this approach. Finally, the government could fund the creation and distribution of information goods through taxes that do not depend on which individual goods are consumed, but only on access to the whole set, as is done for some television programming in some countries. For instance, the United Kingdom funds public television programming via a use tax on television sets. Each of these institutional approaches is likely to have somewhat different welfare consequences, and the analysis becomes even more complex when multiple brokers, consortia, and producers simultaneously compete.

Our models for bundling information goods can help explain some
empirical phenomena. For instance, a sharp contrast in pricing
and bundling strategies is evident at two commercial sites on
the World Wide Web: the Internet Shopping Network (http://internet.net)
and E-library (http://www.elibrary.com). At first glance, these
two sites look similar: each has colorful icons representing a
variety of products for sale. However, at Internet Shopping Network,
which sells physical goods like computer accessories, each item
is associated with a distinct price; at E-library, all of the
items displayed are available when the consumer pays a *single*
price for access to the bundle. The goods sold by E-library are
information goods with nearly zero marginal costs of reproduction.
Since both companies market their products over the Internet,
it is reasonable to assume that they face similar transaction
costs; our theory of bundling as a pricing strategy for information
goods provides a clear explanation for the difference in pricing
strategies.

Many sellers of on-line information, such as America Online, CompuServe, and Lexis/Nexis, sell their goods in large bundles. For instance, when Reuters sells information about prices for various financial securities, the standard contract involves a large bundle of quotations for different securities over an extended period. While transaction costs encourage some degree of "bundling" simply to reduce administrative overhead, these considerations are not central to the Reuters strategy (Dhebar, 1995).

Cable and direct satellite broadcast television firms each sell
goods with nearly zero marginal costs of reproduction. In general,
pay-per-view has been less common than bundling-oriented pricing
schemes. Typically, a few standard bundles are offered, as predicted
by our theory, in an attempt to achieve some degree of price discrimination.
For example, these firms typically offer a "basic"
bundle from which certain goods are excluded. The pay-per-view
approach has been used mainly for unusual special events such
as boxing matches; this can be explained as a strategy of excluding
"big" goods from the bundle and charging for them separately
if some aspects of the nature of consumers' demand for these goods
are known *a priori*.

Interestingly, Microsoft has often incorporated into its operating systems applications and functionality that were developed by other firms and previously sold separately; this may be consistent with our model. In 1992, Microsoft's Windows operating system incorporated most of the capabilities of Artisoft's LANtastic; in 1993, it incorporated memory management similar to Quarterdeck's QEMM product, disk compression like Stac's Stacker, and faxing like Delrina's WinFax product; and in 1995, email like Lotus's cc:Mail. Current versions of Windows 95 include web-browsing software similar to Netscape's Navigator. Similarly, WordPerfect and Lotus have also sought to compete by bundling their products with applications that previously were sold separately.

Finally, while the bundling and statistical sampling mechanism we propose in section 6.2 may at first seem unlike any practice in existing markets, it resembles the mechanism that has evolved for determining how royalties should be apportioned to composers and songwriters from the revenues paid by nightclubs, restaurants, and other venues. ASCAP and BMI, two music associations, charge flat rates to organizations based on factors such as the number of seats in the establishment, but do not consider which songs are played. They then sample radio play lists and other sources to estimate how popular each song is, and divide the total revenues earned among the composers and songwriters in proportion to the estimated current popularity of their songs.
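The apportionment rule described here is simple to state in code. The sketch below is a stylized illustration of proportional division based on a sample, not a description of the procedures actually used by ASCAP or BMI; the function name, song labels, and revenue figure are hypothetical.

```python
from collections import Counter

def apportion_royalties(total_revenue, sampled_plays):
    """Divide flat-fee revenue in proportion to sampled play counts.

    sampled_plays: song identifiers observed in a sample of play lists.
    Returns a dict mapping each sampled song to its share of the revenue.
    """
    counts = Counter(sampled_plays)
    total_plays = sum(counts.values())
    if total_plays == 0:
        return {}  # nothing was sampled, so nothing can be apportioned
    return {
        song: total_revenue * plays / total_plays
        for song, plays in counts.items()
    }

print(apportion_royalties(1000.0, ["a", "a", "b", "c"]))
```

The buyers pay a single flat fee for access to everything, while the sample alone determines how the revenue is divided among producers.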

A strategy of selling a bundle of many distinct information goods for a single price often yields higher profits and greater efficiency than selling the same goods separately. The bundling strategy takes advantage of the law of large numbers to "average out" unusually high and low values for goods, and can therefore result in a demand curve that is more elastic near the mean valuation of the population and more inelastic away from the mean. As a result, profits can be increased, even as inefficiency (deadweight loss) is reduced. While the profitability and efficiency benefits of bundling are most apparent when the consumer valuations are identically distributed and not closely correlated for different products, a bundling strategy can be profitable in a variety of situations.

Our analysis implies that optimal pricing strategies
for information goods with a marginal cost of reproduction close
to zero are likely to be quite different from strategies for goods
and services with non-zero marginal costs. This suggests that
further analysis of the pricing of such goods is desirable, as
many long-standing results and intuitions about the costs and
benefits of various pricing strategies may not apply to information
goods.

Armstrong, M. "Multiproduct Nonlinear Pricing,"
*Econometrica*, Vol. 64, No. 1, January 1996, pp. 51-75.

Adams, W.J. and Yellen, J.L. "Commodity bundling
and the burden of monopoly," *Quarterly Journal of Economics*
90 (August): 1976. 475-98.

Avery, A., Resnick P. & Zeckhauser, R. "The Market for Evaluations," Working Paper, Harvard Kennedy School of Government, 1996.

Bakos, Y. "Reducing Buyer Search Costs: Implications
for Electronic Marketplaces," University of California, Irvine
working paper, December 1995 (forthcoming in *Management Science*).

Balderston, J., "Online Cash: A Penny for Your
Thoughts?" *Infoworld*, 3 (July 8, 1996).

Bolton, P. and D. Scharfstein, "Long-Term Financial Contracts and the Theory of Predation," Harvard University, mimeo (1987).

Brogden, S. L., "Letter to the Editor,"
*Infoworld* (February 19, 1996).

Brynjolfsson, E. and C. F. Kemerer, "Network
Externalities in Microcomputer Software: An Econometric Analysis
of the Spreadsheet Market," *Management Science*, in
press.

Deneckere, R. J. and McAfee, R. P. (1994). "Damaged goods," University of Texas at Austin.

Dhebar, A., "Reuters Holdings PLC. 1850-1987: A (selective) history," Harvard Business School, HBS Case 9-595-113, (May 1995).

Eppen, G. D., W. A. Hanson, et al. (1991). "Bundling-New Products, New Markets, Low Risk." Sloan Management Review 32(4): pp. 7-14.

Farrell, J. and G. Saloner, "Standardization,
compatibility, and innovation," *Rand Journal of Economics*,
16 (1): 442-455, (1985).

Fisher, F. M. and K. Shell, "Taste and Quality
Change in the Pure Theory of the True Cost-of-Living Index",
pp. 16-54 in *Price Indexes and Quality Change*, Griliches,
Z. (ed.) Harvard University Press, Cambridge, MA, (1971).

Gal-Or, E., "First Mover Disadvantages with
Private Information," *Review of Economic Studies*,
54 279-292, (1987).

Hanson, W. and Martin, K., "Optimal Bundle Pricing,"
*Management Science*, 32(2) (February 1990).

Harmon, S., "Media Comparables Uncovered: What's
An Access Subscriber Worth Anyway?", *iWORLD*, (http://netday.iworld.com/stocks/index.shtml)
(August 20, 1996).

Holmstrom, B., "Moral Hazard in Teams,"
*Bell Journal of Economics*, 13 324-340, (1982).

McAfee, R.P., McMillan, J., and Whinston, M.D. "Multiproduct
monopoly, commodity bundling, and correlation of values."
*Quarterly Journal of Economics* 104 (May): 1989. 371-84.

Metcalfe, R., "On-line Services for Small Change
on the Next Generation Internet," *Infoworld*, (December
25, 1995).

Metcalfe, R., "A penny for my thoughts is more than I could hope for on the next Internet", Infoworld, (January 22, 1996).

Ross, Stephen A. "The Arbitrage Theory of Capital
Asset Pricing". *Journal of Economic Theory*, 13 (3)
December, (1976) 341-360.

Salinger, M. A., "A Graphical Analysis of
Bundling", *Journal of Business*, 68 (1): 85-98, (1995).

Schmalensee, R.L. "Gaussian demand and commodity
bundling." *Journal of Business* 57 (January): 1984.
S211-S230.

Spence, M., "Nonlinear Prices and Welfare",
*Journal of Public Economics*, 8 1-18, (1977).

Spence, A. M., "The Learning Curve and Competition",
*Bell Journal of Economics*, 12 49-70, (1981).

Stigler, G.J. "United States v. Loew's, Inc.:
A note on block booking." *Supreme Court Review*, 1963,
pp. 152-157.

Thaler, R. H., "Saving, Fungibility, and Mental
Accounts," *Journal of Economic Perspectives*, 4 (1):
193-205, (1990).

Urban, G. L., B. D. Weinberg and J. R. Hauser, "Premarket
Forecasting of Really-New Products", *Journal of Marketing*,
60 (January): 47-60, (1996).

Varian, H. "Pricing Information Goods."
*Proceedings of Scholarship in the New Information Environment
Symposium*. Harvard Law School. May 1995.

Varian, H. "Economic Issues Facing the Internet", SIMS working paper, Berkeley, September, 1996a.

Varian, H. "Differential Pricing and Efficiency", SIMS working paper, Berkeley, June, 1996b.

Varian, H. "Pricing Electronic Journals", D-Lib Magazine, June, 1996c.

Weitzman, M. "Recombinant Growth", mimeo, Harvard University, 1995.

Wilcox, J., "Pricing of Content on the Internet: The Aggregator Model," unpublished MIT Masters thesis, (June, 1996).

Wilson, R. *Nonlinear Pricing*. Oxford University
Press. New York, 1993.

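Consider a bundle of $n$ zero marginal cost goods, each with i.i.d.
valuations with mean $\mu$ and standard deviation $\sigma$. Write $x_n$
for a consumer's valuation for this bundle adjusted for $n$ (the mean
valuation per good), let $f_n(\cdot)$ be its probability density
function, and let $\mu_n$ and $\sigma_n$ be the mean and standard
deviation for the valuation of the bundle adjusted for $n$; i.e.,
$\mu_n = \mu$ and $\sigma_n = \sigma/\sqrt{n}$. Denote by $p_n^*$,
$q_n^*$ the optimal mean price for the bundle (adjusted for $n$) and
the corresponding quantity ($q_n^* = \int_{p_n^*}^{\infty} f_n(x)\,dx$),
and let $\pi_n^*$ be the resulting profits per good,
$\pi_n^* = p_n^* q_n^*$. Let $P = \lim_{n\to\infty} p_n^*$ and
$Q = \lim_{n\to\infty} q_n^*$. We show that $P = \mu$ and $Q = 1$.
(If these limits do not exist, the same reasoning can be applied
to convergent subsequences of $p_n^*$ and $q_n^*$, as $q_n^*$ is
bounded, and so is $p_n^*$ because of the finite variance assumption.)

If $P > \mu$, there exists some $\varepsilon > 0$ such that for all
large enough $n$, $p_n^* > \mu + \varepsilon$. By the weak law of large
numbers, $\lim_{n\to\infty} \Pr[|x_n - \mu| \ge \varepsilon] = 0$, where
$|x_n - \mu| \ge \varepsilon$ means $x_n \le \mu - \varepsilon$ or
$x_n \ge \mu + \varepsilon$. Thus if $P > \mu$,
$\lim_{n\to\infty} q_n^* = 0$, and since $p_n^*$ is bounded,
$\lim_{n\to\infty} \pi_n^* = 0$, which contradicts the optimality of
$p_n^*$ and $q_n^*$, because any fixed price below $\mu$ earns profits
per good that stay bounded away from zero.

If $P < \mu$, there exists some $\varepsilon > 0$ such that for all
large enough $n$, $p_n^* < \mu - \varepsilon$. Let
$\hat{p}_n = \mu - \varepsilon/2$, and $\hat{q}_n$ the corresponding
quantity. The weak law of large numbers implies that
$\lim_{n\to\infty} \hat{q}_n = 1$, and
$\lim_{n\to\infty} \hat{p}_n \hat{q}_n = \mu - \varepsilon/2$. Since for
large enough $n$, $\pi_n^* = p_n^* q_n^* \le p_n^* < \mu - \varepsilon$,
it follows that $\pi_n^* < \hat{p}_n \hat{q}_n$ for large enough $n$,
which again contradicts the optimality of $p_n^*$ and $q_n^*$.
Thus $P = \mu$.

If $Q < 1$, let $\delta = 1 - Q$ and $D = 1 - \delta/2$, so that
$Q < D < 1$. Since $q_n^*$ converges to $Q$ and $Q < D$, there exists
some $n_1$ such that $q_n^* < D$ for all $n > n_1$. Choose
$\varepsilon > 0$ such that
$(\mu - \varepsilon)(1 - \varepsilon) > (\mu + \varepsilon) D$, which is
satisfied for $0 < \varepsilon < \mu(1 - D)/(1 + \mu + D)$, and let
$\hat{q}_n$ be the quantity sold at price $\mu - \varepsilon$. By the
weak law of large numbers, $\lim_{n\to\infty} \hat{q}_n = 1$, and thus
there exists some $n_2$ such that $\hat{q}_n > 1 - \varepsilon$ for all
$n > n_2$. Finally, since $p_n^*$ converges to $\mu$ as shown above,
there exists some $n_3$ such that $p_n^* < \mu + \varepsilon$ for
$n > n_3$. Let $\hat{n} = \max(n_1, n_2, n_3)$. Then for $n > \hat{n}$,
setting a price $\mu - \varepsilon$ yields corresponding sales
$\hat{q}_n > 1 - \varepsilon$ and revenues
$(\mu - \varepsilon)\hat{q}_n > (\mu - \varepsilon)(1 - \varepsilon)$.
Since $\varepsilon$ was chosen so that
$(\mu - \varepsilon)(1 - \varepsilon) > (\mu + \varepsilon) D$, we get
$(\mu - \varepsilon)\hat{q}_n > (\mu + \varepsilon) D > p_n^* q_n^* = \pi_n^*$,
contradicting the optimality of $p_n^*$ and $q_n^*$.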
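A quantitative version of this limit can be sketched with Chebyshev's inequality, which is a stronger tool than the weak law of large numbers used above but relies on the same finite variance assumption. The notation here is an assumption of the sketch: $x_n$ is a consumer's mean valuation per good for a bundle of $n$ goods, with mean $\mu$ and variance $\sigma^2/n$; $\pi_n^*$ is the optimal profit per good; valuations are taken to be nonnegative; and $\varepsilon \in (0, \mu)$ is a free parameter. Pricing the bundle at $\mu - \varepsilon$ per good gives

$$
\Pr[x_n < \mu - \varepsilon] \le \Pr[|x_n - \mu| \ge \varepsilon] \le \frac{\sigma^2}{n\varepsilon^2},
\qquad\text{so}\qquad
\pi_n^* \ge (\mu - \varepsilon)\left(1 - \frac{\sigma^2}{n\varepsilon^2}\right).
$$

With nonnegative valuations, Markov's inequality gives $p \Pr[x_n \ge p] \le \mu$ for every price $p$, so $\pi_n^* \le \mu$. Choosing $\varepsilon = n^{-1/3}$ (for $n$ large enough that $n^{-1/3} < \mu$) then yields

$$
\left(\mu - n^{-1/3}\right)\left(1 - \sigma^2 n^{-1/3}\right) \le \pi_n^* \le \mu,
$$

which suggests that profit per good approaches $\mu$ at a rate of at least $O(n^{-1/3})$ under these assumptions.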

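Using the same notation as in Proposition 1, and writing $x_n$ for a consumer's mean valuation per good for a bundle of $n$ goods, we assume that, for all integer $n$ and all $\varepsilon > 0$, $\Pr[|x_n - \mu| < \varepsilon] \le \Pr[|x_{n+1} - \mu| < \varepsilon]$.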

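This assumption implies that the quantity of the bundle of $n+1$
goods sold at price $p_n^*$ per good will increase
compared to the bundle of $n$ goods, i.e.,
$q_{n+1}(p_n^*) \ge q_n^*$. This guarantees that $\pi_{n+1}^* \ge \pi_n^*$.
Adding the $(n+1)$th good to the bundle is desirable
for the seller, because a bundle of $n+1$ goods
is more profitable than a bundle of $n$ goods plus a single good
sold separately, since $(n+1)\pi_{n+1}^* \ge n\pi_n^* + \pi_1^*$
whenever $\pi_{n+1}^* \ge \pi_n^* \ge \pi_1^*$.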

Assumption A5 also implies that (otherwise would not be optimal), allowing the reasoning above to be applied inductively, which proves the proposition for all .

If the marginal cost is higher than the mean valuation, it is
easily seen that bundling is unprofitable at the limit as $n \to \infty$.
Separate sales are still profitable as long as *some* consumers'
valuations are higher than the marginal cost.

Without bundling, the seller faces a downward sloping demand function for each individual good, resulting in a monopolistic equilibrium price of and corresponding profit of . Bundling allows the seller to capture the entire consumer valuations, thus resulting in average profit . Bundling becomes unattractive when , or . As , this condition is met when .
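A small numerical sketch can make this comparison concrete. It assumes valuations that are Uniform(0, 1) for every good (so the demand for each good is linear and the mean valuation is 0.5) and a constant marginal cost `c` per good; these choices, and the function names, are illustrative assumptions rather than the specification used in the text.

```python
import math

def separate_profit(c):
    """Per-good monopoly profit with Uniform(0, 1) valuations and marginal cost c."""
    if c >= 1:
        return 0.0  # no consumer values the good above its cost
    price = (1 + c) / 2  # maximizes (p - c) * (1 - p)
    return (price - c) * (1 - price)

def bundle_profit_limit(c, mean_valuation=0.5):
    """Limiting per-good profit of a very large bundle: mean valuation minus cost."""
    return max(mean_valuation - c, 0.0)

# Under these assumptions the two strategies break even at c = sqrt(2) - 1.
threshold = math.sqrt(2) - 1
for c in (0.0, 0.2, threshold, 0.45, 0.6):
    print(f"c={c:.3f}  separate={separate_profit(c):.4f}  "
          f"bundle limit={bundle_profit_limit(c):.4f}")
```

In this example bundling dominates for low marginal costs, separate sales dominate once the cost exceeds the break-even level, and the bundle earns nothing when the cost exceeds the mean valuation even though separate sales remain profitable.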

Given a consumer's type *w*, valuations
are i.i.d. for all goods and uniformly distributed in ,
i.e., , where .

The probability that a consumer of type *w* will value any
particular good at *x* is for .
Thus the sum of valuations for consumers with valuation at level
*x* equals , and consequently the
unbundled demand at price *p* is

and thus .

As *n* increases, the mean valuation for a bundle of *n*
goods by a consumer of type converges
stochastically to . Thus at a price per
good for the bundle , the seller will
sell to a fraction of the consumers;
i.e., those with type . The resulting
demand curve is , and thus the profit-maximizing
bundle price is per good, and the corresponding
average profits are and the deadweight
loss is . If third-degree price discrimination
is feasible, however, the seller will set ,
resulting in profits of , no deadweight
loss, and full extraction of consumer surplus.

If a consumer of type with preferred
consumption level for the discriminating
feature chooses a lower consumption level ,
which is the preferred level for type ,
the resulting utility loss is . That
consumer values the bundle at , and it
can be shown that the optimal price schedule is linear: if a consumer's
consumption level for the discriminating feature implies type
*w*, the seller charges that consumer price ,
which results in a truth-telling equilibrium in which each consumer
selects the level *d* implied by his or her type *w*.

The resulting demand function is , with
sales at price .
The seller realizes profit , where *w**
characterizes the marginal consumer that will purchase the bundle.
This profit can be calculated to be ,
and solving yields .
Substituting *dW* for *w* in the price schedule above
yields the result in the Proposition.

Thus, unless , the optimal pricing strategy
for the bundle involves taking advantage of the feature *d*
to price-discriminate. If , then the
seller is able to achieve third-degree price discrimination, charging
each consumer their reservation value for the bundle, and extracting
the entire consumer surplus, resulting in higher profits and no
deadweight loss.

The proposed mechanism works as follows:

1. For each good $i$, expose a random subsample
of $s_i$ potential
consumers to prices that make them reveal their demand for this
good. These consumers will not have access to good $i$,
which is normally in the bundle, unless they pay an additional
price, $p_i$.

2. Extrapolate the information from the subsamples
to the rest of the population. If these $s_i$
consumers are sufficiently representative, then their choices
will provide a (noisy) signal of what the demand of the whole
population, $S$, would have been for good $i$.

This mechanism requires preventing arbitrage among
consumers, a condition that can be enforced through technical
means, such as public key encryption and authentication; legal
means, such as copyrights and patents; social sanctions, such
as norms against piracy; or combinations of the three. This mechanism
will lead to a deadweight loss for those $s_i$
consumers who are included in the sample, since some of them may
choose to forgo consumption of the good. If $s_i = S$,
then the mechanism provides exactly
the same information as the conventional price system at exactly
the same cost. However, it is likely that for most purposes a
sufficiently accurate estimate of demand can be calculated for
$s_i \ll S$, because additional draws from the sample rapidly become
less informative (the sampling error shrinks only as $O(1/\sqrt{s_i})$),
as shown in Figure 6.
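The diminishing value of additional draws can be illustrated with a short simulation. The population of Uniform(0, 1) valuations, the test price of 0.3, and the names `estimate_share` and `rms_error` are hypothetical choices made only for this sketch.

```python
import random

def estimate_share(valuations, price, sample_size, rng):
    """Estimate the share of consumers who would buy at `price`
    from a random subsample that is exposed to that price."""
    sample = rng.sample(valuations, sample_size)
    return sum(v >= price for v in sample) / sample_size

rng = random.Random(1)
population = [rng.random() for _ in range(100_000)]  # hypothetical valuations
price = 0.3
true_share = sum(v >= price for v in population) / len(population)

rms_error = {}
for s in (100, 400, 1600):
    # Repeat the experiment to measure the typical estimation error.
    errors = [estimate_share(population, price, s, rng) - true_share
              for _ in range(300)]
    rms_error[s] = (sum(e * e for e in errors) / len(errors)) ** 0.5
    print(f"sample size {s:5d}: RMS error {rms_error[s]:.4f}")
```

Quadrupling the subsample should roughly halve the error, so a subsample far smaller than the whole population can already give a usable demand estimate while exposing few consumers to the deadweight loss.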

While the conventional price system provides only
a binary signal of whether a given consumer's valuation is greater
than or less than the market price, by offering different prices
to different consumers one could estimate the shape of the *entire*
demand curve, rather than just the portion near the market price.
It may be too costly to experiment with prices far from the equilibrium
price if all consumers must be offered the same price (Gal-Or,
1987), but if only a few consumers face off-equilibrium prices,
then the costs can be kept manageable. Moreover, the shape of
demand far from the equilibrium price is an important determinant
of the total social surplus created by a good, and therefore the
optimal investment policy regarding which types of goods should
be created. For these reasons, our mechanism is likely to provide
information about consumers' demand at a significantly lower social
cost than the conventional price system, and it will never do
worse.
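The idea of offering different prices to different small groups can be sketched as follows. The Uniform(0, 1) population, the grid of test prices, the group size, and the function name `estimate_demand_curve` are all assumptions introduced for illustration.

```python
import random

def estimate_demand_curve(valuations, prices, per_price, rng):
    """Offer each test price to its own small random group of consumers
    and record the share who accept; everyone else is left unaffected."""
    exposed = rng.sample(valuations, per_price * len(prices))
    curve = {}
    for k, price in enumerate(prices):
        group = exposed[k * per_price:(k + 1) * per_price]
        curve[price] = sum(v >= price for v in group) / per_price
    return curve

rng = random.Random(7)
population = [rng.random() for _ in range(50_000)]  # hypothetical valuations
test_prices = [0.1, 0.3, 0.5, 0.7, 0.9]
curve = estimate_demand_curve(population, test_prices, per_price=500, rng=rng)
exposed_share = 500 * len(test_prices) / len(population)

for price, share in curve.items():
    print(f"price {price:.1f}: estimated share buying {share:.3f}")
print(f"share of population exposed to test prices: {exposed_share:.1%}")
```

In this sketch only a small fraction of consumers ever faces an off-equilibrium price, yet the estimates trace the demand curve across its whole range rather than only near a single market price.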

This statistical mechanism resembles the way investment decisions about certain information goods are actually made. For instance, information about consumers' valuations of individual television programs is rarely obtained by forcing them to pay for particular programs. Instead, television content producers provide a bundle of the goods for free (broadcast TV) or for a fixed price (cable or direct satellite TV) and rely on statistical sampling by firms like Nielsen and Arbitron to estimate audience size and quality. Advertising rates are based on these estimates, and indirectly determine which types of new television content will be produced. As discussed in section 6.4, this mechanism also resembles how royalties are apportioned to composers and songwriters from the revenues paid by nightclubs, restaurants, and other venues.

Finally, test-marketing of new products using focus
groups also has similarities with the mechanism we describe.
In fact, any signal that is reliably correlated with consumers'
expected valuation for a good can serve as a substitute for the
information provided by the conventional price system. These
indicators could include prices from related product markets or
populations, time spent visiting a site on the World Wide Web,
the number of keystrokes made while in a particular application,
survey answers on what users say they like, the expert opinion
of product specialists, or ratings generated by collaborative
filtering mechanisms (see, e.g., Avery, Resnick & Zeckhauser,
1996; Urban, Weinberg & Hauser, 1996).