Andreas Weigend
Social Data Revolution
MS&E 237, Stanford University, Spring 2010

Class 4: Product Discovery

Class Date: April 8, 2010
Audio | Transcript: NA
Paper: NA

Part I: Data analysis levels
There are four levels of analysis and actionability, each building upon the one before. In ascending order of complexity, they are the page, the visit, the customer, and the network. A visit is a collection of page data, a customer is an aggregation of visit data, and so forth. Each level has several models specifically designed to structure its information, and different actions that might be taken given that information. The models generally increase in complexity, and the actions in specificity, as we move up the aggregation chain. Please note that this is all from the perspective of the page manager/domain manager and not from that of the customer. For example, at the page level the domain manager can serve generic ads, but at the visit level he or she can serve targeted ads.

Level 1: Page

Model: content
Action: show ads
Explanation: The ad is determined based on the content of the page.
Example: A customer views a URL, and an ad is served. You serve ads for music on music blogs, financial services on finance blogs, etc.
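
A minimal sketch of page-level ad selection, assuming a simple keyword match. The ad topics, keyword sets, and the function name pick_ad_topic are all hypothetical, chosen only to illustrate that the decision depends on page content alone and not on who is visiting.

```python
# Hypothetical ad inventory: ad topic -> keywords that signal that topic.
AD_KEYWORDS = {
    "music": {"album", "guitar", "concert", "band"},
    "finance": {"stocks", "mortgage", "savings", "loan"},
}


def pick_ad_topic(page_text):
    """Return the ad topic whose keywords overlap most with the page, or None."""
    words = set(page_text.lower().split())
    # Score each topic by how many of its keywords appear on the page.
    scores = {topic: len(words & kws) for topic, kws in AD_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Note that the function takes only the page text as input: nothing about the visitor, the session, or the visitor's friends is available at this level.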

Level 2: Visit

Model: intention, situation
Action: session based marketing
Explanation: With a thicker information base, the page manager knows not just which page is being visited, but also for how long, whether that same visitor has visited other pages, and what he or she has clicked on. Based on this information, the manager can attempt to infer the customer’s intent, or the context within which the customer is accessing the page.
Example: On a sports blog, somebody searching for the best snowboard might reveal an intention to purchase a snowboard, and so could be shown pertinent ads. On the other hand, without that information, the blog manager might serve ads for winter vacations, skis, lift tickets, etc.
Example: Google serves ads based on your search query, and based on that query can figure out what you’re looking for (your intention) or what you’ve been clicking on (your context).
Example: Suppose that you’re searching on Amazon for digital cameras. If you search generically (say your query is “digital cameras”), they’ll show you one set of products, while if you search for a specific camera (say “Canon Rebel XSI”), they’ll show only that camera or its accessories. This way they can differentiate between broad and narrow intentions and then market appropriately.
Example: Suppose that, also on Amazon, you’re not searching but browsing through products. If you’ve been browsing through beginner guitar packages, they might suggest a few beginner guitar technique books. In this way they infer your situation, even though they don’t have information about your intentions.

Level 3: Customer

Model: demographics, behavior prediction
Action: personalization (implicit), customization (explicit)
NB: It’s important not to confuse the implicit and explicit modes. Personalization is implicit: the website changes without directly requesting information from the user. An example would be offering you differently gendered products (pink vs. blue backpacks, for instance) based on the website’s understanding of your gender through inference methods (census data and first names, for example) and not because you volunteered anything. Customization is explicit: the page changes because the user provides information. For example, on Facebook, when you identify your gender and gendered ads are shown, this is the result of customization.
Explanation: By using cookies, IP information, login, etc. to identify a particular user, the website can aggregate information from visits into a profile of the user. This might be a demographic profile (age, gender, location, etc.), in which case marketing efforts are aimed at the demographic, or it might take the form of a behavior prediction model based on the user’s past behavior and on inference models built from other users.
Example: Amazon determines that I am a male between 18 and 25 (either implicitly because my history has been most consistent with that demographic, or explicitly because I filled out that information) and then shows me products that are most likely to be purchased by my demographic.
Example: Amazon sees that I’ve been looking at artsy comic books after I look for textbooks (either because it records my past visits, or because I mention “artsy comic books” as my favorite genre in my profile). Now, after I put a few textbooks in my cart, it shows me artsy comic books at checkout, because it knows from my past behavior that I’m likely to buy them “while I’m at it.”

Level 4: Network

Model: apply social network research
Action: discounts, better service
Explanation: As the levels of aggregation increase, the models start to take into account network effects. This level increases the value of each customer by taking into account the fact that a single customer can also influence other people to make purchases. This can take one of two routes: one is the socialization of marketing and the second is the marketing of socialization. The socialization of marketing is the use of network information (who your friends are, what trends are pertinent in your social network) to make suggestions, serve ads, etc. The marketing of socialization works in the opposite direction: it uses your purchase information and your friend network to promote products you buy to your friends.
Example: Socialization of marketing. By extrapolating trends in social networks, you can predict what is “cool” or “hot” and then push products. If your Facebook friends keep on writing about the iPad, then you should get served iPad ads. To take it a level further (beyond the network level) a semantic identification engine would also notice if those are positive or negative comments, and if they were negative, try to sell you an HP Slate or a Kindle, etc.
Example: Marketing of socialization. Before Facebook Beacon was rightfully destroyed, it would show your purchases on other sites to your friends. You purchase an iPhone on Amazon with the AT&T plan, then Beacon shows ads for the iPhone through AT&T with a focus on the “circle of friends” free calling feature on your Facebook.
Example: Share the Love is a program at Amazon where if you recommend a book that you purchase to a friend, and he or she buys it within a week, you both get 10% off. This works both sides of the marketing and socialization equation: they’re pushing the book through your network, but they’re also using network information (you choose which friends to share the love with) to determine what products to push to what people.
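
The Share the Love rule as described in these notes (friend buys within a week, both get 10% off) can be sketched as a small eligibility check. This is only an illustration of the rule as stated here, not Amazon's actual implementation; the function name and the inclusive one-week boundary are assumptions.

```python
from datetime import datetime, timedelta

DISCOUNT = 0.10              # 10% off, as described in the notes
WINDOW = timedelta(days=7)   # "within a week"; inclusive boundary is an assumption


def share_the_love_discount(recommended_at, friend_bought_at):
    """Return the discount rate both parties receive, or 0.0 if none applies."""
    if friend_bought_at is None:
        return 0.0  # the friend never bought the book
    elapsed = friend_bought_at - recommended_at
    if timedelta(0) <= elapsed <= WINDOW:
        return DISCOUNT
    return 0.0
```

For instance, a recommendation made on April 1 and followed by the friend's purchase on April 5 earns both people the discount, while a purchase on April 10 does not.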

Big example with real data:

(insert graph from lecture here)
x axis: number of clicks in one session (log scale)
y axis: number of people making that number of clicks (log scale)
A session ends at midnight Seattle time.
Three user types and two action types: actions are purchase and not purchase, user types are documented, undocumented, and insider. It is not possible to purchase and remain undocumented.
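
A rough sketch of how the two axes above could be computed from a raw click log. It assumes the log is a list of (user_id, timestamp) pairs with timestamps already in Seattle local time; that data layout and all function names are hypothetical. A session is approximated as one user on one calendar day, which mirrors the midnight cutoff, and the least-squares slope on the log-log points is a simple stand-in for checking for a straight line by eye.

```python
import math
from collections import Counter
from datetime import datetime


def clicks_per_session(clicks):
    """Map (user, calendar day) -> number of clicks; a session ends at midnight."""
    return Counter((user, ts.date()) for user, ts in clicks)


def count_of_counts(session_sizes):
    """Map n -> number of sessions containing exactly n clicks."""
    return Counter(session_sizes.values())


def loglog_slope(hist):
    """Least-squares slope of log(count) against log(number of clicks)."""
    xs = [math.log(n) for n in hist]
    ys = [math.log(c) for c in hist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# The same user clicking just before and just after midnight
# is split into two one-click sessions by the cutoff.
example = [("a", datetime(2010, 4, 8, 23, 50)), ("a", datetime(2010, 4, 9, 0, 5))]
sessions = clicks_per_session(example)
```

A roughly constant slope across the range of click counts is what a straight line on the log-log plot would look like numerically.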

Keys to analyzing data:

1. Get a visual of the data to understand what’s going on
a. E.g. try a log-log graph (to check for a power law) and see if you get a straight line
b. NB try different parameters for your visualizations and look for what they have in common across all parameters

2. Use distributions and not averages in order to really understand what’s going on
a. E.g. Sam Savage’s drunk on the road: alive at his average position, but dead on average (see image)

3. Try to find explanations for data trends or abnormalities and test them
a. E.g. bots create spikes at even numbers, one-click purchases are an artifact of sessions ending at midnight Seattle time, etc.
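
The point in key 2 about distributions versus averages can be made concrete with a toy version of the drunk on the road. The positions and the survival rule below are invented purely for illustration: the walker staggers to either side of the center line (position 0) and is safe only while close to it.

```python
from statistics import mean


def survives(position, safe_half_width=0.5):
    """Hypothetical rule: the walker is safe only near the center line (position 0)."""
    return abs(position) <= safe_half_width


# Staggering left and right of center, never actually on the center line.
positions = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]

# Outcome evaluated at the average position: the center line, so "alive".
alive_at_average = survives(mean(positions))

# Average of the outcomes over the whole distribution: never safe, so "dead".
average_alive = sum(survives(p) for p in positions) / len(positions)
```

Here alive_at_average is True while average_alive is 0.0: plugging the average into the rule gives a very different answer from averaging the rule over the actual distribution.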

Part II: Recommender Systems

The most important thing to consider in a recommender system is the kind of information that you want to get, in relation to the effect you want your recommendations to have. Once you’ve determined the information that you want, you must then find some way to obtain it, either indirectly or directly from the user by offering them something in return. As an example, consider Amazon’s Share the Love program, which works as follows:
Desired output: a recommendation for a book which somebody is likely to buy
Desired information: who might want to buy a particular book
How to get that information: offer a discount to the information provider and his/her friend

Why they matter:

A good recommender system, one based on the most relevant data, can make the difference between a profitable and a bankrupt e-business.
(specific data types and their value for different recommender systems to be added when the notes with them are put up)

Types of Recommender Systems

(pretty graph with the three bubble Venn diagram)
Collaborative Filtering: people who purchase x also purchase y. It can also function on browsing habits: people who viewed x ultimately purchased y. How can you determine those relationships? And once you do, how do you make the recommendations maximally persuasive?
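
One very simple way to get at "people who purchase x also purchase y" is to count how often items appear in the same basket. The sketch below assumes purchase history is available as a list of baskets (lists of item names); the function names and the data layout are hypothetical, and raw co-occurrence counting is only one of many possible approaches, not necessarily what any real store uses.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_purchase_counts(baskets):
    """For each item x, count how often each other item y shares a basket with x."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Use a set so duplicate items in one basket are counted once.
        for x, y in permutations(set(basket), 2):
            counts[x][y] += 1
    return counts


def recommend(counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [y for y, _ in counts[item].most_common(k)]
```

The same counting works for browsing data ("people who viewed x ultimately purchased y") if each basket is replaced by a pair of viewed item and purchased item. It does not address the second question above, how to present the result persuasively.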

Types of Recommendations:

It's important not just to consider what the recommender recommends, but how it presents it. Important questions include:
1. How many recommendations do you want to give? For a dating service you want only a single recommendation to induce action; for Amazon it might be more, but not hundreds. People are more interested by a wide selection, but they are also less inclined to purchase (deferring the purchase, perhaps indefinitely). Not mentioned in lecture, but people are also typically less satisfied with their purchases when there are more options.

2. How is the recommendation presented? Although not strictly rational, people will change their preferences based on the context in which the recommendations are presented.

Student 1: Jeremy Karmel
Student 2: Noah Burbank