P0 - Team Formed
HW1 - Analytics
DF1 - YourVersion
HW2 - Delicious
Real Time
P1 - Initial
P2 - Proj Page
HW4 - Twitter
P3 - Final Specs
DF2 - Blippy
P4 - Status Report
DF3 - Quora
P5 - Initial Results
DF4 - Groupon
P6 - Final Results
(make-up essay)
P7 - Preso
(no final)

Speakers will be announced the class session before they're coming. We have some great people lined up for the term :)

Class Information

From the Stanford bulletin: Hands-on exploration of current and emergent data sources and their impact on individuals, business and society: recommendation engines, reputation systems, social network analysis, and engagement metrics.
Guest speakers, homework assignments and group projects (e.g., Facebook and/or iPhone/Android apps) combine data strategy, machine learning, modern and traditional marketing, behavioral economics, and incentive design.
Cases include Best Buy, MySpace, Lufthansa, and startups.
Prerequisites: Intellectual curiosity, entrepreneurial spirit, some programming experience (details at …), and a willingness to implement ideas in the real world.

2. History of the course
This course, to be taught for the first time in Spring 2010, is a thorough revision of Data Mining and Electronic Business (formerly Stat252, cross-listed only in MS&E), which Prof. Weigend designed and taught from 2003 to 2009. In 2010, there will be significant emphasis on a group project (suggested group size: 4 students) leading to a startup-quality idea that leverages social data.

To decide whether to take this course on social data and its implications for business and society, it may help to look at the course wikis from previous years (2009, 2008, 2007).


3. "Official" course page (non-wiki)

While all the information you need should be on the pages of this wiki (and if something is missing, you can fix it for everybody else by adding it), the more traditional syllabus and high-level course description is at … . It gives links, e.g., to the audio recordings and transcripts of each class, as well as general references to books and papers.

4. Course content and goals

And here is the 2009 version:

1. Overview. The PHAME framework. Learn what it means to clearly define the problem you want to solve. Come up with a couple of hypotheses, each suggesting different actions. Create a rich set of metrics, understanding the trade-offs between variables. Then run simple experiments that compare the different actions.
2. Ecosystems and platforms.
3. Data sources. Their value lies in their impact on decisions. Case: Customer Lifetime Value, which is being redefined using social data.

4. a) Basics of decision analysis, and why it is an important tool in decision making.
b) Prediction markets: design choices and their consequences.
Case: Lessons from Google's internal prediction market.

5. Social network analysis. Focus on decisions that are influenced by the outcome of the analysis. Case: The spread of information on Facebook, and its implications for traditional "influencer marketing". [Note added June 1, 2009: The paper presented by Eric Sun (who took Stats 252 last year) received the Best Paper Award at the ICWSM09 conference. Congratulations!]

6. Creating information products from data. Case: How LinkedIn's successful culture is based on data and models, experiments and metrics. Also, a brief discussion of the underlying infrastructure (Hadoop, Aster Data, Greenplum, etc.).

7. Recommendations, reputation, and relevance. Cases: Amazon, Music.

8. Machine learning approaches for online advertising. Serendipitous discovery vs. interrupt marketing. Cases: MySpace, Fox Interactive Media.

9. Privacy, dating, mobile advertising. Cases: Skout, Orange-France Telecom.

The material above will be covered in 9 three-hour classes (8 regular classes and 1 class during the time slot for the final). I have reserved about an hour at the end of the last class, which I would like to use to talk about the big shifts in the collection, sharing, and mining of personal medical data. The impact on individuals, society and business will be significant. However, I am open to suggestions for how to use that last hour. Here are a couple of alternatives:

· Geolocation is another fast-growing source of social data right now. Most apps do little more than put pins on Google Maps. After summarizing the technology and devices, it would be good to develop scenarios for what "advertising" could look like on mobile platforms.
· Quants on Wall Street always hope to get ahead by using data sources others don't have. I could tell some of the stories where using, cleaning, and understanding new data turned out to be lucrative. However, for each success story there are dozens of cases of wishful thinking that it should work, although it doesn't. The discussion would center on where SDR data might be useful, and where it might not.

Several students have asked whether we will do a class on visualization. While I love displaying information, the hard part, which is almost impossible to teach in an hour, is learning to have a "dialog with the data". Using good interactive tools for the problem at hand is more important than producing pretty visualizations.

6. ... and if you are able to help out the class

Any engagement with the class is welcome. This can come in many forms:

· Infrastructure. E.g., can the audio be linked to the slides and transcript (on … or elsewhere)? What is a good wiki (Socialtext? MediaWiki?)
· Writing. Spend some time writing up one deep insight you got in class that is worth sharing, and will work with the people. (Thank you, Ray, for your creativity and clarity in the post on The Sorry State of Relevance.)

The world is changing so quickly. We use "old" technologies like this course wiki (simply because it worked well in past years, though I am open to new suggestions) or the … page to understand both what data are being created there and how important quick experiments and a thoughtful set of metrics are. We work with Twitter as a currently fast-growing distribution channel, Ning to get to the social graph of the class, Ustream to entice people to create metadata via annotations, and Etherpad as a real-time collaboration tool. New channels are created constantly, and the usage patterns that emerge often differ from the intended ones.

In any case, if you know of anything that you think I should look into for the class and the dissemination of its ideas, or have a better way of doing what I do, I would be grateful if you would let me know. And if it makes sense, let's just try it out and learn together. I will waive assignments and/or the wiki requirement for comparable work. Furthermore, for work that we can frame as grading, the school will pay $20 per hour.

So, please talk to me after class, send me email, call me... I care deeply about the subject matter (otherwise I would not be teaching this course, which is very different from, say, teaching algorithms). And I care that you, the students, get the most out of this class. So, if you have specific ideas, talk to me about them. It is usually good to work on something concrete you care about, and we will both learn from the discussion.

