This page lists when assignments and other deliverables are due.
[MIXER after class]
P0- Team Formed
HW1- Analytics
DF1- Yourversion
HW2- Delicious
Real Time
P1- Initial
P2- Proj Page
HW4- Twitter
P3- Final Specs
DF2- Blippy
P4- Status Report
DF3- Quora
P5- Initial Results
DF4- Groupon
P6- Final Results
(make-up essay)
P7- Preso
(no final)

Speakers will be announced the class session before they're coming. We have some great people lined up for the term :)

Office Hours
Office hours will be held in Terman 344. See signup to reserve a spot.

How to Submit Assignments
Assignments are due electronically at noon on the due date. Please email your homework assignments as text, pdf, doc, or docx, named according to the convention "FirstName_LastName_AssignmentNumber":
Subject: "Jeremy_Carr_HW1" OR "Jeremy_Carr_DF3"
Filename: Jeremy_Carr_HW1
Please also submit a hard copy of your work in the homework submission box at the classroom entrance at the beginning of class on the due date.
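
If you want to double-check a subject line or filename before sending, the naming convention above can be sketched as a small Python check. This is only an illustration, not an official tool of the class: the function name, the restriction to HW and DF codes (taken from the two examples above), the purely alphabetic names, and the ".txt" extension for "text" are all assumptions.

```python
import re

# "FirstName_LastName_AssignmentNumber", e.g. "Jeremy_Carr_HW1".
# Assumption: names are plain letters and the assignment code is HW or DF
# followed by a number, as in the examples on this page.
SUBMISSION_RE = re.compile(r"^([A-Za-z]+)_([A-Za-z]+)_((?:HW|DF)\d+)$")

# Assumption: "text, pdf, doc, or docx" maps to these file extensions.
ALLOWED_EXTENSIONS = {"txt", "pdf", "doc", "docx"}


def parse_submission_name(name):
    """Return (first_name, last_name, assignment), or None if the name
    does not follow the convention. An extension is optional."""
    stem, dot, ext = name.rpartition(".")
    if dot:
        # A file extension is present; it must be one of the accepted formats.
        if ext.lower() not in ALLOWED_EXTENSIONS:
            return None
    else:
        # No extension (e.g. an email subject line): check the whole string.
        stem = name
    match = SUBMISSION_RE.match(stem)
    return match.groups() if match else None


if __name__ == "__main__":
    for candidate in ["Jeremy_Carr_HW1", "Jeremy_Carr_DF3.pdf", "Jeremy Carr HW1"]:
        print(candidate, "->", parse_submission_name(candidate))
```

Names containing hyphens, apostrophes, or accented letters would be rejected by this sketch, so loosen the pattern if that applies to you.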


Class Information

From the Stanford bulletin: Hands-on exploration of current and emergent data sources and their impact on individuals, business and society: recommendation engines, reputation systems, social network analysis, and engagement metrics.
Guest speakers, homework assignments and group projects (e.g., Facebook and/or iPhone/Android apps) combine data strategy, machine learning, modern and traditional marketing, behavioral economics, and incentive design.
Cases include BestBuy, MySpace, Lufthansa, and startups.
Prerequisites: Intellectual curiosity, entrepreneurial spirit, some programming experience, and willingness to implement in the real world.

Schedule for MS&E 237
2009-2010 Spring
· 3 units, Spr (Weigend, A)
· MS&E 237 | 3 units | Class # 88401 | Section 01 | Grading: Ltr-CR/NC | LEC
· Tue, Thu 4:15 PM - 5:30 PM
· Instructor: Weigend, Andreas

If you have not received an email from the instructor about the class and the required surveys (because you registered late on Axess or have not registered at all), please email the teaching team to get in touch.


1. Who is in the class?

I appreciate the rich diversity of the students in class, and am happy that people bring to bear both their academic perspective and their personal experiences. To give you a quick overview of the backgrounds of the students in class, here is the breakdown by departments (as of March 29, 2010, 5pm):

MS&E 237 Sp 2010: Social Data Revolution -- Backgrounds

23 MS&E Masters students
· Mgmt Sci & Engineering - Mgmt Sci & Engineering (MS)

16 GSB students
· Business Administration - (MBA)
· Business Administration - Sloan Fellows (MSc)

23 other graduate students
· Electrical Engineer - Electrical Engineering (MS)
· Mgmt Sci & Engineering - Mgmt Sci & Engineering (PhD)
· Management - Management (MS)
· Computer Science - Computer Science (MS)
· Material Sci & Eng - Materials Science & Engr (MS)
· Financial Mathematics - Financial Mathematics (MS)
· East Asian Studies - East Asian Studies (MA)
· Biomedical Informatics - Biomedical Informatics (MS)

18 undergraduate students
· Undergraduate Matriculated - Computer Science (BS)
· Undergraduate Matriculated - Electrical Engineering (BS)
· Undergraduate Matriculated - Undeclared (B)
· Undergraduate Matriculated - Economics (BA)
· Undergraduate Matriculated - Economics (BAH)
· Undergraduate Matriculated - Engineering (BS)/MS&E(BS)
· Undergraduate Matriculated - Math & Comp Science (BS)
· Undergraduate Matriculated - Math & Comp Science (BSH)
· Undergraduate Matriculated - Mechanical Engineer (BS)
· Undergraduate Matriculated - Mgmt Sci & Engineering (BS)
· Undergraduate Matriculated - Political Science (BA)
· Undergraduate Matriculated - STSS (BA)/Hum Bio (Min)
· Undergraduate Matriculated - Symbolic Systems (BS)
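
For a quick sense of proportions, the four group counts above can be turned into shares of the class. The sketch below uses only the counts listed on this page (as of March 29, 2010); the dictionary labels and the function name are illustrative choices, not part of any class tooling.

```python
# Group sizes as listed above (as of March 29, 2010).
counts = {
    "MS&E Masters students": 23,
    "GSB students": 16,
    "Other graduate students": 23,
    "Undergraduate students": 18,
}


def shares(group_counts):
    """Return each group's share of the total, in percent."""
    total = sum(group_counts.values())
    return {group: 100 * n / total for group, n in group_counts.items()}


if __name__ == "__main__":
    print("Total students:", sum(counts.values()))
    for group, pct in shares(counts).items():
        print(f"{group}: {pct:.1f}%")
```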



2. History of the course
This course is a brand-new revision, first taught in Spring 2010, of Data Mining and Electronic Business (formerly Stat252, only cross-listed in MS&E), which Prof. Weigend designed and taught from 2003 to 2009. In 2010, there will be significant emphasis on a group project (suggested group size: 4 students) leading to a startup-quality idea that leverages social data.

To decide whether to take this course on social data and its implications for business and society, it might be useful to look at the course wikis of previous years (2009, 2008, 2007).


· This site is a wiki. If you see anything that can be improved, just do it. While everybody can view, only members can edit. All students in the class have been added as members based on their email address on the class list. If there is any access problem, please click at the top of this page and generate a request to be added.
· The page WISHES by students gives you a lightweight way to jot down what you would like to get out of the course. Please share your expectations with us and tell us how we can help. Two-way communication is what distinguishes Web 3.0 (architectures of interaction) from Web 2.0 (architectures of participation). Feedback is key to making this course truly worthwhile for you.
· We will need some volunteers for some class-related tasks. Please check out the page HELP with class and add your name if you are interested. Thanks!

3. "Official" course page (non-wiki)

While all the information you need should be here on the pages of this wiki -- and if something is missing, you can just fix it for everybody else by putting it there -- there is also a more traditional syllabus / high-level course description. It gives links, e.g., to the audio recordings and the transcripts of each class as well as general references to books and papers.

4. Course content and goals

Here is the 2009 version:

1. Overview. The PHAME framework. Learn what it means to clearly define the problem you want to solve. Come up with a couple of hypotheses, suggesting different actions. Create a rich set of metrics, understanding the trade-offs between variables. And then run simple experiments that compare the different actions.
2. Ecosystems and platforms.
3. Data sources. Their value is their impact on decisions. Case: Customer Lifetime Value is getting redefined using social data.
4. a) Basics of decision analysis, and why it is an important tool in decision making.
b) Prediction markets: Choices in design, and their consequences.
Case: Lessons from Google's internal prediction market.
5. Social network analysis. Focus on decisions that are influenced by the outcome of the analysis. Case: The spread of information on Facebook, and its implications for traditional "influencer marketing". [Note added June 1, 2009: The paper Eric Sun (who took Stats 252 last year) presented received the Best Paper Award at the ICWSM09 conference. Congratulations!]
6. Creating information products from data. Case: How LinkedIn's successful culture is based on data and models, experiments and metrics. Also, brief discussion of underlying infrastructure (Hadoop, Aster Data, Greenplum, etc.).
7. Recommendations, reputation, and relevance. Cases: Amazon, Music.
8. Machine learning approaches for online advertising. Serendipitous discovery vs. interrupt marketing. Cases: MySpace, Fox Interactive Media.
9. Privacy, Dating, Mobile advertising. Cases: Skout, Orange-FranceTelecom.

The material above will be covered in 9 three-hour classes (8 regular classes and 1 class during the time slot for the final). I have reserved about an hour at the end of the last class. I would like to talk about the big shifts in medical personal data collection, sharing and mining. The impact on individuals, society and business will be significant. However, I am open to any suggestions of what you want to do in the last hour of class. Here are a couple of alternatives:

· Geolocation is another fast growing source of social data right now. Most apps don't do much more than putting pins on Google Maps. After summarizing technology and devices, it would be good to develop scenarios for what "advertising" could look like on mobile platforms.
· Quants on Wall Street always hope to get ahead by using data sources others don't have. I could tell some of the stories where using, cleaning, and understanding new data turned out to be lucrative. However, for each success story there are dozens of cases of wishful thinking, where the data should work but doesn't. The discussion would center on where SDR data might be useful, and where it might not.

Several students have asked whether we will do a class on visualization. While I love displaying information, the hard part, which is almost impossible to teach in an hour, is to learn to have a "dialog with the data". Using good interactive tools for the problem at hand is more important than pretty visualizations.

5. If you need help...
The teaching team has a single email address. It is not monitored by the instructor, but by the members of the teaching team.
· Andreas Weigend. I live in San Francisco, and am on campus two days a week. Try contacting me by email first, and if you think you should have received a response but didn’t, then text or call me, 650 906-5906. I'll be out of town once during the quarter but this does not impact class at all.
· Jeremy Carr

Thanks to others: website....

6. ... and if you are able to help out the class

Any engagement with the class is welcome. This can come in many forms:

· Infrastructure. E.g., can the audio be linked to slides and transcript, and where? What is a good wiki (socialtext? wikimedia?)?
· Writing. Spend some time writing up one deep insight you got in class that is worth sharing and will work with the people. (Thank you, Ray, for your creativity and clarity in the post on The Sorry State of Relevance.)

The world is changing so quickly. We use "old" technologies like this course wiki (simply because it worked well in the past years, but I am ready for new suggestions), or the page to understand both what data are being created there, and how important quick experiments and a thoughtful set of metrics are. We work with Twitter as a currently fast-growing distribution channel, Ning to get to the social graph of the class, Ustream to entice people to create metadata via annotations, Etherpad as a real-time collaboration tool... New channels are created constantly, and the usage patterns that emerge are often different from the intended ones.

In any case, if you know of anything that you think I should look into for class and the dissemination of the ideas, or have a better way of doing what I do, I would be grateful if you would let me know. And if it makes sense, let's just try it out and learn together. I will waive assignments and/or the wiki requirement for comparable work. Furthermore, the school will pay you $20 per hour as a grader for work we can frame that way.

So, please talk to me after class, send me email, call me... I care deeply about the subject matter (otherwise I would not be teaching this course; it is very different from, say, teaching algorithms). And I care that you, the students, get the most out of this class. So, if you have specific ideas, talk to me about them. It is usually good to work on something concrete you care about, plus we'll both learn in the discussion.

7. Beyond the class: Social Data Revolution

Besides the course wiki, we use a few Web 2.0 tools and reflect on their emerging strengths and weaknesses. We discuss what incentives work for people to engage and share, what can be returned to them in exchange, and what appropriate metrics are in each case that reflect long-term goals.

· Please follow @socialdata and include #socialdata @socialdata @aweigend in all tweets related to the class so there is a chance that your posts are actually seen.
Facebook Page
· Become a friend of the page. Share when you have something interesting.
Youtube Channel
· Subscribe to the channel. In 2009, I was putting up short videos with insights from class, often framed as a conversation with a guest. Please share your thoughts -- what did we learn that works, and what doesn't? What does "works" mean, and what is the purpose of such a channel?
· We use the Ustream channel to understand how to enable people outside the classroom (including SCPD students) to engage. Anybody can view it in real time and participate; requiring people to use their Twitter name cuts down on spam. As with all the tools, we will reflect on where they help and where they are distracting or just a waste of time.
· We use the Etherpad for real-time shared notes during class. The free version supports up to 8 concurrent users. With the acquisition by Google, how can we get this increased?
· This is the central location for SDR-related material. It was seeded in 2009 with a few paragraphs copied from elsewhere. A good example of original content was the 2009HW1 Berkeley-Stanford dashboard. In 2010, how shall we feature the results gleaned from the survey? We are off to a good start, but I want to figure out what makes sense: What would you like to see there, what goal should it serve, and how can we measure progress?

8. Directions to class and the department
For guests, the following information might be useful: If you come by car, a convenient parking area is the street parking (you need to pay until 4pm) in front of the Cantor Center for the Visual Arts (328 Lomita Drive, Stanford CA 94305). After parking, walk for a few minutes (continuing in the same direction) towards the central part of campus. The first real street you will reach (not counting the street at the museum) is called Serra Mall.
· If you want to go to the classroom directly, turn left on Serra Mall. The class is in the leftmost corner of the Quad, Building 200 ("History Corner"), basement room 002 (Stanford CA 94305).
· And if you want to grab a bite or a cup of coffee, turn right on Serra Mall and cross the street to get to Bytes Cafe, located on the ground floor of David Packard Electrical Engineering.
There is a searchable campus map (keywords: History Corner). If you have problems finding it, any student on campus should be happy to help you with directions.