Andreas Weigend
Social Data Revolution
MS&E 237, Stanford University, Spring 2010

Class 7: Real Time Data

Date: April 20, 2010

Important Concepts for Content/Data Analysis


Production (creation of content)

Some statistics on a per-minute basis (a convenient interval for comparison):
  • 100k blog posts (>90% from Asia)
  • 40k Twitter messages
  • 1k check-ins on Foursquare

The amount of content being generated is huge and increasing every day: the information available online roughly doubles every 1.5 to 2 years. In many cases, the content is created not by the service provider but by the people using the service (e.g., Nike, where the content is mainly reviews, comments, etc.).
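To make the doubling claim concrete, here is a small worked calculation (a sketch, not from the lecture): if the volume doubles every T years, then after t years it has grown by a factor of 2^(t/T). The 10-year horizon is an arbitrary choice for illustration.

```python
def growth_factor(years: float, doubling_time: float) -> float:
    """Multiplicative growth after `years` if volume doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)

# Compare the two ends of the 1.5-2 year doubling range over a 10-year horizon.
for doubling_time in (1.5, 2.0):
    print(f"doubling every {doubling_time} yrs -> x{growth_factor(10, doubling_time):.0f} in 10 yrs")
# doubling every 1.5 yrs -> x102 in 10 yrs
# doubling every 2.0 yrs -> x32 in 10 yrs
```

The gap between roughly 32x and 100x shows how sensitive the long-run volume is to the exact doubling time.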


Distribution (channels/means used to make the produced content available)

Statistics on the rate of distribution (per minute):
  • 500k pieces of content are shared on Facebook

For effective distribution, incentive-based models are of utmost importance. Distribution also has a major impact on the economics of the service: consider the power of passed links.

Note: Creation vs. distribution is analogous to someone writing something vs. finding channels through which other people can read what was written.


Consumption (usage of the distributed content)

Some statistics on the rate of consumption by users (per minute):
  • 4M queries across search engines (new queries)
  • 100k links clicked
  • 100k product searches on Amazon

Consumer Models

  • Effective consumer models are necessary for high consumption rates
  • They draw on various fields, including cognitive science, social capital, design, etc.
  • Incentives are important to rope in consumers

Consumer Segmentation

  • Important for targeting customers effectively
  • Models are designed using sound user segmentation analysis, user behavior models, etc.
  • The perspective of the consumer is most important, not the perspective of the company

Self Metrics

  • An innovative way to engage customers
  • Self metrics drive content usage
  • Many kinds of metrics can be presented to users, but it is important to trim the list down to those that are easily understood by and useful to customers
  • Example: People use the bit.ly service to shorten (encode) links to content so that the links can later be tracked. The linked content includes pictures, download links, etc. Most users want to know how much the link they posted matters to the world, which is the main reason metrics come into the picture
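The self-metric idea behind link tracking can be sketched as a tiny in-memory link shortener that counts clicks per short code. This is a hypothetical illustration, not the bit.ly API: the class `LinkTracker` and its methods `shorten`, `click`, and `clicks` are invented names, and a real service would persist data and record far more than a raw click count.

```python
import secrets
from collections import Counter


class LinkTracker:
    """Hypothetical in-memory link shortener that counts clicks per short code.

    Not the bit.ly API; it only illustrates the self-metric idea.
    """

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}  # short code -> destination URL
        self._clicks: Counter = Counter()   # short code -> number of clicks

    def shorten(self, url: str) -> str:
        """Create a random short code for `url` and return it."""
        code = secrets.token_urlsafe(4)
        while code in self._targets:  # regenerate on the rare collision
            code = secrets.token_urlsafe(4)
        self._targets[code] = url
        return code

    def click(self, code: str) -> str:
        """Record one click and return the destination URL (KeyError if unknown)."""
        url = self._targets[code]
        self._clicks[code] += 1
        return url

    def clicks(self, code: str) -> int:
        """The self metric shown to the link's poster: how often it was followed."""
        return self._clicks[code]


tracker = LinkTracker()
code = tracker.shorten("https://example.com/photo.jpg")
for _ in range(3):
    tracker.click(code)
print(tracker.clicks(code))  # 3
```

Exposing a single, easily understood number like the click count fits the point above about trimming metrics down to ones users actually grasp.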

Economics of 'Real Time'


The unit of data consumed by users is shrinking day by day, a trend known as 'Information Snacking'.


Production and distribution, discussed above, are becoming cheaper by the day and hence are no longer major bottlenecks. Production is cheap because more users own devices that can create data, large or small. The means of distribution have also multiplied: tweets, blogs, sites, ads, etc. have made it easier to distribute content. The bottleneck is consumption.

How to Deal with the Consumption Bottleneck

We filter the information we receive using various techniques, such as:
  • Content and Source analysis
  • Metadata usage
  • Social Filtering (physical or virtual social graphs)
  • Temporal aspects (trade-off between recency and relevance)
  • Context
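The recency vs. relevance trade-off can be sketched as a scoring rule that discounts a relevance score by the item's age. Exponential decay with a half-life is one common choice, swapped in here for illustration; it is not the method presented in the lecture, and the relevance scores and ages below are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    relevance: float  # assumed score in [0, 1], e.g. from content/source analysis
    age_hours: float  # time since the item was posted


def score(item: Item, half_life_hours: float) -> float:
    """Relevance discounted by age: the weight halves every `half_life_hours`."""
    decay = 0.5 ** (item.age_hours / half_life_hours)
    return item.relevance * decay


def filter_feed(items: list[Item], half_life_hours: float = 6.0, top_k: int = 2) -> list[Item]:
    """Return the top_k items by decayed score, highest first."""
    ranked = sorted(items, key=lambda it: score(it, half_life_hours), reverse=True)
    return ranked[:top_k]


feed = [
    Item("breaking news", relevance=0.5, age_hours=0.5),
    Item("in-depth analysis", relevance=0.9, age_hours=12),
    Item("friend update", relevance=0.7, age_hours=2),
]

# Short half-life: recency dominates.
print([it.text for it in filter_feed(feed, half_life_hours=6)])
# ['friend update', 'breaking news']

# Long half-life: relevance dominates.
print([it.text for it in filter_feed(feed, half_life_hours=48)])
# ['in-depth analysis', 'friend update']
```

The half-life is the single knob for the trade-off: shrinking it favors fresh items, growing it favors relevant ones regardless of age.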

Note: This connects to an important point from the first lecture, the 4 C's: Context, Content, Connection, Conversation.

Limitations of 'Real Time'

Limitations on truly 'real time' services include:


  • The speed of light (information transfer over cables, etc.)
  • This matters greatly in financial markets, where every second can be worth hundreds of millions of dollars.
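Why the speed of light is a hard floor can be shown with a back-of-the-envelope calculation. This sketch assumes a typical optical-fiber refractive index of about 1.47 (so light in fiber travels at roughly two thirds of its vacuum speed) and an approximate New York to London great-circle distance of 5,570 km; real cable routes are longer, so actual latency is higher.

```python
C_VACUUM_KM_S = 299_792.458  # speed of light in vacuum (km/s)
FIBER_INDEX = 1.47           # assumed typical refractive index of optical fiber


def min_one_way_ms(distance_km: float, refractive_index: float = FIBER_INDEX) -> float:
    """Lower bound on one-way latency (ms) for a signal travelling distance_km."""
    speed_km_s = C_VACUUM_KM_S / refractive_index
    return distance_km / speed_km_s * 1000


# Approximate great-circle distance New York - London (an assumption).
d = 5_570
print(f"vacuum: {min_one_way_ms(d, 1.0):.1f} ms, fiber: {min_one_way_ms(d):.1f} ms")
# vacuum: 18.6 ms, fiber: 27.3 ms
```

A round trip doubles these figures, and no amount of engineering can beat the vacuum bound, which is why proximity to exchanges matters so much in trading.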


Results from the class survey on expected response times:
  • Around 75% of people expect close friends to respond faster than a company; that is, 'social capital' is worth more.
  • Around 10% expect companies to reply faster.
  • Around 12% expect the same response time from both sources

Social Norms

  • The physical medium itself can dictate social norms, e.g., emails vs. SMS vs. telephone calls
  • Communication: see Dimensions of Communication below
  • Symmetry: sharing information with someone you know is easier and considered safer than sharing it with a stranger.

  • These norms dictate how readily real time is embraced by users, who are the final consumers.

Dimensions of Communication

Synchronous vs Asynchronous

  • One-way communication versus two-way communication
  • E.g. Twitter post versus Facebook wall-to-wall

Response Expectation

  • Level of responsibility to respond
  • Consider phone call vs. text message vs. email vs. Facebook message vs. tweet

Planning vs Communication

  • Are plans and intentions being communicated clearly to the other party?

Discoverability vs Existence

  • If the data exists, but is not discoverable, it cannot be consumed
  • Example: Create a Google Doc, but don't share it with anyone

Persistent vs Ephemeral

  • Which kind of data is more likely to be used?

Public vs Private

  • What is the scope of communication? Who can "listen in" on the conversation?
  • Example: Email cc versus bcc

Edited by: Alex Muller
Sandeep Sripada