Magical Syndication

October 4, 2006

what if there was a better way to measure attention? browser plugins can be used to record usage paths, and maybe a rough idea of browse time per page. i wonder what could happen if we knew what part of a document each person actually read [and paid attention to]. i will list some things that could be deduced statistically. please feel free to contribute your thoughts as well [as comments on this post].

  • whether the document is interesting*; users are more likely to browse through a document if it is interesting.
  • which part of the document is most interesting; browse patterns within the document can give clues about the content of the document.
  • the density or difficulty of the document and parts of it; moments of pause to read and reread.
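
to make the list above a little more concrete, here is a minimal sketch in python of how such statistics could be computed. the event format (user, section, seconds open) and the sample values are assumptions made up for illustration, not data from any real plugin.

```python
from collections import defaultdict

# hypothetical event format: (user, section_id, seconds_open).
# a section opened more than once by the same user counts as a reread.
events = [
    ("ann", "intro", 5.0),
    ("ann", "proof", 40.0),
    ("ann", "proof", 25.0),
    ("bob", "intro", 4.0),
    ("bob", "proof", 30.0),
]

def summarize(events):
    """Return per-section stats: readers, mean seconds per reader, rereads."""
    seconds = defaultdict(float)   # section -> total seconds open
    visits = defaultdict(int)      # (user, section) -> number of opens
    for user, section, secs in events:
        seconds[section] += secs
        visits[(user, section)] += 1
    stats = {}
    for section in seconds:
        readers = [u for (u, s) in visits if s == section]
        rereads = sum(visits[(u, section)] - 1 for u in readers)
        stats[section] = {
            "readers": len(readers),
            "mean_seconds": seconds[section] / len(readers),
            "rereads": rereads,
        }
    return stats
```

a section with a long mean time and many rereads would be a candidate for "dense or difficult", while one that most readers skip would be a candidate for "not interesting". where to draw those lines is left open here.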

i wonder, furthermore, what could happen if we knew who was reading the document. now we have three sets of data – the set of documents, the set of users, and browse patterns.

  • the kinds of documents, or at least how related documents are to each other; users tend to be interested in a tiny subset of documents at a given moment. self-organizing maps could be a possible implementation.
  • once we know how to describe documents we can also describe users; users have a set of interests. i don’t know what kind of algorithm is best suited for finding document clusters that represent a kind of interest [i will call these interest-clusters], but i’m sure there are known algorithms for this purpose.
  • the level of understanding that a user has of each interest-cluster; related to the third point of the previous list.
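
as a rough illustration of "how related documents are to each other", here is a sketch that scores two documents by how many readers they share. this uses plain jaccard similarity rather than a self-organizing map, simply because it fits in a few lines. the document names, reader names and the 0.5 threshold are all made up.

```python
from itertools import combinations

# hypothetical attention log: which users read which documents.
reads = {
    "java-frameworks-news": {"ann", "bob", "cy"},
    "spring-tutorial": {"ann", "bob"},
    "group-theory-intro": {"dee"},
}

def jaccard(a, b):
    """Share of readers two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related_pairs(reads, threshold=0.5):
    """Return (doc, doc, score) for pairs whose reader overlap meets the threshold."""
    pairs = []
    for d1, d2 in combinations(sorted(reads), 2):
        score = jaccard(reads[d1], reads[d2])
        if score >= threshold:
            pairs.append((d1, d2, score))
    return pairs
```

groups of documents that are all pairwise related in this sense would be one crude stand-in for an interest-cluster. a real clustering algorithm would presumably do better.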

ok, so hopefully i’ve established that this is an interesting idea. now here is one implementation that makes the above possible.

  1. decide on a document representation schema that is a hierarchical document format, such that the document can be split into parts that correspond to the content of the document.
  2. implement a display program that can run on any browser, that is easy to use, and most importantly, folds the document hierarchically such that in order to reach parts of the document, the user must act on the display program; think xml, hyperscope, gmail etc.
  3. the program should be able to act on documents hosted and published anywhere. the best way i can think of is to publish the documents as xml over http and include javascript that can parse and display the document according to point 2.
  4. track user identity by cookie or by login.
  5. the display program sends browse data and user identity to a server where it is aggregated and analyzed.
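
the steps above can be sketched roughly as follows. this is python rather than the in-browser javascript described in point 3, and the xml layout (nested section elements with id and title attributes), the Viewer class and the event fields are all assumptions invented for this example.

```python
import json
import time
import xml.etree.ElementTree as ET

# hypothetical document format: nested <section> elements (point 1).
DOC = """
<document id="doc-1">
  <section id="s1" title="overview">
    <section id="s1.1" title="details">text</section>
  </section>
  <section id="s2" title="appendix">text</section>
</document>
""".strip()

class Viewer:
    """Keeps every section folded until the reader opens it, logging each open."""

    def __init__(self, xml_text, user_id):
        self.root = ET.fromstring(xml_text)
        self.user_id = user_id          # from a cookie or login (point 4)
        self.open_sections = set()
        self.events = []

    def expand(self, section_id, now=None):
        """Unfold one section and record the action (point 2)."""
        node = self.root.find(f".//section[@id='{section_id}']")
        if node is None:
            raise KeyError(section_id)
        self.open_sections.add(section_id)
        self.events.append({
            "user": self.user_id,
            "document": self.root.get("id"),
            "section": section_id,
            "action": "expand",
            "time": time.time() if now is None else now,
        })

    def payload(self):
        """JSON body that would be sent to the aggregation server (point 5)."""
        return json.dumps(self.events)
```

because a reader has to expand a section to see it, each expand event is direct evidence of which part of the document got attention. actually posting the payload to a server is left out of the sketch.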

[image: valence screenshot]

so where does the magic happen? based on all the data that is gathered, it is possible to come up with intelligent suggestions as to what to read next. a service could suggest further reading material depending on your level of understanding and topic of interest. if you like to read the latest development news about enterprise java frameworks, they come to you. if you would like to learn more about a topic in mathematics, perhaps even a trail of documents can be suggested to you. best of all, there is no need to process the contents of a document. all the data is agnostic of the contents of a document, and derived purely from attention data.
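
here is one small sketch of what such a suggestion could look like when it is derived purely from attention data. it ranks unread documents by how much reading history you share with the people who read them. the users, document names and the scoring rule are assumptions for illustration, and it ignores level of understanding entirely.

```python
from collections import Counter

# hypothetical attention log: documents each user has read.
history = {
    "ann": {"java-news-1", "java-news-2", "spring-intro"},
    "bob": {"java-news-1", "spring-intro", "hibernate-tips"},
    "cy": {"topology-1", "topology-2"},
}

def suggest(user, history, limit=3):
    """Rank documents the user has not read by votes from similar readers."""
    mine = history[user]
    votes = Counter()
    for other, theirs in history.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue                   # no shared interest, ignore this reader
        for doc in theirs - mine:
            votes[doc] += overlap      # weight by size of the shared history
    # sort by score descending, then by name, for a stable order
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:limit]]
```

note that nothing in the sketch ever looks inside a document. only who read what is used.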

maybe it’s mostly all hogwash.

– Jay

