Friday, December 10, 2010

How Much To Share...?

There's a post at Digital Nirvana about Facebook and its vast database.  The author, Nancy Scott, speculates that Facebook could monetize (sell) its vast collection of data.

I thought a little bit about this.

For the last thirty years or so corporations have spent a vast amount of IT resources (dollars, time, etc.) to develop giant databases of customer information.  Customer information includes a very broad spectrum of things, many of which depend on the company collecting the data.

For example, virtually every company I deal with (cable, phone, electric, and so on) has my name and address.  A few years ago there might have been slight differences in how each company represented my address.

The USPS, by implementing ZIP+4 and more consistent mailing discounts over the last 15 years or so, has caused most vendors in the US to standardize their addresses in order to save on postage.  Virtually all big companies now mail to ZIP+4 addresses.

A second wave of this has been happening with electronic bill paying.  Companies like American Express can be paid directly by your bank through on-line bill paying, which involves nothing but an electronic transaction (no physical mailing) and so eliminates the cost of postage, billing, etc.

So your "address", if you will, has gone from a simple mailing address to a standardized mailing address to, in the case of electronic bill paying, basically a set of account and bank routing numbers.  These days your bank probably contracts this type of e-payment to a vendor so they are not really even directly involved.

Associated with you, at each company, is more than likely a giant collection of "transactions" that represent what it is that you and the company do.  At the cable company there is probably a vast collection of past bills and payments, for example.  Depending on the industry there may also be exact scans or images of bills and payments, e.g., in banking or finance.

This data is usually hierarchical in the sense that under "you" there may be a "billing" section with "bills" and "payments", a "transactions" section showing, say, which movies you rented along with the details of each rental, and so on.
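To make the hierarchy concrete, here is a minimal sketch of such a record as nested Python data.  Every field name and value is made up for illustration (amounts are in cents); no real company's schema is implied.

```python
# Hypothetical customer record; all field names and values are illustrative.
# Amounts are stored as integer cents to avoid floating-point rounding.
customer = {
    "name": "Tom Smith",
    "address": "1234 Main Street, Anytown USA",
    "billing": {
        "bills": [
            {"id": 1, "amount_cents": 8999},
            {"id": 2, "amount_cents": 9249},
        ],
        "payments": [
            {"bill_id": 1, "amount_cents": 8999},
        ],
    },
    "transactions": {
        "movie_rentals": [
            {"title": "Example Movie", "price_cents": 499},
        ],
    },
}


def outstanding_balance_cents(record):
    """Total billed minus total paid, read from the 'billing' branch."""
    billed = sum(b["amount_cents"] for b in record["billing"]["bills"])
    paid = sum(p["amount_cents"] for p in record["billing"]["payments"])
    return billed - paid
```

Everything hangs off the one "you" at the top, which is why getting the identity of that top-level record right matters so much to the company.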

The data about you is more than likely also linked to a vast array of corporate systems that track information about you that you are not allowed to see:  For example, a "service" database that tracks what happens on the corporate end when you make a service call: required parts the company used to fix the problem, travel, hours, time, and so forth.

Now what's interesting about Nancy's post is that it brings to my mind how different Facebook is from these other kinds of databases. 

Facebook uses a different database model than you might find in a corporate database.  Corporate databases typically use some kind of relational or hierarchical database that depends on transactions to store data.  A transaction is a kind of "unit of update" that makes sure that everything in the database that must be changed to reflect new information actually gets changed.

So, for example, imagine that there is a database at the cable company that has your mailing address in one part and a list of cable boxes in your house in another.  Further let's suppose that these two pieces of information are tied together in some way, for example, you might have two houses and some cable equipment in each.

So let's suppose you move from one house to another.  The cable company needs to make sure that both your address and the list of equipment at the house you are currently at "move" to the new location.  If the database is divided up into two parts, say address and equipment at an address, both parts have to be updated.

You certainly don't want your address to change but your equipment not to change because that might cause a problem when service is required.

To do this in a modern database a "transaction" is required.  The transaction makes sure that either A) both your address and the equipment at that address are updated or B) neither is updated.  That way things never go "out of sync".  Now in the general case things are more complicated than this but this gives you some idea.
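Here is a small sketch of the "both or neither" idea using Python's built-in sqlite3 module.  The table layout, the function names and the `fail_midway` switch are all assumptions invented for this example; a real cable company's system would look nothing like this, but the all-or-nothing behavior is the same.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database with one customer and one cable box."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE address (customer_id INTEGER PRIMARY KEY, street TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE equipment (box_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL, street TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO address VALUES (1, '12 Old Road')")
    conn.execute("INSERT INTO equipment VALUES (100, 1, '12 Old Road')")
    conn.commit()
    return conn


def move_customer(conn, customer_id, new_street, fail_midway=False):
    """Update address and equipment together; return True if both changed."""
    try:
        # 'with conn' commits if the block finishes and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE address SET street = ? WHERE customer_id = ?",
                (new_street, customer_id),
            )
            if fail_midway:
                # Simulated crash between the two updates (illustration only).
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE equipment SET street = ? WHERE customer_id = ?",
                (new_street, customer_id),
            )
        return True
    except RuntimeError:
        return False


def streets(conn, customer_id):
    """Return (address street, equipment street) for one customer."""
    addr = conn.execute(
        "SELECT street FROM address WHERE customer_id = ?", (customer_id,)
    ).fetchone()[0]
    equip = conn.execute(
        "SELECT street FROM equipment WHERE customer_id = ?", (customer_id,)
    ).fetchone()[0]
    return addr, equip
```

If the move fails partway through, the address update is rolled back along with everything else, so the address and the equipment still agree with each other.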

On the Facebook side things are somewhat different.  First of all, Facebook's requirements are somewhat different.  From this we know that in one year Facebook added 200 million new users (there are only about 64 million cable customers all told in the US).  Every week Facebook users upload 2 billion pieces of new content.

Facebook does not do billing or touch your money - if you post a message to someone or upload a picture and it gets lost due to some IT problem, mostly no one really cares - you might get a message that there was a problem and you might have to do it again.

This means that Facebook operates in a non-critical mode, i.e., unlike a bank or other corporation, there is no real consequence to losing a bit of data.  This means that the data in Facebook is not "reliable" in the same way as it is in a bank.

So while Facebook knows a lot about you it does not necessarily know who you really are.

Today most corporations require you to provide some sort of ID telling them you're you: a social security number, lease, etc.  There might be hundreds of Tom Smiths and they don't want the wrong one to get a bill for something that they didn't do.

Facebook doesn't really care.  There can be a lot of people or businesses with the same name - no one really cares - either it lets you have the name or not.  Facebook also doesn't require you to be who you really are - you can create fictional IDs and so forth.

So how could these two worlds be shared?

It would be very complex.  First of all, for anyone to use Facebook data they would have to figure out who is actually who relative to their corporate data, i.e., whether John Smith at 1234 Main Street, Anytown USA is the same person as some John Smith on Facebook.  There are lots of John Smiths on Facebook - which seems to organize them by location.
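The matching problem can be sketched in a few lines of Python.  The records, the field names and the name-plus-city rule below are all assumptions made up for illustration; this says nothing about how Facebook or any company actually stores or matches data.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def candidate_matches(corporate_record, profiles):
    """Return every profile whose name and city match the corporate record."""
    name = normalize(corporate_record["name"])
    city = normalize(corporate_record["city"])
    return [
        p for p in profiles
        if normalize(p["name"]) == name and normalize(p.get("city", "")) == city
    ]


# Hypothetical corporate record: it includes a verified street address.
corporate = {"name": "John Smith", "street": "1234 Main Street", "city": "Anytown"}

# Hypothetical social profiles: a name and a rough location, nothing verified.
profiles = [
    {"profile_id": "a", "name": "John Smith", "city": "Anytown"},
    {"profile_id": "b", "name": "john  smith", "city": "anytown"},
    {"profile_id": "c", "name": "John Smith", "city": "Springfield"},
]

matches = candidate_matches(corporate, profiles)
# Two profiles survive, and nothing in the data says which one is the customer.
ambiguous = len(matches) > 1
```

Even with location to narrow things down, the company is left guessing between two John Smiths, and either or both of them could be a fictional ID.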

Facebook knows who you are by login info and perhaps by IP or MAC address - the same way Google and others that are "on line" know about you.  This works because neither is charging you money.  The cable company cannot work this way because someone will become angry if they are charged for what someone else has done.

Facebook allows you to advertise by "location, age and interests."  Not by who you really are.

Instead I think that Facebook, like Google, will simply sell demographics - information about classes of people and what they do.
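As a rough sketch of what "demographics" means here, the Python below turns individual profiles into counts per (location, age band, interest) group and throws the individuals away.  The profile fields, the ten-year age bands and the sample data are all assumptions for illustration only.

```python
from collections import Counter

# Hypothetical profiles; no names or IDs are needed for demographic counts.
sample_profiles = [
    {"location": "Anytown", "age": 34, "interests": ["movies", "cycling"]},
    {"location": "Anytown", "age": 37, "interests": ["movies"]},
    {"location": "Springfield", "age": 52, "interests": ["cycling"]},
]


def age_band(age):
    """Map an age to a ten-year band such as '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def demographic_counts(profiles):
    """Count profiles per (location, age band, interest) group."""
    counts = Counter()
    for profile in profiles:
        band = age_band(profile["age"])
        for interest in profile["interests"]:
            counts[(profile["location"], band, interest)] += 1
    return counts


counts = demographic_counts(sample_profiles)
```

An advertiser buying against these counts learns that there are people in Anytown in their thirties who like movies, not which people they are.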

I don't see businesses using specific Facebook data for anything.

But then again who knows...