How philosophy can help create secure databases
It sounds like an interesting idea.
"A database of names, addresses and Social Security numbers (a common form of identification in America) might require only 200 characters to contain all possible combinations. That would limit the total number of character combinations. A positive database containing all the data in question would be a small subset of those combinations. The negative counterpart of this database would be much larger and contain all possible names and addresses that were not in the positive database plus a lot of gibberish. But it would not be infinite. By looking at the negative database, it would be possible to deduce what was in the positive database it complemented."
What do you think is the implication of the last line? I'm not sure if I'm getting the full significance of what it means...
September 1st, 2006 5:17am
Interesting. Very interesting.
September 1st, 2006 5:18am
I think it means "erm, we could do this, but why would we?"
September 1st, 2006 5:19am
$-- - I'm not sure if I'm getting what you said.
The Social Security example is about security. If we check an input string, we can compare it with the complement of the set of negative strings. The complement is inferred from the negative database, so the real values aren't stored in the first place. If the string is in the complement, it's a valid string. Otherwise it's a fake one and we don't serve it. That's what I think it means, but I kind of feel that I'm missing something.
September 1st, 2006 5:49am
a 'negative database' and a 'positive database' are mathematically the same thing. if you have one, you also have the other.
its only use is, as they mention, where you want to find common items between two sets...if a particular item does not exist in either negative set then it is common to both real sets.
really though it 'kind of' doesn't solve anything, since if you give either party the other party's negative database then you have actually given them the real database if they just care to look.
The one problem it _does_ solve is how to allow two trusted parties to check for common items without actually exposing the private data of any specific individuals to any specific people belonging to either party...the parties have to already both trust each other, since giving them the negative database is equivalent to giving them the normal database, so, in short the problem this solves is:
given two companies who want to check for common items in their sets of data and who both trust each other not to actually generate the real data once they have the negative database, how can they do this without providing employees from any party with a specific view of the actual private data held by the other.
got it? it's a pretty specific use case.
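To make the set argument above concrete, here is a minimal Python sketch on a toy universe. The alphabet, the two "company" sets and the function names are all made up for illustration; the real scheme in the article would be far larger and presumably stored differently. It shows two things: common items fall out of the two negative sets alone, and complementing a negative set hands back the original data.

```python
from itertools import product

# Toy universe: every 2-character string over a tiny alphabet.
# (Hypothetical stand-in for the article's "all possible combinations".)
ALPHABET = "abc"
UNIVERSE = {"".join(p) for p in product(ALPHABET, repeat=2)}  # 9 strings

def negative(positive):
    """Everything in the universe that is NOT in the positive set."""
    return UNIVERSE - positive

def common_items(neg_a, neg_b):
    """Items absent from both negative sets, i.e. present in both positive sets."""
    return UNIVERSE - (neg_a | neg_b)

# Made-up "private" data for two companies.
company_a = {"ab", "bc", "ca"}
company_b = {"bc", "ca", "cc"}

neg_a = negative(company_a)
neg_b = negative(company_b)

print(sorted(common_items(neg_a, neg_b)))  # ['bc', 'ca']
# Whoever holds the negative set and knows the universe also holds the real set.
print(negative(neg_a) == company_a)        # True
```

The second print is the whole objection in one line: once the universe is known, the negative set and the positive set carry the same information.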
September 1st, 2006 6:01am
wSV - Thanks for the explanation. I got it. As you say, I guess it's a pretty specific scenario where it is usually used.
September 1st, 2006 6:10am
The problem is to define and populate the complete set and have that codified in total before generating the negative (or positive) subset. Not many real life data are finite in their entirety.
September 1st, 2006 6:10am
I, too, thought the same. Though they say finite there should be a lot of cases where it can't be possible to have a complete set.
Hackers anyway I think can find ways to emulate requests which can be a complement to the negative database... :)
September 1st, 2006 6:13am
" I guess it's a pretty specific scenario where it is usually used."
actually I'm not sure it's used at all currently...the guy who was quoted said that banks etc _may_ find it useful.
actually I guess it might also provide a slightly safer way of transferring private data from place to place....if some random person gets a chunk of it, no specific private data is gained....they have to get the entire set _and_ have both the knowledge and the hardware to recreate the original set.
September 1st, 2006 6:17am
...of course one problem with the idea is that by definition the complete set of negative data contains every set of private data of the same type for every person in the country who is _not_ a client of said company...
September 1st, 2006 6:20am
in fact, if that idea is valid then so is the idea of taking those 200 characters and _generating every possible combination of them_ and then trolling through the results to find the real data.
once you _have_ a name and SSN, is there any easy way to validate them?
September 1st, 2006 6:21am
so, as you seem to have a handle on this, was my summing up about right, Mr Catgut?
September 1st, 2006 6:29am
it suffered the curse of brevity.
September 1st, 2006 6:37am
$-- - So you have thought so many steps ahead?? That's cool... It sure makes sense now... :)
wSV - I was thinking on the same lines. I think practically it should be very difficult to do it. At least, it involves a couple of extra steps to get back to the real data. At most one can try to get a valid negative request and use it. But the data should essentially remain a black box. I guess that's the security it would achieve.
September 1st, 2006 6:41am
"it suffered the curse of brevity."
And had the soul of wit, which, by the way, was lost on me... :)
September 1st, 2006 6:42am
the main thing that went through my mind was "if you have a finite set, and you start manipulating the non-members of some subset, that has to be equivalent to manipulating the members in some way. So what's the point?"
I have to admit I didn't invest much more energy than that ...
September 1st, 2006 7:04am
and I love brevity.
September 1st, 2006 7:05am
never heard "the curse of brevity" before though. I wondered if it was a quote, so I googled. Only 3 responses ... here's one:
... and by coincidence, I learnt most about writing from a journalist. I can't get out of the habit of trying to trim excess out of everything I write.
Where do you know the phrase from, Catgut?
September 1st, 2006 7:10am
far be it from me to doubt those learned professors of biology and such, but the actual article looks like pure sophistry. If you have a huge but finite set, and a small subset, and you give some other party (trusted or not) the negative set (which by implication is also huge) - what's the point? You just gave them the set.
The implication could be that it is impracticable to actually calculate the set ... but that seems like horseshit considering that you are also saying it is reasonable to store the negative set, AND do lookups on it ...
September 1st, 2006 7:25am
If you have the room to store the negative set, you also have the room to store the complete set (positive union negative) and can then extract the positive set from the complete set.
But you can't have the room to store the negative set, because according to the article "...might require only 200 characters to contain all possible combinations..."
Hmm. Sounds to me like you're going to need about 36^200 rows in your database, which is so far beyond the number of atoms in the universe (estimated to be under 10^80), that not even scratches on electrons could be small enough to store that much data. Or worse, you'll need an accurate list of every American, along with SSN and address (around 400,000,000 rows to include all the dead folks who ever had SSNs).
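A quick sanity check on that arithmetic, as a rough Python sketch. The 36-symbol alphabet (26 letters plus 10 digits) is the assumption behind the 36^200 figure, and 10^80 is the usual ballpark estimate for atoms in the observable universe, not an exact number.

```python
import math

ALPHABET_SIZE = 36    # assumed: 26 letters + 10 digits
LENGTH = 200          # characters per record, from the quoted article
ATOMS_EXPONENT = 80   # rough, commonly cited estimate: ~10^80 atoms

# Python integers are arbitrary precision, so this is exact.
combinations = ALPHABET_SIZE ** LENGTH
digits = len(str(combinations))
log10_combinations = LENGTH * math.log10(ALPHABET_SIZE)

print(digits)                                # 312
print(round(log10_combinations, 1))          # 311.3
print(combinations > 10 ** ATOMS_EXPONENT)   # True
```

So the full universe is on the order of 10^311 rows, which is more than 230 orders of magnitude past the atom count. Storing the negative set row by row is out of the question under these assumptions.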
Cryptography in the Database.
Even has some Java code.
September 1st, 2006 9:35am
Where negative databases shine is in the example at the bottom of the article:
<<Dr Esponda gives the example of a negative survey in which respondents are asked to tick the box of one sexually transmitted disease they do not have. He reckons that this would be sufficient to estimate the population frequency of each disease, without having to ask people whether they actually suffer from such diseases—which is intrusive and also invites lying.>>
A strong boon in maintaining security in the realm of statistical analysis.
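For anyone wondering how the frequencies come back out of a negative survey, here is a small simulation in Python. It is only a sketch under assumptions not stated in the article: every respondent belongs to exactly one of k categories, and ticks one of the other k-1 categories uniformly at random. The population numbers and function names are made up. Under those assumptions the expected number of ticks for category i is (N - n_i) / (k - 1), which can be inverted to estimate n_i.

```python
import random

def negative_survey(true_categories, k, rng):
    """Each respondent reports one category they are NOT in, chosen uniformly."""
    reports = [0] * k
    for c in true_categories:
        choice = rng.randrange(k - 1)   # pick among the k-1 other categories
        if choice >= c:
            choice += 1                 # skip the respondent's own category
        reports[choice] += 1
    return reports

def estimate_counts(reports):
    """Invert E[r_i] = (N - n_i) / (k - 1) to get n_i ~= N - (k - 1) * r_i."""
    k = len(reports)
    n = sum(reports)
    return [n - (k - 1) * r for r in reports]

rng = random.Random(0)                  # fixed seed so the run is repeatable
true_counts = [6000, 3000, 1000]        # made-up population, 3 categories
population = [c for c, n in enumerate(true_counts) for _ in range(n)]

reports = negative_survey(population, k=3, rng=rng)
estimates = estimate_counts(reports)
print(estimates)                        # close to the true counts
```

No individual answer reveals which category the respondent is actually in, yet the aggregate estimate lands near the true counts, and it gets tighter as the number of respondents grows.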
September 1st, 2006 10:13am
It is not the case that all ideas in the article are not-good.
September 1st, 2006 10:14am
"Where do you know the phrase from, Catgut?"
surprisingly enough, I made it up
September 1st, 2006 1:19pm