﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Fr&amp;#233;d&amp;#233;ric commented on Slaying relational hydras (or dating them)</title><description>  
Ayende, my software is heavily using metade, or so called "client properties".
  
  
The real part of this is getting data validation and consistency if not defined.
  
  
But there is no major issue, just take a look at sharepoint, it's exactly the design principle used by moss.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment26</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment26</guid><pubDate>Thu, 18 Mar 2010 10:03:45 GMT</pubDate></item><item><title>Joe Cincotta commented on Slaying relational hydras (or dating them)</title><description>...this is a transcript of our conversation to date...
  
  
2010/3/11 Joe Cincotta 
&lt;xxxx@spiralglobal.com&gt;
  
  
There would be 2 tables. 
  
  
1. USER
  
  
ID as GUID
  
other columns for the user
  
  
2. AVPAIR
  
ID as GUID
  
UserID as GUID (FK)
  
Attribute as String
  
Value as String
  
  
  
  
My thought was that I would do a select where  (attribute=x and value=y) or (attribute=z and value=a) etc etc for each attribute in the source profile. Then you would group by the result by the UserID - using an aggregate function to order by the most results on a UserID to the least. This would provide you the best matches first and the least further down. 
  
  
I would have to spend more time than I have to give the precise SQL, but this at least gives my idea. It would use a fair amount of processing power performing an aggregate SQL function if you are talking about thousands of attributes. 
  
  
On 12/03/2010, at 2:17 AM, Ayende Rahien 
&lt;xxxx@ayende.com&gt;
 wrote:
  
  
The problem is that this becomes expensive very quickly. You can't use indexing, for example.
  
  
2010/3/11 Joe Cincotta 
&lt;xxxx@pixolut.com&gt;
  
You could create some artificial indices. Like, for example - every time you 'save' a user profile you make a checksum. Create a checksum table which looks like 
  
  
CHECKSUM
  
ID as Guid
  
UserID as Guid (foreign key)
  
Checksum as varchar or byte - could use a nice big crypto SHA hash as the checksum so there are no collisions.  
  
  
When we do a save on the user profile we concat all attributes and values together and then hash them use a big (1024 bit?) SHA hash.
  
  
We can index this table easily and then exact match searches get waaaaay faster. One entry per user instead of per attribute. 
  
  
You could also use a variation of this to cluster results - like  
  
Do the above strategy but make several tables for the attribute clusters which are important. For example maybe everyone wants exact matches to 'gender, marital status, and age' - we make a checksum table for those attributes which gives us a shortlist for the more complex aggregate function. 
  
  
The more 'checksum' tables the more funky you can get. You essentially can segment data and control performance of the search. The place you pay performance hit is on the create or update operations. This would be deemed acceptable. 
  
  
  
J
  
  
  
On 12/03/2010, at 7:06 PM, Ayende Rahien 
&lt;xxxx@ayende.com&gt;
 wrote:
  
  
At which point, I am better off with a non relational option, it is simpler to follow, faster to program with and faster to execute.
  
  
On Fri, Mar 12, 2010 at 8:23 AM, Joe Cincotta 
&lt;xxxx@pixolut.com&gt;
 wrote:
  
If that's an option then, sweet. But you end up with other issues - do you use a non- relational data store? Does it provide the measured scalability of SQL? It's not just about the language you choose - it's about reliability. What would you suggest as an alternative to Sql? 
  
  
J 
  
  
Joe,
  
Yes, I do, and yes, it does.
  
But I can't really answer that question without having more information, such as what the actual need is.
  
  
...ende...
  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment25</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment25</guid><pubDate>Fri, 12 Mar 2010 12:37:37 GMT</pubDate></item><item><title>Francois Germain commented on Slaying relational hydras (or dating them)</title><description>What about this?
  
  
Catalog : CatalogID, Prompt, Type, etc
  
Person : PersonID, FirstName, LastName, etc
  
Member : PersonID, MemberSince, Status, etc
  
PersonStringAttributes : PersonID, CatalogID, Value
  
PersonDateAttributes : PersonID, CatalogID, Value
  
PersonTextAttributes : PersonID, CatalogID, Value
  
DesiredStringAttributes : PersonID, CatalogID, MinValue, MaxValue
  
DesiredDateAttributes : PersonID, CatalogID, MinValue, MaxValue
  
DesiredTextAttributes : PersonID, CatalogID, MinValue, MaxValue
  
  
Now you have a few possible ways to return the resulting data.  You can combine the separate asynch returns on the middle tier.  You can have a view on top of your data structure.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment24</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment24</guid><pubDate>Tue, 09 Mar 2010 04:46:24 GMT</pubDate></item><item><title>Gerke Geurts commented on Slaying relational hydras (or dating them)</title><description>Small correction in grouping logic:
  
  
select mq.MemberID, count(DISTINCT mq.QualityID) as CountOfMatchingAttributes
  
from MemberDesiredQualities mdq
  
join MemberQualities mq
  
on mdq.QualityID = mq.QualityID
  
and (mdq.ValueMin between mq.ValueMin and mq.ValueMax
  
or mdq.ValueMax BETWEEN mq.ValueMin and mq.ValueMax)
  
where mdq.MemberID = 1000
  
group by mq.MemberID
  
order by mq.MemberID
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment23</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment23</guid><pubDate>Tue, 09 Mar 2010 02:00:36 GMT</pubDate></item><item><title>Gerke Geurts commented on Slaying relational hydras (or dating them)</title><description>Proposes adjustment of Tobi's solution to allow for multiple values in both desired and actual member qualities. And I would treat single values as a range with same ValueMin and ValueMax values.
  
  
select mq.MemberID, count(mq.QualityID) as CountOfMatchingAttributes
  
from MemberDesiredQualities mdq
  
join MemberQualities mq 
  
on mdq.QualityID = mq.QualityID 
  
and (mdq.ValueMin between mq.ValueMin and mq.ValueMax
  
or mdq.ValueMax BETWEEN mq.ValueMin and mq.ValueMax)
  
where mdq.MemberID = 1000
  
group by mq.MemberID, mq.QualityID
  
order by mq.MemberID
  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment22</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment22</guid><pubDate>Tue, 09 Mar 2010 01:57:36 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Yoni,
  
Assume that it is dynamic enough to cause problems
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment21</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment21</guid><pubDate>Mon, 08 Mar 2010 22:49:41 GMT</pubDate></item><item><title>Yoni Shalom commented on Slaying relational hydras (or dating them)</title><description>is the matching criteria relatively static or dynamic ?
  
i.e. - do I fill out what qualities I like on a search page or do I fill them out once in a long while in an account preferences page ?
  
  
Where am I going with this ? - if the search itself is relatively static, then you might want to precalculate matches and save/search those....
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment20</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment20</guid><pubDate>Mon, 08 Mar 2010 21:45:39 GMT</pubDate></item><item><title>MattMc3 commented on Slaying relational hydras (or dating them)</title><description>Erik has the right idea as a starting point, but you'd have to use subqueries or temp tables instead of the join.  It's not that hard to build the query dynamically.  The hard part would be getting it to perform.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment19</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment19</guid><pubDate>Mon, 08 Mar 2010 18:29:18 GMT</pubDate></item><item><title>tobi commented on Slaying relational hydras (or dating them)</title><description>configurator, you could use my solution for this. In addition to the count(*) predicate, you add
  
  
sum(Value - MinValue) as Difference
  
  
Then you can calculate the average difference or other metrics of closeness.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment18</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment18</guid><pubDate>Mon, 08 Mar 2010 15:28:05 GMT</pubDate></item><item><title>Mark commented on Slaying relational hydras (or dating them)</title><description>You could add a marker ro switch between range match and value match, e.g.
  
  
...
  
join MemberQualities mq on (mdq.QualityID = mq.QualityID and mdq.ValueMin &lt;= mq.Value and mdq.ValueMax &gt;= mq.Value and mq.IsRangeMatch = 1) or (mdq.Value = mq.Value and mq.IsRangeMatch = 0)
  
....
  
  
If you need multiple matches just have more than one "quality" for choices, e.g. a "favourite food 1" and a "favourite food 2", both of which have the same qualityID. Watch the number of matches though, if another member had red eyes and liked eating heroes and witches you'd get three matches, where you may be expecting two. Add the quality to the group by to help with this - the rowcount would then give you the number of matching qualities.
  
  
You can't have a fixed set of values like this:
  
  
mdq.ValueA = mq.ValueB or mdq.ValueB = mq.ValueB 
  
  
Because you need to cross match A with B, not too bad if there are only two choices, but it soon gets ugly:
  
  
mdq.ValueA = mq.ValueA or mdq.ValueA = mq.ValueB 
  
or mdq.ValueA = mq.ValueC or mdq.ValueB = mq.ValueA 
  
or mdq.ValueB = mq.ValueB or mdq.ValueB = mq.ValueC 
  
or mdq.ValueC = mq.ValueA or mdq.ValueC = mq.ValueB 
  
or mdq.ValueC = mq.ValueC
  
  
nasty!
  
  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment17</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment17</guid><pubDate>Mon, 08 Mar 2010 15:26:37 GMT</pubDate></item><item><title>Anonymous Coward commented on Slaying relational hydras (or dating them)</title><description>select p.Id, COUNT(*) from people p inner join qualities q on q.personid=p.id
  
where 
  
(q.attribute = 'heads' and q.value &gt;= '3' )
  
or (q.attribute = 'heads' and q.value &lt;= '5' )
  
or (q.attribute = 'species' and q.value = 'Hydra' )
  
or (q.attribute = 'scales' and q.value = 'Yes' )
  
group by p.Id
  
order by p.Id desc
  
  
returns people and how many things they have in common.
  
  
Posting as AC because I'm suspecting something is horribly wrong with this solution if you didn't come up with it :-)
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment16</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment16</guid><pubDate>Mon, 08 Mar 2010 15:26:03 GMT</pubDate></item><item><title>tobi commented on Slaying relational hydras (or dating them)</title><description>Ayende, according to the execution plan that I got with a few million test-data rows sql server does not use the predicate "mdq.ValueMin &lt;= mq.Value and mdq.ValueMax &gt;= mq.Value" in its index seeks. It should not do any harm to change this predicate to anything at all.
  
For multiple data types you could create multiple nullabe columns or multiple tables.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment15</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment15</guid><pubDate>Mon, 08 Mar 2010 15:25:07 GMT</pubDate></item><item><title>configurator commented on Slaying relational hydras (or dating them)</title><description>Maybe this isn't what you're looking for, but in most cases (and I'm not only talking about dating sites) what you want isn't "someone with the most exact matches" but "someone with the closest match to each of the attributes." Consider it a multi-dimensional map and you want the closest point.
  
If my dream girl is blonde with blue eyes and a size L breast, I'd probably rather have a blonde girl with bluish-green eyes and a size M breast (1 match, two very close attributes) than a blonde with blue eyes and no breast (2 matches, one very far attribute).
  
What is normally done is you give a weight and a desired value to each attribute (the weight acts somewhat  like the opposite of a standard deviation). Then for each object you calculate the weighted distance from the object of your desires and you choose the top ones. How would you index that in a relational database?
  
Maybe you can separate the multidimensional map into sectors. A 3-headed beast with red eyes is sector 6485, and a blonde with a really big breast is sector 20316.
  
You'd have the following table
  
Name, Type, Eye color, Sector
  
Blondie, Human, Blue, 20316
  
Hydra, Hydra, Red, 6485
  
Than, you choose all the sectors that are close enough to the desired value (probably the exact one and one sector to each side so you'd have 3^(number of attributes) sectors), and only make an approximate calculation for each of the values in your narrowed-per-sector resultset. There are some more ways  you can optimize the sector-choosing, and the calculation, but that's the main idea.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment14</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment14</guid><pubDate>Mon, 08 Mar 2010 15:12:51 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Tobi,
  
Thinking about this, what about if I don't have just ranges, but direct matches?
  
Eye color = Red, for example.
  
Or a single quality with multiple values (Fav. Food: Heroes, Witches).
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment13</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment13</guid><pubDate>Mon, 08 Mar 2010 14:22:39 GMT</pubDate></item><item><title>Bryan commented on Slaying relational hydras (or dating them)</title><description>You really shouldn't be solving this with a SQL database (unless you're using something like Postgis to represent this as a multidimensional space).  This is really a multi-dimensional graph problem at it's heart.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment12</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment12</guid><pubDate>Mon, 08 Mar 2010 13:50:59 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Tobi,
  
Hat tip, that is a really nice solution.
  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment11</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment11</guid><pubDate>Mon, 08 Mar 2010 12:04:35 GMT</pubDate></item><item><title>Rafal commented on Slaying relational hydras (or dating them)</title><description>Oh, sh*t, the task was to design a RELATIONAL model...
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment10</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment10</guid><pubDate>Mon, 08 Mar 2010 12:00:25 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Rafal,
  
That isn't relational solution.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment9</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment9</guid><pubDate>Mon, 08 Mar 2010 12:00:09 GMT</pubDate></item><item><title>tobi commented on Slaying relational hydras (or dating them)</title><description>select mq.MemberID, count(*) as CountOfMatchingAttributes
  
from MemberDesiredQualities mdq
  
join MemberQualities mq on mdq.QualityID = mq.QualityID and mdq.ValueMin &lt;= mq.Value and mdq.ValueMax &gt;= mq.Value
  
where mdq.MemberID = 1000
  
group by mq.MemberID
  
order by mq.MemberID
  
  
  
MemberQualities: QualityID, MemberID, Value
  
MemberDesiredQualities: MemberID, QualityID, ValueMin, ValueMax
  
  
Both tables clustered uniquely on the first two columns. Query returns instantly all members with at leas one matching attibute plus the count of matching attributes.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment8</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment8</guid><pubDate>Mon, 08 Mar 2010 11:59:08 GMT</pubDate></item><item><title>Rafal commented on Slaying relational hydras (or dating them)</title><description>Convert the attributes to categories, like
  
Gender_Male
  
Species_Hydra
  
Eyes_Red
  
and build a Lucene index on all people, including these words.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment7</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment7</guid><pubDate>Mon, 08 Mar 2010 11:57:25 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Erik,
  
You will never return anything using this approach.
  
q.attribute - 'heads' and q.attributes = 'species' are mutually exclusive.
  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment6</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment6</guid><pubDate>Mon, 08 Mar 2010 11:52:26 GMT</pubDate></item><item><title>Erik commented on Slaying relational hydras (or dating them)</title><description>Right, so then it turns into a where clause with a series of and'd or or'd search terms, depending on what you're wanting to match.  
  
  
select * from people p inner join qualities q on q.personid=p.id
  
where 
  
(q.attribute = 'heads' and q.value &gt;= '3' )
  
and (q.attribute = 'heads' and q.value &lt;= '5' )
  
and (q.attribute = 'species' and q.value = 'Hydra' )
  
and (q.attribute = 'scales' and q.value = 'Yes' )
  
and ...
  
  
It's not very elegant, but no EAV problem is when you're stuck in traditional RDBMS.  
  
  
[en.wikipedia.org/wiki/Entity-attribute-value_model](http://en.wikipedia.org/wiki/Entity-attribute-value_model)  
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment5</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment5</guid><pubDate>Mon, 08 Mar 2010 11:49:13 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Erik,
  
You need to match on all the possible qualities, not on just one of them.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment4</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment4</guid><pubDate>Mon, 08 Mar 2010 11:40:59 GMT</pubDate></item><item><title>Erik commented on Slaying relational hydras (or dating them)</title><description>select * from people p inner join qualities q on q.personid=p.id and q.attribute = 'species' and q.value = 'Hydra'
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment3</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment3</guid><pubDate>Mon, 08 Mar 2010 11:39:12 GMT</pubDate></item><item><title>Ayende Rahien commented on Slaying relational hydras (or dating them)</title><description>Joe,
  
Try to show me SQL that make a search work.
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment2</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment2</guid><pubDate>Mon, 08 Mar 2010 10:55:28 GMT</pubDate></item><item><title>Joe  commented on Slaying relational hydras (or dating them)</title><description>I think a correctly indexed attribute value pair model would work. You could use a few strategies for the search like multiple pass reducing constraint match - or just plain old aggregate function to see how many attributes in common. 
  
  
The nice thing about the av pair model is that as long as you think through your search and data entry - nothing says that the list of qualities has to stay constant. It could change over time without affecting the integrity of previous entries. 
  
  
You of course pay a price for the av pair model in mapping data to - well - pretty much anything. You will need adapter if you use things like mvc or databinding code in general. But it does serve the purpose. 
  
  
J
</description><link>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment1</link><guid>http://ayende.com/4427/slaying-relational-hydras-or-dating-them#comment1</guid><pubDate>Mon, 08 Mar 2010 10:49:55 GMT</pubDate></item></channel></rss>