﻿<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Ayende @ Rahien</title><link>http://ayende.com</link><description>Ayende @ Rahien</description><copyright>Copyright (C) Ayende Rahien  2004 - 2021 (c) 2026</copyright><ttl>60</ttl><item><title>Dan Bunea commented on Lucene as a data repository</title><description>Hi,
  
  
I've published the source code at: http://danbunea.blogspot.com/2007/10/lucene-indexes-as-agile-databases.html
  
  
Thanks,
  
Dan
  
  
PS: CouchDB is only JavaScript. What if I need a desktop app?
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment20</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment20</guid><pubDate>Mon, 22 Oct 2007 10:15:54 GMT</pubDate></item><item><title>Chris Ortman commented on Lucene as a data repository</title><description>I was toying with this idea about four months ago. My biggest concern was actually updating the index and keeping that fast and scalable. Lucene is great for reads; I'm not so sure about writes.
  
  
I think what you're really talking about building, though, is CouchDB.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment19</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment19</guid><pubDate>Thu, 18 Oct 2007 12:54:22 GMT</pubDate></item><item><title>Tuna Toksoz commented on Lucene as a data repository</title><description>Well, I might use an OODBMS for such a complex scenario, but the problem with OODBMSs is that they can be very, very slow.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment18</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment18</guid><pubDate>Wed, 17 Oct 2007 15:35:51 GMT</pubDate></item><item><title>pete w commented on Lucene as a data repository</title><description>Dan, this is really cool stuff. I still have some reservations...
  
  
As I understand it, CouchDB is recommended for persisting semi-structured data, but it isn't a relational database. Using CouchDB to store semi-structured objects smells like an anti-pattern...
  
Perusing the FAQs on the CouchDB website confirmed this: it was not designed as an OO persistence layer.
  
  
Nonetheless, the desire to persist semi-structured objects remains...
  
  
I'm trying to understand why you would want to use Lucene as a persistence layer and what advantage it would have over CouchDB.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment17</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment17</guid><pubDate>Wed, 17 Oct 2007 14:47:59 GMT</pubDate></item><item><title>Edwin de Jonge commented on Lucene as a data repository</title><description>Great idea; I had a similar idea some years ago:
  
In a research project a couple of years ago, we used Lucene as the data store for Topic Maps (XTM), i.e. semantic networks.
  
It worked very well, precisely because of the flexibility of Lucene: Topic maps can define topic types, which we stored as different fields in Lucene.
  
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment16</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment16</guid><pubDate>Wed, 17 Oct 2007 09:46:41 GMT</pubDate></item><item><title>Dan Bunea commented on Lucene as a data repository</title><description>Hi all,
  
  
Now that a secret we have been using for a while has been revealed, I'd like to make a very small contribution to the Castle project: a small project called ActiveDocument.
  
  
I first opened the discussion about it at http://groups.google.com/group/castle-project-users/browse_thread/thread/d73e1d00ee9d7fe4/#, where you, Ayende, asked me:
  
  
Dan, 
  
where are you keeping the data, then?
  
  
  
The answer was revealed last night, with the post about 
  
  
Basically, I built a few classes on top of Lucene.Net that support:
  
  
1. dynamic properties and search without problems
  
  
[Test]
public void Save()
{
    ActiveDocument product = new ActiveDocument("Product");
    product["Name"] = "CMS20";
    product.Save();

    ActiveDocument product2 = new ActiveDocument("Product");
    product2["Name"] = "Taia";
    product2["Category"] = "Software innovation";
    product2.Save();

    ActiveDocument[] allSoftware = ActiveDocument.Query("Category:Software*");

    Assert.AreEqual(1, allSoftware.Length);
    Assert.AreEqual("Taia", allSoftware[0]["Name"]);
    Assert.AreEqual("Product", allSoftware[0]["type"]);
}
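For readers without Lucene.Net at hand, the following Python sketch imitates the behaviour the test above checks: documents are free-form dictionaries tagged with a "type" field, and a query such as Category:Software* matches documents whose field value starts with the given prefix. DocumentStore, save, find and query are hypothetical names; this is not ActiveDocument's implementation (which keeps its documents in a Lucene index), and it only understands single exact or trailing-wildcard terms.

```python
import itertools


class DocumentStore:
    """Hypothetical in-memory stand-in for a Lucene-backed document store."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    def save(self, doc_type, **fields):
        # Every document gets an id and a "type" field; other fields are free-form.
        doc_id = str(next(self._ids))
        self._docs[doc_id] = dict(fields, id=doc_id, type=doc_type)
        return doc_id

    def find(self, doc_id):
        return self._docs[doc_id]

    def query(self, expression):
        # Only single terms: "Field:value" (exact) or "Field:prefix*" (prefix).
        field, _, pattern = expression.partition(":")
        if pattern.endswith("*"):
            prefix = pattern[:-1]
            return [d for d in self._docs.values()
                    if field in d and str(d[field]).startswith(prefix)]
        return [d for d in self._docs.values() if d.get(field) == pattern]


store = DocumentStore()
store.save("Product", Name="CMS20")
store.save("Product", Name="Taia", Category="Software innovation")
hits = store.query("Category:Software*")
print(len(hits), hits[0]["Name"], hits[0]["type"])  # 1 Taia Product
```

The first product never gets a Category field at all, which is the point of dynamic properties: there is no column to leave empty.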
  
  
  
2. relations:
  
  
[Test]
public void TestManyToManyRelations()
{
    ActiveDocument category = new ActiveDocument("Category");
    category["Name"] = "Software";
    category.Save();

    ActiveDocument category2 = new ActiveDocument("Category");
    category2["Name"] = "Sad and Cheap";
    category2.Save();

    ActiveDocument pf = new ActiveDocument("ProductFamily");
    pf["Name"] = "nada";
    pf.Save();

    ActiveDocument[] allCateg = ActiveDocument.Query("type:Category");
    Assert.AreEqual(2, allCateg.Length);

    ActiveDocument product2 = new ActiveDocument("Product");
    product2["Name"] = "Taia";
    product2.AddRelated("Categories", category);
    product2.AddRelated("Categories", category2);
    product2.AddRelated("ProductFamilies", pf);
    product2.Create();

    ActiveDocument[] relatedCategories = product2.FindRelated("Categories");
    Assert.AreEqual(2, relatedCategories.Length);
    ActiveDocument[] relatedPF = product2.FindRelated("ProductFamilies");
    Assert.AreEqual(1, relatedPF.Length);

    // and after it is loaded
    ActiveDocument productAfter = ActiveDocument.Find(product2["id"]);
    relatedCategories = productAfter.FindRelated("Categories");
    Assert.AreEqual(2, relatedCategories.Length);
    relatedPF = productAfter.FindRelated("ProductFamilies");
    Assert.AreEqual(1, relatedPF.Length);
}
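The test does not show how relations are stored. One possible approach, assumed here rather than taken from ActiveDocument, is to keep the ids of related documents in a multi-valued field named after the relation and resolve them on demand. The Python sketch below serializes documents to JSON strings (a stand-in for stored index fields) so the relations visibly survive a save/load round trip; new_document, add_related, find_related, save and load are hypothetical names.

```python
import json

# Hypothetical persistent store: id -> serialized document, standing in for
# stored fields in an index. Relations are kept as lists of related ids.
storage = {}


def save(doc):
    storage[doc["id"]] = json.dumps(doc)


def load(doc_id):
    return json.loads(storage[doc_id])


def new_document(doc_id, doc_type, **fields):
    return dict(fields, id=doc_id, type=doc_type, relations={})


def add_related(doc, relation, other):
    doc["relations"].setdefault(relation, []).append(other["id"])


def find_related(doc, relation):
    # Resolve the stored ids back into freshly loaded documents.
    return [load(i) for i in doc["relations"].get(relation, [])]


for doc in (new_document("c1", "Category", Name="Software"),
            new_document("c2", "Category", Name="Sad and Cheap"),
            new_document("f1", "ProductFamily", Name="nada")):
    save(doc)

taia = new_document("p1", "Product", Name="Taia")
add_related(taia, "Categories", load("c1"))
add_related(taia, "Categories", load("c2"))
add_related(taia, "ProductFamilies", load("f1"))
save(taia)

reloaded = load("p1")  # a fresh copy, playing the role of ActiveDocument.Find(id)
print(len(find_related(reloaded, "Categories")))       # 2
print(len(find_related(reloaded, "ProductFamilies")))  # 1
```

Storing only ids keeps each document flat, at the cost of one extra lookup per related document.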
  
  
  
It also supports internationalisation, multi-valued fields (like tags) and sorting, and the next versions will probably add validation and customizations (maybe using the PostSharp AOP engine, http://www.postsharp.org/).
  
  
At the moment the code is a little too specific to our CMS (http://www.eptala.ro/tb.htm), but in the next few days I will publish it for everyone to test and review.
  
  
Thanks,
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment15</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment15</guid><pubDate>Wed, 17 Oct 2007 09:45:28 GMT</pubDate></item><item><title>Stefan Wenig commented on Lucene as a data repository</title><description>As for transactional integrity, though, I was talking about the primary data source, not the source of primary data. So if you have data stored ONLY in Lucene, the Lucene repository had better never need to be rebuilt. If you make a change to both primary (fixed-schema) and secondary (dynamic) data, there's no guarantee that the change is atomic, so in a large-scale environment inconsistencies will happen.
  
  
I'm excited to hear that NH integrates with Lucene.Net, though. I'll have to look into that soon.
  
  
PS: I sent you an email early last week, and tried again on Monday. Could you check your spam folder? Thanks!
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment14</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment14</guid><pubDate>Wed, 17 Oct 2007 09:22:25 GMT</pubDate></item><item><title>Markus Zywitza commented on Lucene as a data repository</title><description>select * from bug 
where
  id in (
    -- subselect 1
    select entityId from Properties
    where
      entitytype = 'bug' and
      propName = 'startdate' and
      valueType = 'dt' and -- DateTime
      dateValue &gt;= '2006-01-01' and
      dateValue &lt; '2007-01-01'
    -- end of subselect 1
  )
  and id not in (
    -- subselect 2
    select entityId from Properties
    where
      entitytype = 'bug' and
      propName = 'completiondate'
    -- end of subselect 2
  )
  
  
Each of the expressions translates to a subquery. A null value simply means that the property is not stored. Comparing terms is also possible; that translates to a nested subquery.
  
Yes, it is complex and not very performant, but it allows me to store all my data in a single database, which means far fewer headaches in administration.
  
This model can also be extended to allow extensions by type, if propName is replaced by a reference to another table that defines the possible extensions per entityType.
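To check the query against data, here is a Python sketch using the standard sqlite3 module with the Variant 2 Properties table (trimmed to the date column) and three sample bugs, all made up for illustration. Dates are stored as ISO-8601 text, so the year filter is written with between on date-only strings rather than as a half-open range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table bug (id integer primary key, title text);
create table Properties (
  id integer primary key,
  entityType varchar(50) not null,
  entityId int not null,
  propName varchar(50) not null,
  valueType char(2),  -- 'dt' = date, stored as ISO-8601 text in this sketch
  dateValue text
);
insert into bug values (1, 'started 2006, still open'),
                       (2, 'started 2006, completed'),
                       (3, 'started 2007, still open');
insert into Properties (entityType, entityId, propName, valueType, dateValue) values
  ('bug', 1, 'startdate', 'dt', '2006-03-01'),
  ('bug', 2, 'startdate', 'dt', '2006-05-10'),
  ('bug', 2, 'completiondate', 'dt', '2006-09-01'),
  ('bug', 3, 'startdate', 'dt', '2007-02-01');
""")

# One subselect per condition; a missing property row plays the role of "null".
rows = conn.execute("""
select id, title from bug
where id in (select entityId from Properties
             where entityType = 'bug' and propName = 'startdate'
               and valueType = 'dt'
               and dateValue between '2006-01-01' and '2006-12-31')
  and id not in (select entityId from Properties
                 where entityType = 'bug' and propName = 'completiondate')
""").fetchall()
print(rows)  # [(1, 'started 2006, still open')]
```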
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment13</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment13</guid><pubDate>Wed, 17 Oct 2007 08:37:32 GMT</pubDate></item><item><title>Ayende Rahien commented on Lucene as a data repository</title><description>Markus,
  
Assume that I extend my entity to include StartDate, DueDate, CompletionDate.
  
Now I want all the bugs that started last year and weren't finished:
  
  
In Lucene it is something along the lines of* "startdate:[20060101 TO 20070101] AND completiondate:null"
  
  
Now formulate it as a SQL query.
  
  
* I'm not sure whether Lucene allows comparing terms, though, so I don't know if something like completiondate &gt; duedate is possible.
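Lucene range queries over text terms compare them lexicographically, which is why dates are usually indexed as yyyyMMdd strings, as in the query above. The Python sketch below is a toy inverted index, not Lucene's API: it answers an inclusive [lower TO upper] range by scanning a sorted term dictionary, and expresses "no completiondate" by removing every document that has any term in that field. The functions add, range_query and has_field and the sample bugs are hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import date

# Hypothetical miniature inverted index: field -> term -> set of doc ids.
# Dates are indexed as yyyyMMdd strings so that lexicographic term order
# matches chronological order, which the range scan below relies on.
index = defaultdict(lambda: defaultdict(set))


def add(doc_id, **fields):
    for name, value in fields.items():
        term = value.strftime("%Y%m%d") if isinstance(value, date) else str(value)
        index[name][term].add(doc_id)


def range_query(field, lower, upper):
    # Inclusive range over the sorted term dictionary, like [lower TO upper].
    terms = sorted(index[field])
    start = bisect.bisect_left(terms, lower)
    stop = bisect.bisect_right(terms, upper)
    return set().union(*(index[field][t] for t in terms[start:stop]))


def has_field(field):
    # Documents with any term in the field; subtracting them acts as "null".
    return set().union(*index[field].values())


add("bug1", startdate=date(2006, 3, 1))
add("bug2", startdate=date(2006, 5, 10), completiondate=date(2006, 9, 1))
add("bug3", startdate=date(2007, 2, 1))

open_2006 = range_query("startdate", "20060101", "20070101") - has_field("completiondate")
print(sorted(open_2006))  # ['bug1']
```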
  
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment12</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment12</guid><pubDate>Wed, 17 Oct 2007 07:55:45 GMT</pubDate></item><item><title>Markus Zywitza commented on Lucene as a data repository</title><description>Ayende,
  
I think you misunderstood me. My Property table would look like this:
  
  
create table Properties (
  id int primary key,
  entityType varchar(50) not null,
  entityId int not null,
  propName varchar(50) not null,
  -- Variant 1
  genericValue varchar(1000),
  -- Variant 2
  valueType char(2), -- selects one of the columns below
  stringValue varchar(1000),
  decimalValue decimal,
  dateValue datetime
  -- etc.
)
  
  
Variant 2 is an extension that allows querying date ranges, etc. Using sql_variant might also be possible, but I haven't looked into it closely yet.
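The difference between the two variants shows up as soon as values are compared. In the Python sketch below (standard sqlite3 module, made-up values, a subset of the columns), numbers kept in the single varchar column sort and range-filter as strings, while the typed decimal column behaves numerically; dates stored as text in a non-sortable format such as month/day/year run into the same problem.

```python
import sqlite3

# Hypothetical comparison: the same values stored once in the generic text
# column (Variant 1) and once in the typed decimal column (Variant 2).
conn = sqlite3.connect(":memory:")
conn.execute("""create table Properties (
    id integer primary key,
    propName varchar(50) not null,
    genericValue varchar(1000),
    decimalValue decimal)""")
for value in (9, 10, 100):
    conn.execute(
        "insert into Properties (propName, genericValue, decimalValue) values (?, ?, ?)",
        ("estimate", str(value), value),
    )

as_text = [r[0] for r in conn.execute(
    "select genericValue from Properties order by genericValue")]
as_number = [r[0] for r in conn.execute(
    "select decimalValue from Properties order by decimalValue")]
text_range = conn.execute(
    "select genericValue from Properties where genericValue between '9' and '50'").fetchall()
number_range = conn.execute(
    "select decimalValue from Properties where decimalValue between 9 and 50 "
    "order by decimalValue").fetchall()

print(as_text)       # ['10', '100', '9']  (string order)
print(as_number)     # [9, 10, 100]
print(text_range)    # []  ('9' sorts after '50' as a string)
print(number_range)  # [(9,), (10,)]
```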
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment11</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment11</guid><pubDate>Wed, 17 Oct 2007 07:45:36 GMT</pubDate></item><item><title>Ayende Rahien commented on Lucene as a data repository</title><description>Markus,
  
Yes, that works for a simple scenario, but what happens when you have 100 entities, and you can add 5 fields to an instance?
  
This also loses you the ability to do things such as searching for date ranges.
  
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment10</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment10</guid><pubDate>Wed, 17 Oct 2007 07:24:47 GMT</pubDate></item><item><title>Markus Zywitza commented on Lucene as a data repository</title><description>I can't see the need for such a solution at the moment. Sticking with the bug-tracking example, I'd just use a bug table and a bugProperties table in a 1:n relation.
  
If you need a more generic solution, I propose a simple Property table using AR/NH "Any" to relate to the entities and a "HasMany" collection on the Entity side. 
  
The value field must be varchar and serialized through an NH custom type. This might be too simple for complex object values, but it is sufficient for storing atomic information.
  
This allows me to search and index both property names and values.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment9</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment9</guid><pubDate>Wed, 17 Oct 2007 07:17:50 GMT</pubDate></item><item><title>Ayende Rahien commented on Lucene as a data repository</title><description>Lucene is built to be very scalable; you can distribute it, etc.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment8</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment8</guid><pubDate>Wed, 17 Oct 2007 06:35:12 GMT</pubDate></item><item><title>krzysztof@kozmic.pl (Krzysztof Koźmic) commented on Lucene as a data repository</title><description>OK, but what about the performance of this approach? For a high-traffic website, wouldn't it be overkill?
  
Krzysztof Koźmic
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment7</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment7</guid><pubDate>Wed, 17 Oct 2007 06:25:02 GMT</pubDate></item><item><title>Tuna Toksoz commented on Lucene as a data repository</title><description>Looking forward to the results.
  
DotLucene can be run on MS SQL, right?
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment6</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment6</guid><pubDate>Wed, 17 Oct 2007 04:16:38 GMT</pubDate></item><item><title>Pete w commented on Lucene as a data repository</title><description>I'm in on this one. I was recently working on the idea of an extensible document repository system...
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment5</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment5</guid><pubDate>Wed, 17 Oct 2007 01:24:57 GMT</pubDate></item><item><title>Ayende Rahien commented on Lucene as a data repository</title><description>Stefan,
  
Yes, NH can do that.
  
There is no transactional integrity between the two, but the DB is the master, so that is fine.
  
  
I would want to run some tests before committing to making it the primary data source; I would probably rather keep NH as the primary and make this the extensible source.
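The arrangement described here, with the database as the master and the index as derived data, can be illustrated with a small Python sketch: sqlite3 plays the database and a plain dictionary is a hypothetical stand-in for the Lucene index. The index write is not part of the database transaction and can be lost, but because the index holds no data of its own it can be rebuilt from the database. save_bug, rebuild_index and the index_ok flag are hypothetical names; this only holds while the database remains the primary source.

```python
import sqlite3

# The database is the master copy; the dict stands in for a derived index.
db = sqlite3.connect(":memory:")
db.execute("create table bug (id integer primary key, title text)")
index = {}


def save_bug(bug_id, title, index_ok=True):
    with db:  # commits the database transaction on success
        db.execute("insert or replace into bug values (?, ?)", (bug_id, title))
    # The index update happens outside that transaction, so it can be lost.
    if index_ok:
        index[bug_id] = title.lower().split()


def rebuild_index():
    # Because the database is authoritative, the index can be regenerated from it.
    index.clear()
    for bug_id, title in db.execute("select id, title from bug"):
        index[bug_id] = title.lower().split()


save_bug(1, "Crash on startup")
save_bug(2, "Slow search page", index_ok=False)  # simulated index failure
print(sorted(index))  # [1]  (database and index disagree)
rebuild_index()
print(sorted(index))  # [1, 2]
```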
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment4</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment4</guid><pubDate>Wed, 17 Oct 2007 00:16:51 GMT</pubDate></item><item><title>Stefan Wenig commented on Lucene as a data repository</title><description>NHibernate can do that? Wow...
  
  
Is Lucene.Net robust enough to be used as primary data storage? I always think of full-text indexes as something that tends to get corrupted and needs to be rebuilt from a transactional data source, but maybe that's just from experience with... well, you get the idea.
  
  
Speaking of transactional sources, there's no transactional integrity between your RDBMS and your extensible data if you store the latter in Lucene.Net, or is there?
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment3</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment3</guid><pubDate>Tue, 16 Oct 2007 23:53:55 GMT</pubDate></item><item><title>Rik Hemsley commented on Lucene as a data repository</title><description>This is indeed a great idea; I hadn't spotted the potential of Lucene for this kind of thing when I looked at it. Now I'm thinking of perhaps using it instead of full-text indexing for something...
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment2</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment2</guid><pubDate>Tue, 16 Oct 2007 22:29:56 GMT</pubDate></item><item><title>Tuna Toksoz commented on Lucene as a data repository</title><description>Wow, great idea. I have to take a look at DotLucene (Lucene.Net). I am looking forward to trying it soon.
</description><link>http://ayende.com/2881/lucene-as-a-data-repository#comment1</link><guid>http://ayende.com/2881/lucene-as-a-data-repository#comment1</guid><pubDate>Tue, 16 Oct 2007 21:57:10 GMT</pubDate></item></channel></rss>