Bug magnet
If you ever need to understand why I tend to be involved so many projects, it is for a simple reason. I am apparently a bug magnet. I mentioned that I managed to cause FireBird to corrupt itself, which is why I am not intending to use it. Now I have another contender to the list.
Berkeley DB is a well known, widely used, and in general though to be very stable. I decided to use it, mainly because it is cool and I really need to learn something new. (It is running a lot of interesting things, and I want to be able to give concrete opinions on that.)
So, I start by writing tests, to ensure that it behave the way I expect it to. It doesn't, so I go and read the documentation, and it explained that my assumptions were wrong. Okay, I can deal with that, I learn more about this, and starting to experiment with how it works.
It was going fairly well, until I started to toy with transactions and isolations. One of the early tests is to see whatever read committed transactions work.
I wanted to get this (pseudo code) to pass:
using (var environment = new BerkeleyDbEnvironment("test")) { using (var tx = environment.BeginTransaction()) using (var queue = environment.OpenQueue("my-queue")) { queue.Append(new DateTime(2000, 1, 1)); // now test that we cannot read it using (var environment2 = new BerkeleyDbEnvironment("test")) { using (var tx2 = environment2.BeginTransaction()) using (var queue2 = environment2.OpenQueue("my-queue")) { Assert.IsNull(queue2.Consume()); tx2.Commit(); } } tx.Commit(); } }
However, not only did the test fail, it also corrupt the DB state, and corrupt the process state (leaving file handles hanging). At first, I blamed myself, then I blamed the code. Finally, I blamed the .Net API Binding.
Eventually I decided that I really need to write my own, to fix this issue. Luckily, I decided to try to reproduce the same scenario using the native C API. This is the first time I touched C in years, so you can imagine how pissed off I was.
I got the same error! And it is a scary one:
PANIC: fatal region error detected; run recovery
PANIC: fatal region error detected; run recovery
File handles still open at environment close
Open file handle: test\my-test-queue
PANIC: fatal region error detected; run recovery
Here the full C code, running against BDB 4.7.
//create environment & queue u_int32_t envCreateFlags = DB_CREATE | DB_INIT_LOCK| DB_INIT_LOG| DB_INIT_MPOOL| DB_INIT_TXN| DB_RECOVER; char* path = "test"; DB_ENV *createDbEnv; db_env_create(&createDbEnv, 0); createDbEnv->open(createDbEnv,path, envCreateFlags, 0); DB *createDbDb; db_create(&createDbDb, createDbEnv, 0); createDbDb->set_re_len(createDbDb, 128); createDbDb->open(createDbDb, NULL, "my-test-queue", NULL, DB_QUEUE, DB_CREATE, 0); createDbDb->close(createDbDb, 0); createDbEnv->close(createDbEnv, 0); // write to queue DB_ENV *writeToQueueEnv ; db_env_create(&writeToQueueEnv, 0); writeToQueueEnv->open(writeToQueueEnv, path, envCreateFlags, 0); DB_TXN *writeToQueueTxn; writeToQueueEnv->txn_begin(writeToQueueEnv, NULL, &writeToQueueTxn, DB_READ_COMMITTED); DB *writeToQueueDb; db_create(&writeToQueueDb, writeToQueueEnv, 0); writeToQueueDb->open(writeToQueueDb, writeToQueueTxn, "my-test-queue", NULL, DB_QUEUE, 0, 0); int keyVal = 5; int valVal = 3; DBT key, value; memset(&key, 0, sizeof(DBT)); memset(&value, 0, sizeof(DBT)); key.data = &keyVal; key.size = sizeof(int); value.data = &valVal; value.size = sizeof(DBT); writeToQueueDb->put(writeToQueueDb, writeToQueueTxn, &key, &value, DB_APPEND); // now try to read using another transaction // read from queue in middle of uncommited transaction DB_ENV *readFromQueueEnv ; db_env_create(&readFromQueueEnv, 0); readFromQueueEnv->open(readFromQueueEnv, path, envCreateFlags, 0); DB_TXN *readFromQueueTxn; readFromQueueEnv->txn_begin(readFromQueueEnv, NULL, &readFromQueueTxn, DB_READ_COMMITTED); DB *readFromQueueDb; db_create(&readFromQueueDb, readFromQueueEnv, 0); readFromQueueDb->open(readFromQueueDb, readFromQueueTxn, "my-test-queue", NULL, DB_QUEUE, 0, 0); int readFromQueueKeyVal = 5; int val2Val = 3; DBT readFromQueueKey, readFromQueueData; memset(&readFromQueueKey, 0, sizeof(DBT)); memset(&readFromQueueData, 0, sizeof(DBT)); readFromQueueKey.data = &keyVal; readFromQueueKey.size = sizeof(int); readFromQueueData.data = &valVal; readFromQueueData.size = sizeof(DBT); int ret = readFromQueueDb->get(readFromQueueDb, readFromQueueTxn, &key, &value, DB_CONSUME); if( ret == DB_NOTFOUND) Console::WriteLine("not found"); readFromQueueTxn->close(readFromQueueTxn, 0); readFromQueueDb->close(readFromQueueDb, 0); readFromQueueEnv->close(readFromQueueEnv, 0); // now commit write transaction writeToQueueTxn->commit(writeToQueueTxn, 0); writeToQueueDb->close(writeToQueueDb,0); writeToQueueEnv->close(writeToQueueEnv, 0);
Now, what the hell am I doing wrong? I am pretty sure that I am missing something. Just to give you an idea about what the issue is, this is about as bad as if running this code would crash a SQL database:
using( var con = new SomeDbConnection(" connection string ")) using( var tx = con.BeginTransaction()) using( var cmd = con.CreateCommand() ) { cmd.Transaction = tx; cmd.CommandText = "INSERT INTO Queue (Data) VALUES('a')"; cmd.ExecuteNonQuery(); using( var con2 = new SomeDbConnection(" connection string ")) using( var tx2 = con2.BeginTransaction()) using( var cmd2 = con2.CreateCommand() ) { cmd.Transaction = tx; cmd.CommandText = "DELETE FROM Queue" int rowAffected = cmd.ExecuteNonQuery(); Assert.AreEqual(0, rowAffected); tx2.Commit(); } tx.Commit(); }
Sigh, if I were a tester, I would be a menace.
Updates:
The error occurs if & only if you are creating two environments in the same process. What is annoying is that this is at least implicitly allowed. Only if you specify DB_REGISTER you are limited to a single environment per process (and there is a check for it).
It turns out that this is an issue of using the DB_RECOVER flag in multiple environments at the same time. When you create a new environment with DB_RECOVER, it clear the process state. Yuck. Full details can be found here.
Comments
There are several things you're doing wrong:
1) You must never use more than one transaction in a given thread at the same time. Doing so risks self-deadlock, which is a state which the environment cannot detect and therefore avoid.
2) There are a few memory buffer overrun problems because of the following lines:
value.size = sizeof(DBT);
readFromQueueData.size = sizeof(DBT);
They should both be using sizeof(int).
3) It should be fine to open multiple handles on the same environment, but there's really no point - it'll just make things slower. The same goes for opening multiple database handles on the same database.
4) I'm pretty sure you've not run the C code that you've posted, as it contains a few other bugs, ie:
Console::WriteLine("not found");
and
readFromQueueTxn->close(readFromQueueTxn, 0);
when DB_TXN has no close() method - did you mean commit()?
John
The Environment is very similar to a Sql Server Filegroup. Sql Server allocates threads on a per filegroup basis.
In your example, you should open a single environment, and use it to open the queue/db once.
Then each thread can create transactions against the shared environment handle, and use them on the shared queue/db handle.
The environment is the shared memory region (on disk or in RAM) where the locks, etc are managed. When running in a single process, you only need one.
Also, it's a good idea to open the queue/db inside a transaction (at app startup). There's an api overload for it.
The api starts making more sense once you get used to it. fyi
Evan
Also, the ambient transaction stuff doesn't exist in the .NET api (unlike System.Transactions). So make sure you are using the method overloads to pass the transaction handle in when doing database operations. Otherwise your transaction won't be used.
Also, one other thing.
Conceptually, the ADO.NET code is different from the BDB code.
BDB is simply a library, unlike having a full blown RDBMS server. So the connection concept doesn't exist in the BDB context, because your code IS the DB.
Ahh the good ole C and C++ days when people when programmer know exactly what they are talking about...
There is a distinct difference between how BDB and a full blown SQL Server operate. BDB runs in your application's process. So your app is acting like the database engine. When you tell it to open two environments that's like pointing two instances of SQL Server at the same file.
Evan, thanks for your comments.
This is test code that I have used to demonstrate the point, not how I am using it.
I reviewed my code according to your suggestions.
John,
1 & 2) This code was for test purposes only. It most certainly isn't good code.
See the update for the post to see what it going on there.
4) the code there was slightly modified from what I run, specifically, I played with my version to see if discarding the tx will work, then changed discard to close instead of commit.
Comment preview