Bit Packed Table backend with encryption

August 4th, 2010

In March 2010 (3rd and 9th), I wrote on this blog about a new backend for Emdros under development, called the “Bit Packed Table” (BPT) backend. It is a high-performance, read-only database engine, based on “bit packed tables” and custom-tailored to the EMdF model. It outperforms even SQLite in terms of raw querying speed by about 30% on average.

I have recently made the BPT engine almost feature-complete, including adding an encryption layer. The encryption isn’t strong, but it does the job of keeping prying eyes out of your data.

I have added BPT to two of my Emdros-based software projects, using it as the sole backend in both. Each of them delivers content to the user through a thin shell on top of Emdros. It works fine, and the speed increase over SQLite 3 is especially noticeable: pieces of content that used to take 1.5 seconds to load now leap onto the screen.

I said the BPT engine is almost feature-complete. The only thing missing, in fact, is support for stored monad sets, that is, monad sets that don’t have any object data associated with them but which can be used to delimit a query. I will add this feature in due course.

The BPT engine isn’t Open Source, and won’t be for the foreseeable future. If you are interested in licensing the engine, please drop me an email.

Enjoy!

Ulrik

Emdros 3.2.0 released

July 4th, 2010

I’ve released Emdros 3.2.0 over at SourceForge.net.

http://emdros.org/download.html

The release notes appear below.

Please let me know via the usual avenues whether anything is amiss.

Enjoy!

Ulrik

- *** Version 3.2.0 ***

As usual, binaries are available for Mac OS X, Windows(R), and Fedora
(13).

The Windows binaries have support for MySQL, SQLite 2, and SQLite 3.
They are built with Visual Studio Express 2010.

The Mac OS X binaries are Universal binaries running on Mac OS X 10.4
(Tiger), 10.5 (Leopard), and 10.6 (Snow Leopard).  They do not have
support for either MySQL or PostgreSQL; only SQLite 2 and SQLite 3 are
supported in the Mac OS X binaries.  You can compile the sources with
support for MySQL yourself, though, and possibly also PostgreSQL.

The Fedora binaries come with support for PostgreSQL, MySQL, SQLite 2,
and SQLite 3.

This release has the following changes over 3.1.1:

- A new backend was created, called the BPT engine.  It is
proprietary, and thus not Open Source, at the moment (sorry).
Prospective licensees can contact me at ulrikp – at – emdros |dot|
org for questions about this new engine.

- SQLite3 was upgraded to version 3.6.17

- PCRE was upgraded to version 8.01. The license is still BSD.

- The TIGERXML importer is now more lenient towards the XML being
imported.

- The Emdros Query Tool now implements an XML_Output_Style.  See the
User’s Guide for the Emdros Query Tool for how to use it.  WARNING:
The output is still subject to change!

- The Emdros Query Tool (GUI version) can now create PNG files right
from the command line.  See the man page for eqtu.

- Assorted changes to the harvest library.  Note that the harvest
library is not stable yet; all APIs are subject to change as I
experiment with the best way of doing this important task.

- A topographic query can be stopped by setting the following bool to
false:

MQLExecEnv::m_bContinueExecution.

- Assorted changes to the horizontal tree and vertical tree layout
engines.
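
The stop flag mentioned in the notes above follows a common cooperative-cancellation pattern. The sketch below shows that pattern in Python with stand-in names (ExecEnv, continue_execution, run_query, stop_after_three); it is not the Emdros API, where the flag is the C++ member MQLExecEnv::m_bContinueExecution.

```python
class ExecEnv:
    """Stand-in for an execution environment that holds a stop flag."""

    def __init__(self):
        self.continue_execution = True


def run_query(env, candidates, on_match):
    """Scan candidates, checking the flag before each step so the caller can stop early."""
    matches = []
    for candidate in candidates:
        if not env.continue_execution:
            break  # the caller asked us to stop
        if candidate % 2 == 0:  # toy "query": keep the even numbers
            matches.append(candidate)
            on_match(env, matches)
    return matches


def stop_after_three(env, matches):
    # Anything with access to the environment (a "Cancel" button, say)
    # can clear the flag; here we clear it after three matches.
    if len(matches) >= 3:
        env.continue_execution = False


env = ExecEnv()
result = run_query(env, range(100), stop_after_three)
```

The query loop never has to know who cleared the flag or why; it only promises to look at it regularly.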

Enjoy!

Ulrik Sandborg-Petersen

Linguistic Tree Constructor — 25000 downloads passed

June 28th, 2010

One of my Open Source “successes”, Linguistic Tree Constructor, has passed 25000 downloads over at SourceForge.net.

Linguistic Tree Constructor (LTC) is a tool for building linguistic syntax trees in no time flat, using your mouse. Its main strength is quick annotation of large amounts of text, i.e., production of syntactic databases. It is based on Emdros for much of its implementation.

You can see the stats, or download LTC for Mac OS X, Windows, and Linux, courtesy of the kind folks at SourceForge.net.

Enjoy!

Ulrik

Getting data out of Emdros (the easy way)

April 12th, 2010

Since the last Emdros release (August 2009), I’ve been busy building an infrastructure around Emdros that should make it easier to use.

One of those efforts has involved what I call “the harvesting library”. Basically, it’s a piece of software which is part of Emdros, and which runs on top of the core Emdros services, and whose primary goal in life is to make it incredibly easy to extract information from almost any Emdros database. Not only that, but the harvesting library also has some nifty ways of turning that extracted information into HTML, XML, JSON, or whatever you like.

The way it works is this: you write a “stylesheet”, which is a specially structured JSON data file. JSON is a very small language, and is very easy to learn. You feed the stylesheet to the harvesting library, which interprets the JSON structure and goes to work extracting the desired information from the Emdros database at hand. The extraction is driven entirely by the stylesheet and is extremely simple to set up. Once the information is extracted, the harvesting library optionally transforms it according to rules you’ve written in another part of the same JSON structure. The output could be HTML, XHTML, RTF, YAML, JSON, or whatever you want.

What this amounts to is that you can store in an Emdros database, not only “what” you want to store (the data), but also “how” you want to extract it, in what order you want to “assemble” the information, and how you want to “present” it (using HTML, RTF, or another presentation language). You just store the JSON script in an EMdF object that your application knows how to find in your database. When you want to use the JSON, you grab the EMdF object, extract the JSON, and pass it to the harvesting library, along with information about which monad set to harvest, and out comes your nifty, formatted HTML, XML, or whatever it is your stylesheet produces.
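
The idea of a data-driven stylesheet can be sketched in a few lines of Python. This is a toy, not the harvesting library's actual stylesheet format or API: the keys (object_type, features, template), the function names, and the in-memory stand-in for a database are all invented for illustration.

```python
import json

# Hypothetical stylesheet: what to extract, and how to present it.
STYLESHEET = json.loads("""
{
  "object_type": "word",
  "features": ["surface", "pos"],
  "template": "<span class='{pos}'>{surface}</span>"
}
""")

# Stand-in for an Emdros database: (object type, monad, features).
OBJECTS = [
    ("word", 1, {"surface": "In", "pos": "prep"}),
    ("word", 2, {"surface": "the", "pos": "art"}),
    ("phrase", 1, {"phrase_type": "PP"}),
    ("word", 3, {"surface": "beginning", "pos": "noun"}),
]


def harvest(stylesheet, objects, monads):
    """Extract the requested features of matching objects, in monad order."""
    wanted = stylesheet["object_type"]
    rows = []
    for object_type, monad, features in sorted(objects, key=lambda o: o[1]):
        if object_type == wanted and monad in monads:
            rows.append({name: features[name] for name in stylesheet["features"]})
    return rows


def render(stylesheet, rows):
    """Turn the extracted rows into output text using the stylesheet's template."""
    return " ".join(stylesheet["template"].format(**row) for row in rows)


html = render(STYLESHEET, harvest(STYLESHEET, OBJECTS, monads={1, 2, 3}))
```

Changing the output format in this toy means editing only the JSON, not the code, which is the property the stylesheet approach is after.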

This will appear in the next public release of Emdros. Interested parties are, as always, welcome to contact me (http://emdros.org/contact.html) to get preview code.

Ulrik

SQLite 3 on Mac OS X

March 10th, 2010

I was doing some speed tests of the BPT engine on Mac OS X Tiger, which ships with SQLite 3.1.3.  I accidentally built Emdros against the SQLite 3 that ships with Tiger, and found that GET OBJECTS HAVING MONADS IN was painfully slow. It took up to 10 minutes to run a basic query.

I found out that SQLite 3.1.3 doesn’t have some optimizations for column_name BETWEEN X AND Y which are present in later versions of SQLite 3.  GET OBJECTS HAVING MONADS IN makes heavy use of precisely this construct.

So I changed all cases of this construct to the idiom “column_name >= X AND column_name <= Y”. This has no noticeable side-effects on the speed on Linux, but does give a great increase in speed on Mac OS X.
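
Here is a minimal sketch of the rewrite, using Python's built-in sqlite3 module. The table and column names (objects, first_monad) are made up for illustration and are not Emdros's actual schema; the point is only that the two predicates select exactly the same rows.

```python
import sqlite3

# Hypothetical table; the real Emdros schema differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (object_id INTEGER PRIMARY KEY, first_monad INTEGER)"
)
conn.executemany(
    "INSERT INTO objects (object_id, first_monad) VALUES (?, ?)",
    [(i, i * 3) for i in range(1, 101)],
)
conn.execute("CREATE INDEX idx_first_monad ON objects (first_monad)")

low, high = 30, 60

# Original form: BETWEEN is inclusive at both ends.
between_rows = conn.execute(
    "SELECT object_id FROM objects"
    " WHERE first_monad BETWEEN ? AND ? ORDER BY object_id",
    (low, high),
).fetchall()

# Rewritten form: the same inclusive range spelled out as two comparisons.
range_rows = conn.execute(
    "SELECT object_id FROM objects"
    " WHERE first_monad >= ? AND first_monad <= ? ORDER BY object_id",
    (low, high),
).fetchall()

same_rows = between_rows == range_rows
```

Since the two forms are logically equivalent, any difference in running time comes from how a given SQLite version plans the query, not from the results.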

Still, BPT beats SQLite 3 by a wide margin: 270 seconds for SQLite 3 versus 214 seconds for BPT to run all the test queries in one of my test suites (124 queries against a syntactic database of 1.4 million syntactic objects). That’s about 20% less running time on this particular combination of hardware and Mac OS X 10.4. (The hardware is a 2007 Mac Mini with an Intel Core Duo at 1.6 GHz and 1 GB of RAM.)

Ulrik

BPT engine progress (now on Win32)

March 9th, 2010

I’ve continued to make progress on the BPT engine. It now compiles, works, and is fast on Windows as well as on Linux. The speedup is about the same on Windows as on Linux.

Along the way, I’ve fixed a bug in FastSetOfMonads::isMemberOf(). I have also reimplemented some of the functions in include/string_func.h (such as hex2char, octal2char, char2hex, and char2upperhex) so that they don’t use the C standard library.  This means that mqldump() will most likely be faster as well (haven’t tested this yet, though!).

Ulrik

Bit Packed Table implementation progressing

March 3rd, 2010

Before (and while) writing the Bit Packed Table backend, I wrote a program in Python which could create a Bit Packed Table database, given an already-created Emdros database. I wrote it in Python so that I could modify it quickly, given that Python is so malleable.

Today I finished porting the Python program to C++. It is much faster (predictably), and creates exactly the same databases as the Python version, bit for bit.

Remember, the BPT engine won’t be Open Source any time soon. If you are interested in licensing it, please let me know. Existing licensees will, of course, get a free upgrade.

Ulrik

Emdros and Delphi integration

February 19th, 2010

I won’t be needing this right now, but here’s how to integrate Emdros into Delphi, even though Emdros is written in C++.

http://rvelthuis.de/articles/articles-cppobjs.html

I would favor the “simple” method described in this article, namely wrapping the relevant parts of the Emdros API in an extern "C" block of simple functions.

Just thought it was cool that it could be done.

Ulrik

Feeding Emdros carrots

February 18th, 2010

I’ve been fairly silent in public around Emdros for a couple of months.  That is not because Emdros has not required my attention.  On the contrary, I’ve been feeding it lots of carrots and hay so that its engine “horsepower” can grow.

In particular, this is a preliminary announcement that I have written an entirely new database backend for Emdros, to sit alongside SQLite 2/3, MySQL, and PostgreSQL.  I call the new database backend the “Bit Packed Table” backend, or BPT for short.

The upshot of the new database backend is that it beats the pants off all other currently implemented backends in terms of query speed.  I will post some query statistics results later.

In addition, the database shrinks by quite a margin, even compared to SQLite 3, which already stores data fairly compactly. In my tests, the database was between 42% and 61% smaller than the same data in SQLite 3.

So how did I do this?

First of all, the BPT backend is tailored specifically to the needs of the EMdF model. It doesn’t even attempt to be a full SQL engine with a general relational model. Instead, it focuses on the requirements and inherent properties of the EMdF model.

Secondly, the BPT engine is simple. Because it does only what is necessary for the requirements of the EMdF model (and uses some fairly clever storage schemes tailored to the EMdF model), it is quite simple both to maintain and to understand.  It is coded in about 9000 lines of well-commented C++.

Thirdly, it uses fairly old techniques (bit packing obviously being one of them) to store the data.

Fourthly, the BPT backend uses fairly recent results from research by Allison L. Holloway and David J. DeWitt, “Read-optimized databases, in-depth” (Proceedings of VLDB 2008) that show that read-optimized, row-oriented databases can be quite fast, even when compared to column-oriented databases, which otherwise are (partly) all the rage these days.
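
To give a flavour of what bit packing means, here is a small sketch in Python. It is not the BPT storage format; the function names and the fixed field width are assumptions made purely for illustration.

```python
def pack_bits(values, width):
    """Pack a list of non-negative integers into bytes, `width` bits per value."""
    acc = 0
    for i, value in enumerate(values):
        if value < 0 or value >= (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # Value number i occupies bits [i * width, (i + 1) * width).
        acc |= value << (i * width)
    total_bits = len(values) * width
    return acc.to_bytes((total_bits + 7) // 8, "little")


def unpack_bits(data, width, count):
    """Inverse of pack_bits: read `count` values of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


# Eight small enumeration values need only 3 bits each, so together
# they fit in 3 bytes rather than one or more bytes per value.
values = [5, 0, 7, 3, 1, 6, 2, 4]
packed = pack_bits(values, 3)
restored = unpack_bits(packed, 3, len(values))
```

The saving comes from knowing the range of each field in advance, which a special-purpose engine can exploit and a general-purpose one usually cannot.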

The BPT engine will stay proprietary for now, and won’t be Open Sourced for the foreseeable future. Existing and prospective licensees are welcome to drop me a line at http://emdros.org/contact.html


Ulrik

Tree-displays and Emdros queries

November 10th, 2009

For some time, I have been having ideas for how to make a tree-based topographic query editor.  Today I’ve been working my way towards the preliminaries for an implementation of those ideas. That is, I have been working on getting the tree displays that are in the wx/htreecanvas.cpp source code file to look much better.  That’s the first step.

The ideas can be laid out as follows:

  1. Have an interactive query-editor in which the query looks like a linguistic tree, with the root (Query) at the top and its branches (going downwards) leading to nodes that represent object blocks, power blocks, gap blocks, optional gap blocks, groupings, etc.
  2. For object blocks, the main node name should be the object type.  Below the object type name is shown the node number (e.g., “Clause 1”); this becomes an object reference declaration, e.g., “AS Clause1”. Any repetition (Kleene star) gets shown below the node number, e.g., “*{1-3}” or simply “*”. Then, if there are any feature-restrictions, they get shown below it, probably as a subtree of nodes where each AND becomes two new nodes (with OR being the node name), and each OR becomes another line in a stacked box of disjoined terms.
  3. For power blocks, the node name is simply “…” and below it we find any monad-restrictions spelled out (e.g., “< 5”).
  4. For gap blocks, the node name is simply “gap”.
  5. For optional gap blocks, the node name is simply “optional gap”.
  6. For groupings, the node name is simply “group”.  Any repetition (Kleene star) gets shown below the “group” name, e.g., “*{1-3}” or simply “*”.
  7. For OR between strings of blocks, the node name is simply “OR”.
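
The node types in the list above could be modelled roughly as follows. This is only a sketch of one possible data structure, written in Python; the class, field, and function names are invented for illustration and are not code from Emdros.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                          # "Query", "object", "power", "gap", "optional gap", "group" or "OR"
    object_type: Optional[str] = None  # object blocks only, e.g. "Clause"
    number: Optional[int] = None       # object blocks only: the node number
    repetition: Optional[str] = None   # Kleene star, e.g. "*" or "*{1-3}"
    restriction: Optional[str] = None  # power blocks only: monad restriction, e.g. "< 5"
    children: List["Node"] = field(default_factory=list)

    def label(self):
        """Return the pieces of text shown in this node, top to bottom."""
        if self.kind == "object":
            lines = [self.object_type, f"{self.object_type} {self.number}"]
        elif self.kind == "power":
            lines = ["…"]
            if self.restriction:
                lines.append(self.restriction)
        else:
            lines = [self.kind]
        if self.repetition:
            lines.append(self.repetition)
        return lines


def outline(node, depth=0):
    """Render the tree as indented text, one line per node."""
    lines = ["  " * depth + " | ".join(node.label())]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


tree = Node("Query", children=[
    Node("object", object_type="Clause", number=1, children=[
        Node("object", object_type="Phrase", number=2),
        Node("power", restriction="< 5"),
        Node("object", object_type="Phrase", number=3, repetition="*{1-3}"),
    ]),
])
text = "\n".join(outline(tree))
```

Feature restrictions and the side-panel options are left out here; they would hang off the object-block nodes as further fields.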

There should be a palette from which to choose these node types.

Each node should have a side panel which shows the options for its node type.

If it is an object block:

  • All features, together with the possible values (e.g., for enumerations, a checklistbox; for strings, a text control, etc.)
  • A way of saying that such and such a feature is equal to (less than, greater than, different from, etc.) some other feature of some other, named node in the tree.
  • Whether the object block should NOTEXIST or not.
  • Whether the object block should be FIRST, LAST, or FIRST AND LAST within the context.
  • Any repetition (Kleene star).

If it is a power block:

  • Any monad-restrictions.

If it is a gap or optional gap block:

  • Nothing to select.

If it is a grouping:

  • Any repetition (Kleene star).

If it is an OR between strings of blocks:

  • Nothing to select.

There should be ways of moving nodes around, and copying and pasting subtrees.

The tree should probably have slanted lines rather than lines that are perpendicular.

The leaf nodes should not all be lined up along the bottom of the tree. Instead, nodes that are at the same level should be laid out on the same horizontal line.

The above is a general overview of what it should look and feel like. Much inspiration was taken from the way that Logos Bible Software does it. They, however, have a tree which grows from the left and goes right.

I do believe that the above will make it easier for the user to understand what is going on.

The above is only a sketch, with lots of details to be filled out.

But it’s a start.

Ulrik