Emdros version 3.4.0 released

May 13th, 2014

Hi everyone!

Emdros version 3.4.0 has been released. This is the first public release in almost three years. Development has been very active, however, and several of you have been receiving updates by simply requesting them.

You can find the source code and binaries via this link:

http://emdros.org/download.html

This release also sees a new package, emdros-example-2.0. It contains a complete sample database, namely the complete King James Version of the Bible with parse trees and part-of-speech information, along with sample queries and configuration files for the Emdros Query Tool and the Emdros Chunking Tool. Be sure to download this separate package if you want to see what Emdros can do.

The focus areas of this release include:

  • Support for iOS and Android. This is at the library level only; no apps are bundled, but instructions for making them yourself are included.
  • A smoother user experience for the GUI applications
  • Enhancements to the MQL query language
  • Documentation improvements
  • Build improvements on all platforms
  • Speed improvements

The Release Notes can be seen here:

https://sourceforge.net/projects/emdros/files/emdros/3.4.0/

 

Enjoy!

 

Ulrik Sandborg-Petersen

 

SQLite 3 encryption and US export laws

January 29th, 2013

For a number of years, I have had a proprietary extension to Emdros which makes an attempt at encrypting SQLite 3 databases that are used with Emdros. The encryption isn’t strong, and is meant as an attempt at fooling “most people”, not including professional cryptographers. Thus it is primarily meant as a way to copy-protect (DRM-enable) content that content-creators may wish to protect from their customers. Until today, it has been using a key longer than 56 bits.

Today I successfully added a key scheduling algorithm which allows the cipher to function with only a 56-bit key. This is significant, since, according to my research, it allows export from the US without restriction. Thus Apple and others can happily ship Emdros-driven apps from their app stores (e.g., iOS and/or Mac app store) without requiring a license to export from the US.

If you are interested in licensing this encryption, please get in touch via the email address mentioned on http://emdros.org/contact.html.

Speed increase of about 5%

January 1st, 2013

Emdros uses, internally, a data structure known as a “Skip List”. It is basically a glorified linked list, with randomization applied to make it look somewhat like a balanced binary tree. It’s efficient both space-wise and time-wise for a range of problems, though it often cannot beat a really good implementation of a red-black tree, for example.

Skip lists were invented by William Pugh, and described in a paper from the early 1990s. In the paper, Pugh described the various options for the randomization to be applied to the data structure. Based on Pugh’s recommendations, I originally chose a particular kind of randomization, but failed to experiment with different kinds. Until today, that is.

I found that by simply tweaking the number of bits to consume from the random number, as well as raising the number of elements catered for in the data structure, Emdros as a whole could be made to run consistently about 5-6% faster across my various test suites.

That’s a lot of speed increase in exchange for three lines of different code.
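To illustrate the kind of knobs involved, here is a minimal Python sketch of Pugh-style random level selection. It is not the Emdros code (which is C++); the names `BITS_PER_DRAW`, `MAX_LEVEL`, `random_level` and `level_histogram` are hypothetical, and the values shown are placeholders rather than the ones Emdros uses.

```python
import random

# Hypothetical tuning knobs; the values Emdros actually uses are not stated.
BITS_PER_DRAW = 2   # bits consumed from the random number per promotion test
MAX_LEVEL = 16      # caps the tower height, and so the element count catered for


def random_level(rng, bits_per_draw=BITS_PER_DRAW, max_level=MAX_LEVEL):
    """Pick a tower height for a new skip list node.

    The node is promoted one more level each time the next `bits_per_draw`
    random bits are all zero, i.e. with probability 1 / 2**bits_per_draw.
    """
    level = 1
    while level < max_level and rng.getrandbits(bits_per_draw) == 0:
        level += 1
    return level


def level_histogram(n, seed=0, **kwargs):
    """Count how many of n simulated nodes end up at each level."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        lvl = random_level(rng, **kwargs)
        counts[lvl] = counts.get(lvl, 0) + 1
    return counts
```

Calling `level_histogram(10000, bits_per_draw=1)` and then `level_histogram(10000, bits_per_draw=2)` shows how consuming more bits per test flattens the towers. In Pugh's paper the maximum level is chosen as roughly the logarithm, base 1/p, of the expected number of elements, where p is the promotion probability, so the two knobs interact.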

Incidentally, while running the tests, I found that the BPT engine (my proprietary backend engine) is still at least 30% faster than the SQLite 3 backend for the same database content and the same set of queries.

Compiling wxWidgets 2.8.12 on Mac OS X 10.6 (Snow Leopard) for use with Mac OS X 10.4 and later

October 20th, 2012

I’ve successfully compiled Universal binaries of wxWidgets 2.8.12 on Mac OS X 10.6 (Snow Leopard) that work with Mac OS X 10.4 and later.

Here’s how:

  1. Unpack the sources of wxMac-2.8.12.tar.gz
  2. cd wxMac-2.8.12
  3. mkdir macosx
  4. cd macosx
  5. ../configure --enable-unicode --disable-shared --prefix=/Users/ulrikp/opt/wxMac-2.8.12-10.4-Unicode-noshared --with-macosx-sdk=/Developer/SDKs/MacOSX10.4u.sdk --with-macosx-version-min=10.4 --enable-universal_binary CC=gcc-4.0 CXX=g++-4.0 LD=g++-4.0
  6. make -j 2 all
  7. make install

The crucial part is in step #5. You can, and should, change the path given to the --prefix switch, to match your username.

The part that threw me off was that you have to switch the C and C++ compiler away from the default, to gcc-4.0 and g++-4.0. This is documented here.

Step #6 has the switch -j 2. This makes “make” use two processes at once, whenever it can. If you’ve got more horsepower than I do, you can up this to 4 or 8, or whatever is appropriate for your processor count.

After step #7, you can do this:

export PATH=/Users/ulrikp/opt/wxMac-2.8.12-10.4-Unicode-noshared/bin:$PATH

Then any configure script which uses wx-config to determine how to use wxWidgets will pick up the particular version of wx-config which we’ve just compiled.

Remember, though, to do

CC=gcc-4.0 CXX=g++-4.0 LD=g++-4.0

as well, when compiling/configuring your program.

Ulrik

 

Harvesting revisited

April 16th, 2012

I’ve spent some time writing about how to harvest objects to produce documents. The result is some documentation of a yet-to-be-implemented “Render2” library. It is basically a description of some languages which are at once more powerful and yet also simpler than the RenderObjects and RenderXML library languages.

Once implemented, the Render2 code will:

  • Be easier to use than RenderObjects and RenderXML
  • Be more powerful than RenderObjects and RenderXML
  • Be more easily extensible than RenderObjects and RenderXML

The idea is still the same as in RenderObjects and RenderXML:

  • “Stylesheets” tell the Render2 engine what to do when encountering an object in the database (when retrieving), or what to do with XML elements (when parsing XML).
  • These “Stylesheets” basically tell what to do at the start and/or end of an object or XML element.
  • The “Stylesheets” are ordered in a tree, with inheritance semantics between them.
  • “What to do” at the start/end of an object / XML element is expressed in a second language, called a “template language”. The template language is quite powerful (both for the old RenderObjects/RenderXML library and the new Render2 library), and has support for things like variables, lists, counters, etc.
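The general shape of this idea can be sketched in a few lines of Python. This is only an illustration of a stylesheet tree with start/end templates and inheritance; it is not the Render2 design. The class `Stylesheet`, the function `render`, the rule layout, and the use of Python's `str.format` as a stand-in for the template language are all assumptions made for the example.

```python
class Stylesheet:
    """A node in the stylesheet tree: rules not found here are inherited."""

    def __init__(self, name, parent=None, rules=None):
        self.name = name
        self.parent = parent
        # Maps an object type (or XML element) name to {"start": ..., "end": ...}.
        self.rules = rules or {}

    def lookup(self, type_name, which):
        """Return the 'start' or 'end' template, searching up the tree."""
        sheet = self
        while sheet is not None:
            rule = sheet.rules.get(type_name)
            if rule is not None and which in rule:
                return rule[which]
            sheet = sheet.parent
        return ""  # no rule anywhere in the chain: emit nothing


def render(obj, sheet):
    """Render a nested object: start template, children, end template."""
    features = obj.get("features", {})
    out = sheet.lookup(obj["type"], "start").format(**features)
    for child in obj.get("children", []):
        out += render(child, sheet)
    out += sheet.lookup(obj["type"], "end").format(**features)
    return out


# Hypothetical example data: a base stylesheet and one that overrides "word".
base = Stylesheet("base", rules={
    "verse": {"start": "<p>", "end": "</p>"},
    "word": {"start": "{surface} "},
})
bold = Stylesheet("bold_words", parent=base, rules={
    "word": {"start": "<b>{surface}</b> "},
})
verse = {"type": "verse", "children": [
    {"type": "word", "features": {"surface": "In"}},
    {"type": "word", "features": {"surface": "the"}},
]}

print(render(verse, base))  # <p>In the </p>
print(render(verse, bold))  # <p><b>In</b> <b>the</b> </p>
```

The `bold_words` stylesheet only says what differs for words; everything else falls through to its parent.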

What’s new in the Render2 library includes:

  • “RenderObjects2” stylesheets can inherit from other “RenderObjects” stylesheets. This is not just for RenderXML stylesheets any more.
  • The new template language is more regular, with fewer idiosyncrasies, and more expressive power. This expressive power comes in part from the new concept of “pockets” (see below).
  • The new template language introduces the idea of functions. A number of built-in functions will be provided. I am debating with myself whether to include a small scripting language in which the user can express functions themselves. We’ll see.
  • The new template language introduces the idea of expressions, which can be used in such places as “if” templates, and in parameters to function-calls.
  • The new stylesheet language (in which the template language is embedded) has a very, very simple grammar which fits in about 12 grammar-rules in Extended Backus-Naur Form. This alone should make it easier to use than the current JSON-embedded stylesheet language. The simple grammar makes it very, very easy to remember how to create a stylesheet, with very few “what you don’t know will hurt you” surprises.
  • The new stylesheet language introduces the idea of strings that are """triple-quoted""". This idea has been stolen from Python. The idea is to be able to use "single quotes" and newlines within """triple-"quoted" strings""" without needing to escape them with backslashes. This should not only make the new stylesheets easier to use in practice (because of fewer backslashes); it should also make them more beautiful.
  • The new Stylesheet language uses the idea of “packet” to encompass all the different kinds of things you put into a stylesheet. Basically, a stylesheet unit is an ordered list of “packets”, where each packet has a packet name and a packet class (telling us how to use it), and a packet always belongs to exactly one stylesheet. Internally, a packet is no more, no less than an ordered list of key/value pairs. (This ordered list of key/value pairs may turn into a map/dictionary, but that is not part of the syntax, only part of the semantics).
  • The old RenderObjects/RenderXML stylesheets had the disadvantage that it was sometimes difficult to see which stylesheet we were currently looking at, since the stylesheet name was only mentioned once, at the top of the stylesheet. The new stylesheet language repeats the stylesheet name for every “packet”, making it easier to orient oneself in the stylesheet unit file.
  • The C++ API to the Render2 library has been greatly simplified as compared to the RenderObjects/RenderXML library.
  • The Render2 library takes a Set of Monads, not a range of monads, when needing to retrieve objects. This generalization makes it much more powerful than the old RenderObjects/RenderXML library.
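The “packet” data model described above can be pictured roughly as follows. This Python sketch models only the data structure, not the stylesheet syntax, which is not shown here. The names `Packet`, `packets_for`, the packet class `"object"`, and the rule that a repeated key is overridden by its later value are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    stylesheet: str    # every packet names the one stylesheet it belongs to
    name: str          # packet name
    packet_class: str  # tells the engine how to use the packet
    pairs: list = field(default_factory=list)  # ordered (key, value) pairs

    def as_dict(self):
        """Semantic view as a map; a later pair wins if a key is repeated
        (an assumption, not something the design documents here state)."""
        return dict(self.pairs)


def packets_for(unit, stylesheet):
    """Select, in order, the packets of a unit that belong to one stylesheet."""
    return [p for p in unit if p.stylesheet == stylesheet]


# A stylesheet unit is an ordered list of packets (hypothetical content).
unit = [
    Packet("base", "verse", "object",
           [("start", """<p class="verse">"""), ("end", "</p>")]),
    Packet("bold", "word", "object", [("start", "<b>")]),
]

print([p.name for p in packets_for(unit, "base")])  # ['verse']
```

Because each packet carries its stylesheet name, any packet can be read in isolation without scrolling up to find out which stylesheet it belongs to.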

The idea of “pockets” has been introduced. A “pocket” is a map/dictionary which maps strings to lists of strings. In addition, each pocket has a name which is a C identifier. The idea that one can redirect the output to a pocket, and that one can refer to the list of strings in a pocket by pocket-name coupled with pocket-key, has turned out to be quite powerful and general, supporting within one data-structure such diverse concepts as: variables, counters, integer-arithmetic, lists, and the “pockets” themselves, which can be used to output stuff “later” in the document than otherwise would have been the case.
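A rough Python sketch may make this concrete. It is not the Render2 implementation; the class `Pocket` and its methods `emit`, `get`, `set` and `add` are hypothetical names chosen for the example. The point is that one structure, a named map from strings to lists of strings, can serve as a variable, a counter, a list, and a place to park output for later.

```python
import re


class Pocket:
    """A named map from string keys to lists of strings."""

    def __init__(self, name):
        # The pocket name must be a C identifier.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError("pocket name must be a C identifier: %r" % name)
        self.name = name
        self.items = {}

    def emit(self, key, text):
        """Redirect a piece of output into the pocket (list usage)."""
        self.items.setdefault(key, []).append(text)

    def get(self, key):
        """Return the list of strings stored under a key."""
        return self.items.get(key, [])

    def set(self, key, text):
        """Use a key as a variable: replace its list with one string."""
        self.items[key] = [text]

    def add(self, key, amount=1):
        """Use a key as a counter: integer arithmetic on a one-string list."""
        current = int(self.get(key)[0]) if self.get(key) else 0
        self.set(key, str(current + amount))
        return current + amount


# Footnotes are collected while the body is produced, and output afterwards.
notes = Pocket("footnotes")
body = []
for word, note in [("In", None), ("the", None), ("beginning", "Gen 1:1")]:
    if note is None:
        body.append(word)
    else:
        n = notes.add("count")
        body.append("%s[%d]" % (word, n))
        notes.emit("text", "[%d] %s" % (n, note))  # printed later, not here

document = " ".join(body) + "\n" + "\n".join(notes.get("text"))
print(document)
```

Here the key `count` behaves as a counter and the key `text` as a deferred output list, both living in the same pocket.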

Interested parties are welcome to ask for the documentation. The documentation is still a work-in-progress, but implementation will hopefully start soon.

Ulrik

Emdros on Debian/Ubuntu/etc.

February 17th, 2012

I’ve successfully made the files requisite for building a .deb on Debian/Ubuntu/other-Debian-derived-Linux-distros.

Interested parties are welcome to contact me for the sources.

Ulrik

The Emdros blog is back

January 3rd, 2012

The Emdros blog is kindly hosted by the J. Alan Groves Center for Advanced Biblical Research. The Groves Center suffered a hardware outage in late 2011, bringing this blog down.

Thanks to the hard work of Dr. Kirk Lowery, the blog is now back. Thanks, Kirk!

More news coming. Stay tuned!

Ulrik

Emdros 3.3.0 released

July 4th, 2011

I have released Emdros version 3.3.0 over at SourceForge.Net.

http://emdros.org/download.html

Please note that the implementation and method of indexing of the Full Text Search are subject to change, as this feature is still experimental.

Enjoy!

Ulrik Sandborg-Petersen

 

Controlling containment in topographic MQL

February 14th, 2011

I have just finished adding a new feature to the topographic part of the MQL query language.

Hitherto, the only relation one could specify for containment between an inner object block and the outer container was “part_of”, and it was always relative to the containing substrate.

In plain English, that meant that the inner object’s monad set had to be a subset of the outer object’s monad set, or (if the inner block was at the outermost level) a subset of the monad set given in the IN clause after SELECT ALL OBJECTS.

Now, you can specify these four relations:

  • part_of(substrate) // The default
  • part_of(universe) // To disregard gaps in the substrate
  • overlap(substrate)
  • overlap(universe)

The overlap relation means: The inner object must have a non-empty intersection with (i.e., share at least one monad with) the outer substrate or universe.

This makes it possible to specify things like this:

SELECT ALL OBJECTS
IN Aramaic_monads // Pre-defined monad set
WHERE
// This means that we want all clauses which share at least one monad
// with the Aramaic_monads monad set
[Clause overlap(substrate)
   // This finds all phrases inside the left and right boundaries of
   // the outer clause, regardless of any gaps in the clause.
   [Phrase part_of(universe)
   ]
]

This will appear in the next public release after 3.2.0.

If anyone is interested in trying this out, please let me know.
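For readers who think in sets, the four relations can be modelled with plain Python sets of monads. This is only a sketch of the semantics as described in this post, not of how Emdros evaluates queries; the function names and the example monad sets are made up for illustration.

```python
def universe(substrate):
    """All monads from the first to the last monad of the substrate,
    i.e. the substrate with its gaps filled in."""
    return set(range(min(substrate), max(substrate) + 1))


def part_of(inner, outer):
    """The inner monad set must be a subset of the outer one."""
    return inner <= outer


def overlap(inner, outer):
    """The inner monad set must share at least one monad with the outer one."""
    return bool(inner & outer)


clause = {1, 2, 3, 7, 8}      # a clause with a gap at monads 4-6
phrase_in_gap = {4, 5}        # lies entirely inside the gap
phrase_straddling = {3, 4}    # starts in the clause, ends in the gap

print(part_of(phrase_in_gap, clause))            # part_of(substrate)
print(part_of(phrase_in_gap, universe(clause)))  # part_of(universe)
print(overlap(phrase_straddling, clause))        # overlap(substrate)
print(overlap(phrase_in_gap, universe(clause)))  # overlap(universe)
```

The phrase inside the gap fails part_of(substrate) but passes part_of(universe), which is exactly the case the new universe variants are meant to cover.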

Ulrik

Full Text Search implemented in Emdros

October 30th, 2010

I’ve finished the implementation, tuning, and testing of Full Text Search (FTS) for Emdros.

The implementation is part of the libharvest library, and is written in C++ like the rest of Emdros.

I implemented the basic idea in Python first, then reimplemented it in C++. Python is so malleable that it is ideal for this sort of prototyping work.

The Full Text Search has a lot of features, including:

  • Index “documents”, which must exist as object types.
  • Index documents based on “indexed object types” (e.g., token) and one indexed feature of the indexed object type.
  • Search within “documents”.
  • Chainable filters that modify token strings before being indexed, e.g., to weed out stop-words, or to strip, lower-case, or otherwise alter the token strings.
  • Tokenization of the query string by splitting on spaces.
  • Optional application of the chainable filters to the query-terms after tokenization, so as to be more likely to match the indexed feature.
  • Google-like “quoted strings” that make the query-terms be adjacent.
  • More than one “quoted string” allowed in the query-string.
  • Return results as a list of three-tuples (document-first-monad, document-last-monad, first-search-term-first-monad).
  • Return results as customizable snippets of real tokens, with optional highlighting of query terms.
  • Command-line tools for both indexing and searching.
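A few of these features can be sketched in Python to show how they fit together: chainable filters, tokenization on spaces, adjacency of quoted terms, and three-tuple results. This is a toy model, not the libharvest implementation or its API. All function names and the document layout are assumptions, and the adjacency test assumes one monad per token and filters that never drop a token from the middle of a phrase.

```python
def strip_punct(token):
    return token.strip(".,;:!?\"'()")


def lowercase(token):
    return token.lower()


def apply_filters(token, filters):
    """Run a token string through a chain of filters, in order."""
    for f in filters:
        token = f(token)
    return token


def build_index(documents, filters):
    """documents: list of (first_monad, last_monad, [(monad, token), ...]).

    Returns a map from term to a list of (doc_first, doc_last, monad).
    """
    index = {}
    for first, last, tokens in documents:
        for monad, text in tokens:
            term = apply_filters(text, filters)
            if term:
                index.setdefault(term, []).append((first, last, monad))
    return index


def search_phrase(index, phrase, filters):
    """Find places where the query terms occur at adjacent monads.

    Returns (doc_first_monad, doc_last_monad, first_term_monad) tuples.
    """
    terms = [apply_filters(t, filters) for t in phrase.split(" ")]
    terms = [t for t in terms if t]
    if not terms:
        return []
    results = []
    for first, last, monad in index.get(terms[0], []):
        if all((first, last, monad + i) in index.get(t, [])
               for i, t in enumerate(terms)):
            results.append((first, last, monad))
    return results


# Hypothetical data: two "documents" with monads 1-5 and 6-9.
docs = [
    (1, 5, [(1, "In"), (2, "the"), (3, "beginning"), (4, "God"), (5, "created")]),
    (6, 9, [(6, "And"), (7, "God"), (8, "said,"), (9, "Let")]),
]
filters = [strip_punct, lowercase]
index = build_index(docs, filters)

print(search_phrase(index, "God said", filters))  # [(6, 9, 7)]
```

Applying the same filter chain to the query terms as to the indexed tokens is what lets "God said" match the indexed "said," with its trailing comma.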

This will appear in the next public release of Emdros.

Interested parties should contact me via email to get the latest sources.

Enjoy!

Ulrik