Emdros 3.1.0 released

March 10th, 2009

I have released Emdros version 3.1.0.  Go get it while it’s hot.

The biggest change since 3.0.1 (the previous public release) is that the “uncles and nephews” bug mentioned earlier on this blog has been fixed.

Enjoy!

Ulrik

Speeding up a slow Windows

February 12th, 2009

Windows deteriorates over time — i.e., it doesn’t maintain itself very well.  This can lead to Windows becoming slow as molasses.

I recently had to try to speed up a computer running Windows XP SP3.  The computer had been running Windows XP since 2006, without a reinstall in the interim.  It had gotten dreadfully slow, to the point where it took 5-7 minutes for it just to boot to the point where it was moderately usable.

I actually succeeded in bringing the computer back to its original speed, without reinstalling Windows XP.  Below are my findings, for anyone who might be in a similar situation.

First, I am going to refer below to a “registry cleaner”.  A registry cleaner is a program which cleans up the Windows Registry Database.  The one I used is called “CCleaner”, and is free as in freeware (no cost).  I highly recommend this tool.  The authors ask for a donation in case you are satisfied, but this is not required as of this writing.  Just be aware that the installer will also install a Yahoo! Toolbar in your web browser, unless you opt out of this during the install process.

Here is what I did:

  1. Defragment the harddrive.
  2. Remove icons from the desktop.  Yes, I know this sounds crazy, but it got me part of the way back to full speed!
  3. Download and install CCleaner Registry Cleaner.
  4. Run CCleaner, and clean the harddrive, the system, and the registry.  The registry might need to be cleaned several times before all defects are gone.  Just keep cleaning until the scan says there are zero defects.
  5. Uninstall all unnecessary and/or outdated software.
  6. Run CCleaner again, cleaning the harddrive, the system, and the registry.
  7. Reinstall the latest versions of the software you actually need (e.g., Firefox, OpenOffice.org, Adobe Reader, Adobe Flash, Java, etc.)
  8. Run CCleaner again.

Once I had gone through this process, the computer ran almost as fast as it did when Windows was first installed.

Ulrik

Emdros downloaded 20,000 times

February 5th, 2009

Emdros and related files have just been downloaded for the 20,000th time within the last few days.

/Ulrik

“Uncles and nephews” fix implemented!

January 10th, 2009

I’ve implemented the fix to the “Uncles and nephews” bug, in C++ this time. It works beautifully. I have added around 15 regression tests to the suite of tests that are run by the mqltry program, and have tested the new solution with the usual suites of regression tests on some real-life databases. Not only does it give accurate results now, it also is very, very slightly faster as a result of the updates to the code. And it gives the same results as before, except for the queries where other results are expected.

I am so pleased :-).

Ulrik

Object References — fixed!

January 7th, 2009

A lot has happened in Emdros-land since my last post, even though the last public release was in February 2008, almost a year ago.  The code has been progressing, even though I am now a PhD, and an assistant professor, and a father.

Today’s blog post is not about my public silence, however, but about a design bug that exists in every version of Emdros since the first release.

Since the day I wrote the first line of what was to become Emdros, there has been a design error in the way object references are handled. As a result, queries using object references could, under certain circumstances, return incorrect and therefore misleading results.

There has never been a problem with this kind of object reference:

[Phrase as p1
   [Word parent = p1.self]
]

That is, there has never been a problem when the object block referenced was a parent of the referencing object block.

The problem, rather, has shown up in circumstances where the object block referenced was either a sibling, or a nephew, or an uncle, or a cousin, or a cousin twice removed… etc… of the referencing object block.

For example:

[Clause
   [Phrase as p1]
]
[Clause
   [Phrase cousin = p1.self] // This can lead to errors
]

I have worked very hard on this problem on and off for at least 3 years now.  I have finally found a solution.

Some time ago, I ported the topographic subset of Emdros to the Python programming language. It is not a port of the complete topographic subset, but it does allow me to experiment with new extensions of Emdros in the much more malleable Python language.

I have used this Python version of Emdros to experiment with this “uncles and nephews” problem.  I worked out and wrote up the solution in a single three-hour session, and the implementation in Python has taken around 12 hours so far.  I have tested the solution, and am now ready to port it over to C++.

Stay tuned for news about a release of Emdros having the fix (but don’t hold your breath ;-) ).

Ulrik

Interview with yours truly

July 24th, 2008

The unofficial Emdros blog, run by a very good friend of mine, has an interview with yours truly.

Enjoy!

Ulrik

I am now a PhD

July 4th, 2008

On June 27, 2008, I defended my PhD thesis in front of an august panel of professors.  I passed, so I guess I am now a Doctor. It feels great :-).

The thesis can be had here:

Ulrik Sandborg-Petersen’s PhD thesis.

It’s all about how Emdros can help save cultural heritage.

The Emdros code has in fact progressed since the last release (which was 3.0.1), even though I have been busy writing the thesis and defending it. One of the upcoming goodies which will be in 3.0.2 is a tree-display in the Emdros Query Tool.  Yes, you can now display the TIGER corpus, for example, or the Penn Treebank, or the BLLIP corpus, or almost any other treebank, as trees right inside the Emdros Query Tool.

Ulrik

Emdros Query Tool: New Harvesting algorithms

March 19th, 2008

Since its inception by Hendrik Jan Bosman many years ago, the Emdros Query Tool has only had one harvesting algorithm. Well, until today, that is. Now it has four, including the old one.

The overall harvesting algorithm is:

  1. Execute the query. This results in a sheaf.
  2. Traverse the sheaf and gather a list of “hits”: One monad set for each “hit”.
  3. Traverse the sheaf and gather the big-union of the sets of monads in all matched objects whose “Focus” boolean is true. This is called the “sheaf focus monad set”.
  4. Get a set of raster monad ranges based on the list of “hits”. A “raster monad range” determines how much context to show around a set of monads corresponding to a “hit”. See below for how it is calculated.
  5. Get all “data units” and their features, based on the set of monads being the big-union of all raster monad ranges. A “data unit” is an object type whose objects must be shown for any given hit. Typical data units include “Word”, “Phrase”, “Clause”, “Sentence”, etc. These are retrieved with the MQL statement called “GET OBJECTS HAVING MONADS IN”.
  6. Traverse the list of monad sets corresponding to a “hit”. For each monad set, calculate one “solution” to be: (i) The “hit” set of monads; (ii) The set of monads arising from taking all of the raster units that overlap with a stretch of monads in the “hit” set of monads. This is called the “raster monad set” for this solution; (iii) All data unit objects which have monad sets which overlap with the “raster monad set”; (iv) A “focus set of monads”, which is the intersection of the “raster monad set” and the “sheaf focus monad set”.
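
For the curious, here is roughly what step #6 amounts to, written as a small Python sketch. This is not the Emdros Query Tool code: the function name "build_solution", the use of plain Python sets for monad sets, and the (name, monads) pairs standing in for data unit objects are all assumptions made for illustration.

```python
def build_solution(hit, raster_units, data_units, sheaf_focus):
    """Assemble one "solution" for one "hit" (step #6 above).

    hit          -- set of monads for this hit
    raster_units -- list of monad sets, one per raster unit (hypothetical shape)
    data_units   -- list of (object_type_name, monad_set) pairs (hypothetical shape)
    sheaf_focus  -- the "sheaf focus monad set" from step #3
    """
    # (ii) raster monad set: union of every raster unit overlapping the hit
    raster = set()
    for unit in raster_units:
        if unit & hit:
            raster |= unit

    # (iii) data unit objects whose monads overlap the raster monad set
    shown = [(name, monads) for name, monads in data_units if monads & raster]

    # (iv) focus set of monads: raster monad set intersected with sheaf focus
    focus = raster & sheaf_focus

    return {"hit": set(hit), "raster": raster, "data_units": shown, "focus": focus}


solution = build_solution(
    hit={5, 6},
    raster_units=[{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10}],
    data_units=[("Word", {5}), ("Word", {9}), ("Phrase", {7, 8})],
    sheaf_focus={6, 9},
)
print(solution["raster"])  # {5, 6, 7, 8}
print(solution["focus"])   # {6}
```

Only the second raster unit overlaps the hit here, so the word at monad 9 is left out of the solution, and so is its focus.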

There are two changes to the harvesting algorithm which I have made today. The first relates to step #2 (gathering “hit” monad sets), and the second relates to step #4 (gathering raster monad ranges).

After the first change, there are four ways to gather “hit” monad sets, as opposed to only one before today:

  • “outermost”: This is the old one which was already there. It simply traverses the sheaf, and for each outermost straw, it calculates one set of monads being the big-union of the monad sets of all matched objects which are direct children of each outermost straw. Naturally, this can get unwieldy if the outermost block is, say, a “book”.
  • “focus”: This calculates one “hit” monad set for each matched object whose “focus” boolean is “true”. The “hit” monad set is simply the monad set of the matched object.
  • “innermost”: This calculates one “hit” for each straw which satisfies the condition that all its children are terminals in the sheaf tree, i.e., none of the children have an inner sheaf. The “hit” is simply the big-union of the monad sets of all matched objects in such straws.
  • “innermost_focus”: Like “innermost”, but only does the big-union of the monad sets of those matched objects in the straw whose focus boolean is “true”.

The “innermost” and “innermost_focus” algorithms are especially well suited to making concordance-views (which I’ll hopefully blog about at some point).
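
To make the four strategies a bit more concrete, here is a minimal Python sketch of them. It is not the actual implementation: the "MatchedObject" class, the representation of a sheaf as a list of straws (each straw being a list of matched objects), and the function names are all assumptions for illustration. Skipping empty hits in "innermost_focus" is also my own choice here, not something described above.

```python
from dataclasses import dataclass, field


@dataclass
class MatchedObject:
    """Hypothetical stand-in for a matched object in a sheaf."""
    monads: frozenset
    focus: bool = False
    inner: list = field(default_factory=list)  # inner sheaf: a list of straws


def big_union(objects):
    """Union of the monad sets of the given matched objects."""
    result = set()
    for mo in objects:
        result |= mo.monads
    return result


def iter_objects(sheaf):
    """Yield every matched object in the sheaf, at any depth."""
    for straw in sheaf:
        for mo in straw:
            yield mo
            yield from iter_objects(mo.inner)


def iter_innermost_straws(sheaf):
    """Yield straws whose matched objects all lack an inner sheaf."""
    for straw in sheaf:
        if all(not mo.inner for mo in straw):
            yield straw
        else:
            for mo in straw:
                yield from iter_innermost_straws(mo.inner)


def hits_outermost(sheaf):
    # One hit per outermost straw: union of its direct children.
    return [big_union(straw) for straw in sheaf]


def hits_focus(sheaf):
    # One hit per matched object whose focus boolean is true.
    return [set(mo.monads) for mo in iter_objects(sheaf) if mo.focus]


def hits_innermost(sheaf):
    # One hit per innermost straw.
    return [big_union(straw) for straw in iter_innermost_straws(sheaf)]


def hits_innermost_focus(sheaf):
    # Like innermost, but only focus objects count; empty hits are dropped
    # (an assumption of this sketch).
    hits = (big_union([mo for mo in straw if mo.focus])
            for straw in iter_innermost_straws(sheaf))
    return [hit for hit in hits if hit]


# A "book" of monads 1-10 containing two inner straws.
book = MatchedObject(
    monads=frozenset(range(1, 11)),
    inner=[
        [MatchedObject(frozenset({2, 3}), focus=True), MatchedObject(frozenset({4}))],
        [MatchedObject(frozenset({7}), focus=True)],
    ],
)
sheaf = [[book]]

print(hits_outermost(sheaf))        # one hit covering monads 1-10
print(hits_innermost(sheaf))        # [{2, 3, 4}, {7}]
print(hits_innermost_focus(sheaf))  # [{2, 3}, {7}]
```

With a “book” as the outermost block, “outermost” returns the whole book as a single hit, while the “innermost” variants return the small hits one would want in a concordance.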

The second change is to step #4, which calculates the raster monad ranges. The old way was to specify an object type (a “raster unit”) whose objects would determine the context range of monads. The ranges were retrieved with GET OBJECTS HAVING MONADS IN, using the big-union of all “hit” monad sets as the monad set, and using the “raster unit” object type as the object type to GET. This method is still available.

The new way, however, specifies two context sizes, “raster_context_before” and “raster_context_after”: two independent, positive integers which determine the raster context ranges. The algorithm is to traverse the list of “hit” monad sets, and for each set, take the first monad minus “raster_context_before” as the first monad of the range, and the last monad plus “raster_context_after” as the last monad of the range. Again, this is especially useful for concordance-type views.

This will appear in the next public release after 3.0.1.
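
The context-window calculation is simple enough to sketch in a few lines of Python. Again, this is only an illustration, not the real code: the function name "raster_ranges" and the "min_monad" parameter are assumptions, and clamping the start of a range so it never drops below the first monad is my own addition in this sketch.

```python
def raster_ranges(hits, raster_context_before, raster_context_after, min_monad=1):
    """Return one (first, last) raster monad range per non-empty hit.

    hits      -- list of monad sets, one per "hit"
    min_monad -- lowest monad allowed in a range (an assumption of this sketch)
    """
    ranges = []
    for hit in hits:
        if not hit:
            continue  # an empty hit has no first or last monad
        first = max(min_monad, min(hit) - raster_context_before)
        last = max(hit) + raster_context_after
        ranges.append((first, last))
    return ranges


# Five monads of context before each hit, two after.
print(raster_ranges([{10, 11, 15}, {3}], 5, 2))  # [(5, 17), (1, 5)]
```

Note that the ranges are per hit, so the ranges of two nearby hits may overlap; taking the big-union afterwards (as in step #5) takes care of that.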

As always, if anyone is interested in having a preview, please contact me.

Until then,

Ulrik

Emdros 3.0.1 released

March 19th, 2008

It has been a while, but I forgot to mention that Emdros version 3.0.1 was released on February 17, 2008.

Ulrik

Emdros 3.0.0 released

January 27th, 2008

After more than 4 years in the making, Emdros version 3.0.0 has been released over at SourceForge.Net:

http://emdros.org/download.html

This started off as a branch off of the 1.1-series of Emdros, way back in 2004 (or was that 2003, even?). It then became a long series of preview releases, labelled 1.2.0.preXX (running internally to 1.2.0.pre269!). I should, of course, have released 2.0 way earlier. Now it became 3.0, simply because that is what it is, in terms of feature-additions.

By the way, the primary reason I haven’t been so publicly active around Emdros is that I have gotten married (hence the change of surname you’ll see below). Things have been moving internally, though, so 3.0.0 is actually a long ways from 1.2.0.pre262, the last public release.

Enjoy!

Ulrik Sandborg-Petersen