Interview with yours truly
July 24th, 2008
The unofficial Emdros blog, run by a very good friend of mine, has an interview with yours truly.
Enjoy!
Ulrik
On June 27, 2008, I defended my PhD thesis in front of an august panel of professors. I passed, so I guess I am now a Doctor. It feels great :-).
The thesis can be had here:
Ulrik Sandborg-Petersen’s PhD thesis.
It’s all about how Emdros can help save cultural heritage.
The Emdros code has in fact progressed since the last release (which was 3.0.1), even though I have been busy writing the thesis and defending it. One of the upcoming goodies which will be in 3.0.2 is a tree-display in the Emdros Query Tool. Yes, you can now display the TIGER corpus, for example, or the Penn Treebank, or the BLLIP corpus, or almost any other treebank, as trees right inside the Emdros Query Tool.
Ulrik
Since its inception by Hendrik Jan Bosman many years ago, the Emdros Query Tool has only had one harvesting algorithm. Well, until today, that is. Now it has four, including the old one.
The overall harvesting algorithm is:
There are two changes to the harvesting algorithm which I have made today. The first relates to step #2 (gathering “hit” monad sets), and the second relates to step #4 (gathering raster monad ranges).
The first change (gathering “hit” monad sets) now has four ways to do it, as opposed to only one before today:
The “innermost” and “innermost_focus” algorithms are especially well suited to making concordance-views (which I’ll hopefully blog about at some point).
The second change is to step #4, which calculates the raster monad ranges. The old way was to specify an object type (a “raster unit”) whose objects would determine the context range of monads. This would be done with GET OBJECTS HAVING MONADS IN, using the big-union of all “hit” monad sets as the monad set, and using the “raster unit” object type as the object type to GET. This method is still available.
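To make the “big-union” part of the old method concrete, here is a minimal sketch in Python. It is not the Emdros Query Tool's actual code: the function name big_union and the representation of a monad set as a Python set of integers are assumptions made purely for illustration. It only computes the union as a sorted list of monad ranges; issuing the GET OBJECTS HAVING MONADS IN query against the database is left out.

```python
from typing import Iterable, List, Set, Tuple


def big_union(hit_sets: Iterable[Set[int]]) -> List[Tuple[int, int]]:
    """Union all "hit" monad sets and return the result as sorted,
    maximal (first, last) ranges of consecutive monads.

    Hypothetical helper: name and data representation are assumptions.
    """
    all_monads = sorted(set().union(*hit_sets))
    ranges: List[Tuple[int, int]] = []
    for m in all_monads:
        if ranges and m == ranges[-1][1] + 1:
            # m directly follows the current range, so extend it.
            ranges[-1] = (ranges[-1][0], m)
        else:
            # Gap before m: start a new single-monad range.
            ranges.append((m, m))
    return ranges
```

For example, the hit sets {1, 2, 3}, {3, 4} and {10} collapse into the two ranges 1-4 and 10-10, which is the monad set the raster-unit objects would then be fetched for.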
The new way, however, specifies two context parameters, “raster_context_before” and “raster_context_after”: two independent, positive integers which determine the raster context ranges. The algorithm is to traverse the list of “hit” monad sets and, for each set, take its first monad minus “raster_context_before” as the first monad of the range, and its last monad plus “raster_context_after” as the last monad of the range. Again, this is especially useful for concordance-type views.
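The context-based calculation just described can be sketched in a few lines of Python. This is an illustration only, not the code in the Emdros Query Tool: the function name raster_ranges and the use of plain Python sets for monad sets are assumptions, and so is the clamping at min_monad (the description above does not say what happens when the subtraction would go below the first monad).

```python
from typing import Iterable, List, Set, Tuple


def raster_ranges(
    hit_sets: Iterable[Set[int]],
    raster_context_before: int,
    raster_context_after: int,
    min_monad: int = 1,  # assumption: do not go below this monad
) -> List[Tuple[int, int]]:
    """Return one (first, last) raster range per non-empty "hit" monad set."""
    ranges: List[Tuple[int, int]] = []
    for monads in hit_sets:
        if not monads:
            continue  # an empty set has no first or last monad
        first = max(min_monad, min(monads) - raster_context_before)
        last = max(monads) + raster_context_after
        ranges.append((first, last))
    return ranges
```

With raster_context_before = 5 and raster_context_after = 2, a hit on monads 10-12 yields the range 5-14. Ranges from neighbouring hits may overlap; whether and how they are merged is not covered by this sketch.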
This will appear in the next public release after 3.0.1.
As always, if anyone is interested in having a preview, please contact me.
Until then,
Ulrik
It has been a while, but I forgot to mention that Emdros version 3.0.1 was released on February 17, 2008.
Ulrik
After more than 4 years in the making, Emdros version 3.0.0 has been released over at SourceForge.Net:
http://emdros.org/download.html
This started off as a branch off the 1.1-series of Emdros, way back in 2004 (or was that 2003, even?). It then became a long series of preview releases, labelled 1.2.0.preXX (running internally to 1.2.0.pre269!). I should, of course, have released 2.0 way earlier. Now it has become 3.0, simply because that is what it is, in terms of feature additions.
By the way, the primary reason I haven’t been so publicly active around Emdros is that I have gotten married (hence the change of surname you’ll see below). Things have been moving internally, though, so 3.0.0 is actually a long ways from 1.2.0.pre262, the last public release.
Enjoy!
Ulrik Sandborg-Petersen
Today I’ve made an Emdros demo website available on the ‘net. Be sure to check it out!
It is butt-ugly for now, but it works. It gives the user MQL-query access to the Penn Treebank sampler available with the Natural Language Toolkit (NLTK). That is, about 1 million words of the WSJ corpus can now be searched online with the demo website.
Enjoy!
Within the next 24 hours, I expect that the 16000th copy of Emdros and related files will be downloaded from SourceForge.Net. This does not include those copies that may have been downloaded from elsewhere.
Emdros was first released to the public on October 11, 2001, as version 1.0.3. Since then, around 45 releases have been made public. One Linux distribution (the Russian “Alt Linux”) has picked it up and included it in their portfolio of packages. Two companies have bought licenses, and incorporated it into their software, so that Emdros may be in use by thousands of people every day. At least four academic settings have used Emdros for meeting their own needs, including IRIT in Toulouse, France, who are using Emdros as the foundation for a concordancer used by linguists in their research. Several individuals have been very kind, and have written to me with requests for help and enhancements, and some have even contributed bugfixes. Emdros has taken me to two countries to meet with people who were interested in using Emdros, and my work on Emdros has led to several new friends, some of whom I have not yet met face to face, but only via the Internet. So, I have been truly blessed by the Lord in his making me able to produce Emdros.
Update: It has already happened, as of around 09:15 GMT, on 2007-07-15.
I’ve released Emdros version 1.2.0.pre262 over at SourceForge.Net. It contains all the goodies I’ve been blogging about since March 2007 (i.e., since the last release, which was 1.2.0.pre242).
To summarize, this release brings:
Enjoy!
Ulrik
One of my customers told me that the new Wrap block wasn’t what they really needed. Their main complaint was that there was an implicit power block at the beginning of the innards of the wrap block, which wasn’t intuitive enough.
I looked at my implementation, and realized that, in order to fix the problem, I had to essentially rewrite large parts of the implementation of the topographic part of MQL.
So that’s what I did. I struck gold when I came up with a very simple solution:
The bottom line is that MQL is much more powerful now, as a result of the following: a) We now have “real grouping” of strings of blocks; b) Kleene star can now apply to groups as well as to individual blocks; c) We can now have any kind of block after a power block, not just object blocks.
These three points may not seem to be “big”. But let me assure you that this is indeed one — nay two — quantum leaps upward for MQL in expressive power. That is, MQL is now much closer to what Doedens had envisaged, and what he described in his PhD thesis as the language “QL”.
I have added many regression tests to the regression test suite in order to test the new functionality, and the old regression tests all run without a hiccup. I have also run valgrind’s memory checker on the regression test program, and it comes up with 0 memory leaks. Finally, all of the test queries in my corpus of test queries against a “real” database come up with the same answers as before, except for the order in which straws from OR-separated block_strings appear.
So things are looking good.
Again, I still have no schedule for when these changes become public. If you want to try it out, please drop me a line.
Until then,
Enjoy!
Ulrik
The “Unofficial Emdros blog” has an interview with yours truly.
The Unofficial Emdros blog is run by one of my very good friends, and mostly has tidbits bantered in friendly conversation.
Ulrik