Re: [osis-user] Encoding of morphological texts in OSIS
Todd Tillinghast <todd(at)snowfallsoftware.com> | 2006-08-01 10:05:02
I believe you can simply encode (assuming that "TO33HW.03" is a word):
<w lemma="T.OHW." morph="ncmsa">TO33HW.03</w>
Naturally, the works for the lemma and morph values would be declared in the header.
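As a rough illustration of how a consuming program might read these attributes back, here is a minimal Python sketch using the standard-library ElementTree parser. It assumes a bare fragment without the OSIS namespace or header, and the helper name `word_annotations` is hypothetical.

```python
import xml.etree.ElementTree as ET

def word_annotations(fragment):
    """Return (text, lemma, morph) for every <w> element in an XML fragment.

    Hypothetical helper; the OSIS namespace is omitted for brevity.
    """
    root = ET.fromstring(fragment)
    return [(w.text, w.get("lemma"), w.get("morph")) for w in root.iter("w")]

# Todd's example wrapped in a dummy container element
sample = '<p><w lemma="T.OHW." morph="ncmsa">TO33HW.03</w></p>'
print(word_annotations(sample))
```

The attributes come back as plain strings; interpreting them (e.g. against the works declared in the header) is left to the application.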
Todd
Martin Gruner wrote:[...]
Re: [osis-user] Encoding of morphological texts in OSIS
Todd Tillinghast <todd(at)snowfallsoftware.com> | 2006-08-01 10:06:02
Martin,
The ":" character in lemma="W:" would not be allowed.
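One plausible reading of this restriction, assuming the common OSIS convention that a colon in a lemma value separates a work prefix declared in the header from the value itself (as in an illustrative lemma="strong:H1234"): a processor that splits on the first colon sees "W:" as a work named "W" with an empty value. The sketch below is a hypothetical parser written under that assumption; `split_lemma` is an invented name and the rule is not taken from the OSIS schema. It handles a single token only.

```python
def split_lemma(token):
    """Split one lemma token into (work, value), assuming 'work:value' syntax.

    Hypothetical helper: a token without a colon has no work prefix.
    """
    work, sep, value = token.partition(":")
    if not sep:
        return None, token  # no prefix, e.g. "T.OHW."
    if not value:
        raise ValueError(f"empty value after work prefix in {token!r}")
    return work, value

print(split_lemma("strong:H1234"))  # ('strong', 'H1234')
print(split_lemma("T.OHW."))        # (None, 'T.OHW.')
try:
    split_lemma("W:")
except ValueError as err:
    print(err)  # empty value after work prefix in 'W:'
```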
Todd
Martin Gruner wrote:[...]
Re: [osis-user] Encoding of morphological texts in OSIS
DM Smith <dmsmith555(at)yahoo.com> | 2006-08-01 10:50:08
According to the latest OSIS manual, <w> is used for words and <seg>
for parts of a word.
The example given for <seg> (which is itself in error because it uses <word>)
is:
[...]
Martin's question seems to be how to extend this example to carry
metadata in the attributes of seg.
I think there is still a need to know where a word starts and ends
for other processing. The <w> tag does not allow nesting, so marking the
parts with <w> inside a <w> for the whole word would not work.
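To make the word-boundary point concrete: if the whole word is a <w> and its parts are <seg> children carrying their own attributes, a processor can still recover both levels. A minimal Python sketch, assuming a namespace-free fragment; the words, the `type` values on <seg>, and the helper `words_with_parts` are invented placeholders, and whether the schema permits the metadata Martin needs on <seg> is exactly the open question.

```python
import xml.etree.ElementTree as ET

# Placeholder markup: one word (<w>) built from two parts (<seg>),
# followed by a second, unsplit word. Attribute values are hypothetical.
fragment = """<p>
  <w><seg type="x-prefix">un</seg><seg type="x-stem">do</seg></w>
  <w>it</w>
</p>"""

def words_with_parts(xml_text):
    """Yield (whole_word, [(part_text, attributes), ...]) for each <w>."""
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        whole = "".join(w.itertext())  # the <w> element marks the word boundary
        parts = [(s.text, dict(s.attrib)) for s in w.iter("seg")]
        yield whole, parts

for word, parts in words_with_parts(fragment):
    print(word, parts)
```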
On a side note, in practice there are many uses of <seg> that behave more
like an HTML <span> than a <div>. That is, <seg> is being used as an
inline element, while <div> is used as a block element. I think this may
be due to the lack of early guidance on the use of <seg>, and to the fact
that the schema allows it to appear almost anywhere and contain almost anything.
I think it would make sense to come up with a word-part tag (e.g. <wp>)
that is restricted to the <w> element and can only hold text and perhaps
a small subset of elements. For example, the formula for water (H2O) might be:
<w><wp>H</wp><wp><hi type="sub">2</hi></wp><wp>O</wp></w>
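Since <wp> is only a proposal here and not an OSIS element, the following is purely hypothetical: a short Python sketch showing that a processor could rebuild the full word from its parts while keeping each part addressable. It parses the example markup as-is, with no namespace.

```python
import xml.etree.ElementTree as ET

# The proposed (non-OSIS) <wp> markup for H2O from the message above.
water = '<w><wp>H</wp><wp><hi type="sub">2</hi></wp><wp>O</wp></w>'

w = ET.fromstring(water)
parts = ["".join(wp.itertext()) for wp in w.findall("wp")]  # one string per part
print(parts)                  # ['H', '2', 'O']
print("".join(w.itertext()))  # H2O
```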
The reason for this suggestion is that it is hard (for me) to write
stylesheets for an element that is used for multiple, very different
purposes.
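The stylesheet difficulty can be illustrated with a hypothetical renderer: once <seg> may be either a word part or a general inline span, the code has to inspect the element's context to decide what to emit. A minimal Python sketch; the `render` function, the HTML it produces, and the rule "parent is <w> means word part" are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from html import escape

def render(elem, parent_tag=None):
    """Render an XML fragment as HTML-like text, choosing how to treat <seg>
    by looking at its parent (hypothetical rule for illustration)."""
    inner = escape(elem.text or "")
    for child in elem:
        inner += render(child, elem.tag) + escape(child.tail or "")
    if elem.tag == "seg":
        if parent_tag == "w":
            # <seg> inside <w>: treated as a word part
            return f'<span class="word-part">{inner}</span>'
        # <seg> anywhere else: treated as a generic inline span
        return f"<span>{inner}</span>"
    if elem.tag == "div":
        return f"<div>{inner}</div>"
    # any other element (e.g. <w>): emit its content only
    return inner

doc = ET.fromstring("<div><w><seg>un</seg><seg>do</seg></w> <seg>aside</seg></div>")
print(render(doc))
```

Every new use of <seg> adds another branch of this kind, which is the maintenance cost a dedicated word-part element would avoid.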
Todd Tillinghast wrote:[...]
Re: [osis-user] Encoding of morphological texts in OSIS
Martin Gruner <mg.pub(at)gmx.net> | 2006-08-01 15:57:32
Hello Todd,
thanks, but my problem is that I have words consisting of more than one part
that need to be marked up separately. I can't use <w> there. I could perhaps
use it for each word part without whitespace in between, but that wouldn't be
correct.
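Martin's objection to one <w> per part can be seen from a consumer's point of view: a program that treats each <w> element as a word will count two words where the running text has one. A minimal Python sketch with an invented placeholder word split into two parts, assuming a namespace-free fragment:

```python
import xml.etree.ElementTree as ET

# One orthographic word ("undo") encoded as two adjacent <w> elements,
# followed by a separate word. Placeholder text only.
fragment = "<p><w>un</w><w>do</w> <w>it</w></p>"
root = ET.fromstring(fragment)

as_elements = [w.text for w in root.iter("w")]   # words as <w> elements
as_text = "".join(root.itertext()).split()       # words as whitespace-separated text

print(len(as_elements), as_elements)  # 3 ['un', 'do', 'it']
print(len(as_text), as_text)          # 2 ['undo', 'it']
```

The two counts disagree, so the markup no longer records where the word begins and ends.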
Why is lemma="W:" forbidden? How can I make it work?
Thanks,
Martin
On Tuesday, 1 August 2006 16:04, Todd Tillinghast wrote:[...]