

Re: [osis-user] Encoding of morphological texts in OSIS
Todd Tillinghast <todd(at)snowfallsoftware.com>
2006-08-01 10:05:02
I believe you can simply encode (assuming that "TO33HW.03" is a word):
<w lemma="T.OHW." morph="ncmsa">TO33HW.03</w>

Naturally the work for the lemma and morph would be declared in the header.
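As a rough illustration of how a consumer might read that markup, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name describe_word is made up for this example, and the fragment is parsed on its own, without the header declarations or the OSIS namespace a real document would carry.

```python
import xml.etree.ElementTree as ET

# The fragment from the message above, parsed standalone (no OSIS
# namespace or header; that simplification is an assumption of this sketch).
fragment = '<w lemma="T.OHW." morph="ncmsa">TO33HW.03</w>'
w = ET.fromstring(fragment)


def describe_word(elem):
    """Hypothetical helper: collect the surface text and its annotations."""
    return {
        "text": elem.text,
        "lemma": elem.get("lemma"),
        "morph": elem.get("morph"),
    }


info = describe_word(w)
print(info)
```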

Todd

Martin Gruner wrote:[...]

Re: [osis-user] Encoding of morphological texts in OSIS
Todd Tillinghast <todd(at)snowfallsoftware.com>
2006-08-01 10:06:02
Martin,

The ":" character in lemma="W:" would not be allowed.

Todd

Martin Gruner wrote:[...]

Re: [osis-user] Encoding of morphological texts in OSIS
DM Smith <dmsmith555(at)yahoo.com>
2006-08-01 10:50:08
According to the latest OSIS manual, <w> is used for words and <seg> for 
parts of a word.
The example that is given for <seg> (which is in error because of <word>) is:
[...]
Martin's question seems to be how to extend this example to carry 
metadata in the attributes of seg.

I think there is still a need to know the start and the end of the word 
for other processing. The <w> tag does not allow nesting, so this would 
not work.

On a side note, in the field there are a lot of uses of <seg> that are 
more like an HTML <span> as opposed to a <div>. That is, <seg> is being 
used as an inline element, and <div> as a block element. I think that 
this may be due to the lack of early guidance on the use of <seg>, and to 
the schema allowing it to appear almost anywhere and contain almost anything.

I think it would make sense to come up with a word part tag (e.g. <wp>) 
that is restricted to the <w> element and can only hold text and perhaps 
a small subset of elements. For example, the formula for water (H2O) might be:
<w><wp>H</wp><wp><hi type="sub">2</hi></wp><wp>O</wp></w>
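
To show why such a dedicated element would be easy to process, here is a minimal Python sketch that reads the water example. Note that <wp> is only the element proposed in this message, not part of the OSIS schema, and word_parts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# <wp> is the proposed (not existing) word-part element from this message.
fragment = '<w><wp>H</wp><wp><hi type="sub">2</hi></wp><wp>O</wp></w>'
w = ET.fromstring(fragment)


def word_parts(word):
    """Return the text of each <wp> child, flattening inline markup such as <hi>."""
    return ["".join(part.itertext()) for part in word.findall("wp")]


parts = word_parts(w)   # one entry per word part
whole = "".join(parts)  # the <w> element still marks the whole word
print(parts, whole)
```

Because every part sits inside a single <w>, the start and end of the word stay explicit while each part remains individually addressable.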

The reason for this suggestion is that it is hard (for me) to write 
stylesheets for an element that is used for multiple, very different 
purposes.

Todd Tillinghast wrote:[...][...][...]

Re: [osis-user] Encoding of morphological texts in OSIS
Martin Gruner <mg.pub(at)gmx.net>
2006-08-01 15:57:32
Hello Todd,

Thanks, but my problem is that I have words consisting of more than one part, 
and each part needs to be marked up separately. I can't use <w> there. I could 
perhaps use it for each word part, with no whitespace in between, but that 
wouldn't be correct.
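
For illustration only, the whitespace workaround mentioned above could be read back roughly as in the following Python sketch. It assumes that adjacent <w> elements with nothing between them belong to one word; the <verse> container, the sample text, and the name group_words are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: "ab" and "cd" are parts of one word, "ef" is another word.
fragment = '<verse><w>ab</w><w>cd</w> <w>ef</w></verse>'
verse = ET.fromstring(fragment)


def group_words(container):
    """Group adjacent <w> children; any text after a <w> ends the current word."""
    groups, current = [], []
    for w in container.findall("w"):
        current.append(w.text or "")
        if w.tail:  # whitespace or other text follows, so the word is complete
            groups.append(current)
            current = []
    if current:  # flush the last word if the container ends without trailing text
        groups.append(current)
    return groups


groups = group_words(verse)
print(groups)
```

The sketch also shows the weakness of this encoding: word boundaries depend entirely on whitespace, which is easy to lose when a file is reformatted.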

Why would lemma="W:" be forbidden? How can I make it work?

Thanks,

Martin

On Tuesday, 1 August 2006 16:04, Todd Tillinghast wrote:[...]
