Encoding of morphological texts in OSIS
Martin Gruner <mg.pub(at)gmx.net>
2006-07-31 13:43:00

Hello all,

I'm new to this list and pretty uninformed about OSIS. Sorry!

I've got a particular encoding problem and would like to hear your experienced 
opinions. The goal is to encode WHI's MORPH database (OT text + morphological 
information) in OSIS.

Source looks like (Genesis 1:2, words 3 and 4):

gn1:2,3.1 TO33HW.03 T.OHW.@ncmsa
gn1:2,4.1 WF W:@Pc
gn1:2,4.2 BO80HW. B.OHW.@ncmsa

Each line starts with the reference (verse, word and word-part numbers), then 
gives the word form, followed by the lemma, followed by @ for Hebrew or % for 
Aramaic, followed by the morphological information (a special code). How can 
I put this into OSIS? The 
spec, as it is, does not exactly allow for this because of the sub-word 
segmentation. So I'd like to use <seg> with some attributes of <w>. This 
would look like:

<w xml:lang="he"><seg type="morph" lemma="T.OHW." morph="ncmsa">TO33HW.03</seg></w>
<w xml:lang="he"><seg type="morph" lemma="W:" morph="Pc">WF</seg><seg type="morph" lemma="B.OHW." morph="ncmsa">BO80HW.</seg></w>

Would this be fine in OSIS, or is there a better way? Of course, the final 
OSIS will not have WHI's transcription but Unicode text instead; I kept the 
transcription here only to make this email easier to read.
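
To make the mapping concrete, here is a minimal Python sketch of how source 
lines like the ones above could be turned into the <w>/<seg> markup I am 
proposing. It is only an illustration under a few assumptions: the three 
fields are separated by whitespace, the language marker may appear either as 
"@" or spelled out as "(at)", and "arc" is my guess at a suitable xml:lang 
value for Aramaic. The names parse_line and to_osis are made up for this 
example, and the lemma/morph attributes on <seg> are exactly the part I am 
unsure is allowed in OSIS.

```python
import re
from itertools import groupby
from xml.sax.saxutils import escape, quoteattr

# Third field: lemma, language marker, morphology code.
# The marker is accepted as "@", as the spelled-out "(at)", or as "%".
ANALYSIS = re.compile(r"^(?P<lemma>.+?)(?P<marker>\(at\)|@|%)(?P<morph>.+)$")

# Assumption: xml:lang values; "arc" for Aramaic is a guess, not from any spec.
LANG = {"(at)": "he", "@": "he", "%": "arc"}


def parse_line(line):
    """Split one source line into reference, word form, lemma and morph code."""
    ref, form, analysis = line.split()
    # "gn1:2,4.2" -> word id "gn1:2,4" and word-part number "2"
    word_id, part = ref.rsplit(".", 1)
    m = ANALYSIS.match(analysis)
    if m is None:
        raise ValueError(f"unrecognised analysis field: {analysis!r}")
    return {
        "word_id": word_id,
        "part": int(part),
        "form": form,
        "lemma": m["lemma"],
        "lang": LANG[m["marker"]],
        "morph": m["morph"],
    }


def to_osis(lines):
    """Yield one <w> per word, containing one <seg> per word part."""
    parts = [parse_line(line) for line in lines if line.strip()]
    # Consecutive parts that share a word id belong to the same <w>.
    for _, group in groupby(parts, key=lambda p: p["word_id"]):
        group = list(group)
        segs = "".join(
            f'<seg type="morph" lemma={quoteattr(p["lemma"])} '
            f'morph={quoteattr(p["morph"])}>{escape(p["form"])}</seg>'
            for p in group
        )
        # The language of the first part is used for the whole word.
        yield f'<w xml:lang="{group[0]["lang"]}">{segs}</w>'
```

Fed the three sample lines, to_osis produces the two <w> elements shown 
above (apart from line breaks). Grouping on the reference minus its last 
component is what puts the prefix and the noun of word 4 into a single <w>.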

Please advise.

Martin
