
OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-16 19:51:11
I am an author of JSword and BibleDesktop, a Bible application that 
works with Sword modules.

The basic architecture of our program is that the user requests one or 
more passages (each passage is a contiguous set of verses), upon which 
the program fetches the text from the Sword module and then that text is 
turned into a valid, well-formed OSIS document. The Sword text might be 
GBF, ThML, plain text, or OSIS. Once we have the OSIS we use XSLT to 
transform it into HTML and display it to the user.

In this process we take the fragment of Sword text that has been turned 
into a list of OSIS elements and make it a child of a div of an osisText 
in an OSIS document complete with all the required elements (i.e. the 
header). This OSIS document is transformed by XSLT that ignores the 
header element. Since the header was synthesized out of nothingness, it is 
valueless.

Would it be possible and reasonable to define a fragment element that 
could be used to hold any fragment of a document?

I'm thinking something like: (not showing attributes)
<osis>
    <fragment>
       elements that can appear at any level (i.e. just like a div)
    </fragment>
    <fragment>
       ....
    </fragment>
</osis>

Re: [osis-user] OSIS Fragment
Patrick Durusau <patrick(at)durusau.net>
2006-02-17 14:01:10
DM,

Thanks for all the proposed edits! A lot to correct but every one makes 
the manual better!

Hmmm, somebody said that about open source projects.... ;-)

Just to make sure what you are asking for:

When you say a <fragment> element that holds content like <div>, how 
would <div> differ in your view from <fragment>?

Realize there is something you see as different but it isn't clear to 
me, yet, hence the question.

Or is this a question of how to handle arbitrary XML fragments? BTW, the 
XML fragment activity died at the W3C for lack of interest.

Thanks again for the comments!

Hope you are looking forward to a great weekend!

Patrick


Re: [osis-user] OSIS Fragment
DavidTroidl(at)aol.com
2006-02-17 14:16:10
Hi!
 
Correct me if I'm wrong, but it seems to me the point of the question is: 
could there be a way of having a valid fragment without the full header, 
especially when the header wouldn't be used?

I'm considering making separate book files for the New Testament, to reduce 
file size and processing time, but to have a valid header in a master file 
that would reference the individual book files. Is there a way to accommodate 
that without including the full header in all the files?

Peace,

David

Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-17 18:26:10
Patrick Durusau wrote:[...]
The difference is that of purpose, parenting and form.
A fragment would have purpose in a processing system as an artifact of 
processing a whole document. It would have meaning only in that context.
It differs in parenting in that it would be a child of the root <osis> 
element.
And in form in that div is milestoneable, a fragment is not and there is 
little need for attributes.

More specifically, something like:
    <xs:complexType name="osisCT">
        <xs:choice>
            <xs:element name="osisCorpus" type="osisCorpusCT" minOccurs="0"/>
            <xs:element name="osisText" type="osisTextCT" minOccurs="0"/>
            <!-- proposed addition: a fragment as an alternative child of osis -->
            <xs:element name="fragment" type="fragmentCT" minOccurs="0"/>
        </xs:choice>
        <xs:attribute name="TEIform" fixed="TEI.2"/>
    </xs:complexType>

    <!-- proposed: same content as a div, but not milestoneable -->
    <xs:complexType name="fragmentCT" mixed="true">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="a" type="aCT"/>
            <xs:element name="abbr" type="abbrCT"/>
            <xs:element name="chapter" type="chapterCT"/>
            <xs:element name="closer" type="closerCT"/>
            <xs:element name="date" type="dateCT"/>
            <xs:element name="div" type="divCT"/>
            <xs:element name="divineName" type="divineNameCT"/>
            <xs:element name="figure" type="figureCT"/>
            <xs:element name="foreign" type="foreignCT"/>
            <xs:element name="hi" type="hiCT"/>
            <xs:element name="index" type="indexCT"/>
            <xs:element name="inscription" type="inscriptionCT"/>
            <xs:element name="lb" type="lbCT"/>
            <xs:element name="lg" type="lgCT"/>
            <xs:element name="list" type="listCT"/>
            <xs:element name="mentioned" type="mentionedCT"/>
            <xs:element name="milestone" type="milestoneCT"/>
            <xs:element name="milestoneEnd" type="milestoneEndCT"/>
            <xs:element name="milestoneStart" type="milestoneStartCT"/>
            <xs:element name="name" type="nameCT"/>
            <xs:element name="note" type="noteCT"/>
            <xs:element name="p" type="pCT"/>
            <xs:element name="q" type="qCT"/>
            <xs:element name="reference" type="referenceCT"/>
            <xs:element name="salute" type="saluteCT"/>
            <xs:element name="seg" type="segCT"/>
            <xs:element name="signed" type="signedCT"/>
            <xs:element name="speaker" type="speakerCT"/>
            <xs:element name="speech" type="speechCT"/>
            <xs:element name="table" type="tableCT"/>
            <xs:element name="title" type="titleCT"/>
            <xs:element name="transChange" type="transChangeCT"/>
            <xs:element name="verse" type="verseCT"/>
            <xs:element name="w" type="wCT"/>
        </xs:choice>
        <xs:attribute name="canonical" type="xs:boolean" default="true" use="optional"/>
        <xs:attribute name="TEIform" fixed="fragment"/>
    </xs:complexType>
[...]
The desire is not how to handle arbitrary XML fragments. We've got that 
nailed. We wrap them in a div in an osisText in an osis element. The 
problem is that an osisText requires a header to be valid.

We need to wrap the fragment in something so it is well-formed, as it 
could be a list of elements and text nodes. We do processing with XSLT 
and provide the schema so that it can get defaults (and, though not used 
here, external entities). For good measure we use a validating parser.

The problem is that in constructing a search result set consisting of 
several thousand passages, each represented as a well-formed and valid 
OSIS document, the header is repeated without value that number of times.

So in processing we need to dig down into the document from <osis> to 
<osisText>, skip <header> and all its descendants, to the <div> and then 
present the children of that div.

It would be easier, in time, space and code complexity, to go from 
<osis> to <fragment> and process its children.
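
To make the comparison concrete, here is a minimal sketch of the two traversals. It uses Python's standard library purely for illustration (JSword itself is Java); the sample documents, the function names, and the omission of namespaces and real header content are assumptions, and <fragment> is the proposed element, not part of the OSIS schema.

```python
import xml.etree.ElementTree as ET

# Today's shape: the content sits under osis/osisText/div, next to a header.
CURRENT = """<osis><osisText><header><work/></header>
<div><verse osisID="Gen.1.1">In the beginning</verse></div></osisText></osis>"""

# Proposed shape (hypothetical): the content sits directly under osis/fragment.
PROPOSED = """<osis><fragment><verse osisID="Gen.1.1">In the beginning</verse></fragment></osis>"""

def children_current(doc: str) -> list:
    """Walk osis -> osisText -> div, stepping over the header."""
    root = ET.fromstring(doc)
    div = root.find("osisText/div")
    return list(div)

def children_proposed(doc: str) -> list:
    """Walk osis -> fragment and collect the children of every fragment."""
    root = ET.fromstring(doc)
    return [child for frag in root.findall("fragment") for child in frag]

print([e.get("osisID") for e in children_current(CURRENT)])
print([e.get("osisID") for e in children_proposed(PROPOSED)])
```

Both calls reach the same verse; the difference is the header that has to be built, parsed and stepped over in the first form.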
[...]

I am not really interested in any formal definition, though having one 
and tools to support it might be good.
I am more interested in a simple solution to a practical problem.
[...][...]

Re: [osis-user] OSIS Fragment
Todd Tillinghast <todd(at)snowfallsoftware.com>
2006-02-20 12:24:46
I think the following will support your needs.

You can shorten the header down to the following and indicate the 
portion of the whole document (Bible) this XML document contains using 
the <scope> element and the <identifier type="OSIS"> to uniquely 
identify the work.

(You are taking some risk by defaulting the reference system.)

<?xml version="1.0" encoding="UTF-8"?>
<osis ...>
    <osisText osisIDWork="thisWork" xml:lang="en">
       <header>
          <work osisWork="thisWork">
             <identifier type="OSIS">Bible.en.ABS.CEV.1999</identifier>
             <scope>Gen.1</scope>
             <refSystem>Bible</refSystem>
          </work>
       </header>
       <div scope="Gen.1">...</div>
    </osisText>
</osis>


This would allow you to have only Gen.1 in a standalone document.

If it were a single verse or two you could do the following.

<?xml version="1.0" encoding="UTF-8"?>
<osis ...>
    <osisText osisIDWork="thisWork" xml:lang="en">
       <header>
          <work osisWork="thisWork">
             <identifier type="OSIS">Bible.en.ABS.CEV.1999</identifier>
             <scope>Gen.1.1-Gen.1.2</scope>
             <refSystem>Bible</refSystem>
          </work>
       </header>
       <div scope="Gen.1.1-Gen.1.2">
          <lg>
             <l level="1"><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning God </l>
             <l level="1">created the heavens </l>
             <l level="2">and the earth.<note osisRef="Gen.1.1"
                osisID="Gen.1.1!footnote.1" n="a"><reference type="source"
                osisRef="Gen.1.1">1.1 </reference><catchWord>the heavens and the earth: </catchWord><q
                level="1">The heavens and the earth</q> stood for the universe.</note> <verse
                eID="Gen.1.1"/></l>
             <l level="1"><verse sID="Gen.1.2" osisID="Gen.1.2"/>The earth was barren, </l>
             <l level="2">with no form of life;<note osisRef="Gen.1.2"
                osisID="Gen.1.2!footnote.1" n="b"><reference type="source"
                osisRef="Gen.1.2">1.1,2 </reference><catchWord>In … life: </catchWord>Or <q
                level="1">When God began to create the heavens and the earth, the earth was barren with no form of life.</q></note> </l>
             <l level="1">it was under a roaring ocean </l>
             <l level="2">covered with darkness. </l>
             <l level="1">But the Spirit of God<note osisRef="Gen.1.2"
                osisID="Gen.1.2!footnote.2" n="c"><reference type="source"
                osisRef="Gen.1.2">1.2 </reference><catchWord>the Spirit of God: </catchWord>Or <q
                level="1">a mighty wind.</q></note> </l>
             <l level="2">was moving over the water. <verse eID="Gen.1.2"/></l>
          </lg>
       </div>
    </osisText>
</osis>

Todd


Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-20 12:59:45
Todd,
    Thanks for your reply. But this is what we already do, as mentioned 
earlier in the thread (well, an insignificant variation of it). There is no 
risk in defaulting the reference system if the program that creates the 
fragment also consumes it.
    If you add up the bytes and the processing cycles needed to chew through 
this for an answer set of several thousand fragments (the worst case is an 
answer set that consists of every other verse), it becomes fairly 
significant. All of it is there just to satisfy validity against the schema, 
and all of it is then ignored during processing.
    What I am looking for is nothing more than an optimization that 
allows for a valid OSIS document.
    It would be fine to have an attribute or two on the <fragment> 
element that would point to the document from which it came so the 
header from it could be used, if necessary. This would mitigate any risk 
of not having a valid header, e.g. <fragment source="uri for the source" 
osisIDWork="work id from the source document">...</fragment>
DM


Re: [osis-user] OSIS Fragment
Patrick Durusau <patrick(at)durusau.net>
2006-02-20 13:34:45
DM,

You may have already answered your own question without realizing it. I 
like those. ;-)

Let me walk through what I think you are doing/need to do and you tell 
me when I jump off the track:

1. You have any number of OSIS documents, all of which can be validated 
against the OSIS schema.

2. You want to search those documents and return fragments that meet 
some search criteria.

3. At no point do you want to re-validate the fragments, as you already 
know they are from valid OSIS documents.

4. What you do need to know is what content model you are going to see 
in any particular fragment that is being processed (for XSLT purposes, 
for example).

5. So, at this point you are including the header because it is required 
to have a valid OSIS document.

Have I captured it more or less accurately so far?

But if validity is not an issue, then why not use SAX to process the 
fragments, without a schema at all?

Just stick whatever wrapper you want around the fragments and roll on.
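
A schema-less SAX pass of the kind suggested here might look like the sketch below. It uses Python's standard xml.sax module purely for illustration; the `wrapper` root element and the `TextCollector` and `scan_fragment` names are made up for this example. Without a schema, any attribute defaults declared there are not supplied to the handler.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Records element names and character data; no schema is consulted."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.text = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        self.text.append(content)

def scan_fragment(fragment_xml: str) -> TextCollector:
    """Parse a fragment after wrapping it in an arbitrary root element."""
    handler = TextCollector()
    # "wrapper" is a throwaway root; it only makes the fragment well-formed.
    wrapped = f"<wrapper>{fragment_xml}</wrapper>"
    xml.sax.parseString(wrapped.encode("utf-8"), handler)
    return handler

result = scan_fragment('<verse osisID="Gen.1.1">In the beginning</verse> God')
print(result.names, "".join(result.text))
```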

Or is there some reason for needing information from the schema?

Or do you want to produce a valid OSIS document from all of the 
fragments with only one header?

Hope you are having a great day!

Patrick




Re: [osis-user] OSIS Fragment
"Steven J. DeRose" <sderose(at)acm.org>
2006-02-20 13:54:45
My first inclination was the same as Patrick's -- looks a lot like a 
<div>. But I see your point about the header.

You could use a very abbreviated header as Todd pointed out; that has 
the advantage that no one has to create or maintain a separate 
schema, that you could use all the same tools, etc. etc. That may 
well be your best bet.

Of course, if this is purely for use within a particular system, you 
could do just about anything. Perhaps the easiest would be to just 
take the OSIS schema, and promote div to be the root element (or make 
something like <fragment-set> for the top), remove the 
milestonability for div, and just go with that. It wouldn't be 
official, but as long as it's within one system, that's no problem. 
If you wanted to export or interchange with other software, though, 
you'd want to make sure there was a totally trivial way to convert -- 
say, "Save As Full Document" that fills in the header (perhaps you'd 
have a shared header pointed to from all your fragments).

How many fragments do you expect to be working with? Can you say a bit 
more about the application domain? Is there a general application to 
this where you think we should consider creating a formal way to 
share headers in a later version of OSIS, perhaps?

S

Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-20 14:19:45
Patrick Durusau wrote:[...]

But may not have. In the example of the SWORD module in OSIS, we might 
want to assume that the document has been validated. But as far as I 
know no document has been validated. I have a process that does not 
assume that the input is good, but has rudimentary ability to scrub bad 
data when found. In fact, some are not even well-formed, let alone valid.
[...]

Again, I don't know that the fragments are valid. And it may be that the 
fragment is not well-formed and needs to be adjusted to make it 
well-formed. An example of this would be a fragment that consists of a 
single verse, where the document markup is forced to use the milestoned 
form of verses, and the verse returned contains either the start or the 
end of a container element but not the other. In this case, the software 
needs to detect that it is not well-formed (which it does) and, upon the 
exception, enter an error-handling routine that detects the anomaly and 
synthetically adds the missing piece.

In this instance, it would be nice to be able to validate against the 
schema to know if the error handler did things correctly.

But you are right that if the document were known to be valid then the 
fragments would not need to be re-validated unless they were not well 
formed.
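
The detection half of that error handling can be sketched as follows. This is only an illustration in Python's standard library, not JSword's code; the `wrapper` root and the `is_well_formed` name are assumptions, and the repair step (synthesizing the missing start or end tag) is deliberately left out.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment_xml: str) -> bool:
    """Return True if the fragment parses once wrapped in a throwaway root."""
    try:
        ET.fromstring(f"<wrapper>{fragment_xml}</wrapper>")
    except ET.ParseError:
        return False
    return True

# A verse extracted together with its whole container element.
whole = '<l><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning God </l>'
# The same verse cut so that only the end of the container came along.
cut = '<verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning God </l>'

print(is_well_formed(whole))  # True
print(is_well_formed(cut))    # False
```

A check like this says only that the repaired text parses; confirming that the repair produced legal OSIS still needs validation against the schema.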
[...]

More or less.
[...]

The schema declares default values for attributes. With the schema, it 
is possible to know what the default values are but not otherwise.
[...]

Default and fixed values for attributes. And, if you use them in 2.5, 
internal and external entities defined or referenced in the schema.
[...]

We do ultimately assemble all the fragments into a synthetic document 
with one header. Then we pass that to XSLT. Currently we wrap each 
fragment with a div. In this context the overhead of a header and the 
extra level introduced by osisText is minimal. It is the processing to 
this point that is significant.
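
As a rough sketch of that final assembly step (Python standard library, for illustration only; the `assemble` name, the `thisWork` id, and the skeletal header are assumptions, and namespaces are omitted):

```python
import xml.etree.ElementTree as ET
from typing import Iterable

def assemble(fragments: Iterable[str], work_id: str = "thisWork") -> ET.Element:
    """Build one synthetic document: a single header plus one div per fragment."""
    osis = ET.Element("osis")
    osis_text = ET.SubElement(osis, "osisText", {"osisIDWork": work_id})
    # The header is written once, however many fragments follow.
    header = ET.SubElement(osis_text, "header")
    ET.SubElement(header, "work", {"osisWork": work_id})
    for fragment_xml in fragments:
        # Parsing inside a div keeps loose text nodes legal.
        osis_text.append(ET.fromstring(f"<div>{fragment_xml}</div>"))
    return osis

doc = assemble([
    '<verse osisID="Gen.1.1">In the beginning</verse>',
    '<verse osisID="Gen.1.3">Let there be light</verse>',
])
print(ET.tostring(doc, encoding="unicode"))
```

Here the header cost is paid once per result set rather than once per passage, which is why the overhead is minimal at this stage.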
[...][...]
>>>
>>> You can shorten the header down to the following and indicate the 
>>> portion of the whole document (Bible) this XML document contains 
>>> using the <scope> element and the <identifier
type="OSIS"> to 
>>> uniquely identify the work.
>>>
>>> (You are talking some risk by defaulting the reference system.)
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <osis ...>
>>>    <osisText osisIDWork="thisWork" xml:lang="en">
>>>       <header>
>>>          <work osisWork="thisWork">
>>>             <identifier
type="OSIS">Bible.en.ABS.CEV.1999</identifier>
>>>             <scope>Gen.1</scope>
>>>             <refSystem>Bible</refSystem>
>>>          </work>
>>>       <div scope="Gen.1">...</div>
>>>    </osisText>
>>> </osis>
>>>
>>>
>>> This would allow you to have only Gen.1 in a stand alone document.
>>>
>>> If it were a single verse or two you could do the following.
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <osis ...>
>>>    <osisText osisIDWork="thisWork" xml:lang="en">
>>>       <header>
>>>          <work osisWork="thisWork">
>>>             <identifier
type="OSIS">Bible.en.ABS.CEV.1999</identifier>
>>>             <scope>Gen.1.1-Gen.1.2</scope>
>>>             <refSystem>Bible</refSystem>
>>>          </work>
>>>       </header>
>>>       <div scope="Gen.1.1-Gen.1.2"><lg><l
level="1"><verse 
>>> sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning God
</l><l 
>>> level="1">created the heavens </l><l level="2">and
the earth.<note 
>>> osisRef="Gen.1.1" osisID="Gen.1.1!footnote.1"
n="a"><reference 
>>> type="source" osisRef="Gen.1.1">1.1
</reference><catchWord>the 
>>> heavens and the earth: </catchWord><q level="1">The
heavens and the 
>>> earth</q> stood for the universe.</note> <verse 
>>> eID="Gen.1.1"/></l><l level="1"><verse
sID="Gen.1.2" 
>>> osisID="Gen.1.2"/>The earth was barren, </l><l
level="2">with no 
>>> form of life;<note osisRef="Gen.1.2"
osisID="Gen.1.2!footnote.1" 
>>> n="b"><reference type="source" osisRef="Gen.1.2">1.1,2 
>>> </reference><catchWord>In … life: </catchWord>Or
<q level="1">When 
>>> God began to create the heavens and the earth, the earth was
barren 
>>> with no form of life.</q></note> </l><l
level="1">it was under a 
>>> roaring ocean </l><l level="2">covered with darkness.
</l><l 
>>> level="1">But the Spirit of God<note osisRef="Gen.1.2" 
>>> osisID="Gen.1.2!footnote.2" n="c"><reference type="source" 
>>> osisRef="Gen.1.2">1.2 </reference><catchWord>the
Spirit of God: 
>>> </catchWord>Or <q level="1">a mighty
wind.</q></note> </l><l 
>>> level="2">was moving over the water. <verse 
>>> eID="Gen.1.2"/></l></lg></div>
>>>    </osisText>
>>> </osis>
>>>
>>> Todd
>>>
>>> DM Smith wrote:
>>>
>>>> Patrick Durusau wrote:
>>>>
>>>>> DM,
>>>>>
>>>>> Thanks for all the proposed edits! A lot to correct but
every one 
>>>>> makes the manual better!
>>>>>
>>>>> Hmmm, somebody said that about open source projects....
;-)
>>>>>
>>>>> Just to make sure what you are asking for:
>>>>>
>>>>> When you say a <fragment> element that holds content
like <div>, 
>>>>> how would <div> differ in your view from
<fragment>?
>>>>
>>>> The difference is that of purpose, parenting and form.
>>>> A fragment would have purpose in a processing system as an
artifact 
>>>> of processing a whole document. It would have meaning only in
that 
>>>> context.
>>>> It differs in parenting in that it would be a child of the
root 
>>>> <osis> element.
>>>> And in form in that div is milestoneable, a fragment is not
and 
>>>> there is little need for attributes.
>>>>
>>>> More specifically, something like:
>>>>    <xs:complexType name="osisCT">
>>>>        <xs:choice>
>>>>            <xs:element name="osisCorpus" type="osisCorpusCT" minOccurs="0"/>
>>>>            <xs:element name="osisText" type="osisTextCT" minOccurs="0"/>
>>>>            <xs:element name="fragment" type="fragmentCT" minOccurs="0"/>
>>>>        </xs:choice>
>>>>        <xs:attribute name="TEIform" fixed="TEI.2"/>
>>>>    </xs:complexType>
>>>>
>>>>    <xs:complexType name="fragmentCT" mixed="true">
>>>>        <xs:sequence>
>>>>            <xs:choice minOccurs="0" maxOccurs="unbounded">
>>>>                <xs:element name="a" type="aCT"/>
>>>>                <xs:element name="abbr" type="abbrCT"/>
>>>>                <xs:element name="chapter" type="chapterCT"/>
>>>>                <xs:element name="closer" type="closerCT"/>
>>>>                <xs:element name="date" type="dateCT"/>
>>>>                <xs:element name="div" type="divCT"/>
>>>>                <xs:element name="divineName" type="divineNameCT"/>
>>>>                <xs:element name="figure" type="figureCT"/>
>>>>                <xs:element name="foreign" type="foreignCT"/>
>>>>                <xs:element name="hi" type="hiCT"/>
>>>>                <xs:element name="index" type="indexCT"/>
>>>>                <xs:element name="inscription" type="inscriptionCT"/>
>>>>                <xs:element name="lb" type="lbCT"/>
>>>>                <xs:element name="lg" type="lgCT"/>
>>>>                <xs:element name="list" type="listCT"/>
>>>>                <xs:element name="mentioned" type="mentionedCT"/>
>>>>                <xs:element name="milestone" type="milestoneCT"/>
>>>>                <xs:element name="milestoneEnd" type="milestoneEndCT"/>
>>>>                <xs:element name="milestoneStart" type="milestoneStartCT"/>
>>>>                <xs:element name="name" type="nameCT"/>
>>>>                <xs:element name="note" type="noteCT"/>
>>>>                <xs:element name="p" type="pCT"/>
>>>>                <xs:element name="q" type="qCT"/>
>>>>                <xs:element name="reference" type="referenceCT"/>
>>>>                <xs:element name="salute" type="saluteCT"/>
>>>>                <xs:element name="seg" type="segCT"/>
>>>>                <xs:element name="signed" type="signedCT"/>
>>>>                <xs:element name="speaker" type="speakerCT"/>
>>>>                <xs:element name="speech" type="speechCT"/>
>>>>                <xs:element name="table" type="tableCT"/>
>>>>                <xs:element name="title" type="titleCT"/>
>>>>                <xs:element name="transChange" type="transChangeCT"/>
>>>>                <xs:element name="verse" type="verseCT"/>
>>>>                <xs:element name="w" type="wCT"/>
>>>>            </xs:choice>
>>>>        </xs:sequence>
>>>>        <xs:attribute name="canonical" type="xs:boolean" default="true" use="optional"/>
>>>>        <xs:attribute name="TEIform" fixed="fragment"/>
>>>>    </xs:complexType>
>>>>
>>>>>
>>>>> Realize there is something you see as different but it isn't clear
>>>>> to me, yet, hence the question.
>>>>>
>>>>> Or is this a question of how to handle arbitrary XML fragments?
>>>>
>>>> The desire is not how to handle arbitrary XML fragments. We've got
>>>> that nailed. We wrap them in a div in an osisText in an osis
>>>> element. The problem is that an osisText requires a header to be
>>>> valid.
>>>>
>>>> We need to wrap the element in something so it is well-formed, as
>>>> it could be a list of elements and text nodes. We do processing
>>>> with xslt and provide the schema so that it can get defaults (and,
>>>> though not used here, external entities). For good measure we use a
>>>> validating parser.
>>>>
>>>> The problem is that in constructing a search result set consisting
>>>> of several thousand passages, each represented as a well-formed and
>>>> valid OSIS document, the header is repeated without value that
>>>> number of times.
>>>>
>>>> So in processing we need to dig down into the document from <osis>
>>>> to <osisText>, skip <header> and all its descendants, to the <div>
>>>> and then present the children of that div.
>>>>
>>>> It would be easier, in time, space and code complexity, to go from
>>>> <osis> to <fragment> and process its children.
>>>>
>>>>> BTW, the XML fragment activity died at the W3C for lack of interest.
>>>>
>>>>
>>>> I am not really interested in any formal definition, though having
>>>> one and tools to support it might be good.
>>>> I am more interested in a simple solution to a practical problem.
>>>>
>>>>>
>>>>> Thanks again for the comments!
>>>>>
>>>>> Hope you are looking forward to a great weekend!
>>>>>
>>>>> Patrick
>>>>>
>>>>> DM Smith wrote:
>>>>>
>>>>>> I am an author of JSword and BibleDesktop, a bible application
>>>>>> that works with Sword modules.
>>>>>>
>>>>>> The basic architecture of our program is that the user requests
>>>>>> one or more passages (each passage is a contiguous set of
>>>>>> verses), upon which the program fetches the text from the Sword
>>>>>> module and then that text is turned into a valid, well-formed
>>>>>> OSIS document. The Sword text might be GBF, ThML, plain text, or
>>>>>> OSIS. Once we have the OSIS we use xslt to transform it into HTML
>>>>>> and display it to the user.
>>>>>>
>>>>>> In this process we take the fragment of Sword text that has been
>>>>>> turned into a list of OSIS elements and make it a child of a div
>>>>>> of a osisText in an OSIS document complete with all the required
>>>>>> elements (i.e. the header). This OSIS document is transformed by
>>>>>> xslt that ignores the header element. Since it was synthesized
>>>>>> out of nothingness, it is valueless.
>>>>>>
>>>>>> Would it be possible and reasonable to define an fragment element
>>>>>> that could be used to hold any fragment of a document.
>>>>>>
>>>>>> I'm thinking something like: (not showing attributes)
>>>>>> <osis>
>>>>>>    <fragment>
>>>>>>       elements that can appear at any level (i.e. just like a div)
>>>>>>    </fragment>
>>>>>>    <fragment>
>>>>>>       ....
>>>>>>    </fragment>
>>>>>> </osis>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>[...]

Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-20 14:34:45 [ FULL ]
Steven J. DeRose wrote:[...]

The application I am talking about is BibleDesktop 
(www.crosswire.org/bibledesktop). Currently it searches one SWORD module 
at a time. It does not matter what it is encoded in. It could be ThML, GBF, 
PlainText or OSIS. If a search result is not in OSIS, we convert it into 
OSIS (without schema validation, who knows if our transformation is 
correct!) and then do further processing.

The worst case scenario would be a search that returns every other 
verse of a bible, or nearly 16,000 answers. We plan to extend the ability 
to search multiple bibles at the same time. So take the number of 
English SWORD Bible modules and multiply it out.
[...]

Yes, I understand that the outcome of this request is "perhaps later".

In a SWORD application, the context is a lookup of a 
requested passage or a search for qualifying verses/passages. It is not 
the entire document. So at the time of a lookup or a search result, it 
is not present in what is returned. It would be good to have a formalism 
that specifies the reuse of the header from the "master" document. 
Something that declares that this fragment is taken directly from that 
document.

Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com>
2006-02-20 15:29:44 [ FULL ]
Steven J. DeRose wrote:[...]

In my response, I meant to mention that I am not talking about a 
separate schema but a simple extension to the current one. I included it 
in another e-mail to this thread: All that is needed is to make fragment 
be a top level choice along with osisText.

Oh, and the name of the element does not matter to me. <osisChunk> would 
work as well.[...]
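
As a rough illustration of the processing difference discussed in this thread, the sketch below compares digging through a synthesized wrapper with reading a top-level fragment. It is only a sketch under stated assumptions: the <fragment> element is the proposal from this thread and is not part of the OSIS schema, the sample documents and the function names are made up for the example, and namespaces and validation are left out to keep it short. It uses Python's standard xml.etree.ElementTree rather than the xslt pipeline BibleDesktop actually uses.

```python
import xml.etree.ElementTree as ET

# Current approach: the passage is wrapped in osis/osisText/div, and a
# synthesized header has to be present even though nothing reads it.
WRAPPED = """
<osis>
  <osisText osisIDWork="KJV">
    <header><work osisWork="KJV"/></header>
    <div><verse osisID="Gen.1.1">In the beginning</verse></div>
  </osisText>
</osis>
"""

# Proposed approach (hypothetical element, not in the OSIS schema):
# the passage is a direct child of a top-level fragment.
FRAGMENT = """
<osis>
  <fragment><verse osisID="Gen.1.1">In the beginning</verse></fragment>
</osis>
"""


def children_of_wrapped(doc):
    """Go from osis to osisText, skip header, and return the div's children."""
    root = ET.fromstring(doc)
    div = root.find("osisText/div")
    return list(div)


def children_of_fragment(doc):
    """Go from osis straight to each fragment and return its children."""
    root = ET.fromstring(doc)
    return [child for frag in root.findall("fragment") for child in frag]


ids_a = [e.get("osisID") for e in children_of_wrapped(WRAPPED)]
ids_b = [e.get("osisID") for e in children_of_fragment(FRAGMENT)]
print(ids_a, ids_b)
```

Both functions yield the same verse elements; the second never has to carry or step over a header, which is the saving being asked for when a result set holds thousands of passages.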
