OSIS Fragment
OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-16 19:51:11 |
|
I am an author of JSword and BibleDesktop, a bible application that
works with Sword modules.
The basic architecture of our program is that the user requests one or
more passages (each passage is a contiguous set of verses), upon which
the program fetches the text from the Sword module and then that text is
turned into a valid, well-formed OSIS document. The Sword text might be
GBF, ThML, plain text, or OSIS. Once we have the OSIS we use xslt to
transform it into HTML and display it to the user.
In this process we take the fragment of Sword text that has been turned
into a list of OSIS elements and make it a child of a div of an osisText
in an OSIS document complete with all the required elements (i.e. the
header). This OSIS document is transformed by xslt that ignores the
header element. Since the header was synthesized out of nothingness, it
is valueless.
Would it be possible and reasonable to define a fragment element that
could be used to hold any fragment of a document?
I'm thinking something like: (not showing attributes)
<osis>
<fragment>
elements that can appear at any level (i.e. just like a div)
</fragment>
<fragment>
....
</fragment>
</osis>
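For readers who want to see the current wrapping step concretely, here is a minimal Python sketch of it. It is not JSword code: the function name `wrap_fragment`, the placeholder header contents, and the omission of the OSIS namespace are all assumptions made for illustration, and a real header needs more than a bare `<work>` to validate.

```python
import xml.etree.ElementTree as ET


def wrap_fragment(fragment_xml: str, work_id: str = "thisWork") -> ET.Element:
    """Wrap a run of OSIS elements in osis/osisText/(header, div)."""
    osis = ET.Element("osis")
    osis_text = ET.SubElement(osis, "osisText", {"osisIDWork": work_id})
    # Synthesized placeholder header: present only because osisText
    # requires one. Its contents here are hypothetical.
    header = ET.SubElement(osis_text, "header")
    ET.SubElement(header, "work", {"osisWork": work_id})
    # Parse the fragment inside a <div> so that several sibling elements
    # and loose text nodes still form one well-formed tree.
    div = ET.fromstring("<div>" + fragment_xml + "</div>")
    osis_text.append(div)
    return osis


doc = wrap_fragment('<verse osisID="Gen.1.1">In the beginning</verse>')
print(ET.tostring(doc, encoding="unicode"))
```

Every fragment handled this way carries its own copy of the header, which is the cost the question is about.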
|
Re: [osis-user] OSIS Fragment
Patrick Durusau <patrick(at)durusau.net> |
2006-02-17 14:01:10 |
|
DM,
Thanks for all the proposed edits! A lot to correct but every one makes
the manual better!
Hmmm, somebody said that about open source projects.... ;-)
Just to make sure what you are asking for:
When you say a <fragment> element that holds content like <div>, how
would <div> differ in your view from <fragment>?
Realize there is something you see as different but it isn't clear to
me, yet, hence the question.
Or is this a question of how to handle arbitrary XML fragments? BTW, the
XML fragment activity died at the W3C for lack of interest.
Thanks again for the comments!
Hope you are looking forward to a great weekend!
Patrick
|
Re: [osis-user] OSIS Fragment
DavidTroidl(at)aol.com |
2006-02-17 14:16:10 |
|
Hi!
Correct me if I'm wrong, but it seems to me the point of the question is:
could there be a way of having a valid fragment, without the full header,
especially when the header wouldn't be used.
I'm considering making separate book files for the New Testament, to reduce
file size and processing time, but to have a valid header in a master file
that would reference the individual book files. Is there a way to accommodate
that, without including the full header in all the files?
Peace,
David
|
Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-17 18:26:10 |
|
Patrick Durusau wrote:[...]
The difference is that of purpose, parenting and form.
A fragment would have purpose in a processing system as an artifact of
processing a whole document. It would have meaning only in that context.
It differs in parenting in that it would be a child of the root <osis>
element.
And in form in that div is milestoneable, a fragment is not and there is
little need for attributes.
More specifically, something like:
<xs:complexType name="osisCT">
  <xs:choice>
    <xs:element name="osisCorpus" type="osisCorpusCT" minOccurs="0"/>
    <xs:element name="osisText" type="osisTextCT" minOccurs="0"/>
    <xs:element name="fragment" type="fragmentCT" minOccurs="0"/>
  </xs:choice>
  <xs:attribute name="TEIform" fixed="TEI.2"/>
</xs:complexType>
<xs:complexType name="fragmentCT" mixed="true">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="a" type="aCT"/>
    <xs:element name="abbr" type="abbrCT"/>
    <xs:element name="chapter" type="chapterCT"/>
    <xs:element name="closer" type="closerCT"/>
    <xs:element name="date" type="dateCT"/>
    <xs:element name="div" type="divCT"/>
    <xs:element name="divineName" type="divineNameCT"/>
    <xs:element name="figure" type="figureCT"/>
    <xs:element name="foreign" type="foreignCT"/>
    <xs:element name="hi" type="hiCT"/>
    <xs:element name="index" type="indexCT"/>
    <xs:element name="inscription" type="inscriptionCT"/>
    <xs:element name="lb" type="lbCT"/>
    <xs:element name="lg" type="lgCT"/>
    <xs:element name="list" type="listCT"/>
    <xs:element name="mentioned" type="mentionedCT"/>
    <xs:element name="milestone" type="milestoneCT"/>
    <xs:element name="milestoneEnd" type="milestoneEndCT"/>
    <xs:element name="milestoneStart" type="milestoneStartCT"/>
    <xs:element name="name" type="nameCT"/>
    <xs:element name="note" type="noteCT"/>
    <xs:element name="p" type="pCT"/>
    <xs:element name="q" type="qCT"/>
    <xs:element name="reference" type="referenceCT"/>
    <xs:element name="salute" type="saluteCT"/>
    <xs:element name="seg" type="segCT"/>
    <xs:element name="signed" type="signedCT"/>
    <xs:element name="speaker" type="speakerCT"/>
    <xs:element name="speech" type="speechCT"/>
    <xs:element name="table" type="tableCT"/>
    <xs:element name="title" type="titleCT"/>
    <xs:element name="transChange" type="transChangeCT"/>
    <xs:element name="verse" type="verseCT"/>
    <xs:element name="w" type="wCT"/>
  </xs:choice>
  <xs:attribute name="canonical" type="xs:boolean" default="true" use="optional"/>
  <xs:attribute name="TEIform" fixed="fragment"/>
</xs:complexType>
[...]
The desire is not how to handle arbitrary XML fragments. We've got that
nailed. We wrap them in a div in an osisText in an osis element. The
problem is that an osisText requires a header to be valid.
We need to wrap the elements in something so the result is well-formed,
as it could be a list of elements and text nodes. We do processing with
xslt and provide the schema so that it can get defaults (and, though not
used here, external entities). For good measure we use a validating
parser.
The problem is that in constructing a search result set consisting of
several thousand passages, each represented as a well-formed and valid
OSIS document, the header is repeated, without value, that number of
times.
So in processing we need to dig down into the document from <osis> to
<osisText>, skip <header> and all its descendants, to the <div> and then
present the children of that div.
It would be easier, in time, space and code complexity, to go from
<osis> to <fragment> and process its children.
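The two traversals compared above can be sketched side by side in Python. This is only an illustration: the sample documents are invented, the OSIS namespace is left out, and `<fragment>` is the proposed element, not part of any released schema.

```python
import xml.etree.ElementTree as ET

# Current layout: content sits under osis/osisText/div, next to a header.
WRAPPED = (
    "<osis><osisText><header><work osisWork='w'/></header>"
    "<div><p>one</p><p>two</p></div></osisText></osis>"
)
# Proposed layout: content sits directly under osis/fragment.
PROPOSED = "<osis><fragment><p>one</p><p>two</p></fragment></osis>"


def children_current(doc: ET.Element) -> list:
    # Descend osis -> osisText -> div, stepping past <header>.
    return list(doc.find("osisText/div"))


def children_proposed(doc: ET.Element) -> list:
    # The content is one level below the root; there is nothing to skip.
    return list(doc.find("fragment"))


current = [ET.tostring(e, encoding="unicode")
           for e in children_current(ET.fromstring(WRAPPED))]
proposed = [ET.tostring(e, encoding="unicode")
            for e in children_proposed(ET.fromstring(PROPOSED))]
print(current == proposed)
```

Both paths yield the same children; the difference is the extra level and the header that has to be parsed and then ignored.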
[...]
I am not really interested in any formal definition, though having one
and tools to support it might be good.
I am more interested in a simple solution to a practical problem.
|
Re: [osis-user] OSIS Fragment
Todd Tillinghast <todd(at)snowfallsoftware.com> |
2006-02-20 12:24:46 |
|
I think the following will support your needs.
You can shorten the header down to the following and indicate the
portion of the whole document (Bible) this XML document contains using
the <scope> element and the <identifier type="OSIS"> to uniquely
identify the work.
(You are taking some risk by defaulting the reference system.)
<?xml version="1.0" encoding="UTF-8"?>
<osis ...>
  <osisText osisIDWork="thisWork" xml:lang="en">
    <header>
      <work osisWork="thisWork">
        <identifier type="OSIS">Bible.en.ABS.CEV.1999</identifier>
        <scope>Gen.1</scope>
        <refSystem>Bible</refSystem>
      </work>
    </header>
    <div scope="Gen.1">...</div>
  </osisText>
</osis>
This would allow you to have only Gen.1 in a standalone document.
If it were a single verse or two you could do the following.
<?xml version="1.0" encoding="UTF-8"?>
<osis ...>
  <osisText osisIDWork="thisWork" xml:lang="en">
    <header>
      <work osisWork="thisWork">
        <identifier type="OSIS">Bible.en.ABS.CEV.1999</identifier>
        <scope>Gen.1.1-Gen.1.2</scope>
        <refSystem>Bible</refSystem>
      </work>
    </header>
    <div scope="Gen.1.1-Gen.1.2"><lg>
      <l level="1"><verse sID="Gen.1.1" osisID="Gen.1.1"/>In the beginning God </l>
      <l level="1">created the heavens </l>
      <l level="2">and the earth.<note osisRef="Gen.1.1" osisID="Gen.1.1!footnote.1" n="a"><reference type="source" osisRef="Gen.1.1">1.1 </reference><catchWord>the heavens and the earth: </catchWord><q level="1">The heavens and the earth</q> stood for the universe.</note> <verse eID="Gen.1.1"/></l>
      <l level="1"><verse sID="Gen.1.2" osisID="Gen.1.2"/>The earth was barren, </l>
      <l level="2">with no form of life;<note osisRef="Gen.1.2" osisID="Gen.1.2!footnote.1" n="b"><reference type="source" osisRef="Gen.1.2">1.1,2 </reference><catchWord>In … life: </catchWord>Or <q level="1">When God began to create the heavens and the earth, the earth was barren with no form of life.</q></note> </l>
      <l level="1">it was under a roaring ocean </l>
      <l level="2">covered with darkness. </l>
      <l level="1">But the Spirit of God<note osisRef="Gen.1.2" osisID="Gen.1.2!footnote.2" n="c"><reference type="source" osisRef="Gen.1.2">1.2 </reference><catchWord>the Spirit of God: </catchWord>Or <q level="1">a mighty wind.</q></note> </l>
      <l level="2">was moving over the water. <verse eID="Gen.1.2"/></l>
    </lg></div>
  </osisText>
</osis>
Todd
|
Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-20 12:59:45 |
|
Todd,
Thanks for your reply. But this is what we already do, as mentioned
earlier in the thread. (Well, an insignificant variation of it.) There
is no risk in defaulting the reference system if the program that
creates the fragment also consumes it.
If you add up the bytes and the processing cycles to chew through
this for an answer set of several thousand fragments (worst case is an
answer set that consists of every other verse), it adds up to be fairly
significant. And all of it is there just to satisfy the schema, only to
be ignored when processing.
What I am looking for is nothing more than an optimization that
allows for a valid OSIS document.
It would be fine to have an attribute or two on the <fragment>
element that would point to the document from which it came so the
header from it could be used, if necessary. This would mitigate any risk
of not having a valid header, e.g. <fragment source="uri for the source"
osisIDWork="work id from the source document">...</fragment>
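A rough Python sketch of how a consumer might follow those two attributes back to the source header. Everything here is hypothetical: the `MASTERS` lookup table, the URI `urn:example:kjv`, the work id `KJV`, and the helper names are invented for illustration, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

MASTER = ET.fromstring(
    "<osis><osisText osisIDWork='KJV'>"
    "<header><work osisWork='KJV'/></header><div/>"
    "</osisText></osis>"
)
# Hypothetical table from source URI to an already parsed source document.
MASTERS = {"urn:example:kjv": MASTER}


def make_fragment(content_xml: str, source: str, work_id: str) -> ET.Element:
    """Build a <fragment> that records where its content came from."""
    frag = ET.fromstring("<fragment>" + content_xml + "</fragment>")
    frag.set("source", source)
    frag.set("osisIDWork", work_id)
    return frag


def header_for(frag: ET.Element, masters: dict):
    """Return the header of the document the fragment points back to."""
    master = masters.get(frag.get("source"))
    if master is None:
        return None
    for osis_text in master.iter("osisText"):
        if osis_text.get("osisIDWork") == frag.get("osisIDWork"):
            return osis_text.find("header")
    return None


frag = make_fragment("<p>In the beginning</p>", "urn:example:kjv", "KJV")
print(header_for(frag, MASTERS) is not None)
```

The header is looked up only when something actually needs it, so the common case pays nothing for it.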
DM
|
Re: [osis-user] OSIS Fragment
Patrick Durusau <patrick(at)durusau.net> |
2006-02-20 13:34:45 |
|
DM,
You may have already answered your own question without realizing it. I
like those. ;-)
Let me walk through what I think you are doing/need to do and you tell
me when I jump off the track:
1. You have any number of OSIS documents, all of which can be validated
against the OSIS schema.
2. You want to search those documents and return fragments that meet
some search criteria.
3. At no point do you want to re-validate the fragments, as you already
know they are from valid OSIS documents.
4. What you do need to know is what content model you are going to see
in any particular fragment that is being processed (for XSLT purposes,
for example).
5. So, at this point you are including the header because it is required
to have a valid OSIS document.
Have I captured it more or less accurately so far?
But if validity is not an issue, then why not use SAX to process the
fragments, without a schema at all?
Just stick whatever wrapper you want around the fragments and roll on.
Or is there some reason for needing information from the schema?
Or do you want to produce a valid OSIS document from all of the
fragments with only one header?
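The schema-free SAX route mentioned above could look roughly like this in Python. The handler class, the `scan_fragment` helper and the wrapper element name are assumptions chosen for the example; no validation and no schema defaults are applied.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collect element names and character data; no schema is consulted."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.text = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        self.text.append(content)


def scan_fragment(fragment_xml: str) -> TextCollector:
    handler = TextCollector()
    # Any wrapper element will do; "wrapper" is an arbitrary name.
    wrapped = "<wrapper>" + fragment_xml + "</wrapper>"
    xml.sax.parseString(wrapped.encode("utf-8"), handler)
    return handler


result = scan_fragment('<verse osisID="Gen.1.1">In the beginning</verse> God')
print(result.names, "".join(result.text))
```

This checks well-formedness only, so anything the schema would have supplied, such as default attribute values, is simply absent.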
Hope you are having a great day!
Patrick
|
Re: [osis-user] OSIS Fragment
"Steven J. DeRose" <sderose(at)acm.org> |
2006-02-20 13:54:45 |
|
My first inclination was the same as Patrick's -- looks a lot like a
<div>. But I see your point about the header.
You could use a very abbreviated header as Todd pointed out; that has
the advantage that no one has to create or maintain a separate
schema, that you could use all the same tools, etc. etc. That may
well be your best bet.
Of course, if this is purely for use within a particular system, you
could do just about anything. Perhaps the easiest would be to just
take the OSIS schema, and promote div to be the root element (or make
something like <fragment-set> for the top), remove the
milestonability for div, and just go with that. It wouldn't be
official, but as long as it's within one system, that's no problem.
If you wanted to export or interchange with other software, though,
you'd want to make sure there was a totally trivial way to convert --
say, "Save As Full Document" that fills in the header (perhaps you'd
have a shared header pointed to from all your fragments).
How many fragments do you expect to be working with? Can you say a bit
more about the application domain? Is there a general application to
this where you think we should consider creating a formal way to
share headers in a later version of OSIS, perhaps?
S
|
Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-20 14:19:45 |
|
Patrick Durusau wrote:[...]
But may not have. In the example of the SWORD module in OSIS, we might
want to assume that the document has been validated. But as far as I
know no document has been validated. I have a process that does not
assume that the input is good, but has rudimentary ability to scrub bad
data when found. In fact, some are not even well-formed, let alone valid.
[...]
Again, I don't know that the fragments are valid. And it may be that the
fragment is not well-formed and needs to be adjusted to make it
well-formed. An example of this would be a fragment that consists of a
single verse, where the document markup is forced to use the milestoned
form of verses, and the verse returned contains either the start or the
end of a container element but not the other. In this case, the software
needs to detect that it is not well-formed (which it does) and upon
exception enter an error handling routine that detects the anomaly and
synthetically adds the missing piece.
In this instance, it would be nice to be able to validate against the
schema to know if the error handler did things correctly.
But you are right that if the document were known to be valid then the
fragments would not need to be re-validated unless they were not
well-formed.
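To make the repair step described above concrete, here is a deliberately simple Python sketch. It is not the BibleDesktop error handler: the regular expression, the `repair` and `parse_fragment` names, and the strategy of balancing tags only at the edges of the fragment are all assumptions. It ignores comments, CDATA and namespace prefixes.

```python
import re
import xml.etree.ElementTree as ET

# Matches start, end and empty-element tags and captures the element name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^>]*?(/?)>")


def repair(fragment_xml: str) -> str:
    """Add synthetic start/end tags so containers cut at the edges balance."""
    open_stack = []      # start tags still waiting for an end tag
    missing_starts = []  # end tags whose start tag is outside the fragment
    for closing, name, self_closing in TAG.findall(fragment_xml):
        if self_closing:
            continue  # milestones such as <verse sID="..."/> need no partner
        if not closing:
            open_stack.append(name)
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()
        else:
            missing_starts.append(name)
    prefix = "".join("<%s>" % n for n in reversed(missing_starts))
    suffix = "".join("</%s>" % n for n in reversed(open_stack))
    return prefix + fragment_xml + suffix


def parse_fragment(fragment_xml: str) -> ET.Element:
    try:
        return ET.fromstring("<div>" + fragment_xml + "</div>")
    except ET.ParseError:
        # Not well-formed: add the missing pieces, then parse again.
        return ET.fromstring("<div>" + repair(fragment_xml) + "</div>")


fixed = parse_fragment('created the heavens </l><l level="2">and the earth.')
print(ET.tostring(fixed, encoding="unicode"))
```

The sketch only copes with containers that were opened before or closed after the fragment boundary; tags that are mis-nested inside the fragment will still raise `ParseError` on the second attempt. Nothing here confirms that the repaired result is valid OSIS, which is the reason for wanting schema validation at this point.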
[...]
More or less.
[...]
The schema declares default values for attributes. With the schema, it
is possible to know what the default values are but not otherwise.
[...]
Default and fixed values for attributes. And, if you use them in 2.5,
internal and external entities defined or referenced in the schema.
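As an illustration of what lives only in the schema, the following Python sketch reads default and fixed attribute values out of a schema snippet and applies them to an instance element. A validating, schema-aware parser would normally do this itself; the helper names are invented, and the snippet reuses the `canonical` and `TEIform` attributes from the `fragmentCT` proposal earlier in the thread rather than the real OSIS schema.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="fragmentCT" mixed="true">
    <xs:attribute name="canonical" type="xs:boolean" default="true"/>
    <xs:attribute name="TEIform" fixed="fragment"/>
  </xs:complexType>
</xs:schema>
"""


def attribute_defaults(schema_xml: str, type_name: str) -> dict:
    """Map attribute name -> default or fixed value for one complexType."""
    schema = ET.fromstring(schema_xml)
    defaults = {}
    for ctype in schema.iter(XS + "complexType"):
        if ctype.get("name") != type_name:
            continue
        for attr in ctype.iter(XS + "attribute"):
            value = attr.get("default", attr.get("fixed"))
            if value is not None:
                defaults[attr.get("name")] = value
    return defaults


def apply_defaults(element: ET.Element, defaults: dict) -> None:
    # Only fill in attributes the instance document left unstated.
    for name, value in defaults.items():
        if element.get(name) is None:
            element.set(name, value)


fragment = ET.fromstring("<fragment><p>text</p></fragment>")
apply_defaults(fragment, attribute_defaults(SCHEMA, "fragmentCT"))
print(fragment.attrib)
```

Without the schema, a stylesheet looking at `@canonical` on this element would see nothing at all.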
[...]
We do ultimately assemble all the fragments into a synthetic document
with one header. Then we pass that to xslt. Currently we wrap each
fragment with a div. In this context the overhead of a header and the
extra level introduced by osisText is minimal. It is the processing to
this point that is significant.
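The final assembly step might be sketched like this in Python: per-fragment OSIS documents go in, one document with a single header and one div per fragment comes out. The `merge` function, the sample documents, and the assumption that every synthesized header is an identical copy are all hypothetical, and namespaces are omitted.

```python
import xml.etree.ElementTree as ET


def merge(documents: list) -> ET.Element:
    """Merge per-fragment OSIS documents into one document with one header."""
    merged = ET.Element("osis")
    merged_text = ET.SubElement(merged, "osisText")
    for index, doc in enumerate(documents):
        osis_text = doc.find("osisText")
        if index == 0:
            # Keep the first header and the osisText attributes; the others
            # are assumed to be identical synthesized copies.
            merged_text.attrib.update(osis_text.attrib)
            merged_text.append(osis_text.find("header"))
        for div in osis_text.findall("div"):
            merged_text.append(div)
    return merged


TEMPLATE = (
    "<osis><osisText osisIDWork='w'><header><work osisWork='w'/></header>"
    "<div><p>%d</p></div></osisText></osis>"
)
merged = merge([ET.fromstring(TEMPLATE % i) for i in range(3)])
print(ET.tostring(merged, encoding="unicode"))
```

All but one of the input headers are parsed and then thrown away here, which is the per-fragment cost incurred before the documents ever reach this step.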
|
Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-20 14:34:45 |
|
Steven J. DeRose wrote:[...]
The application I am talking about is BibleDesktop
(www.crosswire.org/bibledesktop). Currently it searches one SWORD module
at a time. It does not matter what it is encoded in. It could be ThML,
GBF, PlainText or OSIS. If a search result is not in OSIS, we convert it
into OSIS (without schema validation, who knows if our transformation is
correct!) and then do further processing.
The worst case scenario would be a search that would return every other
verse of a bible or nearly 16,000 answers. We plan to extend the ability
to search multiple bibles at the same time. So take the number of
English SWORD Bible modules and multiply it out.
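A back-of-the-envelope Python sketch of that multiplication. The header string is the abbreviated one Todd posted, used only as a lower bound; a real synthesized header may be larger, the `<osis>` and `<osisText>` wrappers are not counted, and parse time is not modelled. The function name and the module count in the example call are assumptions.

```python
# Abbreviated header from earlier in the thread, written on one line.
HEADER = (
    '<header><work osisWork="thisWork">'
    '<identifier type="OSIS">Bible.en.ABS.CEV.1999</identifier>'
    '<scope>Gen.1.1</scope><refSystem>Bible</refSystem>'
    '</work></header>'
)


def header_overhead(results: int, modules: int = 1) -> int:
    """Bytes spent on repeated headers when each result is its own document."""
    return len(HEADER.encode("utf-8")) * results * modules


# Worst case from the message: about 16,000 answers per Bible.
print(header_overhead(16_000))
# The same search across a hypothetical ten modules.
print(header_overhead(16_000, modules=10))
```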
[...]
Yes, I understand that the outcome of this request is "perhaps later".
In a SWORD application the context is a lookup of a requested passage or
a search for qualifying verses/passages. It is not the entire document.
So at the time of a lookup or a search result, the header is not present
in what is returned. It would be good to have a formalism that specifies
the reuse of the header from the "master" document: something that
declares that this fragment is taken directly from that document.
|
Re: [osis-user] OSIS Fragment
DM Smith <dmsmith555(at)yahoo.com> |
2006-02-20 15:29:44 |
|
Steven J. DeRose wrote:[...]
In my response, I meant to mention that I am not talking about a
separate schema but a simple extension to the current one. I included it
in another e-mail to this thread: all that is needed is to make fragment
be a top level choice along with osisText.
Oh, and the name of the element does not matter to me. <osisChunk> would
work as well.
|
|