USFM --> TE <--OSIS
Jim_Albright(at)wycliffe.org
2004-05-23 21:44:16
I am working on creating style names inside the SIL Translation Editor (TE)
that are flat in nature, as in Word (Paragraph and Character styles).
To show the correspondence I am creating a table that maps each USFM code
to a TE style name and also maps the matching OSIS construct to TE.

USFM    TE                      OSIS
\p      Paragraph               p
So '\p' in USFM is converted to 'Paragraph' in TE, and a 'p' in OSIS
also maps to 'Paragraph' in TE.

\m      Paragraph_Continuation  p[@type='?continuation?']
Here '\m' in USFM maps to 'Paragraph_Continuation' in TE, and in OSIS
a p with type='continuation' maps to 'Paragraph_Continuation' in TE.

I have put ?xxx? around a value to indicate that it does not exist in the
enumerated list of attribute values. I would like to see these values added.

I have around 150 tags for which I am now working out the OSIS to TE
mapping, but I wanted to check with you all to see whether what I am doing
makes sense to you. In other words, if you started with a properly marked
up OSIS document produced by a TE to OSIS transformation, you should be
able to do an OSIS to TE transformation using the OSIS column in the
template match. For example, with one more row in the table:

\pi     Letter_Paragraph        q[@type='?letter?']

the templates would be something like:

<xsl:template match="p">
        <!-- create a TE 'Paragraph' -->
</xsl:template>
<xsl:template match="p[@type='?continuation?']">
        <!-- create a TE 'Paragraph_Continuation' -->
</xsl:template>
<xsl:template match="q[@type='?letter?']">
        <!-- create a TE 'Letter_Paragraph' -->
</xsl:template>



So my first question: is this type of presentation understandable to
someone who knows OSIS?

Next question: Do you see some other construction to handle the Paragraph 
Continuation? See NIV MRK 1.6 for an example.


Jim Albright
704 843-0582
Wycliffe Bible Translators

RE: [osis-user] USFM --> TE <--OSIS
"Todd Tillinghast" <todd(at)contentframeworks.com>
2004-05-25 12:15:25
Jim,

Because USFM is a linear sequence of markers and XML (OSIS) is
hierarchical, there is a need in USFM to indicate that something that
started before and was interrupted by, or "contains", something else is
continuing.

The result is that there is no direct mapping of the \m format marker
to an OSIS element.  However, while \m is used this way 99% of the time,
it is possible that someone could use \m to indicate a "flush left"
starting paragraph that is not a continuation.  In this case there is no
standardized OSIS mechanism to differentiate a "flush left" paragraph
from a "normal" paragraph, because all that is present is formatting
information and not the semantic meaning behind why it is formatted
"flush left".

Possibly more troubling to many USFM users is the similar mapping of the
\b format marker when encoding an OSIS document.  There is no "blank
line" in OSIS.  The \b format marker is most commonly used after (or
between) blocks of poetry.  There are a few rare cases where it is used
to separate sections when the section does not have a section title.

In OSIS you would simply encode the following example:
<p>
   ... text and elements of the paragraph ...
   <q type="block">
      ... text and elements of the block quote ...
   </q>
   ... continuation of the paragraph that would be marked with a \m format marker ...
</p>


The following encoding would not be correct:

<p>
   ... text and elements of the paragraph ...
</p>
<q type="block">
   ... text and elements of the block quote ...
</q>
<p type="continuation">
   ... continuation of the paragraph that would be marked with a \m format marker ...
</p>

Nor would this one:

<p>
   ... text and elements of the paragraph ...
   <q type="block">
      ... text and elements of the block quote ...
   </q>
</p>
<p type="continuation">
   ... continuation of the paragraph that would be marked with a \m format marker ...
</p>

Todd

RE: [osis-user] USFM --> TE <--OSIS
Jim_Albright(at)wycliffe.org
2004-05-25 15:03:59
So how would I map the Paragraph Continuation from OSIS to TE, since TE
lives in the flat world?

As for the blank line that marks a section without a title ... I handle
that with a blank section.

Jim Albright
704 843-0582
Wycliffe Bible Translators

RE: [osis-user] USFM --> TE <--OSIS
"Todd Tillinghast" <todd(at)contentframeworks.com>
2004-05-25 16:45:52
Jim,

I am not that familiar with TE but if I were going to USFM or SFM from
OSIS (I suspect you can make the translation to TE) I would look for
cases where a paragraph is interrupted by another structure and then
output a \m format marker.
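
The approach described above can be sketched in XSLT 1.0. This is a minimal illustration under stated assumptions, not a complete converter: it handles only a block quote nested directly in a paragraph, the \pi marker written for the quote is a placeholder rather than a confirmed mapping, and the OSIS namespace URI should be checked against the schema version in use.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: OSIS to USFM-like text for one case,
     a block quote nested directly inside a paragraph. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace">

  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes that come from source indentation. -->
  <xsl:strip-space elements="*"/>

  <!-- A paragraph opens with \p; its mixed content is processed in order. -->
  <xsl:template match="osis:p">
    <xsl:text>&#10;\p </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- A block quote inside a paragraph interrupts it.
       \pi is a placeholder marker (assumption), not a confirmed mapping. -->
  <xsl:template match="osis:p/osis:q[@type='block']">
    <xsl:text>&#10;\pi </xsl:text>
    <xsl:apply-templates/>
    <!-- Emit \m only when the next element or non-blank text node in the
         paragraph is something other than another block quote. -->
    <xsl:if test="following-sibling::node()[self::* or self::text()[normalize-space()]][1][not(self::osis:q[@type='block'])]">
      <xsl:text>&#10;\m </xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Other interrupting structures (lists, tables, poetry) would each need a similar template, and which of them call for \m depends on the rules of the target format.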

Going from TE to OSIS I would do the same thing I would do with USFM.

Todd

RE: [osis-user] USFM --> TE <--OSIS
Jim_Albright(at)wycliffe.org
2004-05-25 17:13:25
Then if this is your xml[...]

in xslt

<xsl:template match='p' >
        <xsl:element name='Paragraph'>
                <xsl:apply-styles/>
        </xsl:element>
</xsl:template>

<xsl:template match='q/@block'  >
        <xsl:element name='Letter_Paragraph'>
                <xsl:apply-styles/>
        </xsl:element>
</xsl:template>


<xsl:template match='text()[parent::p][preceding-sibling::*]'   >
        <xsl:element name='Paragraph_Continuation'>
                <xsl:apply-styles/>
        </xsl:element>
</xsl:template>

will do the conversion back to TE.

Where in the documentation does it say that neither of the two encodings
you call incorrect is allowed? With <p type='x-yyyy'> it seemed that I
could do just about anything.


Jim Albright
704 843-0582
Wycliffe Bible Translators

RE: [osis-user] USFM --> TE <--OSIS
"Todd Tillinghast" <todd(at)contentframeworks.com>
2004-05-26 07:25:08
Jim,

See below.

Todd
[...]

This is likely the general idea, but there are a few problems with your
XSLT.

1) You can't use the preceding-sibling axis in a match statement.
2) You have to namespace-qualify all OSIS elements in an XSLT because the
namespace is required by the schema.
3) I think you mean <xsl:template match="osis:q[@type='block']">
4) I think you mean <xsl:apply-templates/> rather than
<xsl:apply-styles/>
5) The last template statement is the right idea (assuming you could use
preceding-sibling) but you would not want to use * because only some
markers require a continuation format marker.  There could be a <note>,
<divineName type="yhwh">, <index>, <w>, etc.  This logic would depend
on the rules of the file format you are transforming to.
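
A minimal sketch of the earlier templates with points 2 to 4 applied and the * of point 5 narrowed to a single interrupting element, the block quote. Assumptions: the OSIS namespace URI shown should be verified against the schema version in use, the TE element names are the ones from the earlier message, and the output stays nested exactly as those templates left it (flattening it for TE is not attempted). On point 1, the XSLT 1.0 pattern grammar limits only the steps of a pattern to the child and attribute axes; a predicate may hold any XPath expression, so preceding-sibling inside [...] should be accepted by a conforming processor, though that is worth testing with the processor in use.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
    exclude-result-prefixes="osis">

  <!-- Point 2: every OSIS element name carries the osis: prefix. -->
  <xsl:template match="osis:p">
    <Paragraph>
      <!-- Point 4: apply-templates, not apply-styles. -->
      <xsl:apply-templates/>
    </Paragraph>
  </xsl:template>

  <!-- Point 3: test the type attribute inside a predicate. -->
  <xsl:template match="osis:q[@type='block']">
    <Letter_Paragraph>
      <xsl:apply-templates/>
    </Letter_Paragraph>
  </xsl:template>

  <!-- Point 5: only text that follows a block quote counts as a
       continuation, not text after a note, w, divineName and so on.
       Each qualifying text node is wrapped separately. -->
  <xsl:template match="osis:p/text()[normalize-space()][preceding-sibling::osis:q[@type='block']]">
    <Paragraph_Continuation>
      <xsl:value-of select="."/>
    </Paragraph_Continuation>
  </xsl:template>

</xsl:stylesheet>
```

The predicate in the last template is where the target format's rules would go: each structure that requires a continuation marker would be listed there in place of the single block quote.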

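> Where in the documentation does it say you can't do either of the ways you
> say you can't do? It seemed with <p type='x-yyyy'> that I could just about
> do anything.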

This was a point I brought up for consideration at the last OSIS meeting,
and I suspect it has not made it into the user manual yet.

Of course you can encode just about anything as an OSIS document,
especially if you use "x-...".  If you take this to its extreme, you
could encode a single <div> element with a series of
<seg type="x-[format marker]"> elements.

RE: [osis-user] USFM --> TE <--OSIS
Jim_Albright(at)wycliffe.org
2004-05-26 09:57:32
See <<<<<<<<<

Jim Albright
704 843-0582
Wycliffe Bible Translators







"Todd Tillinghast" <todd(at)contentframeworks.com>
05/25/2004 06:39 PM
Please respond to osis-user

 
        To:     <osis-user(at)whi.wts.edu>
        cc: 
        Subject:        RE: [osis-user] USFM --> TE <--OSIS


Jim,

See below.

Todd
[...]

This is likely the general idea, but there are a few problems with your
XSLT.

1) You can't use the preceding-sibling axis in a match statement.
2) You have to namespace-qualify all OSIS elements in an XSLT because the
namespace is required by the schema.
3) I think you mean <xsl:template match="osis:q[@type='block']">
4) I think you mean <xsl:apply-templates/> rather than
<xsl:apply-styles/>
5) The last template statement is the right idea (assuming you could use
preceding-sibling) but you would not want to use * because only some
markers require a continuation format marker.  There could be a <note>,
<divineName type="yhwh">, <index>, <w>, etc.  This logic would depend
on the rules of the file format you are transforming to.
<<<<<<<<<< Thank you for pointing out the errors.
<<<<<<<<<< In XSEM we defined this problem area as [...]
<<<<<<<<<< Possibly one of the reasons for doing so is to avoid having to
<<<<<<<<<< specify all of the possible interruptors.
<<<<<<<<<< I don't know for sure now, but it sure does simplify formatting.
<<<<<<<<<< Yes, I know that technically your solution is correct.
<<<<<<<<<< In practice it is more difficult to implement, with no other
<<<<<<<<<< gain that I can see.
<<<<<<<<<< I am approaching the problem from (1) being easily able to
<<<<<<<<<< convert back to the USFM \m notation and (2) publication.
<<<<<<<<<< Using <p type='continuation'> handles both situations easily.
<<<<<<<<<< We can easily check whether someone used <p type='continuation'>
<<<<<<<<<< right after section/head, turn it back into just <p>, and
<<<<<<<<<< handle the formatting difference by looking for the first p
<<<<<<<<<< right after a section/head. In practice this is a rare event
<<<<<<<<<< to find in the wild.
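
The check described in the last comment could be sketched as an identity transform with one override. This is only an illustration under assumptions: type='continuation' is the proposed value, not one the schema enumerates; head is the element name used in this message and should be replaced by whatever heading element the schema version defines; and the namespace URI should be verified against the schema.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace">

  <!-- Identity rule: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A "continuation" paragraph whose nearest preceding sibling element
       is the section heading has nothing to continue, so it is copied
       without its type attribute and becomes a plain p again.
       osis:head is an assumed element name taken from the message. -->
  <xsl:template match="osis:p[@type='continuation'][preceding-sibling::*[1][self::osis:head]]">
    <xsl:copy>
      <xsl:apply-templates select="@*[name() != 'type']|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The formatting side (a flush-left first paragraph after a heading) would then be decided at rendering time from the position of the p, not from a type value.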

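> Where in the documentation does it say you can't do either of the ways you
> say you can't do? It seemed with <p type='x-yyyy'> that I could just about
> do anything.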

This was a point I brought up for consideration at the last OSIS meeting,
and I suspect it has not made it into the user manual yet.

Of course you can encode just about anything as an OSIS document,
especially if you use "x-...".  If you take this to its extreme, you
could encode a single <div> element with a series of
<seg type="x-[format marker]"> elements.
<<<<<<<<<< I am asking for more enumeration of attribute values so that
<<<<<<<<<< this case doesn't come up.
<<<<<<<<<< It is fine to allow for x-..., but to me once you extend the
<<<<<<<<<< "standard" you no longer have "ONE STANDARD".
<<<<<<<<<< I hope that OSIS will enumerate the attribute values allowed.
<<<<<<<<<< If you want to allow the user to extend OSIS, okay, but then
<<<<<<<<<< they should know that their conformance to a pure STANDARD has
<<<<<<<<<< been broken. If the user finds this necessary, it would seem
<<<<<<<<<< that OSIS was lacking something important or that the user
<<<<<<<<<< didn't understand the proper encoding.
<<<<<<<<<< I would love to see LOTS more examples of proper encoding in
<<<<<<<<<< the user manual.

RE: [osis-user] USFM --> TE <--OSIS
"Todd Tillinghast" <todd(at)contentframeworks.com>
2004-05-26 11:40:12
Jim,

<snip>

Actually, when it comes to formatting and ease of use, part of it is just
a shift in thinking, part depends on the environment you are rendering
to, and part depends on how nested your content is.

If you are outputting to XSL-FO you can simply open a new <fo:block>
when the paragraph starts, then open a nested <fo:block> for the element
that "interrupts" (or "is nested within") the paragraph, and then simply
continue the text of the paragraph.
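
A minimal sketch of that XSL-FO approach, assuming the OSIS namespace URI shown (to be verified against the schema) and arbitrary spacing and indent values. It is a fragment: the fo:root, page masters and flow that a complete FO document needs are left out.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
    exclude-result-prefixes="osis">

  <!-- The paragraph becomes one outer block. Text before and after a
       nested structure stays in this same block, so no continuation
       style is needed. -->
  <xsl:template match="osis:p">
    <fo:block space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- The interrupting block quote becomes a nested, indented block.
       The indent values are illustrative only. -->
  <xsl:template match="osis:q[@type='block']">
    <fo:block start-indent="2em" end-indent="2em">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Because the output nesting mirrors the source nesting, the stylesheet never has to ask whether a piece of text is a continuation.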

If there are a lot of nested structures the continuation model becomes
more burdensome than the nested model.  Consider the following:

<div type="section">
   <p>
      ... some text...
      <list>
        <item>
           <p>
               ... some text...
               <list>
                   <item>
                   </item>
                   <item>
                   </item>
               </list>
                ... some text...
               <list>
                   <item>
                   </item>
                   <item>
                   </item>
               </list>
               ... some text...
           </p>
        </item>
      </list>
      ... some text...
      <list>
        <item>
           <p>
               ... some text...
               <list>
                   <item>
                   </item>
                   <item>
                   </item>
               </list>
                ... some text...
               <list>
                   <item>
                   </item>
                   <item>
                   </item>
               </list>
               ... some text...
           </p>
        </item>
      </list>
      ... some text...
  </p>
</div>

I ran into cases like this one (I did not look up the exact case) in the
NIV.

I think that, regardless of what sort of continuation structure is
conceived, two things will be true.  First, there will be counter cases
to the continuation structures identified.  Second, the rendering process
will not be simplified, only complicated, because there will be an
additional option to consider on top of what is already standard practice
(both in OSIS and in general XML encoding of text documents).

When it comes to using an OSIS document with nested elements rather than
the suggested continuation strategy, I think the nested strategy
actually lends itself to being easier to process rather than harder.

However, when it comes to transforming to a linear format that requires
continuation markers in some cases and not in others, and possibly
different continuation markers for different things being continued, then
it would certainly be easier if the OSIS document paralleled the
continuation strategy required by the target format.

<snip>

I am in favor of more standard enumerated attribute values.  As I have
been working through all of the USFM format markers, I have identified a
few cases that need to be addressed.

When it comes to adding <p type="continuation">, there is a standard way
of encoding without adding the type="continuation" enumerated value.
This may not be what you would prefer from your perspective, but
there is a standard way of handling it.

I would hope that there would be very little if any use of x-... within
OSIS documents.

What enumerated type values do you think are missing?

Todd
