OSIS Schema & Best Practices Issues
Contents
- 1. Schema Bugs, Errors, Fixes
- 1.1. Dead Elements - Removal Suggested
- 1.1.1. <cell>
- 1.1.2. milestones
- 2. Content Model Issues
- 3. Best Practices
- 3.1. Major Issues
- 3.1.1. Levels of Encoding
- 3.1.2. Milestones: Start and Stop
- 3.1.3. Predominant hierarchy
- 3.1.4. Quotes
- 3.1.5. Text in Verses
- 3.1.6. Verse splits
- 3.2. Lesser Issues
- 3.2.1. blockQuote vs. Speech
- 3.2.2. Book Titles
- 3.2.3. catchWord
- 3.2.4. Complex or discontinuous text
- 3.2.5. Continuing Paragraph
- 3.2.6. Copyright pages
- 3.2.7. Cross-References in <title>
- 3.2.8. Dictionary
- 3.2.9. <div> following <osisText>
- 3.2.10. Dublin Core
- 3.2.11. endings, multiple
- 3.2.12. Footnotes
- 3.2.13. Identifier with element
- 3.2.14. Identity of books, works
- 3.2.15. Introduction
- 3.2.16. Introduction content
- 3.2.17. Lines within a line group
- 3.2.18. Major and minor divisions
- 3.2.19. Matthew text example
- 3.2.20. Milestone Pairs
- 3.2.21. Misc. but common structures
- 3.2.22. Non-canonical text and speech
- 3.2.23. Notes
- 3.2.24. Parallel passages
- 3.2.25. Poetry
- 3.2.26. Presentation Punctuation in References
- 3.2.27. Reference encoding
- 3.2.28. Reference to entire work
- 3.2.29. Special Information
- 3.2.30. Split
- 3.2.31. Stanza
- 3.2.32. Title Page
- 3.2.33. Translator practices
- 3.2.34. Verse value
- 3.2.35. Work related practices
2. Content Model Issues
2.1. <div> attributes
Add section, front, body, back, titlePage, introduction, index, preface, afterword, colophon, and lexeme to the allowed values of the type attribute on <div>.
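A minimal Python sketch of how a checker might accept the proposed values. It assumes user-defined values keep the "x-" prefix used elsewhere in this document (e.g. "x-section"). The names `PROPOSED_DIV_TYPES`, `is_valid_div_type`, and `invalid_div_types` are hypothetical, and the schema's current enumeration is not reproduced here; it would be passed in as `existing`.

```python
import xml.etree.ElementTree as ET

# Values proposed above for <div type="...">. Hypothetical constant: the
# schema's existing enumeration is supplied separately via `existing`.
PROPOSED_DIV_TYPES = frozenset({
    "section", "front", "body", "back", "titlePage", "introduction",
    "index", "preface", "afterword", "colophon", "lexeme",
})


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def is_valid_div_type(value: str, existing: frozenset = frozenset()) -> bool:
    """Accept a proposed value, an already-defined value, or an 'x-' extension."""
    return value in PROPOSED_DIV_TYPES or value in existing or value.startswith("x-")


def invalid_div_types(xml_text: str, existing: frozenset = frozenset()) -> list:
    """Return the type values on <div> elements that would be rejected."""
    root = ET.fromstring(xml_text)
    return [
        div.get("type")
        for div in root.iter()
        if local(div.tag) == "div"
        and div.get("type") is not None
        and not is_valid_div_type(div.get("type"), existing)
    ]
```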
2.2. Insert <divineName> in <catchWord>
The <divineName> element is meant to control how styling is applied. If it is not allowed in <catchWord>, encoders must fall back on ad hoc methods, leading to inconsistent encoding and more complex stylesheets.
2.3. lang/script/ews
The lang vs. xml:lang issue has already been identified. I think we should also consider adding a script attribute wherever lang currently appears. (Plenty of use cases exist, Cyrillic vs. Latin for Serbian being the most recognizable.) If I recall correctly, TEI has a similar facility for identifying script.
In terms of best practices for these attributes:
lang should follow RFC 3066. (Currently the only mention of a language RFC in the schema is a reference to RFC 1766, which RFC 3066 obsoletes, in the language element.)
In addition, we should specify best practices for languages not covered by ISO 639. x-E-... was previously suggested as a best practice for identifying languages listed in the Ethnologue, but common practice at SIL, and according to LINGUIST List, seems to be to use x-SIL-...
Additionally, I would recommend that we specify LINGUIST List's codes for languages absent from both ISO 639 and the Ethnologue, using something like x-LING-... (Their codes are available here: http://saussure.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/forms/langs/GetListOfAncientLgs.cfm http://saussure.linguistlist.org/cfdocs/new-website/LL-WorkingDirs/forms/langs/GetListOfConstructedLgs.cfm)
If we choose to add a script attribute, ISO 15924 would be the appropriate standard to follow, but it is not yet final. Its codes follow one of two patterns: [A-Z][a-z]{3} or [0-9]{3}. (Codes can be found here: http://www.evertype.com/standards/iso15924/document/dis15924.pdf)
I still don't know why ews is necessary, but it should at least be confined to some set of standard values if such a thing exists.
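The syntax rules above are easy to check mechanically. This Python sketch tests only the form of a value: RFC 3066's "1-8 letters, then hyphen-separated subtags of 1-8 letters or digits", and the draft ISO 15924 patterns quoted above. It does not confirm that a code is actually registered. The x-SIL-/x-LING- handling reflects the proposals in this section, and all function names are hypothetical.

```python
import re

# RFC 3066 syntax: a primary subtag of 1-8 letters, followed by any number of
# hyphen-separated subtags of 1-8 letters or digits (case-insensitive).
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Draft ISO 15924 patterns quoted above: [A-Z][a-z]{3} or [0-9]{3}.
ISO15924_CODE = re.compile(r"[A-Z][a-z]{3}|[0-9]{3}")


def is_valid_lang(tag: str) -> bool:
    """Check the form of a lang value only; registration is not verified."""
    return RFC3066_TAG.fullmatch(tag) is not None


def is_valid_script(code: str) -> bool:
    """Check the form of a proposed script value against the draft patterns."""
    return ISO15924_CODE.fullmatch(code) is not None


def private_use_registry(tag: str):
    """Return 'SIL' or 'LING' for the private-use prefixes proposed above, else None."""
    parts = tag.split("-")
    if len(parts) >= 3 and parts[0].lower() == "x" and parts[1].upper() in ("SIL", "LING"):
        return parts[1].upper()
    return None
```

For the Serbian case, a document could then carry lang="sr" together with script="Cyrl" or script="Latn", where script is the proposed attribute rather than an existing one.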
2.5. osisID as a list, pointed at by osisRef with grain
Example: given osisID="Matt.1.2 Matt.1.3 Matt.1.4", would osisRef="Matt.1.2@ch14" be the SAME as osisRef="Matt.1.3@ch14"?
The problem is that a grain reference needs a definite starting point, and that can only be the first osisID. The other issue is blind pointing: how do I know whether the author has used osisID as a list?
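To make the ambiguity concrete, here is a Python sketch that resolves a grain reference against an element whose osisID is a list. It reads "@chN" as a character offset, which is an assumption based only on the example above, and it counts from the start of the element's text, i.e. from the first osisID. `split_ref` and `resolve` are hypothetical helpers.

```python
def split_ref(osis_ref: str):
    """Split 'Matt.1.3@ch14' into ('Matt.1.3', 14).

    Reading '@chN' as a character offset is an assumption taken from the
    example in the text, not a documented OSIS grain definition."""
    target, _, grain = osis_ref.partition("@")
    if not grain.startswith("ch") or not grain[2:].isdigit():
        raise ValueError(f"unsupported grain: {grain!r}")
    return target, int(grain[2:])


def resolve(osis_ids: str, osis_ref: str, text: str):
    """Return the element text from the position a grain reference selects.

    The element carries a space-separated osisID list. The only available
    starting point is the start of the element, i.e. the first osisID, so
    every id in the list selects the same position. Returns None if the
    reference does not name this element."""
    target, offset = split_ref(osis_ref)
    if target not in osis_ids.split():
        return None
    if offset > len(text):
        raise ValueError("grain offset runs past the element's text")
    return text[offset:]
```

Under this rule Matt.1.2@ch14 and Matt.1.3@ch14 select the same text, and a referencing document that sees only osisRef="Matt.1.3@ch14" cannot tell from the reference itself that the offset is counted from Matt.1.2, which is the blind-pointing problem.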
3. Best Practices
3.1. Major Issues
3.1.1. Levels of Encoding
Category A: Bibles with just the scripture text and no notion of paragraphs and organized with a book/chapter/verse hierarchy.
Category B: Bibles with paragraphs but no sections, with the paragraphs held by chapter <div> elements.
Category C: Bibles with sections and paragraphs, where sections as <div type="x-section"> elements contain paragraphs.
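The three categories can be told apart mechanically. A minimal Python sketch, assuming sections are marked exactly as <div type="x-section"> and that any <p> means paragraphs are present; it checks only for presence, not the containment the definitions also describe. `encoding_category` is a hypothetical name. A document with sections but no paragraphs fits none of the categories, so the sketch rejects it rather than guessing.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def encoding_category(xml_text: str) -> str:
    """Return 'A', 'B', or 'C' for a Bible document, per the categories above."""
    elements = list(ET.fromstring(xml_text).iter())
    has_sections = any(
        local(e.tag) == "div" and e.get("type") == "x-section" for e in elements
    )
    has_paragraphs = any(local(e.tag) == "p" for e in elements)
    if has_paragraphs:
        return "C" if has_sections else "B"
    if has_sections:
        raise ValueError("sections without paragraphs fit none of the categories")
    return "A"
```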
3.1.6. Verse splits
How to handle a <verse> element split across <lg>, <list>, and <table>, and whether the schema should be changed.
Chris: (Personally, I think splitIDs are a bad thing in every circumstance where I've been forced to use them. They force text to be encoded in an extremely unnatural manner.)
Allowing <l> inside of <verse> and allowing <l> to not require <lg> seems like it would solve the line-related part of the problem.
It seems that issue 3.2.31 (Stanza) was the reason <lg> was created, wasn't it?
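The following Python sketch shows the extra step splitting imposes on consumers: the fragments of one verse have to be found and rejoined before the verse can be used as a unit. It assumes, hypothetically, that every fragment of a split verse carries the same splitID value; `rejoin_split_verses` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def rejoin_split_verses(xml_text: str) -> dict:
    """Map each logical verse to its text, joining split fragments in document order.

    Hypothetical convention: all fragments of one verse share a splitID value.
    Verses that are not split are keyed by their osisID instead."""
    verses = {}
    for el in ET.fromstring(xml_text).iter():
        if local(el.tag) != "verse":
            continue
        key = el.get("splitID") or el.get("osisID")
        text = "".join(el.itertext()).strip()
        verses[key] = f"{verses[key]} {text}" if key in verses else text
    return verses
```

If <l> were allowed inside <verse>, as suggested above, the same verse could be a single <verse> holding two <l> children, and no rejoining step would be needed.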
3.2. Lesser Issues
3.2.2. Book Titles
For <title> elements, use the type values "short" for the short title (e.g. "Matthew"), "mainTitle" for the main title, and "subTitle" for any subtitles of the book. (The same could be applied to testaments and book groups.)
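With consistent type values, a processor can pick out each kind of title without guessing. A minimal Python sketch, assuming the titles are direct children of the book <div>; `book_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The type values proposed above for book titles.
TITLE_TYPES = ("short", "mainTitle", "subTitle")


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def book_titles(xml_text: str) -> dict:
    """Collect a book <div>'s direct <title> children, grouped by type."""
    div = ET.fromstring(xml_text)
    titles = {t: [] for t in TITLE_TYPES}
    for child in div:
        if local(child.tag) == "title" and child.get("type") in titles:
            titles[child.get("type")].append("".join(child.itertext()).strip())
    return titles
```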
3.2.3. catchWord
Issues with catchword content (unbalanced quotes, <divineName>, etc.). Chris: Also consider allowing <hi> in <catchWord>. This issue comes up in the TEV.
3.2.4. Complex or discontinuous text
How to mark up AddEsther, where chapters interrupt other chapters and alternate reference systems are present.
3.2.5. Continuing Paragraph
How best to encode a paragraph that continues after a block quote or line group.
3.2.7. Cross-References in <title>
Should cross-references following a <title> be placed in a child <title> element?
<div type="x-section" osisRef="Matt.3.1-Matt.3.12">
  <title type="section">The Preaching of John the Baptist
    <title type="cross-ref">
      <reference osisRef="Mark.1.1-Mark.1.8" type="parallelPassage">Mark 1.1-8</reference>
      <reference osisRef="Luke.3.1-Luke.3.18" type="parallelPassage">Luke 3.1-18</reference>
      <reference osisRef="John.1.19-John.1.28" type="parallelPassage">John 1.19-28</reference>
    </title>
  </title>
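One practical effect of nesting the cross-references in a child <title> is that the heading text stays separable from the references. A Python sketch using the example above (with a closing </div> added so it parses); `heading_and_parallels` is a hypothetical helper, and taking only the outer title's leading text is one possible rendering choice.

```python
import xml.etree.ElementTree as ET

SECTION = """<div type="x-section" osisRef="Matt.3.1-Matt.3.12">
  <title type="section">The Preaching of John the Baptist
    <title type="cross-ref">
      <reference osisRef="Mark.1.1-Mark.1.8" type="parallelPassage">Mark 1.1-8</reference>
      <reference osisRef="Luke.3.1-Luke.3.18" type="parallelPassage">Luke 3.1-18</reference>
      <reference osisRef="John.1.19-John.1.28" type="parallelPassage">John 1.19-28</reference>
    </title>
  </title>
</div>"""


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def heading_and_parallels(xml_text: str):
    """Return the section heading text and the osisRefs of its parallel passages."""
    section = ET.fromstring(xml_text)
    heading = next(t for t in section if local(t.tag) == "title")
    # heading.text is only the text before the nested cross-ref <title>.
    heading_text = (heading.text or "").strip()
    refs = [
        r.get("osisRef")
        for r in heading.iter()
        if local(r.tag) == "reference" and r.get("type") == "parallelPassage"
    ]
    return heading_text, refs
```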
3.2.9. <div> following <osisText>
How best to organize the top-level structure for introduction sections, mini-dictionaries, glossaries, and maps.
3.2.10. Dublin Core
What should the DC elements in <work> look like for a document that is only a portion of the entire work (i.e., a single book, a single chapter, a set of books, a range of verses, or several sections from different books)?
3.2.13. Identifier with element
The question I have is how to associate an identifier with an element. For example, I might want to say that a paragraph or other block of text is about "anger".
The best I can come up with is a <note> element with an osisRef to the desired text (similar to a cross-reference).
This will work for scripture text with a well-defined reference system, but it becomes an issue for non-Biblical text, which often has no osisIDs. (This would likely be resolved if XPath/XPointer-like syntax were allowed in a reference or reference-like element.) (Todd)
3.2.14. Identity of books, works
How to identify a book of Esther (Esther vs. Additions to Esther vs. Greek Esther). (Goes with 3.2.4. Complex or discontinuous text, somewhat.)
How to identify books of Ezra/Nehemiah/Esdras.
Depending on these... potentially, how to identify 1-2Kgs/1-2Chr vs. 1-4Kgdms.
How to identify books that occur multiply within a work (e.g. Esther in NRSVA & others; Psalms in Vulgate; Joshua, Judges, Daniel, & Tobit in Rahlfs')
3.2.16. Introduction content
Introductions are the text found at the front of a Bible, testament, book group, or book. Place this type of content in a <div type="x-introduction"> element.
3.2.17. Lines within a line group
Use type="q", type="q2" (or similar type names) and a set of other standardized types to indicate the specific nature of the <l> element.
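Numbered types such as "q" and "q2" let a renderer derive indentation directly from the data. A Python sketch, assuming "q" means level 1 and "qN" means level N, which is just the example naming above rather than a fixed standard; `indent_level` and `render_lg` are hypothetical helpers.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def indent_level(line_type: Optional[str]) -> int:
    """Map type="q" to 1 and "qN" to N; any other or missing type gives 0."""
    m = re.fullmatch(r"q(\d*)", line_type or "")
    if m is None:
        return 0
    return int(m.group(1)) if m.group(1) else 1


def render_lg(xml_text: str, indent: str = "    ") -> str:
    """Render each <l> of a line group on its own line, indented by its type."""
    lg = ET.fromstring(xml_text)
    lines = []
    for l in lg:
        if local(l.tag) == "l":
            text = "".join(l.itertext()).strip()
            lines.append(indent * indent_level(l.get("type")) + text)
    return "\n".join(lines)
```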
3.2.19. Matthew text example
Should Matt.1.2-Matt.1.6a be encoded as osisID="Matt.1.2 Matt.1.3 Matt.1.4 Matt.1.5 Matt.1.6 Matt.1.6a" or osisID="Matt.1.2 Matt.1.3 Matt.1.4 Matt.1.5 Matt.1.6"? The reasoning is that "a" is simply a TYPOGRAPHIC mechanism to indicate that there are two blocks of text containing Matt.1.6. I believe the latter form is CORRECT, even though I have argued for the alternative in the past.
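If the latter form is adopted, existing lists that use the "a" suffix can be normalized mechanically. A Python sketch, assuming a single trailing lowercase letter after the verse number is always such a typographic marker; `normalize_osis_ids` is a hypothetical helper.

```python
import re

# A trailing lowercase letter after the verse number ("6a") is treated as a
# typographic split marker, per the reasoning above, not as a distinct osisID.
SPLIT_SUFFIX = re.compile(r"^(.*\.\d+)[a-z]$")


def normalize_osis_ids(osis_id: str) -> str:
    """Drop split suffixes and remove the duplicates this creates, keeping order."""
    result = []
    for ident in osis_id.split():
        m = SPLIT_SUFFIX.match(ident)
        base = m.group(1) if m else ident
        if base not in result:
            result.append(base)
    return " ".join(result)
```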
3.2.20. Milestone Pairs
Guidelines are needed on how to use milestone pairs for chapters, verses, and quotes, and on how to be consistent in the use of milestones vs. elements.
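One guideline that can be checked automatically is that every start milestone has a matching end. A Python sketch, assuming a start is an empty element with an sID attribute and its end is an element of the same name with an equal eID attribute (the convention later OSIS versions use); `unmatched_milestones` is a hypothetical helper. Pairs are allowed to overlap, as the quote and verse do in the test, so only order and matching are checked.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def unmatched_milestones(xml_text: str):
    """Return (unclosed, unopened) lists of (element name, id) pairs.

    A repeated sID without an intervening eID is treated as a single open pair."""
    open_ids = {}
    unopened = []
    for el in ET.fromstring(xml_text).iter():
        name = local(el.tag)
        if el.get("sID") is not None:
            open_ids[(name, el.get("sID"))] = True
        elif el.get("eID") is not None:
            key = (name, el.get("eID"))
            if key in open_ids:
                del open_ids[key]
            else:
                unopened.append(key)
    return list(open_ids), unopened
```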
3.2.21. Misc. but common structures
Glossaries, maps, mini-dictionaries, and the Thompson Chain Reference (Todd); see http://www.zondervan.com/media/pdfs/0310912229.pdf for an example. (I have a local copy for the meeting. pld)
3.2.22. Non-canonical text and speech
How to encode non-canonical text associated with the start of a speech (e.g. <seg type="speechStart">She Speaks</seg>).
3.2.23. Notes
A "best practices" guideline is needed for the type attribute values that indicate the kind of note. Chris: Add a cross-reference value to the osisNotes type (unless there is a better practice).
3.2.24. Parallel passages
Use a <div type="parallelPassage"> whose children are strictly a set of <reference> elements and the related display text.
3.2.25. Poetry
How to encode lines of poetry: Line breaks, multiple translator specified line splitting alternatives. What is presentation and what is data?
3.2.26. Presentation Punctuation in References
The best way to encode a series of references with various presentation punctuation.
3.2.27. Reference encoding
In OSIS documents that are not Bibles, it is common to see a quotation of scripture text followed by its reference.
Does it make sense to add an optional osisRef attribute to <q> and <milestoneStart> to accommodate this frequent issue?
Example: <q osisRef="Matt.20.28">The Son of Man did not come to be served, but to serve . . . </q>
rather than <q>The Son of Man did not come to be served, but to serve . . .<reference osisRef="Matt.20.28">Matthew 20:28</reference></q> (Todd)
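Either encoding carries the same information, so a consumer could accept both. A Python sketch, assuming osisRef on <q> is the proposed (not yet existing) form and that a child <reference> is the current fallback whose display text is not part of the quotation; `quoted_scripture` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def quoted_scripture(xml_text: str):
    """Return (quoted text, osisRef) for a <q>, accepting either encoding above."""
    q = ET.fromstring(xml_text)
    ref = q.get("osisRef")  # proposed form: osisRef directly on <q>
    parts = [q.text or ""]
    for child in q:
        if local(child.tag) == "reference":
            # Current form: take the target, but leave the citation's display
            # text ("Matthew 20:28") out of the quotation itself.
            ref = ref or child.get("osisRef")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts).strip(), ref
```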
3.2.28. Reference to entire work
The introduction to the CEV contains references to entire works. How should these be encoded? (Todd)
3.2.29. Special Information
How to preserve special information related to verse numbers that cannot be represented with an osisID (e.g. "*1" at Bible.CEV.Gen.49.1). (Is this really a note?)
3.2.30. Split
Only split AND only use the attribute "splitID" for the following elements: <verse>, <div type="chapter">, <div type="x-section">, and <p>.
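The rule above is narrow enough to enforce with a simple check. A Python sketch that flags splitID on any element outside the permitted list; `splitid_allowed` and `splitid_violations` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop any '{namespace}' prefix so namespaced and bare tags compare alike."""
    return tag.rsplit("}", 1)[-1]


def splitid_allowed(el) -> bool:
    """True for <verse>, <p>, <div type="chapter">, and <div type="x-section">."""
    name = local(el.tag)
    if name in ("verse", "p"):
        return True
    return name == "div" and el.get("type") in ("chapter", "x-section")


def splitid_violations(xml_text: str) -> list:
    """List (element name, type) for every element carrying splitID that may not."""
    root = ET.fromstring(xml_text)
    return [
        (local(el.tag), el.get("type"))
        for el in root.iter()
        if el.get("splitID") is not None and not splitid_allowed(el)
    ]
```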
3.2.33. Translator practices
How should we encode the translator's intent that a blank line be rendered? (This is spacing beyond what would normally appear between two paragraphs, used to emphasize a shift in thought.)