Difference between revisions of "Bibliographic/Database"

From Apache OpenOffice Wiki

Revision as of 20:01, 20 September 2006


Introduction

The whole question of the need for a SQL database as part of the OpenOffice Bibliographic component is being actively debated. There are some good arguments for not having one. see the discussion in the dev list archive. David Wilson 08:53, 17 July 2006 (CEST)

Background

The bibliographic database is used to store a collection of bibliographic records. Many traditional bibliographic databases contained fields to store information about a limited range of printed works: books, articles, manuscripts, etc. An example of this type of database is BibTeX, which is used with the LaTeX document preparation system. Many of the current bibliographic databases are derived from that early and pioneering application. As new media types were developed, new fields were added to the older database structures, such as URLs for web addresses. A miscellaneous reference type was also added to support all the other types of media now available, such as video and graphics.

The current OpenOffice bibliographic database is of that type. See the OOo documentation for bibliodatafield for a list of the fields supported in that database. It is a simple single-table database. A limitation of this type of database is that it makes it very difficult or impossible to maintain information about the relationships between works, their parts, and the contributors to the works. Authors are an example. In a single-table database, like the current bibliographic database, searching for the works that an author has been involved with requires text-string searching of the author text fields, and for this to work the user must have entered the author's name in exactly the same format for every work. The benefit of a separate author table, where the author's name has to be entered only once and is then linked to the works of that author, is clear. Moreover, there are often several people and organizations associated with a published work: authors, editors, publishers, authors of parts of the work, sponsors, series editors, etc. It is therefore often better to define relationships in the database than to have long lists of fields in a database table to cover every possibility, most of which would not be used in any one record. Defining the relationships through the database makes it more flexible and preserves the actual relationships, at the cost of a more complex database and increased coding complexity. We believe this cost is worth incurring.
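The relationship-based approach described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names (agent, work, contribution, role) and the sample rows are assumptions made for illustration only; they are not taken from the actual OOo-BiblioDB definition.

```python
import sqlite3

# Hypothetical schema for illustration; NOT the actual OOo-BiblioDB definition.
CONTRIBUTION_SCHEMA = """
CREATE TABLE agent (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- each person/organization is entered once
);
CREATE TABLE work (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE contribution (            -- links agents to works with a role
    work_id  INTEGER NOT NULL REFERENCES work(id),
    agent_id INTEGER NOT NULL REFERENCES agent(id),
    role     TEXT NOT NULL,            -- e.g. 'author', 'editor', 'publisher'
    PRIMARY KEY (work_id, agent_id, role)
);
"""


def build_contribution_db():
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CONTRIBUTION_SCHEMA)
    conn.executemany("INSERT INTO agent (id, name) VALUES (?, ?)",
                     [(1, "Example Author"), (2, "Example Editor")])
    conn.executemany("INSERT INTO work (id, title) VALUES (?, ?)",
                     [(1, "First Example Work"), (2, "Second Example Work")])
    conn.executemany(
        "INSERT INTO contribution (work_id, agent_id, role) VALUES (?, ?, ?)",
        [(1, 1, "author"), (2, 1, "author"), (2, 2, "editor")])
    conn.commit()
    return conn


def works_by_agent(conn, agent_name, role):
    """Return titles of works where the named agent has the given role."""
    rows = conn.execute(
        """SELECT work.title
           FROM work
           JOIN contribution ON contribution.work_id = work.id
           JOIN agent ON agent.id = contribution.agent_id
           WHERE agent.name = ? AND contribution.role = ?
           ORDER BY work.title""",
        (agent_name, role)).fetchall()
    return [title for (title,) in rows]
```

Because the name is stored once in the agent table, the lookup is an exact join rather than a text search over free-form author fields, and new roles need no new columns.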

er-relationships.gif

Database Design Objectives

  • reliably import legacy formats such as RIS, Refer and BibTeX
  • provide a more general data model that supports a wider range of citation needs (humanities, law)
    • a broader range of reference types (video, audio, graphics, maps, etc.)
    • reflect increasing use of electronic and online sources
    • more complex relations (a paper presented at a conference and then published on the web, a book translated from an original, a revised book with an additional new introduction, etc.)
  • should facilitate productivity enhancements such as auto-completing text fields for author and publisher names, and linked periodicals.

The last requirement, of course, explains why it is important to normalize so much of the structure (separate tables for collections, agents, etc.).
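As a rough sketch of why normalization helps auto-completion: with names held once in their own table, a completion list is a single prefix query. The agent table, its columns, and the function names below are hypothetical and only illustrate the idea, using Python's sqlite3 module.

```python
import sqlite3


def build_agent_table(names):
    """Create an in-memory 'agent' table (hypothetical name) from a list of names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
    conn.executemany("INSERT INTO agent (name) VALUES (?)",
                     [(name,) for name in names])
    conn.commit()
    return conn


def suggest_agents(conn, prefix, limit=5):
    """Return up to `limit` agent names starting with `prefix`.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    LIKE wildcards typed by the user are escaped so they match literally.
    """
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT name FROM agent WHERE name LIKE ? ESCAPE '\\' "
        "ORDER BY name LIMIT ?",
        (escaped + "%", limit)).fetchall()
    return [name for (name,) in rows]
```

A GUI text field could call suggest_agents on each keystroke; the same query shape would serve publisher or periodical tables.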

The enhanced bibliographic database: OOo-BiblioDB

In designing the bibliographic database two models have been useful.

The first is the US Library of Congress "Metadata Object Description Schema" (MODS), which supports modern library cataloging requirements using an XML schema: "it is intended to be able to carry selected data from existing MARC 21 records as well as to enable the creation of original resource description records". We found that this model did not adequately define the structure of the reference material that we were dealing with.

The second model, described in the 'Functional Requirements for Bibliographic Records' (FRBR) (PDF version) by the International Federation of Library Associations and Institutions, defines the parts of creative works, the relationships between these and their published manifestations, and the relationships that people and organizations have with the various components. This model may be too complex for our needs; however, between the two models we should be able to use some of their concepts and design elements to produce a database that meets our needs.

The highest-level views of the FRBR schema are shown in the two diagrams below.

frfbs-er1.png frfbs-er2.png

Status

The database is currently under development. The current database definition is the SQL version, which can be obtained using the Subversion web browser.

An entity-relationship diagram and some documentation are available; however, these may not be fully up to date with the definitive SQL version. For an overview of how the bibliographic database interacts with the rest of the bibliographic application, see the components page.

What needs to be done ?

The main tasks are -

  • Complete the design of the database.
  • Write core code modules to -
    • read records in the bibliographic database, convert them to the XML format used in the document save-package, and append them to the save-package biblio-data.xml file.
    • convert records in the document save-package biblio-data.xml format to the database format, and insert them in the database.
    • convert records in the bibliographic database format to the format used in the bibutils package (to allow database export using bibutils)
    • convert records in the format used in the bibutils package to the format used in the bibliographic database (to allow database import using the bibutils package)
  • Design the Graphical User Interface (GUI) for maintaining the bibliographic database: adding, modifying, deleting, and searching database records.
  • Design the Graphical User Interface (GUI) for searching and selecting bibliographic records to be inserted for citations in Writer documents.
  • Build prototype GUI panels for maintaining the bibliographic database: adding, modifying, deleting, and searching database records. (These could be built using OOo Database forms and OOo Basic, Java or Python.)
  • Build prototype GUI panels for searching and selecting bibliographic records to be inserted for citations in Writer documents. (These could be built using OOo Database forms and OOo Basic, Java or Python.)
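
The record-to-XML conversion modules listed above could take roughly the following shape. This is only a sketch: the structure of biblio-data.xml is not specified on this page, so the element names used here (reference, author, title, year) and the dict-based record layout are invented placeholders, not the real save-package format.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Convert one record (dict of field name -> value or list of values)
    into an XML element. Element names are hypothetical placeholders."""
    ref = ET.Element("reference", {"type": record["type"]})
    for field, value in record.items():
        if field == "type" or value is None:
            continue
        # Repeated fields (e.g. several authors) become repeated elements.
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(ref, field).text = str(item)
    return ref


def xml_to_record(ref):
    """Convert an element produced by record_to_xml back into a dict.
    Repeated child elements are collected into a list."""
    record = {"type": ref.get("type")}
    for child in ref:
        if child.tag in record:
            existing = record[child.tag]
            if not isinstance(existing, list):
                record[child.tag] = [existing]
            record[child.tag].append(child.text)
        else:
            record[child.tag] = child.text
    return record
```

One limitation of this sketch is that a list with a single value comes back as a plain string; a real module would need the database schema to decide which fields are repeatable.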

Comments and Suggestions

Please feel free to add comments and suggestions below


Will the database support keywords? It does right now; they are covered by the trendy term for the same thing: tags.


Reference Types

We should first define a set of different Reference Types and expand the fields needed to store the references in order to design a more robust database.

IMPORTANT

I believe that some special situations need special fields or special handling in the DB and are not adequately covered yet, in addition to some other issues described below. These are mainly Errata, Comments and Author Replies, and Article Retractions.

Basic Reference Types

  • Articles:
    • Standard Article: Authors, Title, Journal, Issue, Year, Pages, other (e.g. keywords, url, local file path)
    • Article, not-cited: as before + Availability Information (URL, other)
    • Other Official Published Material: ...
    • IMPORTANT: Errata
      • Errata: one or more Corrections might be posted in various issues of the journal
      • this is usually cited as: Original Article Citation Data (Correction available in Journal, Issue Nr, Year, Pages) (repeat for more than one correction)
      • it is possibly never cited alone
      • there should be one table holding these data for the Errata with a link to the original article, while the original article contains a link to these Errata
    • IMPORTANT: Commentary and Author Reply
      • similar to Errata, there might be one or more Comments and Author Replies; these should be stored, too
      • however, they are usually not included in the original citation
      • they might be used in a citation, but I do not know exactly how to cite them optimally (the original article should be provided as well)
    • IMPORTANT: Article Retraction
      • an article may be retracted because of plagiarism or some other flaw
      • a retracted article should not be used any further in research
      • however, it might be used e.g. for an article on plagiarism or flawed research
      • there should therefore be one field storing this information, and a link to the published withdrawal letter (which explains why the article was retracted); I do not know what data to store for such a letter, either
  • Books:
    • Book: Author, Title, Publisher, Year, other (pages, url)
    • Chapter from Book: Author and Title of Chapter, Book Title, Editor/ Authors, Publisher, other data
  • Conference Abstract: Author, Title, Conference Title, Location (Town, Country), Date of Conference, other
  • Internet Resource: Name, (Author), URL, Date (accessed on), other (URL to locally saved version)
  • Others: please expand
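
One possible way to model the Errata, Comments/Author Replies and Retractions discussed above is a single linked table, sketched below with Python's sqlite3 module. All names (article, related_notice, kind) and the sample rows are assumptions for illustration; the citation string follows the "Correction available in" pattern suggested in the list, and nothing here is part of the actual database definition.

```python
import sqlite3

# Hypothetical schema; names are illustrative only.
NOTICE_SCHEMA = """
CREATE TABLE article (
    id       INTEGER PRIMARY KEY,
    citation TEXT NOT NULL             -- stands in for the full citation data
);
CREATE TABLE related_notice (          -- errata, comments, replies, retractions
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES article(id),
    kind       TEXT NOT NULL
               CHECK (kind IN ('erratum', 'comment', 'author_reply', 'retraction')),
    journal    TEXT NOT NULL,
    issue      TEXT,
    year       INTEGER,
    pages      TEXT
);
"""


def build_notice_db():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(NOTICE_SCHEMA)
    conn.executemany("INSERT INTO article (id, citation) VALUES (?, ?)",
                     [(1, "Original Article Citation Data"),
                      (2, "Another Article Citation Data")])
    conn.executemany(
        "INSERT INTO related_notice (article_id, kind, journal, issue, year, pages) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "erratum", "Example Journal", "2", 2005, "10-11"),
         (1, "erratum", "Example Journal", "5", 2006, "99"),
         (2, "retraction", "Example Journal", "7", 2006, "3")])
    conn.commit()
    return conn


def cite_with_errata(conn, article_id):
    """Return the citation followed by one parenthesized note per erratum."""
    (citation,) = conn.execute(
        "SELECT citation FROM article WHERE id = ?", (article_id,)).fetchone()
    errata = conn.execute(
        "SELECT journal, issue, year, pages FROM related_notice "
        "WHERE article_id = ? AND kind = 'erratum' ORDER BY year, id",
        (article_id,)).fetchall()
    parts = [citation]
    for journal, issue, year, pages in errata:
        parts.append(
            f"(Correction available in {journal}, {issue}, {year}, {pages})")
    return " ".join(parts)


def is_retracted(conn, article_id):
    """True if a retraction notice is linked to the article."""
    row = conn.execute(
        "SELECT 1 FROM related_notice "
        "WHERE article_id = ? AND kind = 'retraction' LIMIT 1",
        (article_id,)).fetchone()
    return row is not None
```

In this sketch the retraction status is derived from the linked notice rather than stored as a separate flag, so the status and the withdrawal letter cannot disagree; a dedicated field, as proposed in the list, would be an equally valid choice.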