Bibliographic/OOoBib Functional Requirements/Name Sorting

The sorting and cataloguing of names and subjects is a complex subject. Library cataloguers have developed many rules in their attempts to impose a coherent index structure on a vast and variable range of reference materials.

Library cataloguers use two processes when sorting bibliographic data by name and subject.

The first is the ‘standardisation’ of names and subjects: a list of somewhat arbitrary rules is applied to modify them for sorting, but not for display.

The second is the alphabetical sorting process, which is subject to the rules of the language and character set used.
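The separation between the two processes can be sketched in Python as follows. This is only an illustration: the function name sort_key and the three normalisation steps chosen (dropping accents, folding case, treating a hyphen as a space) are assumptions for the example, not a specification, and plain code-point comparison stands in for a real language-aware collation.

```python
import unicodedata


def sort_key(display_name: str) -> str:
    """Hypothetical standardisation step: build a key used only for sorting."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", display_name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat a hyphen as a space, fold case, and collapse runs of whitespace.
    return " ".join(no_accents.replace("-", " ").casefold().split())


names = [
    "Brownjohn, Alan",
    "Brown-Williams, Reginald Makepeace",
    "Bibliothèque des curiosités.",
]

# The key decides the order; the stored strings are left untouched for display.
ordered = sorted(names, key=sort_key)
```

The entries in ordered are the original display strings; only their order is derived from the standardised keys.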

The list of rules used for the pre-sorting process is complex, and I imagine some of them would be difficult to automate reliably. (For example, the work "The 1847 issue of U. S. stamps." is catalogued as "Eighteen forty-seven issue of U. S. stamps.") Some of the name rules, however, such as 'Ignore initial al- in Arabic names' or the treatment of Von, van, Sir, Lord, etc., could be automated.
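As a rough sketch of how the simpler name rules might be automated, the Python fragment below drops an initial "al-" and removes designations of rank from a personal-name heading. The function name presort_name and the short RANK_WORDS list are hypothetical and far from exhaustive; a real module would need a much fuller list and would have to apply the rule to name headings only, not to titles.

```python
import re

# Illustrative list only; real cataloguing rules cover many more designations.
RANK_WORDS = {"sir", "lord", "dr.", "mrs.", "maj."}


def presort_name(name: str) -> str:
    """Hypothetical pre-sort step for personal-name headings."""
    # Rule: ignore initial al- in Arabic names.
    name = re.sub(r"^al-", "", name, flags=re.IGNORECASE)
    # Rule: ignore designations of rank wherever they appear in the heading.
    tokens = [tok for tok in name.split() if tok.casefold() not in RANK_WORDS]
    return " ".join(tokens)


print(presort_name("Brown, Dr. John, 1810-1882"))
print(presort_name("al-Khuwarizmi, Muhammad"))
```

Rules such as spelling out "1847" as "Eighteen forty-seven" are deliberately left out, since they depend on the language and on how the number is read.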

These rules are language specific, and I would guess country or even institution specific. Among the examples quoted in the list below, some of the rules might even be offensive to some people. (For example, the rule for treating “R. Academia nazionale dei Lincei, Rome” is “Ignore foreign royalty (except British)”.)

So proper treatment of name and subject sorting requires, at the least, national pre-sort processing modules, and national language sorting modules.
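One possible way to organise such national modules is a simple lookup from a language code to a pre-sort function, sketched below in Python. Everything here is an assumption made for illustration: the names PRESORT_MODULES, english_presort, french_presort and presort, the use of two-letter language codes, and the handful of rules implemented (initial articles, "&" in English, the apostrophe in French).

```python
import re
from typing import Callable, Dict

PresortRule = Callable[[str], str]


def english_presort(text: str) -> str:
    # Ignore an initial article, then read "&" as "and".
    text = re.sub(r"^(the|an|a)\s+", "", text, flags=re.IGNORECASE)
    return text.replace(" & ", " and ")


def french_presort(text: str) -> str:
    # Ignore an initial article, then treat the apostrophe as a space.
    text = re.sub(r"^(les|le|la)\s+", "", text, flags=re.IGNORECASE)
    return text.replace("'", " ").replace("\u2019", " ")


# Hypothetical registry: one pre-sort module per language.
PRESORT_MODULES: Dict[str, PresortRule] = {
    "en": english_presort,
    "fr": french_presort,
}


def presort(text: str, language: str) -> str:
    """Apply the module for the given language, or leave the text unchanged."""
    module = PRESORT_MODULES.get(language)
    return module(text) if module else text
```

A language with no registered module falls through unchanged, which keeps the design open to institution-specific modules being added later.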

The list below comes from a student exercise in The Art of Computer Programming, Volume 3: Sorting and Searching, by Donald E. Knuth.


	Text of card					Remarks
R. Academia nazionale dei Lincei, Rome		Ignore foreign royalty (except British)
1812; ein historischer roman.			Achtzehnhundert zwölf
Bibliothèque d’histoire révolutionnaire.	Treat apostrophe as space in French
Bibliothèque des curiosités.			Ignore accents on letters
Brown, Mrs. J. Crosby				Ignore designation of rank
Brown, John					Names with dates follow those without
Brown, John, mathematician			...the latter are subarranged by
Brown, John, of Boston				descriptive words
Brown, John, 1715-1766				Arrange identical names by birthdate
BROWN, JOHN, 1715-1766			        Works “about” follow works “by”
Brown, John, d. 1811				Sometimes birthdate must be estimated
Brown, Dr. John, 1810-1882			Ignore designation of rank
Brown-Williams, Reginald Makepeace		Hyphen treated as space
Brown America.					Book titles follow compound names
Brown & Dallison’s Nevada directory.		& in English becomes and
Brownjohn, Alan
Den’, Vladimir Éduardovich, 1867		Ignore apostrophe in names
The den.					Ignore an initial article
Den lieben süßen Mädeln.			. . . provided it's in nominative case
Dix, Morgan, 1827-1908				Names before words
1812 ouverture.					Dix-huit cent douze
Le XIXe siècle français.			Dix-neuvième
The 1847 issue of U. S. stamps.			Eighteen forty-seven
1812 overture.					Eighteen twelve
I am a mathematician.				(by Norbert Wiener)
IBM journal of research and development.	Initials are like one-letter words
ha-I ha-chad.					Ignore initial article
Ia; a love story.				Ignore punctuation in titles
International Business Machines Corporation	
al-Khuwārizmī, Muḥammad ibn Mūsā, fl. 813-846	Ignore initial al- in Arabic names
Labour; a magazine for all workers.
Labor research association			Respell it Labor
Labour, see Labor				Cross-reference card
McCall’s cookbook				Ignore apostrophes in English
McCarthy, John, 1927				Mc = Mac
Machine-independent computer programming.	Treat hyphen as space
MacMahon, Maj. Percy Alexander, 1854-1929	Ignore designation of rank
Mrs. Dalloway.
Mistress of mistresses.				Mrs. = Mistress
Royal society of London
St. Petersburger Zeitung.
Saint-Saëns, Camille, 1835-1921			St. = Saint, even in German
Ste. Anne des Monts, Quebec			Sainte
Seminumerical algorithms.
Uncle Tom’s cabin.
U. S. Bureau of the census.			U. S. = United States
Vandermonde, Alexandre Théophile, 1735-1796
Van Valkenburg, Mac Elwyn, 1921-		Ignore space after prefix in surnames
Von Neumann, John, 1903-1957
The whole art of legerdemain.
Who’s afraid of Virginia Woolf?			Ignore apostrophe in English
Wijngaarden, Adriaan van, 1916-			Surname doesn't begin with lower case letter

Most of these rules are subject to certain exceptions, and there are many other rules not illustrated here.
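A few of the abbreviation rules in the list (Mc = Mac, St. = Saint, Mrs. = Mistress, U. S. = United States) lend themselves to simple pattern substitution. The Python sketch below shows one way this might look; the table name EXPANSIONS and the function name expand are hypothetical, the patterns are deliberately naive, and none of the exceptions mentioned above are handled.

```python
import re

# Each entry pairs a pattern with the spelling used for filing.
# Illustrative only: real rules have exceptions these patterns ignore.
EXPANSIONS = [
    (re.compile(r"^Mc(?=[A-Z])"), "Mac"),        # Mc = Mac
    (re.compile(r"\bSt\.\s*"), "Saint "),        # St. = Saint
    (re.compile(r"\bMrs\.\s*"), "Mistress "),    # Mrs. = Mistress
    (re.compile(r"\bU\. ?S\."), "United States"),  # U. S. = United States
]


def expand(text: str) -> str:
    """Rewrite abbreviations into the form under which the entry is filed."""
    for pattern, replacement in EXPANSIONS:
        text = pattern.sub(replacement, text)
    return text


for card in ["McCarthy, John, 1927", "St. Petersburger Zeitung.", "Mrs. Dalloway."]:
    print(card, "->", expand(card))
```

As with the other rules, the expanded form would be used only as the sort key, with the original text kept for display.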

References

Representing People's Names in Dublin Core