Uno/Specifications/Type Names

From OpenOffice.org Wiki


Basic Concepts

It is important to distinguish between core UNO and UNO IDL:

  • Core UNO is a computation model, describing how distributed UNO objects collaborate in a computation by calling interface methods on each other.
  • UNO IDL is a description language used to notate core UNO types and related, non–core-UNO entities like modules, typedefs, and constants.

The core UNO type system, along with related terms, is described in UNO Type System.

Core UNO uses names to identify named types. It is unclear whether there is a single namespace for all of these types, or whether there are different namespaces for the different kinds of types (i.e., whether there can or cannot be an enum type with name N and a struct type with the same name N). At least within each kind of type (enum types, struct types, exception types, or interface types), each name N can identify at most one type of that kind (i.e., there can never be two enum types, each with name N).

It is unspecified what a name should look like. For now, assume that a name can be an arbitrary-length sequence of Unicode characters (even an empty sequence), and that two names are equal if they are equal character by character (specifically, without applying any Unicode normalization form).
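
The equality rule above can be illustrated with a minimal Python sketch. The helper name `names_equal` and the sample name are hypothetical and only serve the illustration; the point is that two strings that render identically are still different names if their code point sequences differ.

```python
import unicodedata

def names_equal(a: str, b: str) -> bool:
    """Compare two names code point by code point, without normalization."""
    return a == b

# Two spellings that render the same but differ as code point sequences.
composed = "Caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "Cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Under the assumption made above, these are two distinct names.
distinct = not names_equal(composed, decomposed)

# They would only coincide if normalization were applied, which it is not.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Note that the empty sequence is also a name under this assumption, and it is equal only to itself.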

Unfortunately, the design of the UNO Remote Protocol (URP) imposes some restrictions on names:

  • A sequence type is transported via URP as a Unicode string, consisting of "[]" followed by a string representation of the component type. The string representation of a named type is its name. The string representation of a sequence type is recursively defined by this description. The string representations of the other UNO types (VOID, BOOLEAN, …, ANY) are the Unicode strings "void", "boolean", "byte", "short", "unsigned short", "long", "unsigned long", "hyper", "unsigned hyper", "float", "double", "char", "string", "type", and "any", respectively. This implies several restrictions:
    1. There can be only one shared namespace for the names of named types (e.g., if there were an enum type named n and a struct type named n, the URP representation "[]n" of a sequence type could mean a sequence of the enum type, or a sequence of the struct type).
    2. There cannot be any named types whose names start with [] (e.g., if there were an enum type named n and another enum type named []n, the URP representation "[][]n" of a sequence type could mean a sequence of the enum type []n, or a sequence of a sequence of the enum type n).
    3. There cannot be any named types whose names equal any of the Unicode strings "void", "boolean", …, "any" (e.g., if there were an enum type named void, the URP representation "[]void" of a sequence type could mean a sequence of the enum type, or a sequence of type VOID).
  • A named type is transported via URP as a Unicode string, consisting of the type's name. Since Unicode strings are transported via URP in UTF-8 encoding, preceded by a bounded-size field giving the length of the UTF-8 encoded string, this implies that the UTF-8 encoding of a type's name cannot exceed some limit. Also, if that type is used as the component type of a sequence type, the limit is even lower, as the string must also contain the leading "[]". Recursively, this means that there is no useful minimal upper bound for the length of a type's name. Thus, it seems best to treat the attempt to transport a type with a too-long name via URP as a resource exhaustion (e.g., by throwing some com.sun.star.uno.RuntimeException).
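
The restrictions listed above can be made concrete with a small Python sketch. It is not URP code: the function names, the `max_utf8_bytes` parameter, and the use of Python's `RuntimeError` as a stand-in for a com.sun.star.uno.RuntimeException are assumptions for illustration, and the actual size of URP's length field is not stated here.

```python
# String representations of the non-named, non-sequence UNO types listed above.
SIMPLE_TYPE_STRINGS = frozenset({
    "void", "boolean", "byte", "short", "unsigned short", "long",
    "unsigned long", "hyper", "unsigned hyper", "float", "double",
    "char", "string", "type", "any",
})

def is_urp_safe_name(name: str) -> bool:
    """Check restrictions 2 and 3: no leading "[]", no clash with a simple type."""
    return not name.startswith("[]") and name not in SIMPLE_TYPE_STRINGS

def urp_sequence_string(component: str, depth: int = 1,
                        max_utf8_bytes: int = 2**16) -> str:
    """Build the string for a depth-fold nested sequence of the component type.

    max_utf8_bytes is a hypothetical stand-in for the bound imposed by
    URP's length field.
    """
    text = "[]" * depth + component
    if len(text.encode("utf-8")) > max_utf8_bytes:
        # Treated like a resource exhaustion, as suggested above.
        raise RuntimeError("type string too long to transport")
    return text

# The ambiguity behind restriction 2: a sequence of a type named "[]n" and a
# sequence of sequences of a type named "n" yield the same string.
ambiguous = urp_sequence_string("[]n") == urp_sequence_string("n", depth=2)
```

Because every additional level of sequence nesting adds two bytes, any fixed limit on the name itself can still be exceeded by a sufficiently deep sequence type, which is why the check is applied to the complete string rather than to the name.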

In essence, this means that only the poorly-chosen representation of sequence types in URP imposes any restrictions on UNO type names. A better representation (in a new version of URP, or negotiated dynamically as a protocol property) would remove these restrictions.

Mapping from UNO IDL to UNO Names

UNO IDL uses identifiers to name certain entities, among them module definitions, enum definitions, struct definitions, exception definitions, and interface definitions. Let us use the collective term named type definitions for all enum definitions, struct definitions, exception definitions, and interface definitions. The syntactic rules of UNO IDL ensure that a named type definition TD is contained in a sequence of nested module definitions MD1, …, MDk, for some k ≥ 0. Each of the definitions MD1, …, MDk, TD introduces an identifier I(MD1), …, I(MDk), I(TD). An identifier can be taken to be an arbitrary-length sequence of Unicode characters, excluding the Unicode character “.” (U+002E FULL STOP). Each named type definition in UNO IDL defines a corresponding UNO type (an enum type, struct type, exception type, or interface type). The name of the corresponding UNO type is the concatenation I(MD1) • "." • … • "." • I(MDk) • "." • I(TD). (That is, a concatenation of the identifiers of the module definitions and the named type definition, with "." between any two.)
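
As a minimal sketch of this concatenation rule, the following Python function joins the module identifiers and the type identifier with "." and rejects identifiers that contain a full stop. The function name `uno_type_name` is hypothetical; it is not part of any UNO tool.

```python
def uno_type_name(module_ids: list[str], type_id: str) -> str:
    """Concatenate I(MD1), ..., I(MDk), I(TD) with "." between any two."""
    parts = [*module_ids, type_id]
    for ident in parts:
        if "." in ident:
            # Identifiers exclude U+002E FULL STOP by definition.
            raise ValueError(f"identifier must not contain '.': {ident!r}")
    return ".".join(parts)

# Four nested modules (k = 4) and one named type definition.
nested = uno_type_name(["com", "sun", "star", "uno"], "RuntimeException")

# No enclosing module (k = 0): the name is just the type's identifier.
top_level = uno_type_name([], "T")
```

Here `nested` is "com.sun.star.uno.RuntimeException" and `top_level` is "T".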

UNO Names and Language Bindings

When designing a UNO language binding, two aspects deserve attention with respect to UNO names:

  • Often, named UNO types are mapped to individual types in the language binding, which are also named by some mechanism. It has to be specified how the names of named UNO types map to named types in the language. For example, named UNO types are mapped to class and interface types in the Java language binding, and Java class and interface types themselves have names.
  • In the other direction, it may be advantageous if (almost) arbitrary types from the language binding can be interpreted as UNO types (typically as named UNO types, i.e., enum types, struct types, exception types, or interface types). This holds even though the current UNO implementation in many cases does not allow new types to be introduced dynamically (e.g., via a remote bridge). Here, it has to be specified how (the names of) types in the language binding map to (the names of named) UNO types.

The following advice on how to address the above two issues concentrates on language bindings that support some form of namespaces.

When mapping from named UNO types to named types in the language binding, the following approach can be used to avoid clashes between types representing named UNO types and types already present in the language: A single top-level namespace UNO is reserved for the UNO language binding. Any UNO name N = N0 • "." • … • "." • Nk (with k > 0) is split up at the contained “.” characters. The corresponding type in the language binding is given the name α(Nk), and it is placed into the nested sequence of namespaces named UNO, α(N0), …, α(Nk - 1). The function α maps from parts of UNO names (i.e., arbitrary-length sequences of Unicode characters) to identifiers that conform to the rules of the language binding. What α looks like depends heavily on the language binding, of course.
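
The splitting step can be sketched in Python as follows. Both `alpha` and `binding_location` are hypothetical: the α shown here simply replaces every character that is not alphanumeric or "_" with "_" and guards against an empty or digit-led result. A real binding would need its own α, and would likely want one that is injective, which this one is not.

```python
def alpha(part: str) -> str:
    """Hypothetical α: map a UNO name part to an identifier-like string."""
    cleaned = "".join(ch if ch.isalnum() or ch == "_" else "_" for ch in part)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned

def binding_location(uno_name: str, root: str = "UNO") -> tuple[list[str], str]:
    """Return (namespace path, type name) for a UNO name.

    The path starts with the reserved top-level namespace, followed by
    α(N0), ..., α(Nk-1); the type itself is named α(Nk).
    """
    parts = uno_name.split(".")
    namespaces = [root] + [alpha(p) for p in parts[:-1]]
    return namespaces, alpha(parts[-1])

namespaces, type_name = binding_location("com.sun.star.uno.RuntimeException")
```

With this sketch, `namespaces` is `["UNO", "com", "sun", "star", "uno"]` and `type_name` is `"RuntimeException"`, so the type cannot clash with anything defined outside the reserved UNO namespace.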

In the other direction, it may be advantageous to map arbitrary types from the language (i.e., types that are not defined within the top-level namespace UNO) to UNO types. Assume that some language type named TN, contained in a nested sequence of namespaces named NN1, …, NNk (with k ≥ 0), shall be mapped to a named UNO type. One approach is to reserve a prefix P of UNO names for this language binding, so that only this language binding would be allowed to introduce named UNO types whose names start with P. The name of the UNO type could then be defined as the concatenation P • α′(NN1) • "." • … • "." • α′(NNk) • "." • α′(TN). The function α′ maps from identifiers that conform to the rules of the language binding to arbitrary-length sequences of Unicode characters, excluding the Unicode character “.” (U+002E FULL STOP). Again, the definition of α′ is highly language-binding specific.
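
A minimal Python sketch of this reverse mapping follows. The prefix "pybinding." is invented for the example and is not a registered prefix, and `alpha_prime` is only one possible α′ (it replaces any "." with "_", which is enough to satisfy the exclusion of U+002E but is not injective).

```python
def alpha_prime(identifier: str) -> str:
    """Hypothetical α′: map a language identifier to a name part without '.'."""
    return identifier.replace(".", "_")

def uno_name_for(prefix: str, namespaces: list[str], type_name: str) -> str:
    """Build P • α′(NN1) • "." • ... • "." • α′(NNk) • "." • α′(TN)."""
    parts = [alpha_prime(n) for n in namespaces] + [alpha_prime(type_name)]
    return prefix + ".".join(parts)

# A language type Point in the nested namespaces app, model (k = 2).
nested = uno_name_for("pybinding.", ["app", "model"], "Point")

# A language type outside any namespace (k = 0).
top_level = uno_name_for("pybinding.", [], "Point")
```

Here `nested` is "pybinding.app.model.Point" and `top_level` is "pybinding.Point". Since P is concatenated directly, any separator between the prefix and the rest of the name has to be part of P itself.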

A few points remain open with this approach:

  • A global registry of reserved prefixes of UNO names would be needed. For now, this document could serve as such a registry.
  • There might be situations where a single type from the language binding shall be mapped to different kinds of named UNO types (e.g., as both an enum type and a struct type). If there is only a single namespace for all named UNO types, the above mapping has to be refined, for example by specifying it as the concatenation P • T • α′(NN1) • "." • … • "." • α′(NNk) • "." • α′(TN), where T is "E" if it shall be a UNO enum type, "S" if it shall be a UNO struct type, etc.
  • Two different UNO language bindings could not independently introduce the same named UNO type (because they would start the name of the type with different prefixes P1 and P2).
  • It might be necessary to use different prefixes not on a per–language-binding level, but on a per–UNO-environment level: Imagine two instances of a Java language binding (i.e., two different UNO environments), which each know a Java class type with the same name but with incompatible definitions. They cannot both introduce their respective Java class type into UNO, as they would both choose the same UNO name. (Technically, this is easily solved by reserving one prefix P for a language binding, and giving each UNO environment based on that language binding its own private prefix P • P′.)
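
The two refinements mentioned in this list, the kind tag T and the per-environment prefix P • P′, can be combined in a short Python sketch. Combining them as P • P′ • T • … is an assumption of this example, as are the function name and the prefixes; only the tags "E" and "S" are given above, so the sketch deliberately leaves the other kinds out.

```python
# Only these two tags are specified above; tags for exception and
# interface types are left open.
KIND_TAGS = {"enum": "E", "struct": "S"}

def refined_uno_name(binding_prefix: str, env_prefix: str, kind: str,
                     namespaces: list[str], type_name: str) -> str:
    """Build P • P′ • T • α′(NN1) • "." • ... • "." • α′(TN).

    α′ is taken to be "replace '.' with '_'" (a hypothetical choice).
    """
    tag = KIND_TAGS[kind]
    parts = [p.replace(".", "_") for p in [*namespaces, type_name]]
    return binding_prefix + env_prefix + tag + ".".join(parts)

# The same language type, introduced as two kinds by one environment ...
as_enum = refined_uno_name("pybinding.", "env1.", "enum", ["app"], "Color")
as_struct = refined_uno_name("pybinding.", "env1.", "struct", ["app"], "Color")

# ... and as an enum by a second environment with its own private prefix.
other_env = refined_uno_name("pybinding.", "env2.", "enum", ["app"], "Color")
```

All three resulting names differ, so neither the kind clash nor the environment clash arises. The cost is the one noted in the list: two environments can no longer introduce the same named UNO type independently.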

Of course, the last two of the above points are at least partly due to the fact that UNO uses a nominal, not a structural type system. It is unclear how to decide whether types introduced by different UNO environments should be considered the same UNO type or not.
