Release: Pegasus 2.3
Author: Chuck Carmack (carmack@us.ibm.com)
July 2, 2003
NOTE: THIS IS A WORK-IN-PROGRESS
As part of the Pegasus 2.3 release, functions were added for globalization
support. Globalization involves two major aspects: internationalization
and localization.
Internationalization is the process of writing a program that is locale-neutral.
In other words, the program should be able to run in any locale without
change. There are several categories in a locale, including the language
of message strings, date format, time format, etc. For release 2.3,
the Pegasus server is concerned with the language of the message strings
it returns to its clients.
To support internationalization, a program is designed to do the following:
Support character sets that can represent customer data in any language. Typically, the program uses some form of Unicode for internal data, and converts between the character sets it supports for external data and the internal character set. Because Unicode covers the characters of all languages, and converters to and from it are usually available on the platform, it is a good choice for the 'normalized' internal character set. The most interoperable choice for external data (e.g., network and file system data) is UTF-8. Internal data is usually UTF-16 (or UCS-2, which is deprecated because it cannot represent characters outside the Basic Multilingual Plane).
Extract locale-sensitive resources, such as message strings, from the code to external resource files. Typically, the resources are loaded based on the locale requested by the end-user, and returned to the end-user for display.
Localization is the process of customizing a software product to
support particular locales. For example, an internationalized product
might be localized only for certain countries. This would mean
that the localized resources (e.g., message files) would be translated
and shipped only for the countries that the product supports. Since the
code for the product is locale-neutral, it is easy to drop in new
translations as more countries are supported.
The Pegasus 2.3 release added support for globalization. At a
high level, the following additions were made to Pegasus 2.3:
Please refer to PEPs 56 and 58 for details about the globalization
support in Pegasus 2.3.
This document provides a HOWTO guide to be used by developers to globalize
code that is being added to Pegasus. The audience for this document
is:
The quickest way to approach this document is to read the General
section, and then the developer section that relates to what you are doing.
Pegasus 2.3 supports Unicode throughout the processing of requests.
External data to Pegasus is encoded in UTF-8. Internal data is encoded
in UTF-16.
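To make the UTF-8 to UTF-16 conversion concrete, here is a minimal standalone sketch that decodes UTF-8 bytes into UTF-16 code units. It is not the Pegasus implementation: it uses std::string and std::u16string in place of the Pegasus String class, and rejecting malformed input with std::invalid_argument is an assumption of this sketch. Note how a character above U+FFFF becomes two 16-bit units (a surrogate pair).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Sketch: decode UTF-8 (external form) into UTF-16 code units (internal form).
// Throws std::invalid_argument on malformed input.
std::u16string utf8ToUtf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const std::uint32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (len > in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k)
        {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minForLen[len])
            throw std::invalid_argument("overlong UTF-8 sequence");
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));   // fits in one 16-bit unit
        }
        else
        {
            cp -= 0x10000;                              // 20 bits remain
            assert(cp <= 0xFFFFF);
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
        }
        i += len;
    }
    return out;
}
```

For example, the four UTF-8 bytes F0 9D 84 9E (U+1D11E) become the two units D834 DD1E, which is why such a character occupies two consecutive 16-bit slots internally.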
External data includes the CIM-XML messages passed over the network,
the repository files, and the MOF files. For the CIM-XML messages,
Pegasus follows section 4.8 of the CIM-HTTP
specification. Specifically, Pegasus supports the
"utf-8" setting for the charset parameter of the Content-Type header and
the XML encoding attribute. If no charset is specified, 7-bit
ASCII is assumed. The Pegasus MOF compiler supports UTF-8 encoding
in the MOF files. (TODO - remove this statement if this is
not in 2.3)
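The charset rule above can be sketched as follows. This is an illustrative parser written for this guide, not the code Pegasus uses: it lowercases the value, strips optional quotes, and does not handle quoted parameter values that themselves contain ';'. When the parameter is absent it reports us-ascii, matching the 7-bit ASCII default, and a second helper checks whether a body is actually 7-bit clean.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

static std::string trim(const std::string& s)
{
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

static std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the charset parameter of a Content-Type value, lowercased,
// e.g. application/xml; charset="utf-8" -> "utf-8".
std::string charsetOf(const std::string& contentType)
{
    std::size_t sep = contentType.find(';');  // parameters follow the media type
    while (sep != std::string::npos)
    {
        std::size_t next = contentType.find(';', sep + 1);
        std::string param = contentType.substr(
            sep + 1, next == std::string::npos ? std::string::npos : next - sep - 1);
        std::size_t eq = param.find('=');
        if (eq != std::string::npos && toLower(trim(param.substr(0, eq))) == "charset")
        {
            std::string value = trim(param.substr(eq + 1));
            if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
                value = value.substr(1, value.size() - 2);  // strip quotes
            return toLower(value);
        }
        sep = next;
    }
    return "us-ascii";  // no charset parameter: 7-bit ASCII is assumed
}

// True if every byte is 7-bit ASCII.
bool isSevenBitClean(const std::string& body)
{
    return std::all_of(body.begin(), body.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```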
The internal support of UTF-16 is encapsulated in the Pegasus String class. This class has been updated to contain UTF-16 characters. Specifically, the Char16 objects inside the String contain UTF-16 characters. Note: a UTF-16 surrogate pair is contained in two consecutive Char16 objects. To keep backwards compatibility, the methods on the String class have not changed. New methods have been added as needed. The following describes this in more detail:
Pegasus 2.3 supports clients and providers that wish to localize.
There are two areas to be localized: ERROR
elements in the CIM-XML, and Object
Definition elements in the CIM-XML. Clients can request
the server to return error messages and CIM objects in a set of languages
of their choosing. Clients can also tag a language to the CIM objects
they are sending to the server. Providers and the server can return
error messages and CIM objects that are tagged with one of the languages
requested by the client.
The localization design is based on section 4.8 of the CIM-HTTP
specification, which refers to RFC
2616. A language is tagged to the CIM-XML through
the Accept-Language and Content-Language HTTP headers. These headers
are basically lists of language tags. An HTTP request can contain
an Accept-Language header, which indicates the list of preferred languages
that the client wants in the response. This list can be prioritized
by using quality values (the "q" parameter). An HTTP request or response can contain
a Content-Language header, which indicates the language(s) of the content
in the message. In the Pegasus case, this would be the CIM-XML.
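As an illustration of the header format, the sketch below splits an Accept-Language value into language ranges and their q-values (defaulting to 1.0 when no q is given). LangRange and parseAcceptLanguage are hypothetical names for this guide; in Pegasus the server builds the AcceptLanguages object from the header for you. Malformed q-values are not validated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical type for illustration; not the Pegasus AcceptLanguageElement.
struct LangRange
{
    std::string tag;  // e.g. "fr", "en-US", "*"
    double quality;   // RFC 2616 q-value; 1.0 when absent
};

static std::string trim(const std::string& s)
{
    std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse a value such as "fr;q=0.5, de;q=0.8, es;q=0.4".
// Elements are returned in header order; prioritizing them is a separate step.
std::vector<LangRange> parseAcceptLanguage(const std::string& header)
{
    std::vector<LangRange> result;
    std::size_t start = 0;
    while (start <= header.size())
    {
        std::size_t comma = header.find(',', start);
        if (comma == std::string::npos) comma = header.size();
        std::string item = trim(header.substr(start, comma - start));
        start = comma + 1;
        if (item.empty()) continue;

        LangRange r{item, 1.0};
        std::size_t semi = item.find(';');
        if (semi != std::string::npos)
        {
            r.tag = trim(item.substr(0, semi));
            std::string param = trim(item.substr(semi + 1));
            if (param.size() > 2 && (param[0] == 'q' || param[0] == 'Q') && param[1] == '=')
                r.quality = std::strtod(param.c_str() + 2, nullptr);
        }
        if (!r.tag.empty()) result.push_back(r);
    }
    return result;
}
```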
Note that the Content-Language header is a list of language tags.
This allows the content of an HTTP message to contain more than one translation.
However, in the Pegasus case, there is only one CIM-XML document in the
HTTP message, and thus one translation.
CIM clients may use the Accept-Language HTTP header to specify the languages
they wish to be returned in the CIM response message. CIM clients
may also use the Content-Language header to tag the language of any CIM
objects they are sending to the server in the CIM request message.
The server, and providers, should attempt to return error messages and
CIM objects in one of the accept languages requested by the client.
The server and providers should set the Content-Language header in the
CIM response message to indicate which of the requested languages they
are returning.
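One way a server or provider might pick the language to return is sketched below, using the RFC 2616 (section 14.4) matching rule: a language range matches a tag if it equals the tag, or equals a prefix of the tag followed by '-'; '*' matches any tag; comparison is case-insensitive. The function names, and returning an empty string when nothing matches, are assumptions of this sketch. What to do in that case (for example, return the default message with no Content-Language) is a design choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

static std::string lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// RFC 2616 prefix rule: "en" matches "en" and "en-US", but not "eng".
bool rangeMatches(const std::string& range, const std::string& tag)
{
    std::string r = lower(range), t = lower(tag);
    if (r == "*" || r == t) return true;
    return t.size() > r.size() && t.compare(0, r.size(), r) == 0 && t[r.size()] == '-';
}

// accepted: (range, q) pairs already sorted by descending q.
// supported: languages this server or provider can return.
// Returns the language to report in Content-Language, or "" if none match.
std::string chooseLanguage(const std::vector<std::pair<std::string, double>>& accepted,
                           const std::vector<std::string>& supported)
{
    for (const auto& a : accepted)
    {
        if (a.second <= 0.0) continue;  // q=0 means "not acceptable"
        for (const auto& lang : supported)
            if (rangeMatches(a.first, lang)) return lang;
    }
    return "";
}
```

With the client example used later in this guide (x-martian, then x-pig-latin, then fr), a provider that only has en and fr-CA would answer in fr-CA.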
NOTE: Localization support was not added for the MOF files and
repository in Pegasus 2.3. The #pragma locale, #pragma instancelocale,
and translatable qualifier flavor are not supported in the Pegasus 2.3
MOF compiler. From the client perspective, classes, qualifiers, and
instances stored in the repository are not tagged with a language.
The Accept-Language and Content-Language headers will be ignored for repository
operations. However, since the repository will support UTF-8,
characters for any language may be stored there.
NOTE: Since the Content-Language header applies to the entire
HTTP message, it applies to the entire CIM-XML document. This includes
all the objects in the document, including enumerated objects, and all
the values in the objects. This is a limitation that will remain
until the CIM standard has been updated to support language tags tied to
individual CIM values. From the client perspective, it is possible
for Pegasus to send a CIM response with NO Content-Language, even if the
client had sent Accept-Language. This can happen if Pegasus
does not know the language of the response. An example is a request
that was sent to a Pegasus 2.2 provider. Another example is an enumerated
response where each provider returned a different language. Please
refer to PEP58 for details on these provider scenarios.
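The enumerate case can be pictured with the sketch below. Because Content-Language covers the whole CIM-XML document, a single language can be reported only when every part of the response agrees; an empty string stands for "language unknown" (as with a Pegasus 2.2 provider). This only illustrates the limitation; it is not the Pegasus aggregation code, and PEP58 describes the actual rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

// parts: the Content-Language reported by each provider's partial response,
// or "" when unknown. Returns the single language for the whole response,
// or "" meaning "send no Content-Language".
std::string aggregateContentLanguage(const std::vector<std::string>& parts)
{
    if (parts.empty()) return "";
    const std::string& first = parts.front();
    for (const std::string& p : parts)
        if (p != first) return "";  // mixed (or partly unknown) languages
    return first;                   // all agree; "" if all were unknown
}
```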
Pegasus 2.3 has added classes for the localization support. There
are new classes called AcceptLanguages and ContentLanguages that encapsulate
the Accept-Language and Content-Language headers, respectively. These
classes are basically containers of AcceptLanguageElement and ContentLanguageElement,
where a language element represents one language tag. The AcceptLanguages
class will keep the AcceptLanguageElements prioritized based on quality,
according to RFC 2616.
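The sketch below mimics, with hypothetical standalone types, what "kept prioritized based on quality" means: each add inserts the new element ahead of any element with a lower q-value. Keeping equal q-values in insertion order is an assumption of this sketch, not something stated for the Pegasus classes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for AcceptLanguageElement / AcceptLanguages.
struct Element
{
    std::string tag;
    double quality;
};

class PrioritizedLanguages
{
public:
    void add(const Element& e)
    {
        assert(e.quality >= 0.0 && e.quality <= 1.0);  // RFC 2616 q-value range
        // Insert before the first element with a lower quality, so higher q
        // comes first and equal q keeps insertion order.
        auto pos = std::find_if(elems_.begin(), elems_.end(),
                                [&](const Element& x) { return x.quality < e.quality; });
        elems_.insert(pos, e);
    }
    const std::vector<Element>& elements() const { return elems_; }

private:
    std::vector<Element> elems_;
};
```

Adding fr (0.5), de (0.8), and es (0.4) in that order yields the priority order de, fr, es.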
AcceptLanguages and ContentLanguages are the objects used by code throughout
the request/response processing, from the client to the server to the providers
and back. The server handles the creation of these objects from the
HTTP headers. Code at each point in the process will have access
to these objects.
Please refer to the following files for details on the new Pegasus classes.
See the sections below for details on how to write clients and providers
to use these classes.
One of the goals of globalization for Pegasus 2.3 is the extraction
of hardcoded messages into external message files, and loading messages
from those files. The topics in this section are: how to create
message files, and how to load messages.
At the time of writing, the message loading function in Pegasus 2.3
used the International Components for Unicode (ICU)
libraries. This is expected to be the future direction for Pegasus.
ICU uses a resource bundle format for its message files. In order
to load the messages, ICU requires that the resource bundles are compiled
into a binary form (.res file) using their genrb tool.
The documentation for ICU resource bundles is in the Resource
Management section of the ICU User Guide. This section will tell you how to
create, organize, and compile your resource bundles for different languages.
Note: your resource bundles should be organized in a tree structure
similar to the one shown in the Resource Management section, including
the empty bundles in the tree.
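Lookup in such a tree walks from the most specific bundle toward the root. The simplified sketch below shows only that walk for ICU-style locale names such as fr_CA; real ICU fallback also consults the default locale, explicit parent locales, and aliases, none of which is modeled here. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Strip the last "_" segment until only the language remains, then use root.
// "fr_CA" -> fr_CA, fr, root
std::vector<std::string> fallbackChain(const std::string& locale)
{
    std::vector<std::string> chain;
    std::string current = locale;
    while (!current.empty() && current != "root")
    {
        chain.push_back(current);
        std::size_t cut = current.rfind('_');
        current = (cut == std::string::npos) ? "" : current.substr(0, cut);
    }
    chain.push_back("root");
    return chain;
}
```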
NOTE: Pegasus 2.3 only supports simple string resources in the
ICU resource bundles. String resources may only be loaded by key.
Tables, arrays, and other complex resource types are not supported.
Code that needs to load a message in Pegasus does not call ICU directly.
Two message loading classes were added for Pegasus 2.3: MessageLoader
and MessageLoaderParms. These classes are abstractions designed to
hide the actual loader used. The MessageLoader is used to
load a message using a list of preferred languages. The parameters
to MessageLoader are encapsulated in a MessageLoaderParms object.
The MessageLoaderParms object contains the parameters to load the message.
There are many parameters, but many can be allowed to default. Here
is a description of the parameters:
NOTE: WORK-IN-PROGRESS
Parameter | Direction / Default | Description
String msg_id; | Input. Required. | Message ID of the message to load from the resource bundle. This is the key that ICU will use to load the message.
String default_msg; | Input. Required. | Message to return if no message can be loaded for msg_id from a resource bundle. Note: the args parameters below are substituted into this string. For the args in this string, use the Pegasus '$' form, as described in pegasus/src/Pegasus/Common/Formatter.h. Don't use the ICU substitution format for the default message string.
String msg_src_path; | Input. Optional. Default: $PEGASUS_HOME/msg/pegasus/pegasusServer | Path to the root resource bundle file that contains the msg_id. Do not include the language or file extension as part of the path. Note: relative paths start at $PEGASUS_HOME/msg.
AcceptLanguages acceptlanguages; | Input. Optional. Default: AcceptLanguages::EMPTY | Contains the list of preferred languages, in priority order. This is combined with msg_src_path to determine which resource bundles to search for the msg_id. If not EMPTY, overrides useThreadLocale and useProcessLocale.
ContentLanguages contentlanguages; | Output. | Contains the language that MessageLoader found for the msg_id.
Boolean useProcessLocale; | Input. Optional. Default: false | If true, MessageLoader will use the default locale of the process. If true, overrides useThreadLocale.
Boolean useThreadLocale; | Input. Optional. Default: true | If true, MessageLoader will use the locale of the caller's thread.
Boolean useICUfallback; | Input. Optional. Default: false | If true, use ICU's fallback mechanism to search more general resource bundles if the msg_id cannot be found. Note: the recommended setting is false if you are using an AcceptLanguages from a CIM client, because the Accept-Language HTTP header from the client already contains the fallback specifications.
Formatter::Arg arg0; through Formatter::Arg arg9; | Input. Optional. Default: Formatter::Arg( ) (empty arg) | These are the substitution variables, using the Pegasus Formatter::Arg class.
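The precedence rules in the table can be restated in code. This uses a hypothetical enum and function, not MessageLoader's implementation. The table does not say what happens when the AcceptLanguages is empty and both flags are false, so the sketch reports that case as Unspecified.

```cpp
#include <cassert>

// Hypothetical restatement of the table's precedence rules.
enum class LanguageSource { AcceptLanguages, ProcessLocale, ThreadLocale, Unspecified };

LanguageSource pickLanguageSource(bool acceptLanguagesEmpty,
                                  bool useProcessLocale,
                                  bool useThreadLocale)
{
    if (!acceptLanguagesEmpty) return LanguageSource::AcceptLanguages;  // overrides both flags
    if (useProcessLocale)      return LanguageSource::ProcessLocale;    // overrides useThreadLocale
    if (useThreadLocale)       return LanguageSource::ThreadLocale;     // the default setting
    return LanguageSource::Unspecified;  // not covered by the table
}
```

With every parameter left at its default (EMPTY, false, true), the thread locale is used.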
Please refer to the following files for details on the new Pegasus classes.
The following example shows how a message may be loaded using the classes described above. Note: this is a generic example. Each of the developer sections below has 'real-life' examples that are better suited to each type of code.
// Build an AcceptLanguages with some language elements
AcceptLanguages acceptLangs;
acceptLangs.add(AcceptLanguageElement("fr", 0.5));
acceptLangs.add(AcceptLanguageElement("de", 0.8));
acceptLangs.add(AcceptLanguageElement("es", 0.4));
// Construct a MessageLoaderParms
MessageLoaderParms parms("msgID", "default message");
parms.msg_src_path = "/my_msg_dir/my_bundle";
parms.acceptlanguages = acceptLangs;
// Note: If you have args, set them into MessageLoaderParms
// Load the localized String
String localizedMsg = MessageLoader::getMessage(parms);
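When no resource bundle provides msg_id, the default message is returned with its arguments substituted. The sketch below is a simplified imitation of '$'-style placeholders ($0 through $9) using std::string arguments. It is not Formatter::format, whose exact rules (escaping, argument types) are in pegasus/src/Pegasus/Common/Formatter.h; replacing a placeholder that has no argument with nothing is this sketch's own choice.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Replace "$0".."$9" with args[0]..args[9]; any other '$' is copied as-is.
std::string substituteArgs(const std::string& fmt, const std::vector<std::string>& args)
{
    assert(args.size() <= 10);  // MessageLoaderParms has arg0 through arg9
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] == '$' && i + 1 < fmt.size() &&
            std::isdigit(static_cast<unsigned char>(fmt[i + 1])))
        {
            std::size_t n = static_cast<std::size_t>(fmt[i + 1] - '0');
            if (n < args.size()) out += args[n];
            ++i;  // consume the digit
        }
        else
        {
            out += fmt[i];
        }
    }
    return out;
}
```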
Here are some basic rules for writing messages:
Providers that wish to globalize should consider the following in
their design:
To help providers handle the situations described above, Pegasus
2.3 will pass the Accept-Language received from the client to the provider.
The provider should load strings from its resource bundle based on the
client's Accept-Language. The client's Accept-Language is passed
to the provider in two ways:
The OperationContext will also contain a ContentLanguages object
that is set from the Content-Language in the client request. This
is the language of the CIM objects being passed to the provider on that
request. A localized provider should store the content language along
with the data from the CIM objects. This will allow the client to
use Accept-Language later to retrieve the data in that language.
The provider should indicate the language of CIM objects it is returning
by calling setLanguage( ) on the ResponseHandler. This will be used
to set the Content-Language in the CIM response message sent back to the
client. If setLanguage( ) is not called, then no Content-Language
will be returned to the client. setLanguage( ) should only be called
once per response.
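The store-and-retrieve pattern described above might look like the sketch below for a single read/write property. The class and method names are hypothetical, plain strings stand in for Pegasus types, and language tags are compared exactly rather than with the RFC 2616 prefix rule. The returned language is what the provider would report through the ResponseHandler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical provider-side store: each value is kept with the
// Content-Language it arrived with.
class LocalizedValueStore
{
public:
    // On create/modify: lang comes from the request's Content-Language.
    void set(const std::string& lang, const std::string& value) { values_[lang] = value; }

    // On get: languages come from the request's Accept-Language, in
    // priority order. Returns {value, language}, or {"", ""} if no stored
    // language was requested.
    std::pair<std::string, std::string>
    get(const std::vector<std::string>& acceptedInPriorityOrder) const
    {
        for (const std::string& lang : acceptedInPriorityOrder)
        {
            auto it = values_.find(lang);
            if (it != values_.end()) return {it->second, lang};
        }
        return {"", ""};
    }

private:
    std::map<std::string, std::string> values_;
};
```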
The following sample code shows a localized getInstance( ) where
the instance returned is localized based on the Accept-Language of the
client request. Note that this example also throws a localized exception.
void LocalizedProvider::getInstance(
const OperationContext & context,
const CIMObjectPath & instanceReference,
const Boolean includeQualifiers,
const Boolean includeClassOrigin,
const CIMPropertyList & propertyList,
InstanceResponseHandler & handler)
{
// convert a potential fully qualified reference into a local reference
// (class name and keys only).
CIMObjectPath localReference = CIMObjectPath(
String(),
String(),
instanceReference.getClassName(),
instanceReference.getKeyBindings());
// begin processing the request
handler.processing();
// Find the instance to be returned.
Uint32 i;
Uint32 n = _instances.size();
for (i = 0; i < n; i++)
{
if(localReference
== _instanceNames[i])
{
// We found the instance to return
// Build the parameters for loading the localized string property.
// We are going to let the message loader parameters default to use the
// AcceptLanguages that Pegasus set into our thread.
// (this equals the AcceptLanguages requested by the client)
// Note: This parms object could be constructed once and
// reused.
MessageLoaderParms parms("myMsgID", "myDefaultString");
parms.msg_src_path = "/myprovider/msg/myResourceBundle";
// Load the string for the localized property from the resource bundle
String localizedString = MessageLoader::getMessage(parms);
// Remove the old property from the instance to be returned
Uint32 index = _instances[i].findProperty("myProperty");
if (index != PEG_NOT_FOUND)
{
_instances[i].removeProperty(index);
}
// Add the localized string property to the instance
_instances[i].addProperty(CIMProperty("myProperty", localizedString));
// The MessageLoader set the contentlanguages member
// of parms to the language that it found for the message.
ContentLanguages rtnLangs = parms.contentlanguages;
// We need to tag the instance we are returning with the
// the content language.
handler.setLanguages(rtnLangs);
// deliver requested instance
handler.deliver(_instances[i]);
break;
} // end if
}
// end for
// throw an exception if the instance wasn't found
if (i == n)
{
// Build the parameters for loading the localized error message.
// We are going to let the message loader parameters default to use the
// AcceptLanguages that Pegasus set into our thread.
// (this equals the AcceptLanguages requested by the client)
// Note: This parms object could be constructed once and
// reused.
MessageLoaderParms errParms("myErrorMsgID", "myErrorDefaultString");
errParms.msg_src_path = "/myprovider/msg/myResourceBundle";
// Note: the exception calls MessageLoader::getMessage( )
// Note: no need to call handler.setLanguages( ) in this case
throw CIMObjectNotFoundException(errParms);
}
// complete processing the request
handler.complete();
}
NOTE: A sample provider has been written that fully demonstrates the
design issues described above. This provider is located at:
This sample provider also demonstrates how some of the special issues
can be handled. The special issues are caused by having a read-only
localized property and a read/write localized property. What happens
if the client sets the read/write property with a Content-Language that
is not one of the supported languages for the read-only property?
This provider allows the client to set any language into the read/write
property, and get that property back in the same language. This becomes
an issue when the client does a getInstance( ) later, because the Content-Language
on the returned instance applies to all the properties. A related
issue is what to return for Content-Language when the client does enumerateInstances,
but the instances have different languages. Recall that Content-Language
applies to the entire response (a limitation in the CIM specification).
NOTE: Indication Providers have other special considerations for
language support. Please refer to PEP58.
NOTE: The CMPI interface has been updated for language support.
Please refer to the CMPI documentation for details.
NOTE: SPECIAL ISSUES FOR OS/400 PROVIDERS:
Methods have been added to CIMClient to set the Accept-Language
and Content-Language on the request, and retrieve Content-Language on the
response.
Please refer to
Here is a code fragment that uses the new methods on CIMClient:
//
// Get a localized instance in French
//
// Language priority is martian, pig-latin, and french. We should
// get french back, even though it's the lowest priority
AcceptLanguages acceptLangs;
acceptLangs.add(AcceptLanguageElement("x-martian"));
acceptLangs.add(AcceptLanguageElement("fr", 0.1));
acceptLangs.add(AcceptLanguageElement("x-pig-latin", 0.4));
// Set the requested languages into the CIMClient
client.setRequestAcceptLanguages(acceptLangs);
// Get the instance
CIMInstance instance = client.getInstance(
NAMESPACE,
cimNInstances[0].buildPath(sampleClass),
localOnly,
includeQualifiers,
includeClassOrigin);
// Get the string property that should be french
String returnedString;
instance.getProperty (
instance.findProperty("myProp")).
getValue().
get(returnedString);
// Check that we got back french
ContentLanguages CL_FR("fr");
String expectedFRString = "oui";
PEGASUS_ASSERT(CL_FR == client.getResponseContentLanguages());
PEGASUS_ASSERT(expectedFRString == returnedString);
//
// Create an instance in French
//
String oui = "Oui";
CIMInstance frInstance(CLASSNAME);
frInstance.addProperty(CIMProperty(
CIMName("myProp"),
oui));
CIMObjectPath frInstanceName = frInstance.buildPath(sampleClass);
client.setRequestContentLanguages(CL_FR);
client.createInstance(NAMESPACE, frInstance);
Also, refer to
NOTE: Consideration should be given to converting the UTF-16
characters in the String objects passed over the CIMClient interface to
a platform codepage. This is especially needed for EBCDIC platforms.
See the Provider developer section for details of the EBCDIC considerations.
TODO - some info on how CIMClient defaults the Accept-Languages.
The design for Pegasus releases beyond 2.3 is to avoid using hardcoded
messages. All new messages should be loaded from a Pegasus resource
bundle. This section describes the process to follow if you are creating
a new message. The process depends on where you are in the code.
Place any new Pegasus messages into one of the following resource
bundles:
For messages returned from one of the services in the Pegasus server
(e.g., CIMOperationRequestDispatcher or ProviderManagerService), the goal
is to make it easy for any code in the call chain to throw an exception
with a localized error string. The code throwing the exception will
not need to know the Accept-Language that the client requested. To
understand how this works, some design points need to be described:
Server Design Points:
The CIMMessage object has been expanded to include an AcceptLanguages
object and a ContentLanguages object. For CIMRequestMessage, these
objects contain the Accept-Language and Content-Language headers that were
built from the client request. For CIMResponseMessage, the ContentLanguages
object is used to build the Content-Language header associated with the
CIM objects in the response message. The AcceptLanguages object
in the CIMResponseMessage is ignored.
The localization of the cimException object in the CIMResponseMessage
is handled separately from the CIM objects. The message string in
the cimException object is assumed to have been localized by the time it
is built into the XML. For this reason, the localization of the exception
is the responsibility of the code throwing the exception. (The goal
of the design is to make that easy - see below). The ContentLanguages
object in the CIMResponseMessage has NO relation to this exception.
The cimException object keeps its own localization information once it
is created.
To enable exceptions to be localized, the ability was added to set a
global language for all the code running from a Pegasus Thread object.
The top level code for a Thread can set a global AcceptLanguages object
that can be accessed by all the low-level functions that it calls. This
will allow an exception thrown by low-level code to be localized based
on this global AcceptLanguages object. Note: This applies only
to Threads that are managed by a ThreadPool.
Each service in the request path of the Pegasus server sets the AcceptLanguages
into its Thread from the AcceptLanguages in the CIMRequestMessage object
that it dequeues. This sets the global language for all the functions
in the same thread that are called below handleEnqueue. If you
are writing a new service that processes requests, or discover a request
service that was missed, please do this. The CIMOperationRequestDispatcher
service is an example.
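The per-thread language idea can be illustrated with standard C++ thread_local storage. This is not Pegasus code (Pegasus keeps the language on its own Thread objects), and all names below are hypothetical. A pooled thread that is reused would otherwise keep the previous request's language, so the sketch sets it at the start of every request, in the same way that each service sets it for every message it dequeues.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Languages = std::vector<std::string>;  // stand-in for AcceptLanguages

// Each thread gets its own copy, initially empty.
thread_local Languages tlsLanguages;

void setThreadLanguages(const Languages& langs) { tlsLanguages = langs; }
const Languages& getThreadLanguages() { return tlsLanguages; }

// Low-level code: takes no language parameter, but can still localize
// (e.g. when building an exception message) by reading the thread's setting.
std::string preferredLanguage()
{
    const Languages& langs = getThreadLanguages();
    return langs.empty() ? "" : langs.front();
}

// Top-level code, e.g. a service handling one dequeued request message.
std::string handleRequest(const Languages& requestLanguages)
{
    setThreadLanguages(requestLanguages);  // reset on every request
    return preferredLanguage();
}
```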
How to Throw a Localized Exception from Server code:
With all that background, here is how code running in a Pegasus service
can throw a localized exception:
This example assumes that the top-level code in the service has set
the global thread language beforehand. As described above, every
service in Pegasus should do that.
// First, construct a MessageLoaderParms
//
// Notes:
// 1) The errorMessageID must be in the Pegasus server resource bundle.
// 2) The default message is the old "hardcoded" message.
// 3) The MessageLoaderParms will default to use the Pegasus server resource bundle.
// 4) The MessageLoaderParms will default to use the locale of the current thread.
//    Don't change this!
// 5) You might need to set the arguments for the message into the MessageLoaderParms.
MessageLoaderParms parms("errorMessageID", "default message");
// Second, throw the Exception
// Note: this applies to all the classes derived from Exception, including
// the CIMExceptions. Throw the exception object by value, not with new.
throw Exception(parms);
NOTE: If you are throwing an Exception with un-localized data,
use the constructor that takes a String. An example of this would
be an Exception where you are passing in a file name. Most of the
"non-CIM" exceptions defined in Exception.h and InternalException.h take
un-localized data.
How to Load a Localized Message
For code that may not be running in a Thread with the global
language set, but has access to the AcceptLanguages object from the CIMMessage,
the code is simple:
// Construct a MessageLoaderParms
//
// Notes:
// 1) The errorMessageID must be in the Pegasus server resource bundle.
// 2) The default message is the old "hardcoded" message.
// 3) The MessageLoaderParms will default to use the Pegasus server resource bundle.
// 4) The MessageLoaderParms will default to use the locale of the current thread.
//    You will change this below.
// 5) You might need to set the arguments for the message into the MessageLoaderParms.
MessageLoaderParms parms("errorMessageID", "default message");
// Tell the MessageLoaderParms which languages to search for.
// MessageLoaderParms will not use the thread locale in this case.
// acceptLangs is the AcceptLanguages object taken from the CIMMessage.
parms.acceptlanguages = acceptLangs;
// Load the localized String
String localizedMsg = MessageLoader::getMessage(parms);
New methods have been added to Logger to take a message ID of a
message to be loaded from the Pegasus server resource bundle. The
caller is only required to pass in the message ID, and the old "hardcoded"
message, and the args. The Logger will use MessageLoader to load
the message in the locale of the Pegasus server process, using the
hardcoded message as the default string. Please refer to pegasus/src/Pegasus/Logger.h
Code in the client side of the client/server CLIs (e.g., cimconfig,
cimmof), or in directly linked CLIs (cimmofl), should use the useProcessLocale
setting in MessageLoaderParms. This will cause the messages to be
loaded in the locale based on the environment in which the program is running.
This locale can be set by the user before running the program.
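On POSIX systems the user typically sets this locale through environment variables. The sketch below shows the standard POSIX precedence for message text (LC_ALL, then LC_MESSAGES, then LANG, skipping unset or empty variables) and strips the codeset and modifier, turning fr_FR.UTF-8 into fr_FR. How Pegasus and ICU actually derive the process locale is not described in this guide, so treat this as an illustration of what the user controls, not of MessageLoader internals; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>
#include <string>

// POSIX precedence: LC_ALL, then LC_MESSAGES, then LANG; unset or empty
// variables are skipped, and "C" is used when none is set.
std::string pickLocale(const char* lcAll, const char* lcMessages, const char* lang)
{
    for (const char* v : {lcAll, lcMessages, lang})
        if (v != nullptr && *v != '\0') return v;
    return "C";
}

// Reduce "fr_FR.UTF-8@euro" to "fr_FR" (drop codeset and modifier).
std::string stripCodeset(const std::string& posixLocale)
{
    return posixLocale.substr(0, posixLocale.find_first_of(".@"));
}

std::string processLocaleFromEnvironment()
{
    return stripCodeset(pickLocale(std::getenv("LC_ALL"),
                                   std::getenv("LC_MESSAGES"),
                                   std::getenv("LANG")));
}
```

For example, a user could run `LANG=de_DE.UTF-8 cimconfig ...` to see messages in German, provided German bundles are installed.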
TODO - describe how CIMClient will default the Accept-Language
from the process locale.
Copyright (c) 2003 BMC Software; Hewlett-Packard Development Company, L.P.; IBM Corp.; The Open Group
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED
IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS
PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS
OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.