Pegasus Security Implementation Guidelines

Problem Statement

Open Pegasus has a challenging role. It provides a portal for users and programs to access a wide variety of information on a system. Pegasus is responsible for user authentication and provides a framework for provider authors to authorize specific read and write operations on the server. Managing the resulting "trust delta" (the difference between what the provider could do in its current execution context and what a given user is authorized to do) is hard. The bigger the trust delta, the greater the incentive to "break in" past the authorization checks in providers to "get to" a super-user/administrator execution context (or to the context of a user that can do something the authenticated user isn't authorized to do). Although the OpenPegasus 2.5 feature "run-as-requestor" lowers the risk for a provider that takes advantage of it, there are still risks for providers that run at elevated privilege (defined as an execution context that has more permissions/abilities than the authorized user, which is hard to avoid when the code does not execute as the authorized user).

Requirements, Constraints or Assumptions

Most of the risk of failing to follow the guidelines below is present only when code is not run as the authenticated user (whether that code is in the provider or the client) or in deployments for which the concept of authentication isn't used (e.g. SNMP-public info). Even for more limited deployments, many of the problems below can still cause crashes or denial-of-service conditions.

Definitions:

Elevated code: Code for which there is a difference between the actions that the logged-in user is authorized to perform and the execution context of the running program. This "trust delta" must then be managed by the code to ensure that the user doesn't perform more actions than they are authorized to, either directly or through side-effect. For example, a process run as the UID of the authenticated user is said to be "non-elevated." A process running as administrator on behalf of a "non-administrator" user would be called elevated.

Privilege: The collection of actions a process or user is not prevented from doing. An "administrator"/root user is said to be full privilege with respect to a system since that execution context does not prevent any action on the system.

Trust: The degree to which an actor that is interacting with the component under consideration is believed to behave non-maliciously. For example, an arbitrary user on the Internet has no "trust"; a junior operator or administrator is trusted not to attempt malicious activity, but may accidentally attempt damaging actions. Root/Administrator code is trusted to behave correctly.

Security Testing: Non-functional testing that centers around behavior in the presence of malicious use. Examples include testing for crashes or security side-effects in the presence of overly long inputs, special character inputs, high system load, and network storm environments.

Security Side Effect: Applications often, in accomplishing their goals, perform actions beyond those visible to the user. Examples include writing temporary files, or clearing or requesting memory. Since this behavior is not specified in the functional requirements, it is often not tested. This "side effect" behavior is often the behavior that a malicious user attempts to leverage when trying to gain privilege. Examples include exploiting race conditions where a temporary file is momentarily world writeable, before it is chmod-ed. This window is an opportunity for a malicious user to insert data that can change the behavior of the application.

References:

Architecture: http://www.opengroup.org/security/secarch.htm

Books and References (not endorsed by Opengroup or partners):

Proposed Solution

General Implementation Guidelines:

In elevated code, including providers not running as-requestor, failure to adhere to the following guidelines should be considered a bug. It is a best practice to follow these guidelines for all code.

Avoid buffer overflow vulnerabilities in your code (attackers use them to insert arbitrary code)
Buffer overflows in network-accessible software cause a large share, and some believe the majority, of software vulnerabilities. Since Pegasus is written in C/C++, it is especially susceptible to buffer overflows; the susceptibility stems from a lack of bounds checks in C and C++. Strongly consider using a static analysis tool such as Flawfinder or RATS to look for common problems. Dynamic tools can also be used, but they identify a problem only when the overflow actually happens rather than finding potential overflows. Problematic functions include strcpy, strcat, sprintf, and gets; bounded variants such as strncat and strlcat are safer but can still overflow or silently truncate when given the wrong size.
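
As an illustration of the bounds check that strcpy lacks, here is a minimal sketch. The helper name copyBounded is hypothetical and not part of Pegasus, and the sketch assumes src is a NUL-terminated string. It refuses the copy outright instead of overflowing or silently truncating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a Pegasus API): copy src into dst only if the
// whole string, including its terminating NUL, fits in dstSize bytes.
// On failure dst is left as an empty string and false is returned.
bool copyBounded(char* dst, std::size_t dstSize, const char* src)
{
    if (dst == nullptr || dstSize == 0)
    {
        return false;
    }
    dst[0] = '\0';
    if (src == nullptr)
    {
        return false;
    }
    // Assumption: src is NUL-terminated, so strlen is safe to call on it.
    std::size_t len = std::strlen(src);
    if (len >= dstSize)
    {
        return false;  // would overflow: reject instead of truncating
    }
    std::memcpy(dst, src, len + 1);  // len + 1 also copies the NUL
    return true;
}
```

A caller would typically treat a false return as invalid input and report an error to the requester rather than continue with a partial value.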

References:
• Smashing the stack for fun and profit, http://www.phrack.com/show.php?p=49
• Heap Overflows: http://www.phrack.org/phrack/57/p57-0x08

Avoid format string vulnerabilities in your code (attackers use them to read data or insert arbitrary code)
Format strings define the format and types of program variables that are substituted into an input or output string. Format strings can be exploited when a function that requires a format string is called with a variable as that argument and the variable is not validated. For example, printf(string_from_untrusted_user) is vulnerable because the user can supply the format string and thereby read or overwrite (using %n) arbitrary data. Each developer must:

Use functions with a strict format string argument.
Check the input parameter data for format strings before assigning them to variables.
Always validate that the input parameter data does not contain any program-specific format characters before assigning the input data to variables.
Always check the return codes of library functions for failure.
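
The first and last points above can be sketched as follows. The function name formatUserMessage and the 128-byte buffer size are illustrative assumptions. The format string is a literal, the untrusted text is passed only as an argument, and the return code of snprintf is checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: build a message from untrusted text.
// Unsafe pattern (do not use): printf(untrusted);
// Safe pattern: a literal format string with the untrusted text as data.
std::string formatUserMessage(const char* untrusted)
{
    char out[128];
    int n = std::snprintf(out, sizeof(out), "user said: %s", untrusted);
    // snprintf returns a negative value on error, or the length the full
    // result would have had; a value >= sizeof(out) means truncation.
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof(out))
    {
        return std::string();  // report failure with an empty string
    }
    return std::string(out);
}
```

With this shape, conversion specifiers such as %x or %n in the untrusted text are copied through as ordinary characters and are never interpreted.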

Adhere to the general, good programming practices:
1. Always have people other than the coder review the code.
2. Always have people other than the coder develop tests and test the end product.
3. Check return codes from system or library calls, and handle errors or exceptions gracefully.
4. Keep your code simple: simple code decreases the risk of defects; complex code increases the risk of defects.
5. Don’t use uninitialized variables.
6. Use symbolic constants (such as #define) to minimize typos and improve code maintainability.
7. Use temporary files with care. Do not create temporary directories or files from your program that are world writeable. Limit the permissions to what is needed by the program. Clean up temporary files or directories when you are finished using them.
8. Do not put sensitive information in log files. For example, do not print social security numbers, passwords, credit card numbers, or any other sensitive or personal information for debugging purposes in the log files. Sometimes such information shows up in the web browser in case of exceptions or application failure.
9. Enforce strong password policies and a delay on failed logins. This helps to prevent unauthorized access to private data. See libpam for a good way to implement this.
10. Validate input to the program or system before processing the input. Test to see if the input is the proper type of data and in the range of acceptable or expected values and test both upper and lower bounds. For example, if you are reading in a year value (int), and you have already checked for buffer overflow and format strings:
  
  Correct: if 0 <= year <= 3000 then (accept input and process)
  
  Unsafe: if year <= 3000 then (accept input and process)
  
  The vulnerability of the unsafe example is that if someone were to supply a value of -32769, they could intentionally stop or corrupt a procedure. If the language you are using does not enforce strong types, then a type check should also be performed before accepting the input.
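
The year check in item 10 might look like the following sketch. The name parseYear is hypothetical; the bounds 0 and 3000 are the ones used in the example above. Both the type check (is it a number at all?) and the lower and upper bounds are enforced before the value is accepted.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse text as a decimal year and accept it only
// if it is a well-formed number within [0, 3000].
bool parseYear(const char* text, int& year)
{
    if (text == nullptr || *text == '\0')
    {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
    {
        return false;  // out of range for long, or trailing non-digits
    }
    if (value < 0 || value > 3000)
    {
        return false;  // check the lower bound as well as the upper bound
    }
    year = static_cast<int>(value);
    return true;
}
```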
Use the principles of least and necessary privilege.
Only grant the minimum set of privileges required to perform an operation, and grant these privileges for the minimum required amount of time. For example, if a provider needs to modify both mail queues and print spools, don’t run the application as root; instead, use /etc/logingroup or other facilities to give the application the privileges which it needs, but not more privileges than it needs.
Use SSL securely:
Please refer to the OpenSSL documentation for usage. Pegasus libraries help with some but not all functions necessary for certificate management and usage.
Handle race conditions securely
Race conditions occur when two or more processes access a shared resource in an order that was not expected by the program. Unordered access to resources is common in multitasking environments, and is mostly associated with either Signal Handlers or File Handling. For example, if a program "A" checks to see if a file exists before writing, but a program "B" creates a link after the check, but before the write, "A" may inadvertently overwrite the link destination with the permissions associated with "A". This can be a security problem if "A" has different permissions than "B."
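
One common way to close the check-then-write window described above on POSIX systems is to let the kernel do the check and the create in a single step. The sketch below assumes a POSIX platform; the function name writeNewFile is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: create path and write data to it, failing if
// anything (file, directory, or symbolic link) already exists there.
bool writeNewFile(const std::string& path, const std::string& data)
{
    // O_CREAT | O_EXCL makes "does it exist?" and "create it" one atomic
    // operation, and open() fails if path is a symbolic link instead of
    // following it. The file is created readable/writable by owner only.
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL,
                    S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        return false;
    }
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0)
    {
        ssize_t n = ::write(fd, p, left);
        if (n < 0)
        {
            ::close(fd);
            ::unlink(path.c_str());  // remove the partial file we created
            return false;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    return ::close(fd) == 0;
}
```

Because the open fails rather than following a link planted by another process, program "A" in the example above would report an error instead of overwriting the link destination.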
Use secure defaults when possible, or clearly document when they aren't used
Design Securely
Design your code so that as little as possible runs as a privileged user. All privileged user code (especially if it listens on a network or executes on behalf of other users) should be inspected very thoroughly, so it should be short and simple. Each module of code should have a clean interface for other modules to use and a well-defined perimeter around each module. Additional details can be found at: Architectural Patterns for Enabling Application Security (http://www.joeyoder.com/papers/patterns/Security/appsec.pdf)
Test for security (use both positive and negative tests)
“Positive tests” verify that the functionality of the product works as specified. “Negative tests” attempt to subvert the security of the system, and are often overlooked when testing software. Spend some time thinking like a hacker and trying to break your system. Always test boundary conditions or corner cases for values of data, size of data, and type of data. Many common bugs are related to this. Sometimes this type of bug may result in wrong information being retrieved from the database instead of failing gracefully. Attempt to exploit the system with buffer overflow and format string attacks.
Don’t bundle private copies of security code
Security code (especially highly scrutinized open-source code) is likely to have security bulletins issued against it. When such security bulletins are inevitably issued against code you depend on, you don’t want to have to issue a bulletin against your product also. If you put a dependency in your code to a standard distribution of a component which you need (for example OpenSSL), rather than embedding a private copy, then whenever a security bulletin is issued against it, you won’t have to reissue the bulletin after repacking the fix for your private copy.

General Coding Best-Practices:

Avoid implementing security functionality: Making security claims in your documentation (beyond the implied security claims of authentication and authorization done by the operating system) can increase your risk of having a security defect. This is because any of those claims that are not fully implemented or enforced is by definition a security defect and requires an expedited fix and a security bulletin to announce that fix. Reuse of tried-and-tested code, that has been used in a security context is always a better choice. Never implement a random number generator or cryptographic algorithm unless you're a cryptographer by profession. You will almost certainly get it wrong.
You should always, however, document your security behavior.
Duplicating authorization code: Related to risk #1, every provider has the risk of authorization-related defects because the authorization done in each provider is a duplication of the kernel authorization code. However, you can still decrease your risk by using common APIs. For example, many providers will need a way to tell if the authenticated user should have access to a given file. The code which does this needs to check the user id and group ids of the file in question and of all of its parent directories. Any defect in this code could easily be a security defect and would need to be fixed in every copy of that code. For this reason it is imperative that this logic exists in only one place and that your provider uses that copy. Do not try to replicate this complex logic in your own provider, unless you are the single owner of that code.
WBEM provider/client combinations: Writing a WBEM provider that is also a WBEM client (makes requests of other providers) has security risks/challenges. There are two subcategories of this risk/challenge:
1. Using the connectLocal() API uses the UID of the running process to do authentication. Thus, the provider initiating the request must ensure authorization of the other provider's data before making the request. (This is another example of Risk 2, multiple copies of authorization code.) One feasible way to do this is to check that the user is a privileged user before calling the other provider (in which case the UID matches the running process).
2. Using the connect() API has additional complexities. Credentials must somehow be passed into the provider and then handled appropriately. Also, there are additional client responsibilities as far as certificate validation and testing, and the consequences are more severe because the client is running with elevated privileges.

Provider Implementation Guidelines

Code that doesn't follow the following guidelines in providers running at elevated privilege should be considered a bug.
Code in providers running as-requestor can consider the following as general best practice.

Check the username/uid and execute every method as if it was running as that user (i.e. had the OS kernel or authorization service done the authorization).
Providers do this by checking each operation they perform and ensuring that those operations, when performed on behalf of a non-privileged user, do not have security side-effects. Any discrepancy between the authorization done by the OS kernel and that done by the provider that is not part of documented behavior is a security defect. If the user does not have the privileges to perform the requested operation, the provider must throw CIMAccessDeniedException.
Keep your design/provider simple.
While this is difficult to quantify, it is important to minimize the amount of code running as a privileged user. As a general guideline, if you have significant lines of code running with elevated privilege, the likelihood of a security defect is high. Remember that any defect in elevated-privilege code is a potential security defect, so all of this code must be straightforward and easy to review based on the principles mentioned in the General Coding principles above. The likelihood of a defect not being found is proportional to the amount and complexity of the code.
Provider must not use any calls, such as setuid or setting environment variables (e.g. PATH), that would alter the state of the process running the CIM Server.
This could cause unexpected results for other providers or threads.
Provider must document property authorizations.
Specifically, the provider should describe which data elements they make available for reading, which system changes they are capable of making, and which users will be able to read those elements and make those changes.
Provider must check all untrusted input for validity.
While the CIM Server ensures that the input is a valid CIM request, the provider is responsible for validating that the CIM request does not cause any side effects by ensuring that the input strings contain only expected characters and that values are within an expected range. Examples of input data that must be checked include directory or file names, data within files that are read by the provider, and data returned from system calls.
Provider must execute stress tests.
These include operation in the presence of multiple interacting provider requests. Based on a white box analysis of your provider, identify ways in which testing could stress your provider. For example, sending large input strings, a large number of simultaneous requests, requests including out-of-bounds data, or ensuring that every branch is covered are just a few ways that you could stress your product to find potential defects. By exploring the way your provider fails, you can look for side effects that might lead to "infinite" resource requests, overwritten data, or other anomalies that could cause a denial-of-service or reveal a side-effect that can be leveraged as an exploit.
Design your provider to expect belligerent input.
For example, have a common method that validates all CIM requests and ensure that that method gets called for every request. The method should assume that input is invalid unless it matches a specific format and specific bounds are checked. Also, if your provider allocates any memory buffers or writes to any file based on user input, all error conditions (out-of-memory, disk full, file is a symbolic link/device file/directory instead of the expected format, buffer/array too small for data, etc.) should be checked and all of this should be enforced in a common place.
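
A common validation routine of the kind described above could start from an allow-list like the one in this sketch. The function name isValidName, the permitted character set, and the default 64-character limit are all illustrative assumptions; a real provider would choose them from its own documented input formats.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical validator: input is rejected unless it matches a specific
// format. Accepts non-empty names of at most maxLen characters drawn from
// [A-Za-z0-9_.-] that do not start with '.' or '-'.
bool isValidName(const std::string& input, std::size_t maxLen = 64)
{
    if (input.empty() || input.size() > maxLen)
    {
        return false;
    }
    if (input[0] == '.' || input[0] == '-')
    {
        return false;  // blocks "..", hidden files, and option-like names
    }
    for (unsigned char c : input)
    {
        bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') ||
                       c == '_' || c == '.' || c == '-';
        if (!allowed)
        {
            return false;  // includes '/', whitespace, NUL, shell metachars
        }
    }
    return true;
}
```

Listing what is allowed, rather than what is forbidden, means that unexpected characters are rejected by default.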
Do not allow group or world-write access to your shared library, any other executable code, configuration files, or any parent directory of any of the above.
Although only a privileged user ought to be able to create the symbolic links or shortcuts to the provider shared library in the designated WBEM provider library directory, the actual provider shared library can be placed in any directory. A provider must ensure that their shared libraries are protected in such a way that only a privileged user can modify or delete the shared library or the directory where the shared library is located.
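
An installation script or self-test could verify this protection with a check along the lines of the sketch below, which assumes a POSIX platform. The name isProtectedFromOtherUsers is hypothetical, and treating "root or the account running the check" as the acceptable owners is an assumption of this sketch. A fuller check would apply the same test to every parent directory of the path.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical check: true only if path exists, is owned by a trusted
// account, and is not writable by group or others.
bool isProtectedFromOtherUsers(const std::string& path)
{
    struct stat st;
    if (::stat(path.c_str(), &st) != 0)
    {
        return false;  // missing or inaccessible: treat as unprotected
    }
    // Assumption of this sketch: acceptable owners are root (uid 0) and
    // the effective user running the check.
    bool trustedOwner = (st.st_uid == 0) || (st.st_uid == ::geteuid());
    bool othersCanWrite = (st.st_mode & (S_IWGRP | S_IWOTH)) != 0;
    return trustedOwner && !othersCanWrite;
}
```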

Provider Best-Practices:

Use "UserContext registration" setting, present in Open Pegasus 2.5 and later:
In Pegasus 2.5 and after, you should strongly consider registering your provider to run as requestor context, or if not available, use Windows "impersonation" or fork a correct-user-running process. For providers in versions prior to 2.5, you may want to consider implementing your own out of process provider, to avoid the risks of running at elevated privilege. For those that must run privileged:

Check that the authenticated username provided by CIM matches the effective user id of the running process. For Pegasus 2.4 and prior, this means that only the privileged user would be able to use your provider. The general property is that if you are not elevating privilege (running on behalf of a different user), then the likelihood of a security defect is greatly decreased. Making your code more general may mean less work in the future when non-privilege-elevated providers are able to run with the correct user-id. Even in the model where Pegasus is run under a non-privileged user, there is a delta in "trust" between the different users. This still represents some, though not as much, risk as deploying a run-as-administrator Pegasus. There is an opportunity to improve Pegasus to better support fully protecting this use model, though this is less urgent than protecting the higher risk associated with an administrator running CIM server.
Recommend configuring the WBEM users group (ref: PEP 142): For Pegasus versions prior to 2.5 and for later versions, customers can configure a specific group of users who have access to WBEM providers. This allows customers to choose a tradeoff between security risk and ease-of-setup. Since every provider runs with elevated privilege, the risk of security defects is high. Thus, it is advised that customers configure this group of WBEM users to only allow access to users who are trusted not to be malicious. If you also do not run by default, this information can be in your initial setup documentation so that it gets to all of your customers. This can greatly decrease your risk of having a security defect, because all malicious activities can be potentially ruled out.
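
The user-id check described above, for providers that must run privileged, can be sketched as follows on a POSIX platform. The function name userMatchesProcess is hypothetical, as is the 16 KB lookup buffer; how the provider obtains the authenticated username from the CIM request is outside this sketch.

```cpp
#include <cassert>
#include <pwd.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical check: true only if authenticatedUser names the account
// whose UID equals the effective UID of the running process.
bool userMatchesProcess(const std::string& authenticatedUser)
{
    if (authenticatedUser.empty())
    {
        return false;
    }
    struct passwd entry;
    struct passwd* result = nullptr;
    std::vector<char> buffer(16384);  // assumed large enough for one entry
    // getpwnam_r is the reentrant lookup, suitable for a threaded server.
    int rc = ::getpwnam_r(authenticatedUser.c_str(), &entry,
                          buffer.data(), buffer.size(), &result);
    if (rc != 0 || result == nullptr)
    {
        return false;  // lookup error or unknown user: deny
    }
    return result->pw_uid == ::geteuid();
}
```

Any lookup failure is treated as a mismatch, so the check fails closed.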

Providers should consider the tradeoff between default installation/registration and optional: An optional installation of a component (as part of an OS or software package) gives customers a choice as to whether or not to limit their interface/exposure, and maintenance/patch burden. Your provider likely meets a real need for many customers, but there are also customers who do not need the functionality you provide. There are many customers who would prefer less patching/update cost and decreased security risk (risk is added whenever there is a new interface) versus the functionality that your product provides. Although technically this doesn’t decrease the risk of having a security defect, it can give you more options for interim workarounds until you can get a critical fix out, and fewer customers would be affected by any given defect. Provider writers and bundlers should consider these benefits and weigh those against the bundling benefits of mandatory inclusion.
Log important events, such as unauthorized requests: This can help a customer track down a potential intrusion as well as debug problems. Do not include confidential information, such as passwords, in the log. Ensure that the confidentiality of information stored in the log is commensurate with access to the log. It is recommended that you use a common logging facility, such as syslog. Syslogd takes care of things like log rotation, etc. and the administrator already knows where to look for your logs.
When making system changes, use platform security checks where possible vs. rewriting your own authorization code: Duplicating authorization code at least doubles the work and is more error-prone.
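
The logging practice above might be sketched like this, assuming a POSIX syslog. The function names and the message layout are illustrative assumptions. Only the user name and operation are logged (no credentials), control characters in those untrusted fields are replaced so they cannot forge extra log lines, and the text is passed to syslog as data rather than as a format string.

```cpp
#include <cassert>
#include <string>
#include <syslog.h>

// Hypothetical helper: build the log text for a denied request, replacing
// control characters (such as newlines) in untrusted fields with '?'.
std::string describeDeniedRequest(const std::string& user,
                                  const std::string& operation)
{
    std::string text =
        "access denied: user=" + user + " operation=" + operation;
    for (char& c : text)
    {
        unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x20 || u == 0x7f)
        {
            c = '?';
        }
    }
    return text;
}

// Hypothetical helper: send the event to the system log.
void logDeniedRequest(const std::string& user, const std::string& operation)
{
    // The literal "%s" keeps untrusted text out of the format string.
    ::syslog(LOG_AUTH | LOG_WARNING, "%s",
             describeDeniedRequest(user, operation).c_str());
}
```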

Client Implementation Guidelines:

Note: In general, these are the responsibility of the applications invoking CIM client libraries to the extent that the client libraries don't yet provide the direct support.

Client code that doesn't follow these guidelines should be considered a bug:

Use SSL as follows in your remote production client. Though WBEM does provide libraries to help, client behavior is the client's responsibility:
Protect the Keystore and Truststore for remote production clients:
- Use proper file and directory permissions to protect keystore and truststore files.
- If your applications are importing the servers’ certificate to a truststore, you must ensure that the user validates the certificates received before adding them to a truststore or keystore.
- Do not use a key size of less than 1024 bits to create keystores.
- Keystores/truststores should not be readable or writeable by anyone other than the user who owns them.
General programming standards
- Do not use world-writable files or directories (including /tmp and /var/tmp). Make sure all credentials (passwords/certificates) are readable only by their owner.
- Do not cache passwords unless directed to do so by the user. The user should be aware that their password is being stored permanently on the client machine.
- Do not pass passwords as an option on the command-line in non-windows clients. Command-lines are visible to all users on the system in some operating systems.

General client best-practices:

Limit access to client data: Each user of a WBEM client should have his/her own WBEM client instance. The WBEM client process should run as the correct user on the client machine.
Local vs. Remote Requests and Username/Password Authentication: Use the connectLocal() API call to connect to the CIM server whenever possible. To use this API call properly, the process must run with the correct userid.

Warning: For Pegasus earlier than 2.5, doing client operations from a CIM provider significantly increases your security risk if the initial client requester was not running as root. This is due to the implementation which runs the provider in the CIM Server process space with a single, often privileged, user so the provider it connects to will be unable to use built-in authentication. Providers issuing WBEM client operations must adequately address the security risk. A few alternatives to address the security concern are: 1) ensure (either at design time or at runtime in the provider) that the user is authorized to access the data being requested from the second provider, and 2) the provider could launch another process and issue the request to the second provider as the intended user.
Background on connectLocal():
A local connection mechanism exists for clients to communicate with the CIM Server on the same system. The connectLocal() function is used for this purpose, and does not take any arguments. In the case where PEGASUS_LOCAL_DOMAIN_SOCKET is defined, (default on all but Windows, as currently the Windows connectLocal authentication is not functional as of 2.5) the user ID passed to the provider is that of the process in which the client program is running. The CIM Server verifies that the user ID of the request is indeed that of the requesting process. Namespace authorization, if enabled, is still performed. When the client must be able to connect to a CIM Server on a remote system, or when it must be able to specify a different user than that of the process, it must use the connect() function. This function allows a hostname and port number to be specified, as well as a username and password. If you need to use the connect() API, the WBEM client has several responsibilities to ensure correct authentication and to protect confidential information. Because connectLocal() does not use SSL, these guidelines only apply to the connect() interface. Using connectLocal() bypasses these requirements except where PEGASUS_LOCAL_DOMAIN_SOCKET is not defined. In that case, it behaves like connect(), using HTTPS and/or HTTP as defined in Pegasus settings.
General programming standards
- Design for belligerent input. A separate module should be responsible for validating all input before taking any action. Invalid input should be discarded. If your client has high availability requirements, deal with invalid input quickly to avoid Denial of Service attacks.
- Use a strongly-typed language if possible (e.g. Java). If your client is in C++, then use a security scanner such as RATS (http://www.securesoftware.com/resources/download_rats.html) to identify problem areas and follow the recommendations. (Note: code scanners such as these tend to make a lot of recommendations, so plan on adequate time for manual analysis and focus on your input validation module.)
- Do not use world-writable files or directories (including /tmp and /var/tmp). Make sure all credentials (passwords/certificates) are readable only by their owner.
- Do not cache passwords unless directed to do so by the user. The user should be aware that their password is being stored permanently on the client machine.
- Do not pass passwords as an option on the command-line on non-windows systems. Command-lines on non-windows systems are visible to all users on the system.
- If possible, do not make any server-initiated changes on the client system. Doing so increases the risk of security vulnerabilities in your client, and a security reviewer should be consulted.
- If possible, log events of interest, including certificate warning messages and invalid responses sent from the server. Doing so increases the ability of a user or system administrator to track down unauthorized actions. Use either a user-specific logfile or syslog. Be sure to check for corner cases like disk-space limitations.
- HTTP Indications should only be used to send confidential information in environments where the risk of exposure to man-in-the-middle type attacks is low (e.g. where a rogue CIM Listener could intercept indications). If your listener expects to receive confidential information, be sure to document clearly, for the customer initiating the subscription, that this information will be visible to anyone on the network.
Security Testing Guidelines
- Run the following tests, and ensure that your client gives a useful error message and does not crash. Crashes on strange and unexpected input are, at a minimum, a denial-of-service, and often represent buffer or format-string vulnerabilities:
  - CIM server you are connecting to is not available (disabled or network problems)
  - CIM server responds with an extremely large response
  - CIM server or provider responds with invalid characters or garbage in the response
  - CIM server returns ‘access denied’

Platform Considerations

The coding guidelines may not help, but will not hurt implementations where Pegasus and its providers are not run at elevated privilege. Examples of this include environments with only one user or where Pegasus itself is executed as the requesting user.

Copyright (c) 2005 EMC Corporation; Hewlett-Packard Development Company, L.P.; IBM Corp.; The Open Group; VERITAS Software Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE ABOVE COPYRIGHT NOTICE AND THIS PERMISSION NOTICE SHALL BE INCLUDED IN ALL COPIES OR SUBSTANTIAL PORTIONS OF THE SOFTWARE. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.