Tuesday, February 24, 2015

A 'Robust' Schema Approach for SCIM

This article was originally posted on the Oracle Fusion Blog, Feb 24, 2015.

Last week, I had a question about SCIM's (System for Cross-domain Identity Management) approach to schema. How does the working group recommend handling message validation? Doesn't SCIM have a formal schema?

To answer, I first had to realize that the question was about a different style of schema than SCIM supports. It assumed that “schema” is defined the way XML defines it: as a way to validate documents.

Rather than focus on validation, SCIM’s model for schema is closer to what one would describe as a database schema, much like many other identity management directory systems of the past. Yet, SCIM isn't necessarily a new web protocol to access a directory server. It is also for web applications to enable easy provisioning. The SCIM schema model is "behavioural" - it defines the attributes and associated attribute qualities a particular server supports. Do clients need to discover schema? Generally speaking they do not. Let’s take a closer look at schema in general and how SCIM’s approach supports cross-domain schema issues.

Many Definitions of Schema and Many Schema Practices

Looking at the definition in Wikipedia, schema is a very broadly defined term. It can define a software interface, a document format (such as XML Schema), a database, a protocol, or even a template. There is even a new JSON proposal called JSON Schema. This too is very different from XML Schema. It has some elements that describe data objects, but JSON Schema focuses a lot more on defining a service and more closely resembles another schema format: WADL.

With XML schema, the bias seems to be about “enforcement” and “validation” of documents or messages. Yet, for many years, the REST/JSON community has been proud of resisting formalizing “schema”. Maybe it just hasn't happened yet. This does appear to be an old debate with two camps claiming the key to interoperability is either strict definition and validation, or strict adherence to flexibility or “robustness”, as in Jon Postel’s law [from RFC 793]:

“Be conservative in what you do, be liberal in what you accept from others.” 

12 years ago or so, Aaron Swartz blogged "Postel's law has no exceptions!". I found Tim Bray’s post from 2004 to be enlightening - "On Postel, Again". So, what is the right approach for SCIM?

The Identity Use Case

How does SCIM balance the "robustness" vs. "verifiability" to achieve inter-operability in a practical and secure sense? Consider that:

There is often a cross-domain governance requirement by client enterprises that information be reasonably accurate and up-to-date across domains.
Because the mix of applications and users in each domain is different, the schema in one domain will never be exactly the same as in another domain.
Different domains may have different authentication methods and data to support those methods and may even support federated authentication from another domain.
A domain or application that respects privacy tends to keep and use only the information it has a legitimate need for rather than just a standard set of attributes.
An identifier that is unique in one domain may not be unique in another. Each domain may need to generate its own local identifier(s) for a user.
A domain may have value-added attributes that other domains may or may not be interested in.

SCIM’s Approach

SCIM’s approach is to allow a certain amount of “specified” robustness that enables each domain to accept what it needs, while providing some level of assurance that information is being exchanged properly. This means that a service provider is free to drop attributes it doesn't care about when being provisioned from another domain, while the client can be assured that the service provider has accepted their provisioning request. Another example is a simple user-interface requirement where a client retrieves a record, changes an attribute and puts it back. In this case, the SCIM service provider sorts out whether some attributes are to be ignored because they are read-only, and updates the modifiable attributes. The client is not required to ask what data is modifiable and what isn’t. This isn't a general free-for-all in which the server can do whatever it wants. Instead, the SCIM specifications state how this robust behaviour is to work.

With that said, SCIM still depends largely on compliance with the HTTP protocol and the exchange of valid JSON-parsable messages. SCIM does draw the line with regard to information content “validation” in the abstract sense that XML schema does.

Does SCIM completely favour simplicity for SCIM clients? Not exactly. Just as a service provider needs to be flexible in what it accepts, so too must SCIM clients when a service provider responds. When a SCIM service provider responds to a client request, the client must be prepared to accept some variability in SCIM responses. For example, if a service provider returns a copy of a resource that has been updated, the representation always reflects the final state of the resource on the service provider. It does not reflect back exactly what the client requested. Rather, the intent is that the service provider informs the client about the final state of a resource after a SCIM request is completed.
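
The retrieve, change and put-back example above can be sketched in a few lines of Python. This is only an illustration of the robust behaviour described, not text from the SCIM specifications: the MUTABILITY table, the attribute names and the apply_put function are all assumptions made up for the example, and a real replace operation has more rules than shown here.

```python
# Hypothetical attribute metadata for one service provider. A real SCIM
# server defines its own attributes and their characteristics.
MUTABILITY = {
    "id": "readOnly",
    "userName": "readWrite",
    "displayName": "readWrite",
}


def apply_put(stored: dict, requested: dict) -> dict:
    """Return the final resource state after a lenient put-back request."""
    final = dict(stored)
    for name, value in requested.items():
        if MUTABILITY.get(name) == "readWrite":
            final[name] = value  # modifiable: accept the client's value
        # read-only or unknown attributes are ignored rather than rejected
    return final


stored = {"id": "2819c223", "userName": "bjensen", "displayName": "Barbara"}
requested = {
    "id": "client-made-up",       # read-only on this server
    "userName": "bjensen",
    "displayName": "Babs",        # the one real change
    "favouriteColour": "blue",    # not part of this server's schema
}

# The response reflects the final state, not an echo of the request.
final_state = apply_put(stored, requested)
```

The client never had to ask which attributes were modifiable; it simply reads the returned final state to learn what the service provider kept.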

Is this the right model?

Let’s look at some key identity technologies of the past, their weak points and their strong points:

  • X.500 was a series of specifications developed by the ITU in 1988. X.500 had a strict schema model that required enforcement. One of the chief frustrations for X.500 developers (at least for myself) was that while each server had its own schema configuration, clients were expected to alter their requests each time. This became particularly painful if you were trying to code a query filter that would work against multiple server deployments. If you didn’t first “discover” server configuration and adjust your code, your calls were doomed to fail. Searching became infuriating when common attributes weren’t supported by a particular server deployment since the call would be rejected as non-conformant. Any deviation was cause for failure. In my experience X.500 services seemed extremely brittle and difficult to use in practice.
  • LDAP, developed by the IETF in 1996, was based on X.500, but loosened things up somewhat. Aside from LDAP being built for TCP/IP, LDAP took the progressive step of simply assuming that if a client specified an undefined attribute in a search filter, there was no match. This tiny little change meant that developers did not have to adjust code on the fly, but could rather build queries with “or” clauses profiling common server deployments such as Sun Directory Server vs. Microsoft Active Directory and Oracle Directory. Yet, LDAP still carried too many constraints and ended up with some of the same brittleness as X.500. In practice, the more applications that integrated with LDAP, the less able a deployer was to change schema over time. Changing schema meant updating clients and doing a lot of staged production testing. In short, LDAP clients still expected LDAP servers to conform to standard profiles.
  • In contrast to directory or provisioning protocols, SAML is actually a message format for sending secure assertions. To be successful, SAML had to ensure a lot of optionality that depended on “profile” specifications to clearly define how and when assertions could be used. A core to its success has been clear definition of MUST understand vs. MUST ignore. In many cases, if you don’t understand an assertion value, you are free to ignore it. This opens the door to extensibility. On the other hand, if as a relying party you understand an attribute assertion, then it must conform to its specification (schema).
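
To make the X.500 vs. LDAP difference concrete, here is a small Python sketch of the two filter behaviours described above. It is not a real directory client: KNOWN_ATTRIBUTES, the entry and both match functions are hypothetical stand-ins for a server's schema and filter evaluation.

```python
KNOWN_ATTRIBUTES = {"cn", "mail", "uid"}  # hypothetical server schema


class UndefinedAttribute(Exception):
    """Raised by the strict matcher when a filter names an unknown attribute."""


def matches_strict(entry: dict, attr: str, value: str) -> bool:
    # Strict style: any unknown attribute makes the whole request fail.
    if attr not in KNOWN_ATTRIBUTES:
        raise UndefinedAttribute(attr)
    return entry.get(attr) == value


def matches_lenient(entry: dict, attr: str, value: str) -> bool:
    # Lenient style: an unknown attribute simply does not match.
    if attr not in KNOWN_ATTRIBUTES:
        return False
    return entry.get(attr) == value


def search_any(entries, clauses, match):
    """Return entries satisfying at least one (attribute, value) clause."""
    return [e for e in entries if any(match(e, a, v) for a, v in clauses)]


entries = [{"cn": "Barbara Jensen", "uid": "bjensen", "mail": "bjensen@example.com"}]
# One "or" filter written to cover two different server profiles.
clauses = [("sAMAccountName", "bjensen"), ("uid", "bjensen")]

lenient_result = search_any(entries, clauses, matches_lenient)
try:
    search_any(entries, clauses, matches_strict)
    strict_failed = False
except UndefinedAttribute:
    strict_failed = True
```

With the lenient matcher the same query code works against either profile; with the strict one the client would first have to discover the schema and rewrite the filter.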

In our industry, we tend to write security protocols in strict forms in order to assure security. Yet we've often achieved brittleness and lack of usability. Because information relationships around identity and the attributes consumed are constantly variable, history appears to show that identity protocols that have robust features are incrementally more successful. I think SCIM as a REST protocol, moves the ball forward by embracing a specified robust schema model, bringing significant usability features over the traditional use of LDAP.

Post-note: I mentioned in my last blog post that SCIM had reached 'last call'. The working group has felt that this issue is worth more attention and is currently discussing clarifications to the specifications as I have discussed above.

Tuesday, December 16, 2014

Standards Corner: IETF SCIM Working Group Reaches Consensus

On the Oracle Fusion blog, I blog about the recent SCIM working group consensus, SCIM 2's advantages, and its position relative to LDAP.

Friday, May 30, 2014

Standards Corner: Preventing Pervasive Monitoring

On Wednesday night, I watched NBC’s interview of Edward Snowden. The past year has been a tumultuous one in the IT security industry. There have been some amazing revelations about the activities of governments around the world; and, we have had several instances of major security bugs in key security libraries: Apple's ‘gotofail’ bug, the OpenSSL Heartbleed bug, not to mention Java’s zero day bug, and others. Snowden’s information showed the IT industry has been underestimating the need for security, and highlighted a general trend of lax use of TLS and poorly implemented security on the Internet. This did not go unnoticed in the standards community and in particular the IETF.

Last November, the IETF (Internet Engineering Task Force) met in Vancouver, Canada, where the issue of “Internet Hardening” was discussed in a plenary session. Presentations were given by Bruce Schneier, Brian Carpenter, and Stephen Farrell describing the problem, the work done so far, and potential IETF activities to address the problem of pervasive monitoring. At the end of the presentation, the IETF called for consensus on the issue. If you know engineers, you know that it takes a while for a large group to arrive at a consensus, and this group numbered approximately 3000. When asked whether the IETF should respond to pervasive surveillance attacks, there was an overwhelming response of ‘Yes’. When it came to 'No', the room echoed in silence. This was just the first of several consensus questions that were each overwhelmingly in favour of response. This is the equivalent of a unanimous opinion for the IETF.
Since the meeting, the IETF has followed through with the recent publication of a new “best practices” document on Pervasive Monitoring (RFC 7258). This document is extremely sensitive in its approach and separates the political aspects of monitoring from the technical ones.

Pervasive Monitoring (PM) is widespread (and often covert) surveillance through intrusive gathering of protocol artefacts, including application content, or protocol metadata such as headers. Active or passive wiretaps and traffic analysis, (e.g., correlation, timing or measuring packet sizes), or subverting the cryptographic keys used to secure protocols can also be used as part of pervasive monitoring. PM is distinguished by being indiscriminate and very large scale, rather than by introducing new types of technical compromise.

The IETF community's technical assessment is that PM is an attack on the privacy of Internet users and organisations. The IETF community has expressed strong agreement that PM is an attack that needs to be mitigated where possible, via the design of protocols that make PM significantly more expensive or infeasible. Pervasive monitoring was discussed at the technical plenary of the November 2013 IETF meeting [IETF88Plenary] and then through extensive exchanges on IETF mailing lists. This document records the IETF community's consensus and establishes the technical nature of PM.

The draft goes on to further qualify what it means by “attack”, clarifying that:

The term is used here to refer to behavior that subverts the intent of communicating parties without the agreement of those parties. An attack may change the content of the communication, record the content or external characteristics of the communication, or through correlation with other communication events, reveal information the parties did not intend to be revealed. It may also have other effects that similarly subvert the intent of a communicator.

The past year has shown that Internet specification authors need to put more emphasis on information security and integrity. The year also showed that specifications alone are not good enough. The implementations of security and protocol specifications have to be of high quality and subject to superior testing. I’m proud to say Oracle has been a strong proponent of this, having already established its own secure coding practices.

Cross-posted from Oracle Fusion Blog.

Monday, May 12, 2014

Draft 05 of IETF SCIM Specifications

I am happy to announce that draft 05 of the SCIM specifications has been published at the IETF. We are down to a handful of issues (8) to sort out.

Major changes:

  • Clarifications on case preservation and exact match filter processing
  • Added IANA considerations
  • Formalized internationalization and encoding (UTF-8)
  • Added security considerations for using GET with confidential attributes
  • General editing and clarifications

Wednesday, April 9, 2014

Standards Corner: Basic Auth MUST Die!

Basic Authentication (part of RFC2617) was developed along with HTTP1.1 (RFC2616) when the web was relatively new. This specification envisioned that user-agents (browsers) would ask users for their user-id and password and then pass the encoded information to the web server via the HTTP Authorization header.
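
As a quick illustration of what "encoded" means here, the following Python sketch builds and then reverses a Basic Authorization header value. The function names are made up for this example; the credentials are the well-known sample pair from the RFC.

```python
import base64


def basic_auth_header(user_id: str, password: str) -> str:
    """Build the Authorization header value for Basic Authentication."""
    raw = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header: str) -> tuple:
    """Recover the user-id and password from a Basic header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic Authorization header")
    user_id, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    return user_id, password


header = basic_auth_header("Aladdin", "open sesame")
recovered = decode_basic_auth(header)
```

The encoding is base64, not encryption: anything that can see the header can recover the password, and the header is re-sent with every request.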

The Basic Auth approach quickly declined in popularity in favour of form-based login, where browser cookies were used to maintain the user session rather than repeated re-transmission of the user-id and password for each web request. Basic Auth was clinically dead and ceased being the "state-of-the-art" method for authentication.

These days, now that non-browser based applications are increasing in popularity, one of the first asks by architects is support for Basic Authentication. It seems the Basic Authentication "zombie" lives on. Why is this? Is it for testing purposes?

Why should Basic Authentication die?

Well, for one, Basic Auth requires that web servers have access to "passwords", which have continually been shown to be one of the weakest parts of any security architecture. Further, it requires that the client application ask users directly for their user-id and password, greatly increasing the points of attack a hacker might have. A user giving an application (whether a mobile application or a web site) their user-id and password is giving that application the ability to impersonate the user. Further, we now know that password re-use continues to undermine this simple form of authentication.

There are better alternatives.

A better alternative uses "tokens", such as the cookies I mentioned above, to track client/user login state. An even better solution, not easily done with Basic Auth, is to use an adaptive authentication service whose job is to evaluate not only a user's id and password but also multiple other factors for authentication. This can go beyond something you know to include something you are and something you have. Many service providers are even beginning to evaluate network factors as well, such as: has the user logged in from this IP address and geographical location before?

In order to take advantage of such an approach, the far better solution is to demand OAuth2 as a key part of your application security architecture for non-browser applications and APIs. Just like form-based authentication dramatically improved browser authentication in the 2000s, OAuth2 (RFC6749 and 6750), and its predecessor, Kerberos, provide a much better way for client applications to obtain tokens that can be used for authenticated access to web services.

Token authentication is far superior because:
  • Tokens cleanly separate user authentication and delegation from the application's activities with web services.
  • Tokens do not require that clients impersonate users. They can be highly scoped and restrictive in nature.
  • The loss of a token means only a single service is compromised, whereas the loss of a password compromises every site where that user-id and password are used.
  • Tokens can be issued by multi-factor authentication systems.
  • Tokens do not require access to a password data store for validation.
  • Tokens can be cryptographically generated and thus can be validated by web services in a "stateless" fashion (not requiring access to a central security database).
  • Tokens can be easily expired and re-issued.
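
A few of the points above (scoping, expiry, stateless validation) can be shown with a short Python sketch. This is not the OAuth2 token format, which the RFCs leave to the deployment; it is a minimal HMAC-signed token, and SECRET, the claim names and both functions are assumptions made up for the illustration.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical key shared by issuer and web service


def issue_token(subject, scope, ttl_seconds, now=None):
    """Issue a signed token carrying a subject, a list of scopes and an expiry."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "scope": scope, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))


def validate_token(token, required_scope, now=None):
    """Return the claims if the token is authentic, unexpired and in scope."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # tampered with, or signed by someone else
    claims = json.loads(payload)
    if claims["exp"] <= now or required_scope not in claims["scope"]:
        return None  # expired, or not granted for this operation
    return claims


token = issue_token("bjensen", ["calendar.read"], ttl_seconds=60, now=1000)
claims = validate_token(token, "calendar.read", now=1010)
```

No password store and no session database are consulted during validation; the web service only needs the key and the clock.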
RFC 2617 Basic Authentication is not only dead. It needs to be buried. Stop using it. You can do it!

Cross-posted from Oracle Fusion Blog.

Thursday, March 13, 2014

Standards Corner: Maturing REST Specifications and the Internet of Things

Cross-posted from the Oracle Fusion Middleware Blog.
As many of you know, much of today's standards around REST center around IETF based specifications. As such, I thought I would share some RESTful services related news coming from last week's IETF meetings. Many working groups are now in the final stages of moving key specifications into standard status…

Friday, February 14, 2014

New IETF SCIM drafts - Revision 03 Details

Yesterday, the IETF SCIM (System for Cross Domain Identity Management) Working Group published new draft specification revisions:

This draft was essentially a clean-up of the specification text into IETF format as well as a series of clarifications and fixes that will greatly improve the maturity and interoperability of the SCIM drafts. SCIM has had a number of outstanding issues to resolve and in this draft, we managed to knock off a whole bunch of outstanding issues - 27 in all! More change log details are also available in the appendix of each draft.

Key updates include:

  • New attribute characteristics: 
    • returned - When are attributes returned in response to queries
    • mutability - Are attributes readOnly, immutable, readWrite, or writeOnly
    • readOnly - this boolean has been replaced by mutability
  • Filters
    • A new "not" negation operator added
    • A missing ends with (ew) filter was added
    • Filters can now handle complex attributes, allowing multiple conditions to be applied to the same value in a multi-valued complex attribute. For example:
      • filter=userType eq "Employee" and emails[type eq "work" and value co "@example.com"]
  • HTTP
    • Clarified the response to an HTTP DELETE
    • Clarified support for HTTP Redirects
    • Clarified impact of attribute mutability on HTTP PUT requests
  • General
    • Made server root level queries optional
    • Updated examples to use '/v2' paths rather than '/v1'
    • Added complete JSON Schema representation for Users, Groups, and EnterpriseUser.
    • Reformatting of documents to fit normal IETF editorial practice
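
A practical note on the complex filter example above: the expression contains spaces, quotes and brackets, so it has to be percent-encoded when placed in a query string. A minimal Python sketch, assuming a hypothetical service provider at https://example.com with a '/v2/Users' endpoint:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The filter expression from the list above, as the client would write it.
filter_expr = ('userType eq "Employee" and '
               'emails[type eq "work" and value co "@example.com"]')

base_url = "https://example.com/v2/Users"  # hypothetical endpoint

# urlencode percent-encodes quotes and brackets and turns spaces into '+'.
url = base_url + "?" + urlencode({"filter": filter_expr})

# A server-side parser recovers the original expression unchanged.
recovered = parse_qs(urlsplit(url).query)["filter"][0]
```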
Thanks to everyone in the working group for their help in getting this one out!

Tuesday, December 17, 2013

Double-blind Identity

Note: Cross-posted from the Oracle Fusion Blog.

On November 13 and 14, the Government of British Columbia, Canada, launched the first in a series of public consultations on identity and digital services. For several years now, BC has been working on a new identity services project that would enable citizens to securely access government services online. For BC, there is clear motivation: reducing identity management and fraud costs in everything from driver's licenses to health insurance. BC's hope is that this can play a role in helping provide better services down the road as well as improving the overall privacy of residents.

For background to the challenges and motivations for BC, check out former BC CIO Dave Nikolejsin's talk on the importance of identity management in 2012 and the principles behind BC's identity project:

Incidentally my favourite quote from this video is: "Government will never be downstream from your Facebook account".

This first public consultation was framed as a technical level-set meeting that included standards community members from companies like Microsoft and Oracle and other vendors, as well as some members of the US Gov't's NSTIC program, the New Zealand Gov't, and British Columbia, including its privacy commissioner, Elizabeth Denham. Most importantly, the meeting included several members of the general BC public picked at random from over 15,000 applicants. The day was spent educating and putting participants on a more equal footing on the basics of identity theory as well as the challenges and objectives BC has for this program.

As part of this level-set, Colin Wallis, Identity Architect for the New Zealand Gov't, gave a great talk on how NZ and BC are similar in size and have many of the same overall cultural values and privacy concerns. He spoke about NZ's current "RealMe" deployments and their immediate focus on "authentication". A similar presentation is available here.

The BC Gov't showed an overall architecture based on issuing new driver's licenses and Government Services Cards with near field chips (NFCs) supplied by SecureKey. The architecture depends mainly on OAuth2 authorization and SAML federation techniques that provide a "directed" identifier approach, creating an infrastructure that has privacy qualities that could be said to be "double-blind".

You may recall that the term "double-blind" comes from medical research where both the tester and subject are blinded. In the case of identity, the term implies the authenticator and relying party are blinded in a way that ensures privacy for the subject (the citizen). The authenticator is not able to observe what personal information is shared or how the identifier is used, assuring the citizen of freedom of use without monitoring. The relying party receives a directed identifier for the citizen unique to the relying party's use (see Kim Cameron's Law of Identity #4). What this means is that two relying parties cannot share information directly based on a common identifier (as would happen if one used an omni-directional identifier like an SSN/SIN or Driver's License number). The same individual authenticating to two different parties appears to have separate identities. Hence the architecture could be said to be a double-blind system with strong privacy-enabling qualities in its core design.
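
One common way to produce a directed identifier is to derive it from a secret held by the authenticator plus the relying party's name. The Python sketch below shows that general technique only; it is an assumption for illustration and not a description of how BC or NZ actually generate identifiers. The secret, the relying party names and the function are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret known only to the authenticator.
AUTHENTICATOR_SECRET = b"demo-authenticator-secret"


def directed_identifier(citizen_id: str, relying_party: str) -> str:
    """Derive an identifier that is stable per relying party but unlinkable across them."""
    message = f"{relying_party}|{citizen_id}".encode("utf-8")
    return hmac.new(AUTHENTICATOR_SECRET, message, hashlib.sha256).hexdigest()


health = directed_identifier("citizen-0001", "health-insurance.example")
licensing = directed_identifier("citizen-0001", "driver-licensing.example")
health_again = directed_identifier("citizen-0001", "health-insurance.example")
```

Each relying party sees the same value every time the citizen returns, yet two relying parties comparing notes have no common key to join on. This covers only the directed identifier half of the picture; blinding the authenticator from how the identifier is used needs additional design.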

If successful, NZ and BC citizens may soon use their New Zealand RealMe or BC Government Services Id as a common authenticator to log in and/or obtain services, while at the same time avoiding the privacy problems that occur when using a common identity such as those offered by social networks or personal email addresses. Their biggest challenge: BC and NZ need to avoid the fears raised by other national identity programs such as the US's "REAL ID" or the UK's "National ID" programs by building public confidence in their privacy-centred architecture.

For me personally, the most inspirational aspect of the consultation process was the leadership that Minister Andrew Wilkinson and his CTO, Ian Bailey, showed by adopting an "unconference" format that those of us in the IAM industry have come to know well at our regular Internet Identity Workshop meetings. Following the first day's level-set session, an unconference session was led by Kaliya Hamlin (aka IdentityWoman) as well as Aron and Mike of IdentityNorth on the second day. This format was so successful that the group acknowledged that the highlights of the day were the members of the general public who chose not only to participate in deep theoretical discussion, but also to lead 3 of the most insightful sessions of the day!

To be clear, double-blind identity is not necessarily that new--we may have just put a term to it. After all, there have been many systems with pseudonymous authentication. However, as technologists, we've been too accustomed to centralized architectures where both authentication and personal information in the form of claims come from a common provider (an Identity Provider). In these traditional enterprise architectures privacy has taken a back seat, since employees don't often need to expect privacy with regard to performing their jobs. In contrast, these two governments are showing that privacy-enabled identity systems require separation of personal information from authentication services. I wonder what this means for cloud services architectures of the future and whether the current all-knowing social networks can survive the privacy problems they are running into by amassing so much information. It certainly suggests that the social network big data approach, where all personal information is in one place, is not the right way to go if governments really care about privacy.

Disclosure:  I am a resident of British Columbia. While my employer is a supplier to the government of British Columbia, I am not currently directly involved in this or any other projects with BC. My participation and comments are made as both a resident and a member of the identity standards community. The views expressed are my own and do not necessarily reflect those of my employer.

Monday, November 4, 2013

Standards Corner: OAuth WG Client Registration Problem

Update: Cross-Posted on the Oracle Fusion Middleware blog.

This afternoon, the OAuth Working Group will meet at IETF88 in Vancouver to discuss some topics important to the maturation of OAuth. One of them is the OAuth client registration problem.

OAuth (RFC6749) was initially developed with a simple deployment model where there is only a monopoly or singleton cloud instance of a web API (e.g. there is one Facebook, one Google, one LinkedIn, and so on). When the API publisher and API deployer are the same monolithic entity, it is easy for developers to contact the provider and register their app to obtain a client_id and credential.

But what happens when the API is for an open source project where there may be 1000s of deployed copies of the API (e.g. WordPress)?  In these cases, the authors of the API are not the people running the API. In these scenarios, how does the developer obtain a client_id?

An example of an "open deployed" API is OpenID Connect. Connect defines an OAuth protected resource API that can provide personal information about an authenticated user -- in effect creating a potentially common API for potential identity providers like Facebook, Google, Microsoft, Salesforce, or Oracle.  In Oracle's case, Fusion applications will soon have RESTful APIs that are deployed in many different ways in many different environments. How will developers write apps that can work against an openly deployed API with whom the developer can have no prior relationship?

At present, the OAuth Working Group has two proposals to consider:

Dynamic Registration

Dynamic Registration was originally developed for OpenID Connect and UMA. It defines a RESTful API in which a prospective client application with no client_id creates a new client registration record with a service provider and is issued a client_id and credential along with a registration token that can be used to update registration over time.

As proof of success, the OIDC community has done substantial implementation of this spec and feels committed to its use.  Why not approve?

Well, the answer is that some of us had some concerns, namely:
  1. Recognizing instances of software - dynamic registration treats all clients as unique. It has no defined way to recognize that multiple copies of the same client are being registered other than assuming that if the registration parameters are similar it might be the same client.
  2. Versioning and Policy Approval of open APIs and clients - many service providers have to worry about change management. They expect to have approval cycles that approve versions of server and client software for use in their environment.  In some cases approval might be wide open, but in many cases, approval might be down to the specific class of software and version.
  3. Registration updates - when does a client actually need to update its registration?  Shouldn't it be never?  Is there some characteristic of deployed code that would cause it to change?
  4. Options lead to complexity - because each client is treated as unique, it becomes unclear how the clients and servers will agree on what credentials forms are acceptable and what OAuth features are allowed and disallowed.  Yet the reality is, developers will write their application to work in a limited number of ways. They can't implement all the permutations and combinations that potential service providers might choose.
  5. Stateful registration - if the primary motivation for registration is to obtain a client_id and credential, why can't this be done in a stateless fashion using assertions?
  6. Denial of service - With so much stateful registration and the need for multiple tokens to be issued, will this not lead to a denial of service attack / risk of resource depletion?  At the very least, because of the information gathered, it would be difficult for service providers to clean up "failed" registrations and determine active from inactive or false clients.
  7. There has yet to be much wide-scale "production" use of dynamic registration other than in small closed communities.

Client Association

A second proposal, Client Association, has been put forward by Tony Nadalin of Microsoft and myself. We took a look at existing use patterns to come up with a new proposal. At the Berlin meeting, we considered how WS-STS systems work.  More recently, I reviewed how mobile messaging clients work. I looked at how Apple, Google, and Microsoft each handle registration with APNS, GCM, and WNS, and a similar pattern emerges.  This pattern is to use an existing credential (mutual TLS auth) or client bearer assertion and swap it for a device-specific bearer assertion.

In the client association proposal, the developer registers with the API publisher (as opposed to the party deploying the API) and obtains a software "statement". Or, if there is no "publisher" that can sign a statement, the developer may include their own self-asserted software statement.

A software statement is a special type of assertion that serves to lock application registration profile information in a signed assertion. The statement is included with the client application and can then be used by the client to swap for an instance-specific client assertion as defined by section 4.2 of the OAuth Assertion draft and profiled in the Client Association draft. The software statement provides a way for a service provider to recognize and configure policy to approve classes of software clients, and simplifies the actual registration to a simple assertion swap. Because the registration is an assertion swap, registration is no longer "stateful" - meaning the service provider does not need to store any information to support the client (unless it wants to).
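
The assertion swap can be sketched roughly as follows in Python. This is an illustration of the pattern only, not the wire format of the Client Association or Software Statement drafts. In particular, an HMAC with a shared demo key stands in for the publisher's signature (a real deployment would use a proper signed assertion format), and the keys, the claim names software_id, version and instance_id, and all three functions are assumptions invented for the example.

```python
import base64
import hashlib
import hmac
import json
import secrets

PUBLISHER_KEY = b"publisher-demo-key"  # hypothetical stand-in for the publisher's signing key
PROVIDER_KEY = b"provider-demo-key"    # hypothetical key known only to the service provider


def sign(key: bytes, claims: dict) -> str:
    """Serialize claims and attach a MAC (stand-in for a real signature)."""
    body = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode("utf-8")).decode("ascii")
    mac = hmac.new(key, body.encode("ascii"), hashlib.sha256).hexdigest()
    return body + "." + mac


def verify(key: bytes, assertion: str):
    """Return the claims if the MAC checks out, otherwise None."""
    body, _, mac = assertion.partition(".")
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


def associate(statement: str, approved_software: set):
    """Swap a software statement for an instance-specific client assertion."""
    profile = verify(PUBLISHER_KEY, statement)
    if profile is None:
        return None  # statement was not issued by the publisher
    if (profile["software_id"], profile["version"]) not in approved_software:
        return None  # policy: this class of software is not approved here
    instance = dict(profile, instance_id=secrets.token_hex(8))
    return sign(PROVIDER_KEY, instance)  # nothing is stored by the provider


statement = sign(PUBLISHER_KEY, {"software_id": "demo-app", "version": "1.0"})
client_assertion = associate(statement, {("demo-app", "1.0")})
```

Policy is applied per class of software (identifier and version) rather than per registration record, and the service provider can later check the client assertion with its own key instead of looking anything up.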

Has this been implemented yet? Not directly. We've only delivered draft 00 as an alternate way of solving the problem using well-known patterns whose security characteristics and scale characteristics are well understood.

Dynamic Take II

At roughly the same time that Client Association and Software Statement were published, the authors of Dynamic Registration published a "split" version of Dynamic Registration (draft-richer-oauth-dyn-reg-core and draft-richer-oauth-dyn-reg-management). While some of the concerns above are addressed, some differences remain. Registration is now a simple POST request. However, it defines a new method for issuing client tokens, whereas Client Association uses RFC6749's existing extension point. The concern here is whether future client access token formats would be addressed properly.  Finally, Dyn-reg-core does not yet support software statements.


The WG has some interesting discussion to bring this back to a single set of specifications. Dynamic Registration has significant implementation, but Client Association could be a much improved way to simplify implementation of the overall OpenID Connect specification and improve adoption. In fairness, the existing editors have already come a long way. Yet there are those with significant investment in the current draft. There are many that have expressed they don't care. They just want a standard. There is lots of pressure on the working group to reach consensus quickly.

And that, folks, is how the sausage is made.

Note:  John Bradley and Justin Richer recently published draft-bradley-stateless-oauth-client-00 which on first look are getting closer. Some of the details seem less well defined, but the same could be said of client-assoc and software-statement. I hope we can merge these specs this week.