Why Keycloak?

CERN Single Sign-On (SSO) is a core service for CERN. The majority of laboratory staff and users access applications protected by SSO on a daily basis, and its stability is paramount for us all to be able to work. This page details the justification behind the choice to use Keycloak (open-source software) and clarifies our priority to provide an SSO service that meets the needs of our laboratory.

Why is Keycloak a good choice of software?

  • It has an engaged user base and active contributors, and it is constantly improving with each release.
  • It is used by other large organisations (see here and here) in both the research and commercial sectors. This indicates that it works at scale and that there is a community willing to maintain the software long term. In April 2023 Keycloak became part of the Cloud Native Computing Foundation (CNCF) as an incubating project; this improves long-term support prospects and reduces the risk of software license changes.
  • The features of Keycloak are largely equivalent (and in some cases superior) to those offered by other software providers. Because it is an open-source tool, we are able to contribute to the code base where a feature is missing.

Why not move to the cloud?

Although moving to a cloud-based SSO would save on long-term system maintenance, this benefit is not believed to outweigh the significant short-term effort of migrating and the disadvantages described below.

  • All users, regardless of their nationality or location, must be able to authenticate through SSO. Moving to a major cloud provider may risk excluding researchers from our laboratory.
  • We must be able to manage the change and release cycle to respect technical stops and runs of the laboratory. It is important that we maintain control in this area.
  • SSO must be available across all CERN networks. Exposing a service on the internet to our technical and experiment networks introduces security risk.
  • Although initial costings may be attractive, migrating to a cloud-based SSO is an expensive procedure which cannot be done transparently. Each application connected to SSO must be reconfigured; at CERN many of these services are not managed by the IT department and cannot be migrated centrally. It would be difficult and time-consuming to reverse this migration should terms or conditions change in future.
  • CERN customisations, for example allowing login via SSH with the same multifactor registration as the SSO, would not be implemented by a commercial cloud-based SSO and would therefore require additional effort to replace.

Why not remain with Microsoft ADFS on premise?

The previous SSO system was Microsoft Active Directory Federation Services (ADFS), supplemented by several customisations. These customisations meant that a transparent upgrade to more recent ADFS versions was not possible; this, combined with the MALT project, led to the decision to move away from ADFS. Although it would be possible to return to ADFS, we believe this would not be the correct decision for the following reasons:

  • ADFS has no benefit over Keycloak in terms of features. For the 9000 applications already using Keycloak, migrating back to ADFS would be a costly exercise with no gain.
  • It is unclear whether Microsoft plans to support on-premise SSO long term.

What is the cost of running our own SSO?

Running our own SSO requires active support and maintenance by the IT department. Even once all features are fully deployed, effort must be permanently dedicated to the service; this includes keeping the infrastructure up to date as well as supporting service managers with integrating their services. The IT department has launched a project to review the SSO deployment and ensure that it meets the needs of CERN.

SSO Performance & Reliability

As of November 2022, CERN SSO protects approximately 9000 services and serves approximately 100 logins per minute during working hours to a potential user base of 150,000 individuals. Although the system runs well more than 99% of the time, specific high-traffic events have exposed weaknesses in the system and caused the SSO to fail. This is being investigated and resolved as a high priority.
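
To put these figures in perspective, the short sketch below converts the quoted login rate into a per-second rate and turns an availability level into a yearly downtime budget. The 99% value is used purely as an illustrative lower bound (an assumption); the text above only states that availability is above that level.

```python
# Figures quoted in the text (November 2022).
LOGINS_PER_MINUTE = 100

# Assumption for illustration: treat "over 99%" as a lower bound of exactly 99%.
AVAILABILITY_FLOOR = 0.99

HOURS_PER_YEAR = 365 * 24


def logins_per_second(per_minute: float) -> float:
    """Convert a per-minute login rate into a per-second rate."""
    return per_minute / 60


def max_downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Upper bound on downtime (in hours) implied by an availability fraction."""
    return (1 - availability) * period_hours


if __name__ == "__main__":
    print(f"{logins_per_second(LOGINS_PER_MINUTE):.2f} logins per second")
    print(f"{max_downtime_hours(AVAILABILITY_FLOOR):.1f} hours of downtime per year at most")
```

Even a service that is available 99% of the time can therefore be down for several days per year in total, which is why failures during high-traffic events matter.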

Performance Testing Results

Recent performance tests against the pre-production environment identified several areas for focus:

  • Ensuring that CERN SSO is frequently updated to newer Keycloak versions, which include performance improvements.
  • Removing CERN customisations in the code which have been found to introduce slowness.
  • Considering externalising the session cache.

Stress tests were performed with 50 authentications per second over 10 minutes. The following chart compares the number of failed processes (blue), the mean response time (orange), the max response time (grey) and the standard deviation of the response time (yellow) for the following scenarios:

  • Keycloak 15 (v15 Default)
  • Keycloak 15 with CERN customisations (v15 CERN)
  • Keycloak 19 (v19 Default)
  • Keycloak 19 with CERN customisations (v19 CERN)
  • Keycloak 20 with an externalised cache (v20 K8s/RemCache)

[Chart: keycloak-stress-tests]
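
The four metrics compared in the stress tests can be computed from raw measurements as in the following minimal sketch. The sample response times and the failure count are hypothetical, not CERN results, and the use of the population standard deviation is an assumption, since the text does not say which variant was plotted.

```python
import statistics

# Load profile described in the text: 50 authentications per second for 10 minutes.
RATE_PER_SECOND = 50
DURATION_SECONDS = 10 * 60
total_requests = RATE_PER_SECOND * DURATION_SECONDS  # 30000 authentications per run


def summarise(response_times_ms, failed):
    """Return the four metrics compared in the stress tests for one scenario."""
    return {
        "failed": failed,
        "mean_ms": statistics.mean(response_times_ms),
        "max_ms": max(response_times_ms),
        # Assumption: population standard deviation of the response times.
        "stdev_ms": statistics.pstdev(response_times_ms),
    }


if __name__ == "__main__":
    # Hypothetical response times in milliseconds, for illustration only.
    sample = [120, 135, 110, 480, 125]
    print(total_requests)
    print(summarise(sample, failed=2))
```

A single slow request (480 ms in the sample) barely moves the mean but dominates the maximum and the standard deviation, which is why the three timing metrics are reported together.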

Roadmap

To address performance and reliability, the SSO team will implement the following roadmap:

Stage                                                            Date
KC 19 release                                                    Jan 2023
Removal of CERN customisations (modified 2FA strategy required)  Q2 2023
Hosting of SSO on Kubernetes                                     Q3 2023
Future KC biannual upgrades                                      Rolling basis

Results

Following the Keycloak 19 release, we see a significant improvement in SSO performance.

Memory and network usage within the hosting infrastructure have decreased.

[Chart: memory-usage-kc19]

[Chart: network-usage-kc19]

In particular, the shorter POST duration has a direct positive impact on usability, as the time to refresh a token or request a token exchange has noticeably improved.

[Chart: post-duration-kc19]
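
For context, a token refresh is an HTTP POST to the OpenID Connect token endpoint, so its duration is felt directly by applications. The sketch below builds such a request without sending it. The host `sso.example.org`, the realm `demo` and the client `my-app` are hypothetical placeholders, the client is assumed to be a public client (no secret), and the endpoint path assumes a Keycloak deployment without the legacy `/auth` prefix.

```python
from urllib.parse import urlencode
from urllib.request import Request


def token_endpoint(base_url: str, realm: str) -> str:
    """OpenID Connect token endpoint of a realm (assumes no legacy /auth prefix)."""
    return f"{base_url}/realms/{realm}/protocol/openid-connect/token"


def build_refresh_request(base_url: str, realm: str, client_id: str, refresh_token: str) -> Request:
    """Build (but do not send) the POST request that refreshes an access token."""
    body = urlencode({
        "grant_type": "refresh_token",  # standard OAuth 2.0 refresh grant
        "client_id": client_id,         # assumption: public client, no client secret
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(
        token_endpoint(base_url, realm),
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_refresh_request("https://sso.example.org", "demo", "my-app", "<refresh-token>")
    print(req.get_method(), req.full_url)
```

Sending the request, for example with `urllib.request.urlopen(req)`, and timing the round trip would give one client-side measurement of the POST duration discussed above.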