In a previous article, we discussed how to authorize resource access in a distributed environment and what challenges doing so poses in terms of architecture. In this article, we detail how our engineering team dealt with some of these challenges to build a fine-grained permissions system.
Offloading Access Decisions
For any service that needs to authorize resource access, we deploy our policy engine of choice, Open Policy Agent (OPA), as a sidecar container. OPA exposes an API for evaluating policies, so that every time a user tries to access a resource, the service queries OPA locally to determine whether the user is allowed to proceed.
To illustrate this, let's say there's an incoming HTTP request to GET /documents/:id. A Java service, for instance, may implement a servlet filter to intercept the request and respond with 403 Forbidden should OPA deny the user access to that particular document.
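The same flow can be sketched in a few lines of Python. This is only an illustration under assumptions: the policy path documents/allow, the shape of the input document, and the sidecar address are hypothetical, while the request format (a POST to /v1/data/... with an "input" object, answered with a "result" field) follows OPA's Data API.

```python
import json
from urllib import request

# Assumed sidecar address and policy path; both are placeholders.
OPA_URL = "http://localhost:8181/v1/data/documents/allow"


def query_opa(input_doc, url=OPA_URL, opener=request.urlopen):
    """Ask the local OPA sidecar for a decision; True only on an explicit allow."""
    body = json.dumps({"input": input_doc}).encode()
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        decision = json.load(resp)
    # An undefined decision has no "result" key, so it is treated as a deny.
    return decision.get("result") is True


def authorize(method, path, user, opa=query_opa):
    """Return the HTTP status a request filter would respond with."""
    input_doc = {
        "method": method,
        "path": path.strip("/").split("/"),  # "/documents/42" -> ["documents", "42"]
        "user": user,
    }
    return 200 if opa(input_doc) else 403
```

In a real service, authorize would run inside the request filter before the handler, and a 200 would simply mean "continue processing".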
Alternatively, services written in Go may embed OPA as a library using the https://godoc.org/github.com/open-policy-agent/opa/rego package, eliminating the need for a sidecar.
Distributing Data Updates
Most of the time, however, policies cannot be enforced without proper context. In other words, to make a decision, OPA must know about users and their resource permissions. Luckily, the engine can be configured to periodically download data, packaged as bundles, from a remote HTTP server. Once new data is available (e.g., a user was granted access to some resource), OPA applies it immediately, with no restart required.
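To make the pull model concrete, here is a toy poller. It is not OPA's implementation, just a sketch assuming the server supports conditional requests keyed by an ETag; BundlePoller, fake_server, and the fetch hook are hypothetical names introduced for illustration.

```python
class BundlePoller:
    """Keeps the most recently activated data and the ETag it came with."""

    def __init__(self, fetch):
        # fetch(etag) -> (status, etag, data); a hypothetical transport hook
        self.fetch = fetch
        self.etag = None
        self.data = {}

    def poll_once(self):
        """Return True if new data was activated, False if nothing changed."""
        status, etag, data = self.fetch(self.etag)
        if status == 304:  # Not Modified: keep serving the current data
            return False
        # Swap in the new data in place; no restart is involved.
        self.etag, self.data = etag, data
        return True


def fake_server(bundles):
    """Serve the latest (etag, data) pair, honoring a conditional request."""

    def fetch(known_etag):
        etag, data = bundles[-1]
        if etag == known_etag:
            return 304, etag, None
        return 200, etag, data

    return fetch
```

Calling poll_once on a timer reproduces the behavior described above: unchanged data costs one cheap request, and a new grant becomes visible on the next poll.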
Working Around Eventual Consistency
As Phil Karlton once said, there are only two hard things in Computer Science: cache invalidation and naming things. OPA acts as an intelligent cache of resource permissions, and while that makes authorization fast and highly available, there are cases where the pull approach falls short.
After creating a document on the Document Service from the example above, one would justly expect to have immediate access to it. However, OPA downloads bundles periodically, so some requests might be denied with 403 Forbidden before everything is in sync. In this case, we need a mechanism to ensure that all OPA sidecars have successfully downloaded and activated a bundle containing the newly created resource before allowing the user to proceed.
To work around the limitation, we tweaked BS, the service that generates the bundles, to include bundle metadata as an optional .manifest file in each bundle it generates. Among other things, the file contains a revision field, which is the timestamp of when the bundle was generated.
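As a rough sketch of what generating such a bundle could look like: an OPA bundle is a gzipped tarball, so the example below packs a data.json file next to a .manifest whose revision is the generation timestamp. The function name build_bundle, the layout of the permissions data, and the choice of an ISO 8601 string for the revision are assumptions, not a description of the actual service.

```python
import io
import json
import tarfile
from datetime import datetime, timezone


def build_bundle(data, generated_at):
    """Pack permissions data and a .manifest into an in-memory .tar.gz bundle."""
    manifest = {
        # The revision is a string; here it carries the generation time in UTC.
        "revision": generated_at.astimezone(timezone.utc).isoformat(),
    }
    files = {".manifest": manifest, "data.json": data}

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, doc in files.items():
            payload = json.dumps(doc).encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

For example, build_bundle({"permissions": {"alice": ["doc-1"]}}, datetime.now(timezone.utc)) returns the bytes an HTTP server could hand to the sidecars.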
Then, every time OPA applies permission updates, it reports the name and the revision of the activated bundle back to BS. Given the creation timestamp of a resource, we can check it against these status reports: if all sidecars have activated a bundle that is newer than the resource in question, then the bundle they serve already contains that resource, and the user may safely access it without fear of getting unexpected errors.
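The check itself boils down to a timestamp comparison. The sketch below assumes the status reports have already been reduced to a mapping from sidecar ID to the revision it last activated, with revisions stored as ISO 8601 strings; the function name and that data shape are hypothetical.

```python
from datetime import datetime, timezone


def is_visible_everywhere(resource_created_at, active_revisions):
    """True if every sidecar runs a bundle generated after the resource was created.

    active_revisions maps a sidecar ID to the revision string it last reported,
    or None if it has not reported an activated bundle yet.
    """
    if not active_revisions:
        return False  # no reports at all: assume nothing is in sync
    for revision in active_revisions.values():
        if revision is None:
            return False
        # Strictly newer: a bundle stamped at the same instant may have missed it.
        if datetime.fromisoformat(revision) <= resource_created_at:
            return False
    return True
```

A caller could poll this function after creating a resource and only hand control back to the user once it returns True, which trades a short delay for the absence of spurious 403 responses.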
So far, OPA has proved to be an excellent tool for our use case. Obviously, there are some implementation details that we didn't cover in this article, but overall, it should give you a rough idea of how the permissions system works at VGS.