A look inside iqlusion’s Cosmos Hub Validator architecture

By Tony Arcieri and Shella Stephens

Here at iqlusion, we have taken our past professional experiences from infrastructure and security teams at notable Silicon Valley companies and applied them in building what we believe is one of the most sophisticated proof-of-stake (PoS) validators in existence today.

In this post, we’d like to provide a deep dive into how we built our Cosmos validator, the experiences that shaped our decisions, and hopefully offer some general insights into how to build a high availability datacenter networks and hybrid clouds in general.

Rather than keep this information a trade secret, we prefer to share it to help promote the security of the overall proof-of-stake validator ecosystem, and though sharing it may slightly reduce our defense-in-depth, we are firm believers in Kerckhoffs’ principle - that our infrastructure is designed in such a way that its security ultimately lies in the cryptographic keys that control its operation (well that, and proper configuration and software updates).

Without further ado, here is a diagram of our validator network architecture:

iqlusion architecture.png

Before we dive deeper into our architecture, let’s take a step back for a second and investigate the requirements of what we’re trying to build.

What Does a Proof-of-Stake Validator Do? #

The core of the Cosmos Hub consists of a federation of validators which collectively run a Byzantine Fault Tolerant (BFT) consensus algorithm and in doing so function as a single logical computer whose compute resources can be purchased for a fee.

The security of any BFT system ultimately relies on the ability of the set of validators comprising the network to resist the influence of an outside attacker. If an attacker can gain control of more than a third of the network (by voting power, or in the case of Cosmos stake), the security guarantees of BFT break down.

Validators execute a proposed computation (i.e. a new block in a blockchain), ensure a set of constraints are upheld, and if everything checks out certify they verified the proposed computation in the form of a cryptographic signature. Therefore the core responsibility of a validator is securing an online cryptographic key and using it to certify distributed computations are performed correctly. In other words, validators provide an online key-as-a-service.

There are many other things that fall into the “online key-as-a-service” category, like every cloud key management system, the “hot wallets” at cryptocurrency exchanges, and Lightning Network nodes, or for that matter any web site that uses HTTPS encryption. In our opinion, validators fall into the highest risk category for this model, roughly equivalent to a cryptocurrency exchange or a Lightning node. Keeping a validator signing key secure, especially a heavily staked one, is a large responsibility, and one we hope we handle with due care.

The specific requirements for a validator can be found in the Cosmos Validator FAQ:

What are hardware requirements? #

Validators should expect to provision one or more data center locations with redundant power, networking, firewalls, HSMs and servers.
[…]

How to handle key management? #

Validators should expect to run an HSM that supports ed25519 keys. Here are potential options:

  • YubiHSM 2 […]

There are several reasons for this: the robustness of the network depends critically on it not being a monoculture of validators all running on the same platform. Cosmos Hub should not go down because AWS us-east1 is having problems.

Another reason to require hardware-backed key storage in a datacenter environment as opposed to a cloud is the desire to use cryptographic algorithms which are not available in a Cloud KMS or HSM environment (and in some cases, never will be).

As noted in the requirements above, validators are using the newer Ed25519 signature algorithm (also selected as the next-generation signature algorithm for X.509), which has a number of benefits over older signature algorithms both in terms of performance and potential use cases (e.g. interactive threshold signatures).

That said, one of the biggest bottlenecks in Tendermint, the consensus algorithm behind Cosmos, is the large numbers of Ed25519 signatures it uses, and the resulting verification time as well as bandwidth/storage requirements for all these signatures. At the moment, signatures make up the majority of the Cosmos Hub blockchain by data volume. If these bottlenecks could be eliminated through the use of more advanced cryptography, the network could potentially be significantly faster with a reduced data volume and therefore lower storage/bandwidth requirements.

There are some promising new cryptographic algorithms for doing this, such as BLS signatures which are being standardized in an IETF working group, which would allow the hundreds/thousands of Ed25519 signatures used in Tendermint consensus to be non-interactively aggregated down to a single 48-byte signature!

By requiring hardware-backed key storage, the Cosmos Hub validators will have significant agility over Cloud KMS systems in terms of adopting new cryptographic algorithms, and will be able to take advantage of these algorithms as soon as they first become available in cryptographic hardware. It will also allow us to deploy upcoming trusted computing environments, such as ones based on the RISC-V CPU architecture.

Given that, let’s talk about the hardware we have deployed and how we built our initial datacenter cabinet.

Equinix: Our Datacenter #

equinix.png

We selected Equinix as our datacenter hosting provider. Equinix is one of (if not the) premiere datacenter operator worldwide with over 200 facilities spanning 5 continents. The facility we’re hosted at requires 5 layers of authentication including ID verification, biometrics, and passcodes. Equinix is a popular datacenter provider for SF’s next-generation fintech companies, including some of the ones we’ve previously worked at.

Our datacenter design is influenced by our experiences as employees in other Silicon Valley companies. Here are the personal accounts of our engineering team, Tony Arcieri and Shella Stephens:

Our datacenter is primarily a key storage environment for validator signing keys. Additionally, we have elected to run our Cosmos Hub validator on bare metal. This approach allows us to isolate the hosts on which we run Tendermint Key Management System (KMS) into their own firewall zone with extremely limited connectivity, such that they are completely inaccessible outside of the datacenter network. As an added twist, Tendermint KMS makes only outbound connections, and is only allowed to talk to the firewall zone where the Cosmos Hub validator hosts reside.

As you might guess from the diagram, one of our passions is network engineering. We have tried to take approaches and ideas about how datacenter networks should be designed and operated and apply them to our new buildout. In brief these are:

Network Design #

Some specific aspects of the way we constructed our datacenter network are as follows:

Cryptographic Key Storage #

Providing secure conditional access to an online digital signature key is the core responsibility of a validator. The typical solution to the problem of secure storage of online cryptographic keys is to employ Hardware Security Modules (HSMs), which is the recommended approach in the Cosmos Validator Requirements mentioned above, and also one we are thoroughly familiar with from our previous employers.

We’ve selected YubiHSM2 devices from Yubico as our key storage solution. YubiHSMs are a newcomer to the overall HSM scene, but one we feel, based on previous experiences, provides a modern alternative to traditional HSMs, and in cases like Cosmos Hub where concerns like FIPS 140-2 compliance are irrelevant, YubiHSMs provide a simpler and substantially less expensive option for hardware-backed key storage which we believe provides better security in addition to access to more modern encryption algorithms, such as Ed25519, which are only just now finally making their way to traditional HSMs.

Like our other hardware, we’ve provisioned two YubiHSM2s as identical replicas (with the ability to provision more from a set of master secrets and encrypted backups). We’ve also taken additional steps to protect the physical security of these devices, which I won’t go into in detail for obvious reasons, however I will say they are kept under lock-and-key with active anti-tamper detection to ensure their physical security. These steps go beyond the many layers of security needed for physical access to our cabinet, and ensure the devices remain physically secure even in the event an attacker is able to get that far.

The primary reasons we selected YubiHSM2s for our key storage needs are as follows:

Securing and Managing Servers #

Our Dell EMC servers are all provisioned with identical hardware, and for any given role are also provisioned as pairs, however each pair is located a different firewall zone, with explicit, fine-grained policies about what traffic flows are allowable. As mentioned earlier, each is configured with dual redundant power supplies each plugged into one of our dual redundant PDUs, and each has dual redundant Ethernet cables plugged into both of our core switches (aggregated into a single bond interface).

We run CentOS Linux, a free build of RedHat. We are heavy users of many other distributions, including Alpine Linux for containers, and are also long-time Debian users who continue to use it for personal projects. That said, here are the reasons why we chose CentOS for a greenfield production infrastructure:

Aside from CentOS, here are some additional details about how we deploy and manage servers:

Now that we’ve looked at how we built and managed our datacenter deployment, let’s change gears and take a look at the other side of our hybrid cloud: how we leverage Google Cloud Platform.

GCP: Our Cloud #

gcp.png

We use Google Cloud Platform (GCP) as our primary cloud provider. While we have ample experience with alternatives including over a decade of experience with Amazon Web Services (AWS), and plan on leveraging it as a backup and for VPC peering with other validators, we chose to start with GCP over AWS, and so far have not regretted it.

For the purposes of avoiding making this already quite long post even longer we’ll avoid going into details for now (saving them instead for a potential future blog post), but in broad strokes GCP feels more refined than AWS with a more solid technological foundation, better performance, simpler operation with less incidental complexity, and Identity and Access Management (IAM) which is better hardened by default. As network engineers we are quite impressed with the Andromeda network virtualization stack GCP uses, and how it enables features like Live VM Migration.

In the rest of this post, we’d like to describe how we leverage GCP to for the purposes of our Cosmos Hub Validator.

Cloud Overview #

Our primary workload deployed on GCP is our Cosmos Hub sentry nodes. These are Cosmos Hub “full nodes” which communicate with other validators and other Cosmos Hub nodes over a peer-to-peer overlay network, and serve to insulate our validators from the outside world.
No messages are sent to the validator until first verified by one of these sentries, which effectively perform a sort of “pre-validation” step as a full node in the network.

We operate two types of sentries: Internet-facing sentries which use public IP addresses, and private sentries (which we internally refer to as out-of-band or “OOB”) which do not have a public IP address and can only be reached by an RFC1918 private IP address (when we say “private” sentry, we really mean it!). To reach our private sentries, we’ve deployed the up-and-coming WireGuard VPN software, which uses modern cryptography and terminates the VPN inside of the Linux kernel, providing excellent performance. We also support GCP VPC Peering, so if you are a fellow validator who would like to peer with us and are interested in either of these private peering methods, let us know!

As mentioned above, we use BGP peering from our datacenter to provide multiple redundant paths to the cloud. We have two instances of Cloud Router deployed, each in separate availability zones (i.e. us-west1a and us-west1b), ensuring that in the event of a total outage of one availability zone (which is theoretically a completely independently operated datacenter facility), the other will remain available.

For the time being, we’ve deployed our public-facing sentries in us-west1a, and our private sentries in us-west1b, each in their own isolated VPC. In each of these VPCs, we generally have one sentry active at a given time, along with a “hot spare” ready to take over in the event it fails.

All of our compute workloads are presently deployed on Compute Engine, GCP’s VM service, utilizing Google’s Container-Optimized OS (cos), a hardened Linux hypervisor for Docker containers which is effectively the server-side equivalent of ChromeOS. Notably, cos is also the OS used as the hypervisor for GCP’s Kubernetes Engine, so while we aren’t yet leveraging Kubernetes, we have our workloads deployed in a way which would make it easy to transition to doing so.

Cloud Build: Our Build System #

As part of operating a validator we do quite a few builds: container images for cloud deployment, builds of the Cosmos SDK and Tendermint KMS, and builds of production tools we develop in-house which we deploy as signed RPMs through a yum repository hosted on Cloud Storage.

We build all of these things using GCP’s Cloud Build service in conjunction with Cloud Source Repositories, their private Git repository service. Cloud Build is a heavily container-oriented build tool, allowing build steps to be decomposed into separate docker containers that operate on the same filesystem image. For example, we have an image for atomic-reactor, a RedHat-oriented tool for building container images developed as part of Project Atomic.

After containers are built, they are uploaded to Container Registry, GCP’s private Docker container registry. We deploy everything from GCR, both in the cloud and in the datacenter (and that said, one of the best aspects of greenfielding a hybrid cloud is being able to integrate cloud tooling like this into our datacenter operations workflow).

Identity Aware Proxy: Our Front Door #

We run a number of internal tools to manage operations, and need some way to get access to them. GCP provides a great solution for this: Identity-Aware Proxy (IAP), an authenticated web proxy which allows you to set fine-grained access control policies about what resources particular users are allowed to access. It can even be used for SSH tunneling.

IAP is a subcomponent of GCP’s larger Cloud Identity services, which we leverage as our identity provider. Together with Security Command Center‘s access control monitoring and anomaly detection, as well as Security Key Enforcement, which requires all members of our org are strongly authenticated using a hardware token, we believe this solution provides BeyondCorp-like security properties, considered to be the state-of-the-art today.

One final shout out: in addition to Google’s identity and access control services, we also simultaneously leverage Duo Security to provide an additional layer of authentication beyond Cloud Identity. We require both in conjunction for production access, but as Duo also provides BeyondCorp-like services, we will be comparing them to the offerings from Google and keeping them in mind.

Conclusion #

We hope you’ve enjoyed this post! There are many things we wanted to cover but did not. We’re excited about covering them in future blog posts, in particular tooling we’ve developed in-house for datacenter operations. If there’s something in particular you hoped to see covered which we did not, feel free to contact us on Twitter or email us.

What’s next for us? Probably more datacenters, more clouds, more protocols, more assets. Diversifying our infrastructure will improve our validator’s robustness. We are also excited about the prospect of hosting many more Tendermint networks than just the Cosmos Hub, such as the IRIS Network and potentially other proof-of-stake networks like Tezos, who recently announced they’ll be switching to Tendermint’s consensus protocol. We’re excited to see what the future brings, particularly around things like the Inter-Blockchain Communication (IBC) protocol.

In building out our infrastructure, we have sought to walk the fine line between security and agility: implementing all of the most common security best practices while ensuring agility around our ability to scale out and run new networks and new software. We fundamentally believe that a distributed custodian like the Cosmos validators should be secured with institutional grade setups for each validator, and also, that our ability to remain simultaneously both secure and agile is the nuance of iqlusion.

 
37
Kudos
 
37
Kudos

Now read this

Postmortem: 2019-03-29 DNS-related Cosmos Hub Validator Incident

It began with a series of PagerDuty alerts on our phones. We occasionally have false positives, but this was different: several alarms in a row. We looked up at the display in our NOC (above photo, although from a different day) to see... Continue →