We'll build a Certificate Transparency log system for internal PKIs using Kafka as a message queue and Trillian as an append-only Merkle tree. This setup allows for efficient auditing of certificate issuances, detection of misissued certificates, and gossip-based consistency checks.

Why Bother with Internal CT Logs?

Before we dive into the technicalities, let's address the elephant in the room: Why should we care about implementing CT logs for internal PKIs?

  • Detect unauthorized certificate issuances
  • Ensure compliance with internal policies
  • Improve incident response capabilities
  • Enhance overall security posture

Think of it as a trust-but-verify approach for your internal CA. You trust your CA, but you also want to keep it honest.

The Building Blocks: Kafka and Trillian

To implement our internal CT log system, we'll be using two powerful tools:

1. Apache Kafka

Kafka will serve as our message queue, handling the high-throughput ingestion of certificate data. It's like a conveyor belt for your certificates: durable, replayable, and ordered within each partition. If you need one global order of submissions, use a single-partition topic.

2. Trillian

Trillian, developed by Google, is our append-only Merkle tree implementation. It's the backbone of our CT log, providing cryptographic assurances of log integrity and allowing for efficient proofs of inclusion.

Architecture Overview

Let's break down our system architecture:


+----------------+     +--------+     +----------+     +---------+
|  Internal CA   | --> | Kafka  | --> | Trillian | --> | Auditor |
+----------------+     +--------+     +----------+     +---------+
        |                                 |
        |                                 |
        v                                 v
+----------------+              +--------------------+
| Monitor/Alert  |              | Gossip Participants|
+----------------+              +--------------------+

1. The Internal CA submits newly issued certificates to Kafka.

2. Kafka buffers the submissions durably and delivers them to Trillian, preserving order within each partition.

3. Trillian queues the certificates, and its log signer sequences them into the Merkle tree.

4. Auditors can verify the log's consistency and check for suspicious certificates.

5. Monitoring systems alert on any anomalies.

6. Gossip participants ensure the log's consistency across multiple instances.

Implementing the System

Step 1: Setting Up Kafka

First, let's set up our Kafka cluster. We'll use Docker for simplicity:


version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0  # pinned for reproducibility; Kafka 4.x removed ZooKeeper mode
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

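Run this with docker-compose up -d, and you've got a single-broker Kafka instance ready to ingest certificates. That's fine for development; a production setup needs multiple brokers and a replication factor above 1.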

Step 2: Configuring Trillian

Now, let's set up Trillian. We'll need to compile it from source:


git clone https://github.com/google/trillian.git
cd trillian
go build ./cmd/trillian_log_server
go build ./cmd/trillian_log_signer

Create a MySQL database for Trillian:


CREATE DATABASE trillian;

Initialize the database schema:


mysql -u root -p trillian < storage/mysql/schema/storage.sql

Now, start the Trillian log server and signer:


./trillian_log_server --logtostderr ...
./trillian_log_signer --logtostderr ...

Step 3: Implementing the Certificate Submitter

We need a component to submit certificates from our internal CA to Kafka. Here's a simple Go implementation:


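package main

import (
	"context"
	"crypto/x509"
	"encoding/pem"

	"github.com/segmentio/kafka-go"
)

// submitCertificate PEM-encodes cert and publishes it to the
// ct-log-entries topic, keyed by the certificate's serial number.
func submitCertificate(cert *x509.Certificate) error {
	// A writer per call keeps the example short. In production, create
	// one writer at startup and reuse it for every submission.
	w := kafka.NewWriter(kafka.WriterConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "ct-log-entries",
	})
	// Close flushes pending messages and releases the broker connections.
	defer w.Close()

	pemCert := pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: cert.Raw,
	})

	return w.WriteMessages(context.Background(),
		kafka.Message{
			Key:   []byte(cert.SerialNumber.String()),
			Value: pemCert,
		},
	)
}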

Step 4: Processing Certificates with Trillian

Now, we need to consume certificates from Kafka and add them to Trillian:


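package main

import (
	"context"
	"log"

	"github.com/google/trillian"
	"github.com/segmentio/kafka-go"
)

// processCertificates copies entries from Kafka into the Trillian log.
// client is a connected log client, for example one created with
// trillian.NewTrillianLogClient(...) over a gRPC connection to the log server.
func processCertificates(client trillian.TrillianLogClient, logID int64) error {
	ctx := context.Background()

	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		Topic:   "ct-log-entries",
		GroupID: "trillian-processor",
	})
	defer r.Close()

	for {
		// FetchMessage does not commit the offset, so a crash before the
		// commit below leads to a redelivery instead of a lost certificate.
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			return err
		}

		_, err = client.QueueLeaf(ctx, &trillian.QueueLeafRequest{
			LogId: logID,
			Leaf:  &trillian.LogLeaf{LeafValue: msg.Value},
		})
		if err != nil {
			// The offset stays uncommitted, so the entry is retried on restart.
			log.Printf("queue leaf for offset %d: %v", msg.Offset, err)
			return err
		}

		if err := r.CommitMessages(ctx, msg); err != nil {
			log.Printf("commit offset %d: %v", msg.Offset, err)
		}
	}
}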

Implementing Gossip-based Consistency

To ensure the consistency of our CT log across multiple instances, we'll implement a gossip protocol. This allows different log instances to compare their views of the log and detect any discrepancies.

Gossip Protocol Overview

  1. Each log instance periodically sends its latest Signed Tree Head (STH) to a set of peers.
  2. Peers compare the received STH with their own.
  3. If the tree sizes differ, the side with the larger tree requests a consistency proof and verifies it. Equal sizes with different root hashes already prove a fork.
  4. Any inconsistencies trigger alerts for further investigation.

Here's a basic implementation of the gossip protocol:


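package main

import (
	"bytes"
	"context"
	"errors"
	"log"
	"time"

	"github.com/google/trillian"
	"github.com/google/trillian/client"
	"github.com/google/trillian/types"
)

type GossipParticipant struct {
	LogID    int64
	Client   trillian.TrillianLogClient
	Verifier *client.LogVerifier
	Peers    []string
}

func (g *GossipParticipant) RunGossip() {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		g.gossipRound()
	}
}

func (g *GossipParticipant) gossipRound() {
	ctx := context.Background()
	resp, err := g.Client.GetLatestSignedLogRoot(ctx, &trillian.GetLatestSignedLogRootRequest{LogId: g.LogID})
	if err != nil {
		log.Printf("get latest log root: %v", err)
		return
	}

	// Tree size and root hash live inside the serialized log root.
	var local types.LogRootV1
	if err := local.UnmarshalBinary(resp.GetSignedLogRoot().GetLogRoot()); err != nil {
		log.Printf("parse local log root: %v", err)
		return
	}

	for _, peer := range g.Peers {
		peerRoot, err := getPeerRoot(peer)
		if err != nil {
			log.Printf("fetch log root from %s: %v", peer, err)
			continue
		}

		switch {
		case peerRoot.TreeSize == local.TreeSize:
			// Same size must mean same hash; anything else is a fork.
			if !bytes.Equal(peerRoot.RootHash, local.RootHash) {
				raiseInconsistencyAlert(g.LogID, peer)
			}
		case peerRoot.TreeSize == 0 || peerRoot.TreeSize > local.TreeSize:
			// Nothing to prove against an empty tree, and a peer that is
			// ahead of us runs this check from its own side.
		default:
			proofResp, err := g.Client.GetConsistencyProof(ctx, &trillian.GetConsistencyProofRequest{
				LogId:          g.LogID,
				FirstTreeSize:  int64(peerRoot.TreeSize),
				SecondTreeSize: int64(local.TreeSize),
			})
			if err != nil {
				log.Printf("get consistency proof for %s: %v", peer, err)
				continue
			}

			// VerifyRoot checks that our newer root extends the peer's older one.
			// Its signature has changed between Trillian releases, so check it
			// against the version you build with.
			if _, err := g.Verifier.VerifyRoot(peerRoot, resp.GetSignedLogRoot(), proofResp.GetProof().GetHashes()); err != nil {
				// Inconsistency detected! Raise an alert
				raiseInconsistencyAlert(g.LogID, peer)
			}
		}
	}
}

// getPeerRoot fetches a peer's latest log root. The transport depends on
// your deployment, so it is left unimplemented here.
func getPeerRoot(peer string) (*types.LogRootV1, error) {
	return nil, errors.New("getPeerRoot: not implemented")
}

func raiseInconsistencyAlert(logID int64, peer string) {
	// Implement alert mechanism (e.g., send email, trigger incident response)
}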
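
The decision at the heart of a gossip round is small enough to test on its own, without Trillian or a network. Here's a sketch of that logic using only the standard library. The treeHead type and compareHeads function are hypothetical simplifications, not part of Trillian's API.

```go
package main

import (
	"bytes"
	"fmt"
)

// treeHead is a simplified view of a log root: tree size plus root hash.
type treeHead struct {
	TreeSize uint64
	RootHash []byte
}

type verdict string

const (
	identical  verdict = "identical"
	forked     verdict = "forked"
	needsProof verdict = "needs consistency proof"
)

// compareHeads decides what a gossip round should do with two tree heads.
// When a proof is needed, first and second are the tree sizes to request
// it for (first is always the smaller one).
func compareHeads(local, peer treeHead) (v verdict, first, second uint64) {
	switch {
	case local.TreeSize == peer.TreeSize && bytes.Equal(local.RootHash, peer.RootHash):
		return identical, 0, 0
	case local.TreeSize == peer.TreeSize:
		// Two different roots for the same size cannot both be honest.
		return forked, 0, 0
	case local.TreeSize < peer.TreeSize:
		return needsProof, local.TreeSize, peer.TreeSize
	default:
		return needsProof, peer.TreeSize, local.TreeSize
	}
}

func main() {
	local := treeHead{TreeSize: 10, RootHash: []byte{0xaa}}
	peer := treeHead{TreeSize: 7, RootHash: []byte{0xbb}}
	v, first, second := compareHeads(local, peer)
	fmt.Printf("%s (sizes %d -> %d)\n", v, first, second)
}
```

Keeping this logic separate from the RPC calls makes it easy to cover the fork case in unit tests, which is exactly the case you hope never to see in production.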

Auditing and Monitoring

With our CT log system in place, we need to implement auditing and monitoring to detect any suspicious activities or inconsistencies.

Implementing an Auditor

The auditor's job is to periodically check the log for any certificates that violate policy or appear suspicious. Here's a basic implementation:


package main

import (
	"context"
	"crypto/x509"
	"encoding/pem"
	"log"
	"time"

	"github.com/google/trillian"
)

type Auditor struct {
	LogID  int64
	Client trillian.TrillianLogClient
	next   int64 // index of the first leaf that has not been audited yet
}

func (a *Auditor) AuditLog() {
	ticker := time.NewTicker(1 * time.Hour)
	defer ticker.Stop()
	for range ticker.C {
		a.auditRound()
	}
}

func (a *Auditor) auditRound() {
	ctx := context.Background()
	leaves, err := a.Client.GetLeavesByRange(ctx, &trillian.GetLeavesByRangeRequest{
		LogId:      a.LogID,
		StartIndex: a.next, // resume where the previous round stopped
		Count:      1000,   // Adjust as needed
	})
	if err != nil {
		log.Printf("get leaves from index %d: %v", a.next, err)
		return
	}

	for _, leaf := range leaves.Leaves {
		// Advance even when a leaf fails to parse, so one bad entry
		// does not block the audit forever.
		a.next = leaf.LeafIndex + 1

		block, _ := pem.Decode(leaf.LeafValue)
		if block == nil {
			log.Printf("leaf %d: no PEM block found", leaf.LeafIndex)
			continue
		}

		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			log.Printf("leaf %d: parse certificate: %v", leaf.LeafIndex, err)
			continue
		}

		if isSuspiciousCertificate(cert) {
			raiseSuspiciousCertificateAlert(cert)
		}
	}
}

func isSuspiciousCertificate(cert *x509.Certificate) bool {
	// Implement checks for suspicious certificates
	// For example:
	// - Unexpected issuers
	// - Unusual validity periods
	// - Forbidden key usages
	// - Unexpected SANs
	return false
}

func raiseSuspiciousCertificateAlert(cert *x509.Certificate) {
	// Implement alert mechanism for suspicious certificates
}

Monitoring and Alerting

To keep an eye on the health and performance of our CT log system, we should implement comprehensive monitoring and alerting. Here are some key metrics to track:

  • Log size and growth rate
  • Latency of certificate submissions
  • Error rates in certificate processing
  • Gossip protocol consistency checks
  • Auditor findings and alerts

You can use tools like Prometheus and Grafana to collect and visualize these metrics. Here's an example of how to expose some basic metrics using the Prometheus client library for Go:


package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	certificatesProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "ct_log_certificates_processed_total",
		Help: "The total number of processed certificates",
	})

	processingLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name: "ct_log_processing_latency_seconds",
		Help: "The latency of processing certificates",
		Buckets: prometheus.DefBuckets,
	})

	gossipInconsistencies = promauto.NewCounter(prometheus.CounterOpts{
		Name: "ct_log_gossip_inconsistencies_total",
		Help: "The total number of detected gossip inconsistencies",
	})
)

// Use these metrics in your code:
// certificatesProcessed.Inc()
// processingLatency.Observe(duration.Seconds())
// gossipInconsistencies.Inc()

Conclusion: Trust, but Verify (and Log)

Implementing a Certificate Transparency log for your internal PKI might seem like overkill at first glance. But in the world of cybersecurity, where trust is paramount and the consequences of a breach can be catastrophic, it's a small price to pay for peace of mind.

By combining Kafka for high-throughput ingestion with Trillian for cryptographic integrity, we've built a system that can:

  • Detect unauthorized or misissued certificates quickly
  • Provide a tamper-evident audit trail of all certificate issuances
  • Ensure consistency across multiple log instances through gossip protocols
  • Enable proactive monitoring and alerting for suspicious activities

Remember, in the realm of PKI, trust is good, but verification is better. By implementing this internal CT log system, you're not just improving your security posture; you're building a foundation of verifiable trust that can withstand the scrutiny of audits and the test of time.

"In God we trust. All others must bring data." - W. Edwards Deming

Now go forth and log those certificates! Your future self (and your auditors) will thank you.

Further Reading