Getting started with Docker and Apache Kafka

With Windows Server 2016 and Nanoserver

Published on 04 January 2017

For my first blog post of the new year (Happy New Year everyone!!!), I'd like to share some of my recent adventures with Docker on Windows, or, more specifically, Docker on Windows using Nanoserver as the container OS.

I've been meaning to get up to speed with Docker for a while and, having recently acquired a decent new server for the purpose, decided that a festive break from some of my longer-term projects would be an ideal time to finally dive in. In typical Bebbs style, "diving in" invariably involves the "deep end" and, as such, it seemed that a great initiation into the containerization waters would be to take Apache Kafka - a service typically run on Linux - and deploy it within a Windows Nanoserver container - a recent release from Microsoft and still very much a bleeding-edge OS.

I've been interested in Apache Kafka for quite a while. Described as a "distributed streaming platform", it very much resonates with my "everything is a stream" philosophy. Furthermore, some of its connectors to various traditional RDBMSs offer an intriguing means of moving between 'state store' and 'event store' methodologies.

Getting started

For the host system, I started with a fresh install of Windows Server 2016 (Desktop Experience for convenience) on a Dell T20 Xeon. Following this quick start guide quickly led to an issue whereby the Docker package couldn't be verified by its SHA256 hash and therefore refused to install. Fortunately I found a report of the issue and a workaround here.

I have since reinstalled Docker on Windows Server 2016 and did not experience the issue again, so it appears to have been resolved.
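For anyone hitting a similar verification failure, it can help to check the downloaded file by hand. The following is a minimal sketch, assuming you already have the package on disk and a published SHA256 value to compare against. `Test-FileSha256` is a hypothetical helper name of my own and is not part of any Docker tooling.

```powershell
## Sketch: compare a file's SHA256 hash against an expected value.
## Test-FileSha256 is a hypothetical helper name used only for this example.
function Test-FileSha256
{
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $ExpectedHash
    )

    ## Get-FileHash returns the digest as a hex string in its Hash property
    $actual = (Get-FileHash -Path $Path -Algorithm SHA256).Hash

    ## String comparison with -eq is case-insensitive in PowerShell,
    ## so upper- and lower-case hex digests compare equal
    return $actual -eq $ExpectedHash.Trim()
}
```

Calling `Test-FileSha256 -Path <file> -ExpectedHash <published hash>` returns `$true` only when the two digests match, which at least tells you whether the download or the published hash is at fault.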

With Docker installed and the dotnet-samples example container running, my attention turned to Nanoserver.

A quick pull and run of the Nanoserver image and I found myself at an interactive command prompt of a deployed container running Nanoserver. This can be done as follows:

docker pull microsoft/nanoserver:latest
docker run -it --rm microsoft/nanoserver:latest cmd

Kafka & Zookeeper

While looking for a pre-built image of Kafka running on Nanoserver, it quickly became apparent that in order to get an instance of Apache Kafka running, you first need a running instance of Apache Zookeeper. While you could technically run both services from within a single container (indeed, Kafka is pre-configured to look for a Zookeeper instance on localhost) I wanted to utilize the core value propositions of containers vs VM instances; namely minimal overhead and composability.

This meant that I would therefore be building two container images, one for Zookeeper and one for Kafka, both of which would be running on Nanoserver.

Building the Zookeeper image

Take 1

I built this container up from nothing. When I started here, all there was was nanoserver. Other developers said it was daft to build Zookeeper on nanoserver, but I did it all the same. Just to show'em.

From most of what I had read about building Docker images, it seemed the thing to do was use a dockerfile to start an intermediate container based on the source image (microsoft/nanoserver in this instance), then run a script within the intermediate container (as part of the dockerfile) to download, install and configure all the required components. The output of this docker build process would be a new image with the appropriate services running on startup.

I therefore started by preparing a PowerShell script that would do just that. Following this post on StackOverflow, I developed and tested a script on a Windows Server 2016 (Desktop Experience) virtual machine. This was done so that I could use snapshots to roll back to a clean image any time an issue with the script was encountered.

Unfortunately, when it came time to try running Zookeeper I hit the following error at start-up:

log4j:WARN No appenders could be found for logger (org.apache.zookeeper.server.quorum.QuorumPeerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Invalid config, exiting abnormally

Some quick googling turned up this issue but every subsequent comment seemed to suggest that the issue had been resolved. I tried a frustrating number of unsuccessful workarounds until I realized that it was a PICNIC error. Specifically, while following the StackOverflow post above, I had failed to realize that the version of Zookeeper they specified wasn't actually the latest version and that the issue really had been resolved in a later version. This took a frustratingly and embarrassingly long time but hey, it's always the last place you look.

Anyway, a morning of trial and error resulted in a thing of beauty; a script that would - completely automatically - download, extract, configure, install (as a service!) and run a Zookeeper instance. This is shown below:

## Download sources
$zipUri = "http://homeserver/download/7z1604-x64.exe" # "http://www.7-zip.org/a/7z1604-x64.exe"
$nssmUri = "http://homeserver/download/nssm-2.24.zip" # "https://nssm.cc/release/nssm-2.24.zip"
$javaUri = "http://homeserver/download/jre-8u111-windows-x64.exe" # "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jre-8u111-windows-x64.exe"
$zookeeperUri = "http://homeserver/download/zookeeper-3.4.9.tar.gz" # "http://apache.mirrors.nublue.co.uk/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz"
$kafkaUri = "http://homeserver/download/kafka_2.11-0.10.1.0.tgz" # "http://apache.mirror.anlx.net/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz"

## Application locations
$appDir = "c:\Apps"
$zipDir = $appDir + "\7zip"
$nssmDir = $appDir + "\nssm"
$zookeeperDir = $appDir + "\Zookeeper"

## Data locations
$zookeeperDataDir = $zookeeperDir + "\Data"

## Application executables
$zip = $zipDir + "\7z.exe"
$nssm = $nssmDir + "\nssm.exe"
$zookeeper = $zookeeperDir + "\bin\zkServer.cmd"

function New-TempPath()
{
    if (!(Test-Path -Path C:\Temp))
    {
        New-Item c:\Temp -ItemType Directory
    }
}

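function Expand-File($zipFile, $targetPath)
{
    ## 'e' extracts without directory structure; '-y' answers yes to all prompts
    ## ($zipArgs is used rather than $args, which is a PowerShell automatic variable)
    $zipArgs = @("e", $zipFile, "-o$targetPath", '-y')
    &$zip $zipArgs
}

function Expand-Directory($zipFile, $targetPath)
{
    ## 'x' extracts with full paths; '-aoa' overwrites existing files without prompting
    $zipArgs = @("x", $zipFile, "-o$targetPath", '-aoa')
    &$zip $zipArgs
}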

function Install-7zip()
{
    New-Item "c:\Temp\7zip" -ItemType Directory -Force
    Invoke-WebRequest -Uri $zipUri -OutFile c:\Temp\7zip\7zip.exe
    &"C:\Temp\7zip\7zip.exe" /S /D=$zipDir | Out-Null
    Remove-Item -Path "c:\Temp\7zip\7zip.exe"
}

function Install-NSSM()
{
    New-Item "c:\Temp\NSSM" -ItemType Directory -Force
    Invoke-WebRequest -Uri $nssmUri -OutFile c:\Temp\NSSM\NSSM.zip

    Expand-Directory c:\Temp\NSSM\NSSM.zip c:\Temp\NSSM

    ## Above will expand to a directory containing version name which we want to remove
    ## so we'll move everything up a directory
    $folder = Get-ChildItem -Path c:\Temp\NSSM -Filter "nssm-*"
    Get-ChildItem -Path $folder.FullName -Recurse | Move-Item -destination c:\Temp\NSSM -Force

    New-Item $nssmDir -ItemType Directory -Force
    Copy-Item -Path "c:\Temp\NSSM\win64\nssm.exe" $nssm -Force
}

function Install-Java()
{
    New-Item c:\Temp\Java -ItemType Directory -Force
    Invoke-WebRequest -Uri $javaUri -OutFile c:\temp\Java\Java.exe

    Start-Process "C:\Temp\Java\Java.exe" -ArgumentList "INSTALL_SILENT=Enable INSTALLDIR=C:\Java\Jre AUTO_UPDATE=Disable WEB_JAVA=Disable WEB_ANALYTICS=Disable EULA=Disable REBOOT=Disable NOSTARTMENU=Enable SPONSORS=Disable REMOVEOUTOFDATEJRES=0" -NoNewWindow -Wait

    [Environment]::SetEnvironmentVariable("JAVA_HOME", "C:\Java\Jre", "Machine")

    Remove-Item -Path "C:\Temp\Java\Java.exe"
}

function Get-Zookeeper()
{
    New-Item c:\Temp\Zookeeper -ItemType Directory -Force
    Invoke-WebRequest -Uri $zookeeperUri -OutFile c:\temp\Zookeeper\Zookeeper.tar.gz
    Expand-File c:\temp\Zookeeper\Zookeeper.tar.gz c:\temp\Zookeeper
    Expand-Directory c:\temp\Zookeeper\Zookeeper.tar $zookeeperDir

    ## Above will expand to a directory containing version name which we want to remove
    ## so we'll move everything up a directory
    $folder = Get-ChildItem -Path $zookeeperDir -Filter "zookeeper-*"
    Get-ChildItem -Path $folder.FullName -Recurse | Move-Item -destination $zookeeperDir -Force

    Remove-Item -Path $folder.FullName
    Remove-Item -Path "c:\temp\Zookeeper" -Recurse
}

function Initialize-Zookeeper()
{
    New-Item -Path $zookeeperDataDir -ItemType Directory -Force
    $zookeeperDataLinuxDir = $zookeeperDataDir.Replace('\', '/')

    Copy-Item -Path ($zookeeperDir + '\conf\zoo_sample.cfg') -Destination ($zookeeperDir + '\conf\zoo.cfg') -Force

    $configFile = $zookeeperDir + '\conf\zoo.cfg'
    $logFile = $zookeeperDir + '\conf\log4j.properties'

    $config = [IO.File]::ReadAllText($configFile) -replace "dataDir=[\/\w]*", ("dataDir=" + $zookeeperDataLinuxDir)
    [IO.File]::WriteAllText($configFile, $config)

    $logProperties = [IO.File]::ReadAllText($logFile) -replace "#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE", "log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE"
    [IO.File]::WriteAllText($logFile, $logProperties)
}

function Install-Zookeeper()
{
    &$nssm install Zookeeper $zookeeper | Out-Null
    &$nssm set Zookeeper AppDirectory $zookeeperDir | Out-Null

    &$nssm set Zookeeper DisplayName "Zookeeper" | Out-Null
    &$nssm set Zookeeper Description "Apache Zookeeper. Running from $zookeeperDir" | Out-Null
    &$nssm set Zookeeper Start SERVICE_AUTO_START | Out-Null
    &$nssm set Zookeeper ObjectName LocalSystem | Out-Null
    &$nssm set Zookeeper Type SERVICE_WIN32_OWN_PROCESS | Out-Null
}

function Start-Zookeeper()
{
    &$nssm start Zookeeper | Out-Null
}

function Stop-Zookeeper()
{
    &$nssm stop Zookeeper | Out-Null
}

New-TempPath

Install-7zip
Install-NSSM
Install-Java

Get-Zookeeper
Initialize-Zookeeper
Install-Zookeeper
Start-Zookeeper

With this mighty script in hand I prepared the following dockerfile:

FROM microsoft/nanoserver
MAINTAINER Ian Bebbington <docker@bebbs.co.uk>
LABEL Description="Zookeeper running on Microsoft Nanoserver" Version="0.1"
ADD Install-Zookeeper.ps1 /
RUN [ "powershell.exe", "C:/Install-Zookeeper.ps1" ]

And watched in dismay as it completely failed to build a container.

You see, while the script ran perfectly on Windows Server 2016, Nanoserver is a far more constrained environment. It has neither support for 32-bit assemblies nor any graphics stack to speak of so, in short order, the 7zip utility, Java installer and Non-Sucking Service Manager executables all failed.
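One way to spot this kind of problem before building an image is to check whether an executable is 32-bit or 64-bit by reading its PE header. The sketch below is my own illustration rather than something I used at the time, and `Get-PeMachineType` is a hypothetical helper name. It relies only on the documented PE layout: the offset of the `PE` signature is stored at 0x3C, and the two-byte machine field follows the four-byte signature.

```powershell
## Sketch: report whether a Windows executable is 32-bit or 64-bit.
## Get-PeMachineType is a hypothetical helper name used only for this example.
function Get-PeMachineType
{
    param([Parameter(Mandatory)] [string] $Path)

    $bytes = [IO.File]::ReadAllBytes($Path)

    ## A PE file starts with 'MZ' and has at least a 64-byte DOS header
    if ($bytes.Length -lt 64 -or $bytes[0] -ne 0x4D -or $bytes[1] -ne 0x5A)
    {
        return 'NotPE'
    }

    ## Offset 0x3C holds the file offset of the 'PE\0\0' signature
    $peOffset = [BitConverter]::ToInt32($bytes, 0x3C)
    if ($peOffset -lt 0 -or ($peOffset + 6) -gt $bytes.Length)
    {
        return 'NotPE'
    }
    if ($bytes[$peOffset] -ne 0x50 -or $bytes[$peOffset + 1] -ne 0x45)
    {
        return 'NotPE'
    }

    ## The 2-byte Machine field follows the 4-byte signature
    $machine = [BitConverter]::ToUInt16($bytes, $peOffset + 4)
    switch ($machine)
    {
        0x014C  { 'x86' }
        0x8664  { 'x64' }
        default { 'Other (0x{0:X4})' -f $machine }
    }
}
```

Running this against each installer or tool you intend to copy into the image gives an early hint of which ones are likely to fail on Nanoserver. It says nothing about the graphics dependency, which still has to be found the hard way.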

Well, crap.

Take 2

So! I built a second one!

My next thought was to try salvaging as much of the script as possible by using Powershell Remoting to interactively install the required components and then committing the changes to a new image.

While, in retrospect, this was undoubtedly the wrong way forward, I was simultaneously fortunate and frustrated by the fact that it simply doesn't seem possible to use powershell remoting with Nanoserver when running within a container. Indeed, after learning more about WinRM than I thought possible, posting on Microsoft's Windows Container forums and even offering my first bounty on StackOverflow I simply could not find an answer to why it wasn't possible to establish a remote session.

In the meantime...

Take 3

So, I built a third one...

Powershell remoting works beautifully with Nanoserver when running in a Hyper-V virtual machine but Hyper-V networking and Docker networking configurations don't seem to play well together. Indeed, after creating a new virtual-router so that I could access the Nanoserver virtual machine from the host PC, the Docker NAT network became inaccessible. Now, I'm sure it would be possible to dig into the virtual networking configuration and find a way to resolve this but, having spent an incredibly frustrating few hours reconfiguring WinRM, I decided it would be quicker to simply re-install the host OS and start from scratch.

Take 4

But the fourth one stayed up!

To accompany the fresh host environment, I decided to employ a fresh approach to building the container image. Namely, use a script to build the container's file system structure on the host PC and then simply copy it wholesale to the container from within the dockerfile. This meant deploying the Java Runtime Environment from a compressed archive rather than a silent installer, and using the dockerfile ENTRYPOINT instruction to run Zookeeper rather than installing it as a service.

After all the faff and frustration of the previous two attempts (not to mention reinstalling the OS on the host PC), this approach was remarkably smooth. In retrospect it was undoubtedly the correct one, but it almost certainly benefited from all the knowledge I had accrued from the failed attempts. As always, you can learn more from failure than success.

Anyway, in relatively short order, I had a script that prepared and configured the container's file system structure on the host PC and a dockerfile that copied this structure to a new image and set the Zookeeper service as the entrypoint for the image. These are shown below:

## Download sources
$zipUri = "http://homeserver/download/7z1604-x64.exe" # "http://www.7-zip.org/a/7z1604-x64.exe"
$javaUri = "http://homeserver/download/jre-8u111-windows-x64.tar.gz" # "http://download.oracle.com/otn-pub/java/jdk/8u111-b14/jre-8u111-windows-x64.tar.gz"
$zookeeperUri = "http://homeserver/download/zookeeper-3.4.9.tar.gz" # "http://apache.mirrors.nublue.co.uk/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz"
$dockerModuleUri = "http://homeserver/download/Docker.0.1.0.zip" # "https://github.com/Microsoft/Docker-PowerShell/releases/download/v0.1.0/Docker.0.1.0.zip"

## Build location
$buildDir = Get-Location
$tmpDir = $buildDir.Path + "\Temp"
$rootDir = $buildDir.Path + "\Root"
$buildAppDir = $rootDir + "\Apps"
$buildDataDir = $rootDir + "\Data"
$buildDockerZip = $tmpDir + "\Docker.zip"
$buildDockerModule = $tmpDir + "\Docker"
$buildZipDir = $tmpDir + "\7zip"
$buildJreDir = $buildAppDir + "\Jre"
$buildZookeeperDir = $buildAppDir + "\Zookeeper"
$buildZookeeperDataDir = $buildDataDir + "\Zookeeper"

## Temp files
$zipInstaller = $tmpDir + "\7zInstaller.exe"
$jreGzip = $tmpDir + "\Jre.tar.gz"
$jreTar = $tmpDir + "\Jre.tar"
$zooKeeperGzip = $tmpDir + "\Zookeeper.tar.gz"
$zooKeeperTar = $tmpDir + "\Zookeeper.tar"

## Target locations
$targetDir = "C:" ## no trailing backslash; the paths below add their own separator
$appDir = $targetDir + "\Apps"
$dataDir = $targetDir + "\Data"
$jreDir = $appDir + "\Jre"
$zookeeperDir = $appDir + "\Zookeeper"
$zookeeperDataDir = $dataDir + "\Zookeeper"

## Executables
$zip = $buildZipDir + "\7z.exe"
$zookeeper = $zookeeperDir + "\bin\zkServer.cmd"
$docker = "docker"

function New-TempPath()
{
    if (!(Test-Path -Path $tmpDir))
    {
        New-Item $tmpDir -ItemType Directory
    }
}

function Remove-TempPath()
{
    Remove-Item $tmpDir -Recurse -Force
}

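function New-RootPath()
{
    ## Remove any previous build output; skip on a first run when it doesn't exist yet
    if (Test-Path -Path $rootDir)
    {
        Remove-Item $rootDir -Recurse -Force
    }
    New-Item $rootDir -ItemType Directory
}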

function Remove-RootPath()
{
    Remove-Item $rootDir -Recurse -Force
}

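function Expand-File($zipFile, $targetPath)
{
    ## 'e' extracts without directory structure; '-y' answers yes to all prompts
    ## ($zipArgs is used rather than $args, which is a PowerShell automatic variable)
    $zipArgs = @("e", $zipFile, "-o$targetPath", '-y')
    &$zip $zipArgs | Out-Host
}

function Expand-Directory($zipFile, $targetPath)
{
    ## 'x' extracts with full paths; '-aoa' overwrites existing files without prompting
    $zipArgs = @("x", $zipFile, "-o$targetPath", '-aoa')
    &$zip $zipArgs | Out-Host
}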

function Install-DockerModule()
{
    Invoke-WebRequest -Uri $dockerModuleUri -OutFile $buildDockerZip
    Expand-Archive -Path $buildDockerZip -DestinationPath $buildDockerModule -Force

    Import-Module $buildDockerModule
}

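function Remove-DockerModule()
{
    ## Remove-Module expects a module name rather than a path; "Docker" is assumed
    ## here from the folder the module was imported from
    Remove-Module -Name Docker
}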

function Install-7zip()
{
    $folder = New-Item $buildZipDir -ItemType Directory -Force
    Invoke-WebRequest -Uri $zipUri -OutFile $zipInstaller
    &$zipInstaller /S /D=$folder | Out-Null
    Remove-Item -Path $zipInstaller
}

function Remove-7zip()
{
    Remove-Item $buildZipDir -Recurse -Force
}

function Get-Java()
{
    Invoke-WebRequest -Uri $javaUri -OutFile $jreGzip
    Expand-File $jreGzip $tmpDir
    Expand-Directory $jreTar $buildJreDir

    ## Above will expand to a directory containing version name which we want to remove
    ## so we'll move everything up a directory
    $folder = Get-ChildItem -Path $buildJreDir -Filter "jre*"
    Get-ChildItem -Path $folder.FullName -Recurse | Move-Item -destination $buildJreDir -Force

    Remove-Item -Path $folder.FullName -Force
    Remove-Item -Path $jreGzip -Force
    Remove-Item -Path $jreTar -Force
}

function Get-Zookeeper()
{
    Invoke-WebRequest -Uri $zookeeperUri -OutFile $zooKeeperGzip
    Expand-File $zooKeeperGzip $tmpDir
    Expand-Directory $zooKeeperTar $buildZookeeperDir

    ## Above will expand to a directory containing version name which we want to remove
    ## so we'll move everything up a directory
    $folder = Get-ChildItem -Path $buildZookeeperDir -Filter "zookeeper-*"
    Get-ChildItem -Path $folder.FullName -Recurse | Move-Item -destination $buildZookeeperDir -Force

    Remove-Item -Path $folder.FullName -Force
    Remove-Item -Path $zooKeeperTar -Force
    Remove-Item -Path $zooKeeperGzip -Force
}

function Initialize-Zookeeper()
{
    New-Item -Path $buildDataDir -ItemType Directory -Force
    New-Item -Path $buildZookeeperDataDir -ItemType Directory -Force

    $zookeeperDataLinuxDir = $zookeeperDataDir.Replace('\', '/')

    Copy-Item -Path ($buildZookeeperDir + '\conf\zoo_sample.cfg') -Destination ($buildZookeeperDir + '\conf\zoo.cfg') -Force

    $configFile = $buildZookeeperDir + '\conf\zoo.cfg'
    $logFile = $buildZookeeperDir + '\conf\log4j.properties'

    $config = [IO.File]::ReadAllText($configFile) -replace "dataDir=[\/\w]*", ("dataDir=" + $zookeeperDataLinuxDir)
    [IO.File]::WriteAllText($configFile, $config)

    $logProperties = [IO.File]::ReadAllText($logFile) -replace "#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE", "log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE"
    [IO.File]::WriteAllText($logFile, $logProperties)
}

function New-DockerImage()
{
    Build-ContainerImage -Path $buildDir -Repository "ibebbs/nanozoo:latest"
}


# Setup directory structure
New-TempPath
New-RootPath

# Install required tools
Install-DockerModule
Install-7zip

# Get components
Get-Java
Get-Zookeeper
Initialize-Zookeeper

# Build docker image
New-DockerImage

# Cleanup
Remove-DockerModule
Remove-7zip
Remove-TempPath
Remove-RootPath

FROM microsoft/nanoserver
MAINTAINER Ian Bebbington <docker@bebbs.co.uk>
LABEL Description="Zookeeper running on Microsoft Nanoserver" Version="0.1"
ADD Root /
ADD Start-Zookeeper.ps1 /
RUN setx /M JAVA_HOME C:\Apps\Jre
EXPOSE 2181
ENTRYPOINT [ "powershell.exe", "C:/Start-Zookeeper.ps1" ]

And this one worked. This one started. This one stayed up!
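The dockerfile above refers to a Start-Zookeeper.ps1 entrypoint script that isn't reproduced in this post. As a rough sketch of what such a script might contain (this is an assumption for illustration, not the actual file from the repository), it only needs to locate zkServer.cmd and run it in the foreground so that the container stays alive for as long as Zookeeper does. The function name `Start-ZookeeperForeground` and its default paths are hypothetical, chosen to match the target locations used in the build script.

```powershell
## Sketch of an entrypoint helper; the real Start-Zookeeper.ps1 may differ.
## Start-ZookeeperForeground and its defaults are assumptions for illustration.
function Start-ZookeeperForeground
{
    param(
        [string] $ZookeeperDir = 'C:\Apps\Zookeeper',
        [string] $JavaHome = 'C:\Apps\Jre'
    )

    $zkServer = Join-Path $ZookeeperDir 'bin\zkServer.cmd'
    if (-not (Test-Path $zkServer))
    {
        throw "zkServer.cmd not found at $zkServer"
    }

    ## The dockerfile sets JAVA_HOME with setx; fall back to the default if it is missing
    if (-not $env:JAVA_HOME)
    {
        $env:JAVA_HOME = $JavaHome
    }

    ## Run in the foreground: the container exits when this process exits
    & $zkServer
}
```

The entrypoint script would then end with a plain call to `Start-ZookeeperForeground`. Running the server in the foreground, rather than as a service, is what lets Docker treat Zookeeper as the container's main process.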

Building the Kafka image

With the Zookeeper scripts as a pattern, it was ludicrously easy to script up another image for Kafka. Just a few changes to file names and configuration parameters and Kafka started almost first time.

I won't copy the script or dockerfile here as they're extremely similar to the Zookeeper versions. Instead, all scripts and files used above can be found in my Docker repository on GitHub and the resultant images can be found on Docker Hub.

Moving forward

But I don't want any of that!

Moving forward, I need to address a couple of shortcomings in the Kafka script (specifically the hard-coded IP address for the Zookeeper container) and then look to use Docker Compose to automatically bring up Zookeeper and Kafka on demand.
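One possible way to remove the hard-coded address is to rewrite the `zookeeper.connect` setting in Kafka's server.properties when the container starts, taking the value from an environment variable. The sketch below shows only the text substitution. `Set-ZookeeperConnect` is a hypothetical helper name, and the `ZOOKEEPER_CONNECT` variable and `zookeeper:2181` host name mentioned afterwards are assumptions rather than anything the current scripts use.

```powershell
## Sketch: replace the zookeeper.connect line in the text of server.properties.
## Set-ZookeeperConnect is a hypothetical helper name used only for this example.
function Set-ZookeeperConnect
{
    param(
        [Parameter(Mandatory)] [string] $Properties,
        [Parameter(Mandatory)] [string] $ZookeeperConnect
    )

    ## Escape '$' so the value is treated literally in the regex replacement
    $replacement = 'zookeeper.connect=' + $ZookeeperConnect.Replace('$', '$$')

    ## (?m) makes ^ match at the start of each line; [^\r\n]* stops at the line end
    ## so that both LF and CRLF files keep their original line endings
    return $Properties -replace '(?m)^zookeeper\.connect=[^\r\n]*', $replacement
}
```

A start-up script could read the file with `[IO.File]::ReadAllText`, pass it through `Set-ZookeeperConnect` with the value of `$env:ZOOKEEPER_CONNECT` (falling back to something like `zookeeper:2181`), and write it back before launching Kafka. Because the pattern requires the `=` immediately after `connect`, related settings such as `zookeeper.connection.timeout.ms` are left untouched.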

It's been an interesting journey so far and I've not even begun to actually use the deployed services yet! Still, it is truly magical to run a docker container and see it boot an entire Windows server and service in just 10-20 seconds and a few hundred MB.