
Data Driven Infrastructure

by igor

Every company is unique in that it is made up of a number of special snowflakes who — over the history of that company — have shaped it with their blend of idiosyncrasies. These idiosyncrasies are often niche or market driven (read: deadlines, money, staffing, knowledge). Over time they mould into a set of (often unwritten) company-wide conventions. Think of how working in a company that provides a niche service, such as health insurance for Swiss Federal Railway employees, shapes the jargon of the company from the ground up, starting with server naming conventions. Brainsware started out with one foot in hosting, so we could put money aside during long stretches of development. One of the earliest patterns we established in this context was privilege separation: we wanted to make really sure that when one customer got hacked, the whole farm wouldn't be hosed. The decision was made to separate customers on a process level and on multiple user levels¹ by running one Apache httpd per customer. Finally, all connections would go through a single proxy server that would do things such as: SSL termination, Web Application Firewalling, and of course caching. It also opens up enormous flexibility in every single customer's configuration — rather than having one size fits all…

Moving to Puppet

When we started our move towards Puppet, we adopted what scraps we had seen in other infrastructures, and it was not long until it started feeling uniquely wrong, for a number of reasons. We had essentially given up on our simple and robust scripts for a system we did not know, and that system started growing out of proportion as we struggled to even grasp its basics. <insert picture of profile.pp and a sound of shrieking horror> We set out to change this with two goals in mind:

  • Adapt our well-understood and well-documented conventions
  • Drive new changes — idempotently — from data, not code.

At the time we started learning and adopting it, Puppet had everything it took to make these things happen, but the patterns, it seemed, weren't very well understood. Or perhaps we were lacking the language to best convey them… https://groups.google.com/d/msg/puppet-users/BwQHLreadJ0/XLVT-ksID7wJ …and the experience to understand the answers we were given. In the following sections we will dissect exactly how our 'hosting' module works — from the conceptual architecture to the implementation details.

Data is in Hiera

From the very beginning it felt wrong to put all this code into a "profile". Within a week it had grown into a multi-hundred-line monstrosity, unwieldy and repetitive. It wasn't until we heard of Hiera that it occurred to us that this code was not code at all, but actually data. Even then Hiera seemed a strange concept: you put class variables into Hiera — and that's it:

     class ssh (
       $groups = hiera('ssh::groups', []),
     ) {
       $ensure = $groups ? {
         []      => 'absent',
         default => 'present',
       }
       sshd_config { 'AllowGroups':
         ensure => $ensure,
         value  => $groups,
       }
     }
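
The corresponding data then lives in Hiera. A minimal sketch of what that looks like with a YAML backend (the group names are hypothetical):

    ---
    ssh::groups:
        - admins
        - deploy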

Reference: https://projects.puppetlabs.com/issues/22274 & https://groups.google.com/forum/#!topic/puppet-users/NC5pazolhjQ We wanted more! We wanted all the boring repetition gone, encapsulated in code that implemented our conventions as we've had them for years now. We wanted the declaration of a new customer or a new application to be simple. We got there by starting out with the full configuration data and then iteratively reducing it to a point where it made sense both to us and to the machine:

    ---
    instance:
        port:      8005
        shortname: excom
        defines:
            - LowPerformance
    vhosts:
        www.example.com:
            scp_uid:      21005
            scp_password: xxx
            type:         static
            httpd_conf:
                - DirectoryIndex homeD.html
        beta.example.com:
            scp_uid:      22005
            scp_password: xxx
            db_password:  yyy
            type:         php
        alpha.example.com:
            scp_uid:      23005
            scp_password: zzzz
            db_password:  yyy
            type:         passenger

We also put our conventions under new scrutiny. They held up nicely, as we managed to arrive exactly where we had wanted to at the start. I want to reiterate this point, because I think it is very important: even though we had well-established conventions that were well documented², we started out with the full dataset to make sure we would not miss anything. In particular we did not want to miss any opportunities to extend our flexibility or to simplify.

Architecture

Our architecture follows a classical three-tier pattern:

[Figure: Three-tier architecture]

  1. The first tier is a proxy/cache/SSL terminator/WAF, provided by Apache Traffic Server. In order to provide a coherent service, this tier will want to know which URLs to map to which application servers.

  2. The second tier is the application servers, provided by one Apache httpd instance per customer. In order to provide a coherent service, this tier will want to know which applications, and by extension which URLs, to host and how.

  3. The third tier is the database servers, provided by MySQL or PostgreSQL. In order to provide a coherent service, this tier will want to know which databases to host for which users, but also which application servers to allow access to them.

It might not be immediately apparent, but the second tier holds a lot of this information by the sole fact that it knows who it is (::fqdn) and what it does. Puppet has no concept that crosses or spans nodes, because everyone would disagree on what such a concept should be (profiles, roles, "services", etc…). With such a restriction at hand we made the conscious decision to drive all changes from the web node.

[Figure: The web node feeds data off Hiera and exports it to the LB and DB nodes]

This decision left us with interesting constraints & flexibilities, caused by the way that Puppet and Hiera interact: we can only run one such service as we envision on the web node, but we can run multiple of them on the DB/proxy nodes.

[Figure: Web node and application node feed data off Hiera and export it to the LB and DB nodes]

As it happens, that is exactly what we wanted.
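
In Puppet terms, driving everything from the web node means exporting resources there and collecting them on the other tiers. A minimal sketch of that flow (the node names are hypothetical; the hosting::db::db wrapper is shown later in this post):

     # On the web node: export a description of the database we need.
     node 'web1.example.net' {
       @@hosting::db::db { 'excomwww':
         password => 'yyy',
         host     => $::fqdn,
       }
     }

     # On the database node: collect and realize everything exported for it.
     node 'db1.example.net' {
       Hosting::Db::Db <<| |>>
     }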

Implementation - Overview

     class hosting::web (
       $customer = hiera('customer', [])
     ) {

       class { 'httpd':
         php => true,
       }
       include 'hosting::web::ssh'
       hosting::web::instance { $customer:
         tag => $tag,
       }

       realize( Account::Systemgroup['sftponly'] )

       $webroot = '/srv/web'

       file { $webroot:
         ensure => 'directory',
         mode   => '0755',
         owner  => 'root',
         group  => 'root',
       }

       Account::Systemgroup['sftponly'] ->
       File[$webroot] ->
       Class['hosting::web::ssh'] ~> Service['ssh']
     }

We made the conscious decision not to override hiera() with a call to hiera_array() here. This leaves open the possibility to override $customer on a specific node. If, for instance, one customer is paying for that whole node, they should have it all to themselves!
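
Such a node-specific override only needs an entry higher up in the Hiera hierarchy. A minimal sketch, assuming a Hiera 1.x YAML backend and hypothetical file and customer names:

    # hiera.yaml
    ---
    :backends:
      - yaml
    :hierarchy:
      - "nodes/%{::fqdn}"
      - common

    # nodes/web5.example.net.yaml: this node is dedicated to a single customer
    ---
    customer:
      - bigcustomer

Because "nodes/%{::fqdn}" is consulted before common, a plain hiera() lookup returns only the node-specific list, whereas hiera_array() would have merged both levels together.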

Implementation - Intermezzo: Httpd

Since the beginning of time we have packaged our own version of Apache httpd, because the version provided by Debian/Ubuntu was too restrictive in a number of ways and did not lend itself to easily establishing a multiuser/multiprocess pattern. Over time we have crafted a specialized configuration that has served us well. When implementing Puppet, the first obvious problem was that the puppetlabs/apache module did not support Apache httpd version 2.4, much less »our« way of doing things. So we built our own module: brainsware/httpd

Implementation - create_resources()

While the httpd module we hacked is primarily targeted at our configuration, it does not map specifically to our conventions. The Puppet code provides a translation layer between simplified data, lived convention, and module APIs:

     # this type will create a hosting instance
     # filling it with sensible defaults where missing
     define hosting::web::instance {

       include hosting::web
       $webroot = $hosting::web::webroot

       # look up this customer's hosting instance in hiera:
       $customer = $title
       $instance = hiera('instance', {})

       # uid == port. port *must* be supplied.
       unless has_key($instance, 'port') {
         fail('hosting::web::instance requires a port!')
       }
       $uid = $instance['port']

       $type = has_key($instance, 'type') ? {
         true  => $instance['type'],
         false => 'php',
       }
       # php => true or false, depending on $type
       $php = $type ? {
         /[pP][hH][pP]/ => true,
         default        => false,
       }

       # Puppet variables are immutable, so we merge() in the php flag and
       # make sure that the webdir in this httpd installation is the same
       # as our $webroot:
       $full_instance = merge($instance, {
         'php'    => $php,
         'webdir' => $webroot,
       })

       # and construct the httpd instance without the type.
       # n.b.: Protocol defaults to HTTP, as 99% of our customers use just that.
       # When the need arises to support HTTPS, we'll support it here as well.
       $httpd_instance = {
         "${customer}" => delete($full_instance, 'type'),
       }

       file { "${webroot}/${customer}":
         ensure => 'directory',
         mode   => '0755',
         owner  => 'root',
         group  => 'root',
       }

       # create hosting user for the httpd instance.
       # The directory above is needed before we can create vhosts,
       # so better create it now.
       account::hostinguser { $customer:
         uid     => $uid,
         require => File["${webroot}/${customer}"],
       }
       create_resources('httpd::instance', $httpd_instance)

       # Fill the defaults with all we know so far,
       # and leave the rest to hosting::web::vhosts
       $vhost_defaults = {
         'type'      => $type,
         'instance'  => $customer,
         'shortname' => $instance['shortname'],
         'port'      => $uid,
         'tag'       => $tag,
       }
       hosting::web::vhosts { $customer:
         defaults => $vhost_defaults,
       }
     }

We finally call create_resources() for the httpd instance, and then yield to the next level: the virtual hosts.

     # This defined type allows us to create all the vhosts
     define hosting::web::vhosts (
       $customer = $title,
       $defaults = {},
     ) {
       # lookup the vhosts:
       # in this context, we already have $customer and $service set correctly
       $vhosts = hiera('vhosts', {})

       # We use create_resources() here because, conveniently, it does the merging for us:
       create_resources('hosting::web::vhost::create', $vhosts, $defaults)
     }

Here we see again the explicit lookup from hiera(), before calling create_resources() once more to finish the job; and the deeper reason why we're using create_resources() becomes more obvious: it is very easy to pass a set of defaults — i.e. conventions — along with the data that might override them.
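
To make the merging concrete, here is a hypothetical, stripped-down example: the data only supplies what deviates, and create_resources() fills in everything else from the defaults hash:

     $vhosts = {
       'www.example.com' => { 'scp_uid' => 21005, 'scp_password' => 'xxx' },
     }
     $defaults = {
       'type'      => 'php',
       'instance'  => 'example.com',
       'shortname' => 'excom',
       'port'      => 8005,
     }
     create_resources('hosting::web::vhost::create', $vhosts, $defaults)
     # ...is equivalent to declaring:
     # hosting::web::vhost::create { 'www.example.com':
     #   scp_uid      => 21005,
     #   scp_password => 'xxx',
     #   type         => 'php',
     #   instance     => 'example.com',
     #   shortname    => 'excom',
     #   port         => 8005,
     # }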

Implementation - Exporting Data

Now we are at the very core — the little heart that contains all the data and can finally do our job:

     # "private" type, that we only call via hosting::web::vhosts
     # to create a single vhost
     define hosting::web::vhost::create (
         $instance,
         $shortname,
         $scp_uid,
         $scp_password,
         $port,
         $db_password = undef,
         $httpd_conf  = [],
         $servername  = $title,
         $type        = 'php',
     ) {

       include hosting::web

       $webroot = $hosting::web::webroot

We transform the data:

       # shortname: excom, instance: example.com
       # With servername: foo.bar.example.com, user should now be: excomfoobar
       $subdomain = regsubst($servername, "(.+)\\.${instance}", '\1')
       $nodotsubd = delete($subdomain, '.')
       $user      = "${shortname}${nodotsubd}"

To create a user account and its home directory:

       # Allow $user to write here.
       file { "${webroot}/${instance}/${subdomain}/htdocs":
         ensure  => ''directory'',
         owner   => $user,
         group   => $instance,
         mode    => ''0755'',
         require => Account::Scpuser[$user],
       }

       account::scpuser { $user:
         uid       => $scp_uid,
         gid       => $instance,
         password  => $scp_password,
         subdomain => $subdomain,
       }

as well as the Virtual Host itself:

       httpd::vhost { $servername:
         instance   => $instance,
         type       => $type,
         httpd_conf => $httpd_conf,
         manage_dir => 'phpdirs',
       }

Most importantly, however, we use our knowledge to export the data, so the load balancer and the database can create those resources.

       # export this new knowledge about the vhost to our load balancer:
       $map = {
         "http://${servername}" => "http://${::fqdn}:${port}",
       }
       @@hosting::lb::remap { $servername:
         map => $map,
         tag => $tag,
       }

       # export MySQL database, if needed:
       if $db_password {
         # our convention is to call the database the same as the user.
         @@hosting::db::db { $user:
           password => $db_password,
           host     => $::fqdn,
           tag      => $tag
         }
       }

     }

Implementation - Intermezzo: Apache Traffic Server

Apache Traffic Server has become one of the cornerstones of our infrastructure, replacing the rather shaky cache that is mod_cache_disk. We have also written a Puppet module for managing Apache Traffic Server.

Implementation - Collecting Resources

Because we wanted to make the collection of resources as simple as possible, we wrapped the base types in a secondary layer. This makes the distinction easier and frees up tags to be used for more sensible things. More importantly, it also enables us to once again transparently implement conventions:

     # This type is a wrapper for MySQL::Db
     #
     # it contains some of our treasured defaults, making it all a bit more convenient
     # but most importantly, by being a wrapper with a specific name, we can use
     # it to collect only those specific resources.
     define hosting::db::db (
       $password,
       $host  = $::fqdn,
     ) {
       $db   = $title
       $user = $title
       mysql::db { $db:
         user     => $user,
         password => $password,
         host     => $host,
         grant    => ['all'],
       }
     }

All that is left is to call the collector class on the right nodes now.

     # Collect & realize dbs
     # n.b.: We use a wrapper here, so it's easier to distinguish
     # between this export and other exports!
     class hosting::db::collect {
       Hosting::Db::Db <<| tag == $tag |>>
     }
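
The load balancer side mirrors this pattern. A sketch, assuming an analogous collector class for the hosting::lb::remap resources we exported earlier:

     # Collect & realize the remap rules exported by the web nodes
     class hosting::lb::collect {
       Hosting::Lb::Remap <<| tag == $tag |>>
     }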

In Closing

I am adding this section because the last sentence was a bit anticlimactic for such a big subject. Of course this is not the end. It is merely a beginning. We are still only just discovering Puppet, its strengths and weaknesses, and lately how to extend it. I would also like to take this chance to give a big shout-out to all the extremely friendly and helpful people of #puppet (and other Puppet fora). Without their guidance and answers this journey would have been a lot more painful. Thank you.

Finally, I want to point to two projects which have made development really easy, because we did not need to destroy production servers: Vagrant & Packer



  1. These days we'd probably look into Docker, even though it still seems very immature.

  2. They existed as tools written in Perl and GNU make. Yes. But they existed and worked well.
