
Scaling Jenkins for the Enterprise

Published by Michael Roberts
on October 7, 2022

I’m Michael Roberts, Vice President for SPK, and I want to share the tricks for scaling your business effectively with a Jenkins enterprise solution. I’ll also explain how to avoid and remediate both a Jenkins monolith and islands of Jenkins.

In the early part of my career, I worked at a startup web hosting company. This company was not funded by venture capital. It was bootstrapped by an enterprising CEO who was just leaving the military. His approach to company growth was founded in his love of open source software. He used it wherever possible to provide services to the company’s web hosting and domain registration customers. Through my 19 years at the company, I learned two key truths about open source software:

      1. It can fill a technology gap and provide lots of value.  
      2. It can paint you into a corner with not much support or ability to scale.  

And that, too, is the story of Jenkins. Scalability for Jenkins enterprise solutions can be hampered if the deployment is not structured and organized effectively.


Is There A Jenkins Enterprise Solution?

Jenkins is arguably one of the most popular open source development tools on the planet. Some reports show that over 70% of all continuous integration pipelines run on Jenkins. For any organization looking to adopt CI/CD, Jenkins is an easy win. It works well for small teams and allows them to integrate code multiple times a day. However, as teams, projects, and software scale, maintenance costs can rise and performance can suffer.

These performance and cost issues manifest themselves in multiple ways. Difficulty scaling Jenkins is not uncommon, whether you are a small business or running a Jenkins enterprise solution.

As Jenkins is adopted across the company, we see two anti-patterns emerge. Either:

      1. Jenkins becomes so large and slow that it is no longer usable.
      2. Each development team wants its own customized Jenkins instance. This allows the team to manage and control that instance and ensure maximum performance.

Each scenario has its pros and cons, so let’s dive into each.

The Jenkins Monolith

The term Jenkins Monolith comes from all teams using one Jenkins controller. This controller has every possible plugin installed. And it integrates with dozens, or hundreds, of developers’ code.  

CloudBees is the Jenkins enterprise company. According to them, any controller that houses over 5,000 jobs has experienced, or will likely experience, performance and stability issues. Whilst this may seem like a low number, it does depend on the integration work being done. For example, 5,000 “Hello World” jobs will perform completely differently from 5,000 complex pipelines with multiple stages, each making different types of calls to binaries on the host.


CloudBees has been collecting data on customer instances and performance for years. Coupled with their field experience and the data gathered by their performance team, this lets CloudBees show that 5,000 is the right number to be concerned about: customers who experience performance and stability issues are likely running controllers with more than 5,000 jobs. Past that point, you are heading toward a Jenkins Monolith.

Now we know what you’re thinking. You’re going to ask “How many jobs can I run on my controller?”  

The answer is…it depends. But that 5,000 number should remain a key factor in your mind.  

Complicating the answer to that question is the pipeline complexity variable. Employing a monitoring solution to track macro metrics (CPU, RAM, disk I/O) and creating baselines allows you to properly plan for scaling. There are also micro metrics, such as garbage collection logs, object creation rate, and thread counts, that can be analyzed and baselined. This enables you to understand your job needs correctly. Additionally, it allows you to correctly plan demand for additional controllers in your cluster.
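To make the idea of a baseline concrete, here is a minimal Python sketch. It assumes you already export metric samples from whatever monitoring tool you use. The sample values, the function names, and the three-standard-deviation threshold are illustrative assumptions, not CloudBees guidance.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, standard deviation) for a list of metric samples."""
    return mean(samples), stdev(samples)


def exceeds_baseline(value, baseline, k=3.0):
    """True when value is more than k standard deviations above the mean."""
    average, spread = baseline
    return value > average + k * spread


# Hypothetical CPU utilisation samples (percent) from a normal week.
cpu_history = [40, 42, 38, 41, 39, 40, 43, 37]
cpu_baseline = build_baseline(cpu_history)

# A new reading is compared against the baseline, not against a fixed number.
print(exceeds_baseline(45, cpu_baseline))
print(exceeds_baseline(85, cpu_baseline))
```

The same two functions can be reused for RAM, disk I/O, or thread counts. The point is that each controller is judged against its own history, which is what lets you see growth coming before users feel it.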

Lastly, having a single, monolithic Jenkins controller creates a single point of failure, something most IT operations staff should have immediate concerns about. Why? Because if that one server stops working, many teams come to a standstill.

Signs You Are on the Road to Running a Monolith Controller

According to CloudBees, there are 3 signs you’re on the road to running a Jenkins Monolith. These 3 signs are from their white paper entitled “How to Solve the Monolithic Jenkins Controller Problem”.

  1. You Experience Longer Garbage Collection Cycles

The Jenkins enterprise scaling solution has traditionally been to increase resources. But this vertical scaling methodology comes with a variety of challenges. 

Firstly, the temporary relief doesn’t last. Ultimately, other areas of the application or surrounding architecture are unintentionally, yet negatively, affected. Furthermore, administrators will tend to increase heap memory beyond the CloudBees maximum recommendation of 16GB. In doing so, they are inviting longer-running garbage collection cycles.

Garbage collection cycles are inherently stop-the-world events. The aim is to limit the amount of time a garbage collection event takes. Ideally, this should be under one second so it doesn’t interrupt other operations such as HTTP requests.
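One way to check the one-second target is to scan a garbage collection log for long pauses. The Python sketch below assumes the JVM unified logging format (JDK 9 and later, enabled with -Xlog:gc), where pause lines end with a duration in milliseconds. The sample lines are illustrative rather than taken from a real controller, and the helper name is hypothetical. Other log formats would need a different pattern.

```python
import re

# Captures the trailing duration on a unified-logging pause line, for example:
# "[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 45.678ms"
PAUSE_RE = re.compile(r"Pause .*?(\d+(?:\.\d+)?)ms\s*$")


def long_pauses(log_lines, limit_ms=1000.0):
    """Return the pause durations (in ms) that exceed limit_ms."""
    pauses = []
    for line in log_lines:
        match = PAUSE_RE.search(line)
        if match is None:
            continue  # not a pause line
        duration_ms = float(match.group(1))
        if duration_ms > limit_ms:
            pauses.append(duration_ms)
    return pauses


# Illustrative log lines (assumed format, made-up values).
SAMPLE_LOG = [
    "[0.015s][info][gc] Using G1",
    "[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(2048M) 45.678ms",
    "[98.765s][info][gc] GC(8) Pause Full (G1 Compaction Pause) 15000M->9000M(16384M) 2310.500ms",
]

print(long_pauses(SAMPLE_LOG))
```

If pauses over the limit start to appear regularly, that is a signal to reduce the controller's footprint rather than to add more heap.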

The best practice solution:

  • Horizontal scaling and reducing the footprint of the controller is the only way to properly plan and scale your CI architecture safely. Additionally this Jenkins enterprise solution creates a holistic alignment with the CI and DevOps direction.

 

  2. You Don’t Have a Cleanup Strategy

Another indicator of a monolithic controller is a large $JENKINS_HOME footprint. 

Jenkins does not rely on an external database. Instead it houses a multitude of configuration files inside of the $JENKINS_HOME defined storage location. Furthermore, in situations where a large number of users and jobs are in play within a monolithic environment, we also see abandoned job volumes increase. 

These abandoned jobs are sometimes ancient, yet their files are still loaded into application memory. This creates unnecessary additional overhead.

The best practice solution:

  • Ensure you have a cleanup strategy.
  • Monitor the growth of your $JENKINS_HOME location.
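As a starting point for both bullets, the Python sketch below measures the size of a Jenkins home directory and lists jobs whose files have not changed in a while. It relies on the standard layout in which each job lives under $JENKINS_HOME/jobs/<job name>. The 180-day threshold and the function names are assumptions for illustration, and jobs nested inside folders are not handled here.

```python
import os
import time


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def stale_jobs(jenkins_home, max_age_days=180, now=None):
    """List top-level jobs whose newest file is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    jobs_dir = os.path.join(jenkins_home, "jobs")
    stale = []
    if not os.path.isdir(jobs_dir):
        return stale
    for entry in sorted(os.listdir(jobs_dir)):
        job_path = os.path.join(jobs_dir, entry)
        if not os.path.isdir(job_path):
            continue
        # lstat avoids errors on dangling symlinks inside build directories.
        newest = max(
            (os.lstat(os.path.join(d, f)).st_mtime
             for d, _subdirs, files in os.walk(job_path) for f in files),
            default=os.lstat(job_path).st_mtime,
        )
        if newest < cutoff:
            stale.append(entry)
    return stale
```

Recording directory_size_bytes on a schedule gives you the growth trend, and the output of stale_jobs is a review list for job owners, not something to delete automatically.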

 

  3. You Lack Infrastructure or Are Growing Too Fast

Frequently, we see Jenkins enterprise administrators struggle to acquire the infrastructure needed for growth in a timely manner, particularly in verticals such as financial services.

Understandably, there can be a lot of red tape involved. This can prevent them from spinning up additional virtual machines. Because of this, we often see monoliths form faster than usual. Then, Jenkins administrators are forced to maintain a delicate balancing act. They need to appease their customers and keep the controllers online. 

The best practice solution:

  • Understand and plan for growth by segmenting controllers by team or business unit.
  • Set limits to the number of permitted jobs on each controller, such as the 5,000 job limit.
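The arithmetic behind these two bullets is simple enough to sketch in Python. The 5,000-job ceiling comes from the CloudBees figure quoted above. The 80% warning ratio, the controller names, and the job counts are made up for illustration.

```python
import math

JOB_LIMIT = 5000  # per-controller ceiling discussed in this article


def controllers_needed(total_jobs, limit=JOB_LIMIT):
    """Minimum number of controllers so that none exceeds the job limit."""
    return max(1, math.ceil(total_jobs / limit))


def controllers_at_risk(job_counts, limit=JOB_LIMIT, warn_ratio=0.8):
    """Map controller name to 'over' or 'warning' when near or past the limit."""
    status = {}
    for name, count in job_counts.items():
        if count > limit:
            status[name] = "over"
        elif count >= warn_ratio * limit:
            status[name] = "warning"
    return status


# Hypothetical controllers segmented by business unit.
job_counts = {"payments": 5200, "mobile": 4100, "web": 900}

print(controllers_needed(sum(job_counts.values())))
print(controllers_at_risk(job_counts))
```

A warning band below the hard limit matters because of the red tape described above: it gives you time to request infrastructure before a controller actually crosses the line.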

Islands of Jenkins

You’ve cultivated a great culture and your development teams are self-organizing and have the agile mindset. Great!! Because of this, each team wants to have their own customized Jenkins controller. This allows them to:

a.) Reduce the number of plugins needed.

b.) Keep the performance optimal for their use.  

So, you’re following best practices. Great! But then your IT manager asks how many Jenkins controllers you have. Not great. Now, you have to confess that you don’t know. Your IT manager starts to panic. “Well, how do we know these machines are getting patched?” they ask. Now, you’re thinking “Uh, oh”.

These are what are referred to as the “islands of Jenkins”. Engineers are left to maintain different Jenkins instances and configurations, wasting valuable time and resources. Additionally, it creates hidden costs for the company. Also, what about IT governance? Are security scans built into every segment of the CI pipeline?

Because there is no oversight and control, each controller could be doing something completely different. Because of this, collaboration with IT staff suffers. You notice the impact on efficiency, too. And, if a team discovers a new way of doing things that drives efficiency, they have no way of sharing this information. The teams are operating in silos.

Remediating Jenkins Monolith and Islands of Jenkins

With more than 70% market share, Jenkins is the number one CI tool used today. CloudBees employees contribute more than 80% of the code to the open source Jenkins project. That means nobody knows Jenkins like CloudBees does.

The answer to operating the Jenkins Monolith is CloudBees CI. With CloudBees CI, you’ll have some associated Jenkins master controllers (see the graphic below). Each master instance is independent, so they can have different software stacks, tools and plugins. That way they can meet the needs of each individual team. However, they are all managed by CloudBees CI. This gives us the governance and compliance we need, which is one of the main reasons why IT staff actually want a monolith.

In terms of scaling, if a new team needs access to Jenkins, a new master instance is created that is controlled by CloudBees CI.  Now, the team can get the independence they need to install the tools. Additionally, they get the technologies needed for their project. Finally, it maintains the governance and cost controls.

Another benefit of CloudBees CI is that the masters can operate separately or act as a group. The benefit of having them work together is that they can aggregate the workload. Also, CloudBees CI allows you to build templates for best practices. That way, best practices can be deployed to all masters. This means lessons learned are in production on all instances, not just one or two controllers.

Conclusion

Because Jenkins is so widely used by large and small organizations, it’s inevitable that there will be scaling problems. CloudBees has provided a solution to the two most common Jenkins scaling issues.

With CloudBees CI, you get unlimited licensing of controllers. This means you can readily scale your environment to meet your organization’s needs and job demands. Furthermore, as you are not restricted by controller count, organizations can reduce their configuration complexity. They can break their Jenkins Monolith into smaller, managed controllers. And, this allows for workload isolation. Now you have governance. You’ve created a scalable environment that can flex with any regulatory requirements. This secure environment also means teams can make changes without fear of downstream impacts.

Kubernetes is the widely accepted vehicle for CI. So, if you aren’t yet thinking about a path to CI with Kubernetes, you should be.  Check out this webinar to learn more about Kubernetes, CI, and Jenkins.

SPK is a certified CI and CD/RO CloudBees partner, and we can help you install, manage and maintain your CloudBees Jenkins enterprise architecture. Contact us here for support.
