  • Courtney Perigo

Why Continuous Integration is Essential for Machine Learning


Photo by Christopher Burns on Unsplash

Continuous integration has many desirable properties that “finesse” common problems of software development. In data engineering, we are often on a tight deadline, and our users and data scientists expect certain features. By building continuous integration into every aspect of our project, we reduce the risk involved in getting our application into production.


Without continuous integration, we can build our entire application on our local machine and never know how it behaves in a completely new environment. Continuous integration removes that blind spot and lets us see what happens when we deploy.


This matters because we are typically on a deadline. You do NOT want to push to production in the final weeks of your project and only then discover several bugs or failures in your production pipeline. Continuous integration takes that risk out of the equation, because you push new features of your application into development and production environments along the way.




Reduced Bugs:


Another desirable feature of continuous integration is that it detects bugs as they occur. CI will not eliminate all bugs, but it will let you know when they occur. Bugs can ruin a project and hold back features. It’s best to test the solution, identify bugs as part of a continuous integration pipeline, and fix them immediately before moving on. That keeps the code clean and keeps you on schedule.
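
As a rough illustration of the kind of check a CI pipeline could run on every push, here is a minimal sketch in Python. The function `clean_records` and its record format are hypothetical and stand in for any step of a data pipeline; the test uses a plain `assert`, so it can be collected by a test runner such as pytest or run directly.

```python
def clean_records(records):
    """Drop rows with a missing label and strip whitespace from the text field.

    Hypothetical pipeline step, used only to illustrate a CI check.
    """
    cleaned = []
    for row in records:
        if row.get("label") is None:
            continue  # unlabeled rows are not usable for training
        cleaned.append({"text": row["text"].strip(), "label": row["label"]})
    return cleaned


def test_clean_records_drops_unlabeled_rows():
    raw = [
        {"text": "  good example ", "label": 1},
        {"text": "no label here", "label": None},
    ]
    assert clean_records(raw) == [{"text": "good example", "label": 1}]


if __name__ == "__main__":
    test_clean_records_drops_unlabeled_rows()
    print("all checks passed")
```

If a later commit changes `clean_records` so that unlabeled rows slip through, the assertion fails and the build is flagged right away, rather than weeks later in production.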


Barriers to Frequent Development of your Application are Removed:


As data engineers working with continuous integration and an agile method, we have no excuse for not automating the deployment of ALL aspects of our solution. It’s tempting to stand up the database in GCP by hand and never build that step into the CI pipeline. That’s not the way. We need to build the deployment of the database into our CI pipeline. Doing so helps us keep control of those aspects of our project and makes the handoff to a client easier for both parties.
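
One way to picture database deployment as a pipeline step is a script that creates the schema and is safe to run on every build. The sketch below is only an assumption about how this might look: it uses Python's built-in `sqlite3` as a local stand-in for a cloud database, and the table names are hypothetical.

```python
import sqlite3

# Hypothetical schema; a real project would keep these statements in
# version control next to the application code.
SCHEMA_STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS features ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE IF NOT EXISTS predictions ("
    "id INTEGER PRIMARY KEY, "
    "feature_id INTEGER NOT NULL REFERENCES features(id), "
    "score REAL NOT NULL)",
]


def deploy_schema(conn):
    """Apply every schema statement inside one transaction.

    IF NOT EXISTS makes the step idempotent, so the pipeline can run it
    on each build without failing on an already-deployed database.
    """
    with conn:  # commits on success, rolls back on error
        for statement in SCHEMA_STATEMENTS:
            conn.execute(statement)


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    deploy_schema(connection)
    deploy_schema(connection)  # a second run is a no-op
    print(list_tables(connection))
```

Because the schema lives in the repository and is applied by the pipeline, a client who receives the project gets the database definition along with everything else.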

In addition to easier delivery at the end of a project, every member of a data science / data engineering / developer team can contribute and push to the main branch of our application. Every day, we can strive to release some new feature, test, or contribution to the main branch and, ultimately, to our project. Continuous integration ensures that any feature released to the main branch is integrated into the project immediately.


Anyone can deploy the application.


Finally, continuous integration helps us with the ultimate delivery of our project to a client. Ideally, the client has paid for a functional project that they can maintain and keep running to deliver value to the business. In our case, that value is realized through the inference methods our machine learning pipeline delivers to our client. By building ALL parts of our project into the continuous integration pipeline, we can hand off to a client and feel confident they can deploy the application on their own.



Why is a CI system an essential part of SaaS software?

Because it solves the issues I raised above, and more, CI is an essential part of SaaS software. Software as a service means we potentially have multiple users, and we want those users to get the best possible software. A value proposition of SaaS is that the client has access to the best features our team has released. Continuous integration and testing help us make sure everyone gets those features in a timely manner and with as few bugs as possible.

To help us deliver on CI, the following are common practices that you should consider:

  • Build automation: Every part of the project is automated and deployed with continuous integration.

  • Single source repo: All contributors to the project commit to a single repo, and their commits are built and tested by a continuous integration service/server.

  • Fix broken builds immediately: Bugs pushed to the repo should be addressed immediately throughout the life of the project.

  • Automate deployment: Strive for a single-button deployment of your application, covering all aspects of it.
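
The practices above can be sketched as a single entry point that runs every step in order and stops at the first failure. This is a minimal illustration, not a replacement for a CI service: the step names and commands are hypothetical placeholders, and a real project would substitute its own test, build, and deploy commands.

```python
import subprocess
import sys

# Hypothetical steps. Each command here just prints a message so the
# sketch runs anywhere Python is installed.
STEPS = [
    ("run tests", [sys.executable, "-c", "print('tests ok')"]),
    ("build package", [sys.executable, "-c", "print('build ok')"]),
    ("deploy", [sys.executable, "-c", "print('deploy ok')"]),
]


def run_pipeline(steps):
    """Run each step in order and stop at the first non-zero exit code.

    Returns (succeeded, names_of_completed_steps).
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step failed: {name}")
            return False, completed
        completed.append(name)
    return True, completed
```

Calling `run_pipeline(STEPS)` is the "single button": anyone on the team, or the client, can trigger the same sequence, and a failing step halts the run so the broken build gets fixed before anything is deployed.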

Source:

Fowler, M. (n.d.). Continuous integration. martinfowler.com. Retrieved June 30, 2022, from https://martinfowler.com/articles/continuousIntegration.html
