Director of Avenga Labs
Businesses in the 2020s are quickly and inevitably becoming software companies, so the boundary between business and technology is more blurred than ever before.
In times of dynamic business changes, the need for greater flexibility and responsiveness of software delivery is growing.
The responses to that demand are the latest shifts in the software landscape:
Application developers have embraced continuous delivery (CD) in their software development lifecycles. Software changes are deployed frequently to test and production environments in small chunks, with a lower risk of destabilizing the entire digital product. CD also enables testing new features with the existing user base as testers, all to provide a better user experience for them.
The complex software pipelines are automated, from writing the line of source code (which is also semi-automated already) to building, testing (including performance testing), deployment and monitoring.
The data scientists and machine learning (ML) experts work differently than software developers.
Why is that?
First of all, machine learning (ML) models are not deterministic, in stark contrast to software code.
Of course, it’s easy to create a buggy mess in a modern distributed system. But the non-deterministic nature of ML models is something you cannot fix: it’s not a bug, it’s in their DNA, and this is how current AI technology works.
Data scientists iterate over different hypotheses. They select a different hypothesis, try it, test it, and tune it until the results are acceptable.
That’s why they work using Jupyter Notebooks and are able to modify code (Python, R, Julia) quickly, as well as analyze the results interactively.
Constant experimenting and prototyping does not make continuous delivery for machine learning (CD4ML) easy: this way of working stays close to the data and the problem, but it is hard to integrate into a build pipeline.
‘Running ML’ is also very different, as there are more phases. In simplified terms, there’s a data preparation phase, a model definition phase, a model training phase, a model testing and tuning phase, and model execution in the production environment.
Model training and tuning is still a black box, so experience and intuition play a great role.
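The phases listed above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the one-parameter threshold ‘model’, the function names, and the sample records are all hypothetical, and a real project would replace each stage with its own tooling.

```python
def prepare(raw):
    # Data preparation: drop records with a missing feature or label.
    return [(x, y) for x, y in raw if x is not None and y is not None]


def define_model():
    # Model definition: a hypothetical one-parameter threshold classifier.
    return {"threshold": None}


def train(model, rows):
    # Training: place the threshold midway between the two class means.
    positives = [x for x, y in rows if y == 1]
    negatives = [x for x, y in rows if y == 0]
    pos_mean = sum(positives) / len(positives)
    neg_mean = sum(negatives) / len(negatives)
    model["threshold"] = (pos_mean + neg_mean) / 2
    return model


def predict(model, x):
    # Execution: this is what would run in the production environment.
    return 1 if x > model["threshold"] else 0


def evaluate(model, rows):
    # Testing: fraction of records classified correctly.
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Made-up records: (feature value, label)
raw = [(1.0, 0), (2.0, 0), (None, 1), (8.0, 1), (9.0, 1)]
rows = prepare(raw)
model = train(define_model(), rows)
score = evaluate(model, rows)
```

Each stage has a different input and output, which is part of why a single build step cannot cover the whole lifecycle.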
In the case of classical software development (which never grows old, by the way), testing is comparing the results generated by the system with the expected results defined by testers/analysts.
For instance, if someone withdraws 500 EUR from their bank account then it is expected that the amount of money in the account would be decreased by the same amount.
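A deterministic check like this one can be written as a short test. The sketch below assumes a hypothetical in-memory `Account` class; the class, its method, and the starting balance are invented for illustration only.

```python
from decimal import Decimal


class Account:
    """Hypothetical in-memory account, used only for illustration."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance

    def withdraw(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


def test_withdraw_decreases_balance() -> bool:
    account = Account(Decimal("1200.00"))
    account.withdraw(Decimal("500.00"))
    # Deterministic: the same input always gives exactly this result.
    return account.balance == Decimal("700.00")
```

The expected value is known before the test runs, so the outcome is a plain pass or fail.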
In the case of machine learning (ML) it’s not so easy. Of course you can check the accuracy of an image recognition model on large datasets, but you may still be surprised by a sudden drop in quality when using different data. Unfortunately, this data is quite often … the real-world data.
This has happened many times, even for the top AI gurus. It’s not something that you can read up on in the shallow articles about ‘how AI is changing the world for the better’, but the reality of these systems is much sadder and they require many iterations to make them work with acceptable accuracy.
There are techniques to improve accuracy, for instance feedback loops, reinforcement learning, transfer learning (so as not to start from scratch), and retraining.
How do we test conversational AI and how do we test chatbots? It is possible, but much more complex.
How do we define ‘PASS’ conditions for models? It’s possible, but never as accurate as for the deterministic transactional business applications. Humans are still involved, so there’s rarely a source of ‘truth’, and ‘truth’ may even never be discovered.
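One common way to approximate a ‘PASS’ condition is a statistical gate instead of an exact expected value. The sketch below is an assumption, not an established standard: the function names and the thresholds (`min_accuracy`, `max_regression`) are hypothetical and would have to be agreed on per project.

```python
def accuracy(predictions, labels):
    # Fraction of predictions that match the labels.
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def passes_gate(predictions, labels, min_accuracy=0.9,
                baseline_accuracy=None, max_regression=0.02):
    # PASS only if the model clears an absolute accuracy floor and,
    # when a baseline is given, does not regress too far below it.
    acc = accuracy(predictions, labels)
    if acc < min_accuracy:
        return False
    if baseline_accuracy is not None and acc < baseline_accuracy - max_regression:
        return False
    return True
```

Such a gate says nothing about data the model has not seen, and the labels themselves come from humans, which is exactly the limitation described above.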
Similar problems also apply to performance testing. Model behavior and efficiency depend heavily on the target architecture and data.
Software can be decomposed into modules, smaller microservices and even smaller functions as a service (FaaS).
Machine learning (ML) models, by contrast, would have to be called ‘monoliths’. By their nature, neural networks are … connected together and cannot be cut along vertical or horizontal dimensions like the components of Java/.NET transactional applications.
Deterministic software systems already have a large body of established patterns and antipatterns, which are a great help in design and development. Modules, objects, records, functions, interfaces, and APIs have all matured over more than 50 years of software development.
In the case of data projects, and especially Machine Learning (ML) projects, it all depends on the data, as even a single example may break the entire model and the entire hypothesis goes into the trash. Data scientists sometimes wish they had established practices for validation of the models, as they exist in software development. Even though it’s heavily based on mathematical theory, there are no mathematical proofs of how it is working and even why it is working.
What does the model version mean? What does the data version mean?
What about branches, commits, etc.?
Which repositories should be used, and how? It’s not just git, as is usually the case with ‘normal code’.
For classic applications, even the most modern ones, multiple deployment strategies are already well known, and the practices are backed by the knowledge and experience of software teams and by the right tool sets.
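One possible answer to the versioning questions above is to treat a model version as a fingerprint of everything that produced it. The sketch below is an assumption, not an established convention: the function name, the manifest fields, and the 12-character truncation are all hypothetical.

```python
import hashlib
import json


def model_fingerprint(data_bytes: bytes, hyperparameters: dict, code_revision: str) -> str:
    # Combine the three inputs that determine a trained model:
    # the training data, the hyperparameters, and the code revision.
    manifest = {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "hyperparameters": hyperparameters,
        "code_revision": code_revision,
    }
    # sort_keys makes the serialization independent of dict ordering.
    encoded = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]
```

With this scheme, a change in the data alone yields a new version even when no line of code changed, which a code repository by itself would not record.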
DevOps for ML is in the early stages of development and there’s much still left to be figured out.
It’s not that data scientists don’t like automation for some hidden reason.
Many software developers tend to think: ‘You data scientists are supposed to be so smart, so why don’t you just use a Jenkins pipeline for your ML? It would be so cool and efficient.’
Again, in ML it’s much harder to do.
There are visible places where automation is used more often. For example, training many variants of the model automatically to verify which feature sets to choose and how to tune them the most efficiently. Launching these workloads on tens or hundreds of machines is a promising candidate for automation.
Training and executing models on different data sets simultaneously and comparing the results automatically is also a good automation idea.
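Automated comparison of model variants can be sketched with the standard library alone. Everything below is illustrative: the toy dataset is made up, and the ‘model’ is a hypothetical single-feature threshold rule. Threads stand in for the tens or hundreds of machines mentioned above; a real workload would dispatch to separate processes or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Made-up toy dataset: ((feature_0, feature_1), label)
DATA = [
    ((0.1, 5.0), 0),
    ((0.4, 1.0), 0),
    ((0.35, 4.0), 0),
    ((0.6, 2.0), 1),
    ((0.8, 3.0), 1),
    ((0.9, 0.5), 1),
]


def evaluate_variant(variant):
    # A variant is (feature index, threshold); score it by accuracy.
    feature_index, threshold = variant
    correct = sum((x[feature_index] > threshold) == bool(y) for x, y in DATA)
    return variant, correct / len(DATA)


def search(feature_indices, thresholds, max_workers=4):
    # Evaluate every combination concurrently and return the best one.
    variants = list(product(feature_indices, thresholds))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate_variant, variants))
    return max(results, key=lambda result: result[1])


best_variant, best_score = search([0, 1], [0.25, 0.5, 2.5])
```

The search is repetitive and has no human in the loop, which is what makes it a good automation candidate.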
Data engineering tasks (ETL, ELT, data streaming, etc.) which are part of most machine learning (ML) projects are also easier to automate.
Data applications should focus on different metrics than traditional transactional systems do.
For instance, for a data project the data accuracy, integrity and statistical distribution are more important than latency, uptime, or time to first byte.
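Data-oriented metrics of this kind can be checked automatically. The sketch below assumes two simple, hypothetical checks: the share of missing values as an integrity signal, and the shift of the mean (in units of the reference standard deviation) as a rough distribution signal. The thresholds are placeholders, not recommendations.

```python
import statistics


def missing_ratio(values):
    # Integrity: share of records with no value.
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def mean_shift(reference, current):
    # Distribution: how far the current mean moved, measured in
    # standard deviations of the reference sample.
    ref_std = statistics.stdev(reference)
    if ref_std == 0:
        raise ValueError("reference sample has zero variance")
    return abs(statistics.mean(current) - statistics.mean(reference)) / ref_std


def data_checks(reference, current, max_missing=0.05, max_shift=1.0):
    present = [v for v in current if v is not None]
    return {
        "missing_ok": missing_ratio(current) <= max_missing,
        "distribution_ok": mean_shift(reference, present) <= max_shift,
    }
```

A production setup would likely use a proper statistical test rather than a mean comparison, but the structure of the check stays the same.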
It also affects operations and the testing part of DevOps for ML.
Most DevOps experts, though not all, focus on pipelines for Java and similar mainstream programming languages, and on environments such as Kubernetes, hybrid clouds, AWS, Azure, and GCP.
Training them to understand the nature of ML projects, and to reach the same level of proficiency they have with business applications, will take time.
If the models are designed, trained and tested for too long, say for a year, changes in the business landscape may make them obsolete before they even reach production.
What is also slowing down these types of projects is the problem of data quality at the source, as well as other parts of the data processing pipeline. There are no simple or fast fixes to data quality problems. There’s no magic technological wand and a fix is not expected to arrive anytime soon.
If it takes so long and business changes so fast, then even the best data scientists aren’t able to catch up with reality. This means the failure of the entire ML project.
So definitely, there’s pressure for ML projects to deliver results faster and of better quality.
There are many areas which are easier to automate and are investments with quick returns. For example, testing models on multiple nodes simultaneously, training models on multiple processing nodes simultaneously, or running workloads in the cloud.
We should never forget what the main philosophy behind automation is: it helps to reduce costs and minimize time spent on repetitive tasks. If something is not repetitive, then automating it does not make any sense and is another nail in the coffin of project efficiency.
Data scientists should not be bothered with DevOps stuff, as it can have a negative effect on ML projects. A full cycle development strategy is really helpful here because it enables tighter collaboration within the teams and is one of the key enablers for CD4ML.
CD4ML seems to be the ultimate goal for automation of machine learning (ML) projects.
I dare say that automation has to be applied more carefully in this case, in order to make sure that what we automate is truly repetitive and that it will help improve efficiency.
Putting models in Kubernetes pods is not CD4ML, but it’s doable and relatively easy. And as always, many will do it just to check the box “CD4ML – yes, we do have it in place”.
There’s so much more to it. We can all wish it were easier, but it takes real skill and a lot of effort to make it happen and to truly improve efficiency. And let’s not forget that CD4ML is still in its infancy.
Should you set CD4ML as one of the key assumptions for your data project?
How do you define and execute CD4ML in your particular case?
Our data science and DevOps experts at Avenga are here to help.