Data science meets DevOps: 7 rules of machine learning

By: Yotam Yarden, senior data scientist at Amazon Web Services (AWS)

Machine Learning capabilities hold great potential for new revenue streams and tremendous cost savings for enterprises. Increasingly, businesses are using ML to strengthen their competitive advantage and drive innovation. Is your organization embracing this shift or are you falling behind? If you are on the “bias-for-action” side of the scale and have already started steering your organization towards digital & ML transformation, are you confident you are doing so in the right way?

Over the past decade, data has become increasingly important and has even been described as the “new oil”. Organizations with extensive user data can leverage it to increase sales and customer retention. Manufacturers can use machine data to improve equipment utilization. Computed tomography images can be used to identify cancerous tumors. There is hardly an industry segment that can’t leverage data to improve existing business models and create new ones. Meanwhile, data has never been easier and less expensive to collect, store, analyze, and share. Many enterprises are building their data lakes today precisely for this reason. But is your organization taking full advantage of its data? Are you satisfied with the value you generate from your data? Do you struggle with building smart applications on top of your data lake? Big Data but not enough insights? Too much talking, not enough walking?

If so, consider the following tips:

  1. Be business driven and customer focused: What are your organization’s biggest challenges? Start from a focused business challenge and work backward towards a solution. Too many companies try to apply “self-driving cars” or “genome-sequencing” algorithms to a sales funnel optimization challenge just because they hired an expert in one of those fields, while often there are models that better fit the task and bring higher value at lower cost. Don’t keep your data science team in the IT department alone. Rather, giving ownership of the data science team to a business stakeholder can invigorate your organization and unlock new revenue streams and tremendous cost savings.
  2. Iterate fast and simple: Be quick and decisive about bringing your ML system into production. Conducting small iterations through tests, proofs of concept, and pilots will help your team bring ML workloads into production faster and at higher quality. Plan to have a production-ready prototype in 3 weeks, and a fully operational version in under 90 days. Even if your system is not using the state-of-the-art model, you will learn far more by iterating quickly than you would from an overly long development cycle. ML transformations happen by building knowledge and experience through small, fast, and simple steps, rather than through multi-year planning. A redesign is inevitable. Only through experimentation, experience, and adaptation can you realize the full potential of your ML product. Fail fast and improve often.
  3. Centralize or decentralize ML teams? Centralize ML teams when necessary, but aim to decentralize when possible. ML applications, like any other piece of software, require maintenance, updates, and support. A centralized team may be effective at small scale, but once you start expanding, innovation might suffer. Imagine a large innovation team working on multiple projects: at some point, a substantial portion of the team’s work will inevitably consist of operating ongoing projects. That might be a good time to move the team to its real home, within the business unit that it serves. It can be hard to “give away” your “baby”, but it will help your ML team innovate on behalf of your customers.
  4. Consider the biggest roadblocks for data scientists & developers[1]: 1) dirty data, e.g. data sets that are unstructured, have missing attributes, or mix data types in the same section; 2) lack of talent; 3) lack of management or financial support: ML projects require focus and funding, so organizations struggle to roll out such a project without management’s support; 4) lack of clear questions to answer: organizations chase improvement but lack the specifications and clear targets needed to achieve it; 5) data not available or difficult to access. If you plan appropriately, you will find that most of these roadblocks are easily overcome. Lack of talent? Start hiring talent ahead of demand rather than have the data waiting for talent. Data not available? Start collecting data in advance of the project kick-off. Data not accessible? Don’t kick off a workshop without first obtaining relevant data samples. Lack of management or financial support? Get the buy-in in advance. Find the heroes among your stakeholders who are enthusiastic about AI and can support you with budget & headcount approvals, data accessibility, and connections to other business stakeholders.
  5. The separation between Data Science and DevOps is over! “Our PhDs develop ML models and write specifications for our developers to implement in C++.” If you can relate to this customer quote, start changing your team’s structure today. There is a wide range of tools that enable data scientists to take a step towards engineering, and vice-versa. The separation of “science” and “production” can prolong your company’s development & innovation cycles, thus leading to quality and ownership issues. Thankfully, technology is evolving at an increasing pace and new tools are continually released. It has never been easier for experts to expand their capabilities and cross over into new domains.
  6. Keep the right Data Scientists/Data Engineers ratio: What is the optimal Data Scientists/Data Engineers ratio? For most customers, the answer will depend on the maturity of the business. If your data are not accessible or you don’t maintain and track your data, you will likely need more engineering and less science. On the other hand, if you already have an established data pipeline, data warehouse, and data lake, you will likely want more science and less engineering. In some cases, your business will have specific requirements, which can affect the skills needed as well. As a rule of thumb, plan to have 2-3 engineers for every data scientist in the building phase, and a 1:1 ratio once a system is deployed.
  7. Have clear KPIs (Key Performance Indicators) by which your project’s success can be measured. For example, imagine a Recommendation Engine project for an online media company. “Enhance user experience” might be a great goal, but without a way to measure success, this objective is overly ambiguous. Stakeholders might even disagree over whether the goal has been met, which can cause wasted resources and inefficient development. Can “enhancing the user experience” be measured by time spent on the platform? The number of videos watched? The number of new categories explored by the user? Each measure could lead to a different recommendation system.
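To make the KPI question in tip 7 concrete, here is a minimal Python sketch that computes the three candidate measures (time spent, videos watched, new categories explored) from a small viewing log. The log format, the user and video names, and the helper `kpis_per_user` are all hypothetical and invented for illustration; a real platform would have its own event schema.

```python
from collections import defaultdict

# Hypothetical viewing log: (user_id, video_id, category, minutes_watched).
# All values are made up for illustration.
VIEWS = [
    ("u1", "v1", "news", 12.0),
    ("u1", "v2", "news", 8.0),
    ("u1", "v3", "sports", 5.0),
    ("u2", "v1", "news", 30.0),
    ("u2", "v4", "music", 2.5),
]

# Hypothetical categories each user had already explored before this period.
KNOWN_CATEGORIES = {"u1": {"news"}, "u2": {"news", "music"}}


def kpis_per_user(views, known_categories):
    """Compute three candidate KPIs per user from a viewing log."""
    minutes = defaultdict(float)
    videos = defaultdict(set)
    categories = defaultdict(set)
    for user, video, category, watched in views:
        minutes[user] += watched
        videos[user].add(video)
        categories[user].add(category)
    return {
        user: {
            "minutes": minutes[user],
            "videos": len(videos[user]),
            # Categories seen in this period that the user had not seen before.
            "new_categories": len(
                categories[user] - known_categories.get(user, set())
            ),
        }
        for user in minutes
    }


if __name__ == "__main__":
    for user, kpis in sorted(kpis_per_user(VIEWS, KNOWN_CATEGORIES).items()):
        print(user, kpis)
```

In this toy data, u2 leads on minutes watched while u1 leads on videos watched and on new categories explored, so the choice of KPI changes which behavior the recommendation system would be tuned to encourage.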

Having clear goals & KPIs will help you plan and execute more effectively.

ML initiatives are exciting and can be extremely fruitful. However, lack of focus, limited resources, and improperly set expectations can cause anxiety. Holding an “ML Discovery Workshop”, in which all stakeholders, both business and technical, brainstorm ideas, discuss the company’s biggest challenges, and plan, can help enormously. During the workshop, list your biggest challenges along with their feasibility, estimated effort, and missing skills and tools, and come up with a list of projects and a concrete execution plan. However, even the most well-intended execution plan will flounder without proper focus. With this in mind, remember: Be Customer Focused, Iterate Fast, Distribute data science when effective, Plan for roadblocks, Staff appropriately, and Choose specific KPIs that matter.

The writer is a senior data scientist at AWS and has been helping enterprises with their machine learning and cloud journey.
