  • Sadik Bakiu

Modern Data Team Hats


Different hats in a team. Picture from Storyset.


This blog was written together with Martin Rusnak from Rusnak Consulting and Bujar Bakiu.


Not that long ago (and in some places this is still the case), companies had multiple one-dimensional data teams. Each team consisted exclusively of Data Scientists, Business Analysts, or Data Engineers. With this setup, companies struggled to integrate data products into their wider software architecture. Commonly accepted reasons for this are:

  • Lack of communication between teams. Priorities set in one team were not aligned with those of the other teams. For instance, if the Data Science team needed to explore data from a new marketing campaign, it had to wait for the Data Engineering team to make that data available.

  • Considering solutions in isolation. Data Scientists might optimize for accuracy during testing and evaluation without considering how the solution performs during inference. Running that inference in production could then become a huge challenge for the operations team.

Overall, there were huge gaps when building end-to-end processes such as automation, orchestration, and testing.


Modern Data Team Hats


To solve the problems of one-dimensional teams, the proven approach of cross-functional teams came to the rescue. In these teams, some members focus more on data analysis, others on data science, ML engineering, and so on. By working together, they bring more depth, a wider scope of information, and a diversity of opinions to the pursuit of their common goal.


We believe that there are no clear boundaries between the roles one can play in a team. Therefore, in this post, we call these roles hats. A hat is a position someone holds when discussing or solving a problem.


Every team is different; however, these are the terms most commonly used to describe the hats in a data team.


Data Engineer


The Data Engineering hat builds reliable data pipelines and data infrastructure. They serve as a bridge to the infrastructure team when specialized components need to be deployed or upgraded. They take care of integrating other data sources and implementing data quality checks. If needed, this hat also implements data versioning. A big part of the work is also optimizing performance, both for ingesting data and for answering queries. The most often used tools are:

  • Orchestration, e.g. Airflow, Dagster, Prefect

  • Data processing, e.g. Pandas, Spark, Dask

  • Data warehousing, e.g. BigQuery, Redshift, Hive

  • Data versioning, e.g. DVC, Pachyderm
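To make "data quality checks" a little more concrete, here is a minimal sketch in plain Python. The function name `check_quality`, the column names (`order_id`, `amount`), and the three rules are hypothetical choices for illustration; in practice such checks usually run as a task inside an orchestrator or through a dedicated validation library.

```python
from typing import Any


def check_quality(rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable failures; an empty list means the batch passed.

    Hypothetical rules, chosen only for illustration:
    - `order_id` must be present and unique
    - `amount` must be a non-negative number
    """
    failures: list[str] = []
    seen_ids: set[Any] = set()
    for i, row in enumerate(rows):
        order_id = row.get("order_id")
        if order_id is None:
            failures.append(f"row {i}: order_id is missing")
        elif order_id in seen_ids:
            failures.append(f"row {i}: duplicate order_id {order_id!r}")
        else:
            seen_ids.add(order_id)

        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            failures.append(f"row {i}: invalid amount {amount!r}")
    return failures


batch = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 1, "amount": 5.0},    # duplicate ID
    {"order_id": None, "amount": -3},  # missing ID and negative amount
]
for failure in check_quality(batch):
    print(failure)
```

A pipeline task built around a check like this would typically stop, or quarantine the batch, whenever the returned list is non-empty, so that bad data never reaches downstream consumers.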


Analytics Engineer


The Analytics Engineering hat is primarily concerned with cleaning and transforming the data. Together with the Data Engineering hat, they bring software engineering best practices to analytics code, like version control, automated testing, and deployment. Tools usually used are:

  • Data warehousing, e.g. BigQuery, Redshift, Snowflake

  • Transformation, e.g. dbt, Dataform


Data Analyst


The Data Analyst hat interrogates the data, looking for insights to support data-driven decision making. They collaborate closely with the Analytics Engineering hat and share many of its skills. They visualize the data to help everyone make sense of it. Tools used:

  • Visualization, e.g. Metabase, Looker, Power BI, Tableau

  • Transformation, e.g. dbt, Dataform, SQL


Data Scientist


The Data Scientist hat finds the best way to model the data for predictions. They have strong skills in feature engineering. People wearing this hat have deep knowledge of machine learning techniques, statistics, and analytics. Commonly used tools are:

  • ML libraries, e.g. scikit-learn, XGBoost

  • Deep Learning libraries, e.g. TensorFlow, PyTorch

  • Experiment tracking, e.g. MLflow, Kubeflow, Aim

  • Feature store, e.g. Feast, Hopsworks

  • Explainability, e.g. LIME, SHAP


Machine Learning Engineer


The Machine Learning Engineering hat brings a thorough knowledge of software engineering best practices. They productionize ML models to solve business needs and integrate them with the organization's existing infrastructure. They build the infrastructure for A/B testing, distributed model training, and ML workflow orchestration, and they extend existing platforms. Tools used:

  • Orchestration, e.g. MLflow, Kubeflow, Flyte, Kubernetes

  • Model serving, e.g. Seldon Core, BentoML, TensorFlow Serving, TorchServe

  • Training, e.g. Horovod, Ray

  • Feature store, e.g. Feast, Hopsworks
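One small but central piece of A/B testing infrastructure is assigning each user to a variant deterministically, so that the same user always sees the same version. A common approach, sketched below under our own assumptions, hashes the user ID together with an experiment name. The function `assign_variant`, the experiment name, and the default 50/50 split are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to "control" or "treatment".

    Including the experiment name in the hashed key means that different
    experiments split the same users differently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


for user in ["alice", "bob", "carol"]:
    print(user, assign_variant(user, "new-checkout-flow"))
```

Because the assignment is a pure function of its inputs, it needs no lookup table and returns the same answer on every server that runs it.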


MLOps


The MLOps hat focuses on automating and monitoring every step of building an ML system. They bring DevOps best practices to the team, such as continuous integration, continuous deployment, and model monitoring. The most commonly used tools are:

  • Model monitoring, e.g. WhyLabs, Evidently

  • Automation, e.g. GitLab CI, GitHub Actions

  • Infrastructure, e.g. Terraform, Kubernetes, Helm charts
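A typical model monitoring task is detecting data drift, i.e. a shift between the feature distribution seen during training and the one seen in production. As one illustration, the sketch below computes the Population Stability Index (PSI), a common drift metric, in plain Python. The function name `psi`, the equal-width binning, the small epsilon for empty bins, and the rule-of-thumb alert threshold of 0.2 are our assumptions, not the method of any particular monitoring tool.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a current sample.

    Bins are equal-width over the reference sample's range; values outside
    that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant reference data

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values
            counts[idx] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [x / 100 for x in range(1000)]       # training-time feature values
same = [x / 100 + 0.001 for x in range(1000)]    # nearly identical distribution
shifted = [x / 100 + 5 for x in range(1000)]     # every value shifted by 5

print(psi(reference, shifted) > 0.2)  # True: this would be flagged as drift
```

A monitoring job could compute such a metric per feature on a schedule and raise an alert when it crosses the chosen threshold.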


Product Manager


The Product Manager hat is usually distinct from the other, highly technical hats. They make sure that what is being developed brings value to users and stakeholders.


A team will likely not contain all these hats. Which ones are required depends on the size of the team, the challenge at hand, and many other factors. Often, one person wears more than one hat.


At Data Max, we focus on covering all the hats mentioned here. We are proud of our expertise and are eager to share our knowledge. Reach out to us at hello@data-max.io.