Approaching the Modern Data Stack

7 March 2024


Key takeaways

Organisations need to uncover operational blind spots and harness the potential of uncollected data. Decisions on the adoption of technology, whether through suppliers, consultants or internal teams, are key to eliminating these blind spots.

Smaller or less tech-savvy organisations are embracing data analytics and cloud services to improve customer understanding, simplify processes and explore automation possibilities.

Making informed decisions about tools and architecture involves considering factors such as managed services, customised workloads and cloud and vendor lock-ins. Organisations must strike a balance between costs, maintenance efforts and flexibility for a modular architecture aligned with short-term value and long-term vision.

What business data points are you blind to? What’s the potential left on the table with the data that you’re not storing? What relationship, short and long term, do you want with the technology that can remove these blind spots – rely on a provider, a consultancy company, or staff up internally?

TL;DR: there is not one (easy) path, or we might be out of a job.

With the abundance of data sources – from ERPs and CRMs to social media and IoT devices – even start-ups and non-tech businesses understand the value of analytics and cloud services: better understanding their audience helps retain and acquire business, internal processes can become more cost-efficient, and parts of the organization can be automated.

However, these smaller – or less tech-savvy – organizations will be cautious about developing and maintaining workloads if they are not specialized.

Some will rely on a consultancy company (like us!) to develop and maintain their workloads.

Others can use that initiative to grow their team internally, actively participating in the development and taking on some, if not all, of the future ongoing maintenance and development.

For a data analytics specialist, every new client means delving deep into their business to help build a data analytics roadmap that delivers in the short term but is also aligned with a long-term, organization-wide vision – one that considers culture, competencies, goals, and commitments, and chooses the right processes, technology, and development path.

Development is aligned; what about the tools?

Some important points:

  • Managed services can be a great way to start the journey – for organizations with little or no capacity to maintain workloads that don’t want to depend on outsourcing services – but as the number of developers, the workload volume, and the end-user base grow, they become increasingly expensive, so choose managed services according to your organization’s roadmap.


  • For a more customized workload—including purpose-built connectors for legacy systems, less common APIs, and client-specific automation tasks that require joint orchestration with core workloads—you may need to choose services, tools, or outright architecture layers that can increase maintenance effort.


  • Where are your sources, where are your users, and where do you want to see the data? Transferring data out of the cloud implies a small fee, so it seems intuitive to keep source data, transformation processes, and visualization dashboards in the same cloud and region. However, that may reduce the choice of services, and potentially result in higher transformation costs or lower product adoption by the dashboards’ end users.


  • The degree of acceptable cloud lock-in needs to be discussed. A truly cloud-agnostic workload may prove expensive and mostly unnecessary, but some architecture decisions can heavily reduce future migration effort – such as choosing data analytics tools (extraction processes, business transformation logic, orchestration pipelines, etc.) that can be ported with less refactoring effort to another cloud (given a base infrastructure is set up there) or even to third-party managed services. However, these cloud-agnostic tools tend to be either more expensive or higher-maintenance than their “cloud native” counterparts, making it a compromise between cloud lock-in, price, and maintenance.


  • The degree of acceptable service vendor lock-in also needs discussion – the choice of service for extraction, transformation, orchestration, visualization, cataloguing, and so on will determine the refactoring effort required later to replace it with another service. What can you do? Choose open-source technologies, or technologies with a development language that translates more easily to other tools (such as SQL and Spark). Most importantly, your architecture should be as modular as possible to limit the impact area when replacing any component.
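As an illustration of that last point, the sketch below (in Python, with hypothetical class and function names) hides the extraction vendor behind a narrow interface, so replacing the vendor later only touches one adapter class rather than the whole pipeline:

```python
import csv
import io
from typing import Iterable, Protocol


class Extractor(Protocol):
    # The rest of the pipeline depends only on this narrow interface,
    # which limits the impact area when the extraction vendor is replaced.
    def extract(self) -> Iterable[dict]: ...


class CsvTextExtractor:
    # Hypothetical adapter for illustration; an adapter wrapping a vendor
    # SDK or a custom connector would expose the same extract() method.
    def __init__(self, csv_text: str):
        self.csv_text = csv_text

    def extract(self) -> Iterable[dict]:
        yield from csv.DictReader(io.StringIO(self.csv_text))


def load(extractor: Extractor) -> list:
    # Downstream code is written against the Protocol, not a vendor SDK.
    return list(extractor.extract())


rows = load(CsvTextExtractor("id,name\n1,ana\n2,rui\n"))
```

Swapping Fivetran for a custom connector would then mean writing one new adapter class, with no change to `load` or anything downstream of it.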

What might a managed services data analytics architecture look like initially?

Starting with the data warehouse – there is healthy competition in the market: Snowflake, Amazon Redshift, and Google BigQuery are popular services offering different features and pricing options. Hence, it’s critical to understand which features you need and to determine your usage pattern before choosing a technology.
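The pricing side of that assessment often comes down to a break-even calculation between a pay-per-query model and a flat compute reservation. The rates below are invented placeholders, not real vendor prices; the point is only the shape of the comparison:

```python
# Illustrative rates only -- real prices vary by vendor, region, and
# contract; plug in current list prices before comparing for real.
ON_DEMAND_PER_TB = 5.00          # $ per TB scanned (pay-per-query model)
FLAT_CAPACITY_PER_MONTH = 2000.0 # $ per month for a fixed compute reservation


def monthly_cost_on_demand(tb_scanned_per_month: float) -> float:
    # On-demand cost grows linearly with the volume your queries scan.
    return tb_scanned_per_month * ON_DEMAND_PER_TB


def cheaper_model(tb_scanned_per_month: float) -> str:
    # Below the break-even point (2000 / 5 = 400 TB/month under these
    # assumed rates), pay-per-query is cheaper; above it, flat capacity wins.
    on_demand = monthly_cost_on_demand(tb_scanned_per_month)
    return "on-demand" if on_demand < FLAT_CAPACITY_PER_MONTH else "flat capacity"
```

A small team scanning 100 TB a month lands well on the on-demand side; a growing workload scanning 500 TB a month has already crossed over, which is exactly the kind of roadmap-dependent choice discussed above.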

dbt is a good choice for creating and maintaining your business models. It’s a framework for building SQL models with features such as data testing, data lineage, documentation, version control, and abstraction layers that make your project modular and flexible. A well-developed dbt project makes replacing data warehouse technologies easier in the future, as it considerably lowers the rework effort.

dbt is an open-source project; however, there is a cloud offering with competitive pricing for a small development team – all without worrying about hosting!

If you’re using source data from another team or department and it’s already in storage that your data warehouse can reference (such as Amazon RDS, Amazon S3, Azure Storage Account, or Google Cloud Storage), then congratulations: your backend data architecture may start with just these two tools.

However, most projects will require extracting new data. In this case, you can use a managed service like Fivetran or Stitch, which hosts and maintains the connectors; you choose the source and the target, and pay per use.

If you start facing requests such as “I want specific models to be refreshed right after certain data is extracted and loaded”, “I want to add custom sources that my extraction tool does not support”, or “I need some custom automated steps to connect a few processes during the ELT process”, then you might need to add another layer to your ecosystem: a flexible orchestrator that can oversee the entire process and fill the remaining gaps, such as Astronomer – which is essentially a managed version of Apache Airflow.
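Under the hood, what an orchestrator adds is dependency-ordered execution. A toy stand-in using only Python’s standard library (Airflow’s real API is different, and the task names here are hypothetical) shows the “refresh the model right after the load finishes” behaviour:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9


def run_pipeline(tasks: dict, deps: dict) -> list:
    # deps maps each task to the set of tasks it depends on; run every
    # task in a valid dependency order, as an orchestrator's DAG run does.
    order = list(TopologicalSorter(deps).static_order())
    executed = []
    for name in order:
        tasks[name]()  # in Airflow, each of these would be an operator/task
        executed.append(name)
    return executed


# Toy DAG: the model refresh runs only after the extract-and-load finishes.
executed = run_pipeline(
    tasks={
        "extract_orders": lambda: None,  # e.g. trigger a connector sync
        "load_orders": lambda: None,     # e.g. land the data in the warehouse
        "refresh_model": lambda: None,   # e.g. run the dependent SQL models
    },
    deps={
        "load_orders": {"extract_orders"},
        "refresh_model": {"load_orders"},
    },
)
```

A real orchestrator adds scheduling, retries, sensors for custom sources, and logging on top of this ordering, which is precisely what the requests quoted above call for.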

Many other managed tools are available in the market for various use cases, such as cataloguing, testing, and monitoring, that can be added to this architecture.

The more applications you add, the more challenging it becomes to manage your developments, integrations, and deployments, as each tool may have its own approach.

If the following sentence ever makes sense to you – “My business feature requests are growing, and so are my developer team and the workload size, pushing me into higher tiers of the managed services, which are becoming too expensive – I want to take on some of the maintenance internally and customize my architecture to my exact needs” – then the following architecture aims to address that:

The first objective is to set up a good base infrastructure in any of the clouds – one that can scale, is widely adopted, allows any application to be hosted, and uses resources efficiently. Kubernetes is the standard that fulfils these requirements (it’s how most managed services work in the backend): it will host your applications, share cloud resources between them for efficiency, and assure communication among applications and with the outside world.

You don’t need to manage “pure Kubernetes”, though, as each cloud has its own managed Kubernetes service, so that part of the maintenance is assured by the cloud provider (like every managed service – for a small fee), and you can run your cloud-agnostic workloads inside it – a nice compromise between customization, price, features, and maintenance.

After the base infrastructure is set up, you can install the applications you need, such as Apache Airflow and dbt Core, among others, and refactor the previous workloads from the managed services into the new architecture. If the managed services were chosen with a roadmap like this in mind, the refactoring of business rules and process orchestration will be heavily reduced.

Some benefits of this architecture are the pricing at larger scale, the ease of creating multiple environments quickly and dynamically as needed – such as several dev environments (one per development or developer) and proper pre-prod environments for better testing and UAT – and a more centralized CI/CD pipeline.

Before committing to an architecture, many more considerations must occur, as each client will have unique requirements and blockers.

However, one common theme is that business teams want new features delivered faster and faster. The market for data analytics is evolving rapidly, and new and better offerings are continuously being released, so the importance of being adaptable in this marketplace is more significant than ever.

Author

Hugo Lopes


Technical Lead
