
Large Datasets – Main challenges and best practices

6 September 2024


Key takeaways

Large Datasets are defined by their size and complexity.

The main challenges involve performance problems and data model limitations.

Best practices include using aggregated tables and simplifying the data model, among others.

The volume of data we deal with has grown exponentially. Large amounts of data – Large Datasets – which used to be the exception are now the rule. Harnessing their full potential can be the deciding factor between success and failure for companies, and it is up to them to choose the right tool and apply the best practices to get the most out of Large Datasets.

This article delves into the characteristics of a Large Dataset. By understanding these characteristics, you will be better equipped to navigate the challenges and apply the best practices for using Large Datasets effectively.


1. What is a Large Dataset?

In Power BI, a Large Dataset is characterised mainly by two aspects:

1.1 Size:

It is exceptionally large, usually exceeding hundreds of millions, or even billions, of records, and thus easily exceeds the 1 GB dataset size that a Power BI Pro licence allows you to publish.

1.2 Complexity:

It contains a large number of tables with often complex relationships between them, along with equally complex metrics and calculations.


These two characteristics, combined with the need for immediate access to the data, make it difficult, or almost impossible, to stay within Power BI's capacity and performance limits.


2. Main challenges of Large Datasets

Using Large Datasets can present several challenges if they are not well managed. Poor management can have consequences for performance, usability and even data quality. The main challenges in using Large Datasets are:

2.1 Report Performance

As the volume of data increases, so does the time it takes for Power BI to retrieve the data. This often results in slow and unresponsive reports, negatively impacting the user experience. It’s crucial to be aware of these potential issues and implement strategies to manage them effectively.

In addition, because of their complexity, the queries issued consume significant resources, namely memory and CPU, which can overload the system and further degrade report performance.

2.2 Limitations of the Data Model

As mentioned earlier, Power BI's Pro licence limits a dataset to 1 GB of data in memory. Although other licences allow higher limits, managing datasets that exceed 1 GB, which is often the case with Large Datasets, becomes more complex and requires meticulous planning to keep all the data accessible. Combining in-memory data with DirectQuery mode can help with this limitation, because it lets you access all the data without storing it in memory. However, DirectQuery can also hurt the report's performance, since queries are sent directly to the source when the report is opened and therefore take longer to return results. In addition, DirectQuery limits the number of queries running in parallel, so reports with many visuals will be even slower.

2.3 Data quality and consistency

Ensuring good data quality and consistency becomes more complicated when dealing with a Large Dataset. Validating the data and cleaning it up, such as removing duplicate or blank records and reworking calculations, takes longer because of the sheer amount of data that has to be processed.
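As an illustrative sketch, a duplicate check like the one below can be expressed as a DAX query and run, for example, from DAX Studio. The FactSales table and its OrderLineID key are hypothetical names, not part of the article:

```
-- Returns values of the (hypothetical) FactSales[OrderLineID] key that occur
-- more than once, i.e. duplicate records that should be cleaned up.
EVALUATE
FILTER (
    ADDCOLUMNS (
        VALUES ( FactSales[OrderLineID] ),
        "RowCount", CALCULATE ( COUNTROWS ( FactSales ) )
    ),
    [RowCount] > 1
)
```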

2.4 Data Refresh

Because of the sheer volume of data involved, refreshing a Large Dataset can take several hours. Refreshes may therefore finish late or even be blocked, and the report will not show the latest data, which can mislead those consulting it.


3. Best practices for dealing with Large Datasets

To meet the challenges mentioned above and get the best out of Large Datasets, the following suggestions can be implemented:

3.1 Aggregated tables

The visuals in a report don't always need to query facts at their lowest granularity; more often than not, they query a higher granularity. One way to take advantage of this observation and improve the performance of a report built on a Large Dataset is to use so-called aggregated tables.

Aggregated tables summarise fact data at a higher level of aggregation, which makes them smaller in terms of rows, resources and memory consumed, and therefore lighter for Power BI to process. In addition, aggregated tables hold pre-calculated values, such as SUMs and COUNTs (e.g. total purchase value), that are frequently used in reports. Having these values already materialised makes their calculation and presentation in the report quicker and more efficient, since what was previously computed at query time is now read directly from the table, with no complex operations required.
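As an illustrative sketch of the grain such a table might hold, the DAX query below expresses its logical content; all table and column names (SalesAgg, DimDate, DimProduct, FactSales) are hypothetical, and in practice the table would be materialised in the source database or Power Query rather than queried on the fly:

```
-- Logical content of a hypothetical 'SalesAgg' table: one row per
-- year / month / product category, with pre-calculated SUM and COUNT values.
EVALUATE
SUMMARIZECOLUMNS (
    DimDate[Year],
    DimDate[Month],
    DimProduct[Category],
    "PurchaseValue", SUM ( FactSales[SalesAmount] ),
    "PurchaseCount", COUNTROWS ( FactSales )
)
```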

Aggregated tables can be created directly in Power BI, via Power Query, or in the source database. In Power BI, simply use the ‘Manage Aggregations’ functionality and define the mapping of columns and aggregations between the aggregated table and the fact table. Note that the fact table must be in DirectQuery mode.

After this, contrary to what we might expect, the measures should continue to query the fact table in DirectQuery; that is, their formulas should reference the fact table, not the aggregated one. The Power BI engine then decides, given the filters applied in the calculations and visuals, whether it can answer from the aggregated table or must query the fact table in DirectQuery. Letting the engine make this choice also keeps the measures simple, since the aggregate-versus-fact decision does not have to be encoded in the measure formula.
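A minimal measure sketch of this pattern, assuming a hypothetical DirectQuery fact table FactSales that has been mapped to an aggregated table through ‘Manage Aggregations’:

```
-- The measure references only the fact table; the aggregated table never
-- appears in the formula. When a visual's filters match the grain mapped in
-- 'Manage Aggregations', the engine answers from the in-memory aggregated
-- table; otherwise it falls back to the fact table in DirectQuery.
Total Sales Amount = SUM ( FactSales[SalesAmount] )
```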

In short, aggregated tables are a great help in improving performance because they avoid querying the fact table for every calculation, which makes consulting the report a noticeably faster and more pleasant experience.

3.2 Modelling

Whenever possible, a star or snowflake schema should be implemented to remove as much complexity as possible from the data model. These schemas simplify the relationships between tables and reduce the complexity of queries, leading to shorter response times.

In addition, the relationships defined should have the cardinality and direction best suited to the model, to ensure the most efficient data retrieval.

Creating calculated columns or tables should also be avoided. These operations are expensive for Power BI, so they should be performed in the database whenever possible, and only the columns and tables that are strictly necessary should be brought into Power BI, to avoid taking up memory unnecessarily.
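As a sketch of the pattern to avoid (the names are hypothetical), a calculated column like the one below is evaluated for every row of the fact table at refresh time and held in memory; the equivalent column is better computed in the source database, for example in a view:

```
-- Avoid on a large fact table: this column is computed row by row during
-- refresh and stored in memory. Push the logic down to the source instead.
Margin = FactSales[SalesAmount] - FactSales[TotalCost]
```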

3.3 Visualisations

In terms of visualisations, some suggestions can be implemented to optimise the experience of using the report:

  • Stay under 10 visualisations, including slicers, per page.
  • Maximise the number of visuals that, by default, query the aggregated table (i.e. show the higher granularity) and only query the lower granularity, which hits the fact table in DirectQuery, when necessary. This can be done with the drill-down functionality.
  • Avoid using custom visualisations. Priority should be given to using Power BI’s native visualisations.
  • Use slicers and filters to pre-filter the data in the report to reduce the amount of data processed at one time.


Large Datasets are a reality, and we already have functionalities at our disposal, such as aggregated tables, that allow us to face this challenge and make the most of them.
