3 Things: Data Analytics Highlights from March 2024
12th April 2024
By Michael A

This month's '3 Things' blog post covers topics ranging from the retirement of Power BI Premium per capacity SKUs to a deep-dive into DuckDB's hash aggregation, which can process data larger than the memory available on a single machine. We have also introduced a new section, 'Open-Source Analytics', to help you keep up with that space too.

Read on as we highlight three things for each of the four technology areas that you should be aware of from last month.

Power BI

  • The Power BI Premium per capacity SKUs are being retired within a year. No immediate action is needed, but if you currently have one of those SKUs, you can look at the equivalent Fabric SKUs and start planning the switch. Reserved capacity pricing for Fabric saves about 40% compared with the pay-as-you-go equivalent, so consider it if you want the best value. Learn more.

  • Power BI Report Builder now includes the Get Data experience. This lets you use Power Query to connect to over 100 data sources directly from your Power BI paginated reports. Paginated reports can now also connect directly to APIs, enabling reporting scenarios that were previously impossible. The feature is currently in preview. Learn more.

  • The DAX experts at SQLBI have released a white paper on Power BI's Visual Calculations feature. This comprehensive resource will help you understand how visual calculations work, their pros and cons, and how to get the most out of them in your Power BI reports. It's a must-read for anyone looking to sharpen their Power BI skills. Learn more.

Microsoft Fabric

  • There were numerous announcements at the Microsoft Fabric Community Conference last month, and Microsoft has clearly gone all-in on this rapidly evolving unified data analytics platform. Arun Ulagaratchagan, Corporate Vice President of Azure Data at Microsoft, set the scene with an in-depth article explaining the four promises of Fabric: (1) Fabric is a complete platform, (2) Fabric is lake-centric and open, (3) Fabric can empower every business user, and (4) Fabric is AI-powered. Learn more.

  • Mirroring in Microsoft Fabric, first announced in November 2023, has finally arrived as a public preview. It lets you replicate data from supported databases (including Azure Cosmos DB, Azure SQL Database, and Snowflake) into Fabric with just a few clicks. Once configured, the replicated data is kept up to date in near real time, so you can analyse it using any of the Fabric experiences. Learn more.

  • Who would have thought that in 2024, the ability to create folders in an analytics platform would be a big deal? Well, it is, and in March the Folders in Workspaces feature became available in public preview. No more messy Fabric workspaces with items scattered all over the place. No more naming workarounds to force items into a readable order. Now you can quickly organise your Fabric items into well-structured folder hierarchies. Job done. Learn more.

Azure Data Platform

  • Generative AI continues to be a hot topic, and many vendors are racing to build the best-performing, most efficient large language models (LLMs). Databricks, one of the leaders in this space, has just released DBRX, an open-source, general-purpose LLM that outperforms many state-of-the-art models, including GPT-3.5, on most benchmarks. DBRX is a Mixture-of-Experts (MoE) model built on MegaBlocks, which gives it exceptionally high throughput in tokens per second. Learn more.

  • Microsoft and NVIDIA, two of the leading players in AI, have announced significant new integrations. This collaboration will accelerate generative AI across various Microsoft platforms, products, and services, including Microsoft Azure, Azure AI services, Microsoft Fabric, and Microsoft 365. Combining Microsoft's cloud AI capabilities with NVIDIA's AI hardware expertise promises robust AI solutions that could transform industries. Learn more.

  • Azure SQL Database has gained support for regular expressions (regex). This means you can apply regex to your data to improve data quality, validate values, clean data points, extract insights from fuzzy data, and much more. Learn more.
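The DBRX item attributes its speed to the Mixture-of-Experts design. The intuition is that a router scores every expert for each token, but only the top-k experts actually run, so per-token compute scales with k rather than with the total number of experts. The following is a minimal, generic top-k gating sketch in plain Python, not DBRX's or MegaBlocks' implementation; the function names, the toy router weights and the toy experts are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    token: Vector,
    router: List[Vector],   # hypothetical: one router weight row per expert
    experts: List[Expert],  # hypothetical toy experts (same input/output size)
    k: int,
) -> Tuple[Vector, List[int]]:
    """Send one token to its top-k experts and return the gated mix of their outputs."""
    # 1. The router scores every expert (cheap: one dot product each).
    scores = [sum(w * x for w, x in zip(row, token)) for row in router]
    # 2. Keep only the k best-scoring experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # 3. Renormalise the chosen experts' scores into mixing weights.
    gates = softmax([scores[i] for i in chosen])
    # 4. Run only the chosen experts (the expensive part) and blend their outputs.
    output = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        expert_out = experts[i](token)
        output = [o + gate * y for o, y in zip(output, expert_out)]
    return output, chosen
```

With, say, 16 experts and k = 4, each token pays for four expert evaluations instead of sixteen while the model still holds all sixteen experts' parameters, which is the trade-off behind MoE throughput claims.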
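The regex item mentions validating, cleaning and extracting values. Azure SQL Database's own regex functions are not shown here, and its regex dialect may differ from Python's, so treat this as an illustration of the three kinds of task using Python's standard `re` module. The patterns, the product-code rule and the function names are hypothetical.

```python
import re
from typing import List

# Hypothetical rule: a valid product code is three capital letters, a hyphen, four digits.
PRODUCT_CODE = re.compile(r"[A-Z]{3}-\d{4}")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_product_code(value: str) -> bool:
    """Validate: the whole value must match the pattern, not just part of it."""
    return PRODUCT_CODE.fullmatch(value) is not None


def digits_only(value: str) -> str:
    """Clean: drop every non-digit character, e.g. from a free-text phone number."""
    return re.sub(r"\D", "", value)


def extract_iso_dates(text: str) -> List[str]:
    """Extract: pull every YYYY-MM-DD date out of a free-text note."""
    return ISO_DATE.findall(text)
```

For example, `digits_only("+44 (0)20 7946-0958")` returns `"4402079460958"`, and `is_valid_product_code("ABC-12345")` is `False` because `fullmatch` rejects the extra digit.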

Open-Source Analytics

  • There are currently three leading open table formats: Apache Hudi, Apache Iceberg, and Delta Lake. Some data lake query engines can query tables in any of the three formats, while others support only one or two. Apache XTable (previously called "OneTable") is an open-source project that makes these formats interoperable. For example, you can use XTable to generate the metadata that an Iceberg-only query engine needs to 'see' your Delta table as an Iceberg table. XTable is currently an Apache Incubator project, backed by Microsoft, Onehouse, Google, Walmart, Adobe, Cloudera, and Dremio. Learn more.

  • Daft is a fast, distributed, dataframe-based query engine for Python, built to outperform Apache Spark on both performance and ease of use. A recent Delta Lake blog post shows how to use it to query massive Delta Lake tables with excellent performance. Learn more.

  • A recent DuckDB blog post explains in depth why its hash aggregation is such a big deal: it can aggregate data that exceeds the total memory available on your machine. For example, if your laptop has 16 GB of RAM and a query needs 32 GB, DuckDB's hash aggregation can spill intermediate data to disk storage (e.g. an SSD) and still complete the operation. Learn more.
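To make the DuckDB item concrete, here is a minimal sketch of the general idea behind larger-than-memory hash aggregation: pre-aggregate in a bounded in-memory hash table, spill partial results to disk partitioned by a hash of the group key, then finish each partition separately so only one partition's groups need to be in memory at a time. This is a generic textbook-style sketch in plain Python, not DuckDB's implementation (which is parallel and far more sophisticated); `external_sum` and its parameters are hypothetical.

```python
import csv
import os
import tempfile
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def external_sum(
    rows: Iterable[Tuple[str, int]],
    num_partitions: int = 8,
    max_groups_in_memory: int = 10_000,
) -> Iterator[Tuple[str, int]]:
    """Stream SUM(value) GROUP BY key without holding every group in one hash table."""
    with tempfile.TemporaryDirectory() as spill_dir:
        paths = [os.path.join(spill_dir, f"partition_{p}.csv") for p in range(num_partitions)]
        files = [open(path, "w", newline="") for path in paths]
        writers = [csv.writer(f) for f in files]
        partial: Dict[str, int] = defaultdict(int)

        def spill() -> None:
            # Route each partial sum to a partition by hashing its key, so every
            # partial result for a given key always lands in the same file.
            for key, total in partial.items():
                writers[zlib.crc32(key.encode()) % num_partitions].writerow([key, total])
            partial.clear()

        # Phase 1: pre-aggregate in a bounded hash table, spilling when it fills up.
        for key, value in rows:
            partial[key] += value
            if len(partial) >= max_groups_in_memory:
                spill()
        spill()
        for f in files:
            f.close()

        # Phase 2: finish one partition at a time; only its groups are in memory.
        for path in paths:
            totals: Dict[str, int] = defaultdict(int)
            with open(path, newline="") as f:
                for key, total in csv.reader(f):
                    totals[key] += int(total)
            yield from totals.items()
```

Usage might look like `dict(external_sum(rows, num_partitions=4, max_groups_in_memory=10))`. A real engine would also repartition any single partition that is still too large for memory; this sketch omits that step for brevity.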

Did You Find This Useful?

Get notified when we post something new by following us on X and LinkedIn.