Data Minimization: The New Challenge for Data Architectures


Netflixing your data: What is data minimization, what influence does it have on data architectures, and how does data virtualization enable you to reduce redundant data?

24 February 2022 (9-17h CET)
Location: Live Online Event (@YOUR DIGITAL WORKPLACE)
Presented in English by Rick van der Lans
Price: 640 EUR (excl. 21% VAT)

 Learning Objectives

Why do we Organise this One-day Seminar on Data Minimization?

Data minimization must be a dominant guiding principle for any new data architecture. When data minimization is applied, architects strive to reduce the number of data copies.

Studies indicate that quintillions of bytes of data are produced every day. However, most of it is not new or original data, but copied data. For example, data about a specific customer can be stored in a transactional system, a staging area, a data warehouse, several data marts, and a data lake. Even within one database, data can be stored several times to support different data consumers. Additionally, redundant copies of the data are stored in development and test environments. Business users copy data too, for example from central databases to private files and spreadsheets. The growth of data within organizations is enormous, but a large part consists of non-unique, redundant data. Data infrastructures are currently made up of data lakes, data hubs, data warehouses, and data marts, and all these systems contain overlapping data. Additionally, organizations exchange data with each other. Often, the receiving organizations store the data in their own systems, resulting in even more copies of the data.

And now, with new regulations concerning data privacy, such as GDPR, architects must be even more careful with storing copies of personally identifiable data. The more copies exist, the harder it becomes to comply with these regulations.

In the old days, several reasons existed to create data copies. But database server performance, cloud technology, and network speed have improved enormously, making such copies unnecessary in most cases. Unfortunately, new data architectures are still designed in which data is stored redundantly. We think too casually about copying data, and this unrestrained duplication must stop. Copying data has many drawbacks and challenges:

  • Higher data latency
  • Missed opportunities
  • Complex data synchronization
  • More complex data security
  • More complex data privacy
  • Higher development costs
  • Higher maintenance costs
  • Higher technology costs
  • More complex database administration
  • More complex metadata administration
  • Reduced data quality

Rick explains why data minimization is important:

This is where data minimization comes in. It is a recommended design principle for any new data architecture. Data minimization rests on two pillars. Firstly, data-on-demand is preferable to data-by-delivery: users must be able to access data when they require it, without needing to create copies of the data. Secondly, accessing original data is preferable to accessing copied data.
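
As a minimal illustration of the two pillars (not taken from the seminar material), the sketch below contrasts a physical copy with a view over the original data. It assumes a single in-memory SQLite database; the table and view names `customer`, `customer_copy`, and `customer_on_demand` are hypothetical. A real data virtualization layer works across many separate systems, so this only shows the principle.

```python
import sqlite3

# Hypothetical source system: one in-memory database with one customer table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Utrecht')")

# Data-by-delivery: a physical copy is made once, at delivery time.
conn.execute("CREATE TABLE customer_copy AS SELECT * FROM customer")

# Data-on-demand: a view stores no data and reads the original at query time.
conn.execute("CREATE VIEW customer_on_demand AS SELECT id, city FROM customer")

# The original changes after the copy was delivered.
conn.execute("UPDATE customer SET city = 'Antwerp' WHERE id = 1")

copied_city = conn.execute(
    "SELECT city FROM customer_copy WHERE id = 1"
).fetchone()[0]
on_demand_city = conn.execute(
    "SELECT city FROM customer_on_demand WHERE id = 1"
).fetchone()[0]

print(copied_city)     # the copy still holds the old value
print(on_demand_city)  # the view returns the current original value
```

The copy keeps the old city until someone synchronizes it again, which is the latency and synchronization burden listed among the drawbacks above; the view needs no synchronization because it never stored the data.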

During this seminar, Rick van der Lans explains how you can work towards a data-on-demand architecture and with which solutions and technologies this becomes a reality. He will discuss, among other things, what data minimization is, what influence it has on data architectures, and how data virtualization enables you to reduce redundant data.

This masterclass is a LiveOnline event. This means that there will be an expert speaker in a virtual meeting room along with you and other highly interested participants. We will make your learning experience as immersive and interactive as we have done in the past 25+ years, but now in a live, online environment. Besides satisfying your appetite for knowledge and answering your questions, we will stimulate the interactivity between the speaker and the participants, and between participants.

Learning Objectives: What will you learn in this masterclass?

This masterclass has various modules:

  • How the design principle called data minimization is related to simpler data architectures
  • What the two pillars of data minimization mean: data-on-demand and accessing original data
  • What the real drawbacks of creating too many copies of the data are, including higher data latency, complex data synchronization, more complex data security and privacy, and higher development and maintenance costs
  • How new database, integration, and cloud technology can help to design simpler data architectures that contain less copied data
  • What the effect is of applying data minimization to data warehouse and data lake architectures
  • How managed-file-transfer can be replaced by data-on-demand, and how the number of data flows between organizations can be reduced
  • How data architectures should be designed from the perspective of data processing specifications and not data stores

Who should attend this masterclass?

This one-day masterclass is aimed at everyone who works with data storage, architecture, and processing in their organisation and dreams of a data-on-demand architecture.

Your role can range from Chief Data Officer (CDO) and technology planner to ICT and enterprise architect, and from data analyst, data warehouse designer, data architect, and solution architect to data engineer, data scientist, and data consultant.

 Full Programme

Presented by Rick van der Lans

8.45h - 9.00h
We welcome the participants in our digital lobby (Zoom waiting room)
9.00h
Introduction to this masterclass

1. Introduction

  • What is data minimization?
  • The influence of data minimization on data architectures
  • Pillars of data minimization
  • From data-by-delivery to data-on-demand
  • From copied data to original data
  • Reasons why data minimization is important
  • Risks of unrestrained copying and repeated storage of data
  • The business advantages of data minimization

2. New technologies can simplify data architectures

  • Analytical SQL database servers and their distributed, share-based architecture
  • Translytical database servers: combining transactions and analysis
  • Cloud technology offers the required stability and centralisation of data
  • Data virtualization enables reduction of redundant data
  • Messaging technology

3. Applying data minimization to current data architectures

  • From traditional data warehouse architectures to logical data warehouse architectures
  • From physical data lake with zones and tiers to virtual data lakes
  • From data lakehouses to logical data lakehouses
  • From data fabrics to logical data fabrics
  • How to transform a managed-file-transfer solution used between organizations to a query-based architecture?
  • Keeping track of data history only once
  • The impact of data minimization on data privacy

12.45h - 13.45h
Break for (Bring your own) Lunch

4. Data track diagrams for designing data architectures

  • What are data track diagrams?
  • Designing a data architecture based on data processing specifications
  • From data track diagrams to data minimization
  • Do not design from a database-centric point of view

5. From data-by-delivery to data-on-demand

  • Disadvantages of data exchange using files (data-by-mail)
  • Advantages of data-on-demand
  • Accessing geographically dispersed data sources
  • What can we learn from Netflix?

6. Closing Remarks

  • General recommendations for implementing data minimization
  • 'Youtubing' your data

17.00h
End of this masterclass; opportunity to continue the conversation and Q&A until 17.30h

 Speakers


Rick van der Lans (R20/Consultancy BV)

Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data architectures, data warehousing, business intelligence, big data, and database technology. In 2018 he was selected as the sixth most influential BI analyst worldwide by onalytica.com.

He has presented countless seminars, webinars, and keynotes at industry-leading conferences. For many years, he has served as the chairman of the annual Data Warehousing and Business Intelligence Summit in The Netherlands.

Rick helps clients worldwide to design their data warehouse, big data, and business intelligence architectures and solutions and assists them with selecting the right products. He has been influential in introducing the new logical data warehouse architecture worldwide which helps organisations to develop more agile business intelligence systems.

Over the years, Rick has written hundreds of articles and blogs for newspapers and websites and has authored many educational and popular white papers for a long list of vendors. He was the author of the first available book on SQL, entitled Introduction to SQL, which has been translated into several languages with more than 100,000 copies sold. Recently published books are Data Virtualization for Business Intelligence Systems and Data Virtualization: Selected Writings.

He presents seminars, keynotes, and in-house sessions on data architectures, big data and analytics, data virtualization, the logical data warehouse, data warehousing and business intelligence.

Questions about this masterclass? Interested but you can't attend? Send us an email!
