A data team gets the best results and has the happiest stakeholders when there is minimal difference between what the data team is offering and what the stakeholders are expecting.
A good way to align expectations is to have clear definitions of what data is available for analysis, and clear indications of the effort required to perform the analysis.
There are two areas where this needs to be applied:
Both parts of the system need to adhere to the same terminology and definitions in order to increase alignment.
In order to align both the data team and the business, we need a common language. This will serve as a basis for two things:
The language we want both the business and the data team to speak is the language of the business itself. Businesses are composed of two elementary components: entities and events.
In general, events are what the business wants to measure, and entities are how we filter or aggregate the events.
All entities and events that the business wants to ask questions about should be defined, with accompanying plain English descriptions such as: “a subscription is a period of time where a customer is receiving a service”.
This list of entities and events makes up the basis of our common language; we call it the “business glossary”.
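As a rough illustration, a business glossary can be kept as a small structured list so that every term has exactly one agreed definition. This is a minimal sketch, not a prescribed format: the `GlossaryTerm` class, the `lookup` helper, and the customer description are hypothetical, and only the subscription wording comes from the text above.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class GlossaryTerm:
    """One entry in the business glossary (hypothetical structure)."""
    name: str
    kind: Literal["entity", "event"]
    description: str  # plain-English definition agreed with the business


# Illustrative entries; the customer description is an assumption.
BUSINESS_GLOSSARY = [
    GlossaryTerm("customer", "entity",
                 "A person or organisation that buys a service."),
    GlossaryTerm("subscription", "event",
                 "A period of time where a customer is receiving a service."),
]


def lookup(name: str) -> GlossaryTerm:
    """Return the agreed definition, or fail loudly if the term is undefined."""
    for term in BUSINESS_GLOSSARY:
        if term.name == name.lower():
            return term
    raise KeyError(f"'{name}' is not in the business glossary yet")
```

Failing loudly on an unknown term is a deliberate choice here: a question that uses an undefined word is a prompt to extend the glossary before any analysis starts.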
Whenever we talk about entities and events within the data team, as well as with the business, we should ensure that:
If either of the two points is not true, then we need to align all parties until both are. If we do not, we open the door to misunderstandings further down the line.
An important artefact that builds on the business glossary is the “event matrix”. This matrix tells us which entities are associated with an event. A simple event matrix might look like this (events on the left, entities on the top):
| | Customer | Employee | Service | Date |
|---|---|---|---|---|
| Work order | ✓ | ✓ | ✓ | |
| Subscription | ✓ | ✓ | ✓ | |
| Customer movement | ✓ | ✓ | | |
The event matrix gives us an intuitive way of understanding which entities are attached to an event.
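The same matrix can also be held as a simple mapping from each event to its entities, which makes it easy to query in both directions. This is only a sketch: the ticks are read left to right from the table above, so treat the exact assignments as an assumption, and the helper names are hypothetical.

```python
# Event matrix as a mapping: event -> set of attached entities.
# Assignments are read left to right from the table and are an assumption.
ENTITIES = ["customer", "employee", "service", "date"]

EVENT_MATRIX = {
    "work order": {"customer", "employee", "service"},
    "subscription": {"customer", "employee", "service"},
    "customer movement": {"customer", "employee"},
}


def entities_for(event: str) -> list[str]:
    """Entities we can filter or aggregate this event by."""
    return sorted(EVENT_MATRIX[event])


def events_for(entity: str) -> list[str]:
    """Events that can be analysed by this entity."""
    return sorted(e for e, ents in EVENT_MATRIX.items() if entity in ents)
```

For example, `events_for("service")` lists the events that can be broken down by service, which is one quick way to tell whether a question is answerable at all.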
With the business glossary and the event matrix, we already have most of the information we need. The next part is applying this information.
We should align our data with the concepts outlined in the business glossary using two table types: facts, which represent events, and dimensions, which represent entities.
The dimensions that are linked to each fact should match the relationships in the event matrix.
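One way to keep the physical model honest is to check that each fact table links to exactly the dimensions the event matrix lists. The sketch below uses an in-memory SQLite database from Python; the table names, the `fact_`/`dim_` prefixes, and the single-row event matrix are all assumptions made for illustration.

```python
import sqlite3

# Assumed slice of the event matrix: event -> entities.
EVENT_MATRIX = {"work order": {"customer", "employee", "service"}}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY);
CREATE TABLE dim_service  (service_id  INTEGER PRIMARY KEY);
CREATE TABLE fact_work_order (
    work_order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer (customer_id),
    employee_id INTEGER REFERENCES dim_employee (employee_id),
    service_id  INTEGER REFERENCES dim_service (service_id)
);
""")


def linked_dimensions(conn: sqlite3.Connection, fact_table: str) -> set[str]:
    """Entities referenced by the fact table's foreign keys."""
    rows = conn.execute(f"PRAGMA foreign_key_list({fact_table})").fetchall()
    # Column 2 of each row is the referenced table name, e.g. 'dim_customer'.
    return {row[2].replace("dim_", "", 1) for row in rows}


def matches_event_matrix(conn: sqlite3.Connection, event: str) -> bool:
    """True when the fact table's dimensions equal the matrix row."""
    fact_table = "fact_" + event.replace(" ", "_")
    return linked_dimensions(conn, fact_table) == EVENT_MATRIX[event]
```

A check like `matches_event_matrix(conn, "work order")` could run in a test suite so that the model and the matrix cannot drift apart unnoticed.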
We should gather requirements using the language of the business - the business glossary. We should frame these requirements in the form of questions that the business needs answered. Examples include:
Because of the artefacts we discussed earlier, we can very quickly understand two things:
By gathering requirements in this way we are able to ensure that the business will get what they are expecting, and increase stakeholder satisfaction.
At the beginning I mentioned that we can use these techniques to communicate the effort required to perform analysis. We can achieve this by doing three things:
By doing this, we can prioritise ingestion of new sources as the business starts asking questions about entities and events that are contained in those sources. We can also communicate to the business that effort to answer a question may be low if the source is already ingested - or high if not.
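To make the low-or-high effort signal concrete, here is a minimal sketch that assumes each glossary term is mapped to the source system that holds it, and that we track which sources are already ingested. The source names and the `effort` function are hypothetical.

```python
# Hypothetical mapping from glossary terms to the source that contains them.
TERM_SOURCE = {
    "customer": "crm",
    "subscription": "billing",
    "work order": "field_service_app",
}

# Hypothetical set of sources that are already ingested.
INGESTED_SOURCES = {"crm", "billing"}


def effort(terms: list[str]) -> tuple[str, list[str]]:
    """Return ('low', []) if every needed source is ingested,
    otherwise ('high', [sources still to ingest])."""
    needed = {TERM_SOURCE[t] for t in terms}
    missing = sorted(needed - INGESTED_SOURCES)
    return ("low", []) if not missing else ("high", missing)
```

A question about customers and subscriptions would come back as low effort under these assumptions, while one involving work orders would be high effort and would name the source that needs ingesting first.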
Another improvement that can help the business ask more precise questions is including a set of available “attributes” along with each entity and event in the business glossary. Attributes for a customer could include a customer's age, gender, and region. Just as before, we should make sure to clearly define these attributes to reduce any misalignment.
Making this information available to the business makes communication much simpler, as they can quickly understand which questions they can ask, such as:
These attributes should be physically represented by columns in the dimensions we mentioned earlier.
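As a small worked example, suppose the business asks how many subscriptions there are per customer region. With region stored as a column on the customer dimension, the question becomes a join and a group-by. The table names and the sample rows below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Customer attributes live as columns on the dimension.
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    age INTEGER,
    gender TEXT,
    region TEXT
);
CREATE TABLE fact_subscription (
    subscription_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer (customer_id)
);
-- Made-up sample data.
INSERT INTO dim_customer VALUES
    (1, 34, 'F', 'North'), (2, 51, 'M', 'South'), (3, 28, 'F', 'North');
INSERT INTO fact_subscription VALUES (10, 1), (11, 2), (12, 3), (13, 3);
""")

# Count the event (subscription), aggregated by an entity attribute (region).
rows = conn.execute("""
    SELECT c.region, COUNT(*) AS subscriptions
    FROM fact_subscription AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(rows)  # [('North', 3), ('South', 1)]
```

The query follows the pattern from earlier in the post: the event is what gets measured, and the entity's attribute is how it is aggregated.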
Let's imagine that we've:
We're now in a position where: