Supporting Event and Weather-based Data Analytics and Marketing along the Shopper Journey
Updated: May 11
Many companies operating in the eCommerce, Retail, Customer Relationship Management (CRM) and Digital Marketing industries collect large amounts of data about customers at different touch points across the so-called consumer journey. Data analytics provides a powerful means to gain customer insights, but its effectiveness depends on the data it is fed. Data collected by individual companies often provide only a partial view of the customer journey, and analytical models often neglect factors that have an important impact on customers’ decisions.
In this blog post we share some experiences from our work in the Horizon 2020 research and innovation project EW-Shopp, and show how you can also use the data processing toolkit developed as part of the project. EW-Shopp aims to help companies gain deeper customer insights by developing analytical services that use rich models, which also consider events that affect customer decisions, such as weather, marketing campaigns, and holidays. The main goal of the project was to develop a toolkit that facilitates all the data processing steps required to build reliable weather- and event-based data-driven services, including data preparation and enrichment, analytics, and visualization.
Building a knowledge graph using the EW-Shopp toolkit – DataGraft, Grafterizer, ASIA and ABSTAT
Did you know that data preparation is often estimated to account for about 80% of the work of data scientists? Preparing and transforming large amounts of data from a raw tabular format to semantically enriched data can be time-consuming and difficult. Most data scientists also find this task to be one of the least enjoyable. Moreover, integrating business data in EW-Shopp with event and weather data requires specific knowledge about the content of the knowledge graph and about how to map data schemas to shared vocabularies that can enrich the data.
To ease this process of preparing and enriching data, three tools have been developed as part of EW-Shopp to assist users in:
Grafterizer – Cleaning and transforming business data from tabular format to linked data
ASIA – Enriching tabular business data with events and weather data that semantically enhance the content of a knowledge graph
ABSTAT – Understanding the content of the knowledge graphs (the linked product data) by providing statistical profiles and data quality insights
DataGraft is also part of the EW-Shopp toolkit. DataGraft is a platform that provides a collection of tools for integrated management of data transformations and for hosting of and access to graph data. It is organized as a set of cloud services that are delivered through the DataGraft portal. Grafterizer is one of those tools.
Now let's have a closer look at how the different tools can contribute to more effective work processes and free up time for data scientists to focus on more important tasks such as data analysis. After all, this is where we want to spend more time in EW-Shopp, to really understand how events and weather can be used to target marketing along the shopper journey.
Our focus has been on providing users with an integrated solution that can clean and prepare data, semantically enrich data, and give useful insights about data quality. The result is a data preparation and enrichment service that combines all three functionalities in one user interface. That is three needs met by one solution. The process of onboarding data to the knowledge graph starts with cleaning and transforming the raw tabular data to a schema and format that can be mapped to a data model.
By selecting the first tab of the user interface, you will see that Grafterizer features interactive specification of data transformations along with a back-end for managing and executing them. Transformation steps on rows (add, drop, filter, delete duplicates, etc.), columns (add, drop, rename, merge, etc.) and the entire data set (sort, aggregate, etc.) are provided, together with a visualization of the result after each step. To further assist the user in understanding the data, we have added visual data profiling capabilities that analyse and determine data quality based on the statistical properties, semantics and structure of the data. The data quality assessment is presented to the user by means of statistical and scientific charts and visualizations.
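To make the three kinds of transformation steps concrete, here is a minimal sketch in plain Python. It is not Grafterizer code and does not use its API; the sample rows, column names and helper functions are all hypothetical, chosen only to show one row-level, one column-level and one data-set-level step applied in sequence.

```python
from collections import defaultdict

# Hypothetical raw sales rows, as they might arrive from a CSV export.
raw_rows = [
    {"sku": "A1", "city": "Oslo ", "qty": "3", "day": "2018-05-01"},
    {"sku": "A1", "city": "Oslo ", "qty": "3", "day": "2018-05-01"},  # exact duplicate
    {"sku": "B2", "city": "Milan", "qty": "0", "day": "2018-05-01"},
    {"sku": "B2", "city": "Milan", "qty": "5", "day": "2018-05-02"},
]

def drop_duplicates(rows):
    """Row step: keep the first occurrence of each identical row."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def filter_rows(rows, predicate):
    """Row step: keep only rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def rename_column(rows, old, new):
    """Column step: rename a column in every row."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]

def aggregate_sum(rows, group_by, value):
    """Data set step: sum an integer column per group, sorted by group key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_by]] += int(row[value])
    return [{group_by: k, value: totals[k]} for k in sorted(totals)]

rows = drop_duplicates(raw_rows)
rows = filter_rows(rows, lambda r: int(r["qty"]) > 0)
rows = rename_column(rows, "qty", "quantity")
rows = [{**r, "city": r["city"].strip()} for r in rows]  # tidy whitespace
summary = aggregate_sum(rows, "city", "quantity")
```

Each step takes a list of rows and returns a new list, so intermediate results can be inspected after every step, which is the same idea as the per-step preview described above.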
After the user has finished cleaning and transforming the business data, the time has come to transform the tabular data to a graph format that defines the semantic relations and properties in the knowledge graph. In the RDF Mapping tab, the tabular data prepared in the first step of the process can easily be mapped to a graph format by building a tree structure of RDF triples.
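The mapping from a table row to RDF triples can be sketched as follows. This is an illustrative example only, assuming a hypothetical `http://example.org/` namespace and vocabulary; it does not reproduce the mapping format Grafterizer uses internally. Each row becomes the root (the subject), and each mapped column becomes a branch (a predicate with an object), serialized here as N-Triples.

```python
BASE = "http://example.org/"          # hypothetical namespace for resources
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri(value):
    """Wrap a string as an N-Triples IRI term."""
    return f"<{value}>"

def literal(value):
    """Wrap a value as a plain literal, escaping backslashes and quotes only."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def row_to_triples(row):
    """Map one table row to (subject, predicate, object) triples."""
    subject = iri(f"{BASE}product/{row['sku']}")
    return [
        (subject, iri(RDF_TYPE), iri(f"{VOCAB}Product")),
        (subject, iri(f"{VOCAB}name"), literal(row["name"])),
        (subject, iri(f"{VOCAB}soldIn"), iri(f"{BASE}city/{row['city']}")),
    ]

def to_ntriples(rows):
    """Serialize all rows as N-Triples, one triple per line."""
    lines = []
    for row in rows:
        for s, p, o in row_to_triples(row):
            lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)

rows = [{"sku": "A1", "name": 'Rain "Pro" jacket', "city": "Oslo"}]
ntriples = to_ntriples(rows)
```

A production mapping would also need to escape control characters in literals and percent-encode values placed inside IRIs; the sketch skips both for brevity.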
Since an important aspect of EW-Shopp is the integration of business data with event and weather data, ASIA provides an interface that guides the user through the semantic annotation, reconciliation and enrichment of the tabular data. Semantic annotations are used to generate mappings from the table to a knowledge graph (in RDF or in the ArangoDB JSON format) using one or more vocabularies. ASIA adopts a column-wise approach to semantic annotation, allowing users to define annotations based on smart suggestions provided by the tool. Currently ASIA incorporates suggestions from ABSTAT, the knowledge graph profiling tool, but it can be configured to use other terminology recommendation services, such as those based on LOV (https://lov.linkeddata.es/dataset/lov/).
The profiles created by ABSTAT, also called summaries, describe the content of RDF datasets concisely, and have proved helpful in a variety of application domains such as data understanding, quality assessment, analytical modelling, and vocabulary suggestion.
Moreover, ASIA links data values to shared systems of identifiers, which enables the extraction of additional data from third-party sources and their fusion into the original tabular data. ASIA supports both schema-level and instance-level annotations of a table. The illustration below shows schema-level linking.
Finally, Grafterizer, through the ASIA tool, now enables users to reconcile and extend data in various ways using knowledge graphs (GeoNames, Google GeoTargets, Wikifier) and weather data repositories (ECMWF). More additions and extensions to this feature will be coming in the future!
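The reconcile-then-extend idea described in this section can be sketched in a few lines of Python. This is a simplified illustration, not ASIA's implementation: the reference table, the identifiers (placeholders, not real GeoNames IDs), the weather values and the function names are all hypothetical, and real reconciliation services typically use fuzzy matching rather than the exact lookup shown here.

```python
# Hypothetical reference table: normalised label -> shared identifier.
# The identifiers are placeholders, not real GeoNames IDs.
CITY_IDS = {"oslo": "city:001", "milan": "city:002"}

# Hypothetical weather observations keyed by (city identifier, ISO date).
WEATHER = {
    ("city:001", "2018-05-01"): {"temp_c": 9.5, "rain_mm": 4.0},
    ("city:002", "2018-05-01"): {"temp_c": 21.0, "rain_mm": 0.0},
}

def reconcile(rows, column, reference):
    """Instance-level step: add '<column>_id' by exact match on a normalised label."""
    out = []
    for row in rows:
        key = row[column].strip().lower()
        out.append({**row, f"{column}_id": reference.get(key)})
    return out

def extend_with_weather(rows, id_column, date_column, weather):
    """Extension step: join weather fields on the reconciled identifier and date."""
    out = []
    for row in rows:
        obs = weather.get((row[id_column], row[date_column]), {})
        out.append({**row, "temp_c": obs.get("temp_c"), "rain_mm": obs.get("rain_mm")})
    return out

sales = [
    {"sku": "A1", "city": " Oslo", "day": "2018-05-01"},
    {"sku": "B2", "city": "Milan", "day": "2018-05-01"},
    {"sku": "C3", "city": "Atlantis", "day": "2018-05-01"},  # no match expected
]
reconciled = reconcile(sales, "city", CITY_IDS)
enriched = extend_with_weather(reconciled, "city_id", "day", WEATHER)
```

Rows that cannot be reconciled keep `None` in the identifier and weather columns, so they stay visible for manual review instead of being silently dropped.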
Managing heterogeneous data using the EW-Shopp toolkit with ArangoDB support
DataGraft originally targeted RDF data stored in triple stores using GraphDB. As part of the EW-Shopp toolkit, its services have now been extended to support transformation and hosting of graph data in the ArangoDB multi-model store. In ArangoDB, node data in tabular form and edge data (graph relationships) can be stored and queried in the same database.
Transforming data into the ArangoDB graph format differs from transforming it for a standardised triple store because of the database's data model. Grafterizer can now produce the transformed collection data (i.e., node and edge collections) in JSON format, which can be downloaded and stored directly in ArangoDB. The DataGraft portal can now manage ArangoDB database instances by using administrative credentials to a database. The user registers these login credentials and the databases themselves in the ArangoDB Web interface and copies the details into the DBMS admin page.
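As a rough illustration of what node and edge collections look like, the sketch below builds them from tabular rows in plain Python. It relies on ArangoDB's standard document attributes (`_key` on documents, `_from` and `_to` on edges, the latter in `collection/key` form); the collection names, column names and the exact JSON layout are assumptions and may differ from what Grafterizer actually emits.

```python
import json

def build_collections(rows):
    """Split tabular rows into two node collections and one edge collection."""
    products, cities, sold_in = {}, {}, []
    for row in rows:
        city_key = row["city"].strip().lower()
        # Node documents: keyed by _key, de-duplicated across rows.
        products[row["sku"]] = {"_key": row["sku"], "name": row["name"]}
        cities[city_key] = {"_key": city_key, "label": row["city"].strip()}
        # Edge document: _from and _to reference "<collection>/<_key>".
        sold_in.append({
            "_from": f"products/{row['sku']}",
            "_to": f"cities/{city_key}",
            "quantity": row["quantity"],
        })
    return {
        "products": list(products.values()),
        "cities": list(cities.values()),
        "sold_in": sold_in,
    }

rows = [
    {"sku": "A1", "name": "Rain jacket", "city": "Oslo", "quantity": 3},
    {"sku": "A1", "name": "Rain jacket", "city": "Milan", "quantity": 1},
]
collections = build_collections(rows)
products_json = json.dumps(collections["products"], indent=2)
```

Keeping nodes and edges in separate collections is what distinguishes this layout from a triple store, where every statement has the same subject, predicate and object shape.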
Two sets of user credentials are handled: full access and read access (these are automatically generated by DataGraft). Full access to the database (read and write) is available only to the asset owner, while read-only access can be used for public ArangoDB databases in DataGraft (i.e., when exposing a database as a public asset on DataGraft). Using the DataGraft asset creation and editing features, users can now upload JSON collections (either ones produced by Grafterizer or others) directly to their managed ArangoDB instances, and provide metadata, descriptions and other details.