Creating a single view of the customer across the enterprise
- Helps with customer engagement and loyalty by improving customer satisfaction and retention through personalization and targeted marketing communications.
- Helps retailers achieve higher marketing ROI by aggregating customer interactions across all channels and identifying and winning valuable new customers, resulting in increased revenues.
- Behavior Data: Customer behavior data, including the customer’s browsing and search behavior online through clickstream data, and the customer’s location if the app is location-based.
- Transactional Data: Transactional data includes online purchases, coupon utilization, in-store purchases, returns, and refunds.
- Personal Information: Personal information from online registration, in-store loyalty cards, and warranties will be collated into a single view.
- User Profile Data: Data profiling will be used as part of the matching and deduplication process to establish a Golden Record. Profile segments can be utilized to enable marketing automation. A sketch of such a consolidated customer document follows this list.
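To make the single view concrete, here is a minimal, hypothetical sketch of a consolidated customer document stored in MongoDB Atlas using pymongo. The database, collection, and field names are illustrative assumptions, not part of the reference architecture.

```python
from pymongo import MongoClient

# Connect to MongoDB Atlas (connection string is a placeholder).
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
db = client["customer360"]

# Hypothetical "golden record": one document that collates personal,
# transactional, behavioral, and profile data for a single customer.
golden_record = {
    "customer_id": "C-10042",
    "personal": {                      # from registration, loyalty cards, warranties
        "name": "Jane Doe",
        "email": "jane.doe@example.com",
        "loyalty_tier": "gold",
    },
    "transactions": [                  # online and in-store purchases, returns
        {"order_id": "O-981", "channel": "online", "amount": 74.50},
    ],
    "behavior": {                      # clickstream-derived browsing signals
        "last_searched": ["running shoes"],
        "sessions_last_30d": 12,
    },
    "segments": ["frequent_buyer"],    # populated later by the ML pipeline
}

# Upsert so repeated loads keep a single view per customer.
db.customers.update_one(
    {"customer_id": golden_record["customer_id"]},
    {"$set": golden_record},
    upsert=True,
)
```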
BigQuery is a fully managed data warehouse that is designed for running analytical processing (OLAP) at any scale. BigQuery has built-in features like machine learning, geospatial analysis, data sharing, log analytics, and business intelligence.
This integration enables customers to move and transform data from MongoDB to BigQuery for aggregation and complex analytics. They can further take advantage of BigQuery’s built-in ML and AI integrations for predictive analytics, fraud detection, real-time personalization, and other advanced analytics use cases.
This blog describes how retailers can use fully managed MongoDB Atlas and Google Cloud services to build customer 360 profiles, the architecture involved, and the reusable repository that customers can use to implement the reference architecture in their own environments.
As part of this reference architecture, we have considered four key data sources – user’s browsing behavior, orders, user demographic information, and product catalog. The diagram below illustrates the data sources that are used for building a single view of the customer, and some key business outputs that can be driven from this data.


1. Data Ingestion
In this example, we have considered four representative data sources:
- User profile data through User Profiles
- Product Catalog
- Transactional data through Orders
- Behavioral data through Clickstream Events
User profile data, product catalog, and orders data are ingested from MongoDB, and clickstream events from web server log files are ingested from CSV files stored on Cloud Storage.
The data ingestion process should support an initial batch load of historical data and dynamic change processing in near real time. Near real-time changes can be ingested using a combination of MongoDB Change Streams and Google Cloud Pub/Sub to achieve a high-throughput, low-latency design.
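As a rough sketch of this pattern, the snippet below tails a MongoDB change stream with pymongo and republishes each change event to a Pub/Sub topic. The connection string, project, topic, and collection names are placeholders, and a production pipeline would also persist resume tokens, batch publishes, and handle retries.

```python
import json

from bson import json_util
from google.cloud import pubsub_v1
from pymongo import MongoClient

# Placeholders: replace with your Atlas URI, GCP project, and topic.
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
collection = client["customer360"]["orders"]

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-gcp-project", "mongodb-changes")

# Watch the collection and forward each change event to Pub/Sub.
with collection.watch(full_document="updateLookup") as stream:
    for change in stream:
        payload = json.dumps(change, default=json_util.default).encode("utf-8")
        future = publisher.publish(topic_path, payload)
        future.result()  # block until the message is accepted by Pub/Sub
```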
2. Data Processing
The data is converted from the document format in MongoDB to the row-and-column format of BigQuery and loaded into BigQuery from MongoDB Atlas using Google Cloud Dataflow templates; the Cloud Storage Text to BigQuery Dataflow template is used to move the CSV files into BigQuery.
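A minimal sketch of launching the Google-provided Cloud Storage Text to BigQuery classic template through the Dataflow REST API is shown below. The bucket paths, schema file, UDF, and table names are placeholders, and the exact template parameters should be confirmed against the template documentation.

```python
from googleapiclient.discovery import build

project, region = "my-gcp-project", "us-central1"  # placeholders

dataflow = build("dataflow", "v1b3")
request = dataflow.projects().locations().templates().launch(
    projectId=project,
    location=region,
    # Google-provided "Cloud Storage Text to BigQuery" classic template.
    gcsPath="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
    body={
        "jobName": "clickstream-csv-to-bq",
        "parameters": {
            "inputFilePattern": "gs://my-bucket/clickstream/*.csv",
            "JSONPath": "gs://my-bucket/schemas/clickstream_schema.json",
            "javascriptTextTransformGcsPath": "gs://my-bucket/udf/transform.js",
            "javascriptTextTransformFunctionName": "transform",
            "outputTable": "my-gcp-project:customer360.clickstream_events",
            "bigQueryLoadingTemporaryDirectory": "gs://my-bucket/tmp",
        },
    },
)
response = request.execute()
print(response["job"]["id"])  # ID of the launched Dataflow job
```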
Google Cloud Dataflow templates orchestrate the data processing, and the aggregated data can be used to train ML models and generate business insights. Key analytical insights like product recommendations are brought back to MongoDB to enrich the user data.
3. AI & ML
The reference architecture leverages the advanced capabilities of Google Cloud BigQuery ML and Vertex AI. Once the data is in BigQuery, BigQuery ML lets you create and execute multiple machine learning models; for this reference architecture, we focused on the two models below, sketched in the code example that follows the list.
- K-means clustering to group data into clusters. In this case, it is used to perform user segmentation.
- Matrix Factorization to generate recommendations. In this case, it is used to create product affinity scores using historical customer behavior, transactions, and product ratings.
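The snippet below is a minimal sketch of training these two model types with BigQuery ML DDL submitted through the Python client. The dataset, table, and column names are illustrative assumptions, and hyperparameters such as the number of clusters would be tuned to the actual data.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project

# K-means model for user segmentation on aggregated behavior features.
kmeans_sql = """
CREATE OR REPLACE MODEL `customer360.user_segments`
OPTIONS (model_type = 'kmeans', num_clusters = 5) AS
SELECT total_spend, sessions_last_30d, orders_last_90d
FROM `customer360.user_features`
"""

# Matrix factorization model for product affinity / recommendations.
mf_sql = """
CREATE OR REPLACE MODEL `customer360.product_recommendations`
OPTIONS (
  model_type = 'matrix_factorization',
  user_col = 'customer_id',
  item_col = 'product_id',
  rating_col = 'rating',
  feedback_type = 'explicit'
) AS
SELECT customer_id, product_id, rating
FROM `customer360.product_ratings`
"""

for sql in (kmeans_sql, mf_sql):
    client.query(sql).result()  # wait for each training job to finish
```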
The models are registered to the Vertex AI Model Registry and deployed to an endpoint for real-time prediction.
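A hedged sketch of this deployment step with the Vertex AI SDK for Python follows. The model display name, endpoint name, and machine type are assumptions, and the exact instance format for online prediction depends on how the model was registered.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")  # placeholders

# Look up the registered model in the Vertex AI Model Registry by display name.
# (A BigQuery ML model can be registered at training time, for example with the
# model_registry = 'vertex_ai' option in CREATE MODEL.)
model = aiplatform.Model.list(filter='display_name="product_recommendations"')[0]

# Create an endpoint and deploy the model for online (real-time) prediction.
endpoint = aiplatform.Endpoint.create(display_name="customer360-recommendations")
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)

# Hypothetical online prediction request for one customer/product pair.
prediction = endpoint.predict(
    instances=[{"customer_id": "C-10042", "product_id": "P-55"}]
)
print(prediction.predictions)
```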
4. Business Insights
Using the content provided in the GitHub repo, we showcase the analytics capabilities of Looker, which integrates seamlessly with the aggregated data in BigQuery and MongoDB, providing advanced data visualizations that enable business users to slice and dice the data and look for emerging trends. The included dashboards contain insights from MongoDB, from BigQuery, and from combining the data from both sources.
The detailed implementation steps, sample datasets, and the GitHub repository for this reference architecture are available here.
There are many reasons to run MongoDB Atlas on Google Cloud, and one of the easiest is our self-service, pay-as-you-go listing on Google Cloud Marketplace. Please give it a try and let us know what you think. Also, check out this blog to learn how Luckycart handles large volumes of data and carries out the complex computations it requires to deliver ultra-personalized activations for its customers using MongoDB and Google Cloud.
We thank the many Google Cloud and MongoDB team members who contributed to this collaboration. Thanks to the team at PeerIslands for their help with developing the reference architecture.
By: Venkatesh Shanbhag (Solutions Architect, MongoDB) and Maruti C (Solutions Architect, Google Cloud)
Source: Google Cloud Blog