Data Architect
Certification Guide

The Data Architect Certification is a credential developed for Salesforce professionals who have experience in designing Data Architecture and Management solutions on the Salesforce platform and are looking to verify their expertise. Hands-on experience with the product is particularly important for this certification, as it is designed for professionals who can architect a solution for a given customer scenario.

* NOTE: In December 2021, Salesforce renamed the “Data Architecture and Management Designer” certification to the “Data Architect” certification. The exam content remained unchanged. You can learn more about these changes in the official Salesforce release note.

Key Facts

The exam is made up of 60 multiple choice questions

105 minutes to complete

The passing score is 58%

There are no prerequisites

Cost is USD $400 and the retake fee is USD $200 if you are unsuccessful

This information will assist you if you’re interested in becoming certified as a Data Architect and includes an overview of the core topics in the exam.

The Data Architect exam covers 6 topics. Data Modeling / Database Design and Salesforce Data Management carry the highest weighting, at 25% each. Because these two topics together account for half of the exam, they are the areas you must focus on to do well.

Objective	Weighting
Data Modeling / Database Design	25%
Salesforce Data Management	25%
Large Data Volume Considerations	20%
Data Migration	15%
Data Governance	10%
Master Data Management	5%

Data Architecture and Management Designer Topic Weighting Chart
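As a rough planning aid, the weightings can be turned into approximate question counts. The sketch below uses the figures from the Key Facts and the weighting table, and assumes that all 60 questions are scored and that questions are split in proportion to the weightings; the exam outline quoted here confirms neither assumption.

```python
import math

# Figures from the Key Facts and the weighting table above.
TOTAL_QUESTIONS = 60
PASSING_PERCENT = 58
WEIGHTS = {
    "Data Modeling / Database Design": 25,
    "Salesforce Data Management": 25,
    "Large Data Volume Considerations": 20,
    "Data Migration": 15,
    "Data Governance": 10,
    "Master Data Management": 5,
}


def approx_questions(weights, total):
    """Approximate questions per topic, assuming a proportional split."""
    return {topic: round(total * pct / 100) for topic, pct in weights.items()}


def min_correct_answers(total, passing_percent):
    """Smallest whole number of correct answers that reaches the passing score."""
    return math.ceil(total * passing_percent / 100)


if __name__ == "__main__":
    for topic, count in approx_questions(WEIGHTS, TOTAL_QUESTIONS).items():
        print(f"{topic}: about {count} questions")
    print("Correct answers needed:", min_correct_answers(TOTAL_QUESTIONS, PASSING_PERCENT))
```

Under these assumptions, the two 25% topics account for about 30 of the 60 questions.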

Data Architect Certification Contents

The following are the core topic areas of the Data Architect certification and what you’re expected to know:

Data Modeling / Database Design

The data modeling / database design topic is one of the two largest sections of the exam.

The first objective requires you to compare and contrast various techniques for designing a Lightning Platform data model. This includes how to use standard and custom objects, standard and custom fields, different types of relationships and object features that are available as part of the platform. A data model typically includes standard and custom objects, relationships among those objects, fields, and other features such as record types. An entity relationship diagram can be utilized to visualize the data model. Metadata API can be used to retrieve, deploy, create, update or delete customization information.

The second objective is related to designing a data model that is scalable and obeys the current security and sharing model. There are various features in Salesforce that support different business processes, such as person accounts, which can be utilized to store information about individual customers. External objects can be used to make external data visible in Salesforce. Picklist fields allow users to select a value from a predefined list of values, which helps keep data consistent. It is also important when designing a data model and storing data in Salesforce to ensure that there is sufficient storage space.

The next objective is to compare and contrast various techniques, approaches, and considerations for capturing and managing business and technical metadata (for example, business dictionary, data lineage, taxonomy, data classification). Various approaches can be utilized for metadata documentation. A data dictionary can consist of entity relationship diagrams (ERDs). Data taxonomy can be used for classifying data into categories and subcategories and using common terminologies. Defining a data lineage includes specifying the data origin, how data are affected, and how the records move within the lifecycle. Data classification can be used to identify unstructured metadata and categorize them based on security controls and risk levels.

You need to understand how to compare and contrast the different reasons for implementing Big Objects vs Standard/Custom objects within a production instance, alongside the unique pros and cons of utilizing Big Objects in a Salesforce data model. Big objects are typically used for storing large volumes of data, such as hundreds of millions or even billions of records. They can be used to retain records in Salesforce for compliance or auditing purposes. However, special considerations are required for creating, populating and accessing big object records. They do not support all the field types, unlike sObjects. An index is created when creating a new big object, which is used to query the records. Although records of a big object can be queried using SOQL, not all operations are supported. Async SOQL can be used to query the records in the background.
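One practical consequence of the big object index is that queries are expected to filter on the index fields in the order they were defined, starting from the first field and without skipping any. That restriction is an assumption here, since the text above does not spell it out, and the field names are hypothetical. A minimal sketch of the check:

```python
# Hypothetical composite index of a big object, listed in index order.
INDEX_FIELDS = ["Account__c", "Game_Platform__c", "Play_Date__c"]


def uses_index_prefix(filter_fields, index_fields=INDEX_FIELDS):
    """Return True if the filtered fields form a gap-free prefix of the index.

    Assumed rule: a query must start at the first index field and may not
    skip a field between the first and last field it filters on.
    """
    n = len(filter_fields)
    return 0 < n <= len(index_fields) and list(filter_fields) == list(index_fields[:n])


if __name__ == "__main__":
    print(uses_index_prefix(["Account__c", "Game_Platform__c"]))
    print(uses_index_prefix(["Account__c", "Play_Date__c"]))
```

The first call filters on a prefix of the index, while the second skips the middle field and would be rejected under the assumed rule.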

The last objective requires you to understand approaches and techniques to avoid data skew, including record locking, sharing calculation issues, and excessive child to parent relationships. Data skew occurs when too many records are related to a parent record or owner in Salesforce. There are three types of data skew, namely, account data skew, lookup skew, and ownership skew. It can cause issues such as lock exceptions and long-running sharing calculations. However, there are approaches and techniques to reduce or avoid them.
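Before a migration or redesign, it can help to check an exported data set for skew. The sketch below counts child records per parent (or per owner) and flags any that exceed a threshold. The 10,000-record threshold is a commonly cited guideline and is an assumption here, not a figure from the text above.

```python
from collections import Counter

# Assumed guideline: more than 10,000 children per parent or owner suggests skew.
SKEW_THRESHOLD = 10_000


def find_skewed_parents(parent_ids, threshold=SKEW_THRESHOLD):
    """Return {parent_id: child_count} for parents with more children than threshold.

    parent_ids is an iterable holding the parent (or owner) ID of each child record.
    """
    counts = Counter(parent_ids)
    return {pid: n for pid, n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Tiny illustrative data set with a lowered threshold.
    sample = ["001A"] * 5 + ["001B"] * 2
    print(find_skewed_parents(sample, threshold=3))
```

The same function covers account skew, lookup skew, and ownership skew, depending on which ID column is passed in.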

Master Data Management

There are 4 objectives in the Master Data Management section.

The first objective is to compare and contrast the various techniques, approaches, and considerations for implementing Master Data Management Solutions. An MDM solution requires choosing an implementation style, such as registry, consolidation, coexistence or transaction. Data survivorship techniques can be utilized to determine the best candidates for the surviving records. A matching policy can be utilized to determine how the records should be matched. Canonical modeling can be used for communication between different enterprise systems. Furthermore, a typical MDM solution should have certain hierarchy management features.

The second objective is, given a customer scenario, to recommend and use techniques for establishing a "golden source of truth"/"system of record" for the customer domain. When it comes to an MDM implementation, it is necessary to outline the golden record or the source of truth and define the system of record for different types of data elements. When there are multiple enterprise systems and data integrations, stakeholders can be brought together and data flows can be reviewed to determine the system of record for different objects and fields. It is important to review the flow of data from one system to another to determine which system should act as the system of record for a given type of record or data element when it is modified.

The third objective is, given a customer scenario, to recommend approaches and techniques for consolidating data attributes from multiple sources. When using an MDM solution, it is necessary to consider how different types of data attributes, such as field values, should be consolidated to create the master record. Data survivorship rules should be established to determine which field value from a particular data source should survive during consolidation of two records. Factors and criteria can be defined for data survivorship.
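Survivorship rules of this kind can be illustrated with a small field-level merge. The sketch below is one possible rule set, not a prescribed one: the most trusted source wins, ties go to the most recently updated record, and empty values never survive. The source names, their ranking, and the record layout are hypothetical.

```python
from datetime import date

# Hypothetical source ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"ERP": 0, "Salesforce": 1, "Marketing": 2}


def build_master_record(records):
    """Merge records field by field into a single master record.

    Records are ranked by source trust, then by most recent update.
    For each field, the first non-empty value in that ranking survives.
    """
    ranked = sorted(
        records,
        key=lambda r: (SOURCE_PRIORITY[r["source"]], -r["updated"].toordinal()),
    )
    master = {}
    for record in ranked:
        for field, value in record["fields"].items():
            if value not in (None, "") and field not in master:
                master[field] = value
    return master


if __name__ == "__main__":
    records = [
        {"source": "Marketing", "updated": date(2024, 5, 1),
         "fields": {"email": "a@example.com", "phone": "555-0100"}},
        {"source": "ERP", "updated": date(2023, 1, 15),
         "fields": {"email": "", "billing_city": "Berlin"}},
        {"source": "Salesforce", "updated": date(2024, 2, 10),
         "fields": {"email": "a.smith@example.com", "phone": ""}},
    ]
    print(build_master_record(records))
```

In this example the billing city comes from the ERP record, the email from Salesforce (the ERP value is empty), and the phone number from Marketing.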

The final objective is, given a customer scenario, to recommend appropriate approaches and techniques to capture and maintain customer metadata to preserve traceability and establish a common context for business rules. Salesforce provides various features for capturing metadata, such as Event Monitoring for user events. Setup Audit Trail can be used to view and download changes made by users in Setup. Field History Tracking allows tracking of new and old field values. Field Audit Trail allows defining a data retention policy for field history data. Furthermore, custom metadata types and custom settings can be created to store custom configuration information specific to business requirements.

Salesforce Data Management

There are 4 objectives in the Salesforce Data Management section.

The first objective requires you to be able to recommend an appropriate combination of Salesforce license types to effectively leverage standard and custom objects to meet business needs. A license may provision functionality at the org level, such as product licenses, or at the user level, such as user licenses, or it may supplement a main license with additional features and services, such as add-ons and feature licenses. The availability of the different standard objects and features, the number of custom objects, and how they can be accessed or utilized in an org all depend on the licenses used. To effectively meet the different business needs of each type of entity and user, one must know which licenses to use or combine in an org, including licenses for internal users and licenses for external users in a community.

The second objective is related to recommending techniques to ensure data is persisted in a consistent manner. This includes understanding data quality issues and techniques to improve data quality. Data quality issues can exist along various dimensions, including age, accuracy, completeness, consistency, duplication, and usage. These issues can cause missing insights, wasted time and resources, poor customer service, and reduced adoption by users. Techniques such as using the duplicate management features, validation rules, data cleansing, standardization, and dependent picklists can be utilized. Validation rules can be utilized to ensure that users enter the correct data and use the correct format. Workflow rules can be used for automatic field updates on records. Approval processes can be used to allow users to submit records for approval. Duplicate and matching rules can be used to prevent the creation of duplicate records and show duplicate alerts to users.
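The idea behind a matching rule can be sketched outside the platform as well. The example below normalizes a few fields into a match key and flags incoming records whose key already exists. It uses exact matching after normalization only, which is simpler than the matching methods available in Salesforce duplicate management, and the rule itself (email first, otherwise name plus company) is a hypothetical choice.

```python
def match_key(record):
    """Build a normalized key for matching (hypothetical rule).

    Uses the lowercased, trimmed email when present; otherwise falls back
    to the lowercased name and company with whitespace collapsed.
    """
    email = (record.get("email") or "").strip().lower()
    if email:
        return ("email", email)
    name = " ".join((record.get("name") or "").lower().split())
    company = " ".join((record.get("company") or "").lower().split())
    return ("name_company", name, company)


def find_duplicates(existing, incoming):
    """Return the incoming records whose match key already exists."""
    seen = {match_key(r) for r in existing}
    return [r for r in incoming if match_key(r) in seen]


if __name__ == "__main__":
    existing = [{"email": "Ann@Example.com", "name": "Ann Lee"}]
    incoming = [
        {"email": " ann@example.com ", "name": "A. Lee"},
        {"email": "bob@example.com", "name": "Bob Ray"},
    ]
    print(find_duplicates(existing, incoming))
```

Only the first incoming record is flagged, because its email matches an existing record once case and surrounding whitespace are ignored.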

The next objective is related to understanding the techniques that can be used to represent a single view of the customer on the Salesforce platform when there are multiple systems of interaction. A single view of the customer refers to a unified 360-degree view of each customer. It can be achieved by consolidating data from multiple sources in an MDM hub using integration, performing data cleansing operations such as deduplication, and enriching the data using solutions like Lightning Data. Customer 360 Data Manager is a feature offered by Salesforce that allows consolidating data from multiple Salesforce orgs and clouds and creating a unified view of each customer using global profiles. Salesforce Identity can be used to allow customers to log in once to access all the connected systems.

The final objective requires you to be able to recommend a design to effectively consolidate and/or leverage data from multiple Salesforce instances. There are various native solutions that can be used to leverage data from other instances. These include Salesforce to Salesforce, Bulk API, Salesforce Connect, and Batch Apex. A master data management (MDM) solution can be used to consolidate data in a central hub and define a single source of truth. However, data consolidation techniques need to be taken into consideration for creating a unified view of data. Customer 360 Data Manager can be used to form a single view of customer data across multiple data sources.

Data Governance

There are 2 objectives in the Data Governance section.

The first objective requires you to be able to recommend an approach for designing a GDPR-compliant data model, including the various options to identify, classify and protect personal and sensitive information. GDPR is the European Union data protection law that defines how personal information regarding EU individuals needs to be handled. The impacts of complying with GDPR include ensuring the data model can record consent and privacy preferences. Salesforce includes a number of standard objects to assist with recording preferences and consent, and additional custom objects can be added for specific requirements. Personal data and sensitive personal data are covered by GDPR and need to be identified, classified and protected. Salesforce includes a number of tools and features to assist with protecting data from unauthorized access, including Classic Encryption and Shield Platform Encryption, data masking, field-level security, sharing settings, event monitoring, and session security settings.

The second objective is to compare and contrast various approaches and considerations for designing and implementing an enterprise data governance program while taking into account a framework for defining roles and responsibilities (for example, stewardship, data custodian, etc.), policies and standards, ownership and accountability, data rules and definitions, monitoring, and measurement. A data governance plan should focus on elements such as data definitions, quality standards, roles and ownership, security and permissions, and quality control. Data stewardship includes defining the teams, roles, and activities for data quality improvement and day-to-day maintenance. Each company needs to implement a suitable data governance model based on its specific requirements. A data governance framework can be based on a centralized, decentralized or hybrid approach. A centralized approach focuses on the execution of rules, standards, policies, and procedures by a central data governance body, while a decentralized approach focuses on data quality maintenance at the individual level. A hybrid model that combines aspects of both these models can also be utilized.

Large Data Volume Considerations

There are 3 objectives in the Large Data Volume Considerations section.

The first objective requires you to be able to design a data model that scales considering large data volume and solution performance. When a large number of records need to be stored in Salesforce, old records can be archived on a regular basis to reduce the impact on performance. Custom indexes and skinny tables can be used to improve the performance of SOQL queries in custom applications. Selective filter conditions can be used in SOQL queries to improve their performance. Divisions can be used to partition the data and reduce the number of records returned by SOQL queries.
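Whether a filter condition counts as selective depends on how many records it matches relative to the size of the object. The sketch below encodes the thresholds commonly cited in Salesforce query optimization guidance: 30% of the first million records plus 15% of the remainder (capped at 1,000,000) for standard indexes, and 10% plus 5% (capped at 333,333) for custom indexes. These figures are assumptions here, since the text above does not state them, and should be checked against current documentation.

```python
def selectivity_threshold(total_records, standard_index=False):
    """Return the maximum number of matching rows for a filter to be selective.

    Uses the assumed thresholds described above; integer arithmetic avoids
    floating-point rounding on large record counts.
    """
    if standard_index:
        first_pct, rest_pct, cap = 30, 15, 1_000_000
    else:
        first_pct, rest_pct, cap = 10, 5, 333_333
    first = min(total_records, 1_000_000)
    rest = max(total_records - 1_000_000, 0)
    return min(first * first_pct // 100 + rest * rest_pct // 100, cap)


def is_selective(matching_records, total_records, standard_index=False):
    """True if a filter matching this many rows is under the threshold."""
    return matching_records <= selectivity_threshold(total_records, standard_index)


if __name__ == "__main__":
    print(selectivity_threshold(2_000_000))                       # custom index
    print(selectivity_threshold(2_000_000, standard_index=True))  # standard index
```

For an object with 2 million records, a filter on a custom-indexed field would need to match at most 150,000 records under these assumed thresholds.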

The second objective requires you to be able to recommend a data archiving and purging plan that is optimal for a customer's data storage management needs. There are various options available for archiving Salesforce data, such as using an on-platform solution like big objects or storing data off-platform in an external system or data warehouse. The Bulk API can be considered for removing large volumes of data from Salesforce. An AppExchange solution can be considered to back up data when a company has a custom business requirement that cannot be met using a native solution. Data can be removed from Salesforce and archived for the purpose of reference or reporting. Tools such as Data Loader and ETL tools can be utilized to automate the process. Other Salesforce features such as Apex triggers and batch Apex can be used to store summarized data and field value changes instead of all the records.

The third objective is related to data virtualization. Data virtualization is a data management approach that enables org users to access and interact with external data by implementing an integration solution offered by Salesforce. It is particularly useful when there are large data volumes that need to be accessed from within Salesforce, but the data does not need to be imported and stored on the platform itself. Salesforce provides different options for virtualizing data, including using Salesforce Connect, Request and Reply, and Heroku Connect. By identifying the characteristics and features of each, one will be able to choose the virtualized data option to implement in a given scenario.

Data Migration

There are 3 objectives in the Data Migration section.

The first objective requires you to recommend appropriate techniques and methods for ensuring high data quality at load time. Validation rules can be utilized to ensure that records contain the required data and use the correct format. Process Builder can be used for automatic field updates when loading records. Duplicate and matching rules can be used to prevent the creation of duplicate records. Lightning Data allows updating records with current firmographic, industry or region-specific data.

The second objective is to compare and contrast various techniques and considerations for importing data into Salesforce. The Bulk API can be used in parallel mode to minimize the data migration time when importing millions of records, but it can cause locking issues when migrating child records, which can be avoided by ordering them by the parent record IDs. Sharing rule calculations can be deferred to improve migration performance. Using ‘insert’ and ‘update’ operations is faster than using the ‘upsert’ operation. It is also important to consider other aspects related to migration, such as data storage and API limitations.
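Ordering child records by parent ID before splitting them into batches keeps the children of one parent together, so fewer parallel batches compete for a lock on the same parent record. A minimal sketch of that preparation step, assuming each row is a dictionary with a hypothetical `parent_id` key:

```python
from itertools import islice


def batches_ordered_by_parent(children, batch_size):
    """Sort child rows by parent ID, then yield fixed-size batches.

    After sorting, the children of one parent sit next to each other,
    so they land in the same batch or in adjacent batches.
    """
    ordered = sorted(children, key=lambda row: row["parent_id"])
    iterator = iter(ordered)
    while batch := list(islice(iterator, batch_size)):
        yield batch


if __name__ == "__main__":
    rows = [
        {"name": "c1", "parent_id": "001B"},
        {"name": "c2", "parent_id": "001A"},
        {"name": "c3", "parent_id": "001B"},
        {"name": "c4", "parent_id": "001A"},
        {"name": "c5", "parent_id": "001C"},
    ]
    for batch in batches_ordered_by_parent(rows, batch_size=2):
        print([row["parent_id"] for row in batch])
```

Without the sort, the first two batches in this example would each contain children of both `001A` and `001B`.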

The third objective is to compare and contrast various techniques and considerations for exporting data from Salesforce. Records can be regularly exported by using the Data Export option. When extracting more than 10 million records, PK Chunking can be utilized to avoid a full table scan of records. External ID can be used to avoid duplicates while importing records. Child records should be ordered by the parent record ID to avoid record locking errors. The Bulk API in parallel mode can be used to ensure maximum performance while exporting millions of records.
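The idea behind PK Chunking is to split one large extract into many smaller queries, each bounded by a range of primary keys. With the Bulk API the platform performs this splitting itself once the feature is enabled on the request, so the sketch below is only an illustration of the concept. It uses plain integers as keys, whereas real Salesforce record IDs are alphanumeric strings.

```python
def pk_chunks(min_id, max_id, chunk_size):
    """Yield (low, high) inclusive key ranges that cover [min_id, max_id].

    Each range can back one bounded query, so no single query has to
    scan the whole table.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        yield low, high
        low = high + 1


if __name__ == "__main__":
    for low, high in pk_chunks(1, 10, 4):
        print(f"extract keys {low} to {high}")
```

The ranges do not overlap and leave no gaps, so the combined results of all chunk queries equal the full extract.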

To prepare successfully for the certification exam, we recommend working through our Data Architect Practice Exams.


Comments

Md Alam

67%, as it seems, to pass nowadays

6 months ago

Sandeep Focus Team

Hi Md! We’ve double-checked the official exam outline for the Data Architect certification, and the passing percentage is still listed as 58%. If you’ve seen any conflicting information, feel free to share, we’d be happy to investigate further!

6 months ago

Vish Iyer

I passed the Data Architect certification in my first attempt. Thank you guys for creating such an awesome study material.

3 years ago

Sandeep

Congratulations Vish!! for passing Data Architect on the first attempt. Thank you for the kind words and for being with us on this journey. We hope to keep bringing you more updates and features in the future to continue supporting your Salesforce journey.
Congratulations again and All the best!

3 years ago
