Salesforce continues to grow at a 30% rate year-over-year. As a result, our data silos continue to increase across the company, especially when companies are acquired and incorporated into the overall Salesforce ecosystem. In order to continue to support a $20B company by the year 2020, our internal data integration infrastructure and architecture will need to be modernized to support the volume of business needs while curbing operating expenses. Our vision is to connect the business to data it can trust, when and where it is needed, so that the enterprise can continue to scale and innovate. This document describes in detail how this will be achieved.
Target End State
The diagram below depicts the future end state of a modernized data integration platform. Data integration is about data movement or transporting data from one endpoint to another. To really enable the business to scale quickly, we see a platform that delivers data as a service or Data-as-a-Service (DaaS) to our end users. If you look at the direction of the integration industry, it’s really moving to real-time data access, easy-to-use REST-based API services and customer self-service. This end state positions the company to take advantage of existing standards, quickly adopt emerging standards (e.g. OData), and ultimately enable the business to move faster and find new opportunities through our data.
This diagram depicts the end-to-end lifecycle for the data, from publisher to consumer. Taking a holistic approach helps to understand how data will be transported to the consumers. Let’s peel the onion back and look at each layer of the Data Delivery Platform in more depth.
ESB & SOA
This layer is core to the entire platform and is where most of the magic happens. Using an ESB- and SOA-based architecture, we can resolve many of the issues related to a point-to-point integration architecture. For example, this layer is responsible for managing connectivity to source and target endpoints, real-time data routing and publishing, data aggregation from multiple sources, and decoupling systems from one another. The diagram below depicts how data integrations are built using this layer.
Let’s double click on this technical design.
As you can see from the diagram, the technical design calls for ESB Adapters: one on the source and one on the target endpoint. These adapters operate independently from each other and encapsulate logic for only their specific responsibility. For example, if employee data is required to be retrieved from Workday, a Cloud-based HRMS, a source adapter is created with the specific functionality to connect and query employees from Workday, transform each employee record into our Canonical Data Model (CDM), and then drop each record on to a Message Exchange (ME) for consumption by one or more target adapters.
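The source adapter's three responsibilities (query, transform to CDM, publish to the ME) can be sketched as follows. This is a minimal illustration only; the class and attribute names (`WorkdaySourceAdapter`, `CanonicalWorker`, `Employee_ID`) are assumptions for the example, not real Workday APIs.

```python
from dataclasses import dataclass


@dataclass
class CanonicalWorker:
    """Canonical Data Model record; field names are illustrative assumptions."""
    worker_id: str
    legal_last_name: str


class WorkdaySourceAdapter:
    """Encapsulates Workday-specific logic: query, transform to CDM, publish."""

    def __init__(self, exchange):
        self.exchange = exchange  # the Message Exchange this adapter publishes to

    def transform(self, workday_record: dict) -> CanonicalWorker:
        # Map source-specific attributes into the Canonical Data Model.
        return CanonicalWorker(
            worker_id=workday_record["Employee_ID"],
            legal_last_name=workday_record["Legal_Last_Name"],
        )

    def publish(self, workday_records) -> None:
        # Drop each transformed record onto the exchange for target adapters.
        for record in workday_records:
            self.exchange.append(self.transform(record))
```

Because the Workday-specific details live only inside this one adapter, no target system ever sees a Workday attribute name.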
Canonical Data Model
The CDM is a key component in the overall architecture since its core responsibility is to decouple the source system from the target system, providing improved flexibility to the enterprise and reducing overall maintenance costs. This works by mapping source system attributes to a common data format (and hopefully data definitions) for the enterprise. For example, continuing with our employee example, Workday uses <Legal_Last_Name> to denote the employee’s last name. However, Microsoft Active Directory uses <givenName>. In a typical P2P architecture, the developer would simply map <Legal_Last_Name> to <givenName> within the integration transformation. Unfortunately, this approach has tightly coupled or bound the two systems together. What happens when the organization decides to move to another HRMS application, say moving from Workday to Oracle HCM? In this case, every P2P integration that relies on Workday using the <Legal_Last_Name> attribute will need to be replaced. For large organizations this can be a very costly and complex endeavor.
In contrast, by inserting and using a CDM, the problem can be reduced significantly. With a CDM, the mapping now would look like <Legal_Last_Name> to <Worker_Legal_Last_Name> to <givenName>. Since <givenName> is mapped to <Worker_Legal_Last_Name>, no changes are needed on the target side, and the developer can focus on only remapping the new <Legal_Last_Name> to <Worker_Legal_Last_Name>. Furthermore, since all other integrations rely on the CDM mapping (i.e. <Worker_Legal_Last_Name>), the developer does not need to replace any other integration, further reducing the cost and time associated with this business application change.
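The two-hop mapping described above can be sketched with a pair of mapping tables. The attribute names come from the example in the text, not from real Workday or Active Directory schemas, and real adapters would carry many more fields.

```python
# Source adapter half: Workday attributes -> CDM attributes.
SOURCE_TO_CDM = {"Legal_Last_Name": "Worker_Legal_Last_Name"}

# Target adapter half: CDM attributes -> Active Directory attributes.
CDM_TO_TARGET = {"Worker_Legal_Last_Name": "givenName"}


def to_cdm(source_record: dict) -> dict:
    # Translate a source record into the Canonical Data Model.
    return {SOURCE_TO_CDM[k]: v for k, v in source_record.items() if k in SOURCE_TO_CDM}


def to_target(cdm_record: dict) -> dict:
    # Translate a CDM record into the target system's attributes.
    return {CDM_TO_TARGET[k]: v for k, v in cdm_record.items() if k in CDM_TO_TARGET}
```

Replacing Workday with Oracle HCM only requires a new `SOURCE_TO_CDM` table; `CDM_TO_TARGET`, and every other target mapping built against the CDM, remains untouched.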
The Message Exchange (ME) responsibility is to broker messages to and from source and target adaptors in real-time. Continuing with our previous employee example, a single employee update (i.e. title change for employee id 100047) is dropped on to the ME by the source adaptor and routed to the appropriate subscribing target adaptors. Unlike P2P architectures, new target adaptors can be added as subscribers without building a new source adaptor, reducing net-new capital investments. More importantly, the ME provides guaranteed delivery of the message. That means if for whatever reason a subscribing target adaptor cannot update the target endpoint (system down for maintenance, API user locked out, etc.) the message is not lost, but instead re-queued to try at a later time; if all retries have been exhausted, it is placed into a manual queue to be reviewed and resolved by a support engineer. This is fundamentally different from a P2P architecture, where after a failed update attempt the integration will need to be re-initiated, processing either the entire data set or a partial data set. In either case, the data movement happens again.
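The guaranteed-delivery behavior can be sketched as below. This is a deliberately simplified illustration: a real broker re-queues messages on a timer and persists them, whereas this sketch retries immediately in-process; the names are assumptions.

```python
class MessageExchange:
    """Simplified broker sketch: retry delivery, then park the message."""

    def __init__(self, max_retries: int = 3):
        self.max_retries = max_retries
        self.manual_queue = []  # exhausted messages, reviewed by a support engineer

    def deliver(self, message: dict, subscribers) -> None:
        for subscriber in subscribers:
            for _attempt in range(self.max_retries):
                try:
                    subscriber(message)  # e.g. a target adapter updating its endpoint
                    break
                except Exception:
                    continue  # system down for maintenance, API user locked out, etc.
            else:
                # Retries exhausted: the message is parked, never lost.
                self.manual_queue.append((subscriber, message))
```

The key property is that a failing subscriber never forces the source to re-extract and re-send the data set, unlike the P2P case.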
Let’s recap the core concepts before we move on. First, the encapsulation of logic in each adaptor. This in itself allows changes to core functionality to happen in a single location, rather than in multiple integration source code branches as seen in a typical P2P integration architecture. Secondly, the transformation to a CDM. This is how the source system is decoupled from the target system, providing enterprise flexibility and agility. When you hear your technical team talk about ‘loosely coupled integrations’ this is what they are referring to. Lastly, the ME allows the integration platform to publish a message, for example an employee update, to multiple subscribing consumers at the same time. This key concept provides a single mechanism to manage the flow of data to one or more subscribing consumers, thus not requiring a new source integration for every existing or future subscriber.
Many companies implementing SOA-based architecture champion these core design concepts. Moreover, many companies stop at this implementation thinking they have achieved business agility. However, in today’s fast-paced high-tech world, a SOA implementation will only provide so much agility. In order to achieve real business agility you’ll need to put the data in the hands of your customers.
APIs & API Management
To put data in the hands of your customers, APIs are needed. Providing APIs to the enterprise will not only enable your business to access data much more rapidly, but can help to uncover new opportunities that would have been otherwise unobtainable. APIs for internal enterprise use come in three different flavors.
- Aggregation APIs – These are custom-developed APIs that simplify access to internal enterprise data by aggregating multiple data sources into a single, performant API. An example of this is Worker data, where Worker is defined as any person that performs work for the company (e.g. employee, contractor, consultant, intern, foundation, etc). Worker data resides in three different data sources: Workday, Fieldglass, and Supportforce. A custom-developed API for this use case would aggregate the data and possibly enrich it, so the end user can invoke a single API and not have to waste cycles learning and building code to invoke three different data sources.
- Internal Application APIs – APIs can be developed internally by separate application development teams throughout Salesforce to expose certain application functionality. This is the case with the PSE application in Org62. To move to a service-oriented enterprise, more teams will need to follow this pattern.
- Packaged APIs – APIs can be derived from 3rd party packaged software products to access the application functionality.
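An Aggregation API of the kind described above can be sketched as a single function composed of per-source fetchers. This is a hypothetical illustration: each fetcher stands in for one back-end connector (e.g. Workday, Fieldglass, Supportforce), and the function and field names are assumptions.

```python
def get_worker(worker_id: str, fetchers) -> dict:
    """A single Worker API call that hides multiple source systems."""
    worker = {"worker_id": worker_id}
    for fetch in fetchers:
        # Each fetcher encapsulates one system's connectivity and API details;
        # the results are merged (and could be enriched) into one record.
        worker.update(fetch(worker_id))
    return worker
```

The caller invokes one API and receives one aggregated record, instead of learning three different source APIs.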
This can equate to a lot of APIs. In order to manage them, an API Management tool is required. Let’s discuss some of the features that API Management software provides.
The moment internal data is exposed via an API, you have to assume it’s at risk, even if the API resides within the internal network/firewall. Many API Management tools provide OAuth2 security authentication implementations to protect access to the APIs for users, developers, and administrators. Organizations implementing API Management tools also protect the business from other security threats, such as back-end overload, malformed XML and JSON payloads, and denial-of-service (DoS) attacks. More importantly, if an API application has been compromised, it’s a simple administration task to revoke the OAuth2 token, which in turn shuts down access to the API.
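The revocation mechanic can be sketched as a token registry. Real API Management tools provide this capability out of the box as part of their OAuth2 support; the class and method names here are assumptions for illustration only.

```python
class TokenRegistry:
    """Illustrative sketch of OAuth2-style token checks with revocation."""

    def __init__(self):
        self._active = set()  # tokens currently allowed to call the API

    def issue(self, token: str) -> None:
        self._active.add(token)

    def revoke(self, token: str) -> None:
        # A single administrative action cuts off a compromised application.
        self._active.discard(token)

    def is_authorized(self, token: str) -> bool:
        # Checked by the API gateway on every request.
        return token in self._active
```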
Setting API policies on the request and response can provide content-based routing and filtering, rate-limiting, and protection against traffic spikes, and can help to increase performance. Without these policies, an API can be brought down and become inaccessible for others to use.
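A rate-limiting policy of the kind mentioned above can be sketched as a fixed-window counter. This is one simple policy type among several, and the names and window choice are assumptions; production gateways typically offer sliding windows and distributed counters as well.

```python
import time


class RateLimitPolicy:
    """Fixed-window rate limiter, a sketch of one request-side API policy."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Start a fresh counting window.
            self.window_start, self.count = now, 0
        if self.count < self.limit:
            self.count += 1
            return True
        # Shed load so a traffic spike cannot take the back end down.
        return False
```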
An API proxy is a facade for your backend HTTP services. API proxies decouple the developer-facing API from your backend services, shielding developers from code changes and enabling you to innovate at the edge without impacting internal applications and development teams. As development teams make backend changes, developers using the API can continue to call the same API without any interruption.
One important distinction between APIs and integrations is that APIs have real end users developing against and using the API services. Visibility into data movement is important for understanding who has access to what data, using which API, and how much data is being used. With analytics we can now implement, or should I say adhere to, specific SLAs for the end users.
Self-Service API Portal
The final mile in providing Data-as-a-Service is an easy-to-use interface for developers to discover and register to use an API. This is done through a self-service portal where API discovery, registration, documentation, and support can be administered.
Let’s walk through a simple use case using our Worker data example. As an internal developer for Salesforce, I want to build a simple but powerful mobile application that my organization can use to be more effective with day-to-day activities. Let’s say this is a mobile employee directory. As a developer, I know all the employee and contractor information resides in Supportforce, but I also know it is not the source of truth, so I wouldn’t be working with the most current data set. However, calling multiple data sources, all with different API implementations, would be too time-intensive and cumbersome. Plus, I’d need to build infrastructure to store data for fast performance.
In this scenario, with a self-service portal, I would start my development by navigating to the internal Enterprise API Portal, where I’d review the site’s documentation to determine what APIs are available to me for my mobile application. I find that there are two Worker APIs that I can use, Worker Retrieve and Worker Query. Now that I know of two APIs that I can use to implement my mobile app, I can simply walk through a short wizard requesting access to the APIs and the Worker fields needed for my application. Once granted, I receive an email with the key and secret needed to generate my OAuth2 token, which can be used to authenticate my API requests. With access, I can now develop my application calling the two APIs to retrieve real-time, up-to-date Worker data for my mobile application. Under the hood, the Worker API has already aggregated the data I need and spared me the cumbersome task and cost of building my own data model and learning multiple SOAP-based Web APIs. Hallelujah!
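The mobile directory's data access in this scenario can be sketched as below. The Worker Retrieve call is stubbed out as a function argument, and the field names (`Worker_Legal_Last_Name`, `Worker_Title`) are assumptions for illustration, not the real API contract.

```python
def build_directory_entry(worker_id: str, token: str, worker_retrieve) -> dict:
    """One aggregated Worker API call replaces three source-specific integrations."""
    # The OAuth2 token obtained through the portal authenticates the request.
    record = worker_retrieve(worker_id, token)
    # The API has already aggregated Workday, Fieldglass, and Supportforce data,
    # so the app only shapes the response for display.
    return {"name": record["Worker_Legal_Last_Name"], "title": record["Worker_Title"]}
```

The application code never touches a source system directly; it only knows the portal-issued credentials and the published Worker API.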