Data cleanliness and data integrity. Ah, such fun things to work on, right? Kind of like spring cleaning, but for some reason all the more frustratingly painful. Since spring is coming early here in North America, we thought it could be useful to share some new insights we’ve had on this topic of data cleanliness, especially for inbound data to Salesforce.
A few of us penguins worked through this last night and today as we’re gearing up for a new release of our Soapbox Engage products, and I thought I’d share some of the highlights of a good brainstorming session that could be useful for others thinking about matching rules.
Our discussions generally broke down into three main areas:
1) Matching scenarios to keep in mind
2) Matching rules
3) Data updating rules
Let’s dive into our best practices…
1) Matching scenarios to keep in mind
We realized that there are a lot of scenarios to keep in mind as outliers when coming up with any sort of matching logic for your organization. A few interesting ones that we thought were useful to remember when building your matching plans include:
- some households (people at the same physical address) might include people with the same first name, last name, and mailing address: a father and son who differ only by suffix (e.g. John Doe Sr. and John Doe Jr.)
- some households might include a couple with different first names, the same last name, and the same email address (e.g. John Doe and Jane Doe both sharing the email@example.com account).
- some households might include a couple with different first names, different last names, and the same email address (e.g. John Doe and Jane Doe-Smith both sharing the firstname.lastname@example.org account).
- there’s a good chance people with names that are commonly shortened (Robert – Bob, Anthony – Tony) will enter their name differently than you have it in your database.
- there’s a good chance names with apostrophes or hyphens might come in with different spacing than what you have in your database.
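As a rough illustration of the last two scenarios, here is a minimal Python sketch of normalizing names before comparing them. This is not something Soapbox Engage is stated to do; the `NICKNAMES` table and both function names are hypothetical, and a real nickname list would need to be much larger and reviewed by your organization.

```python
import re

# Hypothetical nickname table (assumption): maps a short form to a longer form.
NICKNAMES = {"bob": "robert", "tony": "anthony"}


def normalize_name(name: str) -> str:
    """Lowercase, unify apostrophes, and remove spacing around ' and -."""
    cleaned = name.strip().lower().replace("\u2019", "'")  # curly -> straight apostrophe
    cleaned = re.sub(r"\s*(['-])\s*", r"\1", cleaned)      # drop spaces around ' and -
    cleaned = re.sub(r"\s+", " ", cleaned)                 # collapse remaining whitespace
    return cleaned


def canonical_first_name(name: str) -> str:
    """Map a known nickname to its longer form; otherwise return the normalized name."""
    normalized = normalize_name(name)
    return NICKNAMES.get(normalized, normalized)
```

With this sketch, "O' Brien" and "O’Brien" compare as equal after normalization, as do "Bob" and "Robert". Whether a nickname hit should count as a real match or only send the record to human review is a judgment call for each organization.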
In general, it seems like organizations might find it best to lay out all the different scenarios that they think are most relevant to them, and in a perfect system, weight them on “need to have a human look at” vs. “auto-match/convert this, we don’t have time to do manual conversion/matching”. We’re already adding this logic into Soapbox Engage today, and should add some levers to allow organizations to choose their own matching logic.
2) Matching rules
We added some deeper logic to our Soapbox Engage matching rules that we thought could be useful to share. Until we’re able to give users full control over their own rules, we wanted to take a very conservative approach to matching (i.e. not matching unless we feel really confident about the match). Based on that, we came up with the following rules, which our system runs from top to bottom.
MATCH = when this data comes in, just directly attribute the activity (like a donation) to the appropriate contact we’re matching against.
LEAVE = when this data comes in, put it in a queue for a human to review, then either match or convert into a new account/contact.
AUTO-CONVERT = when this data comes in, just have the system auto-convert it into a brand new account/contact.
(all of these are _exact_ matches on the fields listed)
- MATCH: First Name AND Last Name AND Email Address
- MATCH: First Name AND Last Name AND MailingStreet AND MailingCity AND MailingState AND MailingPostalCode
- LEAVE: First Name AND Last Name
- LEAVE: First Name AND Email Address
- LEAVE: Last Name AND Email Address
- AUTO-CONVERT: Email Address
- AUTO-CONVERT: (none)
This system is all based on the conservative approach to matching, balanced against the opportunity cost of more time doing manual entry. Each organization might want to see this slightly differently. We’ve also put on our roadmap the ability to give organizations guidance based on a scoring system as to what we think could be the right match for them in the LEAVE situations.
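The rule list above can be sketched as a small top-to-bottom cascade. This is a minimal Python illustration under stated assumptions, not the actual Soapbox Engage code: the `Person` class, the `RULES` table, and the `classify` function are hypothetical names, a blank inbound field is assumed never to count as a match, and the first contact that satisfies a rule is assumed to win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record shape; field names loosely mirror the fields in the rules.
    first_name: str = ""
    last_name: str = ""
    email: str = ""
    street: str = ""
    city: str = ""
    state: str = ""
    postal_code: str = ""


# Rules in the order listed in the post: (action, fields that must all match exactly).
RULES = [
    ("MATCH", ("first_name", "last_name", "email")),
    ("MATCH", ("first_name", "last_name", "street", "city", "state", "postal_code")),
    ("LEAVE", ("first_name", "last_name")),
    ("LEAVE", ("first_name", "email")),
    ("LEAVE", ("last_name", "email")),
    ("AUTO-CONVERT", ("email",)),
]


def fields_match(inbound: Person, contact: Person, fields) -> bool:
    # Assumption: an empty inbound value never counts as a match.
    return all(
        getattr(inbound, f) and getattr(inbound, f) == getattr(contact, f)
        for f in fields
    )


def classify(inbound: Person, contacts):
    """Return (action, contact) for the first rule that fires, checking rules top to bottom."""
    for action, fields in RULES:
        for contact in contacts:
            if fields_match(inbound, contact, fields):
                # AUTO-CONVERT creates a new record, so no existing contact is returned.
                return action, (None if action == "AUTO-CONVERT" else contact)
    return "AUTO-CONVERT", None  # the "(none)" rule: nothing matched at all
```

One design question this sketch leaves open is what to do when several contacts satisfy the same MATCH rule, as in the Sr./Jr. scenario earlier. Sending those cases to LEAVE instead of taking the first hit would be the more conservative choice.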
3) Data Updating Rules
Since so many organizations are collecting their first contact information online, they might only have the very basics, like first name, last name, and email address, in their SFDC instance. One thing that we highly recommend is that, everywhere possible, orgs should ask for the following key fields: First Name, Last Name, Email Address, Postal Code. This significantly helps targeting, especially for advocacy groups using SFDC.
Now, since most groups will only have basic information to start, whenever they have a chance to add more information, we should find a way to add it to their existing contact record… carefully. Here are the two rules we came up with, which take the most conservative approach. Your org might find it useful to be more aggressive in wiping old data for new data, but we thought it best to keep it simple first.
- If the existing contact record’s MailingAddress = null, AND MailingCity = null, AND MailingState = null, AND MailingPostalCode = null, then take whatever address information you’re inbounding (like a donation’s billing information) and insert it into the existing contact’s record.
- If the existing contact record’s MailingAddress = null, AND MailingCity = null, AND MailingState = null, AND MailingPostalCode = _not_ null but matches the MailingPostalCode in the inbound data, then take whatever address information you’re inbounding (like a donation’s billing information) and insert it into the existing contact’s record.
Again, you can probably be much more liberal in your approach, but in thinking of the wild scenarios even in our own backyards of San Francisco and DC, it seemed appropriate to be more conservative in writing over data in the contact object.
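The two updating rules can be sketched in a few lines of Python. This is a hedged illustration, not the Soapbox Engage implementation: records are modeled as plain dictionaries keyed by the field names used in the rules, empty strings are assumed to count as null, and the function names are hypothetical.

```python
# Field names as written in the rules above; records are plain dicts (assumption).
ADDRESS_FIELDS = ("MailingAddress", "MailingCity", "MailingState", "MailingPostalCode")


def is_blank(value) -> bool:
    # Assumption: None and whitespace-only strings both count as "null".
    return value is None or str(value).strip() == ""


def should_fill_address(contact: dict, inbound: dict) -> bool:
    """True when one of the two conservative rules allows filling in the address."""
    if not all(is_blank(contact.get(f)) for f in ADDRESS_FIELDS[:3]):
        return False  # street, city, or state already has data: never overwrite
    existing_postal = contact.get("MailingPostalCode")
    # Rule 1: postal code is also blank. Rule 2: postal code matches the inbound one.
    return is_blank(existing_postal) or existing_postal == inbound.get("MailingPostalCode")


def apply_address_update(contact: dict, inbound: dict) -> dict:
    """Return a copy of the contact, with inbound address fields filled in if allowed."""
    updated = dict(contact)
    if should_fill_address(contact, inbound):
        for field in ADDRESS_FIELDS:
            if not is_blank(inbound.get(field)):
                updated[field] = inbound[field]
    return updated
```

Returning a copy instead of mutating the contact keeps the original values available, which could be handy if you want to log or review what changed before saving.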
So, there you go: some things to chew on when you’re thinking through the matching rules for your organization’s inbound data. We’ll be applying these best practices to Soapbox Engage, and we plan to give our users more control over setting these rules themselves in the future.