You’re Small But You Can Still Test

I hear the same advice at practically every digital strategy conference and event I attend: test everything. In principle I agree, but when I was working for a small nonprofit I also had to be practical about what I could realistically accomplish on the testing front, given the other demands on my time. I’ve worked for a series of organizations with small digital programs, where it’s been just me or a two-person shop. We all wish we had a list the size of Organizing for America’s and a digital staff to match. But we don’t! So how should you structure a testing program that matches the resources and constraints of a small digital department?

There are two questions you should ask yourself:

  1. What do your tools and software allow you to test?

Do you have Optimizely to test website performance? Do you have ShareProgress, which allows you to test your organization’s social sharing text and indirectly test your messaging? Do you have an email system, like SalsaLabs or Convio, that permits you to A/B test your emails? If not, I suggest you acquire at least one of them pronto so you can optimize your program!

  2. What do you want to optimize?

Focus on acquiring information that could have a large impact on your digital strategy going forward. When someone asks me if we can test something, I always ask what they are hoping to learn and what they are going to do with the information the test(s) provide. I want to make sure there is a plan to use the information we gather to improve the organization’s performance moving forward.

If you’re testing on email, do you want to learn about your email template, your signers, or the length and tone of the email? If you’re evaluating a donation page, you could test the ask string, image or no image, one column or two, and the text. Clearly, there are many features and outcomes that can be evaluated with a test, so it’s important to choose carefully to get the most value for the time you allocate to testing.

When I’m trying to decide what to test during a campaign, I make a list of the options and then determine which one I think will have the biggest impact on the program moving forward. If you’re working for a small nonprofit, it’s also important to consider your capacity to set up the test.

Once you’ve decided what you want to test, you have to decide how you will determine success so you can interpret what your test reveals.

For example, if you’re testing a new donation page, your measures of success could be the number of donations, the total amount donated, or the average size of a donation. However, you should decide up front which of these outcomes is most important (or whether all of them matter equally). If you choose multiple measures of success, it is possible that the test will reveal you are successful on one front but not on another. I had a situation like that once, where the tests I ran revealed that a new donation page raised more money than its predecessor, but from fewer donors.
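
If you capture the raw gift amounts from each version of the page, all three measures are easy to compute side by side. Here’s a minimal sketch in Python; the gift amounts are made up purely for illustration.

```python
# Compare two donation page variants on three success measures:
# number of donations, total raised, and average gift size.
# The gift amounts below are hypothetical placeholders.

page_a_gifts = [25, 50, 10, 100, 25, 35]   # original page
page_b_gifts = [50, 75, 250, 25]           # new page

def summarize(gifts):
    count = len(gifts)
    total = sum(gifts)
    average = total / count if count else 0
    return count, total, average

for name, gifts in [("Original page", page_a_gifts), ("New page", page_b_gifts)]:
    count, total, average = summarize(gifts)
    print(f"{name}: {count} gifts, ${total} raised, ${average:.2f} average gift")
```

In this made-up example the new page raises more money from fewer donors, which is exactly the kind of split result you want to decide how to handle up front.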

You also need to decide what level of statistical significance you’re going to use to determine success. Do you want to be confident in the inference you have drawn virtually all of the time (i.e., 99 percent of the time), or would you be comfortable if the conclusion you reach is accurate 90 percent of the time? Not sure what I’m talking about? Never fear, I’ll explain statistical significance in the analytics section of this post in a little bit.

Now you’re ready to devise a testing plan. When you do this, keep in mind the amount of time it takes to conduct a series of A/B tests. Testing usually involves building two versions of something, so if you do your own coding, make sure to give yourself double the amount of time.

When I test something that I think will fundamentally change our digital strategy, I want to test it multiple times before settling on a view of what the results reveal and deciding whether or not to implement a strategy. I do this to determine whether the results are consistent (say the same thing) and believable (i.e., statistically significant), because findings can be inconsistent, and the confidence you hold in your findings can change over time. Running the same test a number of times over a period of time is called longitudinal testing.

One of the organizations I worked for tested a new donation page to see if it was attracting more donors and raising more money. The organization ran the test four times over a few weeks. The first test revealed that the original and the new donation pages yielded practically the same results in terms of number of donors and amount of money raised. The second test showed that the new page raised more money and attracted more donors than the old page. The third test had the old page winning on both variables. The fourth test suggested that the two pages performed equivalently in total donors and total donations. Interestingly, when I compared the two pages across the entire analysis period, from beginning to end, the new donation page outperformed the old one in both total donations and number of donors. Situations like this are why longitudinal testing is important when you’re considering major changes to your digital program.
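
If you keep the per-run numbers, pooling them across the whole period is straightforward. Here’s a minimal sketch, with made-up numbers, that compares each run and then the pooled totals.

```python
# Hypothetical results from running the same donation page test four times.
# Each entry is (donors, dollars raised) for the original and new pages.
runs = [
    {"original": (40, 1200), "new": (41, 1210)},   # run 1: roughly even
    {"original": (35, 1000), "new": (48, 1500)},   # run 2: new page wins
    {"original": (52, 1600), "new": (39, 1100)},   # run 3: original page wins
    {"original": (44, 1300), "new": (45, 1320)},   # run 4: roughly even
]

totals = {"original": [0, 0], "new": [0, 0]}
for i, run in enumerate(runs, start=1):
    for page, (donors, dollars) in run.items():
        totals[page][0] += donors
        totals[page][1] += dollars
    print(f"Run {i}: original {run['original']}, new {run['new']}")

for page, (donors, dollars) in totals.items():
    print(f"Pooled {page} page: {donors} donors, ${dollars} raised")
```

With these invented numbers the individual runs point in different directions, but the pooled totals show the new page slightly ahead on both donors and dollars, which mirrors the situation described above.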

Analytics time!

Analytics is how you’ll determine which version wins the test.

Hang on, I want to stop and take a minute to talk about statistics. I know statistics can be hard to understand, but I want you to understand your numbers. I promise it’ll be painless!

Statistical significance is how confident you are that if you repeated the test, the results would be the same. Generally three levels of statistical significance are used: 90%, 95%, and 99%. The level you use depends on how sure you want to be that your results aren’t random. When testing a new donation page I generally use a 99% confidence level, whereas when testing a subject line I’ll use a lower standard; typically I’m happy with a 90% level of confidence there. The reason to use a higher confidence level for a test that will affect your entire digital program is that you want to be confident enough to hang your hat on your finding, so you don’t casually do something that might harm your program. Subject lines are important, but they only affect one email, not the entire shebang. Fortunately for you, me, and the rest of the busy world, there is an online calculator called AB/BA that will calculate the statistical significance of your test results.
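
If you’re curious what a calculator like that is doing under the hood, here’s a minimal sketch of a two-proportion z-test in Python using only the standard library; the visitor and donation counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-proportion z-test: returns the z score and two-sided p-value."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the assumption that both versions perform the same
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 2,000 visitors per page, 60 vs. 85 donations
z, p = ab_significance(2000, 60, 2000, 85)
print(f"z = {z:.2f}, p = {p:.4f}")
print("Significant at 95%" if p < 0.05 else "Not significant at 95%")
```

With these made-up numbers the difference clears a 95% bar but not a 99% one, which is exactly why you should decide on your threshold before you look at the results.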

Finally, save your test data. It’s impossible to remember everything you’ve ever tested and what the results were. I have a spreadsheet that I use to record the key findings from all my tests and whether each result was significant or not. I’ll even make this easy for you. Here’s a template of the spreadsheet that I use.
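
If you want a picture of what a simple test log can look like, here’s one possible layout written as a CSV; the column names and the sample row are just an example, not the exact template above.

```python
# One possible layout for a simple A/B test log, written out as a CSV.
# Column names and the sample row are hypothetical, for illustration only.
import csv

with open("test_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([
        "Date", "Campaign", "What was tested", "Version A", "Version B",
        "Success metric", "Winner", "Statistically significant?", "Notes",
    ])
    writer.writerow([
        "2016-03-01", "Spring appeal", "Subject line", "Short subject",
        "Long subject", "Open rate", "B", "Yes (95%)", "Ran on full list",
    ])
```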

The beauty of testing is that you’re now empowered with evidence to guide your decision making!!

 
