Addressing the S3 incident with fixes and discounts
Yesterday afternoon, we had an incident that lasted approximately 90 minutes. It was caused by a bug that we introduced into the system ourselves.
A few customers were having difficulty completing assemblies that contained a large number of files (more than 100). While addressing this, we inadvertently introduced a bug that caused result files on S3 to be overwritten by the originally uploaded files. That is, of course, a very bad thing. Our test suite did not catch it because, to isolate the suite from S3 hiccups and to make it run faster (with all the encoding tests, it already takes nearly 30 minutes), we skipped sending result files to S3. We did have separate test cases for our S3 integration, but unfortunately those did not catch this issue either.
We noticed the problem in production and deployed a hotfix within 30 minutes. However, once business had returned to normal, we deployed another feature that did not include that hotfix. As a result, we (or rather, you) were hit by the same problem again. It was clearly not our lucky day.
If you were affected by this issue, please open a support ticket and we will help you replay all of your affected assemblies. To make this possible, we will keep the required temporary files around for one week instead of the usual 24 hours. We will also give you a discount on your February invoice.
Finally, we would like to apologize. This was entirely our fault, and we are very sorry that we did not deliver the robust service we normally pride ourselves on. We have already taken the necessary steps to prevent this from happening again: from now on, the test suite will save all result files to S3 as well. It might not be the cleanest way of running tests, but it ensures that our tests resemble production as closely as possible, and that is ultimately our highest priority.