Spectre bounds check data flow mitigation

Created 9 January 2018.

Update: Please note that there have been suggestions that this approach is not always effective, that the processor may on some occasions execute the load before the branch, but no PoC was offered.

One potential approach to mitigating the Spectre bounds check vulnerability is to create a data flow dependency between the inputs to the delayed branch and the attacked data access. This is applicable where a comparison of the index to the array length is to be retained, and precisely limits speculative memory access to the array bounds.

Typically a bounds check compares an array index to the array length and branches to an out-of-bounds (OOB) path if the index is greater than or equal to the length, and then the control transfers to the protected memory access operator and this uses only the index data and does not use the array length. If the processor has the index on hand, but not the length, then it can speculate the conditional branch and still meet the data flow dependencies for the memory access operator and this is exploited in the Spectre attack.

If data flow dependencies for the memory access can include the array length too then this blocks the processor speculatively executing the memory access until the array length is on hand too and at that point the bounds check conditional branch also has its dependencies so the speculative path can unwind if necessary. Preliminary testing suggests this does block the Spectre attack at least on some Intel processors tested.

So how can this dependency be usefully created? One approach is to prepare by adding the array length to the array base address, and then when comparing the index and to the array length to use a subtraction operator rather than a comparison operator returning both the Carry Flag (CF) for the conditional branch and a data flow result that can be an input into the memory access and that depends on both the array index and the array length. The subsequent memory access can then add the previously calculated array base plus length to this index less length and the length cancels out and the memory accesses the base plus index.

This approach seems to have significant merit. It creates only the required data flow dependency and does not otherwise stall speculative execution. In comparison the Intel suggestion of using and lfence operation would have a much higher performance impact. There are various combinations of masking, shifting, adding and subtracting that can also limit the range of the index without branching and thus create a sound data flow for speculative execution, but when these are used in addition to a comparison the performance seems lower. This simplest would be to just mask off high bits, but that is only precise if the length is a power of two, and event that appear to be slower when retaining the bounds check conditional branch.

This approach also allows the base plus length value to be pre-computed and potentially hoisted out of loops. If the based and length were constants, baked into code that might even be patched, then this could be an immediate constant for the memory access. However care needs to be take to coordinate with any changes in the array length, the length added to the base and the length subtracted from the index need to match in order to cancel.

The use of both the condition codes and the result of the subtraction might need some special care in compilers, as does the dependence of the memory access on the result of this subtraction. If these were separated in compiler optimizations then the data flow dependency might be broken. Given the high performance and precision of this approach it appears a useful approach to explore.

Here is an example written in C that can be used with the Spectre Attack C example to demonstrate that this blocks the attack.

#include <stdio.h>
#include <stdint.h>

extern size_t array1_size;
extern uint8_t array1[160];
extern uint8_t array2[256 * 512];

uint8_t temp = 0; /* Used so compiler won't optimize out victim_function() */

void victim_function(size_t x)
{
  uint32_t v;

#if 0
  if (x < array1_size) {
    v = array1[x];
  } else {
    v = 0;
  }
#endif

#if 0
  /* Just an alternative version of the above, to check the asm! */
  asm volatile ("cmpq %3, %1\n\t"
		"jnb oob%=\n\t"
		"movzbl (%2, %1, 1), %0\n\t"
		"jmp done%=\n\t"
		"oob%=:\n\t"
		"xor %0, %0\n\t"
		"done%=:\n\t"
		: "=r" (v)
		: "r" (x), "r" (array1), "r" (array1_size)
		: "cc" );
#endif

#if 1
  /* Spectre mitigation using data flow dependency on the index and legth. */
  /* Add the length to the base. The length will be subtracted from the index,
   * and then they will be added again canceling out the length. This is to
   * create a data flow dependence on the length. This calculation might be
   * usefully hoisted outside loops, and that is done explicitly. */
  void *end = (void *)array1 + array1_size;
  
  /* GCC does not support an output and a goto so this demo just branches
   * over the access when out of bounds, but compilers would be branching to
   * their oob path. */
  asm volatile ("subq %3, %1\n\t"
		"jnb oob%=\n\t"
		"movzbl (%2, %1, 1), %0\n\t"
		"jmp done%=\n\t"
		"oob%=:\n\t"
		"xor %0, %0\n\t"
		"done%=:\n\t"
		: "=r" (v), "+r" (x)
		: "r" (end), "r" (array1_size)
		: "cc" );
#endif

  temp &= array2[v * 512];
}