Emulating Clock Tick for Parallelized Event Generation

Reconstructed image from events

For my current research project, I’m developing an asynchronous camera emulator that reads in Blender-rendered frames and outputs a stream of event data. Each pixel of the simulated sensor can output multiple events over each reference interval (the exposure time of each input frame), and each pixel may change the light sensitivity of other pixels around it (which in turn affects when those pixels will fire their next events).

The details of this particular program aren’t important to understand, but they led me to the following issue.

How can I ensure the emulated pixels are accurately affecting each other at the right points in time? And how can I parallelize it to run faster?

Working under the necessary assumption that the rate of light integration is constant for a given input image, I needed to divide each image's light data among my emulated pixels and deliver it piece by piece, in time order. The solution I came up with was a TEMPORAL_ACCURACY parameter, which may be chosen at runtime. It sets the number of timeslices the reference interval is divided into, and each input image's data is divided into the same number of pieces. Each pixel integrates its pieces sequentially. This is also where I was able to parallelize: within a timeslice the pixels can integrate independently, since they only affect each other's sensitivities for the next timeslice, and the threads synchronize at the end of each timeslice.

Here’s the relevant C++ snippet using OpenMP to parallelize:

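// Duration of one timeslice; the cast avoids integer division if both operands are integers.
double ref_time = static_cast<double>(frame_length) / TEMPORAL_ACCURACY;
for (int timeframe = 0; timeframe < TEMPORAL_ACCURACY; timeframe++) {
	// collapse(2) shares all (x, y) iterations among one team of threads, instead of
	// nesting a second parallel region inside the first. The implicit barrier at the
	// end of the loop synchronizes the threads before the next timeslice begins.
#pragma omp parallel for collapse(2) schedule(dynamic)
	for (int y = 0; y < image.height(); y++) {
		for (int x = 0; x < image.width(); x++) {
			const double *bw = image.data(x, y, 0, 0); // get grayscale pixel intensity
			pixels[y][x].integrate(*bw / TEMPORAL_ACCURACY, ref_time);
		}
	}
}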
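To make the "pixels only affect each other for the next timeslice" idea concrete, here is a minimal, self-contained sketch. It is not the project's actual code: the Pixel struct, the step_timeslice function, the firing threshold, and especially the sensitivity rule (1 / (1 + number of fired 4-neighbors)) are all assumptions made up for illustration. The point is only the structure: integrate in parallel, then stage the new sensitivities in a separate buffer and commit them once every pixel is done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an emulated pixel (the real class is not shown).
struct Pixel {
    double sensitivity = 1.0; // gain applied to incoming light in this timeslice
    double charge = 0.0;      // light integrated since the last event
    long events = 0;          // total events fired so far
};

// Advance a width x height grid (row-major) by one timeslice.
// light[i] is pixel i's share of the frame intensity for this slice.
void step_timeslice(std::vector<Pixel>& pixels, const std::vector<double>& light,
                    int width, int height, double threshold) {
    const int n = width * height;
    assert(threshold > 0.0);
    assert(pixels.size() == static_cast<std::size_t>(n));
    assert(light.size() == pixels.size());

    std::vector<char> fired(pixels.size(), 0);           // pixels that fired this slice
    std::vector<double> next_sensitivity(pixels.size()); // staged for the next slice

    // Phase 1: every pixel integrates using only its own current state.
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        Pixel& p = pixels[i];
        p.charge += light[i] * p.sensitivity;
        while (p.charge >= threshold) { // a pixel may fire several events per slice
            p.charge -= threshold;
            p.events++;
            fired[i] = 1;
        }
    }
    // Implicit barrier: all pixels have finished integrating at this point.

    // Phase 2: stage next-slice sensitivities from the neighbors' fired flags.
    // Assumed rule (illustrative only): 1 / (1 + number of fired 4-neighbors).
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        const int x = i % width;
        const int y = i / width;
        int fired_neighbors = 0;
        if (x > 0)          fired_neighbors += fired[i - 1];
        if (x < width - 1)  fired_neighbors += fired[i + 1];
        if (y > 0)          fired_neighbors += fired[i - width];
        if (y < height - 1) fired_neighbors += fired[i + width];
        next_sensitivity[i] = 1.0 / (1.0 + fired_neighbors);
    }

    // Commit: the new sensitivities only take effect in the next timeslice.
    for (int i = 0; i < n; i++) {
        pixels[i].sensitivity = next_sensitivity[i];
    }
}
```

Calling step_timeslice once per timeslice, with each pixel's intensity divided by TEMPORAL_ACCURACY, mirrors the loop structure above. Because sensitivities are written to a staging buffer, the result does not depend on the order in which threads visit the pixels. Compiled without OpenMP support, the pragmas are ignored and the code runs serially.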

TEMPORAL_ACCURACY is therefore a sliding scale: raising it improves the accuracy of the emulation but slows it down, since the work per frame grows with the number of timeslices. If TEMPORAL_ACCURACY = frame_length, then each timeslice is a single clock tick, giving a perfect 1:1 emulated clock and perfectly accurate results. frame_length is measured in nanoseconds, however, so the necessary tradeoff is to sacrifice some accuracy for higher speed. I have found satisfactory results for a frame_length of 10000000 (10 ms) when using a TEMPORAL_ACCURACY of 100, such that each timeslice represents the passage of 100000 clock ticks.
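The arithmetic behind this tradeoff can be written out as a small helper. This is a sketch only: SliceCost and slice_cost are hypothetical names, and it assumes one integrate() call per pixel per timeslice, as in the loop above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what one choice of TEMPORAL_ACCURACY costs.
struct SliceCost {
    double ticks_per_slice;       // emulated nanosecond ticks covered by one timeslice
    std::int64_t calls_per_frame; // integrate() calls needed for one input frame
};

// frame_length_ns: exposure of one input frame, in nanoseconds.
// temporal_accuracy: number of timeslices per frame.
SliceCost slice_cost(std::int64_t frame_length_ns, std::int64_t temporal_accuracy,
                     int width, int height) {
    assert(frame_length_ns > 0 && temporal_accuracy > 0);
    SliceCost cost;
    cost.ticks_per_slice =
        static_cast<double>(frame_length_ns) / static_cast<double>(temporal_accuracy);
    cost.calls_per_frame = temporal_accuracy * width * height;
    return cost;
}
```

For a 360x240 sensor with a frame_length of 10000000 and a TEMPORAL_ACCURACY of 100, slice_cost gives 100000 ticks per timeslice and 8640000 integrate() calls per frame. A 1:1 emulation (TEMPORAL_ACCURACY = 10000000) would need 864000000000 calls per frame, which is why some accuracy has to be traded away.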

With a somewhat low-resolution (but realistic for the camera in question) emulated sensor of 360x240, I can achieve nearly real-time emulation speeds even with a complex sensitivity adjustment scheme. In other words, computing and outputting the events generated from a given input image takes roughly as long as that frame's exposure. The bottleneck actually seems to be the write performance of my NVMe SSD, since all the events are written directly to disk.

I’m not in a position to share the full code of this project at this time, but I can give you more details or answer any questions if you e-mail me. I should have a paper proposal ready in the next month or two, if you’re interested.