My Masters DissertationProposal; Raytracing on CUDA
The document below is prepared as for an MSc dissertation proposal, actual dissertation post is here and the direct link to the dissertation pdf is here and the html version is here.
UNIVERSITY
of
ABERTAY DUNDEE
School of Computing & Creative Technologies
May 2008
By
Umut Riza ERTURK
0703851
Introduction
Introduction
High Concept
The aim of this project is to develop a realtime ray tracing system on Compute Unified Device Architecture (CUDA). The main idea is to create a generic and portable realtime ray tracer involving the realistic lightning factors such as soft shadows (penumbra), reflection and refraction with an acceptable speed of 30 frames per second (fps) for human eye by using KDTrees on a fully parallel processing architecture; CUDA.
Project Overview
One of the most important pushing forces of the computer graphics technologies is obviously computer games which are always getting more realistic. As the games are getting developed in terms of visual realism they need more realistic visual attributes such as soft and realistic volume shadows, multipass lightning and complex shaders. To achieve this level realism they are not only getting more resource intensive but also getting more complicated in terms of development process with the currently most common rendering method; rasterization [03]. The main problem of this classic rendering approach is all the objects in a 3D scene are compound of triangles and all these triangles have to pass through from main processing unit to graphics processing unit one by one. In the rasterization pipeline all these triangles needs to be analyzed, coloured, lighted, textured, culled and as a result became a pixel (CDRINF, 2007). This approach does not only give rise to a linear slow down with the increasing number of triangles in a game scene but also increases the complexity of the implementation of the effects to get a realistic result since each effect needs to be implemented with shader phases.
Despite the unrealistic results and complex implementation of the effects, rasterization considered as conventional rendering method for the computer games because of the low computation cost. However the microprocessor technology as well as the parallelcomputing technologies significantly improved for the last few decades which provoked to being focused on the other alternative way for rendering; ray tracing which is a completely different technique to achieve nearly perfect visual realism. The high quality visual characteristic of ray tracing is based on the physically correct simulation of the real light behaviour. Basically ray tracing algorithm starts with sending at least one ray for each pixel from the camera to the 3D world space. If the ray intersects with an object according to the material of the object, algorithm creates the reflectedrefracted rays from the object and recursively continues its process. Because of its physically correct implementation it does not struggle to create the realistic effects such as global illumination or shadowing. However this method computationally expensive due to two major problems; creating the rays and finding the intersection point of the ray and any object in the 3D space. For instance a scene with 1K triangles and with 1024×768 pixels needs at least 786432 eye rays to be created and according to the organisation method, may require millions of triangleray intersection tests; moreover these rays will interact with object surfaces.
With the help of the significantly improving hardware and software technologies these problems are getting easier to tackle. For the objectray intersection problem one of the most efficient algorithms involves KDtree structure. KDtree is a kind of binary space partitioning (BSP) tree which is one of the most preferred data structure using for interactive scenes (Shevtsov Maxim, 2007). Briefly KDtree is a multidimensional search tree for points in k dimensional space (NIST, 2005) which enables to find an object in the space with the complexity of log (n). In addition to this algorithmic improvement against the objectray intersection problem, another improvement for creating rays is the parallel computing. One of the most recent parallel computing architecture is NVidia’s CUDA hardware solution whom scales up to one hundred cores and one thousand threads. Apart from the massively multi parallel advantage other advantage is its programming language which is C.
Ray tracing is one of the most promising photo realistic rendering technique for computer games according to the researches have been making on as well as according to the improvements in hardware technologies. On the other hand CUDA is a very new and promising architecture with its suitable structure for raytracing and there are no satisfactory researches have been done about this system. This project will investigate techniques and possible programming improvements of ray tracing on massively multithread architecture named CUDA.
About this Proposal Document
This proposal aims to investigate, develop and try to improve a specific real time ray tracing structure named KDtree for a specific multi thread programming architecture called CUDA. A review of the KDTree structure as well as the CUDA hardware can be found in the ‘Literature Review’ section.
Following on from that research, a method for implementing the discussed techniques is presented in the ‘Method Design’ chapter. This general design is followed by a weekbyweek plan detailing a schedule for implementing the proposed project, in the ‘Project Plan’ section
Literature Review
This chapter aims to summarise the certain techniques, data structures and hardwares encapsulated by the scope of this project. A brief overview of each topic is given, followed by a discussion of the key areas in each field which are applicable to this project.
Ray Tracing
Ray tracing is a rendering method to create photo realistic 2D views from 3D space by simulating the real light behaviour. It is one of the first solutions for rendering that’s why it is also one of the most researched rendering techniques up to now, however because of the complexity and amount of the calculations, mostly used for offline rendering. Contrary to its complexity the idea is very simple. Before understanding ray tracing it is essential to understand the real light behaviour since ray tracing is just a simulation of it.
What is light and colour?
Light is an electromagnetic wave in a certain range of frequency which determines the colour of the light. Apart from the wave explanation of the light the other explanation is the particle explanation; light is the compound of small packages named photons which are travelling in a certain direction with the speed of light. When a photon collides with a surface three possible scenarios happen according to the physical attributes of the surface. If the surface is reflective then the photon reflects respect o the normal of the hit point. After the reflection, photon changes its colour according to the colour of the surface. There are two kinds of reflection; ‘specular reflection’ which happens on regular surfaces and ‘diffuse reflection’ which happens on irregular surfaces.

Figure 2 – Diffuse Reflection of Light 
Other scenario is the photon passes through the objects it is collided which is called refraction. After it enters into the object, it changes its direction according to the density of the surface and also according to the normal of the hit point.

The last scenario is it gets absorbed by the surface and neither can reflect nor refract, this happens only on real dark surfaces.
The reflection and the refraction can also happen at the same time in this case it can be considered as more than one photon has been created from one photon.

Basically ray tracing is the simulation of these rules in a virtual 3D world and its objective is to convert the 3D space to 2D image by determining the colour of each 2D image cell (pixel) in the 2D world. The main difference of ray tracing from the real world is, in real world all the photons are coming to our eyes from light sources directly or after reflectionsrefractions, however ray tracing algorithm does the reverse; sends the light rays (photons) from each of the pixels in the 2D world (screen) to the 3D world (e.g. game scene). The reason for sending rays from the screen to the scene is only the rays reaching to the screen will affect the final colour value of the pixel. That also means the photons reaching to the back face of the sphere (on Figure5) and reflecting through an irrelevant direction will not affect the final 2D image therefore calculations for this photon are unnecessary.
Figure 5 – Basic ray tracing concept; rays from the screen to scene 
Figure 6 – The dashed lines represents the wasted photons which will never reach to the screen 
Figure 7 – Adaptive super sampling, sending 5 rays for each pixel for the begging, increases the number of rays if the colour difference is big between these 5 rays 
Specification of the ray tracing is at least one ray needs to be sent from each pixel however in most cases one ray is not enough since the projection of one pixel may correspond to a large area in 3D world. On the other hand sending a constant number of rays (e.g. 10 rays per pixel) from each pixel, which is called supersampling, is not an efficient solution since it increases the ray tracing time linearly. Determination the number of pixels to be sent from each pixel is one of the problems of ray tracing which plays an important role on the quality issues of the image such as antialiasing. One of the efficient solutions for this performancequality paradigm is adaptive supersampling. This method sends a small amount of rays from a pixel for the beginning. If the result colour values of these rays are slightly different from each other, it increases the number of rays for the current pixel and sends new rays according to the difference between rays’ colours and at the end mixes the colours of these rays to calculate the actual pixel colour (Glassner Andrew S., 1989).
After creating the rays and sending them through the 3D scene, the colliding objects with the photon needs to be found. In addition to finding the object, exact location in object space and the normal of this location needs to be found. This is one of the most challenging problems of raytracing and many algorithms and data structures have been discussed and implemented up to now. The most known two methods for quick finding of the hit point are bounding volume hierarchies (BVH) and KDTrees approach. The detailed information about the KDTrees will be discussed in the next section but BVH method will not be discussed in this paper. Although the significant algorithmic improvements about this problem, ray tracers still spend their 75% to 95% of their processing time on this problem quotes James Foley (Foley James D., 1990).
The next step after creating the primary rays and finding the intersection points is firstly looking for the hit point’s position respect to the light sources. If there is no object between the hit point and the light source then the colour of the light ray will depend on directly the light source and more other calculations.

After taking light resources and shadows into account, next point is reflection or/and refracting the ray from the intersection point. The key input coming from the game scene to find the reflection and refraction rays is the normal of the point. A 3D scene usually compounds of geometric primitives such as spheres, cubes, planes, triangles etc. The normal calculation of a point depends on the primitive the ray intersects with. To give an example ray sphere intersection problem can easily be solved with these algebraic equations;
(Owen G. Scott, 1999)
The normal allows the reflection and refraction rays to be found;
Solution for the reflection ray Rl is;
Solution for the refraction ray Rr is;
Figure 9 – (a) reflection, (b) refraction 
After finding the reflection and refraction rays the algorithms continues recursively as it is shown in the pseudo code below;
Figure 10 – demostration of a ray path with reflections and refractions 
As a result the image quality of this rendering technique is obviously much realistic compare to the classic rasterization method since it is a real implementation of the light behaviour. However the calculation cost for each step is still expensive to put this algorithm into current games with the current hardwares. Fortunately there are satisfactory improvements for methods as well as for hardwares. In the next section one suitable technique and a specific hardware will be introduced.
KDTrees
KDtree is a special kind of binary tree data structure for organising points in kdimensional space (since the graphics applications run on three dimensional spaces the kdtree on this paper will be representing ‘three dimensional kdtree’) which provides multidimensional search [05]. The construction cost for the KDtree is logarithmic respect to the number of primitives in the scene and the average cost of traversing a voxel can be estimated (Shevtsov Maxim, 2007). Another important point about kdtrees is different from octree, which divide the 3d space into constant number of parts, kdtree divides the space into unfixed parts with planes perpendicular to one of the coordinate system axes which also differs KDtrees from the conventional BSP trees.
The use of kdtrees for raytracing algorithm is finding the intersecting objects in the 3Dworld space with the ray sent from the camera through the scene as quick as possible.
Since there are different kinds of KDtrees according to the optimisations, roughly the data structure for a node contains two child pointers (just like binary tree), a name and a key (usually a pair of floating points) which represent the dimensions of the rectangle.
The creation of a KDtree briefly can be described by the following steps;

Figure 11 – Steps for constructing a kdtree, taken from the lecture presentations of Sharat Chandran, 2002 
The major problem of constructing a kdtree is determination of the division planes since kdtrees are not fixed divided.
As shown in figure 12, dividing a scene into two parts can be done in several ways. The problem is finding more balanced division to increase the efficiency of the kdtree as well as the ray tracer. The optimisation is based on a simple rule; large areas with few objects or small areas with many objects work faster (Martin E., 2006).



Figure 12 – Different division methods, taken from lecture slides of Martin Eisemann
The cost optimisation formula according to the rule mentioned above is;
Equation 1 – cost equation to split the space with higher efficiency, taken from lecture slides of Martin Eismann
As a result, the cost predictable structure of KDtrees is a very big plus for ray tracing with multi threads since the jobs can be equally distributed between threads to maximise the efficiency of the multi processing. On the other hand, KDtrees are considered as the best solution for static scenes however with the increasing interest on ray tracing algorithms, several researches have been done to use KDtrees for dynamic scenes which makes it challenging.
CUDA
For the last few years GPU devices have reached to significant computational power (figure13) compare to CPUs. The main reason is; as the graphics operations are highly intensive and requires parallel computation, GPUs evolved in a different way from CPUs; they devotes more transistors for data processing compare to caching and data controlling (NVidia, 2007).

Figure 14 – Difference between current GPUs and CPUs, GPUs devotes more transistors for processing (figures taken from (NVidia, 2007)) 
With the extremely increasing computational power of GPUs, not only graphics applications but also many other applications have been tried to be run on GPUs. CUDA has developed as hardwaresoftware architecture to respond these needs by providing a significantly parallel and simple computation structure.

Design goals of CUDA are described by its founder NVidia ;

Figure 16 – (NVidia, 2007)
In addition to its parallel computing power, other powerful side of CUDA is its C language which allows more understandable and clean programmes. Other important differences coming with CUDA is, contrary to other GPU programming languages such as CG or GLSL, CUDA can write arbitrary memory addresses and CUDA exposes a shared memory area of 16KB which is very fast.
Briefly, main reason for choosing this new generation hardwaresoftware architecture is it provides great parallelisation opportunities which are very suitable for raytracing by using KDtrees. However CUDA’s stackless architecture is a big problem for recursive algorithms. This problem is the challenging point of the raytracing/CUDA combination. On the other hand recent researches showed that it is not impossible, in addition CUDA is a very new and promising system for the future applications.
Method Design
HighLevel Outcomes
The aims of this project are:

To implement a homogeneous (by using both the CPU as well as the GPU) real time raytracer on a specific system called CUDA by using a specific data structure named KDtrees.

To achieve an acceptable frame rate for human eye (~20fps) and visual quality with the resolution of 800×600 in a one milliontriangels scene by using a G80 GPU processor.
Programming the CPU
Constructing and updating the KDtree data structure
According to the ray tracing algorithm there are four main processes need to be done. First process is the construction of the KDtree according to the object locations in the scene. For a real time ray tracer not only constructing the data structure but also keeping the data structure is very important. The first major problem to tackle is using the KDtree for dynamic scenes since its structure is more suitable for static scenes. However thankfully there are some other papers have already been published and it is proved that by using some additional data it is possible to port kdtrees to dynamic scenes (e.g. [05]). Apart from keeping the data structure updated, the 3D object data needs to be provided by the CPU to the CUDA threads.
The other three main processes (creating the rays and finding the collision point of the object and calculating the new rays reflected or refracted from the hit point) will be done by the CUDA threads.
Programming the CUDA
Ray operations
While keeping the KDtree data structure update and sending this structure to the shared memory area of the CUDA threads, these threads will be creating the rays by using adaptive super sampling method from the screen through the scene therefore each thread will be responsible for different number of rays. This operation is not complex but heavy because of the high number of rays. The first improvement of using an extremely parallel architecture is that hundreds of these rays will be created at the same time. After creating these rays each thread will search the colliding objects in the KDtree structure and will find the normal of the hit point according to the type of the primitive since in this project sphere and triangle primitives will be implemented. Before calculating the normals, threads will finding out if is there any direct connection between the hit point and any of the light sources. After finding if the hit point is in the shadow region or in the bright side, the colour code of this hit point will be recorded to the corresponding pixel data. Next step is finding the reflection and refraction rays. This step involves the recursion problem which is not supported by CUDA. This problem seems to be the biggest problem of the project; even though using an iterative way is not a real solution because of CUDA threads small (16KB) shared memories. To overcome with this problem there are three possible solutions; trying to limit the number of reflection and refractions (which will cause quality problems), using the global memory (which will cause performance problems) or synchronising the threads about using the shared memory (which will increase the complexity of the implementation). As a result with these reflected and refracted rays the final colour value of the pixel will be determined by using a linear blending function.
To sum up, implementation of real time ray tracing by using KDtree data structure on CUDA architecture has many problems to overcome with. On the other hand, magnificently improving GPU technologies opens new doors to tackle these problems either by increasing its data process speed or by increasing the memory needed for these processes (i.e. it is expecting that CUDA 2.0 will involve recursion). Because of these achievements on hardware side, the interests about raytracing increases day by day and as a result of these achievements new methods and new scientific researches getting published every day. This project will be hopefully one of these researches.
Evaluation
Evaluation Criteria
There are two basic criteria to evaluate the result of the project:

Visual quality (qualitative)

Performance (quantitative)
Qualitative
According to the project aims, the most important qualitative requirement is the visual quality of the final image. Such that, even if the visual quality of the final image is poor the speed won’t be an important since this can be easily done by reducing the number of rays sent from screen. However the quality of the image is subjective therefore the satisfaction of this criterion will be human dependent. To reduce the subjective view; one of the most common scene (named Cornell box) using in ray tracing tests will be used to test the visual quality.

Quantitative
Since the aimed fps and the hardware as well as the software described clearly in the aims of this project, evaluating the quantitative criteria will be very basic. As soon as the getting visually satisfactory results from Cornell Box, 20 fps will be enough for the project to achieve its goal under mentioned hardware and software implementations
Project Plan
June 2008  
1^{st }– 31^{st}  Learning CUDA architecture, reviewing NVidia documents, implementation of some algorithms on CUDA to have an experience on CUDA.Revising parallel computation.  
July 2008  
1^{st }– 31^{st}  Implementation of KDtree structure, searching for improvements for KDtree structure on dynamic scenes  
August 2008  
1^{st }– 31^{st}  Constructing primitives’ data structure and working on calculation of reflectedrefracted rays.  
September 2008  
1^{st} w. 2^{nd} w.  Implementation of the reflection and refraction rays on CUDA  
3^{rd} w.  Incorporating of super sampling method on CUDA  
4^{th} w.  Progress presentation  
October 2008  
1^{st} w.  Preparing documentation of the progress  
2^{nd} w. – 3^{rd} w.  Integrating the CPUside operations with CUDAside operations. Getting first image results  
4^{th} w.  Submit 1^{st} Draft  
November 2008  
1^{st} w.  Maintaining of the program  
2^{nd} w. – 3^{rd} w. – 4^{th} w.  Preparing final draft and reviewing final draft feedback.  
December 2008  
1^{st}31^{st}  Incorporate any additional changes based on final draft feedback  
January 2008  
Submit Dissertation 
References
 Schmittler J., Pohl D., Dahmen T., Vogelgesang C., and Slusallek P., 2005. Realtime Ray Tracing for Current and Future Games, ACM Portal, [accessed 10^{th} May 2008], url: http://portal.acm.org/citation.cfm?id=1198555.1198762
 Schmittler J., Pohl D., Dahmen T., Vogelgesang C., and Slusallek P., 2004, Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip, ACM Portal, [accessed 10^{th} May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1058143&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68260220&CFTOKEN=96881082
 Friedrich H., Gunther J., Dietrich A., Scherbaum M., Seidel HansPeter, Slusallek P., 2006, Exploring the Use of Ray Tracing for Future Games, ACM Portal, [accessed 10^{th} May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1183323&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68260623&CFTOKEN=18219301
 Woop Sven, Schmittler Jorg, Slusallek Philipp, 2005. RPU: a programmable ray processing unit for realtime ray tracing, ACM Portal, [accessed 10^{th} May 2008], url: http://portal.acm.org/ft_gateway.cfm?id=1073211&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68261058&CFTOKEN=632516362
 Zhou Kun, Hou Qiming, Wang Rui, Guo Baining,2008. RealTime KDTree Construction on Graphics Hardware, Microsoft Research, [accessed 10^{th} May 2008], url: ftp://ftp.research.microsoft.com/pub/tr/TR200852.pdf
 Slusallek Philipp, Shirley Peter, Mark Bill, Stoll Gordon, Wald Ingo, 2005. Introduction to real time ray tracing – course 41, ACM Portal, http://portal.acm.org/ft_gateway.cfm?id=1183323&type=pdf&coll=GUIDE&dl=GUIDE&CFID=68262552&CFTOKEN=78451416
 Shevtsov Maxim, Soupikov Alexei, Kapustin Alexander, 2007. Highly Parallel Fast KDtree Construction for Interactive Ray Tracing of Dynamic Scenes, Intel corporation, [accessed 10^{th} May 2008], url: http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fkesen.huang.googlepages.com%2FIntelEG07.pdf&ei=GFUsSMazL5O0QST1JSOBQ&usg=AFQjCNHctNbBVxaKwpIZ7f71SQlBbvXobQ&sig2=iFf0gqjzcfxGAIBflFwgOg
 NVidia, 2007. NVIDIA CUDA Programming Guide, NVidia, [accessed 10^{th} May 2008], url: http://www.google.co.uk/url?sa=t&ct=res&cd=1&url=http%3A%2F%2Fdeveloper.download.nvidia.com%2Fcompute%2Fcuda%2F1_0%2FNVIDIA_CUDA_Programming_Guide_1.0.pdf&ei=pVYsSJGFEKa8QLXogasF&usg=AFQjCNHrbSl3bDFxVTvtHfjJ8RKpjlgxzg&sig2=DfIb4yBF9BFNqi_dwWAeg
 Rademacher Paul, Ray Tracing: Graphics for the Masses, The University of North Carolina, [accessed 10^{th} May 2008], url: http://www.cs.unc.edu/~rademach/xroadsRT/RTarticle.html
 Glassner Andrew S., 1989. An Introduction to ray tracing, Morgan Kaufmann
 Foley James D., Dam Andries van, Feiner Steven K., Hughes John F., 1990. Foley, James D. Computer Graphics : Principles and Practice, USA: Adisson Wesley
 Owen G. Scott, 1999. Siggraph Education Materials(online) , [accessed 10^{th} May 2008], url: http://www.siggraph.org/education/materials/HyperGraph/raytrace/rtinter1.htm
 Bikker Jacco, 2005. DevMaster, Rayt`racing: Theory & Implementation Part 7, KdTrees and More Speed,. [accessed 10^{th} May 2008], url: http://www.devmaster.net/articles/raytracing_series/part7.php
 Chandran Sharat, 2002. University of Maryland web site Data Structures lecture notes, Introduction to kdtrees, [accessed 10^{th} May 2008], url: http://www.cs.umd.edu/class/spring2002/cmsc4200401/pbasic.pdf
 Martin Eisemann, 2006. University of CaroloWilhelmina, Computer Graphics kDTree and Optimizations for Ray Tracing, [accessed 10^{th} May 2008], url: http://graphics.tubs.de/teaching/lectures/ws0607/CG1/slides/07kDTree.pdf
 Marlon John, LaMothe Andre, 2003.
Focus on Photon Mapping, Ohio:Premier Press.
 CDRINF, 2007. Real Time RayTracing May Replace GPU Rasterization, [accessed 10^{th} May 2008], url:
http://www.cdrinfo.com/Sections/News/Details.aspx?NewsId=21608
 NIST (National Institute of Standards and Technology), 2005. KD tree data structure, [accessed 10^{th} May 2008], url:
http://www.nist.gov/dads/HTML/kdtree.html