<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments for Edward Kandrot's Programming Blog</title>
	<link>http://blog.arcanefuture.com/blog</link>
	<description>My experiences with code.</description>
	<pubDate>Thu, 29 Jul 2010 23:25:02 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>

	<item>
		<title>Comment on Finished writing CUDA book by SHR</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-630</link>
		<author>SHR</author>
		<pubDate>Tue, 27 Jul 2010 15:44:45 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-630</guid>
		<description>shunt1

I work at NVIDIA and would be happy to line you up with some Visual Studio/CUDA C help to sort out your issue.

SHR
sanford @ nvidia</description>
		<content:encoded><![CDATA[<p>shunt1</p>
<p>I work at NVIDIA and would be happy to line you up with some Visual Studio/CUDA C help to sort out your issue.</p>
<p>SHR<br />
sanford @ nvidia</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by shunt1</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-629</link>
		<author>shunt1</author>
		<pubDate>Fri, 23 Jul 2010 00:13:45 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-629</guid>
		<description>My software is not ready for the NVIDIA forum yet, but will be provided when it is working properly.

However, one set of software is absolutely vital to me right now, but when I try to compile it with Microsoft Visual Studio, I get nothing but errors.

http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/

I am too new to the CUDA / Visual Studio combination to understand the error messages, but they seem to be caused by the software's use of an older version of CUDA.</description>
		<content:encoded><![CDATA[<p>My software is not ready for the NVIDIA forum yet, but will be provided when it is working properly.</p>
<p>However, one set of software is absolutely vital to me right now, but when I try to compile it with Microsoft Visual Studio, I get nothing but errors.</p>
<p><a href="http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/" rel="nofollow">http://www.cl.cam.ac.uk/research/rainbow/projects/dcbgrid/</a></p>
<p>I am too new to the CUDA / Visual Studio combination to understand the error messages, but they seem to be caused by the software&#8217;s use of an older version of CUDA.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by Edward</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-628</link>
		<author>Edward</author>
		<pubDate>Thu, 22 Jul 2010 23:57:13 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-628</guid>
		<description>It sounds like both of you are doing interesting stuff with CUDA!  Have you posted about it on the NVIDIA forums?  They are always looking for interesting projects that show off CUDA.  I'll also pass along pointers to this blog to my contacts there, so that they know.</description>
		<content:encoded><![CDATA[<p>It sounds like both of you are doing interesting stuff with CUDA!  Have you posted about it on the NVIDIA forums?  They are always looking for interesting projects that show off CUDA.  I&#8217;ll also pass along pointers to this blog to my contacts there, so that they know.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by Edward</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-627</link>
		<author>Edward</author>
		<pubDate>Thu, 22 Jul 2010 23:52:54 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-627</guid>
		<description>@markpeot - I agree.  In writing the book, it became very obvious that the same block of code was needed over and over again at the top of each global kernel.  I also agree that CUDA forces one to use the multiple levels of structures and to think about problems in that way for maximum speed.  This was all very confusing to me when I started working on CUDA, which was why I thought there was a need for a good book - one that would go through all of this, but in a way that is different from the current documentation.

The way the examples flow, one can get basic speedups by just using one level of threading, the thread blocks.  With a basic thread block, one can thread their code without having to worry about caches, like one has to with grids.  Each level of hierarchy does make mapping of problems harder and harder, but it also gives better and better performance.  So, there is a trade-off between levels used and performance, and it is up to the programmer to decide which trade-off is worth it, if the problem even maps well to that hierarchy.  I hope that the book helps with this!

It would be nice if the compiler could do it, and there are a lot of things in store for future versions of the compiler, but it becomes very hard for the compiler to guess which structs will map best to the problem being coded.  With the resources available to the compiler team, I think that the trade-off currently in place, giving the programmer access to the caches, threading model, etc., was a good one for now.  As a performance person myself, I've often found it frustrating that C/C++ didn't give me enough access to cache size, certain opcodes, etc.  CUDA gives us that access, for which I am glad, because most compiler-generated code that I have looked at wasn't very optimal.  I hold out hope that one day compilers will reach the level of human optimizers.  :)

Thank you for the review - I'm glad you enjoyed the book!

@shunt1 - I'm glad you found the files!  Sorry about the timing mix-up, it won't happen again!  :)

By the way, I was informed that the URL has been corrected to be:  http://developer.nvidia.com/object/cuda-by-example.html   I think there will be more links put on the NVIDIA page as well, to point people to the right place to get the code.  I hope future versions will contain all of the code; that way Kindle users can just copy and paste everything without needing to download the zip.</description>
		<content:encoded><![CDATA[<p>@markpeot - I agree.  In writing the book, it became very obvious that the same block of code was needed over and over again at the top of each global kernel.  I also agree that CUDA forces one to use the multiple levels of structures and to think about problems in that way for maximum speed.  This was all very confusing to me when I started working on CUDA, which was why I thought there was a need for a good book - one that would go through all of this, but in a way that is different from the current documentation.</p>
<p>The way the examples flow, one can get basic speedups by just using one level of threading, the thread blocks.  With a basic thread block, one can thread their code without having to worry about caches, like one has to with grids.  Each level of hierarchy does make mapping of problems harder and harder, but it also gives better and better performance.  So, there is a trade-off between levels used and performance, and it is up to the programmer to decide which trade-off is worth it, if the problem even maps well to that hierarchy.  I hope that the book helps with this!</p>
<p>It would be nice if the compiler could do it, and there are a lot of things in store for future versions of the compiler, but it becomes very hard for the compiler to guess which structs will map best to the problem being coded.  With the resources available to the compiler team, I think that the trade-off currently in place, giving the programmer access to the caches, threading model, etc., was a good one for now.  As a performance person myself, I&#8217;ve often found it frustrating that C/C++ didn&#8217;t give me enough access to cache size, certain opcodes, etc.  CUDA gives us that access, for which I am glad, because most compiler-generated code that I have looked at wasn&#8217;t very optimal.  I hold out hope that one day compilers will reach the level of human optimizers.  <img src='http://blog.arcanefuture.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Thank you for the review - I&#8217;m glad you enjoyed the book!</p>
<p>@shunt1 - I&#8217;m glad you found the files!  Sorry about the timing mix-up, it won&#8217;t happen again!  <img src='http://blog.arcanefuture.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>By the way, I was informed that the URL has been corrected to be:  <a href="http://developer.nvidia.com/object/cuda-by-example.html" rel="nofollow">http://developer.nvidia.com/object/cuda-by-example.html</a>   I think there will be more links put on the NVIDIA page as well, to point people to the right place to get the code.  I hope future versions will contain all of the code; that way Kindle users can just copy and paste everything without needing to download the zip.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by shunt1</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-626</link>
		<author>shunt1</author>
		<pubDate>Thu, 22 Jul 2010 23:48:38 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-626</guid>
		<description>As for my current application:

We just finished flying over the Gulf of Mexico for BP for the last two months, so that they could locate the exact positions of the oil spill.

We were obtaining images with 6-inch resolution over 1,000 square miles each and every day.  The amount of data was almost impossible to process, but we were able to generate daily maps of the oil.  Some of my software has already been converted to CUDA, but that volume was pushing our computers to their limits.

CUDA will be vital in my software development, so that it can handle this amount of data in the future.

I purchased your book on Tuesday and it has already answered some questions that I had.</description>
		<content:encoded><![CDATA[<p>As for my current application:</p>
<p>We just finished flying over the Gulf of Mexico for BP for the last two months, so that they could locate the exact positions of the oil spill.</p>
<p>We were obtaining images with 6-inch resolution over 1,000 square miles each and every day.  The amount of data was almost impossible to process, but we were able to generate daily maps of the oil.  Some of my software has already been converted to CUDA, but that volume was pushing our computers to their limits.</p>
<p>CUDA will be vital in my software development, so that it can handle this amount of data in the future.</p>
<p>I purchased your book on Tuesday and it has already answered some questions that I had.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by shunt1</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-625</link>
		<author>shunt1</author>
		<pubDate>Thu, 22 Jul 2010 23:32:48 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-625</guid>
		<description>Giggle, I just read your comment above and located the website source of the code required for the book.

Kindle users are FAST!

Anyway, I spent the last three days trying to track the author down and obtain a valid login so that I could ask that question.</description>
		<content:encoded><![CDATA[<p>Giggle, I just read your comment above and located the website source of the code required for the book.</p>
<p>Kindle users are FAST!</p>
<p>Anyway, I spent the last three days trying to track the author down and obtain a valid login so that I could ask that question.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by shunt1</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-624</link>
		<author>shunt1</author>
		<pubDate>Thu, 22 Jul 2010 23:28:47 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-624</guid>
		<description>I have enjoyed the Kindle version of your book, but got stuck on Chapter 3 when I tried to run your simple "Hello World" program.

With the Kindle version, where the heck can we obtain the book.h and other files that you expected us to have available?

Obviously, with the Kindle version, those extra files were not provided and we need a website where they can be downloaded.

Someone forgot a tiny little detail...

However, I have enjoyed your book and the writing style is fun to read.</description>
		<content:encoded><![CDATA[<p>I have enjoyed the Kindle version of your book, but got stuck on Chapter 3 when I tried to run your simple &#8220;Hello World&#8221; program.</p>
<p>With the Kindle version, where the heck can we obtain the book.h and other files that you expected us to have available?</p>
<p>Obviously, with the Kindle version, those extra files were not provided and we need a website where they can be downloaded.</p>
<p>Someone forgot a tiny little detail&#8230;</p>
<p>However, I have enjoyed your book and the writing style is fun to read.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by markpeot</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-623</link>
		<author>markpeot</author>
		<pubDate>Thu, 22 Jul 2010 14:16:14 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-623</guid>
		<description>I will post a review (Amazon Kindle edition).  

I am using CUDA for computational neuroscience.  These problems involve simulation of lots and lots of layers of cells. I would like to use tex2D or tex3D for implementation of each cell layer, but this means that I have to rewrite an identical kernel for each cell layer so that it has the proper texture references.  The MIT Cortical Neural Simulator solves this problem by using one big 1D texture for all of the cell layers and "suballocates" 1D arrays for each layer out of the large texture.  This works, but is a HACK around an awkward language feature.

In general, I think that the entire structure around threads, grids and blocks is awkward.  CUDA should use an abstract array organization (with unbounded dimension) and the compiler should automatically map the array structure of the problem across the threads and blocks.  The current style of programming forces the programmer to think about the GPU instead of the structure of the problem.  The reliance on globals for textures limits use of textures to small programs and impedes the development of libraries of texture-related features.

A more abstract representation should not be that difficult to write.  I note that the same design patterns are used repeatedly throughout CUDA programs (for example: x = threadIdx.x + blockIdx.x * blockDim.x, reduction operations, and mapping).  There should be language support for these design patterns, with pragmas to inform the compiler of kernel-specific optimizations.

Anyway, my two cents.  Thanks for writing an excellent book--by far the best introduction to CUDA I have seen.

Mark</description>
		<content:encoded><![CDATA[<p>I will post a review (Amazon Kindle edition).  </p>
<p>I am using CUDA for computational neuroscience.  These problems involve simulation of lots and lots of layers of cells. I would like to use tex2D or tex3D for implementation of each cell layer, but this means that I have to rewrite an identical kernel for each cell layer so that it has the proper texture references.  The MIT Cortical Neural Simulator solves this problem by using one big 1D texture for all of the cell layers and &#8220;suballocates&#8221; 1D arrays for each layer out of the large texture.  This works, but is a HACK around an awkward language feature.    </p>
<p>In general, I think that the entire structure around threads, grids and blocks is awkward.  CUDA should use an abstract array organization (with unbounded dimension) and the compiler should automatically map the array structure of the problem across the threads and blocks.  The current style of programming forces the programmer to think about the GPU instead of the structure of the problem.  The reliance on globals for textures limits use of textures to small programs and impedes the development of libraries of texture-related features.</p>
<p>A more abstract representation should not be that difficult to write.  I note that the same design patterns are used repeatedly throughout CUDA programs (for example: x = threadIdx.x + blockIdx.x * blockDim.x, reduction operations, and mapping).  There should be language support for these design patterns, with pragmas to inform the compiler of kernel-specific optimizations. </p>
<p>Anyway, my two cents.  Thanks for writing an excellent book&#8211;by far the best introduction to CUDA I have seen.</p>
<p>Mark</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by Edward</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-622</link>
		<author>Edward</author>
		<pubDate>Thu, 22 Jul 2010 01:36:35 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-622</guid>
		<description>Hello Mark,

Thank you!  I'm glad to hear that you enjoyed the book!  If you liked it, it would be a big help if you could post a review wherever you bought it.  Thanks!

The Kindle version shipped a week too soon; it was supposed to ship next week when the physical book ships.  Because of this, the website at NVIDIA wasn't done yet.  Jason just spent the day making the website happen!

http://developer.nvidia.com/object/cuda-by-example.html is where the source code is currently located.  I hope this helps.  I wrote the examples to be specific for what is being covered, putting extras in the header files so as not to distract from the topic at hand.  Only really works if the reader has the header files as well...  :)

I agree with your assessment of texture memory references.  I really wanted to abstract it all out into a class, but that is currently not supported in CUDA.  There were many cases where the CUDA code I was working on would have been so much easier if the references didn't have to be global.  I hope this is something that is addressed in a near-future release of CUDA - they are adding new features with every release.

Do you have a specific project on which you are using CUDA?</description>
		<content:encoded><![CDATA[<p>Hello Mark,</p>
<p>Thank you!  I&#8217;m glad to hear that you enjoyed the book!  If you liked it, it would be a big help if you could post a review wherever you bought it.  Thanks!</p>
<p>The Kindle version shipped a week too soon; it was supposed to ship next week when the physical book ships.  Because of this, the website at NVIDIA wasn&#8217;t done yet.  Jason just spent the day making the website happen!</p>
<p><a href="http://developer.nvidia.com/object/cuda-by-example.html" rel="nofollow">http://developer.nvidia.com/object/cuda-by-example.html</a> is where the source code is currently located.  I hope this helps.  I wrote the examples to be specific for what is being covered, putting extras in the header files so as not to distract from the topic at hand.  Only really works if the reader has the header files as well&#8230;  <img src='http://blog.arcanefuture.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I agree with your assessment of texture memory references.  I really wanted to abstract it all out into a class, but that is currently not supported in CUDA.  There were many cases where the CUDA code I was working on would have been so much easier if the references didn&#8217;t have to be global.  I hope this is something that is addressed in a near-future release of CUDA - they are adding new features with every release.</p>
<p>Do you have a specific project on which you are using CUDA?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finished writing CUDA book by markpeot</title>
		<link>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-621</link>
		<author>markpeot</author>
		<pubDate>Wed, 21 Jul 2010 19:45:14 +0000</pubDate>
		<guid>http://blog.arcanefuture.com/blog/2010/03/16/finished-writing-cuda-book/#comment-621</guid>
		<description>I read your book last night via Kindle on my iPad.  I thoroughly enjoyed the book, especially the very nice descriptions of texture memory (Chapter 7) and streams. The code examples are particularly clear and concise.

One of the things that strikes me as a bit bizarre in CUDA (not your book) is the fact that texture memory references are referenced globally by the kernels, leading to the kludgy "if(dstOut) ..." structure in blend_kernel (section 7.3.5). I understand the reason why this is done, but it seems like there ought to be another way to inform the compiler on generation of efficient texture reference code.  For example, if textures were types and we were using C++, we ought to be able to use templated functions to force generation of texture-specific code.  ...so you might write something like:  
//Assumes that inSrc and outSrc have different types.
copy_cont_kernel&#62;&#62;( inSrc );
blend_kernel&#62;&#62;( inSrc, outSrc);  //generates code for a blend_kernel from inSrc to outSrc via template expansion
copy_cont_kernel&#62;&#62;( outSrc );
blend_kernel&#62;&#62;( outSrc, inSrc);  //generates a different version of blend_kernel with texture-specific access to outSrc.
 

Are you planning to post your source code (book.h, etc) to the web?

Mark</description>
		<content:encoded><![CDATA[<p>I read your book last night via Kindle on my iPad.  I thoroughly enjoyed the book, especially the very nice descriptions of texture memory (Chapter 7) and streams. The code examples are particularly clear and concise.</p>
<p>One of the things that strikes me as a bit bizarre in CUDA (not your book) is the fact that texture memory references are referenced globally by the kernels, leading to the kludgy &#8220;if(dstOut) &#8230;&#8221; structure in blend_kernel (section 7.3.5). I understand the reason why this is done, but it seems like there ought to be another way to inform the compiler on generation of efficient texture reference code.  For example, if textures were types and we were using C++, we ought to be able to use templated functions to force generation of texture-specific code.  &#8230;so you might write something like:<br />
//Assumes that inSrc and outSrc have different types.<br />
copy_cont_kernel&gt;&gt;( inSrc );<br />
blend_kernel&gt;&gt;( inSrc, outSrc);  //generates code for a blend_kernel from inSrc to outSrc via template expansion<br />
copy_cont_kernel&gt;&gt;( outSrc );<br />
blend_kernel&gt;&gt;( outSrc, inSrc);  //generates a different version of blend_kernel with texture-specific access to outSrc.</p>
<p>Are you planning to post your source code (book.h, etc) to the web?</p>
<p>Mark</p>
]]></content:encoded>
	</item>
</channel>
</rss>
