DSLR Stacking for Linux - programming advice?

This topic has been archived. This means that you cannot reply to this topic.
50 replies to this topic

#1 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 28 November 2007 - 01:38 AM

So far I've been unhappy with the tools for stacking DSLR images in Linux. This past weekend I started a project to write a simple register & stack application for Linux. I've just started and I'm sure it will take a few months+ of development.

The first decision was what image format to work with. I decided on PNG because it supports 24-bit color and has a very simple lossless format. So I'm writing this in standard C using the libpng library.

Before I get too far into this, is there any reason I should not choose PNG format? I ruled out TIFF because the standard is too loose. CRW will not be supported; images will have to be transferred from the camera and converted from CRW with an external program (such as dcraw). Those tools already exist and I could not possibly improve on them.

And what GUI toolkit is best? My code so far only works from command line to sum or average PNG images (no registration yet). But eventually I'll add the GUI using GTK or QT. Which GUI toolkit should I devote my time to learn?
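For anyone curious, the libpng read side starts out roughly like this (a trimmed sketch with the error handling left out, not the actual program code):

#include <stdio.h>
#include <png.h>

/* Sketch: open a PNG and report its basic parameters with libpng.
   Error handling (png_jmpbuf/setjmp) omitted for brevity. */
int print_png_info(const char *filename)
{
    FILE *fp = fopen(filename, "rb");
    if (!fp)
        return -1;

    png_structp png = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
    png_infop info = png_create_info_struct(png);

    png_init_io(png, fp);
    png_read_info(png, info);

    printf("%s: %lu x %lu, %d bits per channel, color type %d\n",
           filename,
           (unsigned long)png_get_image_width(png, info),
           (unsigned long)png_get_image_height(png, info),
           png_get_bit_depth(png, info),
           png_get_color_type(png, info));

    png_destroy_read_struct(&png, &info, NULL);
    fclose(fp);
    return 0;
}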



#2 groz

groz

    Vanguard

  • *****
  • Posts: 2,148
  • Joined: 14 Mar 2007

Posted 28 November 2007 - 01:35 PM

Have you looked at gcx? It can apparently do stacking, although I haven't got as far as trying it yet. It's on my roadmap of things to experiment with.

The idea I've been tossing around is to end up with basically a 'stacking server' that runs as a daemon on one of the linux boxes, something that's multi-processor aware and can make use of all the processors in a cluster. I've got a few dual-core boxes in the closet for various tasks, and I was thinking it would be nice to end up with something that can fan a stacking job out across all of them eventually.

I'd stick with the command line concepts for the brute force work, then build a gui wrapper around it all if you really want a gui. That allows the grunt work to be separated from the eye candy, and possibly even run on separate machines.

My roadmap here for tinkering with such things as time allows was to first investigate gcx in detail and, if possible, build some wrappers around it to run on a headless server for the brute-force aspect of stacking/aligning. Reading its blurbs, it can apparently do both. It would be kinda slick to put a web interface in front of it, so one can just upload a batch of raw images to the headless box with a web browser, wait a bit, and have it spit back the final aligned and stacked result.

As far as formats go, it doesn't really matter what you start with; you will need to work with considerable depth internally. I'd tend to just import from [insert choice here], use floating point internally for the entire mess, then spit out [insert format here] when done. I would almost certainly use an import format that's supported by dcraw as output; then importing a large batch of [pick camera raw format here] files is as simple as setting up a dcraw pipeline and ingesting them in one shot through the pipe.
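For the ingest side, something along these lines is what I have in mind (a rough sketch; the dcraw flags are from memory, -c to write to stdout and -4 for linear 16-bit output, so verify them against your copy):

#include <stdio.h>

/* Sketch: run dcraw on one raw file and read its converted output
   through a pipe. Caller reads the stream, then pclose()s it.
   Flags are from memory; check them against your dcraw version. */
FILE *open_raw_via_dcraw(const char *rawfile)
{
    char cmd[1024];
    snprintf(cmd, sizeof(cmd), "dcraw -c -4 '%s'", rawfile);
    return popen(cmd, "r");
}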

#3 daev

daev

    Technically Enlightened

  • *****
  • Posts: 6,610
  • Joined: 10 Mar 2004

Posted 28 November 2007 - 02:11 PM

Jeff, nice to see someone stepping up to the plate. I was cringing inside when you said you wouldn't be supporting TIFF. 24 bits just is not enough color depth to do much with. TIFF gets used a lot because it can handle 48-bit color. 24-bit posterizes rapidly if you stretch the levels, but 48-bit puts up with a huge amount of manipulation before it starts to degrade. I really think you should consider handling 48-bit color, just my 2 cents.

On the flip side, I'd be happy to play guinea pig when you have something ready to test :)

dave

#4 Dipole

Dipole

    Ranger 4

  • -----
  • Posts: 398
  • Joined: 21 Dec 2006

Posted 28 November 2007 - 04:26 PM

I've never used gcx but I did find this in the manual:

GCX manual: stacking and aligning

#5 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 28 November 2007 - 05:33 PM

Jeff, nice to see someone stepping up to the plate. I was cringing inside when you said you wouldn't be supporting TIFF. 24 bits just is not enough color depth to do much with. TIFF gets used a lot because it can handle 48-bit color. 24-bit posterizes rapidly if you stretch the levels, but 48-bit puts up with a huge amount of manipulation before it starts to degrade. I really think you should consider handling 48-bit color, just my 2 cents.

On the flip side, I'd be happy to play guinea pig when you have something ready to test :)

dave


I re-checked the PNG specification and it does in fact handle 48 bit color.

I figured there would be interest in using TIFF because it's such a common format. There are just too many variations in the way TIFFs are written. When you get into the programming side of it you always hope every file adheres to a strict format. PNG does this and keeps a smaller file size without loss of data.
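One detail worth noting for the 16-bit case: the PNG format stores 16-bit samples in network (big-endian) byte order, so on a little-endian PC libpng has to be told to swap them. A fragment like this after png_read_info() does it (assuming the usual png/info setup):

/* 16-bit PNG samples are stored big-endian in the file; ask libpng to
   deliver host (little-endian) byte order on x86. */
if (png_get_bit_depth(png, info) == 16)
    png_set_swap(png);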

#6 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 28 November 2007 - 06:06 PM

Have you looked at gcx? It can apparently do stacking, although I haven't got as far as trying it yet. It's on my roadmap of things to experiment with.


I'll take a closer look at GCX when I get home from work. Looks like it only does 16-bit FITS, mostly designed for a specific camera. It will be interesting to look at the algorithm for aligning and stacking if his source code is available.

One of the primary reasons I want a GUI is to provide a graphical interface in which the user may select a specific region within the image for the alignment and registration process.

One feature that I've got working already is the ability for the user to select the number of lines to process in each pass. For example, a user with limited system RAM can do the stack 10 lines at a time, whereas someone else may opt to stack 1000 lines at a time or the whole image all at once. At any given time only enough memory is allocated to store the specified number of lines.
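Roughly the shape of it with libpng's row interface (a sketch, not the actual code; chunk_rows stands in for whatever the user enters):

#include <stdlib.h>
#include <png.h>

/* Sketch: walk an already-opened PNG a chunk of rows at a time.
   Assumes png/info were set up with png_read_info(); chunk_rows is the
   user-selected line count. Error handling omitted. */
static void process_in_chunks(png_structp png, png_infop info, int chunk_rows)
{
    png_uint_32 height = png_get_image_height(png, info);
    size_t rowbytes = png_get_rowbytes(png, info);

    png_bytep *rows = malloc(chunk_rows * sizeof(png_bytep));
    for (int i = 0; i < chunk_rows; i++)
        rows[i] = malloc(rowbytes);

    for (png_uint_32 y = 0; y < height; ) {
        int n = (height - y < (png_uint_32)chunk_rows) ? (int)(height - y) : chunk_rows;
        png_read_rows(png, rows, NULL, n);   /* next n scanlines into rows[0..n-1] */
        /* ... add these n rows into the running sum image here ... */
        y += n;
    }

    for (int i = 0; i < chunk_rows; i++)
        free(rows[i]);
    free(rows);
}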

As you suggest, I am using float variables to operate on the RGB pixel color values. 8-byte integers are not sufficient because every time you do division on an int variable you lose some data (the remainder is discarded).

#7 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 04 December 2007 - 01:37 AM

Well, here's an update. The program is in a working state from the command line under Linux. I still need to add support for RAW files (.crw type) and finish the GUI. I need to refine the registration process to ensure the best accuracy of image alignment. Currently the registration process divides the images into four quadrants, finds the best alignment star in each quadrant (based on the radius input by the user), and calculates each image's offset in pixels. The best thing so far is the speed. I decided not to write any intermediates to disk and not even use arrays to hold the image data. Each image is allocated into memory, one by one, using the C function malloc() and then processed sequentially byte by byte. Run-time memory consumption is approximately 3 times the size of each file, which was about 100MB in the case below. I also still need a means for dark frame subtraction and flats.
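For a rough idea of what the quadrant search amounts to, the simplest version is just "brightest pixel in the quadrant" (a sketch, not the actual code; img here is a linear, row-major array of 16-bit samples):

#include <stddef.h>

/* Sketch: locate the brightest sample inside one quadrant of a frame.
   img is width*height 16-bit samples, row-major; the quadrant is given
   by its top-left corner (qx,qy) and size (qw,qh). The per-image offset
   then falls out as the difference between the coordinates found here
   and those found in the reference frame. */
static void brightest_in_quadrant(const unsigned short *img, int width,
                                  int qx, int qy, int qw, int qh,
                                  int *best_x, int *best_y)
{
    unsigned short best = 0;
    *best_x = qx;
    *best_y = qy;

    for (int y = qy; y < qy + qh; y++)
        for (int x = qx; x < qx + qw; x++) {
            unsigned short v = img[(size_t)y * width + x];
            if (v > best) {
                best = v;
                *best_x = x;
                *best_y = y;
            }
        }
}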

Anyway, here is the invocation and output. Notice the time to process, register and stack ten 48-bit 11.2 megapixel images is only 34 seconds, including write time for the output image. At the end is the final image (scaled down and converted to jpg for posting here).

$ ./afstack ant*png antares-stacked.png

Enter radius (1-20):2
Registering ant10.png
Offset [0,0]
Registering ant1.png
Offset [20,-1]
Registering ant2.png
Offset [19,0]
Registering ant3.png
Offset [17,0]
Registering ant4.png
Offset [12,0]
Registering ant5.png
Offset [8,0]
Registering ant6.png
Offset [10,0]
Registering ant7.png
Offset [10,0]
Registering ant8.png
Offset [6,0]
Registering ant9.png
Offset [-31,-43]
Finished Registration.

Beginning Stack...
Stacking file ant10.png
Stacking file ant1.png
Stacking file ant2.png
Stacking file ant3.png
Stacking file ant4.png
Stacking file ant5.png
Stacking file ant6.png
Stacking file ant7.png
Stacking file ant8.png
Stacking file ant9.png

Please wait for output to file antares-stacked.png

Done.
Execution time = 34 seconds.
$

Posted Image

BTW, this is a film image. I scanned it ten times over to get a digital data set to work with. If anyone has a set of raw DSLR image files I could use for testing that would be much appreciated. Otherwise I'll have to wait until I can get out to some clear skies and take some pictures. I've deleted all my old .crw files after processing to a final image. Thanks.

#8 daev

daev

    Technically Enlightened

  • *****
  • Posts: 6,610
  • Joined: 10 Mar 2004

Posted 04 December 2007 - 11:10 AM

I think I've got a bunch of CRWs on my archive drive. I'll have a look and see how many I can find. Holler if you want them.

dave

#9 Rob Willett

Rob Willett

    Viking 1

  • -----
  • Posts: 756
  • Joined: 07 Feb 2005

Posted 05 December 2007 - 05:05 PM

Just seen the thread here.

I used to work in the newspaper business. We used TIFFs almost exclusively and didn't find a problem with variations in files. The various TIFF libraries seemed to work fine for us.

Now PNG is pretty good, but TIFF is very much a standard that can be used with lots of other applications.

Anyway, from a system design point of view, working with a command line approach and then wrapping a GUI around it if you need it is the way to go. Focus on the functionality of the software and then put the eyecandy on later. The other advantage of command lines is that you can pipe the output of one program to the input of another without the overhead of writing out temp files. With imaging this is a big bonus!

One other suggestion is to bin floating points completely and use integers. There are various well written, fully mature libraries that allow you to handle arbitrary precision arithmetic without the massive overhead of floating points. I can't honestly remember the last time I used a float in almost 25 years of hacking C. I may never have used a float now I think about it. I also have written image processing and manipulation software for national newspapers so have some experience of managing large image data sets (in TIFF and JPG). Watch the speed up if you can work with integers.

Suggestions to make things speed up:

1) Keep everything in memory as you are doing. However look at the memory activity to see if it is really staying in memory.

2) malloc and arrays are just the same thing addressed differently at the application level; any decent C compiler will handle indexing in two dimensions at the same speed as a single dimension. The GCC compiler is pretty good. Be careful with the optimisations though.

3) Work in integers if at all possible.

4) If you can use bit shift operators, >> << & |, then you will also get a speed up. A good way to do this is to try to align data along the word boundary of your int variables. If you look at any crypto code, they try to do most of the arithmetic using bit operators.

5) Try to work to the parameters of the CPU, e.g. if you have a 64-bit CPU then try to move 64 bits of data around, as this is the 'natural' length. Lots faster than 32-bit.

6) Use mature libraries for handling your file format. I tried to better the JPEG libraries and gave up as the people who wrote the code were gods and knew exactly how to make the code sing. Saves a vast amount of development time.

Hope this helps. I'd love to help more but am busy building an extension before winter sets in here in the UK. Shout if you want me to look at code though.

#10 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 05 December 2007 - 07:58 PM

Just seen the thread here.

Suggestions to make things speed up:

1) Keep everything in memory as you are doing. However look at the memory activity to see if it is really staying in memory.

2) malloc and arrays are just the same thing addressed differently at the application level; any decent C compiler will handle indexing in two dimensions at the same speed as a single dimension. The GCC compiler is pretty good. Be careful with the optimisations though.

3) Work in integers if at all possible.

4) If you can use bit shift operators, >> << & |, then you will also get a speed up. A good way to do this is to try to align data along the word boundary of your int variables. If you look at any crypto code, they try to do most of the arithmetic using bit operators.

5) Try to work to the parameters of the CPU, e.g. if you have a 64-bit CPU then try to move 64 bits of data around, as this is the 'natural' length. Lots faster than 32-bit.

6) Use mature libraries for handling your file format. I tried to better the JPEG libraries and gave up as the people who wrote the code were gods and knew exactly how to make the code sing. Saves a vast amount of development time.


Thanks for the reply. Although I think the PNG format is the best for lossless images, I am finding limited support for PNG within popular applications. I'm rethinking support for TIFF (using libtiff). But I have some trouble with TIFFs between my Epson scanner in WinXP and GIMP in Linux. I get several warning messages from GIMP about the Epson TIFF format, but it still displays the image correctly. The bottom line, I guess, is that TIFF is more popular and more widely supported and therefore should be included.

About memory allocation: for debugging I've added a few lines of code to monitor memory allocation including size and address. Then when I free the memory I send another message to output indicating the size and memory address. This helps immensely in tracking allocations and frees. Another good test was Linux's "system monitor", which shows memory in use in real time as the program runs.

Concerning arrays, I read that arrays are slightly slower than a true malloc() due to the arithmetic that must be done for indexing. Makes sense to me - each time you use an array it must be converted like this: &array[index] = &array[0] + index*sizeof(int). If this is correct then there is a tiny difference in speed. Over a large number of iterations, such as 10 million pixels times three channels times 30 images, this can make a difference. I could be wrong about how the compiler handles arrays and indexes.

I'm working with integers at every step except doing the stack sum or average. Each source pixel is read and allocated as 3 integers for (r,g,b). Then as the sum of the stack is made each integer is cast as float. So really the only allocated float variable in the code is sum. I'm not sure how I could make sum (or avg) into type int without losing data. Suppose sizeof(int) is 4 bytes and we're stacking thirty 48-bit images (16-bits per channel). I think int is large enough to hold the sum but for avg = sum /(number of images) you would lose some data when you divide integers I think. Maybe there is a way around this. Possibly do the sum as an integer and cast sum and num_images into float to compute the avg:
int sum, num_images;
float avg = ( (float)sum/(float)num_images);

I think it would work so long as the number of images is not so numerous that the sum of a 16-bit channel over n files is greater than size of int.

The bit shifting is an interesting idea. It seems it would apply well to stacking images. It's over my head for now but I'll read into it.

Thanks,
JB

#11 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 05 December 2007 - 09:37 PM

Changing sum from float to int saved 4 seconds with the same test posted above. I still calculated avg as

avg = (int)((float)sum / (float)num_images);

How can I do integer division without loss? That would make it much faster, because the above calculation is done 3 channels x 11 million pixels = 33 million times.

The memory allocation was the same size as before because sizeof(int) = sizeof(float) = 4. But taking the sum of all 110 million pixels in 10 files using int instead of float sped things up. Here is the same stack as my previous post, down from 34 seconds to 30 seconds. It takes 10 seconds to write the final image to disk, so the actual registration and stacking only takes about 20 seconds now. In other words, the stacking code is now about 16% faster.

$ ./afstack ant*png antares-stacked.png

Enter radius (1-20):2
Registering ant10.png
Offset [0,0]
Registering ant1.png
Offset [20,-1]
Registering ant2.png
Offset [19,0]
Registering ant3.png
Offset [17,0]
Registering ant4.png
Offset [12,0]
Registering ant5.png
Offset [8,0]
Registering ant6.png
Offset [10,0]
Registering ant7.png
Offset [10,0]
Registering ant8.png
Offset [6,0]
Registering ant9.png
Offset [6,0]
Finished Registration.

Beginning Stack...
Stacking file ant10.png
Stacking file ant1.png
Stacking file ant2.png
Stacking file ant3.png
Stacking file ant4.png
Stacking file ant5.png
Stacking file ant6.png
Stacking file ant7.png
Stacking file ant8.png
Stacking file ant9.png

Please wait for output to file antares-stacked.png

Done.
Execution time = 30 seconds.
$

#12 astrotrf

astrotrf

    Not at all Mundane

  • *****
  • Posts: 1,455
  • Joined: 30 Sep 2007

Posted 05 December 2007 - 11:28 PM

I've been writing C code for 30 years now, so I thought I'd weigh in on this thread ...

About memory allocation: for debugging I've added a few lines of code to monitor memory allocation including size and address. Then when I free the memory I send another message to output indicating the size and memory address.


There are dynamic memory debugging libraries that replace malloc(3)/free(3) and friends and will give you a great deal of information allowing you to keep track of dynamic memory, should it be a problem. It'll warn you about things like double deallocation and lots of other potential problems.

Concerning arrays, I read that arrays are slightly slower than a true malloc() due to the arithmetic that must be done for indexing. Makes sense to me - each time you use an array it must be converted like this: &array[index] = &array[0] + index*sizeof(int).


This is, in general, correct. Stepping through memory using an autoincremented pointer is faster than iterating through an array with an index. And for two-dimensional arrays, the index arithmetic is an even larger burden.

Besides, using dynamic memory allows you to free(3) the intermediate image areas when you're done, so that when you add code to do other manipulations with the image, you can reuse the memory in other ways.

Suppose sizeof(int) is 4 bytes and we're stacking thirty 48-bit images (16-bits per channel). I think int is large enough to hold the sum but for avg = sum /(number of images) you would lose some data when you divide integers I think. Maybe there is a way around this. Possibly do the sum as an integer and cast sum and num_images into float to compute the avg:
int sum, num_images;
float avg = ( (float)sum/(float)num_images);

I think it would work so long as the number of images is not so numerous that the sum of a 16-bit channel over n files is greater than size of int.


Addressing the last point first: n would have to exceed 65,536 for the sum of 16-bit numbers to overflow a 32-bit unsigned quantity. So you're quite safe in adding the images into an int. In fact, you could save quite a bit of memory by allocating the images as unsigned short and summing into a final image allocated as unsigned int.

All you'll lose by storing the average as an int is the fractional part of the average. And since that fraction is useless anyway for purposes of displaying the image, there's no real sense in keeping it - just do the calculation in floating-point and round to the closest integer (or even unsigned short) to store.

By the way, in your code snippet, you don't have to cast both the numerator and the denominator to float - if either is float, the calculation will be done in floating-point and yield a float. (It doesn't actually change what the code has to do; it's just a bit prettier than lots of excess casts stuck into the expression.)

int sum, num_images, avg;
avg = (float)sum / num_images + 0.5;

I would advocate keeping the code simple and just going ahead with computing the average through floating-point. There are ways to stay in the integer domain, but why obfuscate the code for what may only be a tiny performance improvement at best?
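For completeness, the usual way to stay in the integer domain is to add half the divisor before dividing, which rounds rather than truncates (for non-negative sums):

/* Rounded integer division: adding num_images/2 before dividing gives
   sum/num_images rounded to the nearest integer. */
avg = (sum + num_images / 2) / num_images;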

Personally, I'd be fascinated to learn how you implemented shifting each image for registration while treating the two-dimensional image as a linear vector of pixels ...

#13 imeridian

imeridian

    Viking 1

  • *****
  • Posts: 899
  • Joined: 22 Aug 2007

Posted 06 December 2007 - 02:37 AM

But eventually I'll add the GUI using GTK or QT. Which GUI toolkit should I devote my time to learn?


FWIW, I very much prefer GUI apps to use GTK versus QT.

#14 JBull

JBull

    Messenger

  • -----
  • topic starter
  • Posts: 401
  • Joined: 10 Oct 2005

Posted 06 December 2007 - 02:48 AM

Terry,

Thanks very much for replying on this. I've done my best to track memory allocation and freeing, but malloc() debugging libraries would help. A question about simplifying initialization of memory allocation: how can I initialize all allocated memory to zero without an extra iteration like this?
for (i = 0; i < memory_size; i++) *pixel++ = 0;

About the transformation for shifting the images: I'm using a nested y,x loop in which a third variable j is mapped as:
j=y*(width-1)*3+x (the width*3 because three values r,g,b for each pixel)

which allows the image to be linear in memory along j. A transformed x,y results in a transformed and linear j. I wonder if this calculation for j negates the advantage I had hoped for by avoiding a two dimensional array.

My background in C is fairly weak. I learned C and Fortran programming on the VAX in college but program mostly in Python as a hobby these days. I'm a chemist by profession. So this project has been a real challenge for me despite the simple nature of adding pixels. I really appreciate help and comments and fully intend to see this project to completion.

#15 Rob Willett

Rob Willett

    Viking 1

  • -----
  • Posts: 756
  • Joined: 07 Feb 2005

Posted 06 December 2007 - 04:59 AM

Jeff,

Calloc is your friend for zeroing memory on initialisation.

There are numerous libraries out there for managing memory allocation; I would certainly use them during development. Once you are sure you have the memory handling tightened down, they normally have a flag which removes the debugging and hence you get the speed back.

I disagree over the use of n-dimensional arrays vs single-dimension arrays. Whether you use the compiler to manage it for you or you do it yourself, you are still indexing along a single-dimension array as if it's a multi-dimension array. The advantage of creating it as a multi-dimension array is that you can see clearly in the code what you are trying to do. I have written a number of compilers for C, Pascal and my own languages, and you have to do the same work somewhere.

One way that may speed up the code fragment below

j=y*(width-1)*3+x
is to use an explicit bit shift and an additional addition. Multiplying by three is the same as shifting everything over to the left by one place and adding the value again. Bit shifts are very, very, very fast, as they take place directly in the CPU; additions are pretty fast as well if you can align them on the processor word boundary. One bit shift and one addition may be significantly faster than multiplication by three. Now the compiler *may* do this optimisation for you, but it 'depends'. If you multiply by a variable, then normally the compiler can't work it out; if you know it's a constant then the compiler can sometimes get it right. The only way to check is to a) look at the compiler output or b) do some timing loops.

One suggestion is to put some timing sections into your code to see where most of the time is spent. Again, there are numerous libraries available in C to help you here. There are memory allocation checkers as well. gprof may be your friend here.

You've piqued my interest here, contrary to what I said before. If you want some help please ask. The reason I'm interested is that I want to make an automated workflow imaging and archiving system, so we use databases (MySQL or Postgres) to manage the files.

#16 llanitedave

llanitedave

    Humble Megalomaniac

  • *****
  • Posts: 30,955
  • Joined: 25 Sep 2005

Posted 06 December 2007 - 10:51 AM

But eventually I'll add the GUI using GTK or QT. Which GUI toolkit should I devote my time to learn?


FWIW, I very much prefer GUI apps to use GTK versus QT.


Or you could do the WxWidgets toolkit, which lets you make things a little more cross-platform.

QT also has a Windows version, I believe, although I don't know how well it works.

#17 llanitedave

llanitedave

    Humble Megalomaniac

  • *****
  • Posts: 30,955
  • Joined: 25 Sep 2005

Posted 06 December 2007 - 10:53 AM

My background in C is fairly weak. I learned C and Fortran programming on the VAX in college but program mostly in Python as a hobby these days. I'm a chemist by profession. So this project has been a real challenge for me despite the simple nature of adding pixels. I really appreciate help and comments and fully intend to see this project to completion.


I'm looking forward to it, and I really admire your willingness to take on the challenge.

At least your Python experience should help on the GUI end.

#18 astrotrf

astrotrf

    Not at all Mundane

  • *****
  • Posts: 1,455
  • Joined: 30 Sep 2007

Posted 06 December 2007 - 12:21 PM

I've done my best to track memory allocation and freeing but malloc() debugging libraries would help.


I don't have a particular one to recommend, but they're easy to find on the net.

A question about simplifying initialization of memory allocation: how can I initialize all allocated memory to zero without an extra iteration like this?


You should use bzero(3) for clearing allocated memory (if you're of the BSD persuasion), or memset(3), in general.

As another poster pointed out, calloc(3) is exactly what you want in your particular situation.

However, you should examine whether you need to initialize the memory at all (using bzero() or calloc() isn't free; initializing code still has to be executed). If you're reading images into the memory, and the image is exactly the correct size for the allocated space, there's no need to initialize the memory in advance.

In similar situations, I like the trick of initializing the memory to some impossible value as a debugging aid, so that, with some debugging code turned on (#ifdef DEBUG is your friend), I can detect whether I've failed to fill memory properly, in cases where that might be a possibility.
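Something like this, for instance (image_buf and image_bytes are just placeholder names):

#include <string.h>

/* Debug-only poison fill: any pixel the code later forgets to overwrite
   shows up as the recognizable 0xA5 pattern. */
#ifdef DEBUG
memset(image_buf, 0xA5, image_bytes);
#endif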

About the transformation for shifting the images: I'm using a nested y,x loop in which a third variable j is mapped as:
j=y*(width-1)*3+x (the width*3 because three values r,g,b for each pixel)

which allows the image to be linear in memory along j. A transformed x,y results in a transformed and linear j. I wonder if this calculation for j negates the advantage I had hoped for by avoiding a two dimensional array.


You wonder correctly. :D What you're doing is the moral equivalent of a 2-dimensional array of a 3 element structure containing the pixel RGB values. Whether your C code is faster than the machine code the compiler could produce is an open question.
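Spelled out, that equivalent view would look something like this (a sketch, not your code; 16 bits per channel assumed):

#include <stdlib.h>

/* A sketch of the "2-D array of 3-element structs" view: row-major,
   interleaved RGB, 16 bits per channel. */
typedef struct { unsigned short r, g, b; } pixel_t;

static pixel_t *pixel_at(pixel_t *img, int width, int x, int y)
{
    return &img[(size_t)y * width + x];   /* element (y*width + x) of the linear buffer */
}

/* allocation: pixel_t *img = malloc((size_t)width * height * sizeof(pixel_t)); */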

If you are developing this under Linux, you're doubtless using the gcc compiler. Here's a bit of unsolicited advice: use -Wall for your compilation. Do not ignore *anything* it says. A few things -Wall complains about are innocuous (and I'll be glad to help you separate the wheat from the chaff here), but mostly they point to things that you should change in your code.

BTW, from what I can see on my FreeBSD boxes, GTK is the way to go for GUI building. I did systems and network programming, rather than GUI stuff, so I'm not an expert here; GTK is something I should learn, though. (Although I did write xsky (shameless plug :o), which is an interactive sky charting program.)

You're putting me to shame here; this is exactly the sort of thing I should be spending *my* spare time doing ...

#19 groz

groz

    Vanguard

  • *****
  • Posts: 2,148
  • Joined: 14 Mar 2007

Posted 06 December 2007 - 12:25 PM

ok, my turn to weigh in on this a bit again. Stuff like this can get _fun_ when a group of folks with diverse experience on the topic get into the nitty gritty.

Like the above posters, I've been doing this kind of code for 30+ years. I originally started programming in assembler for a Z-80, and during my university years we used punch cards to write Fortran, so yes, I've been at this in various ways and forms for a _few_ years too.

In terms of optimizing, in our shop, attitudes have changed over the years. There was a time, many years ago, when we used to look at the assembler output from compilers very closely, and then using profile tools, target specific pieces for hand assembly to dramatically increase execution speeds. In particular, we used to target floating point stuff a LOT, and hand code stuff for the x87 instruction set to be executed on the co-processor. Modern compilers have negated a LOT of that, but, there's still a few areas where we can help a compiler a LOT, by taking advantage of details we know in advance.

First, on the issue of bit shift and add, for the x3 example above. Shift left and then add is an order of magnitude faster than actually implementing the multiply. This can be forced easily with a code snip that works like this:-

y=(x<<1)+x; // to replace y=x*3;

Note the commenting style, that's so when some kid in 10 years is looking to change the code, it's still understandable, and somewhat self documented.

No matter what the compile options, the former will generate the bit shift + add combination, and it'll likely all be done with a pair of registers on the processor, never referencing memory at all. Modern gcc variants for the x86 will indeed generate pretty much the same code for both with all optimization turned on, while a debug build with optimization off will generate much slower code. This optimization is generated because the 3 is a hard-wired constant in the sources; all bets are off if the 3 were to be replaced by a variable to account for differing pixel sizes, i.e. code written to work with RGB triplets or also with monochrome data, based on a width multiplier like this:-

y=x*PixelWidth;

The philosophy here has now changed a lot, based on the types of things we've seen compilers produce over the years. Clear, concise code is FAR easier to maintain and debug. Surprisingly, compilers find it a lot easier to optimize as well, and careful analysis of the output can be very surprising at times. Our philosophy these days is to write clear and maintainable code, then let the compiler do the grunt work of making it fast. Exceptions to this are areas where we KNOW things can be made faster by simple algorithm 'helpers', i.e. substituting a bit shift for a multiply because we have knowledge in advance that the optimizer cannot trigger on. When adding such optimizations, we always leave the original in place, commented out, with more comments to explain the optimization. Otherwise, somebody else getting in to do some maintenance or a bug fix will likely not understand it at all.


Memory use and allocation is another area that has historically been a problem for many applications. We always try to add a dose of realism to our rationale here. If I was in a design meeting and the project in question was a DSLR stacking program, the first 'reality check' I'd bring up goes like this: the target user has spent a minimum of a kilobuck on a telescope/mount combination, likely 3 or 4 times that, another minimum 500 bucks on the DSLR with triple that a real possibility, and at least another 500 on the associated widgets and gadgets. It's unlikely this person is going to seriously try stacking on a computer with less than half a gig of RAM, and it's far more likely the program will encounter 2 gigs than a deployment on 256 meg.

Time spent working the memory footprint down to a minimum is likely better spent refining registration algorithms. The application will likely be the primary workload on the machine it's run on, and that machine will have a reasonable memory base to begin with, so assuming up front that a memory footprint of 256 meg during the run (half of what I consider a minimum machine) is adequate, I would put no effort into reducing below a 256 meg runtime footprint until more important algorithmic issues are resolved. If the memory footprint is a problem, it's easily solved with a 20 dollar bill at the local computer hardware supplier. Based on the barrier to entry for program usage (telescope / photographic equipment requirements), a RAM requirement in this range is not unrealistic.

The program's task is to align and stack; focus on that, and worry about reducing the memory footprint once that's done. Until align/stack is 'great', the memory footprint is irrelevant anyway; it doesn't matter how small the footprint is if the primary task isn't being accomplished. A year from now, when the primary functions are all 'good', the typical memory deployment on off-the-shelf cheap hardware will be in the range of twice what it is today, and the memory footprint issue rapidly becomes moot.

The same rationale comes into play when figuring out what the processing sweet spot should be. 64-bit dual-core machines are the norm today. It would be great if it still runs well on a 32-bit processor, but 64-bit stuff is rapidly becoming the norm, as is dual core. Since this whole thing is a very processor-intensive task, it would be nice to see it prepared to spawn off enough threads to utilize all of the cores available. Cleanly separating the work units so they can be spun off into an arbitrary number of threads will ensure that the program keeps up with the hardware as the norm moves from dual to quad cores and beyond. Again, from my perspective, effectively soaking up the CPU cycles from all the available cores would be a far more efficient use of development time than focusing on the memory footprint. And again, reference the target end user: far more likely to have a relatively recent machine with a gig or two of RAM and a 64-bit dual-core processor than a machine with 256 meg of RAM and a single-core 32-bit processor. It's simply demographics; we are talking about an end user who spends significant amounts on the supporting hardware, so they are not likely to have cheaped out on the computer equipment.
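To illustrate what I mean by cleanly separated work units, a rough sketch of fanning per-row work out over N threads (accumulate_row() is just a placeholder, not anybody's actual code):

#include <pthread.h>
#include <stdlib.h>

/* Placeholder for the real per-row work, e.g. add row y of the current
   frame into the running sum image. */
extern void accumulate_row(int y);

struct band { int row_start, row_end; };

static void *worker(void *arg)
{
    struct band *b = arg;
    for (int y = b->row_start; y < b->row_end; y++)
        accumulate_row(y);
    return NULL;
}

/* Split `height` rows across `nthreads` workers and wait for them all. */
static void stack_threaded(int height, int nthreads)
{
    pthread_t *tid = malloc(nthreads * sizeof(pthread_t));
    struct band *bands = malloc(nthreads * sizeof(struct band));
    int chunk = (height + nthreads - 1) / nthreads;

    for (int t = 0; t < nthreads; t++) {
        bands[t].row_start = t * chunk;
        bands[t].row_end   = (t + 1) * chunk > height ? height : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, worker, &bands[t]);
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);

    free(bands);
    free(tid);
}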

In order of priority, I'd look at objectives like this.

1) First and foremost, program must do a good job of stack and align.
2) The faster the better.
3) Smaller would be nice, but only if it doesn't impact the first two objectives.

#20 Rob Willett

Rob Willett

    Viking 1

  • -----
  • Posts: 756
  • Joined: 07 Feb 2005

Posted 06 December 2007 - 12:42 PM

But eventually I'll add the GUI using GTK or QT. Which GUI toolkit should I devote my time to learn?



FWIW, I very much prefer GUI apps to use GTK versus QT.


Or you could do the WxWidgets toolkit, which lets you make things a little more cross-platform.

QT also has a Windows version, I believe, although I don't know how well it works.

I'd avoid all graphics interfaces until the core functionality is done. I'd also think about things like Python to do the front end. I have to confess that I know zilch about Python, but I am told that it's a neat graphics system.

#21 astrotrf

astrotrf

    Not at all Mundane

  • *****
  • Posts: 1,455
  • Joined: 30 Sep 2007

Posted 06 December 2007 - 12:42 PM

I'm a Tcl/Tk fan myself (well, mostly Tk and less Tcl :lol:); I wrote an image-processing program once using Tcl/Tk as the GUI and did all the heavyweight processing in C. Jeff here is inspiring me to look at adding registration and stacking capabilities to *that* code. But now that I have an observatory and some scopes and CCD cameras, I seem to always find something *else* to do rather than sit and write code ...

#22 Rob Willett

Rob Willett

    Viking 1

  • -----
  • Posts: 756
  • Joined: 07 Feb 2005

Posted 06 December 2007 - 12:49 PM

Groz speaks with great truth about overall design. We've all dropped too much money on Astronomy to worry about £20 for additional memory.

He's also pretty spot on about the design process, focus on making it work and worry about memory sizes later.

Getting the overall design right now is pretty important. Thinking through the way you want to manipulate data is the key to making this work. What you need to avoid doing is writing data or moving data around. Pointers are your friend (I can't believe I just wrote that :) ) here.

I looked at the libtiff library and it seems very mature. I wrote a couple of sample programs to open TIFFs I had, to see what it could do. It looks pretty powerful to me; I'll keep looking at it and seeing how it all fits together.

The more I read about TIFF the more I remember how it all works. It's very powerful, but it has lots and lots of options, such as tiles vs strips etc. Whether this is really needed I don't know, but hopefully the libtiff stuff hides a lot of it from you.

One thing I don't know about is how quick it is; I'll play further to see if I can make any judgements.

#23 llanitedave

llanitedave

    Humble Megalomaniac

  • *****
  • Posts: 30,955
  • Joined: 25 Sep 2005

Posted 06 December 2007 - 03:54 PM

Rob, you're right that functionality should come first. The GUI, while not exactly an afterthought, should be considered only after all the inputs and all the outputs are established.
(Unless of course, the choice of GUI makes possible new functionalities and new types of inputs and outputs).

Anyway, once you're at that stage, you have a number of robust GUI toolkits to choose from.

#24 astrotrf

astrotrf

    Not at all Mundane

  • *****
  • Posts: 1,455
  • Joined: 30 Sep 2007

Posted 06 December 2007 - 06:03 PM

This is all good advice. Make it work first; then worry about making it efficient (both speedwise and spacewise) later. You only need to worry about the GUI to the extent that it gives you a clear direction as to what features and functions you need to design into your code.

#25 groz

groz

    Vanguard

  • *****
  • Posts: 2,148
  • Joined: 14 Mar 2007

Posted 06 December 2007 - 08:00 PM

A question about simplifying initialization of memory allocation: how can I initialize all allocated memory to zero without an extra iteration like this?
for (i = 0; i < memory_size; i++) *pixel++ = 0;


As mentioned earlier, calloc _can_ be your friend, but I'd actually do it slightly differently. Under the covers, calloc maps to malloc plus memset. I consider it 'cleaner' not to get in the habit of using calloc, for a number of reasons: mainly so that my code has a glaring call there to remind me the array is being pre-initialized, and so I don't get in the habit of typing calloc when I don't need the memory initialized to zero. That's especially important if the array will be sparsely addressed; there's no point touching all the pages and forcing swap for the bits that'll never get touched.

char *ptr;
ptr=malloc(somesize);
memset(ptr,0,somesize);


Now, if you are concerned about performance more so than memory usage, there's a better option.


char *ptr;
ptr=valloc(somesize);
memset(ptr,0,somesize);

valloc returns a page-aligned buffer, so assuming your data elements are also an alignable size, indexing into that array will be faster. That's not going to happen if you are using 3-byte triplets, but since you are using ints, this will indeed force the issue of getting page-aligned buffers, which will allow the processor to better stretch its legs doing lots of fancy indexing into that buffer. Again, turn optimization to max for a final build and let the compiler earn its keep; the optimizer is well aware of the difference between a buffer from malloc and one from valloc.

If you are indeed more 'forward thinking', then note that POSIX deprecated valloc a while back; I believe the correct replacement is posix_memalign.
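Which, matching the fragments above, would look something like this (sketch; 4096 assumes a 4K page size):

#include <stdlib.h>
#include <string.h>

void *ptr;
/* posix_memalign: alignment must be a power of two and a multiple of
   sizeof(void *); returns 0 on success. */
if (posix_memalign(&ptr, 4096, somesize) == 0)
    memset(ptr, 0, somesize);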

This is a classic example of the tradeoff between execution speed and memory footprint. Without doing measurements I'm not completely sure offhand, but my gut says the few extra 4K chunks sliced out by the page-aligned buffers will be more than made up for by the potential execution speed increase. I suspect it'll become far more noticeable if the machine hosting the run is being pushed into a lot of paging during the run, and far less so if it all fits comfortably into physical memory from the get-go.

Again, if you are working with large arrays, and do notice your machine starting to swap heavily, reference my earlier comments regarding a quick trip to the local computer store, and a 20 dollar bill....

