CPU Load and Software for Solar System Processing? (Let's Get Technical!)

54 replies to this topic

#1 MalVeauX

MalVeauX

    Cosmos

  • *****
  • topic starter
  • Posts: 9,156
  • Joined: 25 Feb 2016
  • Loc: Florida

Posted 07 April 2020 - 10:25 AM

Hey all,

 

Come ye all computer knowledgeable and software knowledgeable folk! Help solve this time consuming challenge!

 

Computer related questions with respect to solar system imaging & processing... For a while now, I've been watching CPU load, temperatures and overall time when using software such as PIPP and AS!3 in the initial stages of processing. I took some screenshots and timings on a rather large file of the kind that is common for me now (the IMX183 sensor generates lots of data), and it really is taking a long time, which is annoying. So, maybe there's a solution out there? I didn't have any issue with time when I was doing several thousand frames from my IMX174 sensor or 290MM sensor, and even the IMX183 is fine when it comes to ROI and planets; everything was fast. But full pixel array IMX183 20GB files are a bear to process because of time. It's so slow, and uses so few physical cores of the CPU, that I wonder what it's doing. Because of that, maybe I would be better off running two instances of AS!3 and assigning them cores to run on, to do twice the work in the same time? Or maybe it's my CPU and architecture being a bottleneck? I noticed that PIPP always uses a single core no matter what I tell it to do, so it seems obvious to me that it will only speed up with more clock speed per core (which isn't happening, it seems; most clock speeds are about where they've been for a decade and core counts are just expanding). So I'm always looking for ways to speed things up. I normally don't have an issue with smaller files in the 2~5GB range. But doing three 20GB files takes an hour or more. Maybe it's just limited and cannot be sped up? That's the question, and I'd love to hear thoughts from everyone with experience with this stuff!

 

My platform:

 

AMD Bulldozer platform (yes, old, not a great architecture)

AMD FX 8350 4GHz 8 physical core CPU (I upgraded a while back from my old 4 core Phenom, saw a moderate increase in overall speed)

16GB RAM (memory only saturates if I drizzle; otherwise it rarely even comes close to using all of this)

Samsung EVO SSD

 

I realize there are newer, better platforms and that mine is rather dated relative to what's been released over the past few years. That's why I'm asking these questions: I'm curious how much better a newer platform will be. I'm looking to save time, but only if it's really reasonable. When I went from my 3.4GHz 4 core Phenom to this 4GHz 8 core Bulldozer, I saw roughly a 20~30% decrease in time, but I can't say if it was the architecture and clock speed difference or if the doubling of physical cores mattered more. Not all software uses the cores the same way. When the software uses all 8 cores at full load, it's obvious, and it is much faster at doing its work than the 4 core regardless of clock speed. But I've noticed a lot of my common software (PIPP, AS!3) does not use all cores all the time. PIPP never does. AS!3 will use all cores at full load when it does Drizzle (I rarely use Drizzle, though some do, and I will not use it for this test), but in the other sequences it rarely uses all cores and rarely at any sort of significant load. It would of course be wonderful to hear that another platform uses all cores and is much faster. I'd know what to do then. But if not, if this is common to everyone's platform out there, then it's the software and that's fine. I'd rather know that, accept that it will likely never be faster, and explore running several instances of the software on dedicated cores to do more work over the same time at the same speed (if I can).
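On the idea of running several instances on dedicated cores: a minimal sketch, assuming Windows, where the `start` command built into `cmd` accepts `/affinity` with a hexadecimal bitmask of logical CPUs. The executable path and the 4+4 core split are hypothetical, and whether two AS!3 instances actually behave well side by side is not something tested in this thread.

```python
def affinity_mask(cores):
    """Return the hex bitmask (no 0x prefix) selecting the given logical CPUs."""
    mask = 0
    for core in cores:
        mask |= 1 << core  # bit N set means logical CPU N is allowed
    return format(mask, "X")


def start_command(exe_path, cores):
    """Build a cmd.exe command line that launches exe_path pinned to cores."""
    # The empty "" is the window title that `start` expects before a quoted path.
    return 'start "" /affinity {} "{}"'.format(affinity_mask(cores), exe_path)


# Hypothetical install path; adjust to wherever AS!3 lives on your machine.
AS3_EXE = r"C:\Tools\AutoStakkert\AutoStakkert.exe"

first_instance = start_command(AS3_EXE, [0, 1, 2, 3])   # mask F
second_instance = start_command(AS3_EXE, [4, 5, 6, 7])  # mask F0
print(first_instance)
print(second_instance)
```

Each printed line can be pasted into a `cmd` window. Two pinned instances only help if the disk and RAM can feed both at once, which would need to be measured.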

 

I would love to see anyone else crunch similar class data and post times from their hopefully better platforms, to see the differences in time or resource utilization! Especially someone with 16 cores, 32 cores, or even 64 cores, since those are available these days! Or even just someone with 4, 8 or 16 cores but with 5GHz or faster clock speeds?

 

My Camera (the data producer):

 

Camera: ASI183MM (20Mp array)

Test file: Full pixel array of the moon, 1000 frames at 8-bit, 20GB file (this is newer to me; much bigger files than my IMX174 & 290MM make with 1k frames)

Screen shots of each phase in AS!3 (6k alignment points, at 72 in size used for this, via auto) and PIPP
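As a sanity check on the file size: assuming the ASI183MM's full frame is 5496 x 3672 pixels (the published IMX183 resolution, which is not stated in this post) and 1 byte per pixel at 8-bit, the raw frame data alone comes to about 20GB per 1,000 frames. Container overhead such as the SER header is ignored in this sketch.

```python
WIDTH, HEIGHT = 5496, 3672   # assumed full-frame resolution of the IMX183
BYTES_PER_PIXEL = 1          # 8-bit mono capture


def capture_size_bytes(frames, width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Raw pixel payload of a capture, ignoring container overhead."""
    return frames * width * height * bpp


size = capture_size_bytes(1000)
print("{:.2f} GB".format(size / 1e9))  # decimal gigabytes
```

The same function with `frames=5000` gives roughly 100GB, and a 16-bit capture (`bpp=2`) would double either figure.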

 

Software I use:

 

AS!3

PIPP

 

Processing Workflow:

 

I will separate out each piece of software to show what it's doing at each step of the processing, with associated CPU load, number of cores used, memory used, etc. The time is what I'm interested in. I couldn't care less if all the work was done on one or two cores, if it meant less time. Some processes use the full CPU and load up memory. Some barely use the resources. Clearly something is going on software wise. But then again, it could also be the hardware to a degree, if the software isn't optimized for it but is better on a different architecture or platform.

 

Approach one is all AS!3. All the processing of the entire file is done in AS!3 from start to finish. The time is captured. Sequence will be below.

 

Approach two is PIPP first (to limit the frames by quality only and reorder them, cut down to the same stack size as I use in AS!3), followed by AS!3 on the smaller, already quality sorted file to reduce time. Frankly, this method is much faster.

 

The output is pretty much the same as far as I can tell, looking at the output of each approach and doing my normal display processing on them. So the efficiency of the software seems to come into play with PIPP, and AS!3 behaves a lot nicer with smaller clumps of data (and it's the same data in this case, just 25% of it).

 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 

I will put each approach and results in separate posts for clarity.

 

Continued....

 

Very best,


Edited by MalVeauX, 07 April 2020 - 11:55 AM.

  • bsavoie likes this

#2 MalVeauX

Posted 07 April 2020 - 10:27 AM

Approach One, AS!3 All the Way

 

Again, I'm using the IMX183's output of a 1,000 frame full pixel array file, 8bit, that is 20Gb. I load it into AS!3 and use the settings as you'll see in the screen shots, nothing fancy. I screenshot each sequence in the process to show CPU use and load and memory use over time. And at the end, we look at the time associated with everything, using the full 1000 frames. When I place alignment points, I use 72 as the size and auto and it's about 6k points. I'm outputting a single 251 frame stack from this, which is common for me for lunar stuff (much higher outputs for planets of course, and often lower for solar).

 

The entire sequence on this file takes 1,142.2 seconds (19.04 minutes).

 

This is why I'm even making this thread. Basically 20 minutes per file. RGB imaging the moon with the IMX183 results in 3 of these files, and that's per panel, so you can imagine the amount of data it's generating, and the processing time is significant. That said, it's more convenient than doing lots and lots of panels and mosaics.

 

Notes:

 

Sequence 01, Surface Image Stabilization (459.8 secs to complete); CPU usage was minimal, averaging near 14% total load, but more interesting is that one core was doing most of the work in the 60~70% range. The other cores just sat around really, with 3 other cores exhibiting anywhere from 8~12% from time to time, and 4 other cores totally taking a break. This is the slowest stage of all the processing on these 20GB files, and this is what interests me the most. I suspect the more frames you have here, the more work it is, and I will test that with a smaller frame stack later in Approach Two.

 

Sequence 02, Buffering & Image Analysis (137.1 secs to complete); CPU usage was better, all the cores show a little activity. The heavy lifting was done by 4 of the cores, 3 were significantly higher than the rest around 60% utilization, the 4th just a bit less. The other 4 cores show some minor activity but are not contributing a lot. Overall this sequence has less work it seems and completes fairly fast and utilizes the CPU more than Sequence 01.

 

Sequence 03, Reference Image (215.3 secs to complete); CPU usage was good, all 8 cores are playing ball, though the load is not at maximum. It oscillated between the 40~60% range and 98% on all cores but averaged fairly high on utilization, and this process is much faster as a result. So whatever is going on here, it uses more of the resources.

 

Sequence 04, Image Alignment (159.9 secs to complete); CPU usage is full on, everyone is on board, all 8 cores are at full load in the upper 90% ranges. So this process takes advantage of everything and is pretty snappy as a result.

 

Sequence 05, Image Stacking (148.1 secs to complete); CPU usage drops quite a bit. All the cores are still on board, but 5 of the cores seem to be doing most of the work while the other 3 do a bit less, and the overall usage averages out in the 40% range.

 

Sequence 06, MAP Analysis (15.8 secs to complete); CPU usage is full on, all cores are being used, they're all at full load, 100%, and this process using the full resources available to it is rapidly completed. It would be nice to see this kind of resource utilization in Sequences 01 through 05.

 

Sequence 07, MAP Recombination (6.2 secs to complete); CPU usage drops, but the processing time was so brief I didn't see the average use of each core. It was already operating at full load on all cores as it moved into this sequence and then it just drops off as it's completed, so I can't say for sure if this process used all cores at full load, or if it used less and simply took so little time that it didn't matter.

 

Finally, the full sequences and their times for the processing of the test file. Again, approximately 19 minutes to do this, and for quite a lot of that time the CPU was not at full load, let alone using all cores. Strange behavior to me. Hopefully someone else understands it better. The time would presumably be reduced if this used the CPU completely, but that's not the case.
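To see where the 19 minutes go, the per-sequence times from the notes above can be tallied in a few lines. This is only arithmetic on the reported numbers; nothing here measures AS!3 itself.

```python
# Per-sequence times in seconds, as reported in the notes above.
phases = [
    ("Surface Image Stabilization", 459.8),
    ("Buffering & Image Analysis", 137.1),
    ("Reference Image", 215.3),
    ("Image Alignment", 159.9),
    ("Image Stacking", 148.1),
    ("MAP Analysis", 15.8),
    ("MAP Recombination", 6.2),
]

total = sum(seconds for _, seconds in phases)
print("Total: {:.1f} s ({:.2f} min)".format(total, total / 60))

# List the phases from slowest to fastest with their share of the total.
for name, seconds in sorted(phases, key=lambda p: p[1], reverse=True):
    print("{:<30}{:>7.1f} s {:>5.1f}%".format(name, seconds, 100 * seconds / total))
```

The sum matches the 1,142.2 second total, and Surface Image Stabilization, the mostly single-core stage, accounts for about 40% of it.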

 

AS!3_01_InitialSetup.jpg

 

AS!3_02_SurfaceStabilization.jpg

 

AS!3_03_BufferingImageAnalysis.jpg

 

AS!3_04_ReferenceImage.jpg

 

AS!3_05_ImageAlignment.jpg

 

Continued.....

 

Very best,


Edited by MalVeauX, 07 April 2020 - 10:28 AM.


#3 MalVeauX

Posted 07 April 2020 - 10:29 AM

Continued....

 

AS!3_06_ImageStacking.jpg

 

AS!3_07_MapAnalysis.jpg

 

AS!3_08_Recombination.jpg

 

AS!3_09_TotalTimeProcessing.jpg

 

That's it for Approach One.

 

Next up, Approach Two.

 

Continued....

 

Very best,



#4 MalVeauX

Posted 07 April 2020 - 10:31 AM

Approach Two, PIPP then AS!3 to Reduce Time

 

Using the same test file from the IMX183, a 1,000 frame 20Gb file, I first pass this through PIPP to reduce the frame count and focus on keeping only quality frames. This seems to be a common tactic out there, so I used it here too. The only thing I ask PIPP to do is reorder the frames from best to worst quality and to keep the best quality frames based on its default quality algorithm and I want to output 251 frames to represent the same (or similar) 251 frames I output from AS!3 in Approach One. The input and output are .SER containers.

 

The first stage of processing uses very little CPU; only one core is doing the heavy lifting, in the 60% range or so, with some minor activity on a 2nd core, and the other 6 cores are totally taking a break. You can see that PIPP loads more into RAM directly, which may explain its faster results perhaps?

 

The second stage of processing, the sorting of the 251 frames, exhibits similar CPU behavior in that it's again a single core doing most of the work, in the 50~60% range, while the rest are just sitting around not doing much, and you can see RAM fills up rapidly here. Again, memory use seems to play a role in how fast PIPP gets the work done.
The final result is 347 seconds (5.78 minutes!) to complete the work, which reduces the 1,000 frames to the 251 best quality frames, sorted from best to worst, with no other processing done.

 

The next step in this approach is to now feed this PIPP reduced stack of data into AS!3. So I load the 251 frame PIPP output SER into AS!3 and do the same exact process as done in Approach One, but with a smaller stack and only the best frames from the total stack to examine.

 

All the sequences in AS!3 are heavily reduced in time. Stabilization went from 459 seconds to 92 seconds! That's a reduction of approximately 80% for this step, even though I only reduced the file to 25% of its original stack size. PIPP used 5.78 minutes to do a lot of this work, but did it help at all? The other sequences were also reduced, to varying degrees. Sequence two went down to 11% of the original time, Sequence three went down to 29% of its original time, Sequence four went down to 40% of its original time, and Sequence five went down to 24% of its original time. Sequences six & seven are nearly the same, so no time savings there, but that makes sense since they both output 251 frames at this point, so the time should be similar and it was.

 

Overall time for AS!3 to complete the work on 251 frames and output 251 frames is 290.7 seconds (4.85 minutes!). This is 25% of the time it took to let AS!3 do all the work on all 1,000 frames. The cost was 5.78 minutes in PIPP to reduce the stack first but keep the quality. Total time for Approach Two with PIPP and AS!3 combined is 637.7 seconds (10.6 minutes!). This is about 56% of the time it took AS!3 to do the job alone (1,142.2 seconds) and saved me roughly 8.4 minutes. If I were to run two instances in parallel, I could get two files done in about half the time of one file using Approach One. That's a lot of time saved for me.
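The arithmetic behind those totals, as a quick check. It uses only the figures quoted in this thread (1,142.2 s for Approach One, 290.7 s for AS!3 on the reduced file, 637.7 s combined) and derives the PIPP share from the combined total, so any rounding in the quoted numbers carries through.

```python
AS3_ONLY = 1142.2        # Approach One, seconds
AS3_AFTER_PIPP = 290.7   # AS!3 on the 251-frame PIPP output, seconds
COMBINED = 637.7         # Approach Two total as reported, seconds

pipp = COMBINED - AS3_AFTER_PIPP   # PIPP share implied by the combined total
saved = AS3_ONLY - COMBINED

print("PIPP: {:.1f} s ({:.2f} min)".format(pipp, pipp / 60))
print("AS!3 stage vs Approach One: {:.0%}".format(AS3_AFTER_PIPP / AS3_ONLY))
print("Approach Two vs Approach One: {:.0%}".format(COMBINED / AS3_ONLY))
print("Saved: {:.1f} s ({:.1f} min)".format(saved, saved / 60))
```

On these numbers the PIPP pass comes to 347 seconds (the 5.78 minutes quoted), and the combined run takes a bit over half the time of Approach One.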

 

AS!3 seems to be more efficient with smaller stacks that are already sorted, and utilized less memory so I wonder if AS!3 is doing a lot of work on a HDD/SSD buffer rather than RAM, unlike PIPP which has less CPU use but more RAM use and yet gets the job done faster for some of the sequence.

 

PIPP_01_InitialSetup.jpg

 

PIPP_02_ProcessingInitial.jpg

 

PIPP_03_SortingFrames.jpg

 

PIPP_04_FinalTimeProcessing.jpg

 

AS!3_10_251frames_DifferenceInTime.jpg

 

Continued to final results....

 

Very best,


Edited by MalVeauX, 07 April 2020 - 10:34 AM.


#5 MalVeauX

Posted 07 April 2020 - 10:32 AM

End Results & Outputs Compared

 

And of course, how about the output? Do they compare? I loaded each 251 frame stack from each approach into IMPPG and then Photoshop, with the same processing. I'll provide the comparison of the same two features at 1:1 crop scale. I find them to be nearly identical. I aligned them and did a difference layer, and you can only see minor differences, enough to tell me that the output is virtually the same. There are small differences due to how the stacking software produced its output with tiny amounts of shifting, but not enough to be noticeable other than by doing a subtraction. So I don't see a quality difference between the two approaches, but I do see a roughly 45% reduction in total time (10.6 minutes versus 19 minutes). That has value for me. And I would absolutely love to find out another platform does this faster, so I know where to go next to further reduce time in processing.
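The difference-layer comparison can also be put into numbers. A minimal sketch of the same idea (per-pixel absolute difference between two aligned images), written in plain Python on made-up pixel values; the `tolerance` threshold is an arbitrary assumption, and real stacks would first need to be loaded and aligned with an imaging library.

```python
def difference_stats(image_a, image_b, tolerance=2):
    """Compare two equal-sized grayscale images given as lists of rows.

    Returns (mean absolute difference, fraction of pixels within tolerance).
    """
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    mean_diff = sum(diffs) / len(diffs)
    close = sum(1 for d in diffs if d <= tolerance) / len(diffs)
    return mean_diff, close


# Made-up 8-bit pixel values standing in for two aligned 1:1 crops.
stack_a = [[120, 121, 119], [200, 198, 201], [15, 16, 14]]
stack_b = [[120, 122, 119], [199, 198, 205], [15, 16, 14]]

mean_diff, close = difference_stats(stack_a, stack_b)
print("mean abs diff: {:.2f}, within tolerance: {:.0%}".format(mean_diff, close))
```

A mean difference near zero with almost all pixels inside the tolerance corresponds to the "nearly all black" difference layer described here.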

 

It makes me wonder how it works in the background, since I'm not a coder/software writer. AS!3 definitely likes smaller stacks, especially already sorted higher quality frames. Maybe it's less work on its end when the data stack is already very good and smaller; it clearly went much faster with that reduction. I'm also curious how AS!3 does its work, quality estimation and sequence processing without much RAM use, unlike PIPP which seems to put it all into RAM. Maybe that's the speed difference not associated with CPU use? The only time I see AS!3 load the RAM to saturation is when using Drizzle (then it's 100% CPU load and fills all 16GB of my RAM with this kind of file).

 

So I'm curious if there's any other method to speed things up further? Or hardware that would make a significant impact? Just doing this basic software sequence with Approach Two shaved nearly half off my time, regardless of hardware, and has approximately identical output, so I'm wondering what else can be done? For that, I tap into the community!

 

And thank you to anyone who can help with this or provide insight, and thank you to anyone who bothered to get this far....

 

(this image is compressed due to CN file restrictions, but the difference layer is key to seeing if there's much difference or any and that's the real comparison here)

 

Approach_Comparison_Output.jpg

 

Very best,


Edited by MalVeauX, 07 April 2020 - 12:01 PM.

  • troyt, R Botero and bsavoie like this

#6 Tom Glenn

Tom Glenn

    Gemini

  • -----
  • Posts: 3,444
  • Joined: 07 Feb 2018
  • Loc: San Diego, CA

Posted 07 April 2020 - 01:53 PM

Marty, I feel your pain, because I use the same camera on the Moon, and stacking in AS!3 takes forever.  I don't favor culling any frames in PIPP, because it only selects frames based upon the overall quality of the entire frame, whereas AutoStakkert evaluates each AP independently. Even the best frames have some regions that are not as sharp, while some of the "poor" frames have regions that are good.  This is especially true for the Moon with a large sensor.  The entire frame is not of equal quality, and this forms the entire basis of using the "Local" versus "Global" stacking procedure in AS!3.  The final result is a mosaic of all the best APs, each one composed in turn of a different subset of frames.  So it's best to give AS!3 all the frames if possible.
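A toy illustration of that local versus global distinction. This is not AS!3's actual algorithm, only a sketch of the idea as described: the quality scores are made up, "global" ranks whole frames by their mean score, and "local" ranks frames separately for each alignment point.

```python
# quality[frame][ap]: made-up sharpness scores for 4 frames and 3 alignment points.
quality = [
    [0.9, 0.4, 0.8],   # frame 0
    [0.7, 0.9, 0.3],   # frame 1
    [0.6, 0.8, 0.9],   # frame 2
    [0.2, 0.3, 0.4],   # frame 3
]
KEEP = 2  # frames stacked per alignment point


def global_selection(quality, keep):
    """Same frames for every AP: rank frames by their mean score."""
    order = sorted(range(len(quality)),
                   key=lambda f: sum(quality[f]) / len(quality[f]),
                   reverse=True)
    return sorted(order[:keep])


def local_selection(quality, keep):
    """Separate frames per AP: rank frames by that AP's own score."""
    picks = []
    for ap in range(len(quality[0])):
        order = sorted(range(len(quality)),
                       key=lambda f: quality[f][ap], reverse=True)
        picks.append(sorted(order[:keep]))
    return picks


print("global:", global_selection(quality, KEEP))
print("local: ", local_selection(quality, KEEP))
```

With these scores the global pick is frames 0 and 2 for every AP, while the local pick uses frame 1 for two of the three APs even though it misses the global cut. That is the kind of frame a whole-frame cull would throw away.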

 

My computer is a 2016 MacBook Pro, and I run AS!3 in Windows 10 using VMware.  I have 16GB of RAM, and like you, I find that AS!3 doesn't tap into the full resources available in the computer.  My "typical" lunar file is 5000 frames, and 100GB, using >15,000 APs.  These files take, on average, about 3-6 hours to stack.  I don't quite understand what causes that variation in time, with some taking longer than others.  But over 3 hours per file is the norm.  And I don't drizzle, because it causes AS!3 to crash.  In addition to taking a long amount of time, the computer heats up to very high temperatures, and this worries me for the long term health of the computer.  

 

One thing I have recently found is that Rolf's stacking program, PSS, is about 2-3 times faster than AS!3 when given the same lunar file.  I recently did another test of this using Rolf's latest edition, and found that a file that took 3 hours to stack in AS!3 took 45 minutes using PSS.  On top of that, Rolf's program has been giving me slightly better results on my lunar stacks than AS!3, although not everyone is reporting the same result here.  But the increase in speed is interesting, and shows that certain programs are able to make more efficient use of the processing capabilities of the system.  


  • MalVeauX likes this

#7 MalVeauX

Posted 07 April 2020 - 02:31 PM

Thanks Tom, wow, I couldn't do 3~6 hour processing runs just for lunar data. Yikes. I don't even like doing 20 minutes! Hah. I guess I'm in the volume camp when it comes to acquiring and processing for volume as I don't want to spend that much time, per day, on a common session with the moon (compared to something unique like an eclipse or transit where I would put more effort into it).

 

Good point about the local/global with respect to the software; that's true. I need to do more side by side examples to see if the time savings are worth it under excellent seeing conditions (I would imagine it matters more with poor or variable seeing), but if your seeing is commonly sub-arc-second, I imagine it may not be as profoundly different? Again, I would have to test it a lot more to see if it's worth the potential compromise.

 

I need to try Rolf's PSS. I have it and finally got it working. I need to get familiar with the GUI and all the stuff. If it reduced your processing time from hours to minutes, then my 20 minutes may be reduced significantly.

 

I'm still interested if anyone out there is brute forcing AS!3 to output in less time by having significantly higher clocks and more cores and running instances. Right now it seems a waste to really have more than 4 cores on PIPP & AS!3 for the sequences that take the most time, so it seems higher clock speeds would be ultimately the better brute force method than more cores (which doesn't add anything at all, such as the new 64 core processors.... sounds great for this, but in reality, wouldn't actually help at all and are at lower clock speeds so it would likely be slower!).
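The intuition that piling on cores won't help much when most of the work runs on one core is what Amdahl's law describes. A small sketch follows; the parallel fractions are hypothetical, since the real figure for PIPP or AS!3 isn't known from these screenshots, and the formula ignores clock speed differences between CPUs.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions; the real figure for PIPP or AS!3 is not known here.
for fraction in (0.25, 0.50, 0.90):
    row = ", ".join(
        "{} cores: {:.2f}x".format(n, amdahl_speedup(fraction, n))
        for n in (4, 8, 64)
    )
    print("parallel {:.0%} -> {}".format(fraction, row))
```

If only half of the work can be spread out, even 64 cores cannot reach a 2x speedup, which is why per-core speed matters more for the mostly single-core stages.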

 

I'm going to go test PSS and see what the times and results are (though keep in mind I've not used it much so I don't know the optimal values for my data to use).

 

Very best,



#8 MalVeauX

Posted 07 April 2020 - 03:20 PM

Heya,

 

Ok, so I used PSS. I used default settings and buffer = 1 (it said at first it didn't like my RAM size (16Gb) for having buffer = 2).

I loaded the exact same 20Gb IMX183 file, 1000 frames into PSS.

 

Bottom line, it took just over 45 minutes for PSS to do the same thing AS!3 did (Approach One). And the results are identical in terms of doing a layer over layer subtraction. So the only real difference here is time, and it cost more than double the time. So unless there's a more efficient way to set up PSS, this is just too much time for me. Maybe it's not optimal on my platform or something. All of this software would likely be way better if GPU accelerated instead of relying primarily on a single core's clock speed.

 

PSS hardly uses multiple cores at all. It's like PIPP in that for all of its sequences, it pretty much just camps on one core and uses it, and only at roughly 60~80%; the rest of the cores just take a break really. This is why it takes double the time of AS!3, which more often makes use of 2~4 cores. Through the whole sequence in PSS, it never once wanted more than one core really. PSS definitely uses RAM like PIPP does, and unlike AS!3. Interesting to see how all three handle the resources. But the bottom line for me is PSS takes longer and the output is the same as AS!3 (again, layering the images over each other and subtracting gave an almost entirely black frame, so the results are roughly 99% identical).

 

Each sequence in PSS took a while: 10+ minutes for each sequence basically, with 4 main sequences (ranking, aligning, ranking again, stacking), resulting in 45 minutes total processing time on the CPU. The ranking sequence took 10 minutes and was only really using one core at 60~80%. The aligning sequence also took 10 minutes and mainly used one core, also at 60~80%, with a little activity on 3 other cores. 4 cores were dead quiet for both of them, as you can see in the images. The sequence of ranking the alignment points took 10 minutes also, and was again a single core doing the work, with minor activity on 3 others and zero activity on 4 others. The sequence of stacking the frames had the most CPU usage, on just one core, in the 80~90% range. It also took 10 minutes, with 4 quiet cores and 3 cores doing hardly anything. So from a hardware standpoint, there's no way to increase the speed other than an optimized platform, copious amounts of fast RAM, and single core performance in the form of much higher clock speeds. Still, it took twice as long to produce the same output, so unless there's a better way to optimize or operate PSS, this is not going to work for me versus just using AS!3 or PIPP then AS!3 (again, despite all the theory, if the output files can be subtracted from each other with nearly zero difference, then they are virtually identical, and so the bottom line is that less time is the winner for me).

 

PSS_01_RankingFrames.jpg

 

PSS_02_AligningFrames.jpg

 

PSS_03_StackingParameters.jpg

 

PSS_04_AlignmentPoints.jpg

 

PSS_05_RankingFrames.jpg

 

PSS_06_StackingFrames.jpg

 

Continued....

 

Very best,



#9 MalVeauX

Posted 07 April 2020 - 03:25 PM

Ok, so final results from PSS with its Warp Sizes chart.

I loaded the output TIF (250 frames) from PSS into IMPPG, went to the same area on the image and processed it only in IMPPG. I then applied the exact same processing to the output TIF from AS!3 (Approach One, no PIPP). I then loaded them both with identical IMPPG output settings and layered them on top of each other. I aligned them exactly, manually, then performed a difference layer. The result is a nearly all black image, so they're nearly identical.

 

(compressed image for forum hosting policy, but the key is the difference/subtraction results, they're nearly identical, so there's no significant difference in output from the two, just time....)

 

PSS_07_WarpResults.jpg

 

PSS_vs_AS!3_251frames_Identical.jpg

 

And because the PIPP -> AS!3, AS!3-only and PSS outputs are all 99%+ identical under pairwise image difference/subtraction, the deciding factor is time: roughly 10 minutes vs 20 minutes vs 45 minutes. Whatever minute difference there is, it's not significant here at this scale, so I'm leaning towards whatever saves me non-renewable time.

 

I'm open to trying something else! Or better settings for the job in each software? Or even better hardware!

Where's that GPU acceleration?

 

Very best,



#10 Tom Glenn

Posted 07 April 2020 - 04:29 PM

Marty, I find it odd that PSS took longer for you.  You aren't the first person that has told me that, however, but it's puzzling, because if I use identical parameters, PSS is consistently 2-3x faster than AS!3 on my computer.  Very strange.  In fact, this increase in speed is why I'm using PSS almost exclusively for my lunar images now.  



#11 MalVeauX

Posted 07 April 2020 - 04:43 PM

Hi Tom,

 

Strange indeed! This leads me to think that there's a difference in platforms. Your laptop's CPU is not going to perform the way a desktop CPU will, no matter the architecture, so that's probably why your processes take 3~6 hours. But if PSS can reduce that significantly for you and run faster than AS!3, then I would have to rule out the CPU and start looking at architecture, platform and operating system for differences based on the language the software is written in. Even though it works in the environment, maybe it just doesn't process as well on each platform. This is what I'm really curious about. We all have different experiences with this, but we really shouldn't if you think about it. It just goes to show there's so much more going on in the background that we don't know about, both hardware and software wise. That's why I'm calling this into question, to see what really works on each platform to reduce the time of processing.

 

As a secondary result of this experiment and question, I now also have to consider.... is local/global that different, if the end results can be subtracted and show no appreciable difference? Again, I imagine this is different under poor seeing conditions. But under excellent sub-arc-second seeing conditions, where most frames are excellent, maybe it doesn't matter?

 

I haven't touched Registax6 yet to see if it behaves any differently from PIPP, AS!3 and PSS yet. I might. Not sure if I want to. I really don't use that software anymore other than Wavelets.

 

Is there any other software out there to consider?

 

I'm curious which operating system and platform all this software runs best on (various chipsets, Intel vs AMD, architecture of the chipsets, etc).

 

Very best,



#12 Tom Glenn

Posted 07 April 2020 - 04:54 PM

 

As a secondary result of this experiment and question, I now also have to consider.... is local/global that different, if the end results can be subtracted and show no appreciable difference? Again, I imagine this is different under poor seeing conditions. But under excellent sub-arc-second seeing conditions, where most frames are excellent, maybe it doesn't matter?

I can tell you that local alignment is far better than global, especially with the Moon, and even more especially with the ASI183 because of the large sensor.  Under good conditions, if you look at the frames after they are ordered by quality, you can quickly appreciate that the frames ranked in the top 25% may appear very similar globally, but with some regions of a single frame sharper than others, and it doesn't always track according to their global order in the ranking.  So if you make a top 10% stack, the results will be very different if you use local alignment.  Remember, global stacking uses the same frames for every AP, basically nullifying the APs entirely.  I made this comparison in another thread many months ago, showing the difference between local and global.  I was actually forced to use global for one of my files, because varying brightness in the sky near dawn was causing AP patches to show up in the stack.  This went away with global stacking, but at the expense of sharpness.  The full topic is linked below, and the relevant image is reproduced here.  This is a small region of the larger frame, and shows how global stacking produces a much softer result with identical processing.  

 

https://www.cloudyni...rocessing-tips/

 

AP_comparison_TG.jpg


  • torsinadoc and MalVeauX like this

#13 Tom Glenn

Posted 07 April 2020 - 04:58 PM

Oh, and in the thread linked above, I provided the screen grab from AS!3 for that particular stacking, and if you add up the times you will see that it took almost 5 hours!  Pretty insane, and although I have no doubt that my laptop is slower than a desktop, I have no idea why PSS is so much faster on the same system.  I haven't tried that exact file with PSS, but based on my other results, I would predict that it would take under 2 hours.  


  • MalVeauX likes this

#14 Tom Glenn

Posted 07 April 2020 - 05:25 PM

In thinking more about using PIPP to cull some frames before loading into AS!3, it would be useful to know just how deep the AP selection goes in the final stack.  For example, in my 5000 frame videos, if I make a stack of 500 frames, what is the lowest ranking frame from which any AP is derived?  If I knew, for example, that no frame any lower than #2000 was used in the final selection, that would be useful information, because I'm sure that ordering the frames in PIPP and making a new video of only 2000 frames would then be faster, and give the same result.  
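If per-AP quality scores were available for every frame (an assumption; nothing in this thread says the stacking software exports them), the question of how deep the AP selection goes could be answered with a sketch like this. The scores are made up, and ranking whole frames by mean score is only a stand-in for whatever PIPP's quality estimate really does.

```python
def deepest_rank_used(quality, keep):
    """Return the worst global rank (1 = best frame) that any AP draws from.

    quality[frame][ap] holds a per-AP score; higher is better.
    """
    frames = range(len(quality))
    # Global ranking by mean score across all APs, best first.
    global_order = sorted(frames, key=lambda f: sum(quality[f]) / len(quality[f]),
                          reverse=True)
    rank_of = {frame: rank for rank, frame in enumerate(global_order, start=1)}

    deepest = 0
    for ap in range(len(quality[0])):
        # The `keep` best frames for this AP, judged by this AP's score alone.
        best_for_ap = sorted(frames, key=lambda f: quality[f][ap], reverse=True)[:keep]
        deepest = max(deepest, max(rank_of[f] for f in best_for_ap))
    return deepest


# Made-up scores: 5 frames, 2 alignment points.
quality = [
    [0.95, 0.8],
    [0.8, 0.9],
    [0.7, 0.2],
    [0.1, 0.85],
    [0.3, 0.3],
]
print(deepest_rank_used(quality, keep=2))
```

In this toy case a 2-frame stack reaches down to the frame ranked third globally, so culling to the top 3 frames would lose nothing, while culling to the top 2 would.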


  • MalVeauX likes this

#15 MalVeauX

MalVeauX

    Cosmos

  • *****
  • topic starter
  • Posts: 9,156
  • Joined: 25 Feb 2016
  • Loc: Florida

Posted 07 April 2020 - 06:51 PM

In thinking more about using PIPP to cull some frames before loading into AS!3, it would be useful to know just how deep the AP selection goes in the final stack.  For example, in my 5000 frame videos, if I make a stack of 500 frames, what is the lowest ranking frame from which any AP is derived?  If I knew, for example, that no frame any lower than #2000 was used in the final selection, that would be useful information, because I'm sure that ordering the frames in PIPP and making a new video of only 2000 frames would then be faster, and give the same result.  

This is what I'm thinking more about. No doubt it's faster, as shown here. Significantly faster. Though I cannot explain why some software behaves so differently across platforms. I have an older platform, so I would think it would be a fairly common system for this software to run on, being a Windows 7 environment, a rather old architecture that is well supported and long documented, etc. But I too think that if you quality sort the frames in PIPP and leave the best number of frames you intend to stack, plus a few extra just to give it some buffer, you should still be able to use AS!3 or PSS with local APs for best stacking of the best quality frames. Odds are, out of the 200~500 frames you may end up wanting to stack, all the good APs probably came from them anyway, and less likely from your worst frame out of the total stack, I would assume. And perhaps this is why my 251 frame stacks from each file gave the same result (the crops above are merely 1:1 selections of a much larger image that had contrast near the terminator, so I selected that) despite different processes at work... AS!3 ended up using the same 6k APs on both files, and gave the same results, because the other 749 frames that I didn't stack were not the higher quality frames to begin with. I imagine it's even closer when seeing is really good so that virtually all frames are excellent to stack. It's probably going to show a difference with data that has highly variable seeing and poor seeing combined, especially at fine image scales (like 0.5" to 0.3" or finer I'm thinking).

 

Very best,



#16 MvZ

MvZ

    Surveyor 1

  • *****
  • Posts: 1,884
  • Joined: 03 Apr 2007
  • Loc: The Netherlands

Posted 08 April 2020 - 06:53 AM

A bit of a heads up, but I'm working on a new version of AS!3 which should be significantly faster especially for very large recordings - and especially during the surface alignment stage.

 

I know it may be a lot to ask, but if you have time, could you make available one of those large recordings to me? I don't have test material for 'high number of pixels'-data unfortunately, so I'm working on a 40GB lunar recording where each frame is 'only' about 2 megapixels big, and it would be good to test with different images. 


  • John Boudreau, R Botero, torsinadoc and 5 others like this

#17 MalVeauX

MalVeauX

    Cosmos

  • *****
  • topic starter
  • Posts: 9,156
  • Joined: 25 Feb 2016
  • Loc: Florida

Posted 08 April 2020 - 08:32 AM

A bit of a heads up, but I'm working on a new version of AS!3 which should be significantly faster especially for very large recordings - and especially during the surface alignment stage.

 

I know it may be a lot to ask, but if you have time, could you make available one of those large recordings to me? I don't have test material for 'high number of pixels'-data unfortunately, so I'm working on a 40GB lunar recording where each frame is 'only' about 2 megapixels big, and it would be good to test with different images. 

Emil, first, thank you for your software, it's excellent and a staple in the community.

 

That's very exciting to hear.

 

And yes, I'm sure there's a way for some of us to get a large pool of data to you. I'd even mail you a thumb drive filled with it if that's convenient enough (lunar, solar, etc). My internet is 3G cellular in the woods, so it will take a few days to upload 20 GB somewhere, but I will do that if there's somewhere to upload to. We just need a place that can host large files like this and I think a few of us would be happy to contribute some data for your needs.

 

Thoughts on hosting somewhere? I can start uploading immediately.

 

Very best,



#18 MvZ

MvZ

    Surveyor 1

  • *****
  • Posts: 1,884
  • Joined: 03 Apr 2007
  • Loc: The Netherlands

Posted 08 April 2020 - 09:00 AM

I don't know to be honest. WeTransfer doesn't allow such large uploads I think. I could open up a server, either a VPN or a 'local' private server in the Netherlands perhaps (lucky to have a 1 gig up and down there), but the chances of the connection being ok for a couple of days on both ends are maybe not that high.... And you wouldn't be able to do much else in the meantime. So let's wait a bit; I hope to have a new version for you to test within a week or so, so I'll send you a message when I have something to test. Especially for the surface alignment I'm working on now, I believe it should scale very well.

 

What kind of storage capacity are you using by the way, an SSD? The new version should be very close to being limited by the speed of the SSD; I have a file on a fast SSD here (2000MB/s), and am working through the data (40GB file) with pretty much that speed at the moment, taking about 20 seconds to read all the data (and yes, I'm making sure that the file is not buffered in memory, so this is the first read of a file and reflects the actual IO speed without RAM buffering).

 

The surface alignment is not completed yet, but depending on how many cores you have available, I believe this can be done very effectively and completely in parallel to reading the data. I don't think 20 seconds total time for surface alignment is impossible.
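
The 20 second figure follows directly from file size divided by read speed. A small sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB); the 500 MB/s line is only an illustrative value for a slower drive, not a number from this thread:

```python
def read_time_seconds(file_size_gb, throughput_mb_per_s):
    """Time to read a file once at a sustained throughput (1 GB = 1000 MB)."""
    return file_size_gb * 1000 / throughput_mb_per_s


print(read_time_seconds(40, 2000))  # 40 GB at 2000 MB/s -> 20.0 seconds
print(read_time_seconds(40, 500))   # same file at an assumed 500 MB/s -> 80.0 seconds
```

This is only the lower bound set by IO; any stage that cannot keep up with the disk will add to it.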


  • MalVeauX likes this

#19 aeroman4907

aeroman4907

    Surveyor 1

  • -----
  • Posts: 1,529
  • Joined: 23 Nov 2017
  • Loc: Castle Rock, Colorado

Posted 08 April 2020 - 09:07 AM

A bit of a heads up, but I'm working on a new version of AS!3 which should be significantly faster especially for very large recordings - and especially during the surface alignment stage.

 

I know it may be a lot to ask, but if you have time, could you make available one of those large recordings to me? I don't have test material for 'high number of pixels'-data unfortunately, so I'm working on a 40GB lunar recording where each frame is 'only' about 2 megapixels big, and it would be good to test with different images. 

That is extremely exciting Emil!  I would have gladly provided a large 100 GB file from imaging the moon with a 183C sensor, but I deleted all my videos recently.  I hope Tom Glenn can provide you one.  If you have problems obtaining one, please let me know and I will try and get one to you although I am not sure how to send you a file that big readily.



#20 lakeorion

lakeorion

    Surveyor 1

  • -----
  • Posts: 1,715
  • Joined: 03 Aug 2010
  • Loc: Lake Orion MI

Posted 08 April 2020 - 09:31 AM

I'm in that 90 GB camp. QHY 183M, 5000 frames lunar, with a wrinkle: mine is on a fork mount and I'm trying to use derotation. Image stabilization uses one core and takes 12,000+ seconds. Total processing has run up to 16 hours, only to find out the derotation didn't work.

 

I too have a state of the (7 years ago) art system. AMD 3.5 GHz 6 core with 32 GB RAM and a 4 disk striped RAID. Last kid gets out of college and I'll install a small mainframe in the basement, but in the meantime...

 

I realize it's a large computing task on a large set of data.  Any more speed is appreciated.  I'd love to be able to iterate more settings changes to optimize stacking but I don't have the patience for a multiple week DOE.

 

**Edit** I did some poking around and one of my issues is the random access time in my RAID array.  Next time I'll move the file to my gaming SSD.  I thought I tried that once and didn't see an appreciable improvement, guess I'll have to try again.


Edited by lakeorion, 08 April 2020 - 07:52 PM.


#21 MalVeauX

MalVeauX

    Cosmos

  • *****
  • topic starter
  • Posts: 9,156
  • Joined: 25 Feb 2016
  • Loc: Florida

Posted 08 April 2020 - 12:07 PM

I don't know to be honest. Wetransfer doesn't allow such large uploads I think. I could open up a server, either a VPN or a 'local' private server in the Netherlands perhaps (lucky to have a 1 gig up and down there), but the chances of the connection being ok for a couple of days on both ends are maybe not that high.... And you wouldn't be able to do much else in the mean time. So let's wait a bit; I hope to have a new version for you to test within a week or so, so I'll send you a message when I have something to test. Especially for the surface alignment I'm working on now I believe it should scale very well. 

 

What kind of storage capacity are you using by the way, an SSD? The new version should be very close to being limited by the speed of the SSD; I have a file on a fast SSD here (2000MB/s), and am working through the data (40GB file) with pretty much that speed at the moment, taking about 20 seconds to read all the data (and yes, I'm making sure that the file is not buffered in memory, so this is the first read of a file and reflects the actual IO speed without RAM buffering).

 

The surface alignment is not completed yet, but depending on how many cores you have available, I believe this can be done very effectively and completely in parallel to reading the data. I don't think 20 seconds total time for surface alignment is impossible.

Heya,

 

I'm on a 1 TB SSD (Samsung EVO series) with 16 GB RAM. CPU is 4 GHz standard across 8 physical cores (FX8350). It would be super if it were possible to assign, for example, 4 cores to one instance of AS!3 and 4 cores to another instance of AS!3 and do hammer & anvil work on multiple files, to really begin taking advantage of modern CPUs, where 12 and 16 cores are common and up to 64 cores are now available without having to have an enterprise level setup. Though 64 cores is likely overkill, and it would take forever to set up that many instances, I think 3 or 4 instances running on 4 cores each would be pretty efficient if it were possible (without resorting to something like VMWare and virtual OS's running, etc).
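
For what it's worth, Windows can pin a program to a set of cores with the built-in `start /affinity <hex mask>` command. Below is a minimal Python sketch that just works out the hex masks and prints the command lines. The executable path is a placeholder, and whether AS!3 behaves well with two instances running at once is an assumption I have not verified; the instances would also still share RAM and disk.

```python
def affinity_masks(n_cores, n_instances):
    """Split n_cores evenly and return one hex affinity mask per instance."""
    if n_cores % n_instances != 0:
        raise ValueError("n_cores must divide evenly among the instances")
    per_instance = n_cores // n_instances
    masks = []
    for i in range(n_instances):
        # A block of per_instance set bits, shifted to this instance's cores.
        mask = ((1 << per_instance) - 1) << (i * per_instance)
        masks.append(format(mask, "X"))
    return masks


def start_commands(exe_path, n_cores, n_instances):
    """Build cmd.exe 'start /affinity' lines (exe_path is a placeholder)."""
    return [
        f'start "" /affinity {mask} "{exe_path}"'
        for mask in affinity_masks(n_cores, n_instances)
    ]


# Hypothetical install path, 8 cores split between 2 instances.
for line in start_commands(r"C:\path\to\AutoStakkert.exe", 8, 2):
    print(line)
```

For 8 cores and 2 instances the masks are `F` (cores 0 to 3) and `F0` (cores 4 to 7). The same thing can be done by hand in Task Manager with "Set affinity" on each running process.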

 

Either way, I have lots of lunar & solar high resolution surface data to feed into your software when it's time.

 

Very best,



#22 Tom Glenn

Tom Glenn

    Gemini

  • -----
  • Posts: 3,444
  • Joined: 07 Feb 2018
  • Loc: San Diego, CA

Posted 08 April 2020 - 01:25 PM

Emil, this will be very interesting to test. I have many 100GB videos, and even a few 160GB and 200GB videos (all with 20 megapixel frames; these usually require between 10,000 and 20,000 alignment points). Your current version of AS!3 handles all of these files, but as I stated earlier, it will take up to 6 hours to do so. Any improvements in speed would be welcome, and certainly anything that could get the total stacking under 1 hour would be a strong win in my book. Good luck, and let me know when something is available to test. I would be happy to share files with you, but I don't know the best way to share a 100GB file.


Edited by Tom Glenn, 08 April 2020 - 01:26 PM.


#23 lakeorion

lakeorion

    Surveyor 1

  • -----
  • Posts: 1,715
  • Joined: 03 Aug 2010
  • Loc: Lake Orion MI

Posted 11 April 2020 - 09:01 AM

During the Sigma Clip portion of the stacking my CPU drops to ~33%, and disk usage hovers at 0-22 MB/s. A lot of RAM is being utilized, 21 GB. I moved the video off of the RAID array and onto a SATA SSD, and while the SSD's reported response time is 1/10th that of the RAID array, it doesn't appear to make much difference in the long run. Also, Windows PID 4 is concurrently accessing the file, basically doubling the disk response time and load. But I'm using the same model disk as in my acquisition laptop (WD SATA SSD, 500 GB), and on acquisition it manages sustained writes of 200+ MB/s. So I don't believe the hard drive is the bottleneck.

 

RAM speed?  I've long thought it was responsible for my slow Skyrim loading screens.  I have a similar issue there, low CPU / low disk utilization.

 

Untitled-1.jpg

This is in the second run, the 250 frame finished. 5000 frame lunar with a QHY183M.  Trying to see how different frame numbers affect my final output.
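
One way to sanity check whether the disk could be the limit is to time a plain sequential read of a file and compare the result with what Resource Monitor shows during stacking. A minimal sketch using only the Python standard library follows; the function name is made up, the demo reads a small temporary file rather than a real video, and the OS file cache can inflate the number on a file that was recently written or read.

```python
import os
import tempfile
import time


def sequential_read_mb_per_s(path, chunk_bytes=8 * 1024 * 1024):
    """Read a file front to back and return the throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 turns off Python's own buffering (not the OS cache).
    with open(path, "rb", buffering=0) as fh:
        while True:
            chunk = fh.read(chunk_bytes)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / max(elapsed, 1e-9)


# Demo on a 16 MB temporary file; point it at a real video for a useful number.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16 * 1024 * 1024))
    demo_path = tmp.name
try:
    print(f"{sequential_read_mb_per_s(demo_path):.0f} MB/s")
finally:
    os.remove(demo_path)
```

If a straight read of the video runs far faster than the rate seen while stacking, that would be consistent with the drive not being the bottleneck, although random access patterns during stacking can behave differently from a sequential read.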


  • MalVeauX likes this

#24 lakeorion

lakeorion

    Surveyor 1

  • -----
  • Posts: 1,715
  • Joined: 03 Aug 2010
  • Loc: Lake Orion MI

Posted 11 April 2020 - 09:05 AM

And here is the Win 7 Resource Monitor. Of course, the screen grab shows a spike to 72% and doesn't show System PID 4 accessing the file, but honestly, the lower CPU usage and the PID 4 access are the norm.

Untitled-2.jpg


Edited by lakeorion, 11 April 2020 - 09:05 AM.

  • MalVeauX likes this

#25 MalVeauX

MalVeauX

    Cosmos

  • *****
  • topic starter
  • Posts: 9,156
  • Joined: 25 Feb 2016
  • Loc: Florida

Posted 11 April 2020 - 11:26 AM

Interesting times! Thanks for sharing that. Man, hours upon hours for that single run. That's 5 times the frames from the same sensor as my own run on 1,000 frames. 20 minutes x 5 for me would be about an hour and 40 minutes. But with your times, it's showing many hours to do 5 times the number of frames. Quite interesting. Makes me want to see what 5k frames would take on my platform, though I'm not sure I want to capture 5k frames at 15~20FPS, yikes.

 

Very best,



