
Thoughts On PixInsight Use of RAMDisk

48 replies to this topic

#1 jdupton


    Fly Me to the Moon

  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 27 December 2020 - 05:36 PM

Fellow PixInsight Enthusiasts,

 

   Warning: This is another of my very long posts on various subjects. I like to give enough background information and details on a topic like this so that it can be more easily followed. The net summary of this will be to say "Running a RAMDisk for PixInsight can give you much improved Benchmark results but can actually be detrimental to overall performance when used in some configurations." Read on to find out when to use a RAMDisk and how to use one to increase PixInsight performance under selected conditions.

 

   This post was prompted by several comments and questions in another thread here on Cloudy Nights -- namely "PixInsight Benchmark - How low can you go?". The comments I make here apply to several other threads also. I have made a number of the points in this post in other threads before. I wanted to consolidate all of my thoughts regarding the use of RAMDisk in one place rather than derail other threads with side discussions. I will give brief answers and then post pointers in the other threads.

 

   The commentary below is my opinion and inferences based on testing that anyone can easily duplicate. I have no internal knowledge of PixInsight process operation. All of the following is based on careful observation which is verifiable.

 

   First, I will address the concepts of PixInsight Swap Directories and Windows (or other) OS Swap Files. I often see confusion about what each of these is and how they might be related. (Short answer: they are not directly related at all.) They serve different functions that are independently important in PixInsight processing.

 

 

PixInsight Swap Folders

   PixInsight Swap Directory entries are file system folders that can be used by PixInsight to store its "Swap Files". In PixInsight, these swap files are stored in the swap folders you specify in the PI Preferences (Process | Global | Preferences; Directories and Network). These swap files are part of the mechanism to handle the UnDo and ReDo operations while working with images. When you make a change to an image in the PI Workspace, a backup image file is saved to the Swap Folders. This allows PI to very quickly UnDo the operation by just reloading the saved image file. The same mechanism is used for a ReDo operation. The next version of the image is simply reloaded from the files saved in the Swap Folders.

 

   PixInsight writes these swap files for each change in one or more images. The image is first divided into "parts" governed by the number of directories specified in Preferences. For example, if you have a 128 MB image loaded and make a change to it, the image data would be divided into 4 parts of 32 MB each if you have 4 Swap Folders defined. Each of those parts is then written into the Swap Folders. If you have Swap Folders defined on more than one physical disk, the parts are striped across the devices (making the write operation faster).
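The splitting described above can be modeled in a few lines of Python. This is a minimal sketch, assuming equal contiguous parts with one part per Swap Folder; the function name `split_into_swap_parts` and the folder paths are hypothetical, and PixInsight's actual chunking may differ.

```python
MB = 1024 * 1024


def split_into_swap_parts(image_bytes, swap_folders):
    """Divide an image's bytes into one contiguous part per swap folder.

    Hypothetical model of the behaviour described above: N folders give
    N equal parts; any leftover bytes are added to the last part.
    """
    n = len(swap_folders)
    if n == 0:
        raise ValueError("at least one swap folder is required")
    base = image_bytes // n
    parts = []
    offset = 0
    for i, folder in enumerate(swap_folders):
        size = base if i < n - 1 else image_bytes - offset
        parts.append((folder, offset, size))
        offset += size
    return parts


# Hypothetical layout: two folders on each of two physical drives.
folders = ["D:/PISwap/1", "D:/PISwap/2", "E:/PISwap/1", "E:/PISwap/2"]
for folder, offset, size in split_into_swap_parts(128 * MB, folders):
    print(f"{folder}: bytes {offset:,}..{offset + size - 1:,} ({size // MB} MB)")
```

With folders on two drives, half of the parts land on each device, which is what lets both drives write at the same time.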

 

   This UnDo / ReDo operation is the only use I have observed for the Swap Folders. If the current PI process creates a *completely new* image frame during an operation, nothing is written to the Swap Folders. Tasks such as ImageCalibration, DeBayer, StarAlignment, ImageIntegration, and Cosmetic Correction all create new image files, and the Swap Folders are not used at all for these operations. On the other hand, processes that change an existing image do create a new set of swap file entries when they execute. Processes in this group include HistogramTransformation, CurvesTransformation, Resample, Rescale, ColorCalibration, and many others.

 

   If you look at the PixInsight Benchmark Script, you will find that it only executes processes which modify the initial image. Specifically, the Benchmark performs the following operations on the input image: BackgroundNeutralization, ColorCalibration, DynamicCrop, MultiscaleMedianTransform, HistogramTransformation, HDRMultiscaleTransform, CurvesTransformation, CurvesTransformation (yes, twice but with different parameters), SCNR, Resample, Resample, Resample, Resample (yes, 4 different resamples), UnsharpMask, and finally another Resample (yes, yet again). As you can see, the Benchmark does not invoke any process that creates new images along the way. So the Benchmark Script makes heavy use of the Swap Folder structure, since every operation it performs creates a new entry for a possible UnDo operation.
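The contrast between the Benchmark and a typical pre-processing run can be expressed as a tiny Python model. The process sets come from the lists in this post; the function `swap_activity` and the 128 MB image size are hypothetical, and the model ignores that DynamicCrop and Resample change the image size along the way.

```python
# Processes described above as modifying an existing image (one UnDo copy each).
MODIFIES_IMAGE = {
    "BackgroundNeutralization", "ColorCalibration", "DynamicCrop",
    "MultiscaleMedianTransform", "HistogramTransformation",
    "HDRMultiscaleTransform", "CurvesTransformation", "SCNR",
    "Resample", "Rescale", "UnsharpMask",
}
# Processes described above as creating new image files (no swap activity).
CREATES_NEW_IMAGES = {
    "ImageCalibration", "CosmeticCorrection", "Debayer",
    "StarAlignment", "ImageIntegration",
}

BENCHMARK = [
    "BackgroundNeutralization", "ColorCalibration", "DynamicCrop",
    "MultiscaleMedianTransform", "HistogramTransformation",
    "HDRMultiscaleTransform", "CurvesTransformation", "CurvesTransformation",
    "SCNR", "Resample", "Resample", "Resample", "Resample",
    "UnsharpMask", "Resample",
]
PREPROCESSING = [
    "ImageCalibration", "CosmeticCorrection", "Debayer",
    "StarAlignment", "ImageIntegration",
]


def swap_activity(workflow, image_mb):
    """Return (UnDo writes, MB written to swap, new-image steps).

    Simplification: every modifying step writes one full-size UnDo copy
    and the image size never changes.
    """
    undo_steps = [s for s in workflow if s in MODIFIES_IMAGE]
    new_image_steps = [s for s in workflow if s in CREATES_NEW_IMAGES]
    return len(undo_steps), len(undo_steps) * image_mb, len(new_image_steps)


print("benchmark:     ", swap_activity(BENCHMARK, 128))
print("pre-processing:", swap_activity(PREPROCESSING, 128))
```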

 

   How do you use PixInsight? Which processes take the most time when you are processing your images? I'd bet that, like me, most of your time-consuming activities are Calibration, Cosmetic Correction, Sub-Frame Selection and Weighting, DeBayering, Local Normalization, and Integration and Drizzle Integration. These operations make no use of the Swap Folders since there is nothing to UnDo. (You cannot "UnDo" an image integration. You can only delete the new integration image frame.) Since the Swap Folders are not even used in these very lengthy operations, using a RAMDisk for those swap files is no help at all. However, as will be seen in the next few paragraphs, having a large RAMDisk in place during those operations can make these new-image-creation processes take even longer.

 

   For those reasons, running a large RAMDisk can greatly boost your PixInsight Benchmark score but not help at all in the specific processes that take the most time. In fact, using RAM to build a RAMDisk for Swap Folders can lengthen the time consumed in those operations on large file sets.

 

 

OS Swap Files

   Now, let's consider the Windows (or other) OS Swap Files. These files are managed solely by the OS. They can be thought of as a huge extension of your RAM pool. In fact, the OS uses the OS Swap Files to implement virtual memory. Each computer system has a certain amount of physical DRAM which is used to execute programs. All modern OS implementations use disk storage to extend the amount of (virtual) RAM you have to work with. Many programs (and background processes and services) can be running "at the same time". Just open the Windows Task Manager to see everything that is running behind the scenes.

 

   All of those programs, processes, and services allocate and use RAM without regard to anything else that might be running. You can have running processes using more RAM than you physically have installed. The OS takes care of all that for you. It creates a very large pool of virtual RAM on one or more of your disk drives. Behind the scenes, it constantly moves contents back and forth between physical RAM and the virtual RAM image stored on disk. You are usually not aware of any of this. Every running program gets its RAM image loaded into physical RAM when it is actively running. If it goes idle, the OS can push it out to disk to make room for another program which starts running.

 

   This becomes important to our use of PixInsight when doing things like integrating a large number of image frames. PI allocates RAM to build up the stacks of each pixel position in the frames being integrated. These needs can be huge. As an example, for my ASI294MC-Pro OSC camera, each deBayered frame is 4144 x 2822 pixels in size. Each pixel position contains three channel layers, and each channel's pixel is represented by a 32-bit (4-byte) floating point number. Each image is then 133.83 MB in size. When I integrate multiple nights of images, I have had as many as 750 frames that need to be in RAM. This amounts to 100,373 MB of RAM needed. Since I only have 32 GB (32,768 MB) of RAM installed, Windows is paging much of this virtual RAM to disk while PI does its integration work.
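The memory arithmetic above can be checked with a short Python calculation. It follows the post's simplification that every frame is fully resident at once; the post's "MB" figures are binary megabytes (1 MB = 1,048,576 bytes), and the helper name `frame_bytes` is hypothetical.

```python
MIB = 1024 * 1024  # binary megabyte, matching "32 GB (32,768 MB)"


def frame_bytes(width, height, channels=3, bytes_per_sample=4):
    """In-memory size of one frame stored as 32-bit floating point."""
    return width * height * channels * bytes_per_sample


frame = frame_bytes(4144, 2822)      # ASI294MC-Pro frame after deBayering
frames = 750                         # multi-night integration from the post
installed = 32 * 1024 * MIB          # 32 GB of physical RAM

stack = frames * frame
print(f"one frame:           {frame / MIB:,.2f} MB")
print(f"{frames} frames:          {stack / MIB:,.0f} MB")
print(f"beyond physical RAM: {(stack - installed) / MIB:,.0f} MB")
```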

 

   So why does having a RAMDisk slow down PI's integrations? Let's say I decided to "speed things up" and define a 16 GB RAMDisk for PI's Swap Folders. That RAM is now dedicated to the RAMDisk rather than being available as working memory for programs. When I start my large image integration run, the OS runs out of RAM at 16 GB rather than 32 GB. Thus it starts paging (swapping) RAM to virtual memory (on disk) much sooner than before. Since disk is generally slower than RAM, having the RAMDisk in place costs performance in my Image Integration operation. The RAMDisk isn't even being used, yet it hinders the OS's use of virtual memory paging.
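The effect can be sketched with a deliberately simple model: RAM reserved for a RAMDisk (or used by other programs) is unavailable to PixInsight, and anything beyond what remains must be paged. The 24 GB working set is a hypothetical figure, and real OS paging is far more nuanced than this.

```python
def paged_out_gb(working_set_gb, physical_gb, ramdisk_gb=0.0, other_use_gb=0.0):
    """Rough estimate of how much of a working set ends up in the page file.

    Simplified model: RAMDisk space and other programs' memory are taken
    off the top; whatever does not fit in the remainder is paged to disk.
    """
    available = max(physical_gb - ramdisk_gb - other_use_gb, 0.0)
    return max(working_set_gb - available, 0.0)


# A hypothetical 24 GB integration on a 32 GB machine.
for ramdisk in (0, 16):
    print(f"RAMDisk {ramdisk:2d} GB -> {paged_out_gb(24, 32, ramdisk):.0f} GB paged")
```

Without the RAMDisk the whole job fits in RAM; with a 16 GB RAMDisk, a third of it has to live in the page file even though the RAMDisk itself sits idle.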

 

   It is this aspect of RAMDisk usage that can cause issues for running "real life" operations in PixInsight. A RAMDisk is a boon when running the PI Benchmark, but if most of your PI time is spent in processes that create new files as they run, then a RAMDisk is working against you.

 

 

Times A RAMDisk Can Help Speed Up PixInsight

 

   Obviously, as discussed above, the PI Benchmark is one clear case where a RAMDisk helps (assuming you point the PI Swap Folders at the RAMDisk). Any workflow that uses only processes that change an existing image frame also benefits from a RAMDisk. Often, though, these operations are mostly compute bound, and the RAMDisk isn't helping as much as you might think.

 

   A RAMDisk can also help if you use it for more than PI's Swap Folders. Assuming you have lots and lots of physical RAM, you could use the RAMDisk for temporary frames that you do not intend to save long term. For instance, if you never save your calibrated frames after integrating them, you could point the Output Directory of the ImageCalibration process to your RAMDisk, assuming it is big enough to hold all of the calibrated frames. (In other words, if your camera files for a session total less than 8 GB, then you could save all the calibrated frames to an 8 GB RAMDisk.) In this case, the RAMDisk would speed up the Image Calibration process nicely, since writing to the RAMDisk is faster than writing to one of the normal disks in the system. This methodology can be extended to other new-frame-creation processes so long as there is space for the files on the RAMDisk. Since a RAMDisk may be volatile (depending on implementation), you should always copy your final files from the RAMDisk to your normal storage media when processing is complete.
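Before pointing an Output Directory at a RAMDisk, it helps to check that the outputs will actually fit. Below is a minimal Python sketch, assuming input frames sit in one folder with `.fit`, `.fits`, or `.xisf` extensions; the function names, the `expansion` factor (for example about 2.0 if 16-bit raw frames are written out as 32-bit float), and the 10% margin are all hypothetical choices.

```python
import os
import shutil


def total_bytes(directory, suffixes=(".fit", ".fits", ".xisf")):
    """Sum the sizes of image files directly inside `directory`."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.lower().endswith(suffixes):
                total += entry.stat().st_size
    return total


def fits_on_ramdisk(light_dir, ramdisk_root, expansion=1.0, margin=0.10):
    """True if the expected output size, plus a safety margin, fits in free space.

    `expansion` scales input size to output size; `margin` keeps spare room.
    """
    needed = total_bytes(light_dir) * expansion
    free = shutil.disk_usage(ramdisk_root).free
    return needed * (1 + margin) <= free
```

For example, `fits_on_ramdisk("D:/Lights/Session1", "R:/", expansion=2.0)` (hypothetical paths) would report whether a session's calibrated output should fit on a RAMDisk mounted at `R:`.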

 

   Pointing some of your output files to a RAMDisk will almost always speed up the running process (so long as your RAMDisk doesn't run out of space). The amount of speed-up depends on how slow the storage you normally write the files to is. A RAMDisk will be much faster if your system only has HDD space for writing the files. Even if you have an SSD storage device in your system, a RAMDisk may be faster; it depends on the SSD in question. A SATA SSD will usually be slower than other types. The next fastest will be a PCIe (AHCI) drive, followed by an NVMe drive. For PCIe/AHCI and NVMe drives, PCIe 4.0 will be faster than PCIe 3.0 or lower. My own experience is that PCIe 4.0 capable NVMe drives can be as fast as a RAMDisk once software management overhead is factored in. You can always use a drive benchmark program like CrystalDiskMark to see the read and write speeds of the drives available to PixInsight.
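For a quick, rough comparison of candidate locations without a dedicated tool, a sequential write can be timed from Python. This is only a crude sketch and no substitute for CrystalDiskMark; the function name and default sizes are hypothetical. `os.fsync` asks the OS to flush its cache to the device, but a drive's own DRAM cache can still flatter small tests.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=256, block_mb=4):
    """Time one sequential write of about `size_mb` into `directory`; return MB/s."""
    block = os.urandom(block_mb * 1024 * 1024)   # incompressible test data
    blocks = size_mb // block_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())                 # push data past the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)                          # leave no test file behind
    return blocks * block_mb / elapsed
```

Running it once against a folder on each drive (and on the RAMDisk, if you have one) gives a rough ordering of where output files would be written fastest.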

 

   Most of the above can be verified by duplicating a few observations. You can test when PixInsight uses Swap Folders and how it slices up the data for an UnDo / ReDo operation. You can use the Windows Task Manager together with the Resource Monitor to watch your virtual memory grow beyond your physical RAM during large PI integration operations. (Just keep watch on the "Committed" memory to see how virtual memory is allocated as needed during an operation.) The Memory Resource Monitor shows the Page Fault rate when the OS starts to frantically move virtual memory back and forth to physical memory. Watching these as PI chugs along on large processes will give you some insight into how important it is to have ample physical RAM if you process lots of frames to build an image.
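One way to make the Swap Folder observation repeatable is a small polling script that reports new files as they appear, with timestamps. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it makes no assumption about how PixInsight names its swap files.

```python
import os
import time


def snapshot(folder):
    """Map file name -> size for every file currently in `folder`."""
    sizes = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            try:
                if entry.is_file():
                    sizes[entry.name] = entry.stat().st_size
            except FileNotFoundError:
                pass  # file vanished between listing and stat
    return sizes


def watch(folder, interval=0.5, duration=60.0):
    """Print and return (seconds, name, size) for files that appear in `folder`."""
    seen = snapshot(folder)
    start = time.monotonic()
    events = []
    while time.monotonic() - start < duration:
        time.sleep(interval)
        current = snapshot(folder)
        for name in sorted(current.keys() - seen.keys()):
            elapsed = time.monotonic() - start
            events.append((elapsed, name, current[name]))
            print(f"{elapsed:6.1f} s  new file: {name} ({current[name]:,} bytes)")
        seen = current
    return events
```

Start `watch()` on a Swap Folder, then run a process in PI: comparing the reported times with when the process starts and finishes shows whether swap files are written before, during, or after the computation.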

 

   OK, I've rambled around enough for this post. I hope some find it useful. If I have made any errors, please feel free to point them out as I don't want to propagate any errors.

 

 

John



#2 bobzeq25


    ISS

  • Posts: 32,927
  • Joined: 27 Oct 2014

Posted 27 December 2020 - 06:25 PM

Excellent discussion.  I'll add this, it's somewhat covered above, but this is more detailed in one regard.

 

I believe that placing PI's swap files on a fast SSD (as I do) can be the best of all worlds.  You get excellent swap file performance (my Benchmark swap score with a Samsung 970 PRO is 42000), and I retain all of the 64 GB of RAM for other purposes.


Edited by bobzeq25, 27 December 2020 - 06:26 PM.


#3 jdupton


    Fly Me to the Moon

  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 27 December 2020 - 07:11 PM

Bob,

 

   Yes, that matches my observations also. Compared with two PCIe 4.0 NVMe drives holding the swap folders, neither of the two RAMDisk programs I tried improved my overall Benchmark score. What I noticed with the RAMDisk is that the CPU portion of the benchmark score went down while the Swap Performance and score went up. The net result, though, was that the Total score went down some. I attributed that to the software overhead needed to implement and manage the RAMDisk. They are completely software based, after all. Careful configuration and very fast SSDs, while reserving physical RAM as you point out, gave me the best results for both real world use and benchmarking.

 

 

John



#4 endless-sky


    Apollo

  • Posts: 1,014
  • Joined: 24 May 2020
  • Loc: Padova, Italy

Posted 28 December 2020 - 04:25 AM

Thank you for your thorough explanation. I stand corrected, as I was using the benchmark and the RAM disk always gave me better scores than the normal SSD that I have. It makes you wonder, then, why the benchmark only gives weight to "Do" and "UnDo" processes when, as you said, the biggest share of the time it takes to process an image is actually the pre-processing...

 

Maybe it would make more sense to have a benchmark that also takes into account writing/reading intermediate output files. This would certainly help clear things up among users and help dispel some myths.

 

However, in answer to your question in the other thread, I also noticed a significant improvement in the speed of pre-processing and the integration process while using Linux instead of Windows (30-50% faster under Linux). I thought that, since the main difference in benchmark scores between the two was the swap speed (Linux gave me more than two times the MiB/s that I got under Windows), the increase in speed for these processes (calibrating, debayering, weighting, registration, integration) was also due to the swap speed, and therefore to Linux making better use of the RAM disk than Windows.

 

I actually have 80 GB of RAM, of which I dedicated 20 GB to the RAM disk. In all the integrations I have done so far, the number of files wasn't high enough to make me miss the 20 GB. But, yes, if it doesn't make any sense to have a RAM disk during the pre-processing operations, I might as well put those 20 GB back into actual RAM, and create the swap folders on a different drive. I just bought a 500 GB NVMe (Crucial P5, which gave me some pretty impressive scores with Crystal Disk Mark - still nowhere near the RAM disk, though), so I could use it also for the swap folders. Its main purpose, obviously, was to serve as output/input for all the pre-processing processes.

 

I did ponder using the RAM disk as an output folder, but that idea was quickly abandoned. My images take up way too much room, and I would usually need 30-40 GB of space to store all the files needed from ImageCalibration to DrizzleIntegration. That's almost half of my current RAM.



#5 darkstar3d


    Messenger

  • Posts: 497
  • Joined: 11 Oct 2013
  • Loc: Lake Worth, FL

Posted 28 December 2020 - 03:12 PM

My PI is about 50% slower if I take 16GB and make it a RAM disk. I have 48 GB total, and I also had the memory hog Edge Chrome running.



#6 LuxTerra


    Mariner 2

  • Posts: 276
  • Joined: 29 Aug 2020

Posted 28 December 2020 - 05:08 PM

Very good post, I appreciate the description of how PI is using its internal swap mechanism for undo/redo. Did a good job summarizing technical details. Nothing's as slow as hitting OS swap. :)

 

Question for you.

 

Given how PI is working with its internal swap, I would think that resource usages would look something like this:

 

1) CPU/RAM limited computation with little to no disk activity (assuming enough RAM for the task)

2) Upon completion of the processing step, high disk activity writing the results to internal swap, CPU usage being exclusively used for this IO; i.e. processing is done in step 1.

 

Is this how you observe it working? That would be in contrast to a back-and-forth pattern: process a bit, read/write swap a bit, process a bit, and so on. Thanks.



#7 jdupton


    Fly Me to the Moon

  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 28 December 2020 - 06:17 PM

LuxTerra,

 

   I was not sure of the timing, so I ran a quick test to find the answers to your questions. I was pretty sure the swap files were not built on the fly during processing. That turned out to be right. Here is what I tried and observed:

 

  • First, I removed all but one Swap Folder from the Preferences.
     
  • I made sure the Swap Folder was empty.
     
  • I opened an integrated linear image.
     
  • I then set up a long operation that would modify that image. I wanted it to be long enough so I could tell if the Swap file was created before, during, or after the operation completed. I chose to use Masked Stretch with 200 iterations.
     
  • I placed PI and the Swap Folder view (via the Windows File Explorer) where I could watch them simultaneously.
     
  • I started the Masked Stretch process.
     
  • The UnDo file appeared immediately in the Swap Folder, indicating that PI wrote out the original state of the image before beginning the operation.
     
  • After the Masked Stretch finished, there were no other changes in the Swap Folder. That made me wonder how the ReDo swap file might come into being.
     
  • I then UnDid the changes made by Masked Stretch, and a second Swap Folder entry was immediately added. So the ReDo portion of the UnDo / ReDo is not saved into the Swap Folder until you actually request an UnDo.

   Based on that, I think a process that changes an image first writes the original state of the image to the Swap Folder and then does all the computation work. Only if you UnDo that change is the ReDo version of the file written to the Swap Folder.
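The write order John observed can be captured in a toy Python model. The class name and its list-based "swap folder" are hypothetical, and it covers only the two steps the test exercised (apply a process, then UnDo), not PixInsight's full history handling.

```python
class SwapHistoryModel:
    """Toy model of the observed swap-write order for one image.

    Applying a process writes the *original* state before computing;
    the ReDo copy is written only when an UnDo is actually requested.
    """

    def __init__(self, image):
        self.image = image
        self.swap_files = []   # simulated Swap Folder contents, in write order
        self.undo_stack = []   # indices into swap_files for UnDo states

    def apply(self, process):
        # Original state is written out first, then the computation runs.
        self.swap_files.append(("undo", self.image))
        self.undo_stack.append(len(self.swap_files) - 1)
        self.image = process(self.image)

    def undo(self):
        # The ReDo copy of the current state is written only at UnDo time.
        self.swap_files.append(("redo", self.image))
        self.image = self.swap_files[self.undo_stack.pop()][1]
```

After `apply()` the simulated folder holds one entry; a second entry appears only after `undo()`, mirroring the two observations in the list above.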

 

   Regarding RAM usage, I don't think that really comes into play for most processes. The only one that might be an exception (that I can think of) is a Resample process where you greatly increase the size of an image. Otherwise, I might assume that PI internally uses only one additional buffer while processing an image such that it at most doubles RAM usage while actively carrying out the transformation requested.
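The guess above (one extra buffer, with Resample as the exception) can be turned into a back-of-the-envelope estimate. This Python sketch rests entirely on that unverified assumption; the function name is hypothetical, and `scale` is the linear resample factor, so pixel count grows with its square.

```python
def peak_ram_mb(image_mb, scale=1.0):
    """Rough peak memory for one transformation: input buffer + output buffer.

    Assumption from the post, not a measured PixInsight figure.
    """
    return image_mb + image_mb * scale ** 2


print(f"ordinary transform, 134 MB frame: {peak_ram_mb(134):.0f} MB")
print(f"2x upsample of the same frame:    {peak_ram_mb(134, 2.0):.0f} MB")
```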

 

 

John


Edited by jdupton, 28 December 2020 - 10:00 PM.


#8 LuxTerra


    Mariner 2

  • Posts: 276
  • Joined: 29 Aug 2020

Posted 28 December 2020 - 08:19 PM


Really appreciate you taking the time to run the test and provide your expertise on the matter. I do wish makers of software would do a better job (not picking PI here, it's a general issue) of documenting how their software functions. Benchmarks attempt to do that, but as you pointed out, often they are inadequate at estimating real world usage. Some thoughts...

 

1) Writing the files at the start of the step seems wasteful of user time. If the time to complete a step is the time to write the swap plus the time to complete the task, then that is slower than the following: complete the previous step and allow the user to start interacting, while the software saves the swap files in the background. In this way the swap step is hidden from the user. However, that only makes sense if the data is very large, and given your description, it's not; the only steps which use PI swap are single-frame steps, so at most we're talking a few hundred MB, maybe a GB? Any fast SSD is likely just fine, as is the software process. I.e. optimizing just adds complexity and likely provides very little real world benefit outside of a benchmark.

 

2) If the PI swap is as you describe, then outside of using an HDD, it's almost a pointless part of the benchmark. In your example, a single ~130MB file is going to be written/read sequentially to any SSD so quickly that it basically doesn't matter. Unless I misunderstood something, the differences between SSDs here make almost no difference; even cheap SSDs can write/read such a file about 10 times per second. Even if you had the amazing 151MP IMX sensor, that would only approach 2GB. Sure, that's a lot, but the difference between a top end SSD and a modest one is half a second or so. Now on an HDD, that could be >10sec. Is this whole parallel IO, multiple SSD/RAM disk business really just a PI benchmark mirage?
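The arithmetic in point 2 is easy to reproduce. In this Python sketch the sustained write speeds per device class are illustrative assumptions, not measurements, and sizes use decimal MB (megapixels times bytes per pixel).

```python
def frame_mb(megapixels, channels, bytes_per_sample=4):
    """Decimal MB for one 32-bit float frame (MP x bytes per pixel)."""
    return megapixels * channels * bytes_per_sample


def seconds_to_write(size_mb, mb_per_s):
    """Time for one sequential write at a sustained throughput."""
    return size_mb / mb_per_s


# Illustrative sustained write speeds in MB/s (assumptions, not measurements).
devices = {"HDD": 150, "SATA SSD": 500, "PCIe 3.0 NVMe": 3000, "PCIe 4.0 NVMe": 5000}
frames = {"~130 MB frame": 130, "151 MP OSC frame": frame_mb(151, 3)}

for label, size in frames.items():
    for name, speed in devices.items():
        print(f"{label:>17} on {name:<14}: {seconds_to_write(size, speed):6.2f} s")
```

Under these assumptions, even the largest frame takes well under a second on any NVMe drive, while the HDD case is the only one that climbs past ten seconds.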


Edited by LuxTerra, 28 December 2020 - 08:23 PM.


#9 jdupton


    Fly Me to the Moon

  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 28 December 2020 - 10:28 PM

LuxTerra,

 

   I don't think it is as bad as you may think. PI writing out the initial state of the image to the Swap Folders doesn't take any appreciable time, even for short quick computations. Since all modern systems that are capable of running PI are multi-threaded systems, PI can send the Swap Folder write to another thread and then immediately begin the requested computational task. The two are essentially done in parallel. I chose to use an extra long processing task (about 15 seconds) only to be able to visually see when the writes took place. On a shorter task like a Histogram Transformation, the two probably finish at the same time in only a few milliseconds.
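The overlap described here can be illustrated with a Python thread pool, using sleeps as stand-ins for the disk write and the computation. This is a sketch of the general pattern only, not PixInsight's actual threading; the function names and timings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_SECONDS = 0.2       # stand-in for writing the UnDo copy to the Swap Folders
COMPUTE_SECONDS = 0.2  # stand-in for the requested transformation


def save_undo_copy(snapshot, swap_store):
    time.sleep(IO_SECONDS)          # simulated disk write
    swap_store.append(snapshot)


def transform(image):
    time.sleep(COMPUTE_SECONDS)     # simulated computation
    return [2 * v for v in image]


def apply_with_background_swap(image, executor, swap_store):
    """Queue the UnDo write on a worker thread, then compute immediately."""
    future = executor.submit(save_undo_copy, list(image), swap_store)  # copy first
    result = transform(image)       # runs while the write is in flight
    future.result()                 # wait until the UnDo copy is stored
    return result


with ThreadPoolExecutor(max_workers=1) as pool:
    store = []
    start = time.perf_counter()
    out = apply_with_background_swap([1, 2, 3], pool, store)
    print(f"elapsed {time.perf_counter() - start:.2f} s (serial would be ~0.4 s)")
```

The snapshot is copied before the worker starts, so the saved UnDo state cannot be affected by the transformation that runs alongside it.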

 

 

 

   This does remind me of something I meant to include in the first post of this thread. Bob (bobzeq25) mentioned in that other thread in Post #91 that having to define multiple Swap Folder entries in the Preferences seemed odd:

 

 

 

They do improve things.  It's one of the more annoying things about PI, how you have to specify a number of identical locations for swap.

 

But anyone who uses PI well is used to being annoyed by it.  <smile>  It falls into the category "If everyone likes this person's artistic creations, they're probably not very interesting".  I'm sure they had some purpose in mind, even if I don't see it, or can't understand what it possibly could be.

 

In general you want as many (identical) swap locations as you have CPU threads.  CPU knows what the latter are, you'd think they'd have made that the default for swap, instead of 1.   So you have to go through the tedious process of entering them.  Fortunately you only have to do it once.

 

   I think the reason PI does it this way is that most storage devices (HDD, SATA SSD, PCIe SSD, and NVMe SSD) have multiple parallel command queues. That is, they can accept multiple commands for writing data to the device. PI takes advantage of this by dividing up the saved UnDo image file into as many parts as you have specified folders. It can then send smaller write commands for each chunk to the storage device to save the data. An added benefit of chopping up the swap files is that you optimize the file size of each chunk for the device. You may have noticed when running Disk Benchmark programs that storage speed can be dependent on the total size of the file being written.

 

   If all folders reside on one device, that's OK since it can accept all these multiple (smaller) file write requests without having to wait for any one write request to finish. (Many storage devices can even internally reorder these write request commands to optimize speed.) If the PI Swap Folders reside on different storage devices, so much the better. Now you can spread the writes across multiple command queues on multiple devices for even better speed. Since this is all being done in a thread separate from the computational changes the user requested, they essentially execute in parallel with little overhead.

 

   With the latest version of PI, you can now have all of these write requests sent to storage using separate threads rather than a single I/O thread. That can speed things up even further.
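Chunked, multi-threaded swap writes like those described above can be sketched in Python with one worker thread per folder, so each chunk is an independent outstanding write the storage device can queue. This is a minimal sketch, not PixInsight's implementation; the function names and the `.partN` file naming are hypothetical. Python threads release the GIL during file I/O, so the writes genuinely overlap.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _write_part(job):
    path, part = job
    with open(path, "wb") as f:
        f.write(part)
    return path


def write_swap_parts(data, folders, base_name="undo"):
    """Split `data` into one chunk per folder and write the chunks concurrently."""
    n = len(folders)
    chunk = -(-len(data) // n)                      # ceiling division
    jobs = [
        (os.path.join(folder, f"{base_name}.part{i}"), data[i * chunk:(i + 1) * chunk])
        for i, folder in enumerate(folders)
    ]
    # One worker per folder, so every chunk is an independent outstanding write.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(_write_part, jobs))


def read_swap_parts(paths):
    """Reassemble the image bytes from the chunk files, in order."""
    pieces = []
    for path in paths:
        with open(path, "rb") as f:
            pieces.append(f.read())
    return b"".join(pieces)
```

Listing the same folder several times (or folders on different drives) simply changes where the chunks land; reading them back in order restores the original bytes.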

 

 

John



#10 LuxTerra


    Mariner 2

  • Posts: 276
  • Joined: 29 Aug 2020

Posted 28 December 2020 - 11:21 PM


That's sort of what I was getting at. Based on your description and test, the amount of data is not a lot for a modern SSD. Any such optimization is likely measurable in a benchmark, but practically meaningless as long as we're talking about reasonably new systems.

 

Your description of the storage subsystem queues is spot on. The high queue depth performance is used to sell SSDs, but the reality is that most consumer tasks stay at queue depths <4 and often are mixed random accesses. Only servers typically see the QD32 numbers used to sell product and rarely purely sequential either; these are used because it keeps the drive 100% loaded with the easiest access (sequential) and hides any latency effects. A good shorthand metric if you want to know how fast an SSD "feels" in normal use is to look at random QD1@4k results.

 

As for PI, I think you're right. In practical terms, you want to optimize the flow of data to the SSD, but even then, if you have a modern SSD, you may not even notice. The question then would be: what are the potential optimization points? Now, all of this is conjecture based on computer knowledge, as I haven't tested PI myself, but I intend to change that in the near future.

 

  • CPU cores are one thing. To avoid causing too many dirty cache IO flushes, it's likely that you just want one thread per physical core, not logical. Although, that would be a good test. With the massive caches on Zen3 that may not be valid.
  • OS software stacks matter and for PI, Linux is a great choice.
  • Lowest latency connection possible. This would be NVMe/PCIe lanes directly off the CPU, operating at their highest spec, and not multiplexed with anything else. No RAID or other latency-inducing software stacks; e.g. RAID-0 may improve sequential bandwidth, but it hurts latency.
  • SSDs have a limited number of IO queues, so you definitely don't want to exceed that. IIRC, most consumer drives these days support 32 queues; enterprise SSDs are another story.
  • Many SSDs have DRAM on board. In many cases, for small amounts of data, what you're really measuring is the DRAMS ability to absorb and cache the writes, not the write performance of the actual FLASH*. The high end drives can have DRAM >1GB, which greatly exceeds your example data. While other data also gets stored in this DRAM, if we're hyper conservative and say that half of the DRAM is used for data, any image size smaller than half our cache should be equally as fast.
  • If and only if the image data being saved is larger than our SSDs cache, then would something like an ultra-low latency storage medium like Optane even matter.
  • SSDs typically have either the legacy 512B or the advanced 4kB block sizes, but more critically would be the usual (although not often specified) 1MB minimum erasure size. Basically, as the drive gets full SSDs take a hit because of the read-modify-write cycle and anything <1MB can be inefficient. For PI, based on what you've described and measured, would only really matter if our writes exceeded the SSD DRAM sizes, but let's calculate it anyways. If we can use up to 32 SSD queues optimally and we want each one to write at least 1MB chunks, then that would be a minimum image size of 32MB. If we assume 32bit pixels, that would be ~8MP mono camera or ~2.7MP OSC. Shouldn't be a limit. Even easier would be just to make sure that each thread is responsible for at least a data chunk large enough to hit peak read/write bandwidths in a tool like ATTO.
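The chunk-size arithmetic in the last bullet can be written out as a small sketch. The 32 outstanding requests, the ~1 MB erase size, and 32-bit samples are the assumptions stated in the bullet (real erase sizes vary by drive and are rarely documented), and the OSC figure assumes a debayered image with three channels. Decimal megabytes are used to match the round numbers above.

```python
MB = 1_000_000  # decimal megabytes, matching the round numbers in the post


def min_image_bytes(in_flight: int, chunk_bytes: int) -> int:
    """Smallest write that gives every outstanding request one full chunk."""
    return in_flight * chunk_bytes


def megapixels(image_bytes: int, bytes_per_sample: int, channels: int) -> float:
    """Pixel count (in MP) that fills image_bytes at the given sample layout."""
    return image_bytes / (bytes_per_sample * channels) / 1e6


size = min_image_bytes(32, 1 * MB)  # 32 requests x ~1 MB erase block
print(f"minimum image:     {size / MB:.0f} MB")
print(f"mono, 32-bit:      {megapixels(size, 4, 1):.1f} MP")
print(f"OSC (RGB), 32-bit: {megapixels(size, 4, 3):.1f} MP")
```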

 

So, that's a lot of conjecture. However, that's just thinking about the data flow through the computer and what may or may not cause a limitation. In practice, as long as you have a modern, fast SSD, you probably won't be able to tell the difference outside of benchmarks until you get above 100 MP OSC.

 

Edit: * This assumes asynchronous writes, which PI may or may not use. If it forces a flush/synchronous write cycle, then a low-latency storage medium like Optane may be very valuable for how responsive and fast the system feels.
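To see the asynchronous vs. synchronous distinction on your own hardware, here is a minimal probe sketch. It is not how PixInsight writes its files (that is unknown here); it just writes the same data twice, once leaving it in the OS page cache and once calling `os.fsync` after every chunk so each step waits for the device. The 16 MiB payload and 1 MiB chunk size are arbitrary choices.

```python
import os
import tempfile
import time


def timed_write(path: str, data: bytes, chunk: int, sync: bool) -> float:
    """Write `data` to `path` in `chunk`-sized pieces and return elapsed seconds.

    sync=False: data is handed to the OS page cache and the call returns
    without waiting for the device (asynchronous from the app's view).
    sync=True: after every chunk, flush Python's buffer and call os.fsync,
    so each step waits until the OS reports the data as committed.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for offset in range(0, len(data), chunk):
            f.write(data[offset:offset + chunk])
            if sync:
                f.flush()             # move Python's buffer into the OS
                os.fsync(f.fileno())  # wait for the OS to commit it to storage
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = os.urandom(16 * 1024 * 1024)  # 16 MiB of incompressible data
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        for sync in (False, True):
            secs = timed_write(target, payload, 1024 * 1024, sync)
            print(f"sync={sync}: {secs:.3f} s")
```

The gap between the two timings grows with device latency, which is where something like Optane would show its value.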


Edited by LuxTerra, 28 December 2020 - 11:26 PM.

  • dpastern likes this

#11 jdupton

jdupton

    Fly Me to the Moon

  • *****
  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 28 December 2020 - 11:55 PM

LuxTerra,

 

   I don't know enough about PI internal operations to answer many of your points. On your first point regarding cores vs. threads, PI does a pretty good job of utilizing the hyper-threaded logical cores in CPUs that support it. On my 16-core / 32-thread Ryzen system, PI keeps each virtual CPU at very close to 100% utilization most of the time on some types of operations. It gives me the impression that a lot of thought and work went into parallelizing the heaviest PI processes.

 

 

John


  • dpastern and bobzeq25 like this

#12 deonb

deonb

    Ranger 4

  • *****
  • Posts: 334
  • Joined: 16 Jul 2020
  • Loc: WA

Posted 29 December 2020 - 12:20 AM

Edit: * this assumes asynchronous writes, which PI may/may not allow. If it forces a flush/synchronous write cycle, then low latency storage medium like Optane may be super valuable in how responsive/fast the system is.

You have to REALLY want to turn off asynchronous writes on Windows these days, including linking with an off-by-default special library in C/C++. Otherwise Windows will simply ignore the flush calls. PixInsight has no reason to do this.

I assume the developers of PixInsight also know to pass FILE_ATTRIBUTE_TEMPORARY to temporary files that they create in the process. Maybe there should be more control, though, to let users flag which types of files will always be temporary and never need to be written to disk?

(If a program passes in FILE_ATTRIBUTE_TEMPORARY it doesn't write any of the file to disk unless you run out of physical memory. So it provides the benefits of a RAMDisk without the drawback of needing a permanently reserved block of RAM).
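For anyone curious what opting in looks like from application code, here is a minimal sketch in Python. This says nothing about what PixInsight itself does. As we understand the Microsoft C runtime documentation, the `O_SHORT_LIVED` open flag (which Python exposes on Windows only) sets FILE_ATTRIBUTE_TEMPORARY, and `O_TEMPORARY` sets FILE_FLAG_DELETE_ON_CLOSE; the helper name `open_scratch` is hypothetical. On other platforms both flags fall back to 0, so the file is an ordinary one.

```python
import os

# Windows-only CRT open flags exposed by Python; 0 elsewhere so the sketch runs anywhere.
SHORT_LIVED = getattr(os, "O_SHORT_LIVED", 0)    # FILE_ATTRIBUTE_TEMPORARY hint
DELETE_ON_CLOSE = getattr(os, "O_TEMPORARY", 0)  # FILE_FLAG_DELETE_ON_CLOSE
BINARY = getattr(os, "O_BINARY", 0)              # no newline translation on Windows


def open_scratch(path: str, delete_on_close: bool = True) -> int:
    """Open a read/write scratch file (hypothetical helper), hinting 'temporary'.

    On Windows the hint asks the cache manager to avoid writing the data back
    to storage while enough memory is available. On other platforms both hint
    flags are 0, so this is an ordinary file that is NOT deleted on close.
    """
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | BINARY | SHORT_LIVED
    if delete_on_close:
        flags |= DELETE_ON_CLOSE
    return os.open(path, flags, 0o600)
```

Usage: `fd = open_scratch("undo_0001.tmp")`, then `os.write(fd, data)` and `os.close(fd)`. The appeal is the one deonb describes: memory is only consumed while the file exists, instead of being reserved up front like a RAMDisk.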

Edited by deonb, 29 December 2020 - 12:21 AM.


#13 LuxTerra

LuxTerra

    Mariner 2

  • -----
  • Posts: 276
  • Joined: 29 Aug 2020

Posted 29 December 2020 - 11:04 AM

LuxTerra,

I don't know enough about PI internal operations to answer many of your points. On your first point regarding cores vs threads, PI does a pretty good job of utilizing the hyper-threaded cores in CPUs that support it. On my 16 core / 32 thread Ryzen system, PI does a very good job of keeping each virtual CPU running at very close to 100% utilization most of the time on some types of operations. It gives me the impression that a lot of thought and work went in the parallelizing of the heaviest PI processes.


John


Just for clarity, SMT for processing was outside the context of my comment. It was only in reference to PI swap IO, and it would have to be tested since there are too many assumptions. I was noting that it could be tested because SMT is known to hurt other, similar tasks.

#14 LuxTerra

LuxTerra

    Mariner 2

  • -----
  • Posts: 276
  • Joined: 29 Aug 2020

Posted 29 December 2020 - 11:09 AM

You have to REALLY want to turn off asynchronous writes on Windows these days, including linking with an off-by-default special library in C/C++. Otherwise Windows will simply ignore the flush calls. PixInsight has no reason to do this.

I assume the developers of PixInsight also know to pass FILE_ATTRIBUTE_TEMPORARY to temporary files that they create in the process. Maybe there should be more control though to allow users to flag which types of files will always be temporary and doesn't ever need written to disk?

(If a program passes in FILE_ATTRIBUTE_TEMPORARY it doesn't write any of the file to disk unless you run out of physical memory. So it provides the benefits of a RAMDisk without the drawback of needing a permanently reserved block of RAM).

I don’t know why PI would do that given the data, but they could, especially on Linux. Sync writes are the default configuration for things like NFS (just an example) on Linux because of different priorities. It was just an assumption I was documenting, i.e. that they would choose high-performance async defaults.

Edited by LuxTerra, 29 December 2020 - 11:10 AM.


#15 endless-sky

endless-sky

    Apollo

  • -----
  • Posts: 1,014
  • Joined: 24 May 2020
  • Loc: Padova, Italy

Posted 10 January 2021 - 10:21 AM

So, clouds plus COVID restrictions = too much time on my hands.

 

I did a little experiment.

 

When I processed the 160 frames that composed my California Nebula, I took note of the times it took to run all the processes from image calibration to Bayer drizzle integration. This was done with PixInsight running on Lubuntu 20.10, CPU @ 4343 MHz, RAM @ 2626 MHz (80 GB, 20 of which dedicated to RAM disk). Input and output was done from/to the same SSD drive (Samsung 850 Pro 1 TB).

 

Not too long ago I bought myself a Christmas present, a Crucial P5 500GB CT500P5SSD8.

 

I wiped Linux from the partition of the main SSD (shared with Windows) and proceeded to reinstall it on the NVMe.

 

Then I reprocessed all the files, doing exactly the same steps and recording the time it took for each process. So, still Lubuntu 20.10, CPU @ 4343 MHz, RAM @ 2626 MHz, no RAM disk (since in the preprocessing there's no UNDOs/REDOs, who needs a RAM disk anyway, right?!). This time, input and output was done from/to the same NVMe.

 

Results are pretty stunning. Some processes finished 2 to 8 times faster. Overall, I saved 30 minutes out of two and a half hours, or about 1/5 of the time. And that's just from reading/writing files (which I thought PixInsight was doing "in the background" while the CPU threads were crunching the next sets of images).
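Why would steps that got 2 to 8 times faster only shave about 20% off the whole run? A minimal sketch using Amdahl's law: only the I/O-heavy part of the pipeline got faster, while integration time stayed put. The measured 150 → 120 minutes comes from the post; the 30% / 4x split in the second line is a purely hypothetical illustration, not derived from the measurements.

```python
def overall_speedup(before_min: float, after_min: float) -> float:
    """Measured end-to-end speedup (old wall time / new wall time)."""
    return before_min / after_min


def amdahl(fraction_sped_up: float, factor: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets `factor` times faster."""
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)


print(f"measured: {overall_speedup(150, 120):.2f}x")  # 30 min saved out of 150
# Hypothetical: if 30% of the run were I/O-bound and that part ran 4x faster.
print(f"Amdahl, p=0.3, s=4: {amdahl(0.3, 4):.2f}x")
```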

 

ImageIntegration and DrizzleIntegration show almost no variation, as expected, since only the initial files need to be read and nothing is written to the drive while the processes run.

 

Here's an image with the comparison. For clarity, WeightedBatchPreprocessing was only used to create a master flat out of 26 flats, calibrate the images with a master bias and the resulting master flat, apply CosmeticCorrection, and debayer the images.

 

Attachment: SSD-NVMe Comparison.jpg

 

Times are taken directly from those reported by the Process Console at the end of each process, so no stopwatch (and no human error) was used in the making of this test.

 

Takeaways from this test: if only it didn't take "an eternity" to map n folders to n CPU threads in order for PixInsight to swap to a RAM disk, it might be worth it to:

 

- not have a RAM disk during the pre-processing phase

- have a RAM disk only for the post-processing phase (after the integration is complete), to take advantage of faster UNDOs/REDOs
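Creating the n folders themselves is the easy part to script. Here is a minimal sketch that creates numbered sub-folders under a RAM disk mount point, one per logical CPU by default. The mount point `/mnt/ramdisk`, the `swap` prefix, and the function name are hypothetical, and one-folder-per-thread is the assumption from the post, not a PixInsight requirement. The folders still have to be entered in PixInsight's Preferences; the script only saves the clicking in a file manager.

```python
from __future__ import annotations

import os
from pathlib import Path


def make_swap_folders(root: str, count: int | None = None, prefix: str = "swap") -> list[Path]:
    """Create `count` numbered folders under `root` (default: one per logical CPU).

    Existing folders are left alone, so the function is safe to rerun after
    the RAM disk is remounted.
    """
    if count is None:
        count = os.cpu_count() or 1
    base = Path(root)
    folders = []
    for i in range(count):
        folder = base / f"{prefix}{i:02d}"
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    # Hypothetical mount point; print the paths so they can be pasted into Preferences.
    for path in make_swap_folders("/mnt/ramdisk"):
        print(path)
```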


  • jdupton and thekubiaks like this

#16 santafe retiree

santafe retiree

    Apollo

  • *****
  • Posts: 1,113
  • Joined: 23 Aug 2014
  • Loc: Santa Fe, NM

Posted 10 January 2021 - 03:30 PM

John - Outstanding deep dive into the actual mechanics of PI SWAP use with excellent follow up discussion!

 

I am in the process of spec'ing out a PI rig and have settled on a Ryzen 3900X with 32GB RAM and a 1 TB PCIe NVMe primary drive for OS/App.  I was going for a second NVMe drive for SWAP but your statement:

 

"If you have Swap Folders defined on more than one physical disk location, then the parts are striped across the devices (making the writing operation faster)."

 

makes me wonder if I should be looking at two additional NVMe drives for SWAP drives instead of one.  Of course the easy answer is two is always better than one but that means a MoBo that accommodates at least 3 NVMe drives thereby driving up the cost.

 

And then there is the question of SWAP drive size. I am using an ASI294MM in bin 1 mode. The raw subs are 90 MB in size, so a 500 GB drive will only get me 4 effective SWAP folders.

 

Thoughts anyone?

 

Cheers,

 

Tom


  • dpastern likes this

#17 deonb

deonb

    Ranger 4

  • *****
  • Posts: 334
  • Joined: 16 Jul 2020
  • Loc: WA

Posted 10 January 2021 - 09:34 PM

John - Outstanding deep dive into the actual mechanics of PI SWAP use with excellent follow up discussion!
 
I am in the process of spec'ing out a PI rig and have settled on a Ryzen 3900X with 32GB RAM and a 1 TB PCIe NVMe primary drive for OS/App.  I was going for a second NVMe drive for SWAP but your statement:
 
"If you have Swap Folders defined on more than one physical disk location, then the parts are striped across the devices (making the writing operation faster)."
 
makes me wonder if I should be looking at two additional NVMe drives for SWAP drives instead of one.  Of course the easy answer is two is always better than one but that means a MoBo that accommodates at least 3 NVMe drives thereby driving up the cost.
 
And then there is the question of SWAP drive size.  I am using an ASI294MM in bin 1 mode.  The raw subs are 90 Mb in size so a 500 GB drive will only get me 4 effective SWAP folders 
 
Thoughts anyone?
 
Cheers,
 
Tom


This is a trickier question. With some CPUs you're very limited by PCIe lanes.

So for example take a Ryzen 3900X or 5900X on an Asus ROG Crosshair motherboard. Awesome combination, but the Ryzen has only 24 PCIe lanes total.

The Asus Crosshair assigns 16 of those lanes to the x16 PCIe slots (for the GPU) and 4 to the one CPU-attached NVMe slot. It then dedicates the remaining 4 lanes to the X570 chipset, which divides that bandwidth among the other PCIe, on-board SATA, and NVMe slots.

If you put in a 3rd PCIe 4.0 NVMe drive, you'll be limited to about 8 GB/s by the 4 PCIe 4.0 chipset lanes. Something like a Samsung 980 Pro 1TB can read at 7 GB/s, so you're effectively at the limit with just one drive on the chipset lanes (the 2nd drive overall). You can squeeze a tiny bit more performance out of a second chipset drive, but not a lot.
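The "about 8 GB/s" ceiling follows from the PCIe 4.0 signaling rate, and a minimal sketch makes the saturation point explicit. It counts only the 128b/130b line-encoding overhead, so real-world throughput will be somewhat lower, and it assumes nothing else behind the chipset (SATA, USB, network) is busy at the same time. The drive speeds are the rated sequential reads mentioned in this thread.

```python
# Encoding-limited PCIe 4.0 bandwidth per lane in GB/s: 16 GT/s x 128/130, / 8 bits per byte.
GEN4_LANE_GBPS = 16.0 * 128 / 130 / 8


def chipset_uplink_gbps(lanes: int = 4) -> float:
    """Upper bound for the CPU-to-chipset link (x4 on X570)."""
    return GEN4_LANE_GBPS * lanes


def drives_to_saturate(uplink_gbps: float, drive_gbps: float) -> float:
    """How many drives reading flat out it takes to fill the shared uplink."""
    return uplink_gbps / drive_gbps


uplink = chipset_uplink_gbps()
print(f"X570 uplink: ~{uplink:.2f} GB/s")
print(f"7.0 GB/s Gen4 drives to saturate it: {drives_to_saturate(uplink, 7.0):.2f}")
print(f"3.5 GB/s Gen3 drives to saturate it: {drives_to_saturate(uplink, 3.5):.2f}")
```

A single fast Gen4 drive nearly fills the link, while it takes a bit more than two Gen3 drives to do the same.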

A Threadripper, on the other hand, has 88 lanes, so that's a whole different story.

So always research the specific CPU and motherboard combination to make sure you know how the lanes are assigned. The chipset leaves lane allocation up to the motherboard manufacturer, so it's not just a CPU + chipset thing but a CPU + chipset + motherboard thing.

Also keep in mind that SSDs are effectively striped internally, so they tend to get faster as they get bigger (up to a point). So don't blindly swap a single 1 TB drive for 2x 500 GB drives; the 1 TB drive is likely already much faster (in throughput) than the 500 GB one. It depends on a lot of factors, though, so look at the individual performance of each drive model at different capacities, not just the drive type.

Edited by deonb, 11 January 2021 - 02:39 AM.

  • santafe retiree likes this

#18 santafe retiree

santafe retiree

    Apollo

  • *****
  • Posts: 1,113
  • Joined: 23 Aug 2014
  • Loc: Santa Fe, NM

Posted 11 January 2021 - 12:43 AM

Very helpful! Thank you.

 

The total build is:

 

Asus TUF Gaming X570 Plus WiFi Mobo

 

Ryzen 3900X CPU

 

2 x 1TB Adata SX8200 NVMe drives

 

4 TB HDD for passive storage of projects/subs awaiting processing

 

32GB of 3200 MHz DDR4 RAM

 

500 W PSU, case, case fans, and miscellaneous bits and bobs

 

comes to $1275 w/o tax.

 

I am open to suggestions --

 

Cheers,

 

Tom   



#19 deonb

deonb

    Ranger 4

  • *****
  • Posts: 334
  • Joined: 16 Jul 2020
  • Loc: WA

Posted 11 January 2021 - 02:17 AM

Very helpful! Thank you.
 
The total build is:
 
Asus TUF Gaming X570 Plus WiFi Mobo
 
Ryzen 3900X CPU
 
2 x 1TB Adata SX8200 NVMe drives
 
4 TB HDD for passive storage of projects/subs awaiting processing
 
32GB of 3200 MHz DDR4 RAM
 
500w PSU, case, case fans, and miscellaneous bits and bobs
 
comes to $1275 w/o tax.
 
I am open to suggestions --
 
Cheers,
 
Tom


I posted a correction to my post above after checking my Crosshair manual.

So yeah, looking at your case, the drives you have are PCIe3 drives so they won't actually saturate a PCIe4 chipset channel. So in theory it will help (with those drives) to have 3 of them instead of 2.

However, instead of upgrading your motherboard to support 3 drives, it will probably be cheaper to just go to PCIe4 drives first.

The Sabrent Rocket NVMe 4.0 1TB TLC drives are $170 each, and they will boost sequential read performance by about 42% over the PCIe 3.0 drives (5000/4400 MB/s vs. 3500/3500 MB/s read/write). That's about $50 more per drive than the SX8200.
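The 42% is the read gain; the same arithmetic on the quoted write figures gives a smaller number. A minimal sketch, using only the manufacturer-rated sequential speeds quoted above (not measured PixInsight throughput):

```python
def percent_gain(new_mb_s: float, old_mb_s: float) -> float:
    """Relative improvement of a rated throughput figure, in percent."""
    return (new_mb_s - old_mb_s) / old_mb_s * 100.0


# Rated sequential figures quoted above (MB/s): Sabrent Rocket 4.0 vs. SX8200.
read_gain = percent_gain(5000, 3500)
write_gain = percent_gain(4400, 3500)
print(f"read +{read_gain:.1f}%, write +{write_gain:.1f}%")
```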

Don't get the Sabrent Rocket Q4 drives, though, even though they're $20 cheaper. They're QLC instead of TLC, and then you run into severe NAND limits (4700/1800 MB/s) instead of bus limits. (Isn't SSD complexity fun...)

But I'm not sure how much of your equipment is planned, returnable, or a sunk cost at this point.

Edited by deonb, 11 January 2021 - 02:40 AM.

  • dpastern, jdupton and santafe retiree like this

#20 santafe retiree

santafe retiree

    Apollo

  • *****
  • Posts: 1,113
  • Joined: 23 Aug 2014
  • Loc: Santa Fe, NM

Posted 11 January 2021 - 09:52 AM

Thanks for backstopping me. I mistakenly believed the SX8200 drives were Gen4. The extra $100 for a 42% performance boost is worth it.



#21 jdupton

jdupton

    Fly Me to the Moon

  • *****
  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 11 January 2021 - 08:58 PM

endless-sky,

 

So, clouds plus COVID restrictions = too much time in my hands.

 

I did a little experiment.

 

When I processed the 160 frames that composed my California Nebula, I took note of the times it took to run all the processes from image calibration to Bayer drizzle integration. This was done with PixInsight running on Lubuntu 20.10, CPU @ 4343 MHz, RAM @ 2626 MHz (80 GB, 20 of which dedicated to RAM disk). Input and output was done from/to the same SSD drive (Samsung 850 Pro 1 TB).

 

Not too long ago I bought myself a Christmas present, a Crucial P5 500GB CT500P5SSD8.

 

I wiped Linux from the partition of the main SSD (shared with Windows) and proceeded to reinstall it on the NVMe.

 

Then I reprocessed all the files, doing exactly the same steps and recording the time it took for each process. So, still Lubuntu 20.10, CPU @ 4343 MHz, RAM @ 2626 MHz, no RAM disk (since in the preprocessing there's no UNDOs/REDOs, who needs a RAM disk anyway, right?!). This time, input and output was done from/to the same NVMe.

 

Results are pretty stunning. Some processes took 2 to 8 times shorter to finish. Overall, I saved 30 minutes out of two hours and a half, or about 1/5 of the time. And that's just for reading/writing files (which I thought PixInsight was doing "in the background" while the threads of the CPU were crunching the next sets of images).

 

   That is a really good experiment. It immediately highlights one aspect of PixInsight performance that is sometimes overlooked. We often see folks recommend using an SSD instead of a hard drive when running PixInsight. That is true; it can really speed things up. However, not all "SSDs" are created equal, and your data really shows this. You get a big performance increase going from an HDD to an SSD, but you can get a nice additional bump if you move from a SATA SSD to a PCIe/AHCI SSD or, even better, an NVMe SSD. Your results clearly show the advantage of NVMe over SATA.

 

   If you have a newer system that supports NVMe in BIOS, that is the best route to go. Next best would be a PCIe/AHCI SSD. Lastly, a SATA SSD will still outperform a SATA HDD, so it is viable for older systems. Even with an NVMe SSD, you are better off with a four-lane (x4) version than a two-lane (x2) device. And finally, if your CPU and motherboard both support PCIe 4.0, a PCIe 4.0 x4 NVMe drive is about as good as it gets right now.
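A minimal sketch of the link bandwidth behind that ranking. It counts only line-encoding overhead (8b/10b for SATA III, 128b/130b for PCIe 3.0 and 4.0), so real drives land below these ceilings. Note that it cannot distinguish PCIe/AHCI from NVMe on the same link: that difference is protocol, not bandwidth. AHCI offers a single command queue 32 deep, while NVMe allows up to roughly 64K queues of up to 64K commands each.

```python
# Approximate usable link bandwidth in GB/s, counting only line-encoding overhead.
SATA3 = 6.0 * 8 / 10 / 8  # 6 Gb/s with 8b/10b encoding -> 0.6 GB/s
PCIE_LANE = {
    3: 8.0 * 128 / 130 / 8,   # 8 GT/s, 128b/130b -> ~0.985 GB/s per lane
    4: 16.0 * 128 / 130 / 8,  # 16 GT/s, 128b/130b -> ~1.97 GB/s per lane
}


def pcie_gbps(gen: int, lanes: int) -> float:
    """Encoding-limited bandwidth of a PCIe link."""
    return PCIE_LANE[gen] * lanes


interfaces = {
    "SATA III": SATA3,
    "PCIe 3.0 x2": pcie_gbps(3, 2),
    "PCIe 3.0 x4": pcie_gbps(3, 4),
    "PCIe 4.0 x4": pcie_gbps(4, 4),
}

# Print slowest to fastest.
for name, gbps in sorted(interfaces.items(), key=lambda kv: kv[1]):
    print(f"{name:<12} ~{gbps:.2f} GB/s")
```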

 

 

 

ImageIntegration and DrizzleIntegration show almost no variation, as expected, since there are only the initial files to be read, but nothing written to the hard-drive while the processes run.

 

Here's an image with the comparison. For clarity, WeightedBatchPreprocessor was only done to create a master flat out of 26 flats, calibrate the images with a master bias and the resulting master flat, apply CosmeticCorrection and Debayer the images.

 

Attachment SSD-NVMe Comparison.jpg

 

Times are directly from the times reported by the Process Console at the end of each process, so no stop watch (and human error) was used in the making of this test...

 

   The reason that ImageIntegration and DrizzleIntegration didn't show any improvement is that you had enough RAM in your system to never run out and hit the OS virtual memory paging file. Had you run out of physical memory, you would have seen a large performance increase approaching the one you saw on SubFrameSelector output. That process showed what a difference NVMe makes over SATA when the bulk of the task is simply writing files to storage.

 

   If you wanted, you could repeat the experiment for ImageIntegration but create a 64 GB RAMDisk first. The RAMDisk won't be used by that process but will reduce available RAM to only 16 GB. Now when ImageIntegration runs, it will be more likely to force OS page file activity, if you have enough subs to integrate. That will really slow things down on the SATA SSD, while the NVMe SSD should show a significant performance boost.

 

 

 

Takeaways from this test: if only it wouldn't take "an eternity" to map n folders to n CPU threads in order for PixInsight to swap to a RAM disk, it would possibly be worth it to:

- not have a RAM disk during the pre-processing phase
- have a RAM disk only for the post-processing phase (after the integration is complete), to take advantage of faster UNDOs/REDOs

 

   Your conclusions match what I would conclude, and they can be implemented without a lot of trouble. You can do the configuration once and switch back and forth as desired during processing. Here is how to do that.

 

  • Open the Global | Preferences Process
     
  • Press the Load Current Settings button (lower left)
     
  • Go to the Directories and Network section
     
  • Define your PI Swap Directories. Enter as many as you find helpful for Undo / Redo performance.
     
  • Drag the New Instance Icon (Little triangle icon at bottom left) to the Desktop
     
  • Use the little "N" to rename the Process Instance to "PreProcessing_Prefs"
     
  • Go back to the Directories and Network section
     
  • Define a new set of PI Swap Directory entries that point to the mount point of your RAMDisk. Enter as many as you find helpful for Undo / Redo performance.
     
  • Once again, drag the New Instance Icon to the Desktop
     
  • Use the little "N" to rename the Process Instance to "PostProcessing_Prefs"
     
  • Select both of the new Process Icons and right-click. Select "Save Selected Icons" and give it a name you will remember, like "PI_Saved_Preferences"

   From now on, you can load these two process icons by right-clicking on the PI desktop and selecting "Processing Icons | Merge Process Icons". Start off without a RAMDisk defined or mounted. With both process icons on your desktop, open the one called PreProcessing_Prefs and execute it. Then do all preprocessing that doesn't need a RAMDisk. Once you have an integrated image, define / load / mount your RAMDisk, execute the process icon called PostProcessing_Prefs, and you are ready to go with all your RAMDisk swap folders defined.

 

 

John


  • N1ghtSc0p3, santafe retiree and Alucard400 like this

#22 jdupton

jdupton

    Fly Me to the Moon

  • *****
  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 11 January 2021 - 09:18 PM

Tom,

 

John - Outstanding deep dive into the actual mechanics of PI SWAP use with excellent follow up discussion!

 

I am in the process of spec'ing out a PI rig and have settled on a Ryzen 3900X with 32GB RAM and a 1 TB PCIe NVMe primary drive for OS/App.  I was going for a second NVMe drive for SWAP but your statement:

 

"If you have Swap Folders defined on more than one physical disk location, then the parts are striped across the devices (making the writing operation faster)."

 

makes me wonder if I should be looking at two additional NVMe drives for SWAP drives instead of one.  Of course the easy answer is two is always better than one but that means a MoBo that accommodates at least 3 NVMe drives thereby driving up the cost.

 

And then there is the question of SWAP drive size.  I am using an ASI294MM in bin 1 mode.  The raw subs are 90 Mb in size so a 500 GB drive will only get me 4 effective SWAP folders 

 

Thoughts anyone?

 

Cheers,

 

Tom

 

   Regarding the highlighted section of your post, you do not need to add an additional NVMe drive if you already have two. You can simply define PI Swap Folder entries on both drives. So long as one of the drives isn't nearly full already, you still get the benefits of using both even when one may already contain your OS and programs and such.

 

   My own system is configured that way. NVMe-1 contains the OS and apps while NVMe-2 contains most of my data. I have PI Swap directory entries defined on each. (I am using 8 folders on each of the two drives.) If you really wanted to add yet another NVMe drive and the motherboard supports it, you could add PI Swap Folders there also. It would be even faster once you find the optimum number on each through experimentation. It just isn't required. You never have to dedicate a drive to PI Swap Folders; lots of other stuff can reside there also.

 

   There is no inherent size for Swap Folders. PI will store stuff there until it runs out of space on the whole drive. With 90 MB mono files, PI will use about 180 MB per Undo / Redo entry, divided across all folders. So, if you have two drives and four swap folders per drive, PI will divide up the Undo information into 8 parts and save four on each drive. That means that each Undo uses 90 MB of space on each drive. If you did 100 "Undo-able" operations on a mono image during post-processing, you would still only use a mere 9 GB on each drive. Once you combine channels, the RGB image triples those sizes during post-processing, but even then you are using only 27 GB per drive for 100 Undo-able operations. Drive size is not going to be a concern.
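The sizing arithmetic above as a minimal sketch. The ~180 MB per Undo entry for a 90 MB mono sub is the figure from the post (one plausible reason for the doubling is 16-bit raw data being held as 32-bit floating point, but that is an assumption). The sketch also assumes PI splits each entry evenly across drives, so the number of folders per drive does not change the per-drive total.

```python
def swap_gb_per_drive(undo_entry_mb: float, undo_steps: int, drives: int,
                      channels: int = 1) -> float:
    """Swap space (decimal GB) each drive holds for a run of Undo-able steps.

    Assumes each Undo entry is split evenly across drives; how many folders
    sit on each drive does not affect the per-drive total.
    """
    total_mb = undo_entry_mb * channels * undo_steps
    return total_mb / drives / 1000.0


# Figures from the post: ~180 MB per Undo entry, 2 drives, 100 Undo-able steps.
print(swap_gb_per_drive(180, 100, 2))              # mono -> 9.0
print(swap_gb_per_drive(180, 100, 2, channels=3))  # RGB  -> 27.0
```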

 

 

John


  • santafe retiree and thekubiaks like this

#23 jdupton

jdupton

    Fly Me to the Moon

  • *****
  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 11 January 2021 - 09:47 PM

deonb,

 

However, instead of upgrading your motherboard to support 3 drives, it will probably be cheaper to just go to PCIe4 drives first.

The Sabrent Rocket NVMe 4.0 1TB TLC drives are $170 each, and it will boost performance by 42% over the PCIe3 drives (5000/4400 mb/s vs. 3500/3500 mb/s). That's effectively $50 more per drive extra over the SX8200 drives.

   I would second this recommendation. I built my system with two Sabrent Rocket NVMe 4.0 1 TB drives. I have the C: drive attached to the M.2 slot driven by the Ryzen 9 3950X CPU, in an attempt to get the drive close to the memory controller for better page file performance. The second Sabrent is attached to the X570-driven M.2 slot and contains all my "active" data. A large, slow SATA HDD contains all my "inactive" data. This setup has worked and performed well for me.

 

 

John


Edited by jdupton, 11 January 2021 - 09:48 PM.

  • santafe retiree and deonb like this

#24 endless-sky

endless-sky

    Apollo

  • -----
  • Posts: 1,014
  • Joined: 24 May 2020
  • Loc: Padova, Italy

Posted 12 January 2021 - 05:14 AM

endless-sky,

 

   The reason that ImageIntegration and DrizzleIntegration didn't show any improvement is that you had enough RAM in your system to never really run out and hit the OS Virtual Memory Paging files. Had you run out of physical memory, you would have seen an equally large performance increase approaching that which you saw on SubFrameSelector Output. That process showed what a difference is made by NVMe over SATA when the bulk of the task is simply writing files to storage.

 

   If you wanted, you may be able to repeat the experiment for ImageIntegration but create a 64 GB RAMDisk. The RAMDisk won't be used for that process but will reduce available RAM to only 16 GB. Now when ImageIntegration runs, it will be more likely to force OS Page File activity to the SATA SSD if you have enough subs to integrate. That will really slow down a SATA SSD but will get the significant performance boost when run against the NVMe SSD.

 

...

 

   You conclusions match what I would conclude and can be implemented without a lot of trouble. You can do the configuration once and switch back and forth as desired during processing. Here is how to do that.

 

  • Open the Global | Preferences Process
     
  • Press the Load Current Settings button (lower left)
     
  • Go To the Directories and Network section
     
  • Define your PI Swap Directories. Enter as many as you find that appears to help performance of Undo / Redo.
     
  • Drag the New Instance Icon (Little triangle icon at bottom left) to the Desktop
     
  • Use the little "N" to rename the Process Instance to "PreProcessing_Prefs"
     
  • Go Back to the Directories and Network section
     
  • Define a new set of PI Swap Directory Entries that point to the mount point of your RAMDisk. Enter as many as you find that appears to help performance of Undo / Redo.
     
  • Once again, drag the New Instance Icon to the Desktop
     
  • Use the little "N" to rename the Process Instance to "PostProcessing_Prefs"
     
  • Select both of the new Process Icons and right click the mouse. Select "Save Selected Icons". Give it a name you will remember like "PI_Saved_Preferences"

   From now on, you can load these two process icons by right-clicking on the PI desktop and selecting "Processing Icons | Merge Process Icons". Start off without a RAMDisk defined or mounted. With both process icons on your desktop, open the one called PreProcessing_Prefs and execute it. Then do all preprocessing that doesn't need a RAMDisk. Once you have an integrated image, define / load / mount your RAMDisk, execute the process Icon called PostProcessing_Prefs, and you are ready to go with all your RAMDisk swap folder defined.

 

 

John

Yes, that's exactly how it went: the number of files was small enough that the integration process could be done in one single pass, loading all the files at once. It's good to know that doing these processes on the NVMe will speed things up even in the case of running out of RAM and needing a virtual paging system.

 

And thank you very much for showing me how to have the best of both worlds. I will definitely implement that solution, now that I know it can be done with the click of a button!

 

EDIT: is there a way to know what the optimal size for a RAM disk is, based on the size of the integrated image I am working on, just for UNDOs/REDOs purposes? In other words, if the image is 500 MB, how big would the dedicated RAM disk need to be, in order for it to be big enough?


Edited by endless-sky, 12 January 2021 - 05:23 AM.


#25 jdupton

jdupton

    Fly Me to the Moon

  • *****
  • topic starter
  • Posts: 5,281
  • Joined: 21 Nov 2010
  • Loc: Central Texas, USA

Posted 12 January 2021 - 11:04 AM

Tom (santafe retiree) & endless-sky,

 

   There is one more thing I forgot to mention about using very fast PCIe 4.0 NVMe drives.

 

   If you use a CPU and motherboard with PCIe 4.0 support and add PCIe 4.0 four lane (1x4) NVMe SSD drives, a RAMDisk may not help you at all in terms of overall performance. I found (running Windows) that the software overhead of managing a RAMDisk resulted in slower performance than just using the PCIe 4.0 1x4 NVMe drives for PixInsight's Swap Folders.

 

   Once you get above a certain level of hardware performance, adding a software based RAMDisk actually slows you down when PI is also writing the Undo / Redo information to the Swap Folders. Linux may not show this particular effect but it would be interesting for someone to test to see if the hardware outpaces the software with that OS as happens in Windows.

 

   I did my testing with the PI Benchmark. When running a RAMDisk for the Swap Folders, I got nicely higher Swap Scores, but the CPU Scores went down enough that the overall Benchmark Score suffered. I ran a bunch of PI Benchmark tries and found the result to be consistent. This is one way the effects of running the RAMDisk can show up. If you want pure file I/O performance, the RAMDisk is the best deal regardless of hardware. If you want the best overall mixed CPU / file I/O performance, the RAMDisk may slow you down some if you already have a very fast storage subsystem.

 

 

John


  • santafe retiree likes this


Cloudy Nights LLC
Cloudy Nights Sponsor: Astronomics