CNers have asked about a donation box for Cloudy Nights over the years, so here you go. Donation is not required by any means, so please enjoy your stay.

starnet++ working with cuda and libtensorflow-gpu under linux - a report

Imaging Astrophotography Report
15 replies to this topic

#1 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 06 March 2022 - 01:31 AM

Greetings!

 

I was able to install CUDA and the GPU-enabled TensorFlow libraries, and run starnet++ with them under Linux. I found a similar set of steps for Windows from Darkarchon here:  https://www.darkskie...t-starnet-cuda/ . I followed those general ideas and adapted them to Linux.

 

Disclosures

 

1) This is not a list of instructions for how you could successfully do the same.
2) If you follow these steps your own installation may become irreparably damaged! I do not take any responsibility. Please use your own judgment if you decide to do the same.
3) I cannot help with installation questions. I am posting this as a journal and as a proof of concept since I wasn't able to find documentation through a web search.

 

My system

 

AMD 3900X, 32 GB
Nvidia GeForce GTX 1660 Super,  Driver Version: 510.47.03,   CUDA Version: 11.6 [after cuda installation]
Linuxmint 20.3 with the xfce4 desktop. This distribution is based on Ubuntu 20.04.
 

Nvidia lists its CUDA-enabled GPUs here: https://developer.nvidia.com/cuda-gpus but this list is incomplete. For example, it does not list the GeForce GTX 1660 Super. The 1660 Super has CUDA compute capability 7.5. According to Darkarchon, starnet needs compute capability 3.5 or higher.

https://en.wikipedia...rocessing_units [the compute capability is listed under supported APIs]

https://www.techpowerup.com/gpu-specs/ [the compute capability is listed under Graphics Features]
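
The 3.5 minimum can also be checked from a terminal. This is a minimal sketch, not part of what I originally ran: cc_at_least is a hypothetical helper name, and the compute_cap query field is an assumption about the installed driver (older drivers may not accept it).

```shell
# Hypothetical helper: succeed if compute capability $1 is at least $2.
cc_at_least() {
    # sort -V orders version strings; if the minimum sorts first (or ties), $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Query the first card only if nvidia-smi is present.
# Assumption: this driver's nvidia-smi understands the compute_cap field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cc" ] && cc_at_least "$cc" 3.5; then
        echo "compute capability $cc meets the 3.5 minimum"
    else
        echo "could not confirm compute capability (got: '$cc')"
    fi
fi
```

If the query field is not available, the value from the Wikipedia or TechPowerUp tables can be passed by hand, for example cc_at_least 7.5 3.5 for the 1660 Super.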

 

Installed cuda

 

The cuda toolkit is here: https://developer.nv...toolkit-archive .  The documentation is a very long read. Here are the steps that I took.

 

The CUDA install requires the build-essential package as a dependency. However, build-essential would not install with libc6 version 2.31-0ubuntu9.3, which I initially had on my system. On March 3, 2022, after libc6 updated to 2.31-0ubuntu9.7, build-essential and everything else installed smoothly.

 

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install build-essential

 

I then went to https://developer.nv...toolkit-archive and clicked the following links in turn to get the quick install instructions.

CUDA Toolkit 11.6.1 -> Linux -> x86_64 -> Ubuntu -> 20.04 -> deb (local).

Installation Instructions for my distribution:

$ wget https://developer.do...-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.do....03-1_amd64.deb
$ sudo  dpkg -i cuda-repo-ubuntu2004-11-6-local_11.6.1-510.47.03-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-ubuntu2004-11-6-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
$ sudo apt install libcudnn8

 

Edit: cuda components are installed in /usr/local/
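
A quick sanity check after the install could look like the sketch below. It only reports what it finds and changes nothing. The names report and check_cuda_install are hypothetical, and /usr/local/cuda is assumed to be the symlink the toolkit installer creates; adjust the path if your install differs.

```shell
# Print one aligned "label yes/no" line.
report() {  # $1 = label, $2 = yes or no
    printf '%-12s %s\n' "$1" "$2"
}

# Look for the driver tool, the CUDA compiler and the cuDNN library.
check_cuda_install() {
    if command -v nvidia-smi >/dev/null 2>&1; then report nvidia-smi yes; else report nvidia-smi no; fi
    # Assumption: the toolkit is reachable through /usr/local/cuda.
    if [ -x /usr/local/cuda/bin/nvcc ]; then report nvcc yes; else report nvcc no; fi
    # ldconfig -p lists the libraries the dynamic linker cache knows about.
    if ldconfig -p 2>/dev/null | grep -q 'libcudnn'; then report libcudnn yes; else report libcudnn no; fi
}

check_cuda_install
```

A "no" here does not prove the install is broken (for example, nvcc may simply not be on the assumed path), but it is a starting point for looking.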

 

Installed libtensorflow-gpu

 

Found it here: https://www.tensorfl.../install/lang_c and followed install instructions.

 

$ sudo tar -C /usr/local -xzf libtensorflow-gpu-linux-x86_64-2.7.0.tar.gz
$ sudo ldconfig /usr/local/lib 

Doing ldconfig ensures that programs will find the tensorflow libraries from the /usr/local install.

 

Moved the included libtensorflow libraries to a temporary location so that the gpu enabled libraries in /usr/local were picked up. Otherwise the cpu versions are used.

For example: In the directory that contains the starnet++ command line files [such as StarNetv2CLI_linux/] I did the following.

$ mkdir temp
$ mv libtensorflow* ./temp/
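
One way to see which libtensorflow a binary would pick up is ldd. This is a minimal sketch: show_tf_libs is a hypothetical helper name, and it assumes the program links libtensorflow dynamically at build time (a library loaded only at run time would not show up in ldd).

```shell
# Hypothetical helper: list the libtensorflow files the dynamic linker
# resolves for the binary given as $1.
show_tf_libs() {
    ldd "$1" 2>/dev/null | grep -i 'libtensorflow' \
        || echo "no libtensorflow dependency resolved for $1"
}
```

Run it as show_tf_libs ./starnet++ from the starnet++ directory. Lines that point at /usr/local/lib suggest the GPU build is the one being resolved; lines that point at the current directory suggest the bundled CPU libraries are still in the way.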

 

Did the same for Pixinsight: The tensorflow libraries are in /opt/PixInsight/bin/lib

 

$ cd /opt/PixInsight/bin/lib
$ sudo mkdir /opt/temp
$ sudo mv libtensorflow* /opt/temp

 

Be warned here! You will need to use super user privileges to do this. You could damage your Pixinsight installation if you are not careful.
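
Since the move has to be undone if the GPU libraries do not work out, it can help to keep the move and its reverse side by side. A minimal sketch; stash_tf_libs and restore_tf_libs are hypothetical helper names, and both take the library directory first and the stash directory second.

```shell
# Move every libtensorflow* file out of directory $1 into stash directory $2.
stash_tf_libs() {
    mkdir -p "$2" && mv "$1"/libtensorflow* "$2"/
}

# Put the stashed files back into directory $1 from stash directory $2.
restore_tf_libs() {
    mv "$2"/libtensorflow* "$1"/
}
```

For PixInsight the directories would be /opt/PixInsight/bin/lib and /opt/temp, and the functions would have to run in a root shell because both paths are owned by root.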

 

Set an environment variable

$ export TF_FORCE_GPU_ALLOW_GROWTH="true"

I put this in my ~/.bashrc so that it is set in every session. Why is this needed? See https://www.tensorflow.org/guide/gpu
Quote:
"In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two methods to control this.

The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as needed for the runtime allocations: it starts out allocating very little memory, and as the program gets run and more GPU memory is needed, the GPU memory region is extended for the TensorFlow process.

Another way to enable this option is to set the environmental variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific."
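
To make the variable persistent without ending up with duplicate lines in the startup file, something like the following could be used. It is a minimal sketch; add_line_once is a hypothetical helper name.

```shell
# Hypothetical helper: append line $1 to file $2 unless an identical
# line is already present (grep -x matches whole lines, -F is literal).
add_line_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example call, left commented out so that nothing is modified by accident:
# add_line_once 'export TF_FORCE_GPU_ALLOW_GROWTH="true"' "$HOME/.bashrc"
```

The variable can also be set for a single run by putting it in front of the command, for example TF_FORCE_GPU_ALLOW_GROWTH=true ./starnet++ followed by the usual arguments.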

 

Error message

 

There was an error message in the terminal running starnet++ from the command line:

"I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero."

 

This does not seem to matter. It can be dealt with by running the following command in the terminal as described here: https://github.com/t...ow/issues/42738

 

$ for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done

This was only needed once and I was asked for my sudo password.
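
Before writing to sysfs it may be worth seeing which devices actually report -1. The sketch below only reads; list_negative_numa is a hypothetical helper name, and the optional directory argument exists so it can be pointed somewhere other than /sys/bus/pci/devices.

```shell
# Hypothetical helper: print the numa_node files whose current value is -1.
# $1 (optional) = directory holding the per-device subdirectories.
list_negative_numa() {
    base="${1:-/sys/bus/pci/devices}"
    for f in "$base"/*/numa_node; do
        [ -r "$f" ] || continue              # skip unreadable or unmatched paths
        [ "$(cat "$f")" = "-1" ] && echo "$f"
    done
    return 0
}

# Example call (read-only):
# list_negative_numa
```

If nothing is printed, no device reports -1 and the tee loop above would have nothing to correct.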

 

Installed nvtop to monitor gpu usage

 

$ sudo apt install nvtop

 

nvtop is run from the command line in a terminal. When starnet++ is running an entry is shown under Type as "Compute" whereas normal display tasks are listed as "Graphic." The line graph shows memory and gpu usage. This should go up when starnet++ is running.
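
If nvtop is not available, nvidia-smi (installed with the driver) can print similar numbers as text. A minimal sketch; gpu_snapshot is a hypothetical helper name, and it falls back to a message when nvidia-smi is missing.

```shell
# Hypothetical helper: print GPU utilization and memory use once, as CSV.
gpu_snapshot() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv
    else
        echo "nvidia-smi not found in PATH"
    fi
}
```

Adding -l 1 to the nvidia-smi command repeats the query every second, which makes it easy to watch the numbers rise while starnet++ is running in another terminal.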

 

Results

From the command line I ran starnet++ as an argument to the time command to get a time estimate:

 

[Note: the input filename below has a typo; it should have been NGC7000.]
$ time ./starnet++ ~/PI/NGC700_Drzl2_Ha_nonlinear.tif fileout.tif 64
Reading input image... Done!
Bits per sample: 16
Samples per pixel: 1
Height: 5074
Width: 6724
Done!

 

Output of the time command:

real 5m35.637s <----- This is good!
user 5m39.473s
sys     0m1.908s

 

Also works with starnetGUI. [https://www.cloudyni...6#entry11695043]

 

Attached picture is luminance, not the Ha file whose stats I have above.

 

Many thanks to Nikita Misiura (Starnet++), JJ Teoh (starnet GUI) and Darkarchon (instructions for windows).

 

Cheers!
Ajay

Attached Thumbnails

  • nvtop_starnet.jpg
  • starless_NGC7000_Luminance.jpg

Edited by bluedandelion, 06 March 2022 - 12:24 PM.

  • gregj888, dswtan, airscottdenning and 2 others like this

#2 airscottdenning

airscottdenning

    Viking 1

  • *****
  • Posts: 933
  • Joined: 22 Aug 2008
  • Loc: Colorado

Posted 06 March 2022 - 10:23 AM

Ajay,

 

WOW! Thank you SO MUCH for documenting this!

Now I'm going to have to join the crowd trying to pry a video card out of the supply-chain nightmare maintained by the bitcoin mining hordes.

 

Way to go.

 

Scott 


  • bluedandelion and GloP like this

#3 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 06 March 2022 - 11:09 AM

I built my system just before the pandemic hit and it seems that all of those components sell used for more than I paid!



#4 nekitmm

nekitmm

    Vendor - StarNet Software

  • *****
  • Vendors
  • Posts: 191
  • Joined: 25 Apr 2018

Posted 06 March 2022 - 12:29 PM

Excellent job, Ajay!

 

This will be very useful for many people!


Edited by nekitmm, 06 March 2022 - 12:30 PM.

  • bluedandelion and airscottdenning like this

#5 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 13 March 2022 - 04:01 PM

Some updates
 

Using Multiple GPUs
 

I only have a single GPU. However, controlling how many cards or which cuda capable card is enabled is achieved via the environment variable CUDA_VISIBLE_DEVICES as documented here: https://docs.nvidia....x.html#env-vars

"Only the devices whose index is present in the sequence are visible to CUDA applications and they are enumerated in the order of the sequence. If one of the indices is invalid, only the devices whose index precedes the invalid index are visible to CUDA applications. For example, setting CUDA_VISIBLE_DEVICES to 2,1 causes device 0 to be invisible and device 2 to be enumerated before device 1. Setting CUDA_VISIBLE_DEVICES to 0,2,-1,1 causes devices 0 and 2 to be visible and device 1 to be invisible."

 

GPU indices are enumerated starting from 0. If you have two GPUs that you wish to use, set the environment variable as follows for Linux:

export CUDA_VISIBLE_DEVICES="0" [Only the first card is visible to cuda applications]
export CUDA_VISIBLE_DEVICES="0,1"  [Use first card before the second]
export CUDA_VISIBLE_DEVICES="1,0"  [Use second card before the first]
export CUDA_VISIBLE_DEVICES="2,1" [Causes device 0 to be invisible and device 2 to be enumerated before device 1]
export CUDA_VISIBLE_DEVICES="0,2,-1,1" [Makes devices 0 and 2 visible and device 1 invisible, because -1 is an invalid index]
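
The rule in the quoted documentation (indices are honored in order until the first invalid one) can be tried out without any GPU. This is only an illustration of that rule, not Nvidia's code: visible_devices is a hypothetical helper name, the second argument is the number of GPUs you pretend to have, and only plain integer indices are handled.

```shell
# Hypothetical helper: $1 = a CUDA_VISIBLE_DEVICES value, $2 = number of GPUs.
# Prints the indices that stay visible, in enumeration order.
visible_devices() {
    out=""
    old_ifs=$IFS
    IFS=,                                   # split $1 on commas
    for idx in $1; do
        case "$idx" in
            ''|*[!0-9]*) break ;;           # not a non-negative integer: invalid, stop
        esac
        [ "$idx" -lt "$2" ] || break        # index past the last GPU: invalid, stop
        out="${out:+$out }$idx"
    done
    IFS=$old_ifs
    echo "$out"
}

visible_devices "2,1" 3         # prints: 2 1
visible_devices "0,2,-1,1" 3    # prints: 0 2
```

As with any environment variable, the setting can also be limited to one run by putting it in front of the command, for example CUDA_VISIBLE_DEVICES=0 ./starnet++ followed by the usual arguments.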

 

See the link at the top of the original post for Darkarchon's instructions on how to set environment variables in Windows.

 

-----

 

A correction: the command I gave for suppressing the NUMA message has to be run once in each shell you use to run the command line. It does not survive a restart.

 

There was an error message in the terminal running starnet++ from the command line:

"I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero"

This does not seem to matter. It can be dealt with by running the following command in the terminal as described here: https://github.com/t...ow/issues/42738

$ for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done



#6 cofford

cofford

    Lift Off

  • -----
  • Posts: 9
  • Joined: 03 Sep 2020

Posted 14 March 2022 - 08:31 PM

Thank you for posting this.  Starnet v2 is now running on my GTX1060 in Ubuntu 20.04 and it cranks!


  • bluedandelion likes this

#7 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 14 March 2022 - 11:46 PM

Tried the new Starnet2 PI module for Linux with PI version 1.8.9-20220313.

 

Works fine with nonlinear 16 bit Tiff: Image size: 6724x5074, Number of channels: 1, Color space: Grayscale, Bits per sample: 16

 

Stride: 256
Processing 540 image tiles: done

Done! 25.716 s

 

Stride: 128
Processing 2120 image tiles: done
Done! 01:03.93

 

Also works on the linear version of this 2x drizzled file, but only if it's upsampled.

Resampling to 13448x10148 px, Lanczos-3 interpolation, c=0.30: done, Window size: 512, Stride: 256
Image size: 13448x10148
Processing 2120 image tiles: done
Resampling the image to original size...
Done! 01:28.12
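
The tile counts reported above are consistent with simple arithmetic: one tile per stride step in each direction, rounded up. The sketch below assumes that is how the count is derived (it is inferred from the numbers in this post, not taken from the StarNet source); tiles is a hypothetical helper name.

```shell
# Hypothetical helper: tiles WIDTH HEIGHT STRIDE
# Assumption: tile count = ceil(width / stride) * ceil(height / stride).
tiles() {
    cols=$(( ($1 + $3 - 1) / $3 ))   # integer ceiling of width / stride
    rows=$(( ($2 + $3 - 1) / $3 ))   # integer ceiling of height / stride
    echo $(( cols * rows ))
}

tiles 6724 5074 256      # prints 540
tiles 6724 5074 128      # prints 2120
tiles 13448 10148 256    # prints 2120
```

Halving the stride roughly quadruples the number of tiles, which matches the step from about 26 seconds to about 64 seconds in the runs above.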



#8 hau_ruck

hau_ruck

    Lift Off

  • -----
  • Posts: 1
  • Joined: 25 Jun 2022

Posted 26 June 2022 - 02:03 PM

Happy to report success with this procedure, using Pop!_OS 22.04 (Ubuntu derivative distro from System76). Thanks for doing the grunt work, Ajay!

 

The only meaningful difference for setting up with Pop!_OS is the package names for installing CUDA, as the main System76 repository provides these packages without the need to install NVidia's repo.

sudo apt update
sudo apt install -y \
  build-essential \
  system76-cuda-latest \
  system76-cudnn-11.2 \
  nvtop

After that, I simply followed the Tensorflow installation/setup steps as written, started up PI, and tested both Starnet v2 and StarXTerminator. The results with an RTX 2080ti (nvidia driver 470.103.01) were a substantial improvement over a 24-core Ryzen TR 3960X.


  • bluedandelion likes this

#9 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 06 July 2022 - 11:59 PM

Good work!

 

Yes, there are likely to be variations in package names and version numbers as packages are updated. Some of the links I've provided may also become obsolete. Please adapt.



#10 whirlpoolm51

whirlpoolm51

    Viking 1

  • *****
  • Posts: 940
  • Joined: 05 Jan 2012
  • Loc: pittsburgh,pa

Posted 07 July 2022 - 04:47 PM

Do you have any ideas if I have to go through all this again after I update my version of pixinsight??

I successfully did this awhile ago and I don’t want to upgrade my version of PI yet because I don’t want to have to go through all this again haha

The only thing I found was that you might only have to replace the libtensor.dll file for the new version but this seems to be spotty

#11 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 14 July 2022 - 01:35 PM

Yeah, the only part you may need to redo is moving the libtensorflow included with PI to a temp directory as I've explained above. I think I've done one upgrade and IIRC that is the only change I needed to make.



#12 whirlpoolm51

whirlpoolm51

    Viking 1

  • *****
  • Posts: 940
  • Joined: 05 Jan 2012
  • Loc: pittsburgh,pa

Posted 15 July 2022 - 06:47 PM

It’s not working with new version and new gpu
(3080)
I redid everything but still won’t work , works fine as soon as I put the normal tensor flow.dll file back instead of the new one

Anyone have any ideas??

#13 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 02 August 2022 - 02:04 AM

Sorry. I am travelling and don't have my system for reference. I also don't have a deep understanding of the interplay between various components and for that reason troubleshooting different setups is beyond me. If this were my system, I'd start from scratch and on a trial and error basis update each component (look for missing library messages etc.). That is how I got this working in the first place. Maybe someone else might be able to help. Good luck. 


Edited by bluedandelion, 02 August 2022 - 02:05 AM.


#14 bluedandelion

bluedandelion

    Vanguard

  • *****
  • topic starter
  • Posts: 2,124
  • Joined: 17 Aug 2007
  • Loc: Hazy Hollow, Western WA

Posted 04 September 2022 - 07:58 PM

Hi whirlpoolm51,

 

I just updated to the latest PI version for Linux: PI-linux-x64-1.8.9-1-20220518

Then I moved the libtensorflow files that come with the stock PI installation to a temporary location as described in my first post on this thread:
$ cd /opt/PixInsight/bin/lib
$ sudo mkdir /opt/temp
$ sudo mv libtensorflow* /opt/temp

 

Restarted PI and starnet used gpu acceleration. So I can say for sure that this was all that was needed after updating PI to have starnet run on the gpu.

 

Just to be clear, nothing other than the PI release was changed.

 

Good luck.

Ajay


Edited by bluedandelion, 04 September 2022 - 08:02 PM.


#15 sarmen2

sarmen2

    Sputnik

  • *****
  • Posts: 45
  • Joined: 14 Mar 2011

Posted 13 October 2022 - 01:19 PM

Hello,

 

I'm looking for simple step by step instructions for loading StarnetV2 as a PI module on Linux Ubuntu. The PI version is 1.8.9-1. 

 

None of the instructions I've found have worked. The StarnetV2 module is not found when attempting to add new.

 

thanks in advance.



#16 Dudleydogg

Dudleydogg

    Sputnik

  • -----
  • Posts: 34
  • Joined: 19 Dec 2020
  • Loc: Central Florida

Posted 23 December 2022 - 02:03 PM

whirlpoolm51 wrote:
"It’s not working with new version and new gpu (3080). I redid everything but still won’t work , works fine as soon as I put the normal tensor flow.dll file back instead of the new one. Anyone have any ideas??"

This is where I am at: I have all the requirements, but it just hangs at 0% and I have to force quit PixInsight.





Cloudy Nights LLC
Cloudy Nights Sponsor: Astronomics