Linux

A great tip for throttling IO-intensive processes.

July 9, 2022 Linux

If I do anything too IO-intensive on my VMware Linux guest (running on a macOS host), it often crashes the VM and the macOS host along with it, and I have no choice but to power-cycle the Mac.

This used to happen with my previous Mac too (both are work machines with Intel CPUs, not M1/M2).

Today I crashed my VM/host pair trying to unpack a 4 GB .tar.gz file, and found this great way to throttle the unpacking just enough to keep my machine alive.

My use case was:

# cat master.tar.gz | pv -L 50M | tar zxf -

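The -L 50M option tells pv to pass data through at no more than about 50 MB/s, which paces how fast tar is fed the compressed archive. I’m running RHEL 8 on my VM, but was able to find an EPEL pv RPM (pv-1.6.6-7.el8.x86_64.rpm) on rpmsearch to install.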
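If you do this often, the pipeline can be wrapped in a small shell function. This is a minimal sketch: throttled_untar is a hypothetical name, and falling back to unthrottled extraction when pv isn’t installed is my own assumption, not part of the tip above.

```shell
#!/usr/bin/env bash
# Sketch: extract a .tar.gz while capping the read rate with pv(1).
# throttled_untar is a hypothetical helper, not part of pv or tar.
set -o pipefail   # make the pipeline fail if either pv or tar fails

throttled_untar() {   # throttled_untar ARCHIVE [RATE] [DEST]
    local archive=$1 rate=${2:-50M} dest=${3:-.}
    mkdir -p "$dest" || return
    if command -v pv >/dev/null 2>&1; then
        # pv reads the file itself, so no cat is needed; -L caps the
        # throughput (suffixes K, M, G), which paces how fast tar is fed.
        pv -L "$rate" "$archive" | tar -xzf - -C "$dest"
    else
        # Assumption: extracting unthrottled beats not extracting at all.
        echo "pv not found; extracting without a rate limit" >&2
        tar -xzf "$archive" -C "$dest"
    fi
}
```

Usage: throttled_untar master.tar.gz 50M ./master. Lower the rate if the host still struggles.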

Installing CUDA SDK on Fedora 34

June 10, 2021 Linux

I’ve been wanting to try GPU programming for a while. My non-work laptop, which now dual-boots Windows 10 and Fedora 34 Linux, has a GPU that I can play with. lspci -v shows that it is:

GeForce GTX 1660 Ti Mobile

I’m sure this is an underpowered GPU compared to what you’d find in a desktop gaming machine (my stepson has an RTX 3XXX-series GPU in his machine, which I’m sure could be made to do much more interesting things, although he thinks it’s for games).

Setting up the NVIDIA driver and the CUDA SDK on Linux turned out to be more trouble than I expected. It required:

  1. Installing the CUDA SDK.
  2. Building a downlevel gcc (gcc-10) so that I could build the CUDA SDK samples, since Fedora 34 ships with gcc-11, which the SDK rejects.
  3. Disabling the default nouveau driver.
  4. Building and installing a Linux kernel from source, bypassing the default Fedora kernel, which (apparently) has a debug configuration that makes the NVIDIA module trip over GPL-only symbols.
  5. Manually installing the NVIDIA driver.

Step 1. Installing the CUDA SDK.

The NVIDIA download site has a dialog for selecting the packages for your operating system. The closest match I could select was Fedora 33, for which the installation instructions were:

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-fedora33-11-3-local-11.3.1_465.19.01-1.x86_64.rpm
sudo rpm -i cuda-repo-fedora33-11-3-local-11.3.1_465.19.01-1.x86_64.rpm
sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms
sudo dnf -y install cuda

Needless to say, this didn’t work out of the box. After installation (and a reboot), I was able to create a working copy of the SDK samples using:

/usr/local/cuda-11.3/bin/cuda-install-samples-11.3.sh <target-dir>

Trying to build any of those samples fails right away, with errors like:

139 | #error -- unsupported GNU version! gcc versions later than 10 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
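The version check nvcc is doing here can be reproduced before starting a build. A minimal sketch, assuming (as the message says) that the CUDA 11.3 nvcc accepts host gcc major versions up to 10; cuda_gcc_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: predict nvcc's "unsupported GNU version" error before building.
# Assumption: CUDA 11.3's nvcc accepts gcc major versions <= 10 (per the error text).
# cuda_gcc_ok is a hypothetical helper name.
cuda_gcc_ok() {   # cuda_gcc_ok VERSION [MAX_MAJOR]
    local version=$1 max=${2:-10}
    local major=${version%%.*}   # "11.1.1" -> "11"; a bare "11" stays "11"
    [ "$major" -le "$max" ]
}

if command -v gcc >/dev/null 2>&1; then
    v=$(gcc -dumpversion)
    if cuda_gcc_ok "$v"; then
        echo "gcc $v should be accepted by nvcc"
    else
        echo "gcc $v is newer than nvcc allows; use an older gcc"
    fi
fi
```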

Step 2. Build a downlevel gcc:

git clone git://gcc.gnu.org/git/gcc.git
cd gcc
git checkout releases/gcc-10.3.0
contrib/download_prerequisites
mkdir ../build-gcc
cd ../build-gcc
../gcc/configure --prefix=$HOME/gcc-10 --disable-multilib
make -j12
make install
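The steps above only install gcc-10 under $HOME/gcc-10; the samples still have to be told to use it. A minimal sketch, assuming the CUDA 11.x samples’ Makefiles honour a HOST_COMPILER variable (which they pass to nvcc as -ccbin); build_with_gcc10 is a hypothetical helper, and this may not be exactly how the samples were built here.

```shell
#!/usr/bin/env bash
# Sketch: build a CUDA sample with the locally built gcc-10 rather than
# Fedora's gcc-11. Assumes the sample's Makefile reads HOST_COMPILER
# (CUDA 11.x samples pass it to nvcc via -ccbin).
# build_with_gcc10 is a hypothetical helper name.
build_with_gcc10() {   # build_with_gcc10 [PREFIX] [SAMPLE_DIR]
    local prefix=${1:-$HOME/gcc-10} sample_dir=${2:-.}
    local gxx="$prefix/bin/g++"
    if [ ! -x "$gxx" ]; then
        echo "no executable g++ at $gxx" >&2
        return 1
    fi
    make -C "$sample_dir" HOST_COMPILER="$gxx"
}
```

Usage: build_with_gcc10 "$HOME/gcc-10" /path/to/samples/matrixMul (the sample path is just an example).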


With that done, I was able to compile the CUDA samples, but they all failed with cudaGetDeviceCount errors, like so:

matrixMul> ./matrixMul
[Matrix Multiply Using CUDA] - Starting…
CUDA error at …/…/common/inc/helper_cuda.h:779 code=100(cudaErrorNoDevice) “cudaGetDeviceCount(&device_count)”

A bit of googling showed that these errors all mean the NVIDIA driver isn’t installed or isn’t running properly. Running nvidia-smi confirmed this:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
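Before reinstalling anything, it helps to check which GPU driver, if any, the kernel actually has loaded. A minimal sketch that reads /proc/modules (the same data lsmod prints); module_loaded is a hypothetical helper, and its optional second argument exists only so it can be tried against a sample file.

```shell
#!/usr/bin/env bash
# Sketch: report whether a kernel module is currently loaded by reading
# /proc/modules, where the first field of each line is the module name.
# module_loaded is a hypothetical helper name.
module_loaded() {   # module_loaded NAME [MODULES_FILE]
    local name=$1 modules_file=${2:-/proc/modules}
    # Exact match on field 1, so "nvidia" does not match "nvidia_modeset".
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

for m in nouveau nvidia; do
    if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: not loaded"; fi
done
```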

Step 3. Googling suggested that disabling the nouveau driver might help:

sudo su -
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
dracut -f
reboot

(It didn’t.) I’m not sure whether this step was required, but I haven’t undone it.

Step 4. New kernel build and install.

I tried installing the NVIDIA driver following instructions from JR. The basic steps are:

  • Download the driver .run file.
  • Run telinit 3 to switch to console mode.
  • Try running the NVIDIA-Linux-x86_64-465.31.run installer.
  • Look at /var/log/nvidia-installer.log to see what went wrong.

The first problem I found was that the /lib/modules/5.12.8-300.fc34.x86_64/build symlink was dead: I didn’t seem to have matching kernel sources and modules. So I upgraded:

sudo yum clean all
sudo yum -y upgrade

to grab a matching kernel and sources (figuring an update was available). That didn’t work either, because /lib/modules//build pointed to a -debug location that wasn’t available. Correcting that link gave me different errors, namely GPL symbol errors like:

FATAL: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol ‘mutex_destroy’

(there was a whole pile of similar errors.)
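The dead build symlink described above is easy to check for before running the installer. A minimal sketch; check_build_link is a hypothetical helper, and its optional root argument is only there so it can be tested outside /lib/modules.

```shell
#!/usr/bin/env bash
# Sketch: verify that /lib/modules/<release>/build resolves to a kernel
# source/header tree, which out-of-tree modules such as nvidia.ko need.
# check_build_link is a hypothetical helper name.
check_build_link() {   # check_build_link [RELEASE] [MODULES_ROOT]
    local release=${1:-$(uname -r)} root=${2:-/lib/modules}
    local link="$root/$release/build"
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        echo "dangling: $link -> $(readlink "$link")" >&2
        return 1
    elif [ ! -d "$link" ]; then
        echo "missing: $link" >&2
        return 1
    fi
    echo "ok: $link -> $(readlink -f "$link")"
}

check_build_link || echo "kernel sources matching $(uname -r) are needed first"
```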

To build a kernel without the GPL issues (which apparently come from some sort of debug configuration in the default Fedora kernel), I ran:

sudo dnf group install "Development Tools"
sudo dnf install ncurses-devel bison flex elfutils-libelf-devel openssl-devel

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
cd linux
git checkout v5.12.9
cp /boot/config-5.12.9-300.fc34.x86_64 .config
make oldconfig
make -j12
sudo make modules_install
sudo make install
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo grubby --set-default /boot/vmlinuz-5.12.9
reboot
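Before spending a long build on it, the copied .config can be checked for the kind of debug option behind errors like the mutex_destroy one above. This is a hedged sketch: as far as I know, with CONFIG_DEBUG_MUTEXES=y the kernel exports mutex_destroy as a GPL-only function (otherwise it is an inline no-op), but other GPL-only symbols in that pile of errors may come from other debug options. debug_mutexes_on is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: look for CONFIG_DEBUG_MUTEXES in a kernel .config.
# Assumption: with this option set, mutex_destroy becomes a GPL-only
# exported symbol that a proprietary module like nvidia.ko can't link against.
# debug_mutexes_on is a hypothetical helper name.
debug_mutexes_on() {   # debug_mutexes_on [CONFIG_FILE]
    grep -q '^CONFIG_DEBUG_MUTEXES=y$' "${1:-.config}"
}

if [ -f .config ] && debug_mutexes_on .config; then
    echo "CONFIG_DEBUG_MUTEXES=y: expect GPL-only symbol errors from nvidia.ko"
fi
```

If it is set, the kernel tree’s own scripts/config --disable DEBUG_MUTEXES followed by make olddefconfig is one way to try turning it off, though other debug options may select it again.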

Step 5. Install the NVIDIA driver manually (last try):

After reboot:

telinit 3
login
sh /path/to/NVIDIA-Linux-x86_64-465.31.run
reboot

With all this done, nvidia-smi runs successfully:

Thu Jun 10 00:30:35 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31       Driver Version: 465.31       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P8     5W /  N/A |      5MiB /  5944MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5867      G   /usr/libexec/Xorg                   4MiB |
+-----------------------------------------------------------------------------+
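For scripts, the same information can be pulled from nvidia-smi as CSV instead of the full table. A minimal sketch using nvidia-smi’s --query-gpu and --format options; gpu_summary is a hypothetical helper, and its optional argument (the nvidia-smi path) exists only to make it testable.

```shell
#!/usr/bin/env bash
# Sketch: one-line GPU/driver summary suitable for scripts, instead of the
# full nvidia-smi table. gpu_summary is a hypothetical helper name.
gpu_summary() {   # gpu_summary [NVIDIA_SMI_PATH]
    local smi=${1:-nvidia-smi}
    if ! command -v "$smi" >/dev/null 2>&1; then
        echo "$smi not found; is the NVIDIA driver installed?" >&2
        return 1
    fi
    # Prints one CSV line per GPU: name, driver version, total memory.
    "$smi" --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

gpu_summary || true
```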

Now I should be set up to try some CUDA apps (the samples, GPU crypto miners, parallel numerical code, password crackers, or whatever else might be interesting to fool around with). I’ve got a couple of CUDA books on order from the Toronto Public Library, and will start digging in more deeply once they arrive.

Introducing a fixed capacity directory (Linux).

March 23, 2021 Linux

I have a dump directory that fills up far too easily (with cores and other dumps) when a programming error makes our system misbehave catastrophically.

Here’s a nice trick: create a small, fixed-capacity filesystem in a file (10 MB here, from bs=1M count=10) and mount it on the dump directory, so that the filesystem containing it (in my case /var) can’t be filled up:

dd if=/dev/zero of=dump.loopback bs=1M count=10
mke2fs ./dump.loopback
mkdir -p ./dump
mount -o loop dump.loopback ./dump
chmod 777 ./dump
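The same recipe as a pair of reusable shell functions. A minimal sketch: make_image and mount_capped_dir are hypothetical names, the paths in the usage note are only examples, and mounting needs root.

```shell
#!/usr/bin/env bash
# Sketch: back a directory with a fixed-size ext2 image so it can't grow
# past SIZE_MB and fill the parent filesystem.
# make_image and mount_capped_dir are hypothetical helper names.

make_image() {          # make_image FILE SIZE_MB
    dd if=/dev/zero of="$1" bs=1M count="$2" status=none
}

mount_capped_dir() {    # mount_capped_dir FILE MOUNTPOINT  (run as root)
    local image=$1 mnt=$2
    mke2fs -q -F "$image" || return   # -F: format even though FILE isn't a block device
    mkdir -p "$mnt" || return
    mount -o loop "$image" "$mnt" || return
    chmod 777 "$mnt"                  # let any user drop dumps here, as above
}
```

Usage, as root: make_image /var/dump.loopback 10 && mount_capped_dir /var/dump.loopback /var/dump. To have it come back after a reboot, an /etc/fstab entry along the lines of /var/dump.loopback /var/dump auto loop 0 0 should work (untested).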