Tromp's solvers

Compared to xenoncat, whose methods are described in

my solver differs in having way more buckets, wasting some memory, having simpler pair compression, being multi-threaded, and supporting (144,5).

And of course in not using any assembly.

Oh, and having some cool visualization of bucket size distribution…

I would prefer not to be refunded and still have access to your code. Your code is in C, so I can tinker with it. I don’t know assembly. The CUDA support is also valuable.

Once the xenoncat performance claims are confirmed, as I expect they will be, I’ll offer full (or partial, if you like to support Cuckoo Cycle) refunds, and open source my solvers anyway.

Wow, you are an absolute legend. Thank you!

Also thanks to xenoncat, whoever he or she may be (my hat off to you) …

Performance of xenoncat is confirmed, but due to API mismatch, correctness of solutions found is not confirmed yet. No doubt that will happen soon, and I’m already preparing my commits…

OK; I’ve decided to bite the bullet. Full source is available at

Just run

git clone git@github.com:tromp/equihash.git

make all

and enjoy. I will be contacting contributors in decreasing order of donation, and asking how they want to be refunded…

Whoever sent 1BTC to OgNasty, please let him know how to handle your refund.

Awesome! You rock, Mr. Tromp.

What kind of dependencies are required? Haven’t tried to compile yet.

You have open sourced your work – Thank you! – please keep my small donation.

No dependencies, I think?! Let me know if you find otherwise…

git clone https://github.com/tromp/equihash

works for me.

I get these results:

3 solutions
3 total solutions
1.75user 0.10system 0:01.86elapsed 99%CPU (0avgtext+0avgdata 216080maxresident)k
0inputs+0outputs (0major+7298minor)pagefaults 0swaps

which cpu are you using?

Intel Core i7-4790K @ 4.00 GHz

Hey, I thought you were going to open source it! But from what I see, it is proprietary software that nobody else has the right to use or redistribute without prior permission from the author. :wink:

If you want a suggestion, you could add something like this:

Copyright 2016 John Tromp
You may use this package under the MIT Licence. You may use this package under the Transitive Grace Period Public Licence, version 1.0, or at your option, any later version. (You may choose to use this package under the terms of either licence, at your option.) See the file COPYING.MIT for the terms of the MIT Licence. See the file COPYING.TGPPL for the terms of the Transitive Grace Period Public Licence, version 1.0. See TGPPL.PDF for why the TGPPL exists, graphically illustrated on three slides.

I am getting core dumps on 64-bit Ubuntu in VirtualBox on a 64-bit Debian host:

jank@ubuntu-modeli:~/equihash$ make all
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DATOMIC equi_miner.cpp blake/blake2b.cpp -o equi
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DSPARK equi_miner.cpp blake/blake2b.cpp -o equi1
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DJOINHT -DATOMIC equi_miner.cpp blake/blake2b.cpp -o faster
g++ -march=native -m64 -maes -mavx -std=c++11 -Wall -Wno-deprecated-declarations -D_POSIX_C_SOURCE=200112L -O3 -pthread  -DJOINHT equi_miner.cpp blake/blake2b.cpp -o faster1
g++ -g equi.c blake/blake2b.cpp -o verify
time ./equi -h "" -n 0 -t 1 -s | grep ^Sol | ./verify -h "" -n 0
Verifying size 512 proof for equi("",0)
Command terminated by signal 4
0.00user 0.00system 0:00.20elapsed 0%CPU (0avgtext+0avgdata 2760maxresident)k
0inputs+0outputs (0major+124minor)pagefaults 0swaps
time ./equi1
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Command terminated by signal 4
0.00user 0.00system 0:00.20elapsed 0%CPU (0avgtext+0avgdata 2692maxresident)k
0inputs+0outputs (0major+120minor)pagefaults 0swaps
Makefile:47: recipe for target 'spark' failed
make: *** [spark] Error 132
jank@ubuntu-modeli:~/equihash$ ./equi
WARNING: use of atomics hurts single threaded performance!
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ ./equi1
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ ./faster
WARNING: use of atomics hurts single threaded performance!
Looking for wagner-tree on ("",0) with 10 20-bits digits and 1 threads
Illegal instruction (core dumped)
jank@ubuntu-modeli:~/equihash$ cat /proc/version
Linux version 4.4.0-42-generic (buildd@lgw01-13) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #62-Ubuntu SMP Fri Oct 7 23:11:45 UTC 2016
jank@ubuntu-modeli:~/equihash$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609
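
Signal 4 in the log above is SIGILL (illegal instruction), and the binaries were built with -march=native -maes -mavx. One possible cause, which is an assumption and not confirmed in this thread, is that the VirtualBox guest does not expose AES or AVX to the program. A minimal sketch for checking what the guest CPU reports, assuming a Linux x86 /proc/cpuinfo layout; the helper names are hypothetical and not part of the equihash repository:

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags listed on the first 'flags' line of
    /proc/cpuinfo-style text (empty set if there is no such line)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("aes", "avx")):
    """List the required features that the CPU does not advertise.
    The default tuple mirrors the -maes -mavx build flags; that these
    are the relevant features is an assumption."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]
```

Inside the guest, calling missing_features(open("/proc/cpuinfo").read()) would list any of the two features that are not advertised. An empty list would point away from this explanation.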

Thank you, @tromp! I tested this on our “super” box, which you also have an account on and can use for testing now that the code is (almost) open source (it still needs a license, as Zooko pointed out). There, eqcuda and feqcuda sometimes fail to find solutions (and take multiple seconds to complete in that case). For example, the first time I ran them, they reported 0 solutions. With other nonce values I got non-zero solution counts, and trying nonce 0 again then gave the expected 3 solutions. Retrying after some other tests gave 0 solutions again. You probably have an uninitialized variable somewhere.

Failing run:

$ time ./eqcuda -n 0
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 3.900 seconds.
0 solutions
0 total solutions

real    0m5.344s
user    0m2.875s
sys     0m2.281s

Working run:

$ time ./eqcuda
Looking for wagner-tree on ("",0) with 10 20-bits digits and 8192 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.096 seconds.
3 solutions
3 total solutions

real    0m1.532s
user    0m0.081s
sys     0m1.265s
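
One way to quantify how often the failing case occurs is to repeat the same command and tally the reported solution counts. A minimal sketch, assuming the binary prints a line of the form "N total solutions" as in the two logs above; total_solutions and tally_runs are hypothetical helper names, not part of the repository:

```python
import re
import subprocess
from collections import Counter


def total_solutions(output):
    """Extract N from a line reading 'N total solutions'; None if absent."""
    match = re.search(r"^(\d+) total solutions$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


def tally_runs(cmd, runs=10):
    """Run cmd repeatedly and count how often each solution total appears.
    cmd is an argument list such as ["./eqcuda", "-n", "0"]."""
    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        counts[total_solutions(result.stdout)] += 1
    return counts
```

For a deterministic solver and a fixed nonce, tally_runs(["./eqcuda", "-n", "0"]) should end up with a single key. More than one key would be consistent with the uninitialized-variable suspicion, though it would not prove it.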

0.096 seconds would suggest 1.88/0.096 = 19.6 Sol/s, right? Per nvidia-smi, this runs on the Maxwell Titan X. The box also has an old Kepler Titan, but you don’t seem to have included an option to choose the CUDA device.

I also tried CPU runs. Works great on i7-4770K, but the scaling to 32 threads on 2x E5-2670 in this “super” box is poor - perhaps running some independent instances with fewer threads each (maybe just 1 thread/instance) would be faster (but would eat up more RAM, which is fine at least for testing - got 128 GB here). Feel free to experiment with this, too.
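
The independent-instances idea could be tried along these lines. This is only a sketch: it uses the -n (nonce) and -t (thread count) options seen in the commands earlier in the thread, and giving each instance its own nonce is an assumption about how the work would be split. The function names are hypothetical.

```python
import subprocess


def instance_commands(binary, instances, threads_each=1, first_nonce=0):
    """Build one command line per instance, each with a distinct nonce."""
    return [
        [binary, "-n", str(first_nonce + i), "-t", str(threads_each)]
        for i in range(instances)
    ]


def run_parallel(commands):
    """Start all instances at once, then collect each one's stdout."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    return [proc.communicate()[0] for proc in procs]
```

Timing run_parallel(instance_commands("./equi", 32)) against a single 32-thread run would show whether the separate instances are actually faster on this box.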

Edit: “-t 12288” (upping CUDA thread count in accordance with the difference between GTX 980 and GTX Titan X) somehow makes the speed slightly worse for eqcuda, but improves it for feqcuda, which now gets (also not all the time, but when it’s lucky):

$ time ./feqcuda -t 12288
Looking for wagner-tree on ("",0) with 10 20-bits digits and 12288 threads (128 per block)
Digit 0
Digit 1
Digit 2
Digit 3
Digit 4
Digit 5
Digit 6
Digit 7
Digit 8
Digit 9
9 rounds completed in 0.076 seconds.
3 solutions
3 total solutions

real    0m1.524s
user    0m0.070s
sys     0m1.328s

This is apparently 1.88/0.076 = 24.7 Sol/s.
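
The two estimates can be reproduced with a few lines. The sketch takes the 1.88 average solutions per run as given from the post, and it uses the reported rounds time, not the wall-clock time of the whole process:

```python
AVG_SOLUTIONS_PER_RUN = 1.88  # figure used in the post; taken as given


def sol_per_sec(seconds_per_run, avg_solutions=AVG_SOLUTIONS_PER_RUN):
    """Estimated solutions per second from the time of one run."""
    return avg_solutions / seconds_per_run


# Round times reported in the logs above
for name, seconds in [("eqcuda", 0.096), ("feqcuda -t 12288", 0.076)]:
    print(f"{name}: {sol_per_sec(seconds):.1f} Sol/s")
```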

MIT LICENSE added…

Thank you! Looks like blake2b.cu is third-party code (right?) - are you sure its author is OK with the code being placed under the MIT license? Was it already released under an MIT-compatible license?

// Blake2-B CUDA Implementation
// tpruvot@github July 2016

There is still a bug in faster[1] with the -r option that I’ll try to iron out soon.