TechEarl

Speed Up Hashcat: Workload, Optimized Kernels, and Tuning

hashcat slow? Most of the speed is in two things: the workload flags and the order you run attacks. I cover the workload profile, optimized kernels and their length cap, device selection, benchmark-driven tuning, and why attack ordering beats every flag. Tested on hashcat 7.1.2.

Ishan Karunaratne · 8 min read

"hashcat is slow" almost always means one of two things: you are running it with conservative defaults, or you are running the wrong attack. The flags below recover real speed, but the single biggest lever is not a flag at all: it is the order you run your attacks in. A well-ordered run on default settings beats a badly-ordered run with every performance flag set. This is how to get the most out of hashcat, and where the hard ceiling is. Tested on hashcat 7.1.2.

TL;DR

The fast path: set the workload profile with -w 3 (or -w 4 on a headless box), add -O for optimized kernels when your passwords are short, and let hashcat autotune the rest. Confirm your real speed with hashcat -b -m <mode>. But the biggest win is attack ordering: run wordlist, then wordlist + rules, then hybrids, then masks, so you spend cycles where cracks actually are. And accept the ceiling: no flag makes a slow hash fast or a strong password weak. Optimization buys you speed within the attack; it does not change the maths.

What actually controls your speed

Three things, in order of impact:

  1. The hash. A fast hash runs at billions per second; a slow one at thousands. You cannot change this, but it dictates everything else (it decides which attacks are even viable).
  2. The attack you run. Running an exhaustive mask when a wordlist would crack it is the most common way to waste a GPU-week. This is the lever you control most.
  3. The flags. Workload, kernels, device selection. Real, but smaller than the first two.

Most "hashcat is slow" problems are actually number 2 wearing number 3's clothing.

The workload profile (-w)

The workload profile trades desktop responsiveness for throughput. The four levels, verified from hashcat --help:

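| -w | Profile   | Use when                                                   |
|----|-----------|------------------------------------------------------------|
| 1  | Low       | You are actively using the desktop and want it responsive  |
| 2  | Default   | General use                                                |
| 3  | High      | The machine is dedicated to cracking                       |
| 4  | Nightmare | A headless rig you never touch directly                    |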

On a dedicated cracking box, -w 3 (or -w 4) is free speed you are leaving on the table at the default:

```bash
hashcat -m 0 -a 0 hashes.txt rockyou.txt -w 3
```
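If you launch hashcat from scripts on several machines, the table above can be folded into a small helper so each box gets the right profile. This is only a sketch: `pick_workload` and its role names (`desktop`, `dedicated`, `headless`) are hypothetical labels made up for illustration, not hashcat options.

```bash
# Hypothetical helper: maps an illustrative machine role to a -w value.
# The role names are assumptions for this sketch, not part of hashcat.
pick_workload() {
  case "$1" in
    desktop)   echo 1 ;;  # Low: keep the desktop responsive
    dedicated) echo 3 ;;  # High: machine is dedicated to cracking
    headless)  echo 4 ;;  # Nightmare: rig you never touch directly
    *)         echo 2 ;;  # Default: general use
  esac
}

# Example (not executed here):
#   hashcat -m 0 -a 0 hashes.txt rockyou.txt -w "$(pick_workload dedicated)"
```

Falling back to 2 for unknown roles keeps the helper at hashcat's own default rather than guessing at something more aggressive.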

Optimized kernels (-O) and their catch

-O switches hashcat to optimized kernels, which are meaningfully faster, sometimes dramatically so. The catch: they cap the maximum password length (often to 31 or fewer characters, mode-dependent). For most real passwords that limit never bites, so -O is close to free speed:

```bash
hashcat -m 0 -a 0 hashes.txt rockyou.txt -w 3 -O
```

When to drop -O: if you are attacking long candidates (passphrases, long combinator output) that exceed the cap, the optimized kernel rejects them without ever trying them, and the only sign is the Rejected counter in the status output. For those runs, leave -O off so hashcat uses the pure kernels, which accept much longer candidates.
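Before deciding whether -O is safe for a given wordlist, it can help to count how many candidates would exceed the cap. The sketch below assumes a cap of 31 bytes purely as an example; the real limit depends on the mode, and hashcat prints the maximum password length supported by the kernel when it starts. `count_over_cap` is a hypothetical helper name.

```bash
# Hypothetical helper: count wordlist lines longer than a given cap.
# $1 = wordlist path, $2 = length cap in bytes (assumed; check your mode).
# LC_ALL=C makes awk count bytes rather than multi-byte characters.
count_over_cap() {
  LC_ALL=C awk -v max="$2" 'length($0) > max { n++ } END { print n + 0 }' "$1"
}

# Example (not executed here):
#   count_over_cap rockyou.txt 31
```

If the count is zero or negligible, -O costs you nothing on that wordlist. Rules and hybrid masks can lengthen candidates, so this only measures the raw words.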

For very slow hashes, the companion flag is -S (--slow-candidates), which switches to slower but more advanced candidate generators and can improve throughput on the likes of bcrypt.

Pick the right device

By default hashcat uses everything it can see. To check what that is, and to force a device type:

```bash
hashcat -I                  # list backends and devices
hashcat -m 0 ... -D 2       # GPU only (device type 2; 1 is CPU)
hashcat -m 0 ... -d 1       # use only device number 1 (e.g. one of several GPUs)
```

On a multi-GPU rig, -d lets you dedicate specific cards to a job. On a laptop, forcing -D 2 ensures you are not accidentally cracking on the CPU.

Benchmark to know your real numbers

Do not guess at your speed; measure it. The benchmark gives you the guesses-per-second figure for a mode on your exact hardware, which is what you use to estimate whether an attack is feasible:

```bash
hashcat -b -m 0       # benchmark MD5
hashcat -b -m 3200    # benchmark bcrypt (watch how much slower it is)
hashcat -b            # benchmark a broad set of modes
```

Comparing your benchmark to published numbers (the GTX 1080 Ti benchmark deep dive has a full table across modes) tells you whether your setup is performing as it should or whether a driver or thermal issue is holding it back.
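To turn a benchmark figure into a feasibility estimate, divide the keyspace by the rate. The sketch below does that for a plain brute-force mask. `eta_seconds` is a hypothetical helper, and the rate of 10,000,000,000 guesses per second is an assumed round number, not a measurement; substitute the figure from your own `hashcat -b` run.

```bash
# Hypothetical helper: worst-case seconds to exhaust a brute-force keyspace.
# $1 = charset size, $2 = password length, $3 = guesses per second.
eta_seconds() {
  awk -v c="$1" -v l="$2" -v r="$3" 'BEGIN { printf "%.0f\n", (c ^ l) / r }'
}

# 8 lowercase letters at an assumed 10 billion guesses per second
eta_seconds 26 8 10000000000
```

At that assumed rate, 8 lowercase letters exhaust in about 21 seconds, while 8 characters from the full 95-character printable set take roughly 7.7 days (`eta_seconds 95 8 10000000000`). This is the worst case for a full mask; wordlist attacks have a far smaller keyspace.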

The real optimization: attack ordering

No flag matters as much as running attacks in the right order. The principle is "cheapest, highest-yield first," so you crack the easy passwords immediately and only spend expensive cycles on what is left:

  1. Wordlist (rockyou.txt). Seconds to set up, catches reused passwords.
  2. Wordlist + rules (-r best66.rule). The highest-yield attack.
  3. Hybrid (-a 6). For word + digits patterns.
  4. Targeted masks for known shapes.
  5. Bigger wordlists and rule stacks, then incremental masks, only if needed.

Run with --username and feed cracked passwords back with --loopback, because one cracked password predicts others. This ordering will out-crack a brute force with every performance flag set, in a fraction of the time. The full reasoning is in the attack types.
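The ordering above can be sketched as a small driver script. Everything configurable here is an assumption for illustration: the file names, the two example masks, the `DRY_RUN` switch, and the helper names `run` and `crack_in_order`. It defaults to a dry run that only prints the commands, so you can review them before spending GPU time.

```bash
#!/usr/bin/env bash
# Sketch of "cheapest, highest-yield first". File names, masks, and the
# DRY_RUN switch are illustrative assumptions, not part of hashcat.
HASHES=${HASHES:-hashes.txt}
WORDLIST=${WORDLIST:-rockyou.txt}
RULES=${RULES:-best66.rule}
MODE=${MODE:-0}

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "hashcat $*"     # dry run: print the command only
  else
    hashcat "$@"          # non-zero exit (e.g. exhausted) does not stop the script
  fi
}

crack_in_order() {
  run -m "$MODE" -a 0 "$HASHES" "$WORDLIST" -w 3 -O                # 1. wordlist
  run -m "$MODE" -a 0 "$HASHES" "$WORDLIST" -r "$RULES" -w 3 -O    # 2. wordlist + rules
  run -m "$MODE" -a 6 "$HASHES" "$WORDLIST" '?d?d?d?d' -w 3 -O     # 3. hybrid: word + 4 digits
  run -m "$MODE" -a 3 "$HASHES" '?u?l?l?l?l?l?d?d' -w 3 -O         # 4. targeted mask
}

crack_in_order
```

Set `DRY_RUN=0` to actually run the stages. The script deliberately avoids `set -e`, because hashcat exits non-zero when a stage finishes without cracking everything, and the later stages should still run.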

Keep the rig stable

Speed is worthless if the run crashes or the hardware throttles. On a long job, guard the temperature so a card backs off cleanly instead of overheating or producing errors:

```bash
hashcat -m 0 ... -w 3 --hwmon-temp-abort=90
```

Good airflow and a sane abort temperature keep throughput consistent across a multi-hour run, which matters more than squeezing out the last few percent with manual kernel tuning. (The manual knobs -n, -u, and -T exist, but hashcat's autotune is good; only touch them if you know your hardware well and have benchmarked the difference.)

The ceiling you cannot tune past

Be clear-eyed about what optimization buys you. It makes a given attack run faster. It does not:

  • Make a slow hash fast. bcrypt at cost 12 runs at hundreds to a few thousand guesses per second per GPU, no matter what flags you set.
  • Make a strong password weak. A long, random passphrase is out of reach at any speed.

When the estimated time is still years after you have tuned everything, the answer is not more tuning. It is a smarter attack (wordlist + rules), or accepting that this particular hash is not coming out. Knowing the difference is the real skill.
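The bcrypt ceiling follows from how the cost factor works: each increment doubles the work per guess, so a rate measured at one cost can be scaled to another by dividing by a power of two. The sketch below assumes a made-up benchmark figure of 128,000 guesses per second at cost 5 (the low cost that bcrypt benchmarks are typically reported at); `scale_bcrypt_rate` is a hypothetical helper name.

```bash
# Hypothetical helper: scale a bcrypt rate from one cost factor to another.
# $1 = measured guesses per second, $2 = cost it was measured at, $3 = target cost.
# Each +1 in cost doubles the work, so the rate halves.
scale_bcrypt_rate() {
  awk -v r="$1" -v b="$2" -v t="$3" 'BEGIN { printf "%.0f\n", r / (2 ^ (t - b)) }'
}

# Assumed 128000 guesses/s at cost 5, scaled to cost 12 (2^7 = 128x slower)
scale_bcrypt_rate 128000 5 12
```

With those assumed inputs the result is 1000 guesses per second. No workload or kernel flag changes that factor of 128, which is why the only real lever against a slow hash is a smaller, smarter candidate list.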

Ishan Karunaratne

Software Systems Architect · Senior Software Engineer · Engineering Leadership

Software systems architect and senior software engineer with more than two decades designing, building, and running production software, Linux systems, and DevOps infrastructure, and lately working AI into the stack. Now a CTO, though what I write here is drawn from the full arc of that work, across architecture, engineering, and operations, not any single job.
