Logs: liberachat/#haskell

— 1,798,956 events total

2026-02-14 20:17:44	<tomsmeding>	I see
2026-02-14 20:17:56	<int-e>	(Rather than digging into the history I'll just assume this hasn't changed recently except for bumping versions.)
2026-02-14 20:18:07	→	s3np41 joins (~s3np41@078088254000.unknown.vectranet.pl)
2026-02-14 20:18:17	<tomsmeding>	do you happen to know how critical those version upper bounds are, in GHC's LLVM support?
2026-02-14 20:18:38	<int-e>	I don't
2026-02-14 20:21:49	×	merijn quits (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 250 seconds)
2026-02-14 20:22:08	<int-e>	Hmm. For me, the baked-in commands (from the settings file) have a version, e.g. llc-14 for ghc-9.10.3 and llc-19 for ghc-9.12.2. I wonder what the binary distributions put there... ghc --info | grep LLVM will show that info without you having to go looking for the settings file.
2026-02-14 20:23:07	<geekosaur>	they can be pretty critical but I don't think it's critical for recent versions
2026-02-14 20:23:31	<geekosaur>	there was a point where `opt` parameters changed and ghc didn't know how to call newer versions correctly
2026-02-14 20:25:54	<geekosaur>	also I mentioned the settings file because I saw a claim in backscroll that the correct llvm version wasn't on their PATH, which means a settings file edit to point to the correct one
2026-02-14 20:26:38	→	ouilemur joins (~jgmerritt@user/ouilemur)
2026-02-14 20:28:52	<probie>	It's not giving me SIMD instructions :'(
2026-02-14 20:29:02	<probie>	I wonder if it's the use of `read` instead of `unsafeRead`?
2026-02-14 20:29:31	→	pavonia joins (~user@user/siracusa)
2026-02-14 20:32:46	→	merijn joins (~merijn@host-cl.cgnat-g.v4.dfn.nl)
2026-02-14 20:33:37	×	peterbecich quits (~Thunderbi@71.84.33.135) (Ping timeout: 264 seconds)
2026-02-14 20:34:44	<tomsmeding>	possibly, yes
2026-02-14 20:37:41	×	merijn quits (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 244 seconds)
2026-02-14 20:38:17	×	caubert quits (~caubert@user/caubert) (Ping timeout: 250 seconds)
2026-02-14 20:38:35	×	Pixi quits (~Pixi@user/pixi) (Quit: Leaving)
2026-02-14 20:43:03	<[exa]>	probie: btw why not make a small FFI to a relatively portable C?
2026-02-14 20:43:41	<[exa]>	(man, can we FFI to futhark?)
2026-02-14 20:43:46	→	Pixi joins (~Pixi@user/pixi)
2026-02-14 20:44:08	<tomsmeding>	[exa]: https://gitlab.com/Gusten_Isfeldt/futhask
2026-02-14 20:44:27	<tomsmeding>	(never used it)
2026-02-14 20:45:36	→	peterbecich joins (~Thunderbi@71.84.33.135)
2026-02-14 20:45:43	<[exa]>	ok not bad :)
2026-02-14 20:49:55	→	merijn joins (~merijn@host-cl.cgnat-g.v4.dfn.nl)
2026-02-14 20:52:13	<probie>	[exa]: Because I shouldn't need to
2026-02-14 20:53:21	→	caubert joins (~caubert@user/caubert)
2026-02-14 20:54:32	×	machinedgod quits (~machinedg@d75-159-126-101.abhsia.telus.net) (Ping timeout: 252 seconds)
2026-02-14 20:56:12	<[exa]>	probie: I find it better than relying on the compiler accidentally noticing that I want SIMD (but yeah it's still :( )
2026-02-14 20:56:37	<tomsmeding>	GHC has SIMD primops, but they only work with LLVM
2026-02-14 20:56:50	<tomsmeding>	very recently IIRC some of them started working on NCG too
2026-02-14 20:56:55	<[exa]>	probie: btw try to unroll the loop manually, that might give llvm enough decisive force
2026-02-14 20:57:02	×	merijn quits (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 256 seconds)
2026-02-14 20:57:03	<probie>	I don't think it's LLVM's problem here; GHC is just not generating good code https://paste.tomsmeding.com/8ZYY5Pka
2026-02-14 20:58:11	<tomsmeding>	why are there so many loads for only two stores? I assume this is different code than you posted originally?
2026-02-14 20:58:23	×	caubert quits (~caubert@user/caubert) (Ping timeout: 252 seconds)
2026-02-14 20:59:21	<[exa]>	that looks like a lot of indirection
2026-02-14 20:59:26	<probie>	https://paste.tomsmeding.com/NRYKh5Fj
2026-02-14 21:00:09	×	L29Ah quits (~L29Ah@wikipedia/L29Ah) (Ping timeout: 260 seconds)
2026-02-14 21:00:22	<[exa]>	probie: you have unboxed or primitive vectors?
2026-02-14 21:01:42	<tomsmeding>	probie: if it's easy to paste the optimised LLVM IR, that would make it easier to see what's going on, probably
2026-02-14 21:02:03	<int-e>	IOVector is boxed.
2026-02-14 21:02:09	<tomsmeding>	there are a bunch of loop-invariant loads here that I expect llvm to lift out
2026-02-14 21:02:22	<EvanR>	last I heard ghc didn't have SIMD support
2026-02-14 21:02:35	<EvanR>	oh, LLVM
2026-02-14 21:02:52	<tomsmeding>	int-e: every mutable vector variant has its own definition of the "IOVector" type synonym
2026-02-14 21:02:59	<[exa]>	int-e: afaik you can import the one from the .unboxed.mutable or .primitive.mutable module
2026-02-14 21:03:10	<int-e>	tomsmeding: gah
2026-02-14 21:03:28	<probie>	int-e: it's not; I omitted the `import qualified Data.Vector.Unboxed.Mutable as V`. Weirdly, I get slightly better llvm if I use `Storable` instead of `Unboxed`
2026-02-14 21:03:42	<int-e>	Right. I should've known that.
2026-02-14 21:03:48	<tomsmeding>	in general, Storable is more straightforward
2026-02-14 21:03:56	<tomsmeding>	but in theory, either should work here
2026-02-14 21:04:08	<fgarcia>	llvm goes to at least 23 now. it could be the SIMD changes haven't made it down
2026-02-14 21:04:25	×	infinity0 quits (~infinity0@pwned.gg) (Ping timeout: 255 seconds)
2026-02-14 21:06:12	<[exa]>	probie: man, you're introducing a data dependency there, it can't simd
2026-02-14 21:06:45	<tomsmeding>	[exa]: isn't this code just zipWith (+)
2026-02-14 21:06:50	<tomsmeding>	oh no
2026-02-14 21:06:57	<[exa]>	it's writing back to the original vector
2026-02-14 21:07:02	<tomsmeding>	yeah probie ^
2026-02-14 21:07:15	<tomsmeding>	lol
2026-02-14 21:07:34	→	caubert joins (~caubert@user/caubert)
2026-02-14 21:07:50	<[exa]>	probie: try this https://paste.tomsmeding.com/GjpwizwI
2026-02-14 21:08:28	<[exa]>	(edited right into pastebin so didn't try it but you see the point I guess)
2026-02-14 21:08:49	<tomsmeding>	with this being Word8 you may even want to unroll 32x
2026-02-14 21:09:11	<tomsmeding>	or at least 16x to use 128bit SSE4 registers
2026-02-14 21:09:20	<[exa]>	oh
2026-02-14 21:09:27	<[exa]>	ok I somehow hoped this is at least floats
2026-02-14 21:09:30	<tomsmeding>	but 4 should at least get you different assembly
2026-02-14 21:09:35	→	merijn joins (~merijn@host-cl.cgnat-g.v4.dfn.nl)
2026-02-14 21:09:40	<[exa]>	are there SIMD instructions for chars?
2026-02-14 21:09:48	<tomsmeding>	yes
2026-02-14 21:10:02	[exa]	learned today
2026-02-14 21:10:17	<tomsmeding>	https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=epi8
2026-02-14 21:10:29	<tomsmeding>	_mm_add_epi8 is the one you want here (paddb)
2026-02-14 21:11:05	<tomsmeding>	or the _mm256 version, or _mm512 if you want to use your juicy AVX512
2026-02-14 21:11:39	<tomsmeding>	think about that, 64 adds with 1-cycle latency
2026-02-14 21:11:56	<[exa]>	oh these are the epi8 instructions from the intrinsic guide that I ignored every time
2026-02-14 21:12:10	<tomsmeding>	epi is integer stuff
2026-02-14 21:12:20	×	peterbecich quits (~Thunderbi@71.84.33.135) (Ping timeout: 256 seconds)
2026-02-14 21:12:51	<tomsmeding>	and apparently it can even do two of those _mm256_add_epi8 instructions in one cycle, by the CPI of 0.5
2026-02-14 21:13:28	<tomsmeding>	(yes, the throughput label is misleading; I checked that a div_pd has 4 there and add_pd 0.5, so indeed it's CPI = 1/throughput)
2026-02-14 21:13:46	<probie>	There isn't really a data dependency though, since memory is never read again after being written
2026-02-14 21:13:52	<[exa]>	tomsmeding: where do you read that? intel intrinsics guide says 3 per cycle
2026-02-14 21:13:56	<probie>	oh wait, <expletive>
2026-02-14 21:14:04	<tomsmeding>	probie: the compiler doesn't know that
2026-02-14 21:14:06	<probie>	there can be aliasing
2026-02-14 21:14:09	<tomsmeding>	yes
2026-02-14 21:14:20	<[exa]>	probie: memory order too strong QQ
2026-02-14 21:14:29	×	merijn quits (~merijn@host-cl.cgnat-g.v4.dfn.nl) (Ping timeout: 245 seconds)
2026-02-14 21:14:33	<tomsmeding>	[exa]: oh I mistyped, I meant _mm512_add_epi8
2026-02-14 21:14:38	<[exa]>	probie: anyway it might be the case that the compiler just ignores it but I'd bet this is the problem number 1
2026-02-14 21:14:40	<tomsmeding>	the _mm256 and _mm variants indeed have 3
2026-02-14 21:14:52	<tomsmeding>	this is DEFINITELY not ignored by llvm
2026-02-14 21:15:11	<probie>	Even gcc only ignores it if you pass -O3 IIRC
2026-02-14 21:15:13	<tomsmeding>	and I can also assure you that GHC will not tell LLVM that these things do not alias
2026-02-14 21:15:23	<tomsmeding>	ignoring this is a blatant violation of the semantics
2026-02-14 21:15:42	<tomsmeding>	I would be surprised if ghc does this at any optimisation level
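
The aliasing point above can be illustrated with a minimal sketch. probie's pasted code is not reproduced in this log, so `addInto` below is a hypothetical stand-in for an in-place `dst[i] += src[i]` loop over `Word8`, and `addIntoLoadFirst` and `overlapping` are illustrative helpers, not anything from the pastes. It assumes the `vector` package. When the two arguments are overlapping slices of one buffer, doing the loads before the stores (roughly what a vectorised loop would do) gives a different answer, which is why the compiler may not reorder them unless it can prove the vectors do not alias.

```haskell
import Control.Monad (forM_)
import Data.Word (Word8)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MV

-- Hypothetical stand-in for the loop under discussion: dst[i] += src[i],
-- one element at a time, in index order.
addInto :: MV.IOVector Word8 -> MV.IOVector Word8 -> IO ()
addInto dst src =
  forM_ [0 .. min (MV.length dst) (MV.length src) - 1] $ \i -> do
    x <- MV.read dst i
    y <- MV.read src i
    MV.write dst i (x + y)

-- Same operation, but every load happens before any store.
-- This models what a naively vectorised version would compute.
addIntoLoadFirst :: MV.IOVector Word8 -> MV.IOVector Word8 -> IO ()
addIntoLoadFirst dst src = do
  let n = min (MV.length dst) (MV.length src)
  xs <- U.freeze (MV.slice 0 n dst)  -- freeze copies, so these are snapshots
  ys <- U.freeze (MV.slice 0 n src)
  U.copy (MV.slice 0 n dst) (U.zipWith (+) xs ys)

-- Run an adder on two overlapping slices of one 5-element buffer of ones:
-- dst covers indices 1..4, src covers indices 0..3.
overlapping :: (MV.IOVector Word8 -> MV.IOVector Word8 -> IO ()) -> IO [Word8]
overlapping f = do
  buf <- MV.replicate 5 1
  let dst = MV.slice 1 4 buf
      src = MV.slice 0 4 buf
  f dst src
  U.toList <$> U.freeze buf

main :: IO ()
main = do
  overlapping addInto >>= print           -- [1,2,3,4,5]
  overlapping addIntoLoadFirst >>= print  -- [1,2,2,2,2]
```

In the sequential version each store to `dst` is visible to the next iteration's load from `src`, so the result is a running sum. If the caller knows the vectors are distinct, `MV.overlaps dst src` can be checked up front, though that check by itself does not communicate anything to LLVM.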

All times are in UTC.