Logs: freenode/#haskell

2021-04-12 00:18:53 ulfryk joins (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6)
2021-04-12 00:21:33 × myShoggoth quits (~myShoggot@75.164.73.93) (Ping timeout: 240 seconds)
2021-04-12 00:22:59 wroathe joins (~wroathe@c-68-54-25-135.hsd1.mn.comcast.net)
2021-04-12 00:23:12 × Sgeo quits (~Sgeo@ool-18b98aa4.dyn.optonline.net) (Ping timeout: 240 seconds)
2021-04-12 00:23:27 × ulfryk quits (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6) (Ping timeout: 260 seconds)
2021-04-12 00:23:36 Sgeo joins (~Sgeo@ool-18b98aa4.dyn.optonline.net)
2021-04-12 00:24:12 × star_cloud quits (~star_clou@ec2-34-220-44-120.us-west-2.compute.amazonaws.com) (Ping timeout: 240 seconds)
2021-04-12 00:25:01 zyeri joins (zyeri@gateway/shell/tilde.team/x-worsvflxuunnsvnw)
2021-04-12 00:25:01 × zyeri quits (zyeri@gateway/shell/tilde.team/x-worsvflxuunnsvnw) (Changing host)
2021-04-12 00:25:01 zyeri joins (zyeri@tilde.team/users/zyeri)
2021-04-12 00:26:21 justanotheruser joins (~justanoth@unaffiliated/justanotheruser)
2021-04-12 00:26:38 × quinn quits (~quinn@c-73-223-224-163.hsd1.ca.comcast.net) (Ping timeout: 240 seconds)
2021-04-12 00:27:28 × wroathe quits (~wroathe@c-68-54-25-135.hsd1.mn.comcast.net) (Ping timeout: 252 seconds)
2021-04-12 00:28:44 <koz_> d34df00d: What's your question(s)?
2021-04-12 00:28:45 × Tario quits (~Tario@200.119.187.163) (Read error: Connection reset by peer)
2021-04-12 00:31:36 quinn joins (~quinn@c-73-223-224-163.hsd1.ca.comcast.net)
2021-04-12 00:34:18 Tario joins (~Tario@201.192.165.173)
2021-04-12 00:38:36 × tmciver quits (~tmciver@cpe-172-101-40-226.maine.res.rr.com) (Ping timeout: 260 seconds)
2021-04-12 00:40:20 tmciver joins (~tmciver@cpe-172-101-40-226.maine.res.rr.com)
2021-04-12 00:40:36 cloudpip joins (sid67735@gateway/web/irccloud.com/x-lqqwgjfhbduhzygo)
2021-04-12 00:41:09 × acidjnk_new quits (~acidjnk@p200300d0c72b950365222184c91f1222.dip0.t-ipconnect.de) (Ping timeout: 250 seconds)
2021-04-12 00:41:56 <cloudpip> hi all, I'm trying to build a recompile-and-run loop with ghc: it compiles your code and runs the main function in a loop, so that, for example, in an interactive program you can close the program and it'll recompile the sources that changed and restart main
2021-04-12 00:42:07 × abhixec quits (~abhixec@c-67-169-139-16.hsd1.ca.comcast.net) (Remote host closed the connection)
2021-04-12 00:42:17 <cloudpip> https://github.com/homectl/workspace/blob/main/livecoding/src/Debug/LiveCoding.hs <- it works, with 2 caveats I'd like to resolve
2021-04-12 00:44:00 <cloudpip> 1) most importantly, I want it to compile to object code, like ghci -fobject-code. this clearly is possible (since ghci -fobject-code works), but when I set it to HscAsm, it no longer reloads the modules even though it does recompile them
2021-04-12 00:44:14 <cloudpip> HscAsm does work, but reloading the modules does not
2021-04-12 00:45:55 <cloudpip> 2) adding -hide-all-packages (via Opt_HideAllPackages) makes it crash: https://www.irccloud.com/pastebin/SL8YmJSc/
2021-04-12 00:46:36 × vicfred quits (~vicfred@unaffiliated/vicfred) (Quit: Leaving)
2021-04-12 00:49:38 jamm_ joins (~jamm@unaffiliated/jamm)
2021-04-12 00:53:52 × jamm_ quits (~jamm@unaffiliated/jamm) (Ping timeout: 258 seconds)
2021-04-12 00:56:02 × ViCi quits (daniel@10PLM.ro) (Quit: Quit!)
2021-04-12 00:56:40 abhixec joins (~abhixec@c-67-169-139-16.hsd1.ca.comcast.net)
2021-04-12 00:57:40 <wrunt> cloudpip: maybe you can find a clue in the implementation of Dyre, since it does run-time compilation? (https://github.com/willdonnelly/dyre)
2021-04-12 00:58:17 <cloudpip> I'm staring at GHCi.UI and I don't see what I'm doing differently
2021-04-12 00:58:24 <cloudpip> it looks exactly the same to me
2021-04-12 00:59:23 <cloudpip> https://github.com/willdonnelly/dyre/blob/master/Config/Dyre/Compile.hs#L73
2021-04-12 00:59:27 <cloudpip> dyre seems to just call ghc?
2021-04-12 01:00:10 <cloudpip> I specifically don't want to call ghc, because I don't want to wait 1 minute for ld to do the executable linking
2021-04-12 01:00:22 <cloudpip> ghci in-memory linking is super fast, so I want that
2021-04-12 01:01:01 <cloudpip> I looked at hint, it only does HscInterpreted
2021-04-12 01:03:45 <cloudpip> now I'm looking at "plugins", which does in fact do loading of .o files, but it's more low level than what I need.. still, it might be useful (though it doesn't work on windows)
2021-04-12 01:05:44 ulfryk joins (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6)
2021-04-12 01:05:47 vicfred joins (~vicfred@unaffiliated/vicfred)
2021-04-12 01:05:58 GZJ0X_ joins (~gzj@unaffiliated/gzj)
2021-04-12 01:07:18 × ulfryk quits (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6) (Remote host closed the connection)
2021-04-12 01:08:29 × Tuplanolla quits (~Tuplanoll@91-159-68-239.elisa-laajakaista.fi) (Quit: Leaving.)
2021-04-12 01:09:49 × gzj quits (~gzj@unaffiliated/gzj) (Ping timeout: 252 seconds)
2021-04-12 01:10:31 DTZUZU_ joins (~DTZUZO@207.81.119.43)
2021-04-12 01:11:04 × whataday quits (~xxx@2400:8902::f03c:92ff:fe60:98d8) (Remote host closed the connection)
2021-04-12 01:11:44 <d34df00d> koz_: well, I have this code for doing IDCT (the bottom-most function, idctBlocks, is doing that, plus collecting all of the results to ensure things are fully evaluated, but that's perhaps irrelevant):
2021-04-12 01:11:50 <d34df00d> https://bpaste.net/PXTA
2021-04-12 01:11:56 <d34df00d> It's also built with -fllvm -O2
2021-04-12 01:12:11 whataday joins (~xxx@2400:8902::f03c:92ff:fe60:98d8)
2021-04-12 01:12:12 × DTZUZU quits (~DTZUZO@205.ip-149-56-132.net) (Ping timeout: 240 seconds)
2021-04-12 01:12:57 <d34df00d> And it is ridiculously slow. It takes about 2 seconds of CPU time on some test data I have (which has about 1 million 8×8 matrices over which IDCT happens, so about 64 million elements in the vector that the function takes).
2021-04-12 01:13:04 ulfryk joins (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6)
2021-04-12 01:13:29 <d34df00d> My rough estimate of the time required for this is from 250 milliseconds (for dumb, scalar code) to about 30 ms if SIMD is involved.
2021-04-12 01:14:03 <d34df00d> So I wonder what I'm doing wrong and how I can make this faster.
2021-04-12 01:15:05 <d34df00d> Ah, and the performance of the code is insensitive to whether I'm going row-wise or column-wise — replacing arrSlice = R.unsafeSlice arr (sh :. x :. All) with arrSlice = R.unsafeSlice arr (sh :. All :. x) there (and similarly for idctSlice) has no effect on performance whatsoever.
2021-04-12 01:15:14 <d34df00d> Which definitely should not happen for well-optimized code.
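[Editor's note] For readers without access to the paste, the following is a minimal sketch of what a scalar 8×8 IDCT computes. It is not d34df00d's repa code: it uses plain lists, assumes the usual JPEG-style normalisation, and the names `idct8` and `idctBlock` are hypothetical. It is meant only to show the separable row pass and column pass that the row-wise versus column-wise remark refers to, not to be fast.

```haskell
module Main where

import Data.List (transpose)

-- | Naive 8-point inverse DCT, assuming JPEG-style normalisation:
--   x_n = 1/2 * sum_u C(u) * X_u * cos((2n + 1) * u * pi / 16),
--   with C(0) = 1/sqrt 2 and C(u) = 1 otherwise.
idct8 :: [Double] -> [Double]
idct8 coeffs =
  [ 0.5 * sum (zipWith (term x) [0 ..] coeffs) | x <- [0 .. 7 :: Int] ]
  where
    term :: Int -> Int -> Double -> Double
    term x u c =
      scale u * c * cos ((2 * fromIntegral x + 1) * fromIntegral u * pi / 16)

    scale :: Int -> Double
    scale 0 = 1 / sqrt 2
    scale _ = 1

-- | Separable 2-D IDCT on one 8x8 block: transform every row,
--   then transform every column (via transpose).
idctBlock :: [[Double]] -> [[Double]]
idctBlock = transpose . map idct8 . transpose . map idct8

main :: IO ()
main = do
  -- A block holding only a DC coefficient should decode to a
  -- (numerically) constant block.
  let block = (8 : replicate 7 0) : replicate 7 (replicate 8 0)
  mapM_ print (idctBlock block)
```

With a DC coefficient of 8 and all other coefficients zero, every output sample should come out at approximately 1. A list-based version like this allocates heavily, so it says nothing about the timings discussed in the channel.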
2021-04-12 01:17:38 × ulfryk quits (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6) (Ping timeout: 258 seconds)
2021-04-12 01:17:58 × quinn quits (~quinn@c-73-223-224-163.hsd1.ca.comcast.net) (Ping timeout: 240 seconds)
2021-04-12 01:18:38 ulfryk joins (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6)
2021-04-12 01:19:31 quinn joins (~quinn@c-73-223-224-163.hsd1.ca.comcast.net)
2021-04-12 01:19:49 wroathe joins (~wroathe@c-68-54-25-135.hsd1.mn.comcast.net)
2021-04-12 01:21:14 DTZUZU joins (~DTZUZO@205.ip-149-56-132.net)
2021-04-12 01:21:22 × quinn quits (~quinn@c-73-223-224-163.hsd1.ca.comcast.net) (Client Quit)
2021-04-12 01:22:43 × xff0x quits (~xff0x@2001:1a81:5278:bf00:33a0:2c0f:72ed:caee) (Ping timeout: 260 seconds)
2021-04-12 01:23:11 × ulfryk quits (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6) (Ping timeout: 260 seconds)
2021-04-12 01:23:18 <koz_> What difference(s) do you observe without -fllvm?
2021-04-12 01:23:26 × DTZUZU_ quits (~DTZUZO@207.81.119.43) (Ping timeout: 240 seconds)
2021-04-12 01:24:14 ulfryk joins (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6)
2021-04-12 01:24:18 xff0x joins (~xff0x@2001:1a81:52af:1400:3ce5:1261:85cb:8b42)
2021-04-12 01:27:14 <d34df00d> Oh, much slower.
2021-04-12 01:27:17 <d34df00d> Still waiting…
2021-04-12 01:27:33 <koz_> OK, and I guess the same _with_ -fllvm but with -O1?
2021-04-12 01:27:40 <koz_> I'm trying to rule out weird regressions.
2021-04-12 01:27:50 <d34df00d> Alrighty, done waiting. 45 seconds with -fasm for that module vs -fllvm.
2021-04-12 01:28:09 <d34df00d> vs 2 for -fllvm, that is.
2021-04-12 01:28:12 <d34df00d> Let me try -O1 now.
2021-04-12 01:28:26 <d34df00d> Yeah, -O1 with -fllvm is slower, but not that much — 3.3 seconds vs 2 seconds.
2021-04-12 01:28:36 <koz_> OK so not a weird regression.
2021-04-12 01:28:40 koz_ thinks a bit.
2021-04-12 01:28:49 × ulfryk quits (~ulfryk@2a01:4b00:872d:e600:a55a:b8e3:54cc:d8d6) (Ping timeout: 250 seconds)
2021-04-12 01:29:11 quinn joins (~quinn@c-73-223-224-163.hsd1.ca.comcast.net)
2021-04-12 01:29:19 × quinn quits (~quinn@c-73-223-224-163.hsd1.ca.comcast.net) (Client Quit)
2021-04-12 01:29:32 <d34df00d> I mean, as a next step I could either try accelerate instead of repa, or try to write that stuff myself with primops (too bad ghc primops don't have horizontal add, meh!), or dunno.
2021-04-12 01:29:35 <koz_> Yeah, might be better to see if anyone else knows, cause I'm a bit mystified.
2021-04-12 01:29:47 <d34df00d> But I have a gut feeling that repa can do better than that, and I'd like to know how.
2021-04-12 01:30:15 <d34df00d> I mean, it cannot be one or two orders of magnitude slower than something that's somewhat easily achievable.
2021-04-12 01:30:35 <koz_> Repa is meant to emit CPU code right?
2021-04-12 01:30:38 <koz_> Have you tried massiv?
2021-04-12 01:30:39 <d34df00d> Yep.
2021-04-12 01:30:41 <d34df00d> Nope.
2021-04-12 01:30:45 <koz_> I'd be curious if massiv could do better.
2021-04-12 01:31:16 <koz_> For Cabal files, if I wanna detect being on a Mac, I test 'os(macos)', right?
2021-04-12 01:31:56 <d34df00d> Yeah, I'll definitely give it a shot! I never really used massiv, so I'll be curious to see how it performs.
2021-04-12 01:32:08 merijn joins (~merijn@83-160-49-249.ip.xs4all.nl)
2021-04-12 01:32:13 <d34df00d> In another task where I used repa, it was very close to what I would expect performance-wise.
2021-04-12 01:32:51 <koz_> Yeah, but there is definitely some Repa-specific specialist knowledge required to make sense of what will run well or not.

All times are in UTC.