Logs: liberachat/#haskell
| 2021-08-10 06:56:54 | <edwardk> | it's more that copilot doesn't care enough about such things |
| 2021-08-10 06:57:28 | <edwardk> | and so taking an off the shelf language model leads to a mess. just doesn't have enough haskell under its belt, and you probably want a masked language model to work around hole driven development |
| 2021-08-10 06:57:38 | → | dhouthoo joins (~dhouthoo@178-117-36-167.access.telenet.be) |
| 2021-08-10 06:57:39 | <edwardk> | rather than a token stream model |
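The distinction drawn above, a masked model versus a token stream model for hole-driven development, can be illustrated with a small sketch. It only shows which part of the source each kind of model would get to condition on. It does not call any real model. The `_hole` marker and both function names are hypothetical and exist only for this illustration.

```python
from typing import Tuple

# Hypothetical marker standing in for a typed hole in the source text.
HOLE = "_hole"


def token_stream_context(source: str) -> str:
    """A left-to-right (token stream) model only sees the text before the hole."""
    prefix, _, _ = source.partition(HOLE)
    return prefix


def masked_context(source: str) -> Tuple[str, str]:
    """A masked (infilling) model sees the text on both sides of the hole."""
    prefix, _, suffix = source.partition(HOLE)
    return prefix, suffix


# A Haskell definition with one hole left to fill in.
haskell_snippet = "total :: [Int] -> Int\ntotal xs = foldr _hole 0 xs\n"
```

In this sketch the token stream context stops at `foldr `, so the arguments `0 xs` that follow the hole are invisible to it, while the masked context keeps them available.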
| 2021-08-10 06:57:41 | <Gurkenglas> | You should tell the copilot people, might starstrike them into adding support for isomorphisms and transpilers and such |
| 2021-08-10 06:58:38 | <edwardk> | right now i'm kind of annoyed that they basically didn't think anything about original code licensing, so i'm not terribly incentivized to help them rip off even longer parroted sections of my codebases. |
| 2021-08-10 06:58:50 | <edwardk> | even if my licenses were pretty damn permissive, they were there |
| 2021-08-10 06:59:40 | <Gurkenglas> | eh, some future timelines are bound and try to balance out the expected future compensation |
| 2021-08-10 06:59:53 | <Gurkenglas> | *bound to try and |
| 2021-08-10 07:00:08 | <edwardk> | well, here's hoping one of them acausally trades me a lottery ticket |
| 2021-08-10 07:00:44 | × | dhouthoo quits (~dhouthoo@178-117-36-167.access.telenet.be) (Client Quit) |
| 2021-08-10 07:00:52 | <Gurkenglas> | Do you expect switching from miri to groq to increase global expected utility? |
| 2021-08-10 07:03:48 | <edwardk> | That's a tough question. I still very much believe in MIRI and MIRI's mission. I just don't necessarily think that the best way for them to achieve that mission is to dump money into paying me to live in Berkeley and burn cash supporting my work and compute needs at this time, dollar for dollar. |
| 2021-08-10 07:04:01 | × | takuan quits (~takuan@178-116-218-225.access.telenet.be) (Ping timeout: 248 seconds) |
| 2021-08-10 07:06:01 | <edwardk> | I started working with Groq explicitly to get access to a bunch of compute for MIRI of all things. When MIRI had a bit of a realization that they needed to realign their research program, I stepped down and just decided to switch to working for Groq full time, pushing a bit more of a focus on HPC workloads than ML, almost incidentally. Out of all the players in the TPU-ish space, the Groq chip is the closest thing to something suitable |
| 2021-08-10 07:06:01 | <edwardk> | for "my kind" of workloads rather than traditional ML or HPC. If I don't get directly involved then really none of them will be suitable. If I stay directly involved at least one vendor in this space might be producing a chip I can use. |
| 2021-08-10 07:06:19 | <Gurkenglas> | I sorta hoped that two MIRI-level people talking all day is much more than double lonely output :( |
| 2021-08-10 07:07:25 | <edwardk> | I kinda loved working with Nate, and James, and Max, and company. And when there was a lot more Haskell going on I definitely felt like I was offering a pretty good force multiplier to their existing research. |
| 2021-08-10 07:07:45 | <Gurkenglas> | Are FPGA-type chips not 80/20 on being fit for whatever you'd like to do 5 years from now? |
| 2021-08-10 07:07:57 | <edwardk> | Not even 1/100. |
| 2021-08-10 07:08:46 | <edwardk> | FPGAs are pretty cripplingly limited devices. |
| 2021-08-10 07:08:50 | <Gurkenglas> | Damn. How come? Surely not quantum stuff. What are the elementary operations or underlying properties you need? |
| 2021-08-10 07:09:10 | <siraben> | edwardk: codepilot on haskell is trash |
| 2021-08-10 07:09:10 | <siraben> | Doesn't even typecheck |
| 2021-08-10 07:09:11 | <siraben> | (IME) |
| 2021-08-10 07:09:21 | <edwardk> | not sure if you've looked at https://groq.com/wp-content/uploads/2020/06/ISCA-TSP.pdf |
| 2021-08-10 07:09:29 | <siraben> | we have semantic information like static types... why not use it like HLS does? |
| 2021-08-10 07:09:29 | <siraben> | Ugh. |
| 2021-08-10 07:09:37 | <Gurkenglas> | have not, didn't know you switched from miri to groq until this conversation |
| 2021-08-10 07:10:19 | <Gurkenglas> | siraben, given how little everyone seems to understand mlp architectures it'd have to wrap around that blackbox |
| 2021-08-10 07:10:25 | <edwardk> | i've pushed a couple of different ways to compile functional code for gpu-like architectures over the last few years. one of them rhymes with the SPMD-on-SIMD designs used by the intel SPMD program compiler. the chip is pretty much a SIMD unit on steroids. |
| 2021-08-10 07:10:52 | <siraben> | Gurkenglas: MLP? |
| 2021-08-10 07:11:02 | <Gurkenglas> | siraben, uhhhhh NLP |
| 2021-08-10 07:11:30 | <edwardk> | Gurkenglas: i'm still very much keeping my door open for MIRI. If they need help I'm more than happy to talk. I'm more than willing to fly out and help them run workshops and the like. |
| 2021-08-10 07:11:32 | <siraben> | Ah, heh |
| 2021-08-10 07:11:41 | <edwardk> | Gurkenglas: I just don't really feel like I need to take their money to do those things. |
| 2021-08-10 07:12:06 | <siraben> | Gurkenglas: not to mention the legal issues with codepilot, I would disallow usage of it in any team |
| 2021-08-10 07:12:38 | <Gurkenglas> | siraben, sounds troubling. Is it just because they didn't bother to ask? |
| 2021-08-10 07:12:40 | × | adam1 quits (~adam@2001-b011-4007-2236-a1a1-867b-8ec5-4452.dynamic-ip6.hinet.net) (Ping timeout: 258 seconds) |
| 2021-08-10 07:12:53 | <edwardk> | I don't get paid for the work I do for Topos, or for the Haskell Foundation either, so it's more a sign of a gradual evolution of 'hey i'm working for you for cash' to 'hey i really believe in your cause' from my perspective. |
| 2021-08-10 07:12:56 | <siraben> | What sounds troubling? |
| 2021-08-10 07:13:14 | <edwardk> | and for now, i think the major thing i can do is carry on in this direction, which i can't really do from within the confines of MIRI |
| 2021-08-10 07:13:19 | <siraben> | I think it's troubling that codepilot is capable of spitting out GPL3 licensed code verbatim with no citation to be used against the license. |
| 2021-08-10 07:13:31 | <siraben> | GPLv3+* |
| 2021-08-10 07:14:07 | <Gurkenglas> | siraben, that copilot-type approaches might fizzle out from legal issues. I mean, I guess we don't *really* need to reduce the number of people needed to produce a big ml project without knowing how it works... |
| 2021-08-10 07:14:14 | <edwardk> | siraben: i would be much more sanguine about that if they hadn't gone through all the @*(#) trouble over the last few years of starting to tag each and every repo with exactly what license it was under, so it's pretty damn obvious how to tag the training data. |
| 2021-08-10 07:15:01 | <siraben> | edwardk: and that they didn't train it on their own private codebases is uh, telling. |
| 2021-08-10 07:15:31 | <Gurkenglas> | if it's just verbatim copies you're worried about, that seems like they didn't need to do it on the ml layer, just include attribution with the completions when you do find it in the training set |
| 2021-08-10 07:15:33 | <edwardk> | Gurkenglas: it does admittedly exacerbate that machine learning models seem to be treated entirely as a way to launder human bias so it can all be blamed on an unknowable machine these days |
| 2021-08-10 07:16:24 | <edwardk> | Gurkenglas: thing is if i take your fancy image and run it through a lossy compression scheme it's still your image. this isn't appreciably different than that. |
| 2021-08-10 07:16:31 | <Gurkenglas> | I doubt they're deliberately switching to ml so the bias is laundered. It'd look like this even if everyone just never thought about bias. |
| 2021-08-10 07:16:33 | → | gehmehgeh joins (~user@user/gehmehgeh) |
| 2021-08-10 07:16:54 | <Gurkenglas> | edwardk, the same could be said about be learning haskell by reading your code |
| 2021-08-10 07:16:58 | <edwardk> | if they included the attributions of everything that contributed to a weight in a gigantic 4 petabyte acknowledgements file, i'd be okay with it |
| 2021-08-10 07:17:02 | <Gurkenglas> | s/be/me/ |
| 2021-08-10 07:17:38 | <Gurkenglas> | you want me gravestone to include an attribution to you? :) ill see what i can do |
| 2021-08-10 07:17:49 | <edwardk> | =) |
| 2021-08-10 07:18:16 | <edwardk> | don't get me wrong, intellectual property issues are going to be a huge dumpster fire for the foreseeable future |
| 2021-08-10 07:18:39 | <edwardk> | copilot is an opening salvo in this war, one that takes a pretty extreme position |
| 2021-08-10 07:19:15 | <edwardk> | and frankly as someone who writes code it generally won't be me winning or losing the battles that shape this, it'll be the lawyers |
| 2021-08-10 07:19:27 | <Gurkenglas> | 'twas always going to happen when everyone noticed that plagiarism doesn't mean much when humans are just patternmatchers with some compilation checking thrown in |
| 2021-08-10 07:20:08 | <gehmehgeh> | uhhh, what did I miss? I've just logged in |
| 2021-08-10 07:20:23 | <siraben> | gehmehgeh: codepilot BS |
| 2021-08-10 07:20:43 | <siraben> | I mentioned that it performs terribly on Haskell code |
| 2021-08-10 07:20:59 | <edwardk> | i'm not one for hard lines here. i do tend to believe though that if i copy a bunch of algorithms from someone else's code that i'll include them in the copyright of the files i produce, even if i don't legally have to, e.g. because i transliterated from python to haskell or what have you. you can find that across hundreds of repos of mine. |
| 2021-08-10 07:21:07 | <Gurkenglas> | gehmehgeh, my languages question summoned Big Edward, I derailed into an interview, be careful not to bore him away! |
| 2021-08-10 07:21:26 | <Gurkenglas> | not that i'm planning to write a press article ._. |
| 2021-08-10 07:22:00 | <edwardk> | it's not boredom that will eventually drive me away but the fact that i have a long day at work tomorrow as i'm trying to frontload a bunch of stuff before i disappear for icfp |
| 2021-08-10 07:22:23 | <siraben> | anyone attending PLMW? |
| 2021-08-10 07:22:28 | <gehmehgeh> | hmm |
| 2021-08-10 07:24:07 | <Gurkenglas> | siraben, do you think that if we wrote a performant man in the middle for copilot packets that checks a github bloom filter for plagiarism and adds an attribution, that microsoft would use it? |
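The Bloom filter check floated above could look roughly like the following. This is a minimal sketch under stated assumptions, not anything Copilot or GitHub actually ships: the filter parameters, the use of three-line windows as the unit of comparison, and every name in the block (`BloomFilter`, `shingles`, `flag_possible_copies`) are hypothetical. A Bloom filter can report false positives but never false negatives, and it only answers "possibly seen before", so finding out whom to attribute would need a separate lookup.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Fixed-size Bloom filter; the parameters are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def shingles(code: str, width: int = 3) -> List[str]:
    """Split code into overlapping windows of `width` whitespace-normalized lines."""
    lines = [" ".join(line.split()) for line in code.splitlines() if line.strip()]
    return ["\n".join(lines[i:i + width]) for i in range(len(lines) - width + 1)]


def flag_possible_copies(completion: str, known: BloomFilter,
                         width: int = 3) -> List[str]:
    """Return the windows of a completion that the filter may have seen before."""
    return [s for s in shingles(completion, width) if known.might_contain(s)]
```

A client-side extension along the lines suggested in the channel would build the filter once from indexed code, then call `flag_possible_copies` on each completion and attach an attribution notice to anything flagged.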
| 2021-08-10 07:24:26 | → | acidjnk_new joins (~acidjnk@p200300d0c72b951794521116c489c693.dip0.t-ipconnect.de) |
| 2021-08-10 07:24:52 | → | acidjnk_new3 joins (~acidjnk@p200300d0c72b9517f04572592712ff76.dip0.t-ipconnect.de) |
| 2021-08-10 07:25:08 | <edwardk> | I do wish PLMW was a thing when i went to my first ICFP. when i showed up then I had _no_ idea what a conference was for, or even really any sense of the structure of the whole publishing 'game' of academia. I was shockingly naive. |
| 2021-08-10 07:25:59 | <edwardk> | I mean, I never really did figure it all out. I just started writing down a bunch of code and looked up and it was 10 years later and I had grey hair. |
| 2021-08-10 07:27:32 | → | lavaman joins (~lavaman@98.38.249.169) |
| 2021-08-10 07:27:33 | <dibblego> | ey mate |
| 2021-08-10 07:27:50 | <Gurkenglas> | ...better: do this performantly enough to do it on the copilot users machine, publish an extension, and let microsoft decide whether to bundle it, so that those who wish to can use copilot ethically. |
| 2021-08-10 07:28:07 | → | vysn joins (~vysn@user/vysn) |
| 2021-08-10 07:28:16 | <dibblego> | oh copilot, lawyers v programmers |
| 2021-08-10 07:28:34 | × | jespada quits (~jespada@90.254.247.46) (Ping timeout: 240 seconds) |
| 2021-08-10 07:28:46 | × | acidjnk_new quits (~acidjnk@p200300d0c72b951794521116c489c693.dip0.t-ipconnect.de) (Ping timeout: 258 seconds) |
| 2021-08-10 07:29:20 | <Gurkenglas> | edwardk, do you think one should just ignore the academia game if one can just do math until one gets noticed anyway |
| 2021-08-10 07:29:25 | <edwardk> | Gurkenglas: i'd rather have the training data actually curated by licenses you're willing to accept into the codebase, then i can know i'm contamination free from sources i can't use. i rather deliberately don't read a lot of GPL code for instance. |
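Curating training data by license, as proposed above, amounts to an allow-list filter over repository metadata. The sketch below assumes each repository carries an SPDX-style license identifier, which fits the earlier remark about repos being tagged with their license. The `Repo` type, the `curate` function, and the particular allow-list are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Repo:
    name: str
    license_id: Optional[str]  # SPDX-style identifier, or None if untagged


# Hypothetical allow-list: licenses a given codebase is willing to accept.
ACCEPTED: FrozenSet[str] = frozenset(
    {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}
)


def curate(repos: Iterable[Repo],
           accepted: FrozenSet[str] = ACCEPTED) -> List[Repo]:
    """Keep only repos whose license is on the allow-list; drop untagged ones."""
    return [repo for repo in repos if repo.license_id in accepted]
```

Dropping untagged repositories is the conservative choice here: a model trained only on the curated set could not have seen code under a license the user cannot accept.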
| 2021-08-10 07:29:44 | <edwardk> | Gurkenglas: worked for me. hasn't worked for a lot more people than me. |
| 2021-08-10 07:30:04 | <Gurkenglas> | Whoa. Okay, that does bite the humans-are-plagiarism bullet |
| 2021-08-10 07:30:20 | <Taneb> | I kind of feel like copilot is solving the wrong problem, or solving a right problem in the wrong place maybe. Writing code is (relatively) easy. Writing code that I trust is correct is a lot harder |
| 2021-08-10 07:30:30 | <Gurkenglas> | Do you also not read code that you expected to have ignored licenses? |
| 2021-08-10 07:31:03 | <edwardk> | Gurkenglas: to do legit reverse engineering you often need two teams. one to take the original and write a white paper describing it, and the others to read the white paper and reimplement. almost anything else will get you smashed in court. |
| 2021-08-10 07:31:05 | lortabac_ | is now known as lortabac |
| 2021-08-10 07:31:05 | × | jneira_ quits (~jneira_@28.red-80-28-169.staticip.rima-tde.net) (Quit: Connection closed) |
| 2021-08-10 07:31:06 | → | ham2 joins (~ham4@d8d8627d5.access.telenet.be) |
| 2021-08-10 07:31:18 | <edwardk> | at least if it ever gets tested |
| 2021-08-10 07:31:25 | → | jespada joins (~jespada@90.254.247.46) |
| 2021-08-10 07:31:58 | <edwardk> | lots of "Halt and Catch Fire" era stuff hinged on that kind of cleanliness. the world just got sloppy |
| 2021-08-10 07:32:00 | <Gurkenglas> | ...so what you would want from the mlp people is to turn github into a giant natural language book, and then turn the book into copilot? |
| 2021-08-10 07:32:09 | <Gurkenglas> | *nlp |
| 2021-08-10 07:32:10 | × | ham quits (~ham4@user/ham) (Ping timeout: 268 seconds) |
| 2021-08-10 07:32:40 | × | lavaman quits (~lavaman@98.38.249.169) (Ping timeout: 272 seconds) |
| 2021-08-10 07:32:41 | <edwardk> | it'd be a lot closer to getting through our existing legal system. but i don't make the rules. |
All times are in UTC.