Logs: freenode/#haskell
| 2021-03-29 13:50:34 | <johnnyboy[m]> | and then I turn it into a nice structured format, e.g. JSON, XML, CSV, HTML tables, markdown tables, or LaTeX tables |
| 2021-03-29 13:50:41 | × | heatsink quits (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5) (Ping timeout: 252 seconds) |
| 2021-03-29 13:51:40 | <johnnyboy[m]> | I did use the optparse library for parsing the command line arguments and it works great |
| 2021-03-29 13:52:21 | <tomsmeding> | johnnyboy[m]: can you perhaps share the code that runs that regex? |
| 2021-03-29 13:52:39 | <tomsmeding> | to double-check syntax |
| 2021-03-29 13:52:49 | × | cr3 quits (~cr3@192-222-143-195.qc.cable.ebox.net) (Client Quit) |
| 2021-03-29 13:53:09 | → | ddellaco_ joins (~ddellacos@ool-44c73afa.dyn.optonline.net) |
| 2021-03-29 13:54:08 | → | cr3 joins (~cr3@192-222-143-195.qc.cable.ebox.net) |
| 2021-03-29 13:55:11 | <johnnyboy[m]> | <tomsmeding "johnnyboy: can you perhaps share"> this is the file where my regexes are: https://gitlab.com/jllang/spin2latex/-/blob/master/src/Token.hs |
| 2021-03-29 13:55:21 | <johnnyboy[m]> | I should really rename that project |
| 2021-03-29 13:55:36 | <[exa]> | johnnyboy[m]: you should really use attoparsec |
| 2021-03-29 13:55:39 | <johnnyboy[m]> | it's not really restricted to producing LaTeX tables anymore |
| 2021-03-29 13:56:06 | <tomsmeding> | oh hi john :) |
| 2021-03-29 13:56:21 | <johnnyboy[m]> | hi, tom :) |
| 2021-03-29 13:57:02 | × | graf_blutwurst quits (~user@2001:171b:226e:adc0:3dbe:eebd:8040:b693) (Remote host closed the connection) |
| 2021-03-29 13:57:13 | <johnnyboy[m]> | maybe a parser combinator library would be a good idea in the long run |
| 2021-03-29 13:57:44 | <[exa]> | like, I understand the language may be regular so a "full" context-free grammar parser looks like an overkill |
| 2021-03-29 13:57:57 | <johnnyboy[m]> | anyway, my intention is to simply just discard most of the input text and only pick a handful of interesting fields |
| 2021-03-29 13:58:11 | → | deviantfero joins (~deviantfe@190.150.27.58) |
| 2021-03-29 13:58:41 | <[exa]> | except, running normal attoparsec parsers is usually much less complex than compiling, optimizing and running the regexes |
| 2021-03-29 13:59:26 | <[exa]> | also, it quite often happens that you need to do very ugly regex tricks to capture stuff that's trivial with parsers |
| 2021-03-29 13:59:28 | <tomsmeding> | johnnyboy[m]: are you sure that [[:space:]] doesn't work? this page claims that it's supported (see under "Feature support"): https://hackage.haskell.org/package/regex-tdfa-1.3.1.0/docs/Text-Regex-TDFA.html |
| 2021-03-29 13:59:46 | <johnnyboy[m]> | maybe there's something else wrong then |
| 2021-03-29 13:59:48 | <[exa]> | and finally, you'll be ready for the moment you realize you need to support parentheses. |
| 2021-03-29 14:00:09 | <johnnyboy[m]> | by the way, the version in github has a mistake there |
| 2021-03-29 14:00:23 | <tomsmeding> | [exa]: OP said that the text being parsed has been in the same format for ages, so unlikely to change |
| 2021-03-29 14:00:23 | <johnnyboy[m]> | sorry, no |
| 2021-03-29 14:00:37 | <merijn> | tomsmeding: That's not really relevant, though :p |
| 2021-03-29 14:01:00 | <merijn> | tomsmeding: Because the attoparsec version will be simpler to read/write even if you don't have to update it |
| 2021-03-29 14:01:08 | <tomsmeding> | though I do agree that parser combinators are nicer than regex, especially in Haskell, where parser combinators are so nice |
| 2021-03-29 14:01:17 | <tomsmeding> | merijn: that latter point depends on your familiarity with regex ;) |
| 2021-03-29 14:02:03 | × | nbloomf quits (~nbloomf@2600:1700:ad14:3020:cca4:232:630d:d55c) (Quit: My MacBook has gone to sleep. ZZZzzz…) |
| 2021-03-29 14:02:16 | <[exa]> | tomsmeding: I've heard this a few times. Usually followed by "whew what a nice export, what if we also packed in a $non_regular_feature to make the export more colorful?" |
| 2021-03-29 14:02:35 | <tomsmeding> | export != import? |
| 2021-03-29 14:02:39 | → | ixlun joins (~matthew@109.249.184.133) |
| 2021-03-29 14:03:16 | <[exa]> | (I meant the export that comes from the other part of the program) |
| 2021-03-29 14:03:48 | tomsmeding | is just trying to provide a bit of pushback to "how do I do X simple common thing with technique A? -- Well, please use technique PQR that does 10 other things but is much nicer" :) |
| 2021-03-29 14:03:53 | <tomsmeding> | not trying to be hostile |
| 2021-03-29 14:04:24 | <maerwald> | tomsmeding: parser combinators are more expressive and as such may not be desired :) that is following the principle of using the least powerful tool. |
| 2021-03-29 14:04:45 | <tomsmeding> | [exa]: ah, I see |
| 2021-03-29 14:04:46 | <maerwald> | That argument has also repeatedly been made by the LANGSEC authors |
| 2021-03-29 14:05:35 | <maerwald> | treat input as a language, write a validator and use the least expressive tool |
| 2021-03-29 14:05:52 | <maerwald> | in that sense, they also created a parser combinator library for C |
| 2021-03-29 14:06:03 | → | MarcelineVQ joins (~anja@198.254.208.159) |
| 2021-03-29 14:06:10 | <maerwald> | (arguing that parser combinators are orders of magnitude safer than a hand-written one) |
| 2021-03-29 14:06:14 | <merijn> | maerwald: parser combinators are just recursive descent parsers with a convenient coat of paint |
| 2021-03-29 14:07:00 | → | heatsink joins (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5) |
| 2021-03-29 14:07:57 | <merijn> | Well written recursive descent parsers are just as efficient and minimal as their corresponding LALR(k) version. But most humans find recursive descent much easier to write/think about (and better errors) |
| 2021-03-29 14:08:29 | <tomsmeding> | % import Text.Regex.TDFA |
| 2021-03-29 14:08:30 | <yahb> | tomsmeding: ; <no location info>: error:; Could not find module `Text.Regex.TDFA'; It is not a module in the current program, or in any known package. |
| 2021-03-29 14:08:33 | <tomsmeding> | boo |
| 2021-03-29 14:08:50 | <tomsmeding> | anyway johnnyboy[m]: 'match (makeRegex "\t" :: Regex) "\t" :: Bool' gives True for me |
| 2021-03-29 14:09:05 | <johnnyboy[m]> | I think I'm going to replace tabs with spaces and see if I can then match against `[[:space:]]` |
| 2021-03-29 14:09:08 | <tomsmeding> | just a literal tab character is apparently valid in a regex-tdfa regex |
| 2021-03-29 14:09:19 | <johnnyboy[m]> | just to rule out the possibility that it's the tabs that somehow mess things up |
| 2021-03-29 14:10:10 | <[exa]> | tomsmeding: in the "pushback" direction I'd probably suggest awk :] |
| 2021-03-29 14:11:06 | <johnnyboy[m]> | okay, it's not the tabs |
| 2021-03-29 14:11:14 | × | heatsink quits (~heatsink@2600:1700:bef1:5e10:f0bc:f236:90c7:a6f5) (Ping timeout: 245 seconds) |
| 2021-03-29 14:11:15 | <johnnyboy[m]> | my regexes are just wrong somehow |
| 2021-03-29 14:11:31 | <tomsmeding> | cue the rest here saying you should use parser combinators :p |
| 2021-03-29 14:11:37 | <tomsmeding> | what's your source text and what's your regex |
| 2021-03-29 14:11:52 | → | malumore_ joins (~malumore@151.62.126.223) |
| 2021-03-29 14:12:12 | <johnnyboy[m]> | https://gitlab.com/jllang/spin2latex/-/blob/master/testdata/success1.txt |
| 2021-03-29 14:12:18 | <johnnyboy[m]> | that's a test file I'm using now |
| 2021-03-29 14:12:34 | → | waleee-cl joins (uid373333@gateway/web/irccloud.com/x-mhbpgvvowjjnvcmn) |
| 2021-03-29 14:14:01 | <johnnyboy[m]> | this is my regex: https://privatebin.net/?cea173e3eb0202b4#EJcfvuZf734KwdZUG8CBhoHfiNNj6cPH3E3M8hLs4o8u |
| 2021-03-29 14:14:07 | → | molehillish joins (~molehilli@2600:8800:8d06:1800:6438:fe04:a25d:577) |
| 2021-03-29 14:14:19 | × | ddellaco_ quits (~ddellacos@ool-44c73afa.dyn.optonline.net) (Remote host closed the connection) |
| 2021-03-29 14:14:29 | <johnnyboy[m]> | so I'm looking for a number, followed by "actual memory usage for states" |
| 2021-03-29 14:14:42 | <johnnyboy[m]> | with whitespace (tab) between |
| 2021-03-29 14:14:45 | → | rj joins (~x@gateway/tor-sasl/rj) |
| 2021-03-29 14:15:03 | × | malumore quits (~malumore@151.62.126.223) (Ping timeout: 268 seconds) |
| 2021-03-29 14:15:11 | <tomsmeding> | johnnyboy[m]: are you sure you're skipping the initial whitespace? i.e. aren't you missing a prefix ' *'? |
| 2021-03-29 14:15:28 | <johnnyboy[m]> | `$ cat success1.txt | grep -E "[0-9]+.[0-9]{3}[[:space:]]+actual"` returns `0.292 actual memory usage for states` |
| 2021-03-29 14:16:08 | <tomsmeding> | yes because 'grep' allows matching at any point in a line |
| 2021-03-29 14:16:14 | <johnnyboy[m]> | ah |
| 2021-03-29 14:16:22 | <tomsmeding> | oh TDFA also does; ignore |
| 2021-03-29 14:16:24 | <johnnyboy[m]> | ok, I'll try adding an initial [[:space:]]+ |
| 2021-03-29 14:16:31 | <johnnyboy[m]> | or [[:space:]]* |
| 2021-03-29 14:16:41 | → | ddellaco_ joins (~ddellacos@ool-44c73afa.dyn.optonline.net) |
| 2021-03-29 14:17:10 | <tomsmeding> | it matches that line for me :p |
| 2021-03-29 14:17:18 | <dminuoso> | Regular expressions. How to introduce long lasting bugs by carelessly bolted-on regular expressions. |
| 2021-03-29 14:17:20 | <tomsmeding> | so your problem is outside of the regex I think |
| 2021-03-29 14:17:51 | <johnnyboy[m]> | but I have this other regex for picking the error count |
| 2021-03-29 14:18:11 | <johnnyboy[m]> | it works even if the line containing "errors: xxx" does start with something else |
| 2021-03-29 14:18:27 | <tomsmeding> | regexen are like excel: computer scientists are embarrassed to admit their effectiveness |
| 2021-03-29 14:18:33 | → | slack1256 joins (~slack1256@dvc-186-186-101-190.movil.vtr.net) |
| 2021-03-29 14:18:36 | × | molehillish quits (~molehilli@2600:8800:8d06:1800:6438:fe04:a25d:577) (Ping timeout: 258 seconds) |
| 2021-03-29 14:18:46 | × | Iceland_jack quits (~user@95.149.219.0) (Ping timeout: 268 seconds) |
| 2021-03-29 14:18:46 | <tdammers> | what about regexcel? |
| 2021-03-29 14:18:48 | <tomsmeding> | johnnyboy[m]: indeed, for me your regex matches that line |
| 2021-03-29 14:19:01 | <tomsmeding> | so I'm thinking the problem is not with the regex, but with the code that runs the regex |
| 2021-03-29 14:19:16 | <tomsmeding> | tdammers: excel has gotten lambdas recently, surely it can also have regex |
| 2021-03-29 14:19:17 | → | Jd007 joins (~Jd007@162.156.11.151) |
| 2021-03-29 14:19:32 | <tomsmeding> | oh it already does |
| 2021-03-29 14:20:42 | <tdammers> | I bet it includes an email system too |
| 2021-03-29 14:20:50 | ski | . o O ( <https://www.microsoft.com/en-us/research/blog/lambda-the-ultimatae-excel-worksheet-function/>,<https://www.microsoft.com/en-us/research/publication/a-user-centred-approach-to-functions-in-excel/> ) |
| 2021-03-29 14:21:04 | → | dcbdan joins (~dcbdan@c-73-76-129-120.hsd1.tx.comcast.net) |
| 2021-03-29 14:21:09 | <johnnyboy[m]> | <tdammers "I bet it includes an email syste"> I thought that was emacs |
| 2021-03-29 14:21:53 | <ski> | Emacs includes an editor |
All times are in UTC.
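
The parser-combinator approach suggested in the channel could look roughly like the sketch below for the line johnnyboy[m] is trying to match. This is only an illustration under assumptions: it uses `Text.ParserCombinators.ReadP` from `base` instead of attoparsec (which is what was actually recommended) so that it runs without extra packages; the names `memoryLine`, `parseMemory` and `memoryFields` are hypothetical and do not come from the spin2latex code; and the sample input assumes leading spaces and a tab before the text, as described in the chat.

```haskell
import Data.Char (isDigit, isSpace)
import Data.Maybe (listToMaybe, mapMaybe)
import Text.ParserCombinators.ReadP

-- | Parses e.g. "    0.292\tactual memory usage for states" and yields "0.292".
--   Hypothetical helper; mirrors the grep pattern quoted in the log.
memoryLine :: ReadP String
memoryLine = do
  skipSpaces                          -- optional leading whitespace
  whole <- munch1 isDigit             -- [0-9]+
  _     <- char '.'                   -- a literal dot, not "any character"
  frac  <- count 3 (satisfy isDigit)  -- [0-9]{3}
  _     <- munch1 isSpace             -- [[:space:]]+, which covers tabs
  _     <- string "actual memory usage for states"
  pure (whole ++ "." ++ frac)

-- | Runs the parser on one line; any text after the match is ignored.
parseMemory :: String -> Maybe String
parseMemory = fmap fst . listToMaybe . readP_to_S memoryLine

-- | Keeps only the matching lines and discards the rest of the input.
memoryFields :: String -> [String]
memoryFields = mapMaybe parseMemory . lines

main :: IO ()
main = print (memoryFields "errors: 0\n    0.292\tactual memory usage for states\n")
```

Run with `runghc`, this prints `["0.292"]`. Unlike the regex, which may match anywhere in a line, this parser starts at the beginning of each line and only skips leading whitespace, so a line with other text before the number would be rejected.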