13:52:44 <donri> stepcut: new bytestring with built in builder released
13:53:07 <stepcut> go on...
13:53:28 <donri> might be useful for the new server, instead of blaze-builder
13:53:55 <stepcut> cool
13:54:00 <Lemmih> acid-state can finally stop being super slow.
13:54:13 <donri> oh? wait, it's slow?
13:54:42 <Lemmih> Well, it can only do, like, tens of thousands of transactions per second.
13:54:49 <Lemmih> I want to do millions.
13:54:56 <donri> gazillions!
13:55:01 <stepcut> !
13:55:16 <donri> was old bytestring holding it back for some reason?
13:55:28 <Lemmih> Yeah.
13:55:40 <Lemmih> The transaction log looks a bit like this:
13:55:49 <Lemmih> [length]
13:55:51 <Lemmih> [hash]
13:55:54 <Lemmih> [content]
13:56:38 <Lemmih> So I use cereal to serialize the content, then I hash it and use cereal again to layout the frame.
13:57:42 <Lemmih> Basically, I have a fair number of nested calls to the builder.
13:58:01 <Lemmih> And, by default, the builder allocates a 32k buffer.
13:58:48 <Lemmih> So when I want to save 'This Event Happened' (less than 100 bytes) in the transaction log, I have to allocate 32k.
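A minimal sketch of the [length][hash][content] frame described above, written against the Builder that ships with the new bytestring instead of cereal. The field widths, the name `encodeFrame`, and the checksum are assumptions made for illustration; in particular `toyHash` is a stand-in and not the hash acid-state uses. The allocation strategy asks for a 128-byte first buffer, so a small event does not pay for a 32k one.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Bits (shiftL, xor)
import Data.ByteString.Builder (Builder, byteString, word16BE, word64BE)
import Data.ByteString.Builder.Extra (safeStrategy, smallChunkSize, toLazyByteStringWith)
import Data.Word (Word16)

-- Toy checksum, used only to fill the [hash] slot (assumption, not acid-state's hash).
toyHash :: B.ByteString -> Word16
toyHash = B.foldl' (\acc w -> (acc `shiftL` 1) `xor` fromIntegral w) 0

-- One frame: 8-byte big-endian length, 2-byte checksum, then the payload.
frame :: B.ByteString -> Builder
frame content =
     word64BE (fromIntegral (B.length content))
  <> word16BE (toyHash content)
  <> byteString content

-- Run the builder with a small first buffer (128 bytes) instead of the default.
encodeFrame :: B.ByteString -> L.ByteString
encodeFrame = toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty . frame
```

Because `frame` returns a `Builder`, nested frames compose with `<>` and the buffer is only allocated once, when the outermost builder is run.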
13:59:20 <donri> is this what flushing is all about?
13:59:35 <Lemmih> No, it doesn't have anything to do with flushing.
13:59:57 <Lemmih> I have to serialize a lot of small pieces of information and cereal/binary sucks in that case.
14:00:15 <donri> aha
14:01:40 <donri> any reason blaze-builder wouldn't have helped?
14:01:42 <Lemmih> Serializing 10000 events per second would allocate 320MB/s for no reason.
14:03:41 <Lemmih> You'd have to rewrite all the serialization instances.
14:04:45 <Lemmih> We need either cereal or binary to use the improved builders internally.
14:04:52 <donri> so why does new bytestring help? the builder is a new api, mostly a port from blaze, you'd still need to rewrite things to actually use it?
14:04:56 <donri> aha
14:06:43 <Lemmih> We're nearly there.
14:07:02 <donri> cool cool cool
14:11:52 <donri> random idea: I want an "update" that takes a list of Update methods, schedules them and returns () when all are durable. does that make sense?
14:12:30 <donri> I currently find myself grouping updates in a single method, for importing datasets, because it's easier than scheduleUpdate and manually waiting for the mvar
14:19:04 <Lemmih> Sounds good. Send code.
14:19:16 <Lemmih> scheduleUpdate+mvar should do it.
14:22:37 <donri> Lemmih: is it enough to wait for the last mvar?
14:23:21 <donri> seeing as "order is honored"
14:28:03 <Palmik> Hi guys, I have written a question about how you would like the autoincrement key in ixset/higgset to behave, it's too long for IRC, so here it is http://codepad.org/A1eCCQB8 it's the two first paragraphs.
14:48:18 <stepkut> ooo
14:51:04 <stepkut> ok, so one issue is that there are three related things which we need to be clear about: auto-increment, primary key, and unique key
14:52:15 <stepkut> if we introduce the concept of a primary key.. then that implies we need to reject updates that would result in the primary key not being unique anymore?
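One way to picture the "reject updates" behaviour raised here, as a sketch over a plain `Data.Map`. The name `insertUnique` and the choice to report the clashing key with `Left` are assumptions for illustration, not part of IxSet or HiggsSet.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helper: insert a value under its primary key,
-- or report the key that is already taken.
insertUnique :: Ord k => (a -> k) -> a -> Map.Map k a -> Either k (Map.Map k a)
insertUnique keyOf v m
  | k `Map.member` m = Left k
  | otherwise        = Right (Map.insert k v m)
  where
    k = keyOf v
```

An update that changes a value's key would need the same check against the new key before the old entry is removed.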
14:53:07 <stepkut> the tough part of a kd-tree is making rebalancing efficient, the kdtree code we have so far makes no attempt to do it smartly
14:53:19 <stepkut> that was the next step I believe
14:53:58 <stepkut> as I recall, the HiggsSet interface is a lot nicer than IxSet. Though, we could port that interface to ixset too
14:54:18 <stepkut> so, we should be careful to separate the interface from the implementation when discussing what we want
14:55:14 <stepkut> the current IxSet API is currently far less type-safe than it should be
14:56:19 <stepkut> it would be nice to see a list of specific design objectives for an upgrade/replacement for IxSet
14:57:05 <stepkut> for example, kdtree aims to make it efficient to chain a sequence of queries together or do a query on multiple keys. But that makes inserts more expensive.
14:57:26 <stepkut> and updates
14:57:27 <donri> Lemmih: sent to your gmail
14:59:29 <donri> Palmik: note that kd-tree and kdmap aren't the same thing, i.e. kdmap uses a kd-tree to implement an ixset-like structure... (i think i used the wrong name when mentioning them first to you)
14:59:52 <donri> stepkut should have the darcs repo for kdmap
14:59:56 <Palmik> stepkut, well, what I also like about HiggsSet over IxSet is that it does not require Typeable -- that is basically a result of the interface, so porting the interface to IxSet without changing the representation would not make much sense I think.
15:00:07 <Palmik> donri, yes, I found the repo.
15:00:32 <donri> ah, and you have the right one? i can't find it on google, only some much less finished something
15:00:47 <stepkut> Palmik: ok. I have no real love for IxSet. It is pretty sucky in many ways
15:00:54 <Palmik> I only found the 2 dimensional one, so I might not have the right one actually. :)
15:01:27 <donri> i.e. this is the wrong one AFAIK http://src.seereason.com/haskell-kdmap/
15:01:53 <Palmik> Yes, I have the wrong one. stepkut, do you have a link to the "right" one, please? :)
15:01:55 <stepkut> yeah, that is the 'wrong' one
15:02:49 <donri> hm is it this one? http://darcs.monoid.at/kdtree/
15:03:32 <stepkut> yup
15:03:41 <Palmik> I have that one as well. :)
15:03:44 <donri> ah so it is kdtree and not kdmap after all
15:03:46 <stepkut> though, there now seem to be several kdtree implementations
15:04:02 <Palmik> there is one on Hackage as well
15:04:06 <stepkut> yeah
15:04:43 <stepkut> so, what is the state of HiggsSet ?
15:06:12 <Palmik> Hmm, what do you have in mind?
15:06:37 <stepkut> what needs to be done on HiggsSet before we can start using it instead of IxSet?
15:07:04 <Palmik> Well, documentation, test coverage, performance testing, this is the bare minimum I would say.
15:07:05 <donri> when i last looked at higgs, it had all these weird things in the (public) API that felt like hacks, like custom-written Enum/Ord instances, lots of "undefined" and stuff. i'm curious if we can do without that completely, and if not at least some TH to generate it would improve things.
15:07:57 <donri> https://github.com/lpeterse/HiggsSet/blob/master/src/Data/HiggsSet.hs#L147
15:08:10 <donri> feels like abuse of Enum
15:08:58 <Palmik> The problem is that we probably do not want (and need) Enum, we only need "constructor -> int", not both ways.
15:09:31 <donri> aha
15:10:10 <donri> (and TH or generics so you don't have to write that manually for every type)
15:10:58 <Palmik> At least I think so. fromEnum is used right now, but I hope that we can do without it. So yes, the closest goal should probably be to get rid of this (or see if it's actually possible)
15:11:06 <donri> and cereal/safecopy instances for higgsset so you don't have to write orphans
15:11:51 <donri> i've also been told that kd-trees have better performance *potential* than either ixset or higgsset (but that balancing is tricky)
15:13:03 <donri> @seen lpeterse
15:13:03 <lambdabot> Unknown command, try @list
15:15:28 <stepkut> a big issue with IxSet is that it sucks if you are trying to key on multiple things like, foo @= thing1 @= thing2
15:15:29 <Palmik> Hmm, I'm not sure about it from an asymptotic point of view -- if I can believe the wiki -- the only thing that strikes me as odd is that insertion complexity does not depend on the number of dimensions (higgsset should have: insert O(k log n), fromList O(kn log n), query single key O(log n))
15:15:37 <stepkut> trying to s/key/query/
15:16:13 <stepkut> does HiggsSet improve that at all?
15:16:50 <donri> stepkut: presumably yes IIRC
15:16:53 <stepkut> the nice thing about kdtree is that you start with a big tree, and then each query can work by just cutting branches off the tree
15:17:03 <Palmik> Well, you build "selections" first with higgsset and then run the selection.
15:17:07 <stepkut> so if you do a bunch of queries it should be pretty efficient
15:17:20 <stepkut> in ixset we rebuild the Data.Map after each query
15:17:37 <Palmik> Selection is this: http://hpaste.org/new
15:18:05 <stepkut> :)
15:18:15 <Palmik> Oh
15:18:16 <stepkut> a do-it-yourself approach apparently
15:18:30 <Palmik> http://hpaste.org/75014
15:18:33 <Palmik> :D
15:18:52 <stepkut> ah
15:19:03 <stepkut> so you can build up a complex query and then run the whole query at once?
15:19:33 <Palmik> Yes
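A rough sketch of the "build the query first, run it once" idea. This is not HiggsSet's actual `Selection` type (that is in the paste linked above); the constructors, the `Index` alias, and `run` are all hypothetical, and values are represented by `Int` ids to keep it short.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical index: each key maps to the ids of the values stored under it.
type Index k = Map.Map k (Set.Set Int)

-- A query is plain data, so it can be combined freely before anything is evaluated.
data Selection k
  = Equals k
  | Between k k                      -- inclusive range
  | And (Selection k) (Selection k)
  | Or (Selection k) (Selection k)

-- Evaluate the whole selection in one pass over the index.
run :: Ord k => Index k -> Selection k -> Set.Set Int
run ix (Equals k)      = Map.findWithDefault Set.empty k ix
run ix (Between lo hi) =
  Set.unions (Map.elems (Map.takeWhileAntitone (<= hi) (Map.dropWhileAntitone (< lo) ix)))
run ix (And p q)       = Set.intersection (run ix p) (run ix q)
run ix (Or p q)        = Set.union (run ix p) (run ix q)
```

Since no intermediate collection is rebuilt between the combined sub-queries, this avoids the per-query rebuild described for ixset.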
15:21:15 <Palmik> Functions like delete, update, etc. are all based on the selection. It probably will not feel as nice as using ixset does.
15:21:32 <stepkut> not sure that IxSet feels nice..
15:21:50 <Palmik> But that could be mostly fixed with infic operators.
15:21:53 <Palmik> x
15:22:02 <Palmik> Well, it's rather good I think.
15:22:29 <Palmik> Chaining @* feels quite natural... or do you have something better in mind?
15:23:00 <Palmik> Maybe persistent-like api. I do not know.
15:23:34 <stepkut> I don't have anything particular in mind aside from a list of annoyances with IxSet
15:24:00 <Palmik> If you have it written down, i would be really interested in it.
15:24:23 <stepkut> let's see..
15:24:44 <stepkut> 1. you can use @* and @= to query on keys that don't actually exist and you get a runtime error instead of a compile time error
15:25:20 <stepkut> 2. if the first key in your list of keys does not exist for certain values, those values will get silently dropped from the IxSet (entirely a bug, not a feature)
15:26:10 <stepkut> 3. when you want to query more than one key, it rebuilds the internal Data.Maps between each query
15:26:26 <Palmik> Yes, i wanted to ask you about that one. I was not really sure how 'change' works.
15:26:30 <Palmik> (about 2.)
15:26:32 <stepkut> 4. having to manually create, increment, and manage unique identifiers outside of IxSet is tedious and error prone
15:27:27 <Lemmih> donri: Your patch keeps the entire 'events' list in memory until everything has been scheduled.
15:27:27 <stepkut> regarding 2, there are places in the code where it takes the first Data.Map from the list of keys and converts that to a Set and does operations on that then converts the results back to an IxSet. But if your value didn't happen to be in the first Data.Map it will be lost
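A toy reproduction of the failure mode in point 2, assuming a two-index structure where the first index only covers values that have that key. None of this is IxSet's code; `Person`, `ToyIx`, `build`, and `toSetViaFirst` are made up for the illustration.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

data Person = Person { name :: String, email :: Maybe String }
  deriving (Eq, Ord, Show)

-- First index: by email (only people who have one). Second index: by name.
type ToyIx = (Map.Map String (Set.Set Person), Map.Map String (Set.Set Person))

build :: [Person] -> ToyIx
build ps =
  ( Map.fromListWith Set.union [ (e, Set.singleton p) | p <- ps, Just e <- [email p] ]
  , Map.fromListWith Set.union [ (name p, Set.singleton p) | p <- ps ]
  )

-- Lossy: recovers the contents from the first index only,
-- so anyone without an email silently disappears.
toSetViaFirst :: ToyIx -> Set.Set Person
toSetViaFirst (byEmail, _) = Set.unions (Map.elems byEmail)
```

Any operation that round-trips through `toSetViaFirst` and rebuilds the indexes will drop the values that were missing from the first index.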
15:28:15 <donri> Lemmih: aha! how would you fix that?
15:28:45 <stepkut> Palmik: multikey range searches are also sucky.. like if you want to find all the locations between some longitudes and latitudes.. though that is a bit specialized
15:29:13 <Lemmih> donri: Manual recursion.
15:29:54 <stepkut> Palmik: also.. IxSet may or may not use too much RAM. People think it does, but no one has ever been able to prove it
15:30:00 <Lemmih> go [] = return (); go [x] = void $ update ...; go (x:xs) = schedule x >> go xs
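The manual recursion above, spelled out as a sketch that does not depend on acid-state. The scheduling action is passed in as a parameter, which is an assumption made so the example is self-contained: it stands in for acid-state's `scheduleUpdate` applied to a state handle. `groupUpdatesWith` and `demo` are hypothetical names. Only the last MVar is waited on, relying on the "order is honored" guarantee mentioned earlier.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (void)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Schedule every event, but block only on the last one.
-- The list is consumed one element at a time, so it need not be held in memory.
groupUpdatesWith :: (e -> IO (MVar r)) -> [e] -> IO ()
groupUpdatesWith schedule = go
  where
    go []       = return ()
    go [x]      = void (schedule x >>= takeMVar)
    go (x : xs) = schedule x >> go xs

-- Toy scheduler for demonstration: records the event and completes immediately.
demo :: IO [Int]
demo = do
  logRef <- newIORef []
  let schedule e = do
        modifyIORef logRef (e :)
        done <- newEmptyMVar
        putMVar done ()
        return done
  groupUpdatesWith schedule [1, 2, 3]
  reverse <$> readIORef logRef
```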
15:30:09 <Palmik> Oh, I think they call that "window query" in the paper, supposedly a kd-tree is a good fit for these. Or do you mean the interface, rather than the implementation (that was mentioned in 3.)?
15:30:30 <donri> Lemmih: did you fix it already or should I?
15:30:33 <Palmik> Hmm, it stores the value for each key it has, right?
15:30:42 <stepkut> for the window query, I meant performance
15:31:23 <stepkut> but.. I don't think we can get an indexed collection type which is ideal for all types of queries. You just get to decide which corner you want to shove the ugly bits into
15:31:48 <stepkut> so, for kd-trees, window queries are efficient, but rebalancing is a pain
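A small two-dimensional kd-tree sketch to show why window queries prune well. It is not the darcs kdtree code discussed above; the names are hypothetical, and the tree is only balanced at build time (by median split), which sidesteps the rebalancing problem rather than solving it.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

type Point = (Double, Double)

data KdTree = Leaf | Node KdTree Point KdTree

-- Coordinate used for splitting at a given depth: x on even levels, y on odd ones.
coord :: Int -> Point -> Double
coord d (x, y) = if even d then x else y

-- Build a balanced tree by splitting at the median of the current axis.
build :: [Point] -> KdTree
build = go 0
  where
    go _ [] = Leaf
    go d ps = Node (go (d + 1) l) m (go (d + 1) r)
      where
        sorted     = sortBy (comparing (coord d)) ps
        (l, m : r) = splitAt (length sorted `div` 2) sorted

-- All points inside the axis-aligned window [lo, hi] (inclusive).
-- A subtree is skipped when the splitting coordinate rules it out.
window :: Point -> Point -> KdTree -> [Point]
window lo hi = go 0
  where
    go _ Leaf = []
    go d (Node l p r) =
         (if coord d lo <= coord d p then go (d + 1) l else [])
      ++ [p | inside p]
      ++ (if coord d p <= coord d hi then go (d + 1) r else [])
    inside (x, y) = fst lo <= x && x <= fst hi && snd lo <= y && y <= snd hi
```

Inserting into such a tree without a rebuild is where it gets awkward: repeated inserts can leave it unbalanced, which is the cost being traded for the cheap window query.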
15:32:50 <stepkut> probably a good starting point is to try to be efficient doing the things that are normally efficient using a traditional sql database
15:32:54 <stepkut> ?
15:33:35 <stepkut> some databases have special support for geospatial queries, but .. the nice thing about acid-state is that you can use whatever collection type is best for your needs.. for example, hackage2 just uses Data.Map :)
15:35:03 <Palmik> So, does it have special map for tag -> package for example?
15:35:04 <stepkut> regarding 4, auto-increment: I have not actually thought about how it would work. I just know that in sql, you can make a field auto-increment, and I know that when I show people how to use acid-state+IxSet I feel embarrassed that I have to manage the incrementing of my indexes manually
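One possible shape for auto-increment, as a sketch: the collection carries its own counter, so callers never manage ids themselves. `AutoMap`, `emptyAuto`, and `insertAuto` are hypothetical names, and a plain `Data.Map` stands in for the indexed set.

```haskell
import qualified Data.Map.Strict as Map

-- The next free key is stored alongside the data it indexes.
data AutoMap a = AutoMap
  { nextKey :: !Int
  , entries :: !(Map.Map Int a)
  }

emptyAuto :: AutoMap a
emptyAuto = AutoMap 1 Map.empty

-- Insert a value, returning the key it was assigned and the updated collection.
insertAuto :: a -> AutoMap a -> (Int, AutoMap a)
insertAuto v (AutoMap n m) = (n, AutoMap (n + 1) (Map.insert n v m))
```

Because the counter only grows, a key is never reused after a delete. Keeping the counter inside the acid-state-managed value also means it is persisted together with the data.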
15:35:24 <stepkut> Palmik: something like that
15:36:03 <Lemmih> donri: You fix it.
15:36:03 <donri> Lemmih: sent
15:39:29 <stepkut> is there a way to get parse errors from Aeson.. right now I just get 'Nothing', indicating the parse failed, but no information about why
15:40:01 <Palmik> stepkut, thanks, I have noted all these things for future reference. :)
15:40:39 <donri> stepkut: you can use "json" and atto's parseOnly
15:41:43 <donri> or parseEither parseJSON
15:45:43 <Lemmih> donri: You're becoming a regular acid-state contributor. Keep up the good work.
15:45:49 <stepkut> Palmik: awesome! I am very excited about this :)
15:46:05 <donri> Lemmih: with your guidance, anything is possible! ^_^
16:12:17 <stepkut> ok, json successfully parsing now
16:20:29 <Igloo> stepcut: Are my 2 happs patches still on your radar?
18:23:48 <stepcut> Igloo: yup I was just thinking about that on my walk home. Today or tomorrow.
18:24:51 <Igloo> OK, great, thanks!
18:41:31 <stepcut> jaspervd1: i heard a rumor you might have or know about a fast EBNF based parser ?
19:48:59 <donri> stepcut: http://www.reddit.com/r/haskell/comments/106xwf/the_monadtrans_class_is_missing_a_method/c6b16a5?context=3  tekmo might have an alternative to mtl-style transformers (haven't read it all myself yet)
22:48:21 <donri> aww i forgot to mention a random idea to palmik
22:49:11 <donri> which is, looking into if we could use -XTransformListComp interestingly and performantly with the future ixset
22:49:32 <donri> http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#generalised-list-comprehensions
22:50:20 <donri> most likely in combination with -XMonadComprehensions
23:08:44 <jaspervdj> stepcut: I have to disappoint you
23:09:08 <jaspervdj> stepcut: https://github.com/tsurucapital/parsergen
23:09:19 <jaspervdj> it was probably about that but it's not full EBNF
23:10:57 <stepcut> k
23:11:07 <stepcut> just as long as I am not duplicating efforts :)
23:16:30 <donri> stepcut: what'cha workin' on there
23:40:18 <stepcut> donri: right now, posting a response to this, http://www.reddit.com/r/haskell/comments/107v79/comparing_snaps_and_yesods_template_languages/
23:44:36 <donri> \o/
23:47:33 <stepcut> before that I was working on a binding to the stripe API.. but now I see there is one on hackage
23:47:43 <stepcut> which is annoying, because I don't like something he did :)
23:48:49 <stepcut> uses a monad transformer for no good reason!
23:50:01 <donri> heist:  is it xml? nope! is it html? nope! it's heist!
23:50:25 <stepcut> HSP has the same problem :)
23:50:36 <donri> hsp doesn't pretend otherwise :p
23:50:38 <stepcut> HSP needs to have several parsing modes IMO
23:51:05 <donri> oh?
23:51:24 <donri> i think it's better to stick to input xml, and then just render as html or whatever
23:51:29 <stepcut> yeah.. one that doesn't automatically escape & and stuff I think
23:52:06 <stepcut> also, it needs to support XML/HTML comments
23:52:44 <donri> hm yea need those for e.g. conditional IE
23:53:13 <stepcut> yeah
23:53:21 <donri> (the web is such a big hack!)
23:54:15 <donri> does hsp properly support xml namespaces?
23:54:30 <donri> i think not?
23:54:49 <stepcut> it supports them.. no idea if it does them properly :)
23:55:03 <donri> i think it supports namespace prefixes, not namespaces per se
23:55:09 <stepcut> ah
23:55:19 <stepcut> I have no idea what that means, but you are probably right
23:56:48 <donri> <foo xmlns="bla"/> and <bla:foo xmlns:bla="bla"/> are the same element, something like that
23:57:56 <donri> probably tricky to do in hsp directly since you can embed... so would need to do it by post-processing, which you can already do now
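A sketch of the distinction between prefixes and namespaces: two element names are the same when they expand to the same (namespace URI, local name) pair, whatever prefix was written. This is not HSP code; `Bindings` and `expandName` are hypothetical, and the in-scope `xmlns` declarations are modelled as a simple association list with `""` for the default namespace.

```haskell
import Data.Maybe (fromMaybe)

-- In-scope namespace declarations: prefix -> URI, with "" as the default namespace.
type Bindings = [(String, String)]

-- Expand a possibly prefixed element name to (namespace URI, local name).
-- An unprefixed name with no default namespace is in no namespace ("").
-- An undeclared prefix yields Nothing.
expandName :: Bindings -> String -> Maybe (String, String)
expandName env qname =
  case break (== ':') qname of
    (local, "")         -> Just (fromMaybe "" (lookup "" env), local)
    (prefix, _ : local) -> do
      uri <- lookup prefix env
      Just (uri, local)
```

With this, `foo` under `xmlns="bla"` and `bla:foo` under `xmlns:bla="bla"` expand to the same pair, which is the kind of comparison a post-processing pass could perform.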