00:06:58 <rostayob> stepcut: I'm in a monad like that: XMLGenT (RouteT (ErrorT (ReaderT (ServerParT IO)))), and it says it's not an instance of ServerMonad. I'm quite sure I've include all the instances I had to include
00:10:23 <rostayob> is it normal that I can't get a request in something like that?
03:42:29 <stepkut> rostayob: did you import Happstack.Server.HSX ?
04:50:17 <Lemmih> gienah: Mind sending me a darcs patch?
04:51:01 <stepkut> Lemmih: greetings
04:58:07 <Lemmih> Hiya
04:58:32 <stepkut> the new acid-state and safecopy stuff looks sweet
04:58:34 <Lemmih> New version of cereal is out, awesome!
04:58:40 <stepkut> ooo
05:00:11 <Lemmih> I think the documentation builder crashed on hackage. Who might I poke to get it fixed...
05:00:19 <stepkut> it would be nice if the next generation IxSet supported some sort of auto-increment for the 'primary key'
05:00:41 <stepkut> Lemmih: hmm, ask in #hackage maybe?
05:04:28 <Lemmih> I've been thinking about IxSet.
05:04:32 <stepkut> spiffy
05:04:58 <stepkut> I think I will make a wiki page with all the wishlist features for IxSet
05:05:02 <Lemmih> I'm wondering if being a little bit evil might be worth it.
05:05:09 <stepkut> what sort of evil?
05:05:21 <stepkut> unsafeCoerceEvil
05:05:30 <Lemmih> Not that evil.
05:05:48 <Lemmih> Mutable cells kinda evil.
05:05:51 <stepkut> yeah
05:06:02 <Lemmih> But only for adding indices on the fly.
05:06:09 <stepkut> ah
05:06:38 <Lemmih> findWithIndex recordAccessor key ixset
05:07:09 <stepkut> do you have an ideas on what more efficient ixset internals might look like?
05:07:53 <Lemmih> Hm, I haven't been giving much thought to efficiency.
05:08:15 <Lemmih> I just wanna keep memory usage low and then add more nodes if things need to go faster.
05:08:42 <stepkut> the kd-tree stuff benchmarks showed it as being both faster and requiring less RAM than the current IxSet implementation -- but does not support all operations we might want (such as returning the results sorted by a specific key)
05:10:07 <Lemmih> I'm not sure keeping things in Haskell space will scape. GHC's GC is notoriously at dealing with large datasets.
05:10:14 <stepkut> yeah
05:10:46 <Lemmih> I'd rather keep the data as binary data and then only have the indices as Haskell structures.
05:10:58 <stepkut> yeah
05:12:18 <stepkut> what if it stored keys and IO enumerators. It would require IO (unfortunately), but the enumerators could retrieve the values from RAM or from Disk?
05:12:48 <stepkut> since solid-state drives are something like 200x faster than rotational disks for only 3x the cost.. that might be interesting?
05:12:54 <Lemmih> Requiring IO is a no-go for acid-state.
05:13:08 <stepkut> right
05:13:11 <Lemmih> I'm more a fan of using mmap.
05:13:17 <stepkut> ah
05:13:48 <Lemmih> It could even be backed by a file if we think we can manage it better than swap.
05:14:19 <stepkut> GHC+swap is usually pretty brutal
05:14:56 <stepkut> are you thinking that this structure would also be used (in part) provide sharding support?
05:14:57 <Lemmih> Yeah but GHC wouldn't be accessing the data so we'd have saner access patterns.
05:16:11 <stepkut> mightybyte may have implemented a version of IxSet that stores values packed as binary.. I can't remember if he actually did it, or just thought about doing it
05:16:38 <stepkut> but the list of maps approach used by IxSet seems very expensive
05:16:51 <stepkut> deletes and updates require rebuilding all the indexes
05:17:08 <Lemmih> Really? Why?
05:17:45 <Lemmih> I thought only sub-queries had to rebuild the indexes.
05:17:51 <stepkut> those too
05:17:56 <stepkut> inserts are fine
05:18:10 <stepkut> but when you modify a value, there is no way to know what keys are now invalid
05:18:43 <Lemmih> How about defining modify as lookup+delete+insert.
05:19:03 <stepkut> delete has the same issue
05:19:09 <Lemmih> Why?
05:19:15 <Lemmih> That doesn't make sense.
05:20:04 <stepkut> there is a copy of value in all the different maps
05:20:46 <stepkut> and there is some difficulty figuring out what keys/values need to be removed from the other maps I think
05:20:49 <Lemmih> Right. It just has to do N deletes.
05:20:52 <stepkut> I have not looked at this in a long time
05:22:08 <stepkut> I could be wrong
05:22:11 <Lemmih> Theoretically it shouldn't be a problem.
05:24:53 <stepkut> yeah.. looking at the code.. perhaps it is ok
05:25:30 <stepkut> so only sub-queries are an issue
05:30:19 <stepkut> (which is basically queries on multiple keys)
05:30:35 <stepkut> and sorting /limiting the results
05:31:09 <Lemmih> Can't do much about that.
05:32:30 <stepkut> depends on how much you are willing to change.. kd-tree should support sub-queries and queries on multiple keys efficiently. But doesn't help with the sort issue
05:33:06 <stepkut> perhaps there is an even better datastructure that does though..
05:33:28 <Lemmih> How does kd-tree's support sub-queries?
05:34:19 <Lemmih> I thought even the big databases had trouble reusing indexes on subsets.
05:35:20 <stepkut> kd-tree is basically a single binary tree. When you do a query you are just pruning branches from that single tree. If you don't bother to rebalance the tree, then that should be pretty fast?
05:37:05 <stepkut> for example, let's say you have a simple balanced binary tree of integers. The root node is 5. If you do a query that returns all values >= 5, then you just chop off the left branch of the tree and you are done
05:37:53 <Lemmih> How does that support multiple keys?
05:38:48 <Lemmih> Hah, makeStableName is pretty fast.
05:39:02 <stepkut> at each level of the tree you only look at one key and you cycle through the keys
05:39:50 <stepkut> so if you had, KdTree (Char, Int), at the root you would only consider the 'Char' part to determine if the value you are inserting/looking up is to the left or the right. At the next level, only the Int is considered.
05:40:35 <stepkut> if you are searching for a combination of a Char + Int, then you will have a comparison to make at every level
05:41:20 <stepkut> if you are only looking for an Int, then at the top-level the Char doesn't help you, so you search both the left and right sub-trees and merge the results
05:42:03 <Lemmih> That's pretty neat.
05:42:06 <stepkut> yeah
05:42:21 <stepkut> so if you are searching for multiple keys, you can do it in a single pass
05:43:21 <stepkut> multiple distinct sub-queries require multiple passes, but you can do it by pure pruning
05:44:04 <stepkut> also easy to use `par` to parallelize a query
05:45:07 <stepkut> I would think it is possible to apply the binary / mmap stuff to it as well
05:48:06 <stepkut> so, using the mmap stuff would allow you to have a dataset that is larger than available RAM ?
05:49:52 <Lemmih> Yeah
05:50:50 <stepkut> nice.
05:51:16 <stepkut> between that and sharding we would have a much better answer to, "what happens when my dataset exceeds RAM capacity"
05:51:17 <Lemmih> Not sure how well it would work, though.
05:51:24 <Lemmih> Yeah.
05:51:41 <stepkut> well in terms of speed and predictability?
05:51:52 <Lemmih> Yeah.
05:52:43 <Lemmih> It might be necessary to change the format to something page friendly.
05:52:45 <stepkut> seems like that is where sharding could come in ? If you want fast use more RAM. if you want cheap, use disk ?
05:53:18 <Lemmih> Yeah.
05:53:34 <stepkut> neat
05:53:52 <stepkut> right now the answer is "don't do that.". but the goal is to fix that for the next release (among other things)
05:54:16 <Lemmih> Next release of what?
05:54:20 <stepkut> happstack
05:54:23 <Lemmih> Oh.
05:54:28 <stepkut> splitting happstack-state and happstack-data into standalone hackage projects was also a goal ;)
05:54:54 <stepkut> as was making query / update take an explicit state handle
05:55:14 <stepkut> and fixing performance issues with ixset
05:55:40 <Lemmih> I was so sure that no-one used happstack-state. Then some guys contacted me about acid-state and told me they were using happstack-state for their customer-facing application. /-:
05:56:00 <stepkut> I use it :)
05:56:07 <stepkut> http://www.creativeprompts.com/
05:56:18 <stepkut> among other things
05:56:19 <Lemmih> You are a braver man than me, then. (:
05:56:22 <stepkut> :)
05:57:03 <stepkut> it is quite reliable
05:57:35 <stepkut> it has design deficiencies and limitations, but it is reliable
05:58:11 <stepkut> patch-tag.com uses it as well
05:58:39 <stepkut> there are some other sites too
05:58:57 <stepkut> hard to say how many.. people only say something when things break :)
05:59:55 <Lemmih> I'm pretty sure it has some dark corners. I wouldn't be surprised if it behaved badly when something unexpected happened.
06:00:45 <stepkut> possibly
06:01:13 <stepkut> if you call 'fail/error' inside your update/query functions bad stuff happens
06:01:37 <Lemmih> Hah, yeah. The guy who mailed me told me that. (:
06:01:52 <stepkut> I am not aware of anything else... been using it pretty regularily for 3 years with out any other issues
06:02:02 <stepkut> actually, there is another concern
06:02:28 <stepkut> it is a bit tricky (impossible?) to make sure that no events are written to disk after your last checkpoint
06:02:43 <Lemmih> That's pretty much by design.
06:03:00 <Lemmih> Since you can't guarantee it when using multiple nodes.
06:03:09 <stepkut> that is problematic, because if I change the update / query methods, they may replay incorrectly
06:03:59 <stepkut> yes.. I have never figured out how changing your update / query methods was going to work with multiple nodes
06:04:25 <Lemmih> Making it easy to change methods would make moving to multimaster harder.
06:04:36 <Lemmih> It's done in two stages.
06:04:55 <Lemmih> Or three, depending on how you're counting.
06:05:40 <Lemmih> First you add support for handling the new events and push it to the cluster.
06:05:58 <Lemmih> Then you start using the new events.
06:06:44 <Lemmih> Once all the nodes have been updated, you checkpoint and remove the unused methods.
06:07:19 <stepkut> you can't start using new events until everyone is updated, right ?
06:07:25 <Lemmih> It's tricky but errors can be caught at runtime and shouldn't affect the stability of the cluster.
06:07:29 <Lemmih> Right.
06:08:00 <stepkut> that sounds tolerable for adding new methods.. but seems painful for modifying existing methods
06:08:45 <Lemmih> Oh?
06:09:17 <stepkut> yeah.. because I have to have the old version and new version of the method both available for a while?
06:10:00 <Lemmih> Right, just like when you're migrating your state.
06:10:30 <Lemmih> gienah: safecopy works fine with ghc-6.12 for me.
06:11:12 <stepkut> so I have fooMethod. Now I want to change fooMethod. So I have to make a copy of the whole fooMethod function and give it a new name?
06:11:50 <stepkut> like fooMethodv2 ?
06:12:03 <Lemmih> yeah /-:
06:12:26 <stepkut> and then I have to modify everyplace in my code that does, update FooMethod, to call update FooMethodv2 instead ?
06:12:49 <Lemmih> Yep.
06:13:11 <Lemmih> That's the cost of safety (:
06:13:50 <stepkut> I would rather just ensure that nothing gets written after my checkpoint :)
06:14:01 <stepkut> then I wouldn't need the old version
06:15:02 <Lemmih> The feature would be trivial to add in acid-state.
06:16:08 <stepkut> on a single server setup that works nicely I think. For multi-master it means that you have to upgrade and restart all your servers at once -- hence the price is a little bit of downtime when you push live
06:16:29 <stepkut> but you could choose between developer pain vs downtime at least
06:16:37 <Lemmih> Yeah.
06:16:51 <Lemmih> That's a great point.
06:17:13 <Lemmih> Will add that to the multimaster wiki once it gets up and running. (:
06:17:16 <stepkut> :)
06:18:03 <stepkut> regarding dir and path.. I don't think they can be implemented the way you want with out changing the ReaderT Request to StateT Request or something.. but that doesn't mean it shouldn't be done
06:18:24 <stepkut> it just requires more thought
06:20:18 <Lemmih> Right, I saw your email. I'm gonna sit this one out.
06:21:27 <stepkut> I think it should be possible to implement the old style with the modified type though -- so it would be possible to provide a temporary migration module like, Happstack.Server.RoutingDeprecated, so that people would not be forced to update their code immediately
06:21:46 <stepkut> ok, way past my bed time
06:22:36 <Lemmih> Nighty night.
06:26:35 <stepkut> also, it is still not clear to me how that migration works when you modify fooMethod. if I have fooMethod and fooMethodv2 and I am rolling up the new servers. At first the server needs to be calling, update FooMethod. Then once everyone is upgrade, something magic happens and the server switches to calling, update FooMethodv2? But how does that switch actually occur. If I have, update FooMethod, written in my code all over the place, it
06:26:35 <stepkut> won't suddenly be update FooMethodv2
06:27:44 <Lemmih> It is done manually.
06:27:52 <stepkut> manually in what sense ?
06:28:14 <stepkut> (if stillRunningOld then update FooMethod else update FooMethodv2) ?
06:29:02 <Lemmih> Once the cluster has been updated, the user goes though his code and replaces FooMethod with FooMethodv2.
06:29:42 <stepkut> ACTION ponders a moment
06:30:43 <stepkut> oh right. when you add FooMethodv2, that means you can responding to incoming FooMethodv2 events even if you are not calling them in your own code
06:30:51 <stepkut> ACTION clearly needs to go sleep
07:24:56 <gienah> Lemmih: I sent you a darcs patch and an email with the ghc 6.12.3 compile error. I added acid-state-0.3.2 to the gentoo-haskell overlay, thanks :-)
10:06:57 <rostayob> stepkut, st3pcut: yeah that was the problem, thanks
13:56:17 <Lemmih> gienah: Pushed, thanks.
16:46:08 <stepkut> Lemmih: do you plan to add TH functions like $(deriveSafeCopy ''Foo) ?
16:46:56 <stepkut> I've written 2.5 instance by hand and I am already tired of it..
16:56:56 <Lemmih> stepkut: I'll see what I can do.
16:57:05 <Lemmih> stepkut: Ok, done. The code is now in the repository.
16:57:18 <stepkut> fast! I like it!
16:58:01 <Lemmih> ACTION is a fracking code ninja!
16:58:03 <stepkut> what is blocking a new hackag release ?
16:58:14 <Lemmih> cereal and hackage.
16:58:30 <stepkut> hmm. not sure I can do anything about those
16:58:34 <Lemmih> And cereal isn't really blocking a release.
16:58:50 <Lemmih> Hackage doesn't build haddock documentation atm.
16:58:55 <stepkut> yeah
16:59:05 <Lemmih> I'd like the documentation to be available when I make the release.
16:59:09 <stepkut> me too :)
16:59:20 <stepkut> did you find out why it was broken ?
16:59:29 <stepkut> s/was/is/
16:59:43 <Lemmih> Nope, sent a mail to Ross, though.
16:59:51 <Lemmih> Hopefully he'll get back to me.
16:59:53 <Lemmih> Or just fix it.
17:00:52 <stepkut> how would you feel about a SafeCopy Text instance ? It would add a dependency on text, but that is in the haskell platform..
17:01:17 <Lemmih> Hm, that might be a good idea.
17:01:29 <Lemmih> Adding instances for all the platform libraries.
17:01:33 <stepkut> yeah
17:01:42 <Lemmih> What other libraries are in the platform?
17:02:01 <stepkut> http://hackage.haskell.org/platform/changelog.html
17:02:42 <stepkut> text, time, and containers are the most useful I think
17:03:20 <Lemmih> Yeah, I was thinking the exact same thing.
17:03:32 <Lemmih> We already have containers so just text+time needs to be added.
17:03:32 <stepkut> cool. Pretty much every app I write uses those
17:04:15 <stepkut> Text is a little tricky.. the obvious choice is to make it a primitive and store the data as utf-8 encoded
17:04:39 <stepkut> instance SafeCopy Text where
17:04:39 <stepkut>     kind    = primitive; putCopy = contain . safePut . Text.encodeUtf8; getCopy = contain $ fmap Text.decodeUtf8 safeGet
17:04:39 <stepkut>  
17:04:56 <stepkut> but that means you can't ever decide that the data should be stored in a different format ?
17:04:57 <Lemmih> Isn't it complex enough to merit a version tag?
17:05:13 <stepkut> I would prefer a version tag just in case..
17:05:16 <stepkut> same with time
17:05:22 <Lemmih> Yeah, me too.
17:05:36 <Lemmih> I even considered removing the 'primitive' kind.
17:05:41 <stepkut> yeah
17:06:13 <stepkut> did you read that paper about cloud haskell ?
17:06:18 <Lemmih> I care more about data safety than saving a few bytes.
17:06:21 <Lemmih> Nope.
17:06:25 <Lemmih> The one by SPJ?
17:06:27 <stepkut> yeah
17:06:40 <Lemmih> It is any good?
17:06:48 <stepkut> I browsed it very quickly.. I intend to go back and reread it
17:07:33 <stepkut> it includes support for serializing higher order functions and stuff
17:09:41 <Lemmih> I assume the nodes all have to be of the same version?
17:10:03 <stepkut> dunno, I have not read the paper
17:10:15 <stepkut> I just read bits and pieces and looked at the headings
17:10:54 <stepkut> but, basically it seems to deal with serializing data and closures and shipping them around to other processes on a  distributed network
17:11:10 <stepkut> very much inspired by Erlang
17:12:23 <stepkut> the most interesting contribution is probably the method for serializing closures and sending them around the network. But I don't know how they actually do that. maybe it is not interesting..
17:13:06 <stepkut> looks like it uses some template haskell
17:13:28 <Lemmih> What's the link to the paper?
17:13:42 <stepkut> http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/remote.pdf
17:14:10 <Lemmih> New versions of acid-state and safecopy will hit hackage soonish, btw.
17:14:12 <stepkut> it actually looks similar to MACID I think
17:14:38 <rostayob> so what's new in MACID?
17:16:43 <Lemmih> Fixes for ghc-6.12.3 and I dropped the dependency on binary in favor for cereal.
17:16:55 <rostayob> oh ok.
17:17:37 <Lemmih> Nothing major but cereal is seriously neat. Binary has been a thorn in my side for awhile.
17:18:04 <stepkut> heh
17:18:12 <rostayob> I don't think I can appreciate the change, but cool
17:18:54 <rostayob> I noticed a big discussion yesterday night, is the fate of happstack-state clear yet?  is the kd-tree stuff happening?
17:19:09 <Lemmih> cereal has error handling, binary does not. cereal deals well with incomplete input, binary does not.
17:19:48 <Lemmih> s/grammatical errors//
17:20:09 <stepkut> rostayob: happstack-state and happstack-data will almost certainly be deprecated in favor of acid-state and safecopy (with a migration path / guide)
17:20:45 <rostayob> oh, I didn't know about acid-state! i'll check it out
17:21:24 <stepkut> rostayob: it is a clean reimplementation of happstack-state that fixes many of the API / UX annoyances
17:22:46 <Lemmih> What's UX?
17:23:12 <rostayob> stepkut: cool. what about IxSet?
17:23:35 <stepkut> User eXperience I think
17:23:56 <stepkut> more like Developer eXperience I guess
17:24:01 <Lemmih> Ah, was guessing Universal Xenophobia but that didn't quite fit in.
17:24:38 <stepkut> rostayob: there are many ideas about IxSet, but nothing firm
17:25:10 <rostayob> stepkut: ok. I'm really curious to see if sorting and balancing can be done with the kd-trees
17:25:46 <stepkut> rostayob: there could be multiple IxSet like structures depending on what best fits your needs -- though a single one that does everything would be better ;)
17:26:29 <rostayob> stepkut: no but I really like the IxSet idea, I just have difficulties imagining an efficient IxSet
17:27:17 <stepkut> rostayob: I had some more thoughts on sorting.. You won't be able to get the results in sorted order by just traversing the tree in some order.. but I think there might be ways to do better than converting to a list and doing a full sort
17:27:49 <stepkut> rostayob: imagine if you have KdTree (Char, Int), and the root node is ordering on the 'Char' key.
17:28:31 <stepkut> rostayob: if you want to get the results sorted by the Char key, then you can sort the left side sort the right side, and then do, left ++ right.
17:29:04 <rostayob> stepkut: definitely, in fact the complexity won't be n log n
17:29:09 <stepkut> right
17:29:10 <rostayob> but even n is bad in this case
17:29:22 <stepkut> rostayob: right. but it is better than n log n :)
17:29:27 <rostayob> stepkut: yes eheh
17:30:11 <rostayob> it's basically a mergesort with benefits
17:30:19 <stepkut> rostayob: yep
17:31:48 <rostayob> mhm. still, that wouldn't be good. I wonder if there is a way
17:32:19 <stepkut> rostayob: well, there is no commitment to using kdtree
17:32:32 <stepkut> rostayob: I am going to try to put together a wiki page describing all the wishlist features
17:33:47 <rostayob> kk
17:34:04 <stepkut> rostayob: I wonder if sorted+limit can be done better..
17:35:13 <rostayob> stepkut: the limit can be done really well if you use sets
17:35:45 <stepkut> rostayob: for example, if you only wanted the 10 most recent entries.. then maybe you can find those pretty quickly
17:35:53 <rostayob> yeah of course
17:36:00 <rostayob> the best thing is to store the indexes
17:36:11 <rostayob> in a tree that stores the size of the branches
17:36:15 <rostayob> Data.Set does that
17:36:21 <stepkut> kdtree does as well
17:36:35 <rostayob> in this way you can skip and limit in log n
17:38:39 <stepkut> but how do you support multiple indexes?
17:39:48 <rostayob> stepkut: the traditional way is to have a tree per index
17:39:54 <rostayob> like B-tree for databases
17:40:20 <stepkut> rostayob: that is how the current IxSet works..
17:40:48 <stepkut> rostayob: but it is trickier with out pointers..
17:40:52 <rostayob> exactly
17:41:13 <rostayob> afaik, almost all databases work like that. the disk based ones
17:41:34 <stepkut> yep
17:42:11 <stepkut> but how do you make it purely functional?
17:42:38 <rostayob> yeah that's a big problem, and I think that if you solve you can publish eheh
17:43:53 <stepkut> with the current IxSet I think that return values sorted by an index should be fast (because you just call Map.toAscList on the appropriate index).
17:44:01 <stepkut> but subqueries are essentially useless
17:44:24 <stepkut> kdtree makes sub-queries a lot nicer.. but makes sorting the results more expensive
17:45:32 <stepkut> actually.. with the current IxSet you don't really get cheap sorted results ..
17:45:41 <rostayob> no, I was going to say that
17:46:00 <rostayob> you do but if the last set is the one you're looking for
17:46:32 <rostayob> but you can't do it programmaticaly
17:46:38 <stepkut> if you do, sortedBy (Proxy :: Proxy SomeKey) ixSet, then you can get it quickly
17:46:48 <rostayob> mhm yeah
17:47:03 <stepkut> but if you do, sortedBy (Proxy :: Proxy SomeKey) (ixSet @= someKey), then you don't really, because it will have to rebuild the indexes after the @=
17:47:03 <rostayob> but it's tied how Set works anyway
17:47:51 <stepkut> A Set would not allow you to sort on a specific key.. it would just look at the ordering of the value as a whole ?
17:49:47 <rostayob> wait I don't know how IxSet works exactly
17:49:54 <stepkut> :)
17:49:58 <rostayob> Ix (Map key (Set a)) (a -> [key])
17:51:02 <Lemmih> Heh, Cloud Haskell and acid-state have converged on the same solution.
17:51:11 <stepkut> so an Ix (index) is a Map from a 'key' to a 'Set a' of values that contain that key. There is also a function that can be applied to a value 'a' and it returns the list of [key] found in that value 'a'
17:51:14 <stepkut> Lemmih: :)
17:51:25 <stepkut> Lemmih: hope they cite you :p
17:52:25 <rostayob> yeah but how do multiple query work?  entries @= (Author "john@doe.com") @< (Updated t1)
17:52:58 <Lemmih> I wish. Turns out doing it this way is the obvious way, apparently.
17:53:00 <stepkut> rostayob: it rebuilds the IxSet between queries
17:53:07 <rostayob> oh, ok that's bad.
17:53:12 <stepkut> rostayob: yep.
17:53:36 <stepkut> rostayob: with kdtree each query just prunes branches from the tree
17:54:00 <rostayob> stepkut: yeah
17:54:17 <stepkut> rostayob: you start with the full tree, and each query just snips more branches off it.. if you skip rebalancing, then it should be very fast
17:55:10 <rostayob> well the rebalancing is comparable to the IxSet rebuilding in badness :D
17:55:25 <rostayob> depends how often you add
17:55:29 <stepkut> rostayob: perhaps.. though maybe less so
17:56:20 <stepkut> rostayob: also, kdtree does not aim to keep it perfectly balanced ..
17:57:03 <stepkut> rostayob: there is a configurable out of weight factor.. and you can do faster balancing by randomly guessing for some data sets
17:57:40 <stepkut> rostayob: with IxSet an insert could result in having to rebalance all the Maps for all the indices..
17:57:54 <rostayob> right...
17:58:35 <Lemmih> How big of a problem is sorting?
17:58:51 <stepkut> Lemmih: for rostayob it was a deal breaker :)
17:59:28 <stepkut> Lemmih: if you are implenting something like reddit / hacker news / etc, you need an efficient way to get a range of results back from the database sorted by a key (such as date)
18:00:01 <stepkut> Lemmih: if you have to sort ever article ever submitted to reddit in order to get the most recent 10.. that is not going to be good
18:00:10 <rostayob> well i'd say 95% of web apps have this problem...
18:00:21 <rostayob> whenever you want to display the "last" something, or "top" something
18:00:37 <rostayob> and maybe the last by some user
18:00:42 <rostayob> those are really common operations
18:01:04 <stepkut> emails, blogposts, tweets, etc
18:01:18 <rostayob> yeah
18:01:28 <Lemmih> And kdtrees can't do that efficiently?
18:01:43 <rostayob> nope
18:02:14 <stepkut> Lemmih: not really.. they can do better than, sort . toList, but not as good as b-trees
18:02:18 <rostayob> but if kdtrees alone could do that efficiently... they would be the best data structure evar
18:02:26 <stepkut> rostayob: :)
18:02:38 <stepkut> rostayob: they are good for nearest neighbor :p
18:02:46 <rostayob> I mean if one instance of a data structure can index and sort at the same time... well
18:02:51 <rostayob> on all the indexes
18:03:18 <Lemmih> A kdtree with a single key is pretty much just a binary tree, right?
18:03:24 <stepkut> Lemmih: right
18:03:25 <rostayob> correct
18:03:47 <Lemmih> So sorting is efficient with a single key but not with two or more?
18:03:54 <stepkut> Lemmih: yeah..
18:04:03 <stepkut> Lemmih: with two or more it is at least O(n) I think
18:04:11 <rostayob> Lemmih: well that, and you cant sort after multiple indexes, or select an index and then sort after another index
18:04:26 <rostayob> like if you want something from username "foo" ordered by date
18:04:38 <stepkut> there is a step where you can sort the left and right branches independently, but then to combine them you need to do a merge .. which is O(n)
18:05:33 <rostayob> stepkut: it's not even only that, as I said when you "nest" indexes it's a mess
18:06:00 <stepkut> rostayob: nest indexes?
18:06:01 <Lemmih> Can anyone do this efficiently?
18:06:13 <rostayob> stepkut: like I said, order after this and then this
18:06:32 <rostayob> Lemmih: every useful database can do that
18:07:03 <Lemmih> Sort on multiple indexes in less than O(n)?
18:07:18 <rostayob> oh yes
18:07:19 <stepkut> Lemmih: in a normal db, you would have a b-tree for each index. To get the results in sorted order, you just traverse the b-tree in some simple ordering (like in-order or pre-order or something)
18:08:24 <stepkut> Lemmih: hmm. there are two similar but different things that could mean
18:08:30 <rostayob> and you can also have compound indexes
18:08:54 <stepkut> Lemmih: we have the problem that we can not sort on a single index when the database has multiple indexes
18:09:27 <stepkut> Lemmih: the problem of sorting by multiple/compound indexes at once is different
18:10:08 <Lemmih> How do normal dbs do sub-queries?
18:10:14 <stepkut> normal databases can definitely return the results sorted by a single index even if the database has multiple indexes
18:10:20 <rostayob> (I was missing the word, compound :) )
18:13:01 <stepkut> Lemmih: by 'sub-query', do we simply mean that we want to search multiple keys/fields ?
18:13:32 <Lemmih> Yes.
18:14:48 <rostayob> Lemmih: It's just another b-tree, with the two keys you want as ordering
18:15:30 <rostayob> I mean for example in mongoDB, you explicitly create indexes over multiple keys
18:15:36 <Lemmih> Well, we can do that.
18:16:19 <Lemmih> It seems like a decent solution to me.
18:16:28 <stepkut> with kd-tree you can do a query on multiple keys in a single pass
18:16:55 <Lemmih> So you can if you have an index for each possible key combination.
18:17:07 <Lemmih> Or each key combination that you want to use.
18:17:22 <rostayob> Lemmih: it would be quite messy, since you index things after types and since youd don't have references. what if you update something?
18:17:55 <Lemmih> rostayob: Oh yes, indexing things by type is going out the window no matter what.
18:18:16 <rostayob> Lemmih: then what do you index after? considering that you're indexing data types
18:18:51 <rostayob> what I did personally is just a simple TH mapping from algebraic data types to mongodb document format
18:18:53 <stepkut> Lemmih: with kd-tree you just pick your individual indices. You do not have to create an extra indices to optimize searching of multiple keys.
18:19:07 <Lemmih> Perhaps: addIndex :: MySet a -> (a -> key) -> (MySet a, Index key a)
18:19:08 <stepkut> Lemmih: so there is no combinatorial explosion problem -- you can mix and match however you want
18:19:34 <Lemmih> rostayob: lookup :: Index key a -> key -> a -> [a]
18:19:39 <Lemmih> Or something.
18:19:50 <Lemmih> There are many possibilities.
18:20:12 <stepkut> the two big things that kd-tree does really well are searches on multiple keys (in a single query) and sub-queries (running multiple queries in a sequence)
18:21:09 <rostayob> Lemmih: yeah that's a solution, but again what happens when you update something?
18:21:23 <Lemmih> rostayob: It would update the index as well?
18:21:44 <Lemmih> I'm pretty sure normal dbs does that as well (:
18:22:08 <rostayob> yes but with normal dbs, you refer to one thing in all the indexes
18:22:24 <Lemmih> Say again?
18:22:34 <Lemmih> Oh, right.
18:22:45 <Lemmih> You're talking about mutating things.
18:22:46 <rostayob> with normal dbs, you have indexes and then you have documents that hold the actual data
18:22:49 <rostayob> yeah
18:23:02 <Lemmih> Not used to such evil talk.
18:23:04 <Lemmih> :p
18:23:08 <rostayob> eheh
18:24:00 <rostayob> but it's a problem
18:24:00 <Lemmih> Well, mutating things is bad. And most dbs are guilty of sacrifising acid-guarantees to the devil for improved performance.
18:24:51 <Lemmih> I'm totally fine with O(c ln n)
18:24:59 <rostayob> Lemmih: honestly, I'd rather give up acid and have performance in a web-app. I think acid is crucial when money is around.
18:25:30 <rostayob> but otherwise
18:25:40 <rostayob> I mean you can have a set of atomic operations
18:25:48 <Lemmih> I don't see why updating a fixed number of indexes is a performance problem.
18:26:25 <rostayob> no I was saying that because you mentioned it, but it doesn't have to do with the indexes
18:26:45 <stepkut> Lemmih: one advantage to the current way of doing indexes (which each index is stored as a separate map) is that it could allow you to have different types of indexes.. for example, if you are storing longitude and latitude, you might want a structure like an r-tree that can do spatial searches
18:27:12 <Lemmih> We might be at an impasse here. In my book, correctness is absolute. Performance is a distant second.
18:27:55 <rostayob> Lemmih: well, in web app expecially, performance is a big thing
18:28:09 <stepkut> rostayob: and getting the right answer is not ?
18:28:26 <rostayob> stepkut: oh yes, that's why I'm using haskell
18:28:46 <rostayob> but saying that performance is a "distance second" is a very wrong logic for a web framework, imho.
18:28:50 <rostayob> *distant
18:29:14 <rostayob> anyway, I'm off to eat something
18:30:54 <stepkut> deriveSafeCopyHappstackData is good news :)
18:31:06 <Lemmih> Actually, normal dbs gotta update the indexes as well. You can't just mutate the data underneath without updating the indexes.
18:31:36 <stepkut> well, sortof.. in reality to migrate stuff I need to load up the old monolithic state data anyway
18:31:57 <stepkut> and once it is decoded in RAM I can just copy it to the new acid-state stuff
18:32:02 <stepkut> Lemmih: right
18:32:15 <Lemmih> I really like the idea of not doing subqueries. Simplifies things a lot.
18:32:45 <Lemmih> A proper implementation of IxSet might really kick ass.
18:32:47 <stepkut> Lemmih: how would you do implement queries on multiple keys ?
18:33:05 <Lemmih> With an index over multiple keys.
18:33:16 <stepkut> ah
18:33:32 <Lemmih> But hm. Is that really what normal dbs do?
18:33:38 <Lemmih> That doesn't seem right.
18:33:48 <stepkut> Lemmih: I don't think so.. but I don't have a database book around anymore
18:34:24 <stepkut> they definitely have support for multiple column indexes..
18:36:11 <stepkut> i think in some cases the database will guess which index is the most selective, run that query first, and then do a linear search over the remainder
18:36:22 <Lemmih> Yeah, I'm thinking that as well.
18:36:46 <stepkut> but there is also fancy like converting the b-trees to bitmaps and doing ANDs on them
18:36:54 <stepkut> or hash joins
18:37:25 <stepkut> for example, it might just do a search on each index individually, and then join the results
18:37:52 <stepkut> there is lots of estimation and guessing
18:38:08 <Lemmih> Oh yeah, hash join would be something like O(m+n) instead of O(m ln n).
18:38:21 <stepkut> that is why there are so many lines of code in a modern relational database :)
18:38:36 <Lemmih> I wonder how well it would work to make all these decisions explicit in a library.
18:39:38 <stepkut> right -- so then the database has to guess what the values for m and n are going to be and figure out which algorithm will have the best run time
18:40:38 <stepkut> I think one issue that a lot of people don't really know what is happening when they call SQL -- so they just assume it is good :)
18:40:57 <Lemmih> I am so guilty of that (:
18:41:00 <stepkut> like, lots of databases probably have no indexes created -- so there is no choice but to do a full table scan for every query
18:41:14 <stepkut> but people think "I'm using a database, surely it is doing something smart!" :)
18:42:06 <stepkut> so, if you then come up with something that makes the problems obvious, people say, "That's awful" ignorant of the fact that they are using something even more awful :p
18:47:21 <stepkut> so, i wonder, in a normal SQL database, if I do a query on one index, but sort the results by a second index.. what happens
18:48:41 <stepkut> one advantage of sub-queries is that it does give you explicit control at the library level
18:49:01 <stepkut> let's say that key1 is highly selective, and key2 is not.
18:49:08 <Lemmih> I still like the idea of throwing indexes at it in a deliciously brute force-ish manner.
18:49:59 <stepkut> in the current code, if you do, (ixset @= key1) @= key2, you will get better performance than (ixset @= key2) @= key1, because when it rebuilds the indexes, it is a small set of data it has to work with to rebuild
18:50:23 <Lemmih> Right.
18:51:08 <stepkut> composite keys let you do, (ixset @= (Composite key1 key2)) nicely
18:51:22 <rostayob> I was talking about NoSQL databases before, since an IxSet is not really a relational database, and I'm quite sure they do it that way, or at least this would suggest so: http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeysIndexes
18:51:22 <stepkut> but doesn't help with, (ixset >@ key1) <@ key2
18:52:52 <stepkut> kd-tree is nice for that because it does not have to rebuild anything between queries, and it actually uses the sorting already done by both indexes
18:53:27 <stepkut> (instead of using a index for the most selective key and a table scan for the second stage)
18:54:37 <stepkut> rostayob: in MongoDB, what happens if you query by one index, but sort by a different index?
18:55:43 <rostayob> stepkut: if you query by index a, and sort by b, you have to have an index on a, b
18:56:24 <rostayob> I mean an index means that you can query and sort (but just in one direction)
18:56:42 <stepkut> what is sort order of, compare (4, 5) (7, 3), ?
18:57:28 <rostayob> that's ascending
18:57:33 <Lemmih> It is uncanny how much we think alike, stepkut.
18:57:50 <rostayob> no but
18:57:52 <rostayob> wait
18:57:55 <stepkut> rostayob: but if we want to sort by 'b', aren't things in the wrong order ?
18:58:01 <Lemmih> rostayob: He's asking how an index on (a,b) will help you sort on b.
18:58:46 <rostayob> in mongoDB you specify the ordering of both keys when creating compound indexes
18:58:52 <rostayob> db.things.ensureIndex({j:1, name:-1});
18:59:42 <rostayob> so I guess it orders it lexicographically but taking in account the specified direction
18:59:45 <rostayob> but I'm just guessing eh
19:01:00 <stepkut> rostayob: i think that a compound index lets you query and sort efficiently by the compound value.. but if you want to query an 'a' and then  sort by b, then you are going to have to do a full sort on the results of querying 'a'
19:02:19 <rostayob> stepkut: if you select every element in which a == x, it is going to be sorted after b automatically, isn't it?
19:02:46 <Lemmih> Yes.
19:02:54 <rostayob> exactly
19:03:08 <Lemmih> But not so for >, < or /=.
19:03:10 <stepkut> rostayob: for ==, but what about > ?
19:03:21 <stepkut> ACTION got jinxed
19:03:30 <rostayob> stepkut: ? what do you mean? if you say for a > x?
19:03:45 <rostayob> ah you want a > x and sorted after b. no you can't do that.
19:03:56 <stepkut> rostayob: right.. for example, all the articles posted since midnight..
19:03:58 <Lemmih> stepkut: We haven't met, have we?
19:04:03 <rostayob> that's not the aim of those indexes
19:04:07 <stepkut> Lemmih: only in cyberspace
19:07:28 <stepkut> rostayob: can you be more explicity about what you mean?
19:07:41 <Lemmih> If I got this right, kdtrees have pretty much the same complexity as normal dbs.
19:08:28 <Lemmih> Not being able to sort on different indexes isn't such a huge handicap when nobody else can either.
19:09:09 <rostayob> Lemmih: yeah but you can't sort efficiently at all with a kd-tree...
19:09:27 <stepkut> Lemmih: right. But if your query index and sort index are the same, then they get sort for free, and we don't
19:09:41 <rostayob> stepkut: I was just saying that you can't do what you were asking with mongoDB compound indexes
19:09:56 <Lemmih> Oh, that's right.
19:10:49 <stepkut> rostayob: let's say you want to get the highest rated articles from today and display them in reverse cronological order. How does mongodb do that efficiently?
19:11:23 <rostayob> stepkut: you have an index over the score
19:11:37 <stepkut> rostayob: you also have to index over the date, right ?
19:12:08 <rostayob> well if you want the highest rated on a specific day, yes. You'd have a compound index (day, score)
19:13:15 <Lemmih> That would be a very odd thing to want (:
19:13:25 <stepkut> right, so that works ok, because you are limited by day and sorting on the day
19:13:42 <stepkut> now, what if you want to get the articles from today, and sort them by popularity, highest rated first ?
19:13:56 <stepkut> let me think..
19:14:07 <stepkut> that will be ok as well I think
19:14:10 <Lemmih> How about from the last 24 hours?
19:14:17 <rostayob> Lemmih: that gets tricky
19:14:39 <rostayob> you rarely want stuff like that anyway.
19:15:09 <Lemmih> Reddit seem to like doing that.
19:15:16 <rostayob> Lemmih: really? no
19:15:44 <rostayob> reddit has a fairly simple scoring algorithm
19:15:49 <rostayob> that takes time into account
19:15:58 <rostayob> but afaik it doesn't show just the last 24 hrs
19:16:13 <Lemmih> rostayob: What does viewing top scoring links from today do, then?
19:16:38 <rostayob> Lemmih: it selects that specific day
19:16:39 <stepkut> rostayob: what about ebay auctions ending in the next 24 hours sorted by price ?
19:16:52 <Lemmih> rostayob: I assume it means the last 24 hours and not Wednesday.
19:17:01 <rostayob> and where is that option? I don't browse reddit often anyway...
19:17:13 <Lemmih> rostayob: How does that work with an international crowd?
19:17:27 <rostayob> I'm not saying you *never* need it. of course you might need it. I'm saying you need other stuff more.
19:17:42 <Lemmih> I'm fairly sure the page doesn't go blank at midnight.
19:18:19 <rostayob> Lemmih: yeah you're right, I don't know reddit enough (:
19:18:27 <rostayob> reddit uses some flavour of sql
19:19:15 <rostayob> and I wouldn't be suprised if their solutions are not exactly efficient anyway...
19:19:44 <Lemmih> Given reddits instability, neither would I (:
19:19:57 <stepkut> rostayob: what about newegg.  find all the hard drives greater than 1TB and sorted by price ?
19:20:43 <rostayob> stepkut: I'd do that with mapreduce
19:20:54 <rostayob> so you categorize stuff
19:21:16 <stepkut> ok
19:22:25 <stepkut> but you still have that problem that there is a range of hard drive sizes greater than 1TB, and prices. but the sorted list [(size, price)] does not ensure that prices will be sorted correctly..
19:22:34 <Lemmih> We planned happs-state to do (a form of) mapreduce, way back when. (:
19:22:59 <stepkut> Lemmih: me too ;)
19:23:03 <rostayob> stepkut: there will be no (size, price) index, there will be various size categories
19:23:08 <Lemmih> (in tandem with sharding)
19:23:11 <rostayob> ordered by price
19:23:32 <stepkut> rostayob: so to sort by price, you have to combine the results from each size category and sort them ?
19:23:52 <rostayob> also, they probably have a few hard drive. in that case, you just sort them on the fly I guess...
19:24:15 <rostayob> *hard drives
19:24:22 <stepkut> Lemmih: I wrote a long post a while back about all the various ways I could think of to use MACID in a distributed fashion -- including mapreduce :p
19:24:46 <rostayob> stepkut: you do map/reduce grouping all the hard drives in 500MB intervals or something like that
19:24:58 <rostayob> but I mean
19:25:02 <rostayob> if you have 500 hard drives
19:25:11 <rostayob> you just do everything in the usual way
19:26:50 <stepkut> just trying to figure out how often you actually get the sort for free with b-trees in practice
19:27:35 <rostayob> stepkut: the point is that there are a lot of cases in which you just can't afford having O(n) to sort
19:28:03 <rostayob> the usual "getting the first 10 things" problem
19:28:43 <stepkut> rostayob: but let's say you have 5,000,000 hard drives.. aren't you stuck in the same situation then? You want to return the first page of hard drives?
19:30:02 <rostayob> stepkut: there are cases in which you have to work hard to be fast yes
19:30:13 <rostayob> but I'd be satisfied with those simple orderings
19:30:44 <rostayob> for example, to get the top in the last 24 hrs, I could simply get the last two days and filter out those
19:31:08 <stepkut> you can do that with kd-tree..
19:31:32 <rostayob> no I mean that and ordering by score
19:31:50 <rostayob> mhm no...
19:31:51 <stepkut> how?
19:32:31 <rostayob> i was thinking of getting two days ordered by score and then filtering while merging,  but whatever that's not  the point :P
19:33:12 <Lemmih> Merging is exactly what we're trying to avoid, isn't it?
19:33:31 <stepkut> what you are saying is that when you can't get the ordering 'for free' then you have to have some other way to make 'n' small so that the sort isn't painful ?
19:33:35 <rostayob> Lemmih: if you've already selected just two days of posts, it shouldn't be a problem
19:33:45 <rostayob> stepkut: yes
19:34:07 <Lemmih> rostayob: Then it shouldn't be a problem for a kdtree.
19:34:09 <stepkut> rostayob: and kdtree has one less instance in which you can get the ordering for free ?
19:34:38 <rostayob> there are a lot of cases in which there are no "directions" but the sort
19:34:57 <rostayob> you can't narrow the set of things to sort
19:35:31 <stepkut> what if the sort+limit for kdtree was O(n) where 'n' was the number of items you were limited to ?
19:35:48 <rostayob> oh, that's great. can you do that?
19:35:55 <stepkut> or even O(n log n) (or even O(n^2) ?)
19:35:59 <stepkut> dunno
19:36:01 <stepkut> maybe ?
19:36:18 <rostayob> I can't see it right now
19:36:56 <stepkut> let's say you have a function, merge :: [a] -> [a] -> [a], that merges two sorted lists and produces a new list in sorted order
19:37:07 <stepkut> that is O(n) where n is the length of the shortest list, right ?
19:37:21 <rostayob> yes
19:37:36 <rostayob> oh, I see where you're heading.
19:37:48 <stepkut> now if we have, take m $ merge xs ys, then that is O(min(m,n)) I think
19:37:52 <rostayob> but if you're indexing after a lot of things in the kd-tree
19:37:55 <rostayob> it doesn't work that well
19:38:15 <rostayob> because that works only if you're ordering after the ordering of the root node
19:38:21 <rostayob> but yeah it's not bad...
19:38:26 <rostayob> also
19:38:38 <rostayob> in that way you can order stuff just in one direction
19:38:42 <rostayob> which is not too bad anyway...
19:38:51 <rostayob> yeah I didn't think of that
19:39:07 <stepkut> the order after the root node is actually the easy part.. you just order the left and right children and the do, left ++ right.. you don't even need merge at that level
19:39:14 <Lemmih> Is sorting in the opposite direction a problem?
19:39:49 <Lemmih> jinxed.
19:39:53 <stepkut> Lemmih: I don't think so, you just do mergeReverse
19:40:18 <Lemmih> yeah.
19:40:19 <stepkut> which expects the lists in the opposite direction
19:40:26 <stepkut> highest to lowest
19:40:35 <rostayob> mhm cool
19:40:47 <rostayob> what if you want to skip n things?
19:41:29 <stepkut> take l $ drop m $ merge xy ys, O (min (m+l, n)) perhaps?
19:41:57 <stepkut> i am wondering if there is a way to exploit the initial query to also start the ordering
19:42:25 <stepkut> equalAndSort :: (Index k a, Index ordKey a) => k -> ordKey -> KdTree a -> [a].
19:42:48 <stepkut> well, let's pick the easier, equalAndSort :: (Index k a) => k -> KdTree a -> [a].
19:43:07 <stepkut> oh, except that is pointless
19:43:14 <stepkut> because k will be the same
19:45:20 <stepkut> greaterThanAndSort :: (Index k a) => k -> KdTree a -> [a]
19:46:23 <stepkut> perhaps we can reuse some of the compares from the greater than traversal to also sort data ?
19:46:28 <rostayob> don't you need to do the sorting & merging anyway to find out all the elements greater than something?
19:46:42 <stepkut> no
19:46:48 <stepkut> you need to do a compare
19:47:04 <stepkut> but if the branch is 'less than' then you just prune it
19:47:46 <rostayob> oh, right
19:47:52 <stepkut> like, if you have 'k' at the root, and you want all the values with @> 'k', then you just prune the left node and you are done.
19:48:08 <rostayob> yeah I see that, but in some cases you'd have to go in both branches
19:48:22 <rostayob> if the key in the node is >
19:49:16 <rostayob> I mean the complexity won't be log n
19:50:00 <rostayob> ah no no, sorry
19:50:03 <stepkut> if you had 'k' at the root, and you wanted @> d. Then you keep the right tree, and traverse down the left tree keeping right trees until you find the node where you can prune the left tree
19:50:17 <rostayob> yeah yeah, I'm getting tired...
19:51:04 <rostayob> this actually might work well
19:51:28 <rostayob> there's still the rebalancing issue
19:51:32 <stepkut> right
19:53:31 <stepkut> so if we ignore compound indexes for a moment, in a normal b-tree system you only get sorting for free if your sort index is the same as your query index?
19:54:09 <rostayob> stepkut: yes
19:54:28 <stepkut> so, if you want to find all the posts by user X, and sort them by date, then you don't get free sorting?
19:54:43 <stepkut> (which is why reddit chokes when you do that ;p )
19:55:08 <rostayob> well if the B-tree is only on one index (the username in this case), no
19:56:03 <rostayob> a smart solution is to have "views", like couchdb does
19:56:27 <stepkut> so, finding all the posts by user X, and then sorting by user  is pointless, because the user will be the same in all the records
19:56:39 <rostayob> well yes (:
19:57:00 <stepkut> so, for a single index, the free sorting is only useful when you do a range query. like find all users with ids < 1000 and sort them
19:57:56 <stepkut> so, if you want to make 'find all the posts by user X, and sort them by date' fast, you need a compound index. (user, date)
19:58:29 <stepkut> the assumption here is that (user, date) is ordered lexograpically
19:59:03 <rostayob> yes
20:00:10 <stepkut> and that when we query on user X and then sort by date, it is going to use the compound index (user, date) to find all the entries with (user, *anydate*), and then be smart enough to realize that all the 'user' are the same, so it can then assume they are also sorted by date?
20:02:10 <stepkut> if we want to do, find all posts by users X, Y, or Z, and sort them by date, then we are forced to do at least do a merge ?
20:02:24 <rostayob> stepkut: mhm I don't know how it works in detail. also considering that there indexes are not unique in many dbs (mongoDB), so I don't know how they sort it
20:03:04 <stepkut> I wonder if full table sorts are happening more often than you think?
20:03:12 <stepkut> and how would you know ?
20:03:18 <rostayob> but again, I wouldn't worry after narrowing to just thousands of elements
20:03:23 <rostayob> stepkut: I don't (:
20:03:33 <stepkut> what if we just add toAscList to IxSet and pretend it is fast?
20:03:42 <rostayob> stepkut: but the IxSet situation before was no sort at all
20:04:19 <rostayob> I suspect that those super fast databases use loads of dirty tricks to speed things up anyway...
20:05:01 <stepkut> well, we just need better marketing so that people suspect IxSet does dirty tricks to.. as long as no one is bothering to check anything ;)
20:05:11 <rostayob> I mean mongoDB is... fast.  they are all about speed lol
20:05:24 <rostayob> but I'd rather use a slower haskell solution
20:05:37 <stepkut> MACID is all about speed -- disk is too slow, RAM for us!
20:06:22 <rostayob> I mean the mmap the whole database, I don't know why that makes things faster but that's why mongoDB is quite pointless on a 32bit system
20:06:38 <rostayob> stepkut: yeah the fact that it's in RAM is really nice (:
20:07:00 <stepkut> Lemmih was talking about adding mmap support to IxSet so that you could have the option of putting stuff on disk too
20:07:59 <rostayob> that would be cool
20:08:21 <stepkut> cooler than redis and mongodb combined?
20:11:29 <rostayob> stepkut: yes (:
20:11:32 <stepkut> nice
20:11:43 <rostayob> well but redis is italian.
20:12:17 <rostayob> so maybe not cooler then redis ehe
20:12:22 <stepkut> MACID is multicultural
20:12:30 <rostayob> is it?
20:12:59 <rostayob> eheh
20:13:16 <rostayob> anyway yeah, if I can have a fast haskell db
20:13:22 <rostayob> I'd be reaaaaly happy
20:13:46 <stepkut> me too :)
20:13:59 <rostayob> but it would seriously threaten my degree lol
20:14:07 <rostayob> I already spend too much time on haskell
20:14:19 <stepkut> haskell is the only language I really know these days
20:14:40 <rostayob> because you work with it
20:14:45 <rostayob> I mean it's your job (:
20:14:47 <rostayob> right?
20:15:31 <stepkut> yeah
20:15:54 <rostayob> I hope I'll get to that point, or maybe just do a phd or something like that
20:15:56 <rostayob> boh
20:33:22 <rostayob> stepkut: there is still one problem with the current ixset, that is that you can only have an ascending list
20:33:35 <rostayob> for example if you want to order by date, you'll want the opposite