--- Log opened Sun Aug 09 00:00:56 2009
12:22 < mightybyte> Anyone know the purpose of line 43 in Happstack.State.Control?
12:22 < mightybyte> http://bit.ly/17qEuL
12:23 < mightybyte> ...because _txConfig is never used
12:28 < Lemmih> Ack, who in their right mind would write code that ugly.
12:30  * stepcut is not thrilled about the embedded secret command line options 
12:30 < Lemmih> The name of the 'parseArgs' function is a legacy issue. It no longer returns any configuration but it's still used for configuring syslogger.
12:30 < stepcut> mightybyte: parseArgs had the side effect of exiting the program if the command-line can not be parsed
12:31 < stepcut> mightybyte: and setting some logging flags
12:32 < mightybyte> Ok, I understand the side effect, but could we get rid of the "_txConfig <-" then?
12:33 < stepcut> we should get rid of the whole parseArgs thing and do it right
12:35 < mightybyte> Hmmm, does +RTS -hy -p only get statistics for a fixed amount of time at program startup?
12:36 < mightybyte> I ran the program for ~3 minutes and the profile graph only shows up to about 55 seconds.
12:36 < Lemmih> mightybyte: Clock time or CPU time?
12:37 < mightybyte> Oh, I don't know what the X axis is.  It just says seconds.
12:37 < Lemmih> I bet it is CPU time.
12:37 < mightybyte> Yeah, that makes sense.
12:38 < mightybyte> Lemmih: Your idea of writing a checkpoint works well.  I got some interesting results.
12:38 < mightybyte> Here is a chronology of running my app:
12:38 < mightybyte> program start
12:38 < mightybyte> write checkpoint
12:38 < mightybyte> 40s: levelled off at 561m VIRT, 509m RES
12:38 < Lemmih> program end?
12:38 < mightybyte> 2m: started data queries
12:39 < Lemmih> You're using top?
12:39 < mightybyte> 2m 45s: query finished, 929m VIRT, 878m RES
12:39 < mightybyte> Yes
12:39 < Lemmih> (If you are, don't)
12:39 < mightybyte> 3m (approx): shutdown started
12:39 < mightybyte> write checkpoint (again)
12:39 < mightybyte> program end
12:40 < mightybyte> So after the initial checkpoint was written, top's memory usage levelled off.
12:40 < mightybyte> And then when I executed a few queries memory usage almost doubled.
12:40 < mightybyte> What should I use instead of top?
12:41 < Lemmih> You should use GHC.
12:41 < mightybyte> I did have profiling on in there
12:42 < Lemmih> Data from top is almost completely useless.
12:42 < mightybyte> Well, I wasn't intending it to be used for debugging as much as I want to see what my system thinks is going on.
12:43 < Lemmih> Allocating X megs for a fraction of a second will permanently increase your memory usage by X*3.
12:44 < Lemmih> As such, top can only tell how much memory your application used at its peak.
12:45 < mightybyte> Yes, but like I said in that email, allocating 900 megs for a fraction of a second is still a problem for me.
12:45 < Lemmih> GHC can tell you more exactly how much peak memory it used.
12:45 < Oejet> Lemmih: Because the data segment size is permanently increased?
12:45 < Lemmih> mightybyte: Tell GHC to use no more than, say, 100M.
12:46 < mightybyte> How do I do that?
12:46 < Lemmih> Oejet: No, GHC just never releases any memory.
12:46 < Lemmih> mightybyte: +RTS -M100M -RTS
12:46 < mightybyte> Ahh, but what happens if it needs 101M?
12:47 < Lemmih> It will GC and then blow up.
12:47 < Lemmih> You can't have it both ways.
12:47 < Lemmih> You either have to use 101M permanently or not at all.
12:48 < mightybyte> Right, I understand that, but none of that helps me figure out why it's using so much.
12:48 < Lemmih> Profiling should tell you. (:
12:49 < mightybyte> Here's the profile generated by the run I outlined above: http://mightybyte.net/~dgbeards/profile.ps
12:49 < Lemmih> Excellent.
12:49 < mightybyte> It does indeed look like that query is using a huge amount of memory, which seems unnecessary to me.
12:50 < Lemmih> Looks like the first hump is decoding and the second is encoding.
12:50 < Lemmih> What query?
12:50 < Lemmih> You shouldn't do any queries.
12:51 < mightybyte> I did a couple queries after the initial checkpoint finished and top indicated that memory usage had levelled off.
12:51 < Lemmih> Don't use top.
12:51 < mightybyte> I think the first hump is the checkpoint and the second hump is the queries.
12:51 < Lemmih> I mean it.
12:52 < mightybyte> top gives me a small shred of useful information while the program is running.
12:52 < Lemmih> No it doesn't.
12:52 < Oejet> The graph only shows max ~250MB allocated, and not ~900MB?
12:52 < Lemmih> It fills you up with misinformation.
12:52 < mightybyte> How doesn't it?  I think the derivative of the top graph is somewhat useful.
12:53 < Lemmih> It's not. What top tells you is always slightly wrong at best and horribly wrong at worst.
12:53 < mightybyte> Well, probably more the second derivative.
12:53 < Lemmih> Ask GHC instead.
12:53 < Lemmih> +RTS -s -RTS
12:53 < Oejet> mightybyte: What happens if you set -M200M?
12:54 < mightybyte> Oejet: Ok, let me try.
12:54 < Lemmih> mightybyte: -s will give you much more precise information.
12:54 < mightybyte> Oejet: I get "heap exhausted" pretty quickly.
12:55 < Oejet> Aha, and that should not happen with -M300M.
12:57 < mightybyte> Lemmih: But -s doesn't give me real-time information in a non-invasive way.
12:57 < Lemmih> mightybyte: Real-time information is overrated. Loading and saving the checkpoint can't be that slow.
12:57 < Oejet> [not that it helps with the real problem at hand]
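
The wish for in-process numbers instead of top can be sketched with GHC.Stats. This is an assumption layered on the log: the module shown here exists in current GHC releases (base 4.10 and later), not in the 2009 compiler under discussion, and the helper name peakMemory is hypothetical. It only returns data when the program is started with +RTS -T.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Peak live heap and peak memory in use, both in bytes, as seen by the
-- GHC runtime itself. Returns Nothing when RTS statistics are not enabled
-- (the program must be started with +RTS -T -RTS).
-- 'peakMemory' is a hypothetical helper name, not part of any library.
peakMemory :: IO (Maybe (Integer, Integer))
peakMemory = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then return Nothing
    else do
      stats <- getRTSStats
      return (Just ( fromIntegral (max_live_bytes stats)
                   , fromIntegral (max_mem_in_use_bytes stats) ))
```

The first component corresponds roughly to the "maximum residency" line of -s, the second to "total memory in use"; it can be polled from a running program without stopping it.
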
12:59 < mightybyte> Ok, here's a graph of another run with only the checkpoint without the subsequent queries.
12:59 < mightybyte> http://mightybyte.net/~dgbeards/profile-noqueries.ps
13:00 < Lemmih> Wonderful.
13:00 < mightybyte> ...which seems to confirm my hypothesis that the first hump in the first graph was the checkpoint and the second hump was the queries.
13:00 < Lemmih> This is great. This problem we can fix.
13:00 < mightybyte> Excellent
13:01 < mightybyte> Interestingly enough, this graph makes it look like it doesn't go above 200m, but my run with -M200M still crashed.
13:01 < Oejet> mightybyte: Could be because of the sampling gaps?
13:02 < Lemmih> That's to be expected. -M limits heap size.
13:02 < mightybyte> Yes, but this second graph seems to stay under 200M
13:02 < Lemmih> mightybyte: GC overhead is usually around 2x-3x.
13:02 < mightybyte> Oh, ok.
13:03 < Oejet> What is the S type?
13:03 < mightybyte> Here is the output from -s on the run that generated the second graph
13:04 < mightybyte>    8,034,841,752 bytes allocated in the heap
13:04 < mightybyte>   10,786,548,312 bytes copied during GC
13:04 < mightybyte>      221,339,880 bytes maximum residency (93 sample(s))
13:04 < mightybyte>        8,384,232 bytes maximum slop
13:04 < mightybyte>              455 MB total memory in use (7 MB lost due to fragmentation)
13:05 < mightybyte> top showed 510 megs at the highest point, so it seems reasonably accurate. ;)
13:05 < Lemmih> Peak state size: 221megs. Peak memory usage with GC overhead: 455megs.
13:05 < Oejet> Makes sense wrt. the GC overhead.
13:07 < Oejet> Lemmih: Will you reveal the "This problem we can fix." part soon? :)
13:08 < mightybyte> Ok, I'm running with -M300M now and it got through the checkpoint creation.
13:08 < Lemmih> Oejet: Well, let's make the suspense build a little while longer...
13:08 < mightybyte> Now I'll run that query.
13:09 < Lemmih> Oejet: The Binary instance for IxSet isn't very efficient.
13:09 < Lemmih> Oejet: It translates the set to a list and then takes the length of the list before serializing it.
13:10 < mightybyte> And the query exhausted the heap.
13:11 < mightybyte> Lemmih: So which is the problem?  The memory used in checkpointing or the memory used for the query?
13:12 < Lemmih> mightybyte: Those are two separate problems.
13:12 < mightybyte> Ok.  You think the query memory is actually a problem?
13:13 < Oejet> Hm, where exactly is that instance defined?
13:14 < Lemmih> Oejet: In happstack-ixset, I think.
13:15 < Oejet> happstack]$ grep -r "instance .*Binary" .     <-- gives only .../Data/Serialize.hs:instance Binary (VersionId a) ...
13:17 < stepcut> Oejet:
13:17 < stepcut> Oejet: happstack/happstack-ixset/src/Happstack/Data/IxSet.hs
13:17 < stepcut> instance (Serialize a, Ord a, Data a, Indexable a b) => Serialize (IxSet a) where
13:18 < stepcut> Oejet: there is an extra layer on top of Binary which adds the support for versioning and migration
13:18 < Oejet> stepcut: Ah, thanks.
13:21 < Oejet> "putCopy = contain . safePut . toList"
13:22 < stepcut> yep
13:23 < mightybyte> Is there a "SizedList" data structure anywhere that would allow a "list" to be used here and reduce the overhead for calculating the length of the list?
13:26 < stepcut> mightybyte: I don't think that is needed
13:27 < mightybyte> Yeah I agree.  I was thinking more broadly than just Happstack.
13:27 < Oejet>     putCopy lst
13:27 < Oejet>         = contain $
13:27 < Oejet>           do put (length lst)
13:27 < Oejet>              getSafePut >>= forM_ lst
13:27 < stepcut> mightybyte: I believe the problem is that we convert the ixset to list (which can be done lazily), but then we force the list to calculate the length (which sucks up ram). But, we can just calculate the length of the list from the ixset
13:29 < stepcut> A SizedList would have to be exported abstract I think...
13:30 < stepcut> (that is not a problem, just a thought)
13:30 < Lemmih> Oejet: Thought more about LHC?
13:30  * stepcut wonders how to rotate the output of hp2ps
13:32 < Oejet> So the problem is that "data IxSet a = ISet [a] | IxSet [Ix a]" does not contain the size of the lists?
13:33 < Oejet> Lemmih: Yes, a bit. Will have to think much more.
13:35 < stepcut> Oejet: I think if we can call IxSet.size without increasing the overhead, then we should be ok?
13:36 < mightybyte> stepcut: Evince -> Edit -> Rotate Right :)
13:37 < stepcut> Oejet: I think the spines of the lists inside the IxSet will already be fully evaluated, so that should be feasible. It's converting the IxSet to a different list, and then finding the length of that list which is problematic..
13:37 < stepcut> mightybyte: ah, I looked under View, not sure why I didn't try Edit.
13:38 < stepcut> my profile results with hy only show: [] (,) S
13:38  * stepcut wonders what he did wrong, if anything
13:39 < mightybyte> I compiled with -prof -auto-all -caf-all
13:39 < stepcut> I'll add -caf-all and try again
13:39 < mightybyte> ...as shown in RWH
13:39 < stepcut> did you recompile all of happstack with those flags, or just your app ?
13:40 < Oejet> stepcut: Ah, so the problem is the copy of the list inside IxSet?
13:40 < stepcut> Oejet: um.. I think the problem is that we make a new list based on the one inside IxSet and then find the length of the new list
13:41 < Oejet> stepcut: The copy created by "contain . safePut . toList", I mean.
13:41 < stepcut> yeah
13:41 < stepcut> Oejet: so, we could instead make a custom putCopy for IxSet, and instead of put (length lst) it would start with, put (size ixset) >> getSafePut >>= forM_ lst, or something
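
The idea of writing the size first and then streaming the elements can be sketched with plain Data.Set and Data.Binary. This is only a model under stated assumptions: it uses Binary rather than Happstack's Serialize, and putViaList, putViaSize and bytes are hypothetical names introduced for illustration.

```haskell
import Data.Binary (Binary, put)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Set as Set
import Data.Word (Word8)

-- | List-based encoding: the Binary instance for lists writes
-- 'length xs' first, which walks the whole list before any element
-- is written, so the list cannot be consumed incrementally.
putViaList :: Binary a => Set.Set a -> Put
putViaList s = put (Set.toList s)

-- | Size-first encoding: take the element count from the set itself,
-- then stream the elements. The list is only traversed once.
putViaSize :: Binary a => Set.Set a -> Put
putViaSize s = do
  put (Set.size s)
  mapM_ put (Set.toList s)

-- | Helper to inspect the produced bytes.
bytes :: Put -> [Word8]
bytes = BL.unpack . runPut
```

Both functions are meant to produce the same bytes; only the way the length is obtained differs.
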
13:42 < mightybyte> stepcut: I only compiled my app with those flags.  I built everything else with --enable-executable-profiling --enable-library-profiling
13:42 < Oejet> stepcut: ...which is not really a copy, because it removes duplicates, right?
13:42 < stepcut> Oejet: duplicates?
13:43 < Oejet> stepcut: toList = Set.toList . toSet    <-- Why do that, if not to remove duplicates of the list?
13:44 < stepcut> Oejet: IxSet has several different internal representations. toSet just turns an IxSet into a normal Set.
13:45 < Oejet> Yes, "Ix a" is pretty complicated.
13:46 < stepcut> Oejet: An IxSet is like a Set, except you can also have indexes to look things up. the toList is a simple implementation for getting from an IxSet to a List by creating a normal Set as an intermediate step
13:47 < stepcut> Oejet: on the surface it appears to me that many functions in IxSet are not very efficient, but it's hard to say without profiling...
13:47 < stepcut> Oejet: for example, IxSet.size converts the IxSet to a Set and then finds the size of the Set.
13:48 < stepcut> to do that conversion it has to convert a Map -> Set
13:48 < stepcut> that can't be good for memory or CPU usage..
13:50 < mightybyte> Would it be worth it to rewrite IxSet (or maybe create a completely new structure) that maintains one store for all the items and then separate Maps with indexes into the main store instead of (Map key (Set a))?
13:51 < mightybyte> Then all instances of toList could just reference the already existing main store.
13:53 < stepcut> how would the separate Maps with indexes into the main store work?
13:54 < Oejet> There are Serialize instances for Map and Set. Why not use them directly instead of converting to lists?
13:55 < mightybyte> Well, you could have "main store = Array" and indexes be "Map key (Set Word64)"
13:55 < mightybyte> Where Word64 is the index into Array
13:55 < stepcut> Oejet: I think it should be possible to change the putCopy to,     putCopy = contain . safePut . toSet, but I am not sure it would help anything
13:56 < stepcut> Oejet: we don't want to directly serialize the IxSet because when you deserialized it you would lose sharing
13:56 < mightybyte> Yeah, because toSet = Map.fold Set.union Set.empty
13:57 < mightybyte> s/Yeah, //
13:57 < mightybyte> How expensive is that?
13:57 < stepcut> how expensive is what?
13:57 < mightybyte> that toSet
13:58 < stepcut> depends on how many keys there are for the first index in your IxSet
13:59 < stepcut> if your first index is a key that points to every element in the set, then it would be quite cheap
13:59 < mightybyte> My first key is the primary key, so there's one key for every item.
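
The cost difference described here can be sketched with a toy index type. This is not the real IxSet representation; Index, primaryIndex and singleKeyIndex are hypothetical names, and Map.foldr stands in for the older Map.fold mentioned in the log.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | Toy model of one IxSet index: each key maps to the set of values
-- stored under that key.
type Index k a = Map.Map k (Set.Set a)

-- | Rebuild the full set by unioning every per-key set.
toSet :: Ord a => Index k a -> Set.Set a
toSet = Map.foldr Set.union Set.empty

-- | Primary-key style index: one key per item, so 'toSet' performs one
-- union per element.
primaryIndex :: Ord a => [a] -> Index a a
primaryIndex xs = Map.fromList [(x, Set.singleton x) | x <- xs]

-- | Single-key index: one key that points at every element, so 'toSet'
-- performs a single union with the empty set.
singleKeyIndex :: Ord a => [a] -> Index () a
singleKeyIndex xs = Map.singleton () (Set.fromList xs)
```

Both indexes describe the same set; they differ only in how many unions toSet has to perform.
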
13:59  * stepcut looks at something
13:59 < Oejet> stepcut: Understood.
14:03 < stepcut> mightybyte: I think if you did something like, $(inferIxSet "AccountIxSet" ''Account 'noCalcs [''Account, ''Username, ''UserId]), where your first index is the same as the type stored in the IxSet, then toSet would be faster, because it would only Map.fold  over a single element
14:03 < stepcut> though, you pay the price of having to maintain the index all the time
14:04 < Oejet> Sharing could maybe be re-introduced when de-serializing, or is it only for on-disk deflation?
14:06 < stepcut> mightybyte: I don't think it would save any space overall, since instead of creating the Set on the fly when you call toSet, it would just always have that Set hanging around
14:07 < stepcut> Oejet: Sharing is currently reintroduced when deserializing.
14:07 < mightybyte> True, but I was mainly thinking about any overhead involved in the fold and union
14:07 < stepcut> Oejet: because it reads the list of values from the checkpoint and then rebuilds the IxSet
14:10 < stepcut> mightybyte: it may be cheaper overall to construct that Set on an as-needed basis, rather than maintain something that is always 'up-to-date', with the tradeoff being that your response time goes up when you do need that Set.
14:10 < mightybyte> I guess if all the pointers are shared, then we implicitly already have the basic scheme I described with "main store = Array"
14:10 < stepcut> So, it's a trade off between latency and amortized cost
14:10 < mightybyte> Yeah
14:10 < mightybyte> I guess right now I'm not as concerned about latency as I am about memory footprint.
14:11 < stepcut> mightybyte: right, in an 'IxSet a' there should only be one copy of each value of type 'a'. And all of the different index Maps point to those same memory locations. That is why if we directly serialized the IxSet structure and restored it, bad things would happen. You would end up with a copy of 'a' for every index Map instead of only one 'a'
14:12 < mightybyte> Got it
14:14 < mightybyte> That's where my "main store" idea would help a little
14:15 < mightybyte> "main store" is basically an up-to-date union.
14:21 < mightybyte> Holy crap, after running a bunch of different queries, my memory usage jumped up to 1.2G according to top.
14:21 < mightybyte> Let's see what -s has to say
14:23 < mightybyte> Hmmmm:
14:23 < mightybyte>      381,520,504 bytes maximum residency (15 sample(s))
14:23 < mightybyte>             1062 MB total memory in use (16 MB lost due to fragmentation)
14:24 < stepcut> mightybyte: It should be possible to extend the current IxSet implementation so that the first Ix is a Data.Map with one key that points to all the elements in the IxSet
14:25 < mightybyte> Yeah, I'm doing that anyway.  Maybe there should be a way to tell IxSet which of your keys is the primary key.
14:25 < Oejet> stepcut: size' is an attempt to calculated the size of an IxSet without needing more than constant space: http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3256
14:26 < Oejet> *calculate
14:28 < mightybyte> I guess I can't complain too much about the maximum residency there.  But 3x that for total memory use is somewhat problematic.
14:28 < Oejet> I mean without more than constant _extra_ space. The other functions are just copied for easy comparison.
14:31 < stepcut> Oejet: that is still going to have to create an intermediate Set though, and then calculate the size of that Set, yes?
14:34 < Oejet> stepcut: Yes, and it cannot do it lazily, because it has to know _all_ values of the set before knowing its final size.
14:34 < stepcut> Oejet: right
14:36 < stepcut> here is what my profiling graph looks like, http://src.seereason.com/~jeremy/rs.ps
14:37 < mightybyte> And top probably says you're using ~500 megs?
14:38 < stepcut> yeah
14:38 < mightybyte> Man, that GC overhead kills.
14:38 < stepcut> at least, I didn't look
14:38 < Oejet> "S" in the profiling graph means "Set"?
14:39 < stepcut> no idea
14:39 < stepcut> I don't think so though
14:40 < mightybyte> The Version instance for [] shows that it is a Primitive.  Does this prevent our ability to automatically migrate the Happstack users' data if we change the Serialize instance?
14:41 < stepcut> Oejet: I think there is actually a type named S with a constructor S somewhere
14:42 < Oejet> stepcut: At least with size' it would not be caused by list allocation.
14:42 < stepcut> Oejet: yeah, would be interesting to profile and see if it makes a difference
14:43 < Oejet> Do it, do it.
14:45 < mightybyte> But how will that change anything regarding serialization?  "putCopy lst = contain $ do put (length lst)"
14:46 < stepcut> mightybyte: we would need to change the IxSet instance as well so that it used put (size ixset)
14:48 < mightybyte> Ok
14:54 < mightybyte> Oejet: I'm not seeing how size' is really any different from size.
14:58 < Oejet> mightybyte: It is not really. Only in the case: size' (ISet lst) = length lst, where it avoid allocating the set.
14:59 < Oejet> *avoids
14:59 < mightybyte> Oh, right.  But it doesn't seem like that case happens very often.
15:10 < Oejet> mightybyte: In your memory profiles can you identify the intermediate set[1], which is at once converted to a list? putCopy (IxSet is) = contain . safePut . Set.toList .[1] toSet $ is
15:13 < mightybyte> Hmmm, I don't think so.
15:16 < Oejet> I would expect the set to be at least as big as the list.
15:25 < mightybyte> Ok, how does this look?
15:25 < mightybyte> http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3257
15:27 < stepcut> mightybyte: seems right, does it work?
15:27 < mightybyte> Haven't tried it yet.
15:27 < stepcut> instead of toList you could use toSet
15:27 < stepcut> nm.
15:28 < mightybyte> forM_ requires a list
15:28 < stepcut> right, I got mixed up temporarily
15:28 < mightybyte> Oh, unless there's a Foldable instance
15:29 < stepcut> mightybyte: there is
15:29 < stepcut> instance Foldable Set -- Defined in Data.Set
15:29 < mightybyte> Hmmm, that would probably be better.
15:37 < mightybyte> Weird, the maximum residency and total memory in use reported by -s did not change at all
15:38 < mightybyte> And the new compiled binary is exactly the same size.  That doesn't seem quite right.
15:39 < mightybyte> Oh, I guess I have to rebuild all of happstack--can't get by with only building -ixset.
15:52 < mightybyte> Crap, that change gave me a "wrong serialization type" error.
15:54 < mightybyte> I guess it needs a migration.
15:55 < mightybyte> But how do you do a migration when only the serialization format (but not the underlying type) has changed?
15:56 < stepcut> it should not need migration
15:56 < mightybyte> Apparently it's no longer getting boxed in a [] constructor.
15:56 < stepcut> it ?
15:57 < mightybyte> the serialized representation
15:57 < stepcut> it seems like the change you made should have only changed the way you calculate the length, but that the bytes you write out should be exactly the same...
15:58 < mightybyte> That's what I thought, but I think some type information is also written.
15:58 < stepcut> not that I know of
15:59  * stepcut thinks
16:01 < stepcut> how come you have:
16:02 < stepcut> forM_ (toList ixset) safePut
16:02 < stepcut> instead of
16:02 < stepcut>              getSafePut >>= forM_ (toList ixset)
16:02 < mightybyte> Because getSafePut isn't exported.
16:02 < stepcut> ah
16:02 < mightybyte> :(
16:03 < stepcut> maybe it doesn't matter
16:11 < stepcut> seems to work for me
16:12 < mightybyte> Hmmmm
16:14 < stepcut> http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3258
16:14 < mightybyte> Not for me.  I just tried it again.
16:15 < stepcut> if you revert to the old instance does it work?
16:15 < mightybyte> Yeah
16:15 < mightybyte> I'm testing it on my actual app.
16:15 < stepcut> mysterious
16:16 < mightybyte> Let me try your test program.
16:28 < mightybyte> I wonder if it's an extra contain $ that gets put around the lists.
16:45 < stepcut> maybe (size ixset) is returning a different length than (length $ toList ixset) ?
16:45 < mightybyte> Hmmm, possibly.  I'm investigating to see if there was something other than this code change that could be causing the mismatch.
16:46 < stepcut> yeah
16:54 < mightybyte> How do you handle compiling from several different development codebases in different projects that depend on each other?
16:54 < mightybyte> ...when the code you want to compile from isn't in hackage, it seems like cabal doesn't really help you.
16:55 < gwern> cabal-install works fine in a source dir
16:56 < mightybyte> Yeah, but you still have to manually compile dependencies that you are working with locally.
16:57 < mightybyte> So at my base level I've got happstack.
16:57 < gwern> well sure. how could it be smart enough to track down random deps on your file system?
16:57 < mightybyte> On top of that, I've got happstack-auth and happstack-facebook
16:57 < gwern> any attempt to make it that smart would probably backfire quite nastily a non-insignificant amount of the time
16:57 < mightybyte> Right, I guess that just means I need some manually created build scripts.
16:57 < gwern> if you really need to recompile that often, then perhaps one shouldn't have separate packages
16:58 < mightybyte> Well, right now I'm testing some mods to happstack, but I'm using my app as a test.
17:01 < stepcut> mightybyte: I use the autobuilder
17:02 < stepcut> mightybyte: though, it's not very fast to do things that way
17:02 < stepcut> mightybyte: sometimes I just do lots of -i../../happstack-facebook, etc
17:12 < mightybyte> It looks like nothing else is causing the mismatch.
17:13 < stepcut> mysterious
17:14 < mightybyte> I was wondering if there was an extra contain in the old version.
17:17 < stepcut> This is the format of a Serialized IxSet using the normal code:
17:17 < stepcut> *I> unpack (serialize (fromList [99::Int]))
17:17 < stepcut> [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,99]
17:18 < stepcut> it's three Ints, 0, 1, 99
17:18 < stepcut> the first Int is the version of the IxSet (version 0), the second is the number of elements in the ixset, 1, and then the elements of the list
17:20 < stepcut> the new code outputs the same thing
17:20 < stepcut> as far as I can tell
17:21 < mightybyte> Maybe the problem is further down.
17:21 < stepcut> further down?
17:22 < stepcut> so, your code is failing to restore, so it is the deserialization code that is failing, right ?
17:22 < mightybyte> Maybe into Happstack-State, although it wouldn't seem so.
17:22 < mightybyte> Yeah
17:26 < mightybyte> Ok, check this out
17:26 < mightybyte> I modified Usage.hs in the IxSet package.
17:27 < mightybyte> http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3259
17:28 < mightybyte> When I use that test to generate a checkpoint using the old serialization, and then generate another checkpoint using the new serialization, the two checkpoint files differ.
17:28 < stepcut> they shouldn't :(
17:31 < mightybyte> I just pasted a hex dump of the two checkpoints to that same link
17:31 < mightybyte> 3 bytes are different.
17:31 < mightybyte> The difference is on the 000050 line
17:32 < stepcut> yeah
17:32 < stepcut> I get the same results
17:33 < mightybyte> That's what I meant by "further down" :)
17:34 < Lemmih> Are you on a 32bit box?
17:34 < mightybyte> No, 64bit
17:36 < Lemmih> Ah.
17:36 < Lemmih> No, that can't be it.
17:37 < Lemmih> Well, it's worth a try.
17:37 < mightybyte> Heh
17:37 < mightybyte> Now I've got no idea where to look.
17:37 < Lemmih> Have you looked at the Serialize instance for [a]?
17:37 < Lemmih> It looks like this:
17:37 < mightybyte> yes
17:37 < Lemmih> Oh, ok.
17:38 < mightybyte> I was looking at it when I wrote the new IxSet instance
17:38 < Lemmih> Then why didn't you write the Serialize instance for IxSet to match?
17:38 < mightybyte> Because those functions aren't exported.
17:38 < Lemmih> Ah.
17:38 < mightybyte> And stepcut's test passes.
17:38 < Lemmih> But they're kinda important.
17:40 < Lemmih> 'forM_ list safePut' will save the version number for each element.
17:40 < mightybyte> The one here: http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3258
17:40 < Lemmih> 'safePut list' will only save the version number once.
17:40 < mightybyte> The two checkpoint files are the same size.
17:41 < Lemmih> Yeah, hence my "No, that can't be it".
17:41 < Lemmih> But it's still worth a try. Just to make it less wrong.
17:42 < stepcut> The serialized IxSet itself does not start until byte 75 I believe
17:42 < stepcut> I forget what comes before that
17:42 < stepcut> esh
17:44 < Lemmih> Have you tried serializing the set as a list?
17:46 < Lemmih> (Just to verify some basic assumptions.)
17:50 < stepcut> Lemmih: the IxSet part of the checkpoint seems fine
17:51 < stepcut> oh, I know what those different bytes might be
17:51 < Lemmih> Are you absolutely sure?
17:51 < stepcut> Lemmih: the serialized IxSet does not start until after the different bytes. The different bytes are in the serialized context...
17:52 < stepcut> Lemmih: I think the differing bytes might be the random number seed...
17:52 < Lemmih> Ah, excellent. So they /are/ identical.
17:53 < stepcut> Lemmih: yeah, I am currently believing that the checkpoint files are identical except for the random number seed, which would make sense
17:53 < Lemmih> (Trying to serialize the set as a list would have shown this, btw)
17:53 < Lemmih> stepcut: Does it also work for versioned data? Tuples, chars and ints are unversioned primitives.
17:54 < stepcut> Lemmih: good question
17:55 < mightybyte> What's the random number seed?
17:56 < stepcut> Lemmih: seems like versioned types might be busted
17:57 < Lemmih> mightybyte: Each event has an ID number, a random seed and a timestamp associated with it.
17:57 < stepcut> mightybyte: as you know, you can't use IO inside query or update events, because otherwise you would not be able to replay the events
17:57 < mightybyte> Lemmih: What do you mean by trying to serialize as a list?  Isn't that what it was doing before?
17:57 < mightybyte> Ahhh
17:58 < stepcut> mightybyte: but, the event system provides its own pseudo-random number generator (by storing the random seed in the context), since it's only pseudo-random *and* the seed is known, it can generate the same random numbers when it replays the events
17:58 < stepcut> it's a little known feature of the event system
17:59 < mightybyte> Ok
17:59 < Lemmih> mightybyte: Exactly. If unrolling the code change doesn't solve the error then the code change isn't the cause of the said error.
17:59 < stepcut> Happstack.State.Util contains getRandom and getRandomR which use the seed in the context to generate random numbers
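
Why a stored seed makes replay deterministic can be sketched with a toy generator. Nothing here is Happstack's actual code: Context, step and randomsFor are hypothetical names, and the generator is a simple linear congruential one chosen only for illustration (the arithmetic assumes a 64-bit Int).

```haskell
-- | Toy event context: an event ID plus the random seed recorded for it.
data Context = Context
  { ctxEventId :: Int
  , ctxSeed    :: Int
  }

-- | One step of a simple linear congruential generator.
step :: Int -> Int
step s = (s * 1103515245 + 12345) `mod` 2147483648

-- | The first n pseudo-random numbers for an event. They depend only on
-- the seed stored in the context, so replaying the event with the same
-- context yields exactly the same numbers.
randomsFor :: Context -> Int -> [Int]
randomsFor ctx n = take n (tail (iterate step (ctxSeed ctx)))
```

Because the seed is part of the serialized context, two checkpoints taken from separate runs would be expected to differ in those bytes even when the state itself is identical.
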
18:00 < mightybyte> Lemmih: I did that...on my actual app, not these small test cases.
18:01 < stepcut> ok, I added a new version of my test program that uses an IxSet Foo instead of IxSet Int, and it fails
18:02 < Lemmih> happstack-data should probably export getSafeGet and getSafePut.
18:02 < stepcut> Lemmih: yeah, I think that is probably the fix
18:02 < Lemmih> They could use some documentation as well.
18:03 < stepcut> with the old code the serialized code is 51 bytes, with the new code it is 67
18:03 < mightybyte> Yeah, that makes sense
18:03 < mightybyte> Ahhhh
18:06 < mightybyte> Is that the one you just pasted?
18:07 < stepcut> yeah
18:07 < stepcut> it adds Foo
18:09 < stepcut> the normal instance, only encodes the version number of Foo once, and assumes all elements in a list will be the same version number
18:09 < mightybyte> Ok yeah, it fails for me too.
18:09 < stepcut> the new instance encodes the version number along with each Foo value
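
The format difference just described can be modelled in a few lines. This is a toy under stated assumptions, not happstack-data's code: Foo, fooVersion, putVersionOnce and putVersionEach are hypothetical, and plain Data.Binary stands in for Serialize.

```haskell
import Data.Binary (put)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as BL

-- | A stand-in for a versioned user type.
newtype Foo = Foo Int

-- | The version tag written alongside Foo values (an arbitrary choice).
fooVersion :: Int
fooVersion = 1

-- | Payload only, without any version tag.
putFooBody :: Foo -> Put
putFooBody (Foo n) = put n

-- | Old layout: length, then the version once, then the bare elements.
putVersionOnce :: [Foo] -> Put
putVersionOnce xs = do
  put (length xs)
  put fooVersion
  mapM_ putFooBody xs

-- | New layout: length, then a version tag in front of every element.
putVersionEach :: [Foo] -> Put
putVersionEach xs = do
  put (length xs)
  mapM_ (\x -> put fooVersion >> putFooBody x) xs

-- | Encoded size in bytes.
encodedSize :: Put -> Int
encodedSize = fromIntegral . BL.length . runPut
```

For n elements the second layout carries n - 1 extra version tags, so a reader expecting the first layout fails as soon as the element type is versioned; for unversioned primitives such as Int neither layout writes a tag, which is why the Int-only test could not show the problem.
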
18:09 < mightybyte> It wasn't failing at first, but that's because I had the new ixset code installed.
18:10 < stepcut> yeah, I think the new code is self-consistent, but uses a different format than the old code. I would export getSafeGet and getSafePut so that you can match the formats
18:11 < mightybyte> ...working on that...
18:11 < stepcut> looking at the code now it is obvious. getSafePut does an explicit, B.put vs
18:12 < mightybyte> Yeah, I originally wanted to do that when I was writing it, but was put off by the lack of an export.
18:13 < stepcut> seems necessary and useful to export it. I don't see a downside yet..
18:13 < mightybyte> Yeah, I agree.
18:13 < stepcut> Lemmih is making a crazy face at me...
18:13 < mightybyte> Heh
18:14 < stepcut> facebook used their psychic powers and suddenly decided that we should be facebook friends..
18:15 < stepcut> oh wait
18:15 < stepcut> I got that wrong
18:15 < stepcut> Lennart Augustsson is making the face at me
18:15  * Lemmih grins.
18:24 < mightybyte> Ok, it looks like this version works
18:24 < mightybyte> http://moonpatio.com/fastcgi/hpaste.fcgi/view?id=3256#a3262
18:24 < mightybyte> At least it passes stepcut's test.
18:33 < mightybyte> Hah, so after all that, my app's memory usage goes *up* with the new version.
18:33 < mightybyte> (although not much)
18:33 < Lemmih> Heh.
18:37 < stepcut> so, is, (replicateM n safeGet), going to force the entire list into memory before fromList gets a chance to turn it into an IxSet?
18:39 < Lemmih> I don't think so.
18:40 < mightybyte> How would we go about handling the migration if we wanted to switch to read and show for IxSet serialization?
18:41 < Lemmih> Why would we want that?
18:43 < Lemmih> Shouldn't be difficult. IxSet is a normal versioned data-type. I just don't see the benefit.
18:43 < mightybyte> Mainly to convince myself of what's going on by trying something we know is constant space.
18:43 < mightybyte> No concrete benefit unless it turns out that it does yield a space decrease that the approach I just tried doesn't get.
18:44 < stepcut> I wonder if we could use generics to calculate the size of the data-structure in memory
18:44 < mightybyte> THAT is what I would really like. :)
18:44 < stepcut> gize?
18:45 < stepcut> gsize ?
18:45 < stepcut> something like gsize, but that actually reports the number of bytes used
18:47 < Lemmih> I wouldn't get my hopes up.
18:47 < mightybyte> Hmm, does gsize give number of words?
18:48 < mightybyte> gsize (5::Int) = 1
18:48 < mightybyte> gsize ([1..10]::[Int]) = 21
18:48 < stepcut> gsize gives the number of constructors I think
18:48 < stepcut> gsize :: Data a => a -> Int
18:48 < stepcut> gsize t = 1 + sum (gmapQ gsize t)
18:52 < stepcut> I don't see how to do it using just normal Generics
18:52 < mightybyte> No sizeof function? :)
18:52 < stepcut> but, we could extend Serialize/deriveSerialize perhaps
18:53 < stepcut> there is a sizeOf function in Foreign.Marshal, but you can't use that inside gsize
18:54 < stepcut> also, I don't think that sizeOf returns the value we want anyway
18:58 < stepcut> gsize * 4 might be a reasonable estimate though
19:00 < mightybyte> Yeah, it actually might...since it drills all the way down through all constructed types.
19:02 < mightybyte> Or maybe *8 on 64bit machines.
19:03 < stepcut> yeah
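
The gsize-times-word-size estimate can be tried out with Data.Data from base. The gsize definition is the one quoted in the log; estimateBytes is a hypothetical helper, and the result is only a rough guess at heap usage, not a measurement.

```haskell
import Data.Data (Data, gmapQ)
import Foreign.Storable (sizeOf)

-- | Count the constructors in a value by recursing through all
-- immediate subterms (the definition quoted in the log).
gsize :: Data a => a -> Int
gsize t = 1 + sum (gmapQ gsize t)

-- | Crude size estimate: one machine word per constructor.
-- 'sizeOf (0 :: Int)' is 8 on a 64-bit machine and 4 on a 32-bit one.
estimateBytes :: Data a => a -> Int
estimateBytes t = gsize t * sizeOf (0 :: Int)
```

A ten-element [Int] counts ten cons cells, ten Int values and one nil, which is where the 21 reported in the log comes from.
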
19:05 < mightybyte> Hmmm, Char8 ByteStrings always have a gsize of 4
19:21 < stepcut> mightybyte: yeah, it's just a hack :(
19:22 < stepcut> really need something more like this, http://hpaste.org/fastcgi/hpaste.fcgi/view?id=8065#a8065
19:22 < stepcut> except correct ;)
19:22 < mightybyte> Yeah
19:22 < mightybyte> Seems like ghc should have some infrastructure like that.
19:23 < mightybyte> I just asked in #ghc, but it seems to be pretty empty.
19:23 < stepcut> yeah, probably have to ask haskell-cafe or something
19:24 < mightybyte> I don't read haskell-cafe very much.
20:31 < mae_phone> hello
--- Log closed Mon Aug 10 00:00:56 2009