IRC logs for #openrisc Monday, 2013-05-13

--- Log opened Mon May 13 00:00:56 2013
-!- Netsplit *.net <-> *.split quits: zhai36508:27
-!- Netsplit *.net <-> *.split quits: zhai36508:36
-!- Netsplit over, joins: zhai36508:37
-!- Netsplit *.net <-> *.split quits: mboehnert11:21
-!- Netsplit over, joins: mboehnert11:25
--- Log closed Mon May 13 12:11:55 2013
--- Log opened Mon May 13 12:12:09 2013
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal]12:12
-!- Irssi: Join to #openrisc was synced in 18 secs12:12
stekernok, that branch prediction bug is at least fixed now...13:21
stekernstill no luck with the nfsboot13:22
--- Log closed Mon May 13 13:44:07 2013
--- Log opened Mon May 13 13:44:22 2013
-!- Irssi: #openrisc: Total of 21 nicks [0 ops, 0 halfops, 0 voices, 21 normal]13:44
-!- Irssi: Join to #openrisc was synced in 15 secs13:44
-!- Netsplit *.net <-> *.split quits: hno13:59
-!- Logxen- is now known as Logxen14:30
--- Log closed Mon May 13 17:13:42 2013
--- Log opened Mon May 13 17:13:57 2013
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal]17:13
!leguin.freenode.net [freenode-info] help freenode weed out clonebots -- please register your IRC nick and auto-identify: http://freenode.net/faq.shtml#nicksetup17:13
-!- Irssi: Join to #openrisc was synced in 19 secs17:14
-!- Netsplit *.net <-> *.split quits: juliusb, trem17:18
-!- Netsplit over, joins: trem17:21
-!- Netsplit *.net <-> *.split quits: larks17:43
juliusb_man this has been netsplitting badly lately19:55
stekernI'm netsplitting by kicking the ethernet switch power plug out of its socket20:01
stekern...and as I said, the nfsboot problem was something really silly... 'rw' is not a valid option20:03
juliusb_ahh is that all it was?? :P20:32
stekernyup :/20:35
stekernbut branch prediction seems stable now at least, and with the nfs root working, I can more easily test some more serious stuff than just running 'top' ;)20:38
juliusb_hardcore20:45
juliusb_that's some serious work20:45
juliusb_I haven't had much time lately :( still struggling to get my little flops cache working20:46
juliusb_so many annoying corner cases20:46
juliusb_it works with like 2,4,8 instructions but 16 it breaks20:46
juliusb_on exceptions20:46
juliusb_so your stuff with branch prediction is faster?20:46
stekernfaster than stalling on l.sfxx; l.b(n)f, yes20:47
stekernslower than resolving the branch completely in decode20:48
stekernthis is still completely based solely on coremark though20:48
stekernthe numbers are roughly: 80 for stall on sf;bf, 90 for resolving in decode and 87 with branch prediction20:50
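To put the three quoted CoreMark figures in proportion, here is a small arithmetic sketch. The scores (80, 90, 87) are taken from the message above; the function names are made up for illustration, and the log does not say what unit the scores are in.

```python
# Scores as quoted in the log (unit not stated there).
SCORES = {"stall": 80, "decode": 90, "predict": 87}


def gain_over(baseline, score):
    """Percentage improvement of `score` relative to `baseline`."""
    return (score - baseline) / baseline * 100.0


def gap_recovered(stall, decode, predict):
    """Fraction of the stall-to-decode gap that prediction closes."""
    return (predict - stall) / (decode - stall)


if __name__ == "__main__":
    s = SCORES
    print(round(gain_over(s["stall"], s["predict"]), 2))
    print(round(gain_over(s["stall"], s["decode"]), 2))
    print(round(gap_recovered(s["stall"], s["decode"], s["predict"]), 2))
```

On these numbers, prediction is about 8.75% ahead of stalling and recovers roughly 70% of the gain that resolving in decode gives.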
juliusb_ah cool20:52
juliusb_that's pretty close to resolving in decode!20:52
stekernit's just a simple static backwards taken, forward not, and the prediction is done in decode and then the real resolving is in execute20:53
juliusb_ah right20:54
stekernand I implemented it as 'flag prediction', so in execute you just check if the real flag is equal to the predicted flag20:54
juliusb_is it a lot more logic?20:54
stekernno, it's pretty simple logic and I haven't even tried to optimize it yet20:55
juliusb_cool20:56
stekernthe actual prediction logic is very simple, just compare the msb in the imm field20:56
stekern(and check if it's a bf or bnf to get the predicted flag)20:57
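The scheme described above (static backward-taken, forward-not-taken, expressed as a predicted flag) can be sketched roughly as follows. This is a behavioural model in Python, not the mor1kx RTL. The opcode values (0x04 for l.bf, 0x03 for l.bnf) and the 26-bit immediate layout are assumed from the OpenRISC 1000 instruction encoding, and the function names are hypothetical.

```python
OP_BNF = 0x03  # assumed opcode of l.bnf (branch if flag clear)
OP_BF = 0x04   # assumed opcode of l.bf  (branch if flag set)


def predict_flag(insn):
    """Decode stage: return the flag value the predictor expects.

    Backward branches (immediate MSB set, i.e. negative offset) are
    predicted taken; forward branches are predicted not taken.
    """
    opcode = (insn >> 26) & 0x3F
    if opcode not in (OP_BF, OP_BNF):
        raise ValueError("not a conditional branch")
    predict_taken = bool((insn >> 25) & 1)  # MSB of the 26-bit immediate
    # l.bf is taken when the flag is set, l.bnf when it is clear,
    # so the predicted flag is inverted for l.bnf.
    return predict_taken if opcode == OP_BF else not predict_taken


def mispredicted(predicted_flag, real_flag):
    """Execute stage: the prediction is wrong if the flags differ."""
    return predicted_flag != real_flag


def encode(opcode, offset):
    """Helper for the examples: build a branch word from a signed offset."""
    return (opcode << 26) | (offset & 0x3FFFFFF)
```

For example, `predict_flag(encode(OP_BF, -4))` predicts the flag set (a loop-style backward branch), while `predict_flag(encode(OP_BNF, -4))` predicts it clear. In both cases the branch itself is predicted taken.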
stekernthe control logic in the fetcher is a bit hairier20:57
juliusb_is it not complex and annoying to cancel the fetch that's wrong?20:57
juliusb_yeah, that was my guess20:57
stekernbasically, I'm looking if there is a cond branch in decode stage and if so and the mispredicted signal goes high, then gate all signals out to decode21:00
stekernthat's about it, but of course there were a lot of cases I had forgotten about ;)21:00
stekernthe last bug was that I forgot to gate the immu exceptions21:01
juliusb_yeah, I know what you mean, the major portions of the update are there, it's just neatening around the edges21:02
stekerneverything is in three (messy) commits here: https://github.com/skristiansson/mor1kx/commits/master21:03
stekernespresso fails this test btw: http://oompa.chokladfabriken.org/tmp/or1k-sfbf.S21:07
stekernI might take a closer look at it, but if I forget =P21:07
juliusb_nps I should check it out21:08
juliusb_can you submit a pull request to add that test to mor1kx-dev-env?21:08
stekernyeah, I will21:09
stekernI think there is one pending too21:10
stekern(pull request)21:10
juliusb_hmmm, I'd better check that21:15
juliusb_you're right21:16
juliusb_I don't like to say it... but maybe I just made my flop cache finally work?!21:18
stekernthat test is in that pull request too now21:18
juliusb_(regression still running)21:18
juliusb_great, thanks man21:18
stekernheh, famous last words(?)21:18
juliusb_It's so annoying, in the end because of the way the pipeline is basically made expecting delays on the bus when branches occur, I had to insert delays into the logic to handle the feeding of instructions out of the cache into the pipeline21:19
juliusb_feels kinda stupid21:19
juliusb_I could feed the instructions into it immediately, but no, somewhere somehow it'll break21:19
juliusb_really needs a rewrite for the fast-as-possible pipeline21:19
juliusb_you know what it is stekern , it's some of your adopted homeland's spirit. I'm listening to Sibelius' Finlandia :)21:20
juliusb_There's something about getting OpenRISC working and Finland21:20
juliusb_I will keep it on loop while the regression runs21:21
juliusb_:)21:21
juliusb_to be honest, I was thinking of ditching espresso21:21
juliusb_also21:21
juliusb_I mean, if you want the super smallest implementation, you might as well just go for pronto21:22
juliusb_maybe it's not hard to keep it21:22
juliusb_it's just more work21:22
stekernare they really that different?21:25
juliusb_not really, but getting more different21:25
juliusb_I just don't know who would use espresso over pronto21:25
stekernI've been thinking about if the delay slot perhaps could just be a parameter21:26
stekernin cappuccino21:26
juliusb_oh cool :)21:26
stekernI mean there's not _that_ much delay slot specific stuff, it's in the fetcher and the exceptions21:27
stekernI think I'll wait until it's boringly stable with the delay slot before I set out on that though21:28
juliusb_good plan21:28
stekernbut right now there's only one bug I know of, setting caches to 16kb on the atlys board makes it act up21:30
stekernoh, I almost forgot, I've been playing with cappuccino and milkymist again21:31
juliusb_oh yes?21:32
juliusb_an even milkier coffee?21:32
juliusb_:)21:33
stekernwith the set flag critical path cut away, we're able to run at 83 MHz (that's what the milkymist-ng soc runs at) and we get about the same results in coremark as lm3221:33
stekernwith the branch prediction in place21:33
stekernlm32: http://pastie.org/782126421:33
stekerncappuccino: http://pastie.org/782743821:34
stekernto be fair, I ran it compiled with gcc 4.5.1 too: http://pastie.org/782760121:35
stekernwe're about 1200 LUTs larger than lm32 in a comparable setup though :/21:36
juliusb_hmm interesting21:42
juliusb_so cappuccino performance is the same then?21:44
juliusb_I'm surprised lm32 is that good, I thought it was about the same as the OR120021:44
stekernin coremark at least, yes21:45
juliusb_hah, I'm just going to do some area comparisons for my new pronto design, and it's so nice running this quartus tool chain, it's so quick21:46
juliusb_in contrast, this afternoon, I kicked off just the backend for a Virtex 7 build (synthesis was done a couple of days ago, that takes about 8 hours)21:47
stekernat least as long as you keep your fingers away from the speed optimisation buttons ;)21:47
juliusb_So placement took about 3 hours, now it's trying to route21:49
stekern(same as or1200) naah, I've known all along that it's a good implementation21:49
juliusb_in an hour it's already done an initial route, and has just done a "Rip-up and Reroute"21:50
juliusb_these designs are beasts, these smaller ones are so much more manageable :)21:50
juliusb_oh yeah, why do you think the lm32 is good, just good ISA?21:50
stekernno, the isa is pretty much the same as or1k21:51
juliusb_so the implementation is simply better then?21:51
stekernthan or1200? is that so hard? ;)21:52
juliusb_no, not at all haha21:52
juliusb_well, it's not that bad really21:52
juliusb_in terms of performance, maybe a bit big, but definitely not nice to understand what's going on21:52
stekernthey do a trick with the multiplier that we might...ehrm...lend21:54
juliusb_oh really?21:54
juliusb_btw how big is cappuccino in LC again? you want to get it under 4k ideally right?21:54
stekernaround 480021:55
juliusb_pronto with serial multiply, shifter, is 3479LC, 1317FF21:56
stekern(mul) when you have the three-stage mul, instead of stalling execute stage, write the pipelined result in to the rf in wb stage21:57
stekernand interlock if instructions after the mul need the result21:57
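The multiplier trick described above can be illustrated with a toy timing model: the multiply does not hold up the execute stage, and a later instruction only waits (interlocks) if it reads the multiply's destination before the result is available. This is a simplified sketch, not lm32 or mor1kx behaviour. The three-cycle latency, the single-issue in-order model and all names are assumptions.

```python
MUL_LATENCY = 3  # assumption: result available three cycles after issue


def count_cycles(program):
    """Return the number of cycles needed to issue every instruction.

    `program` is a list of (op, dest, sources) tuples. An instruction
    issues as soon as the previous one has issued and all of its
    source registers are ready, so only true dependencies on a
    pending multiply cause a wait.
    """
    cycle = 0
    ready_at = {}  # register name -> cycle when its value is available
    for op, dest, sources in program:
        issue = max([cycle] + [ready_at.get(r, 0) for r in sources])
        latency = MUL_LATENCY if op == "mul" else 1
        ready_at[dest] = issue + latency
        cycle = issue + 1
    return cycle


if __name__ == "__main__":
    independent = [
        ("mul", "r3", ["r1", "r2"]),
        ("add", "r4", ["r5", "r6"]),
        ("add", "r7", ["r8", "r9"]),
    ]
    dependent = [
        ("mul", "r3", ["r1", "r2"]),
        ("add", "r7", ["r3", "r4"]),
    ]
    print(count_cycles(independent), count_cycles(dependent))
```

In this model the independent sequence issues back to back, while the dependent `add` waits until the multiply result is ready.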
stekernisn't threestage mul smaller than serial on cyclone iv?21:58
juliusb_oh, not sure!21:58
juliusb_could be21:58
stekern14 LC it claims to occupy here21:59
juliusb_we need some automated build system which does runs with a bunch of different parameters on each technology to see what the implementation stats are21:59
stekernyup, and run the tests against that21:59
juliusb_indeed :)21:59
stekerntoo21:59
stekernshouldn't be hard to just generate a parameter.v file that you just include in the top file22:01
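A minimal sketch of the idea above: generate one parameter file per configuration and include it from the top-level file. The parameter names and values in the example are purely illustrative, not a confirmed list of mor1kx parameters, and the output format (one `parameter` line per entry) is an assumption about how the include would be used.

```python
import itertools


def render_params(params):
    """Render a dict as Verilog `parameter` lines for an include file.

    Strings are emitted as quoted Verilog strings, everything else as
    integers. Names are sorted so the output is deterministic.
    """
    lines = ["// Auto-generated; do not edit by hand."]
    for name, value in sorted(params.items()):
        if isinstance(value, str):
            rendered = '"{}"'.format(value)
        else:
            rendered = str(int(value))
        lines.append("parameter {} = {};".format(name, rendered))
    return "\n".join(lines) + "\n"


def sweep(options):
    """Yield one parameter dict per combination of the given options."""
    names = sorted(options)
    for combo in itertools.product(*(options[n] for n in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    options = {
        "FEATURE_MULTIPLIER": ["SERIAL", "THREESTAGE"],
        "CACHE_SIZE_KB": [8, 16],
    }
    for index, params in enumerate(sweep(options)):
        with open("parameter_{}.v".format(index), "w") as handle:
            handle.write(render_params(params))
```

Each generated file could then be fed to a separate synthesis and test run, and the area and timing figures collected per technology.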
juliusb_ya22:01
juliusb_I'll get around to it one of these days22:01
stekernI got to take a look at migen when I was playing with milkymist too, seems pretty handy for soc generation, not sure I'd write cores in it22:05
stekernhttps://github.com/skristiansson/milkymist-ng-mor1kx/blob/master/milkymist/mor1kx/__init__.py22:05
stekernmor1kx instantiated in it22:05
juliusb_oh cool22:05
juliusb_yeah, fair enough :)22:06
juliusb_Python is the right language to do this in22:06
juliusb_you're going to call me a liar if I say this, but I think that for some reason, pronto-espresso mor1kx is _smaller_ with the flop-cache enabled?!22:07
juliusb_so 3479LC,1317FF with it disabled22:07
stekernare you looking at fitter or synthesis result?22:08
juliusb_Then 3283LC,1081FF with 4-word flop cache22:08
juliusb_umm, the map.rpt file22:08
juliusb_.map.rpt file22:08
juliusb_I must be doing something really wrong, because with the cache supposedly disabled, there's 588 flops in there, but with it enabled it's less ?!22:09
juliusb_ahhh I know why22:09
juliusb_whoops22:09
juliusb_I was supposed to set the parameter to "DISABLED" not "NONE"22:10
juliusb_:P22:10
stekern;)22:10
stekernmy numbers have been from the .fit.rpt file22:10
juliusb_ahh right ok22:10
juliusb_I'll check those instead22:10
stekernfor comparison between builds, it probably doesn't matter22:11
juliusb_cool well the regressions are passing for this flop cache guy, I think I'll do some committing this weekend, very busy rest of week :(22:11
juliusb_and want to test it on the board before I put it in there22:11
stekernand in the map file you don't get as many odd results from optimisations22:12
stekernI think22:12
stekernanyways, time for bed, now that everything is working ;)22:12
juliusb_ok, I'll reference fit results in future22:13
stekerntomorrow is a new day to break things22:13
juliusb_me too, night!22:14
juliusb_ah that looks better, pronto with no flopcache is 2991LC,762FF (from fit.rpt)22:15
juliusb_8 instruction flopcache takes us up to  3555LC, 1246FF22:20
juliusb_(fetch went from 319LC, 104FF to 939LC, 588FF)22:21
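The cost of the 8-instruction flop cache can be read off the fit.rpt figures quoted above with a little subtraction. The numbers are copied from the messages; the helper name is hypothetical.

```python
def delta(before, after):
    """Per-resource difference between two area reports."""
    return {key: after[key] - before[key] for key in before}


# Figures quoted in the log (from fit.rpt).
TOTAL_NO_CACHE = {"LC": 2991, "FF": 762}
TOTAL_8_ENTRY = {"LC": 3555, "FF": 1246}
FETCH_NO_CACHE = {"LC": 319, "FF": 104}
FETCH_8_ENTRY = {"LC": 939, "FF": 588}

if __name__ == "__main__":
    print(delta(TOTAL_NO_CACHE, TOTAL_8_ENTRY))
    print(delta(FETCH_NO_CACHE, FETCH_8_ENTRY))
```

On these figures the whole design grows by 564 LC and 484 FF, while the fetch stage alone grows by 620 LC and 484 FF. So all the extra flops are in fetch, and the rest of the design comes out 56 LC smaller.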
--- Log closed Tue May 14 00:00:57 2013
