IRC logs for #openrisc Monday, 2013-05-13

--- Log opened Mon May 13 00:00:56 2013
-!- Netsplit .net <-> .split quits: zhai365		08:27
-!- Netsplit .net <-> .split quits: zhai365		08:36
-!- Netsplit over, joins: zhai365		08:37
-!- Netsplit .net <-> .split quits: mboehnert		11:21
-!- Netsplit over, joins: mboehnert		11:25
--- Log closed Mon May 13 12:11:55 2013
--- Log opened Mon May 13 12:12:09 2013
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal]		12:12
-!- Irssi: Join to #openrisc was synced in 18 secs		12:12
stekern	ok, that branch prediction bug is at least fixed now...	13:21
stekern	still no luck with the nfsboot	13:22
--- Log closed Mon May 13 13:44:07 2013
--- Log opened Mon May 13 13:44:22 2013
-!- Irssi: #openrisc: Total of 21 nicks [0 ops, 0 halfops, 0 voices, 21 normal]		13:44
-!- Irssi: Join to #openrisc was synced in 15 secs		13:44
-!- Netsplit .net <-> .split quits: hno		13:59
-!- Logxen- is now known as Logxen		14:30
--- Log closed Mon May 13 17:13:42 2013
--- Log opened Mon May 13 17:13:57 2013
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal]		17:13
!leguin.freenode.net [freenode-info] help freenode weed out clonebots -- please register your IRC nick and auto-identify: http://freenode.net/faq.shtml#nicksetup		17:13
-!- Irssi: Join to #openrisc was synced in 19 secs		17:14
-!- Netsplit .net <-> .split quits: juliusb, trem		17:18
-!- Netsplit over, joins: trem		17:21
-!- Netsplit .net <-> .split quits: larks		17:43
juliusb_	man this has been netsplitting badly lately	19:55
stekern	I'm netsplitting by kicking the ethernet switch power plug out of its socket	20:01
stekern	...and as I said, the nfsboot problem was something really silly... 'rw' is not a valid option	20:03
juliusb_	ahh is that all it was?? :P	20:32
stekern	yup :/	20:35
stekern	but branch prediction seems stable now at least, and with the nfs root working, I can more easily test some more serious stuff than just running 'top' ;)	20:38
juliusb_	hardcore	20:45
juliusb_	that's some serious work	20:45
juliusb_	I haven't had much time lately :( still struggling to get my little flops cache working	20:46
juliusb_	so many annoying corner cases	20:46
juliusb_	it works with like 2, 4, 8 instructions but with 16 it breaks	20:46
juliusb_	on exceptions	20:46
juliusb_	so your stuff with branch prediction is faster?	20:46
stekern	faster than stalling on l.sfxx; l.b(n)f, yes	20:47
stekern	slower than resolving the branch completely in decode	20:48
stekern	this is still completely based solely on coremark though	20:48
stekern	the numbers are roughly: 80 for stall on sf;bf, 90 for resolving in decode and 87 with branch prediction	20:50
juliusb_	ah cool	20:52
juliusb_	that's pretty close to resolving in decode!	20:52
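
For reference, the relative gains implied by the three CoreMark figures quoted above (80, 90, 87) can be worked out as below. This is only arithmetic on the numbers from the chat; the dictionary keys are made-up labels, not names from mor1kx.

```python
# CoreMark figures as quoted in the log ("roughly" 80 / 90 / 87).
scores = {
    "stall_on_sf_bf": 80,
    "resolve_in_decode": 90,
    "branch_prediction": 87,
}

baseline = scores["stall_on_sf_bf"]
best = scores["resolve_in_decode"]
predicted = scores["branch_prediction"]

# Speedup of each variant over stalling on l.sfxx; l.b(n)f.
speedup_prediction = predicted / baseline
speedup_decode = best / baseline

# Share of the stall -> resolve-in-decode gain that prediction recovers.
recovered = (predicted - baseline) / (best - baseline)

print(f"prediction vs stall: {100 * (speedup_prediction - 1):.2f}% faster")
print(f"decode vs stall:     {100 * (speedup_decode - 1):.2f}% faster")
print(f"gain recovered:      {100 * recovered:.0f}%")
```
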
stekern	it's just a simple static backwards taken, forward not, and the prediction is done in decode and then the real resolving is in execute	20:53
juliusb_	ah right	20:54
stekern	and I implemented it as 'flag prediction', so in execute you just check if the real flag is equal to the predicted flag	20:54
juliusb_	is it a lot more logic?	20:54
stekern	no, it's pretty simple logic and I haven't even tried to optimize it yet	20:55
juliusb_	cool	20:56
stekern	the actual prediction logic is very simple, just compare the msb in the imm field	20:56
stekern	(and check if it's a bf or bnf to get the predicted flag)	20:57
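
A minimal Python sketch of the 'flag prediction' described above, not the actual mor1kx RTL. It assumes the or1k encoding with a 6-bit opcode in bits 31:26 (0x03 for l.bnf, 0x04 for l.bf) and a 26-bit signed immediate in bits 25:0; the function names are hypothetical.

```python
OPC_BNF = 0x03  # l.bnf: branch if flag is clear (assumed encoding)
OPC_BF = 0x04   # l.bf:  branch if flag is set   (assumed encoding)


def predict_flag(insn: int):
    """Static prediction: backward branches taken, forward branches not.

    Returns the flag value the branch is predicted to see, or None if
    the instruction is not a conditional branch.
    """
    opcode = (insn >> 26) & 0x3F
    if opcode not in (OPC_BF, OPC_BNF):
        return None
    backward = bool((insn >> 25) & 1)  # msb of the signed immediate
    is_bnf = opcode == OPC_BNF
    # Taken means flag=1 for l.bf and flag=0 for l.bnf, so the predicted
    # flag is the "taken" prediction inverted for l.bnf.
    return backward != is_bnf


def mispredicted(predicted_flag: bool, real_flag: bool) -> bool:
    """Check done in execute: compare the real flag to the predicted one."""
    return predicted_flag != real_flag


def encode(opcode: int, offset: int) -> int:
    """Helper for the example: build a branch word from opcode and offset."""
    return (opcode << 26) | (offset & 0x3FFFFFF)
```

For example, `predict_flag(encode(OPC_BF, -4))` models a loop-closing `l.bf` and predicts the flag as set, while the same branch with a positive offset predicts it as clear.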
stekern	the control logic in the fetcher is a bit hairier	20:57
juliusb_	is it not complex and annoying to cancel the fetch that's wrong?	20:57
juliusb_	yeah, that was my guess	20:57
stekern	basically, I check whether there is a cond branch in the decode stage; if so, and the mispredicted signal goes high, I gate all signals out to decode	21:00
stekern	that's about it, but of course there were a lot of cases I had forgotten about ;)	21:00
stekern	the last bug was that I forgot to gate the immu exceptions	21:01
juliusb_	yeah, I know what you mean, the major portion of the update is there, it's just neatening around the edges	21:02
stekern	everything is in three (messy) commits here: https://github.com/skristiansson/mor1kx/commits/master	21:03
stekern	espresso fails this test btw: http://oompa.chokladfabriken.org/tmp/or1k-sfbf.S	21:07
stekern	I might take a closer look at it, but if I forget =P	21:07
juliusb_	nps I should check it out	21:08
juliusb_	can you submit a pull request to add that test to mor1kx-dev-env?	21:08
stekern	yeah, I will	21:09
stekern	I think there is one pending too	21:10
stekern	(pull request)	21:10
juliusb_	hmmm, I'd better check that	21:15
juliusb_	you're right	21:16
juliusb_	I don't like to say it... but maybe I just made my flop cache finally work?!	21:18
stekern	that test is in that pull request too now	21:18
juliusb_	(regression still running)	21:18
juliusb_	great, thanks man	21:18
stekern	heh, famous last words(?)	21:18
juliusb_	It's so annoying. In the end, because the pipeline is basically built expecting delays on the bus when branches occur, I had to insert delays into the logic that feeds instructions out of the cache into the pipeline	21:19
juliusb_	feels kinda stupid	21:19
juliusb_	I could feed the instructions into it immediately, but no, somewhere somehow it'll break	21:19
juliusb_	really needs a rewrite for the fast-as-possible pipeline	21:19
juliusb_	you know what it is stekern , it's some of your adopted homeland's spirit. I'm listening to Sibelius' Finlandia :)	21:20
juliusb_	There's something about getting OpenRISC working and Finland	21:20
juliusb_	I will keep it on loop while the regression runs	21:21
juliusb_	:)	21:21
juliusb_	to be honest, I was thinking of ditching espresso	21:21
juliusb_	also	21:21
juliusb_	I mean, if you want the super smallest implementation, you might as well just go for pronto	21:22
juliusb_	maybe it's not hard to keep it	21:22
juliusb_	it's just more work	21:22
stekern	are they really that different?	21:25
juliusb_	not really, but getting more different	21:25
juliusb_	I just don't know who would use espresso over pronto	21:25
stekern	I've been thinking about whether the delay slot could perhaps just be a parameter	21:26
stekern	in cappuccino	21:26
juliusb_	oh cool :)	21:26
stekern	I mean there's not _that_ much delay slot specific stuff, it's in the fetcher and the exceptions	21:27
stekern	I think I'll wait until it's boringly stable with the delay slot before I set out on that though	21:28
juliusb_	good plan	21:28
stekern	but right now there's only one bug I know of, setting caches to 16kb on the atlys board makes it act up	21:30
stekern	oh, I almost forgot, I've been playing with cappuccino and milkymist again	21:31
juliusb_	oh yes?	21:32
juliusb_	an even milkier coffee?	21:32
juliusb_	:)	21:33
stekern	with the set flag critical path cut away, we're able to run at 83 MHz (that's what the milkymist-ng soc runs at) and we get about the same results in coremark as lm32	21:33
stekern	with the branch prediction in place	21:33
stekern	lm32: http://pastie.org/7821264	21:33
stekern	cappuccino: http://pastie.org/7827438	21:34
stekern	to be fair, I ran it compiled with gcc 4.5.1 too: http://pastie.org/7827601	21:35
stekern	we're about 1200 LUTs larger than lm32 in a comparable setup though :/	21:36
juliusb_	hmm interesting	21:42
juliusb_	so cappuccino performance is the same then?	21:44
juliusb_	I'm surprised lm32 is that good, I thought it was about the same as the OR1200	21:44
stekern	in coremark at least, yes	21:45
juliusb_	hah, I'm just going to do some area comparisons for my new pronto design, and it's so nice running this quartus tool chain, it's so quick	21:46
juliusb_	in contrast, this afternoon, I kicked off just the backend for a Virtex 7 build (synthesis was done a couple of days ago, that takes about 8 hours)	21:47
stekern	at least as long as you keep your fingers away from the speed optimisation buttons ;)	21:47
juliusb_	So placement took about 3 hours, now it's trying to route	21:49
stekern	(same as or1200) naah, I've known all along that it's a good implementation	21:49
juliusb_	in an hour it's already done an initial route, and has just done a "Rip-up and Reroute"	21:50
juliusb_	these designs are beasts, these smaller ones are so much more manageable :)	21:50
juliusb_	oh yeah, why do you think the lm32 is good, just good ISA?	21:50
stekern	no, the isa is pretty much the same as or1k	21:51
juliusb_	so the implementation is simply better then?	21:51
stekern	than or1200? is that so hard? ;)	21:52
juliusb_	no, not at all haha	21:52
juliusb_	well, it's not that bad really	21:52
juliusb_	in terms of performance, maybe a bit big, but definitely not nice to understand what's going on	21:52
stekern	they do a trick with the multiplier that we might...ehrm...lend	21:54
juliusb_	oh really?	21:54
juliusb_	btw how big is cappuccino in LC again? you want to get it under 4k ideally right?	21:54
stekern	around 4800	21:55
juliusb_	pronto with serial multiply, shifter, is 3479LC, 1317FF	21:56
stekern	(mul) when you have the three-stage mul, instead of stalling execute stage, write the pipelined result in to the rf in wb stage	21:57
stekern	and interlock if instructions after the mul need the result	21:57
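
A toy cycle-count model of the idea above, written under stated assumptions rather than taken from lm32 or mor1kx: single-issue, in-order, a multiplier latency of 3 cycles, one cycle for everything else, and no other hazards. `Insn`, `MUL_LATENCY` and `issue_cycles` are hypothetical names.

```python
from typing import NamedTuple, Tuple

MUL_LATENCY = 3  # assumed three-stage multiplier


class Insn(NamedTuple):
    op: str            # "mul" or anything else (treated as 1-cycle)
    dest: int          # destination register number
    srcs: Tuple[int, ...]  # source register numbers


def issue_cycles(program, interlock: bool) -> int:
    """Cycles needed to issue every instruction in `program`.

    interlock=False: execute stalls for the full multiply latency.
    interlock=True:  the mul result is written back later, and only an
                     instruction that reads it too early has to wait.
    """
    cycle = 0
    ready_at = {}  # register -> first cycle its value can be read
    for insn in program:
        latency = MUL_LATENCY if insn.op == "mul" else 1
        if interlock:
            # Wait only for sources still in flight in the multiplier.
            start = max([cycle] + [ready_at.get(r, 0) for r in insn.srcs])
            ready_at[insn.dest] = start + latency
            cycle = start + 1
        else:
            cycle += latency
    return cycle


# mul followed by two independent adds, then a use of the product.
independent = [
    Insn("mul", 3, (1, 2)),
    Insn("add", 4, (1, 2)),
    Insn("add", 5, (1, 2)),
    Insn("add", 6, (3, 1)),
]
# mul immediately followed by a use of the product.
dependent = [Insn("mul", 3, (1, 2)), Insn("add", 6, (3, 1))]

stall_independent = issue_cycles(independent, interlock=False)
lock_independent = issue_cycles(independent, interlock=True)
stall_dependent = issue_cycles(dependent, interlock=False)
lock_dependent = issue_cycles(dependent, interlock=True)
```

In this model the interlock only helps when independent instructions can be placed between the mul and its first consumer; a back-to-back dependency costs the same either way.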
stekern	isn't threestage mul smaller than serial on cyclone iv?	21:58
juliusb_	oh, not sure!	21:58
juliusb_	could be	21:58
stekern	14 LC it claims to occupy here	21:59
juliusb_	we need some automated build system which does runs with a bunch of different parameters on each technology to see what the implementation stats are	21:59
stekern	yup, and run the tests against that	21:59
juliusb_	indeed :)	21:59
stekern	too	21:59
stekern	shouldn't be hard to just generate a parameter.v file that you just include in the top file	22:01
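
A rough sketch of how such a parameter file could be generated for a sweep of builds. The parameter names and values below are illustrative and should be checked against the core before use; the `localparam` output format and the function names are assumptions made for this example, not an existing mor1kx-dev-env script.

```python
from itertools import product

# Illustrative parameter space: check names and values against the core.
PARAM_SPACE = {
    "OPTION_CPU0": ["CAPPUCCINO", "PRONTO_ESPRESSO"],
    "FEATURE_MULTIPLIER": ["THREESTAGE", "SERIAL"],
}


def all_configs(space):
    """Yield one dict per combination of parameter values."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def render(config) -> str:
    """Render one configuration as the text of a parameter.v file."""
    lines = ["// generated file, do not edit"]
    for name, value in sorted(config.items()):
        lines.append(f'localparam {name} = "{value}";')
    return "\n".join(lines) + "\n"


configs = list(all_configs(PARAM_SPACE))
for index, config in enumerate(configs):
    # A real flow would write this to a per-build directory and then
    # launch synthesis and the test suite; here it is only printed.
    print(f"--- build {index} ---")
    print(render(config), end="")
```
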
juliusb_	ya	22:01
juliusb_	I'll get around to it one of these days	22:01
stekern	I got to take a look at migen when I was playing with milkymist too, seems pretty handy for soc generation, not sure I'd write cores in it	22:05
stekern	https://github.com/skristiansson/milkymist-ng-mor1kx/blob/master/milkymist/mor1kx/__init__.py	22:05
stekern	mor1kx instantiated in it	22:05
juliusb_	oh cool	22:05
juliusb_	yeah, fair enough :)	22:06
juliusb_	Python is the right language to do this in	22:06
juliusb_	you're going to call me a liar if I say this, but I think that for some reason, pronto-espresso mor1kx is _smaller_ with the flop-cache enabled?!	22:07
juliusb_	so 3479LC,1317FF with it disabled	22:07
stekern	are you looking at fitter or synthesis result?	22:08
juliusb_	Then 3283LC,1081FF with 4-word flop cache	22:08
juliusb_	umm, the map.rpt file	22:08
juliusb_	.map.rpt file	22:08
juliusb_	I must be doing something really wrong, because with the cache supposedly disabled, there's 588 flops in there, but with it enabled it's less ?!	22:09
juliusb_	ahhh I know why	22:09
juliusb_	whoops	22:09
juliusb_	I was supposed to set the parameter to "DISABLED" not "NONE"	22:10
juliusb_	:P	22:10
stekern	;)	22:10
stekern	my numbers have been from the .fit.rpt file	22:10
juliusb_	ahh right ok	22:10
juliusb_	I'll check those instead	22:10
stekern	for comparison between builds, it probably doesn't matter	22:11
juliusb_	cool, well the regressions are passing for this flop cache guy, I think I'll do some committing this weekend, very busy rest of the week :(	22:11
juliusb_	and want to test it on the board before I put it in there	22:11
stekern	and in the map file you don't get as many odd results from optimisations	22:12
stekern	I think	22:12
stekern	anyways, time for bed, now that everything is working ;)	22:12
juliusb_	ok, I'll reference fit results in future	22:13
stekern	tomorrow is a new day to break things	22:13
juliusb_	me too, night!	22:14
juliusb_	ah that looks better, pronto with no flopcache is 2991LC,762FF (from fit.rpt)	22:15
juliusb_	8 instruction flopcache takes us up to 3555LC, 1246FF	22:20
juliusb_	(fetch went from 319LC, 104FF to 939LC, 588FF)	22:21
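
As a quick cross-check of the fit.rpt figures above, the area deltas can be computed as below. This assumes the fetch numbers refer to the same two builds (no flop cache versus the 8-instruction flop cache); the variable names are made up for the example.

```python
# (logic cells, flip-flops) as quoted from fit.rpt in the log.
total_before, total_after = (2991, 762), (3555, 1246)
fetch_before, fetch_after = (319, 104), (939, 588)


def delta(before, after):
    """Element-wise growth in (LC, FF) between two builds."""
    return tuple(a - b for a, b in zip(after, before))


total_delta = delta(total_before, total_after)
fetch_delta = delta(fetch_before, fetch_after)

print("whole core grew by LC/FF:", total_delta)
print("fetch stage grew by LC/FF:", fetch_delta)
```

Under that assumption the flip-flop growth is entirely in the fetch stage, while the fetch stage gains slightly more logic cells than the core as a whole, so the rest of the design must have shrunk a little in the fitter.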
--- Log closed Tue May 14 00:00:57 2013