IRC logs for #openrisc Wednesday, 2014-06-18

--- Log opened Wed Jun 18 00:00:39 2014
stekern	is there really no way to 'go to next/prev named marker' in gtkwave?	04:48
-!- Netsplit .net <-> .split quits: stekern, heroux		06:19
-!- Netsplit over, joins: heroux, stekern		06:24
_franck__	after a ctrl+c in or1ksim, can we continue the simulation where we stopped ?	07:47
wallento	stekern: nope, EDGE was default, created a PR to change this	08:22
mor1kx	[mor1kx] wallento opened pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/13	08:22
wallento	bwah, what's that? :-D	08:22
olofk	wallento: You didn't know that stekern works for NSA and monitors all our activity?	08:27
olofk	stekern: I find it extremely awkward to navigate in gtkwave coming from modelsim	08:38
stekern	wallento: ok, it should be LEVEL I think - iow, will pull that in asap ;)	09:04
stekern	now when I start to think about it, what did we need the or1200-compliant version in mor1kx for in the first place?	09:06
mor1kx	[mor1kx] skristiansson closed pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/13	09:09
stekern	I think it's time to release a mor1kx v2 soon	09:10
stekern	there are a lot of nice new features in that, and I'd like to reserve multicore stuff for v3	09:12
stekern	hmm, I think I've fixed the bug in mor1kx for the profiling, but openocd is still unhappy. it claims the range is 0-0	09:19
stekern	_franck__: look what i found: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l3641	09:29
_franck__	cool, we can have our own profile function: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l1227	09:32
stekern	yup, that'd make much more sense, since afaik, it's spec-kosher to access SPRs over the DU port even when the cpu is running	09:34
stekern	(do you know if that works on or1200?)	09:34
stekern	does it work on mor1kx? ;)	09:34
_franck__	I don't know (for both)	09:35
stekern	I don't see why it shouldn't, but I have never tested	09:35
stekern	can you manually read npc when the target isn't halted in openocd?	09:35
_franck__	I don't think so	09:35
_franck__	It says "target not halted"	09:35
stekern	right, that's what i remembered	09:36
_franck__	at least AFAIR	09:36
stekern	anyway, I want to make mor1kx work with halt/resume first, since it's obviously a good way to expose stall-related bugs	09:37
stekern	looks like this change isn't working: http://openocd.zylin.com/#/c/2168/2/src/target/openrisc/or1k.c,cm	09:48
stekern	because I get all zeros when reading from 'pc' here: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l1812	09:49
stekern	but if I change that to npc, I get sensible values	09:49
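If every sample of 'pc' reads back as zero, a profiler that sizes its histogram from the smallest and largest sampled address will report exactly the "0-0" range mentioned earlier. The sketch below is a hypothetical model of that behaviour, assuming the range is derived from the min/max of the samples; `build_histogram` and its bucket count are made up for illustration and are not OpenOCD's actual code.

```python
# Simplified model of a PC-sampling profiler (hypothetical, not OpenOCD's code):
# the histogram's address range is taken from the min/max of the samples.
from collections import Counter


def build_histogram(samples: list[int], num_buckets: int = 4):
    """Return (low, high, bucket_counts) for a list of sampled PC values."""
    low, high = min(samples), max(samples)
    span = high - low + 1
    counts = Counter()
    for pc in samples:
        # Map each sampled PC linearly onto a bucket index.
        counts[(pc - low) * num_buckets // span] += 1
    return low, high, [counts[i] for i in range(num_buckets)]
```

With all-zero samples the range collapses to (0, 0) and every hit lands in one bucket, which is why switching to a register that returns real addresses (npc here) makes the output sensible.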
_franck__	I suggested this hack if order to not modify the non specific code (I didn't see we can have our own profiling)	09:56
_franck__	you can try this: https://github.com/fjullien/openOCD/commit/86ec95584aa239b0bb7fe8881a6582006487d31b	09:57
_franck__	(compile tested)	09:57
_franck__	s/if/in	09:57
stekern	ok, will do that when I've confirmed that my current changes work	09:58
stekern	*changes to mor1kx	09:58
stekern	seems like they do	09:58
stekern	...almost...	09:59
juliusb	olofk: I agree, coming from Cadence Simvision, too, the GTKWave UI seems poorly arranged, but it works	10:01
stekern	it works for a (longer) while, but then it ends up either in alignment or bus error exception	10:02
stekern	I'm more used to gtkwave, so navigating in modelsim is painful for me	10:03
juliusb	mor1kx v2 hey? does that mean we should drink a beer?	10:03
stekern	two, since it's v2! ;)	10:03
juliusb	oh, i'm looking forward to v3 already :)	10:03
stekern	yeah, I figured it might be a good motivator to do releases more often	10:04
juliusb	well, considering wallento and I will go and talk about it in a few weeks in Hamburg, it might be a good time	10:04
juliusb	( to do both the release and drink 2 beers!)	10:04
juliusb	pity you can't join stekern :-/	10:04
stekern	yup :(	10:05
juliusb	so I came across something interesting recently (despite it being around for about a year)	10:05
juliusb	you guys seen the RISC-V work?	10:05
juliusb	it appears somebody is doing the or2k work for us	10:05
stekern	yes, but I'm not sure it's so interesting	10:06
juliusb	just under a different name	10:06
juliusb	there's a lot of detail yet to be released I think, the 16-bit ISA for instance, but they go through and first of all point out why they didn't want to use OR1K to do the project, despite it being ideologically aligned	10:07
juliusb	it was only technical reasons	10:07
juliusb	so, it appears they've fixed those technical things (no branch delay slot, conditional things done only by comparing register values not flags)	10:07
juliusb	which, I'll grant you, isn't revolutionary, but they've basically taken OR1K, fixed it, and appear to be doing stuff	10:08
juliusb	I dunno, I just thought it was interesting in the light of recent OR2K discussion (OK, it consisted of just 3 posts)	10:09
stekern	yeah, but it still all boils down to yet-another-open-source-risc	10:11
stekern	by that logic, or1k isn't so interesting either. and to be fair, as an ISA it isn't.	10:16
stekern	what, imo, makes or1k stand out among the other yet-another-open-source-riscs is that the community is completely transparent and not driven by any entity, but still has quite strong software/toolchain support.	10:18
* juliusb agrees		10:18
juliusb	so it turns out that they got in touch with me to ask me exactly that - why or1k works	10:21
juliusb	it was a bit coincidental: I got in touch with the guy who's in charge of doing the 16-bit ISA stuff and asked when it might come out and whether there's any Verilog in the open	10:21
juliusb	and then later that day I went out for a beer with some of the bods involved here in Cambridge who seemed to be wanting to understand the experience of the OpenRISC project because it's kind-of what they want to do	10:23
stekern	I think one of the keys is to scrap this kind of workflow: "Initial versions of all of these have been developed or are under active development. This material is to be made available under open-source licenses."	10:25
stekern	it's the typical cathedral type of development that most open source software projects moved away from	10:25
stekern	(and that we certainly don't employ)	10:26
juliusb	I agree. They are a bit stuck though, and I think it's a hard place to be - they have a team of academics who are all very talented, and they have money to do 28nm ASICs, and their long-term plan, I guess, is to release everything they can in the open to allow people to download the RTL of the chips they hope to eventually provide on low-cost dev boards	10:28
juliusb	but, as you say, it's a cathedral type of development until the time when they do release everything	10:28
juliusb	it possibly could be open right now	10:28
juliusb	they weren't sure, though, whether they should be transparent from the beginning or open it up after things have been achieved	10:29
juliusb	they do have chips, actually, it's all pretty far along AFAICT	10:29
juliusb	but yes, it's just another mostly in-house thing which has no appeal to a collaborative community	10:30
juliusb	but what is in the open is a good spec	10:30
juliusb	and tool chains and sims	10:30
juliusb	and is, as far as I can tell, the spiritual successor to or1k	10:30
stekern	yeah, all open source hardware is of course a step forward	10:31
juliusb	so, you don't need their implementations - Beyond Semi are also doing OR1K-compliant implementations but who cares right? what they did was a base of good work on which some even better work was done	10:31
stekern	you're right, doing a semi-good implementation shouldn't be too hard, I did the eco32 in ~2 months	10:32
juliusb	what I'm thinking is that if they put out a good spec with the 16-bit ISA and then the MMU, cache, interrupt controller specs, and it fits, then that'd be pretty interesting	10:32
juliusb	because it then ticks all of the boxes of or2k	10:33
juliusb	a spec unencumbered by some bloody proprietary company which is competitive	10:33
juliusb	actually, one of the Cambridge guys involved did stuff with the Raspberry Pi, and I think maybe it's a response to everyone who said "ok, it's nice the board and software are open source, how about the chip now?"	10:34
juliusb	anyway, another architecture raises the discussion we had at orconf about diversion of a limited resource	10:35
stekern	right, but I guess if they wouldn't release it, their resources would have been tied to an in-house version	10:37
juliusb	??	10:39
juliusb	you mean if it's a proprietary spec then that's more boring than one where it's open and others can make better use of any product or chip those processors appear in?	10:40
stekern	I mean, chances are that they wouldn't come running to contribute to the openrisc project if they weren't doing their own open source cpu architecture	10:40
juliusb	ah sure	10:41
juliusb	well, they talk about why they didn't pick OR1K for this in the spec, but they wanted something along the same lines in terms of openness and applicability to industry and academia	10:42
juliusb	implementations are one thing, but tool chains, linux ports, etc. you'd be crazy to do your own one of those and not release that to the great wide world	10:43
stekern	yes, but it's still funny, they've already released that, why not the rtl at the same time?	10:52
stekern	I mean it's like it's some mentality that software code and rtl code should be treated differently for some reason	10:58
juliusb	actually, they have released RTL	11:04
juliusb	there's a synthesisable core written in some language called Chisel	11:04
juliusb	it wasn't obvious to me, but apparently another guy involved with it wrote a tool which converts it into Verilog or C++	11:05
juliusb	(Verilator getting left out of the action it seems hehe)	11:05
juliusb	but I'm not sure if that's the one they refer to as being synthesisable at 1.5 GHz and on par with an ARM Cortex A5	11:05
stekern	ah, ok	11:06
juliusb	referred to here: http://www.hotchips.org/wp-content/uploads/hc_archives/hc25/HC25-posters/HC25.26.p70-RISC-V-Warterman-UCB.pdf	11:06
juliusb	oh, I didn't find this the other day, it's the proposed 16-bit ISA extensions presented in the guy's masters: http://www.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf	11:08
juliusb	some lunchtime reading :)	11:09
juliusb	i mean, this looks a lot like what we were doing for or2k	11:09
stekern	~krste, that looks like something that could have been my user name ;)	11:14
stekern	would be interesting to hear sb0's input on Chisel	11:20
stekern	since it's a contender to what he is doing	11:21
juliusb	ya, I can imagine so	12:22
juliusb	so one other thing I mentioned to the RISC-V guys at that meeting was FuseSoC and that it's the panacea for all your open source hardware collation needs (the paid version, at least)	12:24
juliusb	I urged them not to re-invent the wheel there	12:25
stekern	ah, that'd be nice if they'd use that	12:27
olofk	Yes, a new ISA is one thing, but it would be nice if people got better at reusing the other stuff	13:54
olofk	Will they use wishbone, AXI or something completely different?	13:54
juliusb	Not sure. AXI maybe? They seemed to think the spec was freely available for use	14:04
olofk	Would be interesting to know. AXI is probably a wiser choice for a modern arch	14:05
olofk	About chisel and migen, there are plenty of new languages that use verilog as an intermediate format. It makes sense as it's a well-supported language, but I don't think verilog is very good if you view it as an intermediate language for autogenerated code as it's still a full-blown language	14:07
olofk	I think it would make sense to choose a small subset of verilog that could be intended as a target for autogenerated code	14:09
olofk	So if you are making a high-level language, you will only generate code for that verilog subset	14:11
olofk	And if you're making an EDA tool (simulator/synthesis/whatever...) you would only need to implement support for that small subset	14:11
olofk	It would probably make it easier for all the tools to handle code in the same way	14:12
olofk	juliusb: I think axi is ok to use as long as you don't claim to be AXI-compatible	14:15
juliusb	Ha	14:16
ysionneau	off topic question: what would be the reason for doing a software assisted TLB? (like MIPS afaik) instead of a hardware page-tree walker	14:16
ysionneau	beside design simplicity?	14:16
juliusb	olofk: Isn't that the case in the synthesisable subset of the language (where you target just that subset for synthesisable RTL in your higher-level-lang-to-vlog tool)?	14:17
stekern	ysionneau: I don't know if there are any other reasons?	14:17
juliusb	ysionneau: besides that, probably none?	14:17
ysionneau	ok :)	14:17
juliusb	you might like more places where bugs can creep in?	14:17
ysionneau	so hardware assisted TLB is always better for performance, no tradeoff	14:18
ysionneau	thanks :)	14:18
stekern	pretty much, but the win isn't necessarily always so huge	14:19
stekern	so the hardware bloat might not be worth it	14:19
stekern	and if you get some nice critical paths in the extra hw, then it might actually be worse for performance	14:19
ysionneau	you still save the exception, the register saving and all the assembly code that does the page table lookup, and register restore	14:20
ysionneau	yep so it could become the critical path	14:20
stekern	well, you can design your ABI so you don't need to do any register saving	14:21
stekern	and the assembly code is not necessarily many lines of code	14:21
ysionneau	yes, but depending on the OS you can need quite a bunch of assembly	14:21
ysionneau	for a minimal tlb update code I guess that in fact keeping 1 or 2 registers for kernel use is enough	14:23
ysionneau	depending on your ISA	14:23
stekern	it's two memory accesses and a bunch of shifts and ands	14:24
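To make "two memory accesses and a bunch of shifts and ands" concrete, here is a Python model of what a software TLB refill handler does. The page-table layout is a hypothetical one chosen for illustration (32-bit virtual addresses, 8 KB pages, an 8/11/13-bit split, bit 0 as a "valid" flag); a real handler would be a few lines of assembly working against the OS's actual page-table format.

```python
# Hypothetical software TLB refill for a two-level page table.
# Assumed layout (illustration only): 32-bit virtual addresses, 8 KB pages,
# VA[31:24] -> level-1 index, VA[23:13] -> level-2 index, VA[12:0] -> offset.
PAGE_SHIFT = 13
L1_SHIFT = 24
L2_MASK = (1 << (L1_SHIFT - PAGE_SHIFT)) - 1   # 11-bit level-2 index
VALID = 0x1                                     # assumed "present" bit in a PTE


def tlb_refill(memory: dict[int, int], pgd_base: int, vaddr: int):
    """Return (vpn, pte) to load into the TLB, or None to raise a page fault."""
    # Memory access 1: level-1 entry, holding the base address of a level-2 table.
    l1_entry = memory.get(pgd_base + ((vaddr >> L1_SHIFT) << 2), 0)
    if l1_entry == 0:
        return None
    # Memory access 2: level-2 entry (the PTE itself).
    l2_index = (vaddr >> PAGE_SHIFT) & L2_MASK
    pte = memory.get(l1_entry + (l2_index << 2), 0)
    if not pte & VALID:
        return None
    return vaddr >> PAGE_SHIFT, pte
```

Everything besides the two lookups is shifts, masks and a validity check, which is why the software path can stay short; the cost ysionneau points at is the exception entry and exit around it.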
ysionneau	yes	14:25
olofk	juliusb: Yes, the synthesisable subset is a good point... but is there actually a strictly defined subset? And I still would say it's a bit too large for this purpose	14:25
ysionneau	stekern: do you know how Linux handles the tlb miss when the kernel tries to access a user space pointer ? (like in syscalls)	14:27
olofk	ysionneau: I think it kills a kitten every time you get a tlb miss	14:27
olofk	Linux is evil	14:27
ysionneau	woa :'	14:27
ysionneau	not this one I hope: http://catoverflow.com/cats/4fFPNUg.gif	14:28
olofk	But they might have fixed this in some of the last releases :)	14:28
ysionneau	pfew	14:28
ysionneau	thanks whatevergod	14:28
stekern	ysionneau: the same way as any tlb miss?	14:29
ysionneau	stekern: ah maybe this question is nonsense for Linux since I think syscalls are done with the process memory mapping enabled	14:29
ysionneau	I'm not sure how it is for NetBSD	14:29
ysionneau	I think that as soon as you enter kernel mode you use the kernel's page table	14:30
ysionneau	then you don't have the user space page table anymore, or not as easy to access	14:30
juliusb	I think if you're doing a pure area comparison, though, it depends how expensive your RAM which is storing the CPU code is, versus the hardware area of say, 100 flops (to store state) and 300 combinatorial elements? in ASIC that's like maybe 1.5k gates, depending on the combinatorial stuff. If the MMU is at best 100 32-bit instructions that's 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win	14:31
* ysionneau would need to dig on this one		14:31
juliusb	olofk: for synthesis, yes	14:31
juliusb	olofk: but admittedly even that's pretty big	14:31
ysionneau	juliusb: I think your sentence got cut	14:32
ysionneau	" of gates per bit t"	14:32
stekern	hmm, afaik, in linux the kernel page tables are in all the user space page tables	14:32
ysionneau	yes	14:32
ysionneau	the kernel is mapped in the user space vm space	14:32
juliusb	... 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win	14:32
ysionneau	so I guess the same vm space is kept during the syscall	14:32
stekern	juliusb: right, if you're using onchip SRAM/ROM for the insn code, then a hw filler (at least in terms of area) is probably always worth considering	14:33
stekern	I never thought about it in that way ;)	14:34
ysionneau	juliusb: I was comparing pure hw (no instructions, just a state machine) to pure software, with tlb refill code being in cache or sdram	14:34
ysionneau	or maybe I should have mentioned it :p	14:34
ysionneau	I never thought about embedding the tlb refill code in some SRAM	14:34
juliusb	ysionneau: my estimate of 100 flops and ~1000 gates (transistors) was for the HW state machine, the pure HW	14:35
ysionneau	ah ok	14:35
juliusb	that's 32 32-bit values and 4 bits for state	14:35
juliusb	err	14:35
juliusb	s/that's 32/that's 3/	14:35
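The arithmetic behind juliusb's area estimate, with his corrected register count, fits in a few lines. Every number below is a rough guess quoted from the chat (reading "a couple of gates per bit" as 2), and the comparison only applies when the refill code would sit in dedicated on-chip memory, as stekern notes.

```python
# Back-of-the-envelope comparison using juliusb's rough numbers from the log.
HW_FLOPS = 3 * 32 + 4          # three 32-bit registers plus 4 state bits
HW_GATES = 1500                # his "maybe 1.5k gates" guess for the walker
SW_INSNS = 100                 # assumed size of a software refill handler
SW_BITS = SW_INSNS * 32        # bits of on-chip memory holding that code
GATES_PER_BIT = (2, 6)         # "a couple" (read as 2) up to 6 per SRAM bit

sw_gates = tuple(SW_BITS * g for g in GATES_PER_BIT)
print(HW_FLOPS, SW_BITS, sw_gates, HW_GATES)  # 100 3200 (6400, 19200) 1500
```

Even at the low end the stored handler costs roughly four times the gates of the hardware walker under these assumptions, which is the sense in which "it's pretty easy for HW to win".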
ysionneau	not sure I understand what those 32-bit values would be	14:36
juliusb	couple of addresses and some data	14:37
juliusb	I'm just guessing here, I don't know how it works but you basically need a state machine to walk a table which exists in memory, remember a couple of things, and read and write some stuff, so you'll probably want a couple of pointers and a data-holder guy	14:38
ysionneau	yes	14:38
ysionneau	and some logic to access the bus toward sdram	14:38
ysionneau	ok got it	14:39
stekern	well, this is how it works: https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_dmmu.v#L216	14:39
stekern	;)	14:39
ysionneau	:D	14:41
ysionneau	I should have a deeper look at this code indeed	14:42
ysionneau	I only had a very quick one so far	14:42
juliusb	only 2 32-bit registers then :)	14:43
juliusb	oh, no, 4	14:43
juliusb	err 3?	14:43
juliusb	I dunno :)	14:43
juliusb	not many	14:43
wkoszek	How offended would you guys be if I posted one message on the mailing list about two openings Xilinx has in my group?	19:17
stekern	wkoszek: given that you are "known to the project" and you ask in advance, I at least wouldn't be very offended ;)	19:25
wkoszek	stekern: Sounds good.	19:27
stekern	I think I finally got the profiling bug sorted out	20:41
stekern	...not quite yet :(	21:20
lvcargnini	hello, I would like some clarification regarding the OR1K architecture	22:21
poke53281	Hi, lvcargnini. What is the question?	22:45
lvcargnini	Hi, I have a question regarding the 64b OR1K	22:46
lvcargnini	like ok, instructions are 32b	22:46
lvcargnini	but when I read a word from L1, it's a 64b word	22:46
poke53281	Yes, instructions have a size of 4 byte.	22:46
lvcargnini	how should I map that ? {32'b0,instruction} or {instruction,32'b0}	22:47
lvcargnini	or is the compiler packing two instructions into one word? {inst1,inst2}	22:47
poke53281	Hehe, good question. Easy answer, you can't do it.	22:47
lvcargnini	it's to help me understand how to map the signals in the decode stage	22:48
lvcargnini	what do you mean can't ?	22:48
lvcargnini	Oo	22:48
poke53281	Did you check the instruction set? There is no "lwz 0x12345678" for loading a word	22:49
poke53281	the address of a memory location must be stored in a register.	22:50
poke53281	and then you can add an offset to that value in the lwz instruction.	22:50
lvcargnini	ok, well lwa loads a word (so 64b)	22:51
poke53281	so, in order to load from a random address you have to combine a few commands. Wait ...	22:51
lvcargnini	ld loads double word	22:52
lvcargnini	lbz maybe ?	22:52
lvcargnini	sorry lhz	22:53
lvcargnini	maybe	22:53
lvcargnini	but still, the interpretation of "lhz 0x121212" in the asm is the issue for me, in the RTL	22:54
poke53281	l.movhi r3,0x1234	22:57
poke53281	l.ori r3,r3,0x5678	22:57
poke53281	l.sw 0(r3),r2	22:57
poke53281	in 32 Bit, to write to a random 32 bit word you have to use three instructions.	22:57
poke53281	in this case I write the contents of r2 to the address 0x12345678	22:58
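A minimal Python encoder for the three instructions above shows how each one becomes a single 32-bit word. The field layouts are as understood from the OpenRISC 1000 (ORBIS32) instruction formats and should be double-checked against the spec; the helper names `movhi`, `ori` and `sw` are made up for this sketch. The resulting words match the byte values poke53281 posts later in the log.

```python
# Sketch of an OR1K (ORBIS32) encoder for l.movhi, l.ori and l.sw.
# Field layouts as understood from the OpenRISC 1000 spec; verify there.


def movhi(rd: int, k: int) -> int:
    # opcode 0x06 in [31:26], rD in [25:21], immediate K in [15:0]
    return (0x06 << 26) | (rd << 21) | (k & 0xFFFF)


def ori(rd: int, ra: int, k: int) -> int:
    # opcode 0x2A in [31:26], rD in [25:21], rA in [20:16], K in [15:0]
    return (0x2A << 26) | (rd << 21) | (ra << 16) | (k & 0xFFFF)


def sw(offset: int, ra: int, rb: int) -> int:
    # opcode 0x35; the 16-bit offset is split: I[15:11] in [25:21], I[10:0] in [10:0]
    i = offset & 0xFFFF
    return (0x35 << 26) | ((i >> 11) << 21) | (ra << 16) | (rb << 11) | (i & 0x7FF)


program = [movhi(3, 0x1234), ori(3, 3, 0x5678), sw(0, 3, 2)]
print([f"{w:08x}" for w in program])  # ['18601234', 'a8635678', 'd4031000']
```

The store splits its offset across two fields so that rA and rB stay in the same bit positions as in the other register-register formats.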
poke53281	There must be a similar way in the 64-Bit cpu.	22:58
lvcargnini	ok, the problem for me is how l.movhi is translated to binary for a 64b memory. also, since I'm using 64b registers, how do I interpret them? that's my initial question for you	23:00
lvcargnini	4-bytes instructions in a 8-byte wide register	23:00
lvcargnini	poke53281, thanks for the help so far	23:01
lvcargnini	I'll keep asking until someone can help me figure this out; on SPARC-v9, for example, it's the same as on SPARC-v8, but in the format {instruction,32'b0}	23:02
lvcargnini	to fit the 64b register	23:02
poke53281	umhh, ok. Good question. I don't know.	23:04
poke53281	probably you have to use l.movhi and then a shift left.	23:04
poke53281	But, maybe I don't understand the real question.	23:05
poke53281	according to the specification, l.movhi is also doing a shift left of 16. "rD[63:0]←extz(Immediate) << 16"	23:06
poke53281	it's the same opcode and behaves exactly the same for 32-Bit and 64-Bit.	23:07
poke53281	So you need probably 5 4-byte instructions to fill a register with a random value.	23:07
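The count of five above is a guess. Assuming only l.movhi (with the quoted `extz(Immediate) << 16` semantics), l.ori (zero-extended 16-bit immediate) and l.slli are used on a single register, the straightforward sequence modelled below takes six instructions. The 64-bit semantics of l.ori and l.slli are assumptions here, and `materialize` is a hypothetical helper, not a toolchain function.

```python
# Model of loading an arbitrary 64-bit constant with 16-bit immediates only.
# Assumed ORBIS64 semantics: l.movhi rD = extz(K) << 16 (as quoted above),
# l.ori rD = rA | extz(K), l.slli rD = rA << L, all truncated to 64 bits.
MASK64 = (1 << 64) - 1


def materialize(value: int) -> tuple[int, list[str]]:
    """Return (register contents, assembly) for a movhi/ori/slli sequence."""
    halves = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    reg = (halves[0] << 16) & MASK64                  # l.movhi r3,h3
    asm = [f"l.movhi r3,0x{halves[0]:04x}"]
    reg |= halves[1]                                  # l.ori fills bits 15:0
    asm.append(f"l.ori r3,r3,0x{halves[1]:04x}")
    for h in halves[2:]:
        reg = (reg << 16) & MASK64                    # make room for 16 more bits
        asm.append("l.slli r3,r3,16")
        reg |= h
        asm.append(f"l.ori r3,r3,0x{h:04x}")
    return reg, asm
```

Loading the constant from memory instead, as suggested next, trades those instructions for one load and a data word.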
poke53281	Or better, you load the register value from a memory address, maybe from the stack.	23:08
poke53281	If I still haven't answered your question, you should wait a little bit until stekern and others are available.	23:08
poke53281	Ahh, I read your email.	23:16
poke53281	I still don't really understand your question, but I guess that the compiler will in principle produce the same binary code as for 32-bit.	23:22
poke53281	I don't know if anything else would make sense.	23:22
poke53281	18 60 12 34 l.movhi r3,0x1234	23:25
poke53281	a8 63 56 78 l.ori r3,r3,0x5678	23:25
poke53281	d7 e1 0f f8 l.sw -8(r1),r1	23:25
poke53281	this is the way you should pack it into memory.	23:26
poke53281	d4 03 10 00 l.sw 0(r3),r2	23:26
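If the instruction stream in memory is byte-for-byte the same as on a 32-bit core, a 64-bit fetch simply returns two consecutive instructions, and the decode stage picks one with PC bit 2. Assuming big-endian memory (as on OR1K) and 8-byte-aligned fetch words, the instruction at the lower address lands in the upper half, i.e. {inst1,inst2}, matching lvcargnini's SPARC-v9 {instruction,32'b0} layout for the first slot. `fetch_word` and `select_insn` are hypothetical names for this sketch.

```python
# Sketch: selecting one 32-bit instruction from a 64-bit instruction-fetch word.
# Assumptions: big-endian memory, 8-byte-aligned fetch words, and the compiler
# emitting the same 32-bit instruction stream as for a 32-bit core.


def fetch_word(memory: bytes, addr: int) -> int:
    """Read the aligned 64-bit word containing addr, big-endian."""
    base = addr & ~0x7
    return int.from_bytes(memory[base:base + 8], "big")


def select_insn(word64: int, pc: int) -> int:
    """Return the 32-bit instruction for pc out of its 64-bit fetch word."""
    if pc & 0x4 == 0:
        return (word64 >> 32) & 0xFFFFFFFF   # lower address -> upper 32 bits
    return word64 & 0xFFFFFFFF               # pc + 4 -> lower 32 bits
```

With the bytes `18 60 12 34 a8 63 56 78` at address 0, the fetch word is 0x18601234a8635678; PC 0 selects l.movhi and PC 4 selects l.ori.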
--- Log closed Thu Jun 19 00:00:40 2014

Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!