IRC logs for #openrisc Wednesday, 2014-06-18

--- Log opened Wed Jun 18 00:00:39 2014
stekernis there really no way to 'go to next/prev named marker' in gtkwave?04:48
-!- Netsplit *.net <-> *.split quits: stekern, heroux06:19
-!- Netsplit over, joins: heroux, stekern06:24
_franck__after a ctrl+c in or1ksim, can we continue the simulation where we stopped ?07:47
wallentostekern: nope, EDGE was default, created a PR to change this08:22
mor1kx[mor1kx] wallento opened pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/1308:22
wallentobwah, what's that? :-D08:22
olofkwallento: You didn't know that stekern works for NSA and monitors all our activity?08:27
olofkstekern: I find it extremely awkward to navigate in gtkwave coming from modelsim08:38
stekernwallento: ok, it should be LEVEL I think - iow, will pull that in asap ;)09:04
stekernnow when I start to think about it, what did we need the or1200-compliant version in mor1kx for in the first place?09:06
mor1kx[mor1kx] skristiansson closed pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/1309:09
stekernI think it's time to release a mor1kx v2 soon09:10
stekernthere's a lot of nice new features in that, and I'd like to reserve multicore stuff for v309:12
stekernhmm, I think I've fixed the bug in mor1kx for the profiling, but openocd is still unhappy. it claims the range is 0-009:19
stekern_franck__: look what i found: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l364109:29
_franck__cool, we can have our own profile function: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l122709:32
stekernyup, that'd make much more sense, since afaik, it's spec-kosher to access SPRs over the DU port even when the cpu is running09:34
stekern(do you know if that works on or1200?)09:34
stekerndoes it work on mor1kx? ;)09:34
_franck__I don't know (for both)09:35
stekernI don't see why it shouldn't, but I have never tested09:35
stekerncan you manually read npc when the target isn't halted in openocd?09:35
_franck__I don't think so09:35
_franck__It says "target not halted"09:35
stekernright, that's what i remembered09:36
_franck__at least AFAIR09:36
stekernanyway, I want to make mor1kx work with the halt/resume first anyway, since it's obviously a good way to expose stall related bugs09:37
stekernlooks like this change isn't working: http://openocd.zylin.com/#/c/2168/2/src/target/openrisc/or1k.c,cm09:48
stekernbecause, I get all zero's when reading from 'pc' here: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l181209:49
stekernbut if I change that to npc, I get sensible values09:49
_franck__I suggested this hack if order to not modify the non specific code (I didn't see we can have our own profiling)09:56
_franck__you can try this: https://github.com/fjullien/openOCD/commit/86ec95584aa239b0bb7fe8881a6582006487d31b09:57
_franck__(compile tested)09:57
_franck__s/if/in09:57
stekernok, will do that when I've confirmed that my current changes work09:58
stekern*changes to mor1kx09:58
stekernseems like they do09:58
stekern...almost...09:59
juliusbolofk: I agree, coming from Cadence Simvision, too, the GTKWave UI seems poorly arranged, but it works10:01
stekernit works for a (longer) while, but then it ends up either in alignment or bus error exception10:02
stekernI'm more used to gtkwave, so navigating in modelsim is painful for me10:03
juliusbmor1kx v2 hey? does that mean we should drink a beer?10:03
stekerntwo, since it's v2! ;)10:03
juliusboh, i'm looking forward to v3 already :)10:03
stekernyeah, I figured it might be a good motivator to do releases more often10:04
juliusbwell, considering wallento and I will go and talk about it in a few weeks in Hamburg, it might be a good time10:04
juliusb( to do both the release and drink 2 beers!)10:04
juliusbpity you can't join stekern :-/10:04
stekernyup :(10:05
juliusbso I came across something interesting recently (despite it being around for about a year)10:05
juliusbyou guys seen the RISC-V work?10:05
juliusbit appears somebody is doing the or2k work for us10:05
stekernyes, but I'm not sure it's *so* interesting10:06
juliusbjust under a different name10:06
juliusbthere's a lot of detail yet to be released I think, the 16-bit ISA for instance, but they go through and first of all point out why they didn't want to use OR1K to do the project, despite it being ideologically aligned10:07
juliusbit was only technical reasons10:07
juliusbso, it appears they've fixed those technical things (no branch delay slot, conditional things done only by comparing register values not flags)10:07
juliusbwhich, I'll grant you, isn't revolutionary, but they've basically taken OR1K, fixed it, and appear to be doing stuff10:08
juliusbI dunno, I just thought it was interesting in the light of recent OR2K discussion (OK, it consisted of just 3 posts)10:09
stekernyeah, but it still all boils down to yet-another-open-source-risc10:11
stekernby that logic, or1k isn't so interesting either. and to be fair, as an ISA it isn't.10:16
stekernwhat imo makes or1k stand out among the other yet-another-open-source-riscs is that the community is completely transparent, not driven by any entity, but still has quite strong software/toolchain support.10:18
* juliusb agrees10:18
juliusbso it turns out that they got in touch with me to ask me exactly that - why or1k works10:21
juliusbit was a bit coincidental, I got in touch with the guy who's in charge of doing the 16-bit ISA stuff and asked when it might come out and whether there's any Verilog in the open10:21
juliusband then later that day I went out for a beer with some of the bods involved here in Cambridge who seemed to be wanting to understand the experience of the OpenRISC project because it's kind-of what they want to do10:23
stekernI think one of the keys is to scratch this kind of work-flow:  "Initial versions of all of these have been developed or are under active development. This material is to be made available under open-source licenses."10:25
stekernit's the typical cathedral type of development that most open source software projects moved away from10:25
stekern(and that we certainly don't employ)10:26
juliusbI agree. They are a bit stuck though, and I think it's a hard place to be - they have a team of academics who are all very talented, and they have money to do 28nm ASICs and their plan long term, I guess, is to release everything they can in the open to allow people to download the RTL of the chips they hope to eventually provide on low-cost dev boards10:28
juliusbbut, as you say, it's a cathedral type development until that time where they do release everything10:28
juliusbit possibly could be open right now10:28
juliusbthey weren't sure though whether they should go and be transparent from the beginning or whether after things have been achieved, then they open it up10:29
juliusbthey do have chips, actually, it's all pretty far along AFAICT10:29
juliusbbut yes, it's just another mostly in-house thing which has no appeal to a collaborative community10:30
juliusbbut what is in the open is a good spec10:30
juliusband tool chains and sims10:30
juliusband is, as far as I can tell, the spiritual successor to or1k10:30
stekernyeah, all open source hardware is of course a step forward10:31
juliusbso, you don't need their implementations - Beyond Semi are also doing OR1K-compliant implementations but who cares right? what they did was a base of good work on which some even better work was done10:31
stekernyou're right, doing a semi-good implementation shouldn't be too hard, I did the eco32 in ~2 months10:32
juliusbwhat I'm thinking is that if they put out a good spec with the 16-bit ISA and then the MMU, cache, interrupt controller specs, and it fits, then that'd be pretty interesting10:32
juliusbbecause it then ticks all of the boxes of or2k10:33
juliusba spec unencumbered by some bloody proprietary company which is competitive10:33
juliusbactually, one of the Cambridge guys involved did stuff with the Raspberry Pi, and I think maybe it's a response to everyone who said "ok, it's nice the board and software are open source, how about the chip now?"10:34
juliusbanyway, another architecture raises the discussion we had at orconf about diversion of a limited resource10:35
stekernright, but I guess if they wouldn't release it, their resources would have been tied to an in-house version10:37
juliusb??10:39
juliusbyou mean if it's a proprietary spec then that's more boring than one where it's open and others can make better use of any product or chip those processors appear in?10:40
stekernI mean, chances are that they wouldn't come running contributing to the openrisc project if they weren't doing their own open source cpu architecture10:40
juliusbah sure10:41
juliusbwell, they talk about why they didn't pick OR1K for this in the spec, but they wanted something along the same lines in terms of openness and applicability to industry and academia10:42
juliusbimplementations are one thing, but tool chains, linux ports, etc. you'd be crazy to do your own one of those and not release that to the great wide world10:43
stekernyes, but it's still funny, they've already released that, why not the rtl at the same time?10:52
stekernI mean it's like it's some mentality that software code and rtl code should be treated differently for some reason10:58
juliusbactually, they have released RTL11:04
juliusbthere's a synthesisable core written in some language called Chisel11:04
juliusbit wasn't obvious to me, but apparently another guy involved with it wrote a tool which converts it into Verilog or C++11:05
juliusb(Verilator getting left out of the action it seems hehe)11:05
juliusbbut I'm not sure if that's the one they refer to as being synthesisable at 1.5 GHz and on par with an ARM Cortex A511:05
stekernah, ok11:06
juliusbreferred to here: http://www.hotchips.org/wp-content/uploads/hc_archives/hc25/HC25-posters/HC25.26.p70-RISC-V-Warterman-UCB.pdf11:06
juliusboh, I didn't find this the other day, it's the proposed 16-bit ISA extensions presented in the guy's masters: http://www.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf11:08
juliusbsome lunchtime reading :)11:09
juliusbi mean, this looks a lot like what we were doing for or2k11:09
stekern~krste, that looks like something that could have been my user name ;)11:14
stekernwould be interesting to hear sb0's input on Chisel11:20
stekernsince it's a contender to what he is doing11:21
juliusbya, I can imagine so12:22
juliusbso one other thing I mentioned to the RISC-V guys at that meeting was FuseSoC and that it's the panacea for all your open source hardware collation needs (the paid version, at least)12:24
juliusbI urged them not to re-invent the wheel there12:25
stekernah, that'd be nice if they'd use that12:27
olofkYes, a new ISA is one thing, but it would be nice if people got better at reusing the other stuff13:54
olofkWill they use wishbone, AXI or something completely different?13:54
juliusbNot sure. AXI maybe? They seemed to think the spec was freely available for use14:04
olofkWould be interesting to know. AXI is probably a wiser choice for a modern arch14:05
olofkAbout chisel and migen, there are plenty of new languages that use verilog as an intermediate format. It makes sense as it's a well-supported language, but I don't think verilog is very good if you view it as an intermediate language for autogenerated code as it's still a full-blown language14:07
olofkI think it would make sense to choose a small subset of verilog that could be intended as a target for autogenerated code14:09
olofkSo if you are making a high-level language, you will only generate code for that verilog subset14:11
olofkAnd if you're making an EDA tool (simulator/synthesis/whatever...) you would only need to implement support for that small subset14:11
olofkIt would probably make it easier for all the tools to handle code in the same way14:12
olofkjuliusb: I think axi is ok to use as long as you don't claim to be AXI-compatible14:15
juliusbHa14:16
ysionneauoff topic question: what would be the reason for doing a software assisted TLB? (like MIPS afaik) instead of a hardware page-tree walker14:16
ysionneaubeside design simplicity?14:16
juliusbolofk: Isn't that the case in the synthesisable subset of the language (where you target just that subset for synthesisable RTL in your higher-level-lang-to-vlog tool)?14:17
stekernysionneau: I don't know if there are any other reasons?14:17
juliusbysionneau: besides that, probably none?14:17
ysionneauok :)14:17
juliusbyou might like more places where bugs can creep in?14:17
ysionneauso hardware assisted TLB is always better for performance, no tradeoff14:18
ysionneauthanks :)14:18
stekernpretty much, but the win isn't necessarily always so huge14:19
stekernso the hardware bloat might not be worth it14:19
stekernand if you get some nice critical paths in the extra hw, then it *might* actually be worse for performance14:19
ysionneauyou still save the exception, the register saving and all the assembly code that does the page table lookup, and register restore14:20
ysionneauyep so it could become the critical path14:20
stekernwell, you can design your ABI so you don't need to do any register saving14:21
stekernand the assembly code is not necessarily many lines of code14:21
ysionneauyes, but depending on the OS you can need quite a bunch of assembly14:21
ysionneaufor a minimal tlb update code I guess that in fact keeping 1 or 2 registers for kernel use is enough14:23
ysionneaudepending on your ISA14:23
stekernit's two memory accesses and a bunch of shifts and ands14:24
ysionneauyes14:25
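A rough Python sketch of the "two memory accesses and a bunch of shifts and ands" that stekern mentions, i.e. what a software TLB refill handler has to compute. Everything specific here is an assumption made for illustration: the 32-bit address split (8 KiB pages, 11-bit second-level index), the PRESENT flag in bit 0, and the names tlb_refill and translate. It is not taken from Linux, or1200 or mor1kx.

```python
# Hypothetical layout: 32-bit virtual address, 8 KiB pages, two-level table.
PAGE_SHIFT = 13                      # 8 KiB pages (assumption)
L2_BITS = 11                         # 2048 entries per second-level table
L1_SHIFT = PAGE_SHIFT + L2_BITS      # top 8 bits index the first level
L2_MASK = (1 << L2_BITS) - 1
PAGE_MASK = (1 << PAGE_SHIFT) - 1
PRESENT = 0x1                        # assumed "valid" flag in bit 0


def tlb_refill(memory, pgd_base, vaddr):
    """Walk a two-level table; return the PTE, or None for a page fault.

    memory is a dict mapping byte addresses to 32-bit words (stand-in for RAM).
    """
    # First memory access: the first-level entry points at a second-level table.
    pgd_entry = memory.get(pgd_base + 4 * (vaddr >> L1_SHIFT), 0)
    if not pgd_entry & PRESENT:
        return None
    pte_table = pgd_entry & ~PAGE_MASK
    # Second memory access: the page table entry itself.
    pte = memory.get(pte_table + 4 * ((vaddr >> PAGE_SHIFT) & L2_MASK), 0)
    if not pte & PRESENT:
        return None
    return pte


def translate(pte, vaddr):
    """Combine the frame address from the PTE with the page offset."""
    return (pte & ~PAGE_MASK) | (vaddr & PAGE_MASK)
```

A hardware walker performs the same two reads with a small state machine instead of an exception handler, which is the tradeoff being discussed here.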
olofkjuliusb: Yes, the synthesisable subset is a good point... but is there actually a strictly defined subset? And I still would say it's a bit too large for this purpose14:25
ysionneaustekern: do you know how Linux handles the tlb miss when the kernel tries to access a user space pointer ? (like in syscalls)14:27
olofkysionneau: I think it kills a kitten every time you get a tlb miss14:27
olofkLinux is evil14:27
ysionneauwoa :'14:27
ysionneaunot this one I hope: http://catoverflow.com/cats/4fFPNUg.gif14:28
olofkBut they might have fixed this in some of the last releases :)14:28
ysionneau*pfew*14:28
ysionneauthanks *whatevergod*14:28
stekernysionneau: the same way as any tlb miss?14:29
ysionneaustekern: ah maybe this question is nonsense for Linux since I think syscalls are done with the process memory mapping enabled14:29
ysionneauI'm not sure how it is for NetBSD14:29
ysionneauI think that as soon as you enter kernel mode you use the kernel's page table14:30
ysionneauthen you don't have the user space page table anymore, or not as easy to access14:30
juliusbI think if you're doing a pure area comparison, though, it depends how expensive your RAM which is storing the CPU code is, versus the hardware area of say, 100 flops (to store state) and 300 combinatorial elements? in ASIC that's like maybe 1.5k gates, depending on the combinatorial stuff. If the MMU is at best 100 32-bit instructions that's 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win14:31
* ysionneau would need to dig on this one14:31
juliusbolofk: for synthesis, yes14:31
juliusbolofk: but admittedly even that's pretty big14:31
ysionneaujuliusb: I think your sentence got cut14:32
ysionneau" of gates per bit t"14:32
stekernhmm, afaik, in linux the kernel page tables are in all the user space page tables14:32
ysionneauyes14:32
ysionneauthe kernel is mapped in the user space vm space14:32
juliusb... 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win14:32
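juliusb's back-of-the-envelope comparison can be written out as a few lines of Python. The figures (100 instructions, 2 to 6 gates per stored bit, roughly 1.5k gates for the hardware walker) are his guesses from the lines above, not measurements, and the function name is made up for this sketch.

```python
def software_refill_gates(n_instructions, gates_per_bit, bits_per_insn=32):
    """Gate-equivalents needed just to store the refill handler on chip."""
    return n_instructions * bits_per_insn * gates_per_bit


# juliusb's rough guess for 100 flops plus ~300 combinatorial elements
HW_WALKER_GATES = 1500

low = software_refill_gates(100, 2)    # "a couple of gates per bit"
high = software_refill_gates(100, 6)   # "6 gates per bit for SRAM"
print(f"handler storage: {low}-{high} gates, hw walker: ~{HW_WALKER_GATES} gates")
```

This only compares area when the handler lives in dedicated on-chip memory; it says nothing about the case ysionneau has in mind, where the refill code sits in cache or SDRAM.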
ysionneauso I guess the same vm space is kept during the syscall14:32
stekernjuliusb: right, if you're using onchip SRAM/ROM for the insn code, then a hw filler (at least in terms of area) is probably always worth considering14:33
stekernI never thought about it in that way ;)14:34
ysionneaujuliusb:  I was comparing pure hw (no instructions, just a state machine) to pure software, with tlb refill code being in cache or sdram14:34
ysionneauor maybe I should have mentioned it :p14:34
ysionneauI never thought about embedding the tlb refill code in some SRAM14:34
juliusbysionneau: my estimate of 100 flops and ~1000 gates (transistors) was for the HW state machine, the pure HW14:35
ysionneauah ok14:35
juliusbthat's 32 32-bit values and 4 bits for state14:35
juliusberr14:35
juliusbs/that's 32/that's 3/14:35
ysionneaunot sure I understand what those 32-bit values would be14:36
juliusbcouple of addresses and some data14:37
juliusbI'm just guessing here, I don't know how it works but you basically need a state machine to walk a table which exists in memory, remember a couple of things, and read and write some stuff, so you'll probably want a couple of pointers and a data-holder guy14:38
ysionneauyes14:38
ysionneauand some logic to access the bus toward sdram14:38
ysionneauok got it14:39
stekernwell, this is how it works: https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_dmmu.v#L21614:39
stekern;)14:39
ysionneau:D14:41
ysionneauI should have a deeper look at this code indeed14:42
ysionneauI only had a very quick one so far14:42
juliusbonly 2 32-bit registers then :)14:43
juliusboh, no, 414:43
juliusberr 3?14:43
juliusbI dunno :)14:43
juliusbnot many14:43
wkoszekHow offended would you guys be if I posted 1 message on a mailing list about 2 openings Xilinx has in my group?19:17
stekernwkoszek: given that you are "known to the project" and you ask in advance, I wouldn't at least be very offended ;)19:25
wkoszekstekern: Sounds good.19:27
stekernI think I finally got the profiling bug sorted out20:41
stekern...not quite yet :(21:20
lvcargninihello, please I would like some clarifications regarding the Or1K architecture ?22:21
poke53281Hi, lvcargnini. What is the question?22:45
lvcargniniHi, I have a question regarding the 64b OR1K22:46
lvcargninilike ok, instructions are 32b22:46
lvcargninibut, when I read the word, from L1, a 64b word22:46
poke53281Yes, instructions have a size of 4 bytes.22:46
lvcargninihow should I map that ? {32'b0,instruction} or {instruction,32'b0}22:47
lvcargninior the compiler is packing as two instruction in one word ? {inst1,inst2}22:47
poke53281Hehe, good question. Easy answer, you can't do it.22:47
lvcargniniis to help me understand how to map the signals during decode stage22:48
lvcargniniwhat do you mean can't ?22:48
lvcargniniOo22:48
poke53281Did you check the instruction set. There is no  "lwz 0x12345678" for loading a word22:49
poke53281the address of a memory location must be stored in a register.22:50
poke53281and then you can add an offset to that value in the lwz instruction.22:50
lvcargniniok, well lwa loads a word (so 64b)22:51
poke53281so, in order to load from a random address you have to combine a few commands. Wait ...22:51
lvcargninild loads double word22:52
lvcargninilbz maybe ?22:52
lvcargninisorry lhz22:53
lvcargninimaybe22:53
lvcargninibut still, it is the interpretation of the "lhz 0x121212" in the asm that is the issue for me, in the RTL22:54
poke53281l.movhi r3,0x123422:57
poke53281l.ori r3,r3,0x567822:57
poke53281l.sw 0(r3),r222:57
poke53281in 32 Bit, to write to a random 32 bit word you have to use three instructions.22:57
poke53281in this case I write the contents of r2 to the address 0x1234567822:58
poke53281There must be a similar way in the 64-Bit cpu.22:58
lvcargniniok, the problem , for me, is how the l.movhi is translated to binary ? for a memory of 64b ? also, after that since I'm using 64b registers how to interpret them ? my initial question for you23:00
lvcargnini4-bytes instructions in a 8-byte wide register23:00
lvcargninipoke53281, thanks for the help so far23:01
lvcargniniI'll keep asking it, until someone can help me figure this out, on SPARC-v9 for example is the same as in SPARC-v8, but is in format {instruction,32'b0}23:02
lvcargninito fit the 64b register23:02
poke53281umhh, ok. Good question. I don't know.23:04
poke53281probably you have to use l.movhi and then a shift left.23:04
poke53281But, maybe I don't understand the real question.23:05
poke53281according to the specification, l.movhi is also doing a shift left of 16.   "rD[63:0]←extz(Immediate) << 16"23:06
poke53281it's the same opcode and behaves exactly the same for 32-Bit and 64-Bit.23:07
poke53281So you need probably 5 4-byte instructions to fill a register with a random value.23:07
poke53281Or better, you load the register value from a memory address, maybe from the stack.23:08
poke53281If I still didn't answer your question, you should wait a little bit, until Stekern and others are available.23:08
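As a small Python model of the idea above, here is one naive way to build an arbitrary 64-bit value out of 16-bit immediates. l.movhi is modelled from the formula quoted in the log (extz(Immediate) << 16); the behaviour assumed for l.ori (OR with a zero-extended immediate) and l.slli (logical shift left), and the helper load_const64, are assumptions of this sketch. This particular sequence uses six instructions; poke53281's "probably 5" is an estimate, and shorter sequences may exist for specific values.

```python
MASK64 = (1 << 64) - 1


def movhi(imm16):
    """Model of l.movhi: rD = extz(imm) << 16."""
    return (imm16 & 0xFFFF) << 16


def ori(ra, imm16):
    """Assumed model of l.ori: rD = rA | extz(imm)."""
    return ra | (imm16 & 0xFFFF)


def slli(ra, shift):
    """Assumed model of l.slli on a 64-bit register."""
    return (ra << shift) & MASK64


def load_const64(value):
    """Build a 64-bit constant from four 16-bit chunks, highest chunk first."""
    chunks = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    r = movhi(chunks[0])     # chunk 0 lands in bits 31:16
    r = ori(r, chunks[1])    # chunk 1 fills bits 15:0
    r = slli(r, 16)          # make room for the next chunk
    r = ori(r, chunks[2])
    r = slli(r, 16)
    r = ori(r, chunks[3])
    return r
```

Loading the value from memory, as suggested above, avoids the long sequence at the cost of a data access.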
poke53281Ahh, I read your email.23:16
poke53281I still don't really understand your question. But I guess that the compiler will produce, in principle, the same binary code as for 32 Bit.23:22
poke53281I don't know if anything else would make sense.23:22
poke5328118 60 12 34     l.movhi r3,0x123423:25
poke53281a8 63 56 78     l.ori r3,r3,0x567823:25
poke53281d7 e1 0f f8     l.sw -8(r1),r123:25
poke53281this is the way you should pack it into memory.23:26
poke53281d4 03 10 00     l.sw 0(r3),r223:26
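The byte listings above can be reproduced with a short Python sketch. The field positions used below were chosen so that they regenerate exactly the bytes poke53281 pasted; check them against the architecture manual before relying on them. The helper names are made up, and split_fetch64 encodes an assumption (big-endian memory, with the instruction at the lower address in the upper half of a 64-bit fetch), which corresponds to the {inst1,inst2} option from the original question.

```python
def encode_movhi(rd, imm16):
    """l.movhi rD,K: opcode 0x06, rD in bits 25:21, K in bits 15:0."""
    return (0x06 << 26) | (rd << 21) | (imm16 & 0xFFFF)


def encode_ori(rd, ra, imm16):
    """l.ori rD,rA,K: opcode 0x2a, rD in 25:21, rA in 20:16, K in 15:0."""
    return (0x2A << 26) | (rd << 21) | (ra << 16) | (imm16 & 0xFFFF)


def encode_sw(offset, ra, rb):
    """l.sw I(rA),rB: opcode 0x35; the 16-bit offset is split around rA and rB."""
    imm = offset & 0xFFFF
    return ((0x35 << 26) | ((imm >> 11) << 21) | (ra << 16)
            | (rb << 11) | (imm & 0x7FF))


def to_memory(words):
    """Lay out 32-bit instruction words back to back, big-endian."""
    return b"".join(w.to_bytes(4, "big") for w in words)


def split_fetch64(fetch):
    """Assumption: a 64-bit big-endian fetch holds two consecutive instructions."""
    return fetch >> 32, fetch & 0xFFFFFFFF


program = [encode_movhi(3, 0x1234), encode_ori(3, 3, 0x5678), encode_sw(0, 3, 2)]
image = to_memory(program)
print(image.hex(" ", 4))
first, second = split_fetch64(int.from_bytes(image[:8], "big"))
print(f"{first:08x} {second:08x}")
```

Under that assumption the decoder sees 32-bit instructions regardless of register width; only the fetch path has to pick the right half of each 64-bit word.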
--- Log closed Thu Jun 19 00:00:40 2014
