IRC logs for #openrisc Sunday, 2015-03-01

--- Log opened Sun Mar 01 00:00:02 2015
stekernrschmidlin: I'm having problems with what I'm working with and dcache enabled now04:22
stekernso maybe there's a bug in mor1kx04:22
stekern...or this is completely unrelated04:22
stekernannoying that or1ksim doesn't implement support for the PL1 bit05:12
stekernnope, my dcache problem was a pure sw bug07:49
stekernI was invalidating dcache according to NCS instead of 1 << NCS07:49
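The bug stekern describes can be reproduced on paper. The sketch below is a Python model, not real SPR-access code. It assumes the DCCFGR layout from the OpenRISC 1000 architecture manual: NCS, the log2 of the number of sets, sits in bits 6:3, and CBS in bit 7 selects 16- or 32-byte blocks. It also assumes a hypothetical direct-mapped 8 KiB cache with 32-byte lines. The helper names are made up for illustration.

```python
# Assumed DCCFGR layout (OpenRISC 1000 arch manual): NCS in bits 6:3 holds
# log2(number of sets); CBS in bit 7 selects the cache block size.
DCCFGR_NCS_MASK = 0x78
DCCFGR_NCS_SHIFT = 3
DCCFGR_CBS = 0x80  # 0 -> 16-byte blocks, 1 -> 32-byte blocks


def dcache_geometry(dccfgr):
    """Return (log2_sets, block_size_bytes) decoded from a DCCFGR value."""
    ncs = (dccfgr & DCCFGR_NCS_MASK) >> DCCFGR_NCS_SHIFT
    block_size = 32 if dccfgr & DCCFGR_CBS else 16
    return ncs, block_size


def dcbir_addresses(dccfgr, buggy=False):
    """Addresses to write to DCBIR to invalidate every set (direct-mapped).

    buggy=True reproduces the mistake from the log: looping NCS times
    instead of 1 << NCS times.
    """
    ncs, block_size = dcache_geometry(dccfgr)
    num_sets = ncs if buggy else 1 << ncs
    return [s * block_size for s in range(num_sets)]


# Hypothetical 8 KiB direct-mapped cache, 32-byte lines: 256 sets, so NCS = 8.
example = (8 << DCCFGR_NCS_SHIFT) | DCCFGR_CBS
print(len(dcbir_addresses(example)))              # 256
print(len(dcbir_addresses(example, buggy=True)))  # 8
```

With the buggy loop only the first 8 of 256 lines are invalidated, so stale data can survive in the rest of the cache.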
rschmidlingood morning all09:04
rschmidlinstekern, the changes are actually unrelated to the cache problems. I've only made these changes to allow me to synthesize mor1kx with ISE14.7. Despite the changes, I'm still having the multiple signal driver issues.09:06
stekernrschmidlin: yeah, I know, my comment was only related to your cache09:12
stekernproblems09:13
stekernbut you're only having problems with ISE14.7 for spartan3, right?09:13
rschmidlinwell, I didn't try any other. But yes. I was catching up with the conversation with bandvig09:28
rschmidlinI actually implemented his fpu32_v1.0 tag. I certainly want an FPU. But I got stuck with the caches.09:29
bandvighello all!09:32
rschmidlinhello!09:33
rschmidlinstekern, you are right. Synthesis ran through for Spartan6.09:33
rschmidlinI didn't expect that09:34
bandvigI've just started synthesizing the atlys (Spartan6 based board) project with ISE14.6. In about 20 minutes I'll report whether there are any issues. But I don't remember having any problems with either the cache or "multiple signal drivers issues".09:37
rschmidlinbandvig, my bad. I cross-reported something here.09:40
rschmidlinProblem (1): caches are not working with wb_ram_b3.v on an Artix 7.09:41
rschmidlinProblem (2): I get multiple drivers for signals when implementing mor1kx for Spartan3A DSP.09:41
rschmidlin(1) implementation from Vivado. (2) implementation from ISE14.7.09:41
stekernrschmidlin: can you paste the program that fails?10:11
rschmidlinhttp://pastie.org/999111910:16
rschmidlinsorry, I was actually reproducing the one I have at work. There is still an error in this.10:17
rschmidlinI believe this should give you the idea and contain everything I have in it: http://pastie.org/999112910:22
stekernand your caches are 8kb and the cache line size is 32?10:25
rschmidlinhttp://pastie.org/999113610:29
rschmidlinstekern, yes10:41
bandvigrschmidlin: About problem (1). It looks like I can't help you. My atlys project is very old (in fact mor1kx is the only fresh part), so it just doesn't include wb_ram_b3.v.10:45
rschmidlinstekern, the data cache (wb_ram_b3.v) gives me a buserr at line 95. wb_ram.v (wb_ram_b3.v with integrated arbiter) works with data cache. The instruction cache gives me a trap instruction at line 63.10:45
rschmidlinbandvig, no worries. I should get a DDR memory controller if I really go down the mor1kx route, though I only wanted a simple system to gather some performance measurements. I thought about instantiating a MIG and wrapping it, and I thought I could reuse the atlys code. But Vivado has a new MIG interface, without the option of multiple interfaces and arbitration. I'd have to adapt the complete wrapper.10:48
gr8what happened to the openrisc server?10:52
bandviggr8: if you are about openrisc.net, it is down.11:05
bandviggr8: we discussed it yesterday: http://juliusbaxter.net/openrisc-irc/%23openrisc.2015-02-28.log.html11:07
stekernI synced openrisc/linux to v3.19 now11:09
stekernrschmidlin: there wasn't any problems running your test asm on my sockit at least11:21
stekernhttp://pastie.org/999118011:21
stekernthat's the exact asm I ran11:21
bandvigstekern: cool. By the way, I saw in the logs that if I push a commit into openrisc/mor1kx, the information about the commit appears here.11:24
bandvigstekern: It could be useful to set up the same for any project from openrisc.11:24
stekernbandvig: yes, I know, I turned on the irc notifications on mor1kx.11:26
stekernbut I want to be careful adding too much of those, it easily becomes noisy if every little commit to every project is notified in the logs11:27
bandvigstekern: ok. And I would like to resume our discussion about pipelines from yesterday.11:44
bandvigFirst of all I understood your point about separate REGs for FPU...11:44
bandvigSecond, I've thought about your proposal with a fresh morning brain :)...11:45
bandvigLet me take several lines to describe what I've understood...11:45
gr8bandvig: ok thanks11:47
bandvig stekern: Let's use a single-issue scheme (ISU denotes "issue unit")...12:02
bandvigThere is also a kind of queue on the WB side. The queue contains the ordered identifiers of the units from which WB waits for ready signals.12:03
bandvig(1 of 4) each time the issue logic places an instruction into one of the parallel pipes, it also sends the unit identifier into the next slot of the WB queue12:03
bandvig(2 of 4) a conflict occurs when ready signals are raised by units other than the one at the WB queue's head.12:03
bandvigIn that case all such units become stalled (and the ISU can't put a new instruction into them, even a data-independent one) until the unit whose identifier matches the WB queue head raises its ready flag.12:03
bandvig(3 of 4) After the conflict is resolved (the unit of interest has provided its result), the WB queue advances and grants GPR access to the next unit in order.12:03
bandvig(4 of 4) Of course, if the WB queue is full, the whole pipe stalls until the conflict is resolved.12:03
bandvig uff... done... :)12:04
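A minimal Python sketch of the WB ordering queue in steps 1 to 4 above, assuming one identifier is enqueued per issued instruction and the queue has a fixed depth. WbOrderQueue and the unit names are hypothetical; the model only decides which unit may write the GPRs in a given cycle, not mor1kx timing.

```python
from collections import deque


class WbOrderQueue:
    """Toy model of in-order write-back arbitration (hypothetical names)."""

    def __init__(self, depth):
        self.depth = depth
        self.order = deque()  # unit identifiers in issue order

    def can_issue(self):
        # (4 of 4): a full queue stalls the whole pipe.
        return len(self.order) < self.depth

    def issue(self, unit):
        # (1 of 4): every issued instruction enqueues its unit's identifier.
        if not self.can_issue():
            raise RuntimeError("WB queue full: pipeline must stall")
        self.order.append(unit)

    def cycle(self, ready_units):
        """Return (unit granted GPR write access, ready units that must hold)."""
        ready = set(ready_units)
        if self.order and self.order[0] in ready:
            # (3 of 4): the head's result is written back; the queue advances.
            head = self.order.popleft()
            return head, ready - {head}
        # (2 of 4): ready units that are not at the head stay stalled.
        return None, ready


q = WbOrderQueue(depth=2)
q.issue("fpu")
q.issue("alu")
print(q.can_issue())            # False
print(q.cycle({"alu"}))         # (None, {'alu'})
print(q.cycle({"fpu", "alu"}))  # ('fpu', {'alu'})
print(q.cycle({"alu"}))         # ('alu', set())
```

As stekern notes, this is essentially the same ordering idea as a store buffer: results leave strictly in the order the identifiers went in.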
rschmidlinstekern, thanks a lot. That was from a DDR3 controller, wasn't it? Good to know that it works with the DDR controllers. The Xilinx block RAMs and the way they are interpreted in the Wishbone wrapper should be the problem.12:32
stekernbandvig: yes, that sounds good. pretty much like how the current storebuffer works for the lsu12:36
stekernrschmidlin: yes, it was from DDR3 ram, but I think I have a ram_wb_b3.v instantiation somewhere on this soc too12:39
stekernwhat's the difference between wb_ram_b3.v and ram_wb_b3?12:40
stekernlooks like I can't even write into that SRAM...13:10
stekernand the test would have been moot anyway, since I have it mapped at 0x80000000 and everything above that is uncached anyway13:11
Me1234mailman-owner@lists.openrisc.net  8:00 AM (10 hours ago)14:08
Me1234I got an email from lists.openrisc.net, so it must be a DNS problem.14:09
bandvigstekern: yes, I also thought about storebuffer as a model for WB queue14:16
bandvigIn fact the approach is also a good starting point for subsequent performance improvement.14:20
bandvigLet me propose a plan for further development of mor1kx...14:22
bandvig (1 of 4) The 1st step (let me repeat): single issue, parallel units, conflict control at the WB stage.14:22
bandvigProposed code name is Latte (?). (Wikipedia: "A cappuccino differs from a caffè latte in that it is prepared with much less steamed or textured milk than the caffè latte...")14:23
bandvigSo Latte is more steamed milk (parallel units) and more textured milk (smarter conflict control) :))14:23
bandvig(2 of 4) The 2nd step. Extend the 1st step with out-of-order completion, i.e. a full-featured reorder buffer in WB.14:24
bandvigProposed code name is Marocchino (espresso, steamed milk, cocoa powder).14:24
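For contrast with step 1, here is a toy reorder buffer in Python. An early result is parked in its slot instead of stalling its unit, and registers are still updated in program order. This is a generic textbook-style sketch with assumed names (ReorderBuffer, registers as strings), not a proposed mor1kx design.

```python
class ReorderBuffer:
    """Toy ROB: results may arrive out of order, commit is in order."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # each slot: [dest_reg, value, done]
        self.head = 0   # oldest uncommitted entry
        self.tail = 0   # next free entry
        self.count = 0

    def allocate(self, dest_reg):
        """Reserve a slot at issue time; the slot index is the result tag."""
        if self.count == self.size:
            raise RuntimeError("ROB full: issue must stall")
        tag = self.tail
        self.slots[tag] = [dest_reg, None, False]
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return tag

    def complete(self, tag, value):
        # A unit that finishes early no longer stalls; it parks its result.
        self.slots[tag][1] = value
        self.slots[tag][2] = True

    def commit(self, regfile):
        """Retire finished entries from the head, in program order."""
        retired = []
        while self.count and self.slots[self.head][2]:
            dest, value, _ = self.slots[self.head]
            regfile[dest] = value
            retired.append(dest)
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired


regs = {}
rob = ReorderBuffer(4)
t_fpu = rob.allocate("r3")  # slow FPU op issued first
t_alu = rob.allocate("r4")  # fast ALU op issued second
rob.complete(t_alu, 7)      # ALU finishes first
print(rob.commit(regs))     # [] (r3 is still pending)
rob.complete(t_fpu, 42)
print(rob.commit(regs))     # ['r3', 'r4']
print(regs)                 # {'r3': 42, 'r4': 7}
```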
bandvig(3 of 4) The 3rd step. Extend step 2 with an implementation of the Tomasulo algorithm. By the way, the reservation stations could play the role of separate REGs for each unit.14:24
bandvigProposed code name is Miel (espresso, steamed milk, cinnamon and honey)14:24
bandvig(4 of 4) Multi-issue extension of step 3. Proposed code name is Lungo (from ORCONF-2014 materials)14:24
bandvigTo all. Comments? Proposals (about code names for example)?14:25
bandvigPersonally, I'm planning to start implementing the discussed approach (parallel units with stalling from WB) in a few days.14:40
stekernbandvig: sounds like a plan14:46
rschmidlinHmm, I have taken the implementation from mor1kx-generic. It also has ram_wb_b3.v and not wb_ram_b3.v. I don't know anything about the wb_ram_b3.v implementation then.14:55
rschmidlinI had a working data cache with ram_wb.v which essentially includes an arbiter before the memory.14:57
bandvigstekern: Yes, it is a plan :). Let's say it's my style. Before each FPU iteration I usually made a plan with a set of small steps to achieve the next goal.15:02
bandvigstekern: Of course I don't force anybody to follow the proposed 4 steps.15:02
olofkbandvig: Like the coffee names and implementation ideas, so Go go go! :)16:44
olofkrschmidlin: ram_wb_b3 (from orpsocv2, with built-in arbiter) was never meant for synthesising and shouldn't be used anymore16:45
olofkwb_ram (from yours truly) is the way to go. If you find problems with that, I'm all ears16:46
olofkoh... I see that three systems still use ram_wb16:47
olofkmor1kx-generic and or1200-generic aren't intended for synthesis, so that's no big deal16:47
olofksockit however...16:48
olofkah no.. mor1kx-generic doesn't use ram_wb after all. It uses my Wishbone memory BFM which is definitely not synthesisable16:56
olofkIs Nathan Yawn still active btw?16:58
olofkI would like to do a new release of adv_debug_sys16:58
olofkWith the patches we have gathered in FuseSoC16:58
rschmidlinI don't think so. But you can certainly contact him about the patches.17:02
olofkNope. Wrong again. mor1kx-generic does use ram_wb17:05
rschmidlinolofk, what should matter is if we can get an internal memory based on block/distributed rams for FPGAs working with mor1kx.17:05
olofkrschmidlin: Yes. The whole idea with wb_ram was to have a synthesisable RAM with multiple backends for different FPGAs17:05
olofkBut I only ever wrote a generic backend17:06
rschmidlinolofk, Vivado and ISE are able to infer block rams from that code Olof.17:06
rschmidlinso it must be good enough for Xilinx already.17:07
olofkrschmidlin: I had to do some tricks to get that working with ISE, and it turns out there are still some problems with Quartus17:07
rschmidlinolofk, if you are looking for something usable with Quartus, strip out the Wishbone things and go with this: http://opencores.org/websvn,filedetails?repname=minsoc&path=%2Fminsoc%2Ftrunk%2Frtl%2Fverilog%2Fminsoc_onchip_ram_top.v17:10
rschmidlinIt simply generates banks of four 8-bit memory blocks and muxes them together.17:10
olofkrschmidlin: Well, there's another thing that made things a bit more complicated17:11
rschmidlinolofk, but I'd leave ram_wb_b3.v as it is in case it is already working with Quartus17:11
olofkI wanted to have the option to preload the RAM from a verilog memory file17:12
olofkThat works fine in simulation (and ISE I think)17:13
olofkBut Quartus decides to split up my 32-bit memory to 4 8-bit memories (even if they have a primitive with byte-wise write enables)17:13
olofkAnd when it splits up the memory it no longer loads the data17:13
olofkI wanted the preloading so that I could have a bootloader that could also hold some volatile data17:14
olofkBut I've decided to drop that idea since the FPGA tools just won't play nice17:14
olofkSo then 4 8-bit memories work fine, even if it's a bit of waste of memories17:15
olofkAnother way to work around that would be to directly instantiate Altera primitives to get one 32-bit memory with byte-wise write enable17:16
olofkBut then it can only load their stupid .mif format17:16
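A Python model of the "4 8-bit memories" arrangement, assuming Wishbone-style byte selects where select bit i enables data bits 8i+7..8i. It also shows why preloading gets harder once the memory is split: a single 32-bit init image has to be distributed byte by byte across the four lanes. ByteLaneRam and its methods are hypothetical.

```python
class ByteLaneRam:
    """32-bit RAM built from four independent 8-bit memories (byte lanes)."""

    def __init__(self, words):
        # One 8-bit memory per byte lane; lane i holds data bits 8i+7..8i.
        self.lanes = [[0] * words for _ in range(4)]

    def preload(self, image):
        """Split a list of 32-bit init words across the four lanes."""
        for addr, word in enumerate(image):
            for lane in range(4):
                self.lanes[lane][addr] = (word >> (8 * lane)) & 0xFF

    def write(self, word_addr, data, sel):
        """Write only the lanes whose bit is set in the 4-bit select mask."""
        for lane in range(4):
            if sel & (1 << lane):
                self.lanes[lane][word_addr] = (data >> (8 * lane)) & 0xFF

    def read(self, word_addr):
        """Reassemble a 32-bit word from the four lanes."""
        return sum(self.lanes[lane][word_addr] << (8 * lane) for lane in range(4))


ram = ByteLaneRam(words=16)
ram.write(0, 0x11223344, sel=0b1111)
ram.write(0, 0xAABBCCDD, sel=0b0010)  # only bits 15:8 change
print(hex(ram.read(0)))               # 0x1122cc44
```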
rschmidlinahh, you want that the synthesizer initializes the memory for you?17:16
olofkYes. That works mostly fine except for the tool bugs17:17
olofkBut I'm giving up on that now and create a wb_rom component instead that can be preloaded in a portable way17:18
olofkThen I'll probably drop the initialization code from wb_ram, or at least explain that it's broken on some devices (Cyclone IV at least)17:18
rschmidlinI get it. However, my current problem seems to be the interaction between the caches' Wishbone interfaces and the memory Wishbone wrapper.17:18
olofkDid you run simulations on it?17:19
rschmidlinolofk, I believe there is already a rom module simply written out by a script, isn't there?17:19
olofkrschmidlin: Yes there is, but that one is a bit awkward17:19
olofkI'm just dropping the write enables from wb_ram17:20
olofkThat makes it easier to switch contents at compile-time17:20
olofk(see or1k_bootloaders)17:20
rschmidlinolofk, the simulations work. I was discussing that with stekern the whole weekend. He told me he had already heard of issues where the Xilinx memories don't behave as described, and that the caches are somewhat picky about burst transactions. But I'm left somewhat clueless about what to do. My next step will be to disable bursts first, see if the system works, and then try to put them back in.17:20
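For the burst question, here is a Python sketch of the Wishbone B3 registered-feedback encodings and the address sequence a wrapping burst produces on a 32-bit bus. The encodings are CTI 000 classic, 001 constant-address burst, 010 incrementing burst and 111 end of burst; BTE 00 is linear, and 01/10/11 are 4/8/16-beat wrap. Whether the mor1kx caches issue wrap-8 bursts for 32-byte lines is an assumption here. "Disabling bursts" is modelled simply as issuing classic cycles, and the function names are hypothetical.

```python
# Wishbone B3 cycle type identifiers (CTI).
CTI_CLASSIC = 0b000
CTI_CONST_BURST = 0b001
CTI_INC_BURST = 0b010
CTI_END_OF_BURST = 0b111

# Burst type extension (BTE) -> wrap length in beats (None = linear).
BTE_WRAP_BEATS = {0b00: None, 0b01: 4, 0b10: 8, 0b11: 16}


def burst_addresses(start, bte, beats, width_bytes=4):
    """Byte addresses of an incrementing (CTI=010) burst.

    For wrap bursts the address wraps inside a block of wrap*width bytes
    that is aligned to its own size.
    """
    wrap = BTE_WRAP_BEATS[bte]
    addrs = []
    addr = start
    for _ in range(beats):
        addrs.append(addr)
        if wrap is None:
            addr += width_bytes
        else:
            block = wrap * width_bytes
            base = addr - (addr % block)
            addr = base + (addr + width_bytes) % block
    return addrs


def cti_sequence(beats, bursts_enabled=True):
    """CTI value driven on each beat; without bursts, every beat is classic."""
    if not bursts_enabled:
        return [CTI_CLASSIC] * beats
    return [CTI_INC_BURST] * (beats - 1) + [CTI_END_OF_BURST]


# Critical-word-first refill of a 32-byte line starting at 0x14 (wrap-8).
print([hex(a) for a in burst_addresses(0x14, 0b10, 8)])
# ['0x14', '0x18', '0x1c', '0x0', '0x4', '0x8', '0xc', '0x10']
```

A memory wrapper that assumes linear bursts would return the wrong words after the wrap point, which is the kind of mismatch that passes simple tests but breaks cache refills.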
rschmidlinI thought about going over that libelf situation with fusesoc on mac this weekend. But I had other things to do.17:22
olofkrschmidlin: I remember that the synthesis and simulation behaved differently when I first tested wb_ram on a spartan6. Had to create workarounds in the code17:27
olofkAbout libelf, what we want to do is to make fusesoc pick up include files that aren't in the standard gcc include path I think17:28
olofkwe should use fusesoc.conf for that17:28
olofkfusesoc.conf could be used for a lot of other things too, like setting paths to the EDA tools and enabling the monochrome mode17:29
olofkRight now it only has one option I think :)17:29
olofkCan I pass a linker script to or1k-elf-as?17:33
stekernolofk: you can pass it to gcc (which will pass it to ld)18:02
stekernor directly to ld, it's not called a *linker* script for nothing ;)18:03
amsolofk: -Xlinker switch19:14
amsolofk: or -Wl19:14
amsolofk: oh, script .. not reading well.. but same thing .. -T for script name19:14
olofkhahaha I found a limitation in the icecube2 software. The line listing which verilog files to use seems to have a maximum length of 2047 characters20:46
olofkAh ok. It's possible to put each file on a separate line20:47
--- Log closed Mon Mar 02 00:00:04 2015
