IRC logs for #openrisc Monday, 2014-10-20

--- Log opened Mon Oct 20 00:00:45 2014
olofkstekern: All wb_ports in wb_sdram_ctrl have their own cache, and they are coherent between all ports, right?07:05
stekernolofk: that's the intent, yes07:11
stekernolofk: I played my first couple of hours of broken age yesterday btw ;)07:12
olofkstekern: Woohoo!! Was it fun? :)07:12
stekernyes, a bit on the easy side perhaps07:12
stekernand I can't say I'm super fond of the graphic style07:13
olofkYes. It wasn't that hard, but amusing. I like the fox :)07:13
stekernisn't it a wolf?07:13
stekernour daughter enjoyed watching while I played though07:13
olofkah ok. Was almost a year since I played it07:14
olofkI bought these on humble bundle some weeks ago. Much more enjoyable graphics http://www.wadjeteyegames.com/games/blackwell-legacy/07:14
stekernyeah, that sounds right, think it was about a year ago that I bought it ;)07:15
olofkAbout the sdram, I've been thinking a lot about wide wb buses to increase bandwidth with many masters07:15
olofkAnd that got me wondering if it ever happens that one port caches something that is used by another port07:16
olofkI mean, in the common case we only have data and instruction buses to the RAM, and they would never get stuff from the same memory range, right?07:17
stekernumm, and then you've got DMA accesses07:17
olofkYes. With DMA I guess that the caches get used more07:18
stekern(blackwell) yeah, that seems to have a retro touch to it ;) gotta buy that one too07:20
stekernbut TOMI had modern graphics that I liked07:22
olofkAh.. never played that. I still haven't completely accepted COMI though :)07:24
stekernCOMI was good, EMI was a disaster07:25
stekernI've completely missed this too: http://en.wikipedia.org/wiki/Back_to_the_Future:_The_Game07:26
olofkMe too07:27
stekernit sucks that telltale doesn't do linux versions though07:28
olofkah.. they're the ones who did the Sam'n'Max shorties too07:29
stekernyup07:29
olofkRegarding the SDRAM again. My grand plan was to split out the wb_port arbitration stuff to a separate system cache component, add proper arbitrary wishbone resize blocks and expose a single DQ_WIDTH*BURST_LENGTH wishbone port from the memory controller07:34
olofkBenefits: 1) System cache can be used for all memory controllers. 2) If we add support in mor1kx, we can pull in a full cache line in one transaction07:35
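A minimal sketch of the width arithmetic behind the proposed single port, assuming the port is exactly DQ_WIDTH*BURST_LENGTH bits wide as described above. The function names and the example numbers (16-bit DQ, burst length 8, 32-byte cache line) are hypothetical and not taken from wb_sdram_ctrl or mor1kx.

```python
def controller_port_width(dq_width: int, burst_length: int) -> int:
    """Width in bits of the single wide port: DQ_WIDTH * BURST_LENGTH."""
    return dq_width * burst_length


def transactions_per_cache_line(line_bytes: int, dq_width: int,
                                burst_length: int) -> int:
    """Number of wide-port transactions needed to fetch one cache line."""
    port_bits = controller_port_width(dq_width, burst_length)
    line_bits = line_bytes * 8
    return -(-line_bits // port_bits)  # ceiling division


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    print(controller_port_width(16, 8))            # port width in bits
    print(transactions_per_cache_line(32, 16, 8))  # transactions per line
```

A full cache line arrives in one transaction only when the port is at least as wide as the line.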
stekern1) sounds good07:39
stekern2) is problematic, since there's no portable way to create block rams with separate data ports07:40
stekernsizes07:40
olofk(2) Where is that a problem? In mor1kx or the system cache?07:45
stekernin mor1kx07:46
olofkah ok07:46
olofkCurrently trying to raise Fmax for mor1kx. Found critical path from dltb_match_regs to pc_fetch. Disabled dmmu and got 7MHz lower Fmax. What?07:47
stekernthat's probably due to the phase of the moon07:48
olofkahh.. or because the generate statement says if (FEATURE_DATACACHE!="NONE")07:50
olofkNever really disabled it07:50
stekernyes, it's confusing...07:50
stekernit's on my todo list to document all the parameters in the top module with the valid values they can obtain07:52
olofkIn theory it could be handled with a common function that returns 1 for "ENABLED", "TRUE", and 0 for "DISABLED", "NONE" and some more values. Problem there is that Icarus still doesn't support static functions :(07:53
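A minimal sketch of the common function described above, written in Python rather than Verilog so it can live in a config generator. Only the four strings named in the chat are handled; the "some more values" are left out because they are not specified. The name `feature_enabled` and the choice to reject unknown strings are assumptions.

```python
ENABLED_VALUES = {"ENABLED", "TRUE"}
DISABLED_VALUES = {"DISABLED", "NONE"}


def feature_enabled(value: str) -> int:
    """Return 1 for an enabling string, 0 for a disabling one.

    Unknown strings raise instead of silently counting as enabled,
    which is the failure mode a bare `!= "NONE"` comparison allows.
    """
    if value in ENABLED_VALUES:
        return 1
    if value in DISABLED_VALUES:
        return 0
    raise ValueError(f"unrecognised feature value: {value!r}")


if __name__ == "__main__":
    for setting in ("ENABLED", "TRUE", "DISABLED", "NONE"):
        print(setting, feature_enabled(setting))
```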
stekernI'm also working on putting together 'templates' with parameter setups for different purposes07:54
stekernthis is a minimal baremetal setup: http://pastie.org/966242107:54
olofkstekern: I've been considering the same thing. My approach however is to have a config file that spits out a verilog wrapper07:55
olofkThat could also instantiate the debug interface and hook it up depending on if the debug unit parameter is set07:57
olofkAnd other things07:57
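A rough sketch of the config-to-wrapper idea, assuming the config is a plain dict of parameter names to values. The wrapper name `mor1kx_wrapper`, the instance name `cpu0` and the use of `FEATURE_DEBUGUNIT` as the debug unit parameter are assumptions for illustration; port connections and the actual debug interface instance are omitted.

```python
def render_wrapper(params: dict, wrapper_name: str = "mor1kx_wrapper") -> str:
    """Emit Verilog text for a wrapper that overrides the given parameters."""
    overrides = []
    for name, value in sorted(params.items()):
        # String parameters are quoted, numeric ones are emitted as-is.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        overrides.append(f"    .{name}({rendered})")

    lines = [f"module {wrapper_name} ();  // ports omitted in this sketch",
             "  mor1kx #(",
             ",\n".join(overrides),
             "  ) cpu0 ( /* port connections omitted */ );"]
    # Hook up extra blocks depending on the parameters, as suggested above.
    if params.get("FEATURE_DEBUGUNIT") == "ENABLED":
        lines.append("  // debug interface instance would be generated here")
    lines.append("endmodule")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_wrapper({"FEATURE_DEBUGUNIT": "ENABLED",
                          "OPTION_OPERAND_WIDTH": 32}))
```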
olofkDisabling ?MMU gives me a critical path in the arbiter. Feels much better to know it's my fault for it being slow :)07:58
stekern;)07:59
olofkstekern: I slightly improved the timing of wb_arbiter09:03
stekernby?09:10
olofkhttps://github.com/bmartini/verilog-arbiter/pull/209:10
olofkPrecalculating the index inside of verilog-arbiter09:11
stekerndoes it make more sense to do it inside the arbiter instead of outside it?09:13
olofkHmm.. no09:13
stekernI mean, it seems to me that all the signals that would be needed to do it outside of it are already exposed09:13
olofkThis is wrong09:13
olofkcrap09:13
stekernheh ;)09:14
olofkThe idea is right, but I made a mistake in the implementation09:14
olofkIt should be         sel <= ff1(token & request);09:15
olofkThat we can only do inside the module09:15
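A small behavioural model of `sel <= ff1(token & request)`, assuming `ff1` returns the index of the lowest set bit and that `token` and `request` are bit vectors of equal width. The real `ff1` in verilog-arbiter may differ in detail (for example in what it returns when no bit is set), so this is only a sketch of the precalculated index.

```python
def ff1(bits: int, width: int) -> int:
    """Index of the lowest set bit in `bits`; 0 if no bit is set (assumed)."""
    for i in range(width):
        if (bits >> i) & 1:
            return i
    return 0


def select(token: int, request: int, width: int) -> int:
    """Model of `sel <= ff1(token & request)`."""
    return ff1(token & request, width)


if __name__ == "__main__":
    # Token points at bit 2, requesters 1 and 2 are active.
    print(select(0b0100, 0b0110, 4))
```

Since the expression depends on `token`, it presumably has to be computed where that signal is visible, which matches the remark that it can only be done inside the module.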
stekernah, that makes a lot more sense09:15
olofkCan I just amend the patch and force push it to update the pull request?09:16
olofkYeah, that worked09:18
stekernyou do remember that $clog2 is broken in ISE?09:19
olofkYes, but I ignored it. Maybe it's too early to ignore it09:22
stekerndoes vivado evaluate it correctly?09:24
olofkISE > 13.2 IIRC09:42
olofkAt least >=14.x09:43
stekernah, ok. I thought it still was broken09:49
olofkSometime in 2047 when I get the versioning support going in FuseSoC I intend to make it possible to require certain versions of the tools as well09:50
olofkWe have workarounds for at least some versions of Icarus, Verilator and ISE right now09:51
hansfbaierWhat do you guys think about Risc-V means for OpenRisc? It looks very attractive to me.09:56
hansfbaierBut the toolchain does not build on Ubuntu stable.09:56
hansfbaiers/about/about what/09:56
hansfbaier(Read the orconf slides)09:56
hansfbaierThat Rock processor performance is very impressive09:57
hansfbaier(Benchmark against ARM A)09:57
hansfbaierA5 IIRC09:57
stekernthe DMIPS or the area or what performance?10:04
olofkhansfbaier: The impression right now is that it looks like a well thought out 64-bit architecture, but it's currently way too big and slow to run on most FPGAs10:23
olofkDoes anyone have a wb register stage that I can use?10:28
stekernwhat do you need that for?10:30
olofkGetting a critical path on the slave data from adv_debug_sys10:32
olofkThere's something that resembles an endian converter in there that seems to use a lot of logic10:33
olofkSo I only really need to register the return s2m data path10:34
hansfbaierstekern: it seemed pretty impressive in area as well as performance10:36
hansfbaierolofk: Wasn't aware of that... Looked pretty good for the ASIC (also speed wise).10:38
hansfbaierolofk: How many LUTs would that thing need?10:38
hansfbaierolofk: or cells rather10:39
olofkhansfbaier: Don't remember. I know that a few others have done FPGA synthesis, and IIRC it was about 10 times bigger than mor1kx10:39
hansfbaierolofk: wow, that's quite a bit10:39
hansfbaierolofk: Would be nice if OpenRISC had kind of a 'thumb' mode. That seems to be the real killer feature in the embedded world.10:40
hansfbaiers/the/one/10:41
olofkYes. That together with removal of delay slots and a few other things were proposed for or2k, the OpenRISC 1000 successor10:42
hansfbaierAh10:42
hansfbaierolofk: Risc-V runs on the Zynq 7020 with 85 kCells10:46
hansfbaierBut that thing is EXPENSIVE10:47
hansfbaierIt probably would run on the SocKit too10:47
olofkI got a 7020 on my Parallella10:47
hansfbaierYes that should do10:48
olofkstekern: Regarding moving the port stuff from wb_sdram_ctrl, I would probably want to move the CDC closer to the pads, and do a lot more in the wb_clk domain. Given that it's usually slower than sdram_clk, do you think we'll lose any performance?10:49
stekernhansfbaier: mor1kx does get about the same DMIPS number10:54
stekernolofk: ah, wb as in wishbone?10:57
olofkYes11:05
hansfbaierstekern: Ah thanks11:10
hansfbaiergreat11:10
olofkwallento: I see some action in newlib. Nice!13:22
wallentoyes, I try to wrap stuff up now13:23
wallentoand: I did continuous integration, i.e., automatic builds13:23
wallentohttps://lis.ei.tum.de/jenkins/job/build_newlib_toolchain/13:23
wallentoavailable at: http://lis.ei.tum.de/pub-download/openrisc-builds/unstable/13:23
wallentoand put some info together (for the landing page then): http://wallento.github.io/or1k-newlib/13:24
olofkGreat that you have split out the gdb instructions13:25
wallentoI suppose that's the only thing left in or1k-src second step, correct?13:26
olofkShould be13:26
wallentoI will also put up some other continuous stuff (uclibc, musl, some hardware stuff?)13:27
wallentoif you have anything that's a candidate, I can give you proxy access to our jenkins instance (with a few computers as slaves), I just don't want to expose this to the outside world, therefore the public jenkins just mirrors at the moment13:28
olofkRegression testing mor1kx would be nice for example. Could be done with verilator against or1k-tests. Unfortunately they don't all pass right now13:29
wallentoyes, but I can already add it if you want to, are there any instructions?13:30
olofkNothing automatic right now, but it's basically fusesoc sim --build-only --force --sim=verilator mor1kx-generic && for i in $tests; do fusesoc sim --sim=verilator mor1kx-generic --elf-load=$i; done13:31
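The one-liner above could be wrapped for a CI job roughly as follows. The fusesoc arguments are copied from the chat and may not match other FuseSoC versions; the function names, the injectable `runner` and the use of the exit status as the pass/fail signal are assumptions.

```python
import subprocess

CORE = "mor1kx-generic"


def build_command(core: str = CORE) -> list:
    """Command that builds the Verilator model once."""
    return ["fusesoc", "sim", "--build-only", "--force",
            "--sim=verilator", core]


def run_command(elf: str, core: str = CORE) -> list:
    """Command that runs one test ELF on the already built model."""
    return ["fusesoc", "sim", "--sim=verilator", core, f"--elf-load={elf}"]


def run_regression(tests, runner=subprocess.call) -> dict:
    """Build once, then run every test; return {elf: exit status}."""
    if runner(build_command()) != 0:
        raise RuntimeError("simulator build failed")
    return {elf: runner(run_command(elf)) for elf in tests}
```

Passing a fake `runner` makes it possible to check the generated commands without fusesoc installed.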
wallentookay, nice, I will give it a try13:32
olofkMe tooI just don't want to expose this to the outside13:33
olofk                  world, therefore the public jenkins just mirrors at the moment13:33
olofk13:29 < olofk> Regression testing mor1kx would be nice for example. Could be13:33
olofk               done with verilator against or1k-tests. Unfortunately they don't13:33
olofkSorry. Baby at the keyboard. Left the computer unattended for ~1 minute13:34
stekernshe seems pretty competent at copy-pasting13:41
wallentowould qualify for a PhD already ;)13:42
stekernlol13:42
wallentowhat is the last state of spr-defs.h? there was some discussion around this I remember13:43
wallentodo we have an automatically generated one under BSD?13:43
stekerni think pgavin did something, but I don't know if he ever published it13:45
olofkYou can use the shareware version of my C#-implementation of spr-defs14:04
olofkBut seriously. It's just the spec written down as a .h file. Must be regarded extremely trivial14:05
olofkI can see spr-defs.h in or1ksim, or1ktrace, orpmon, orpsocv2, adv_debug_sys, misoc, barebox, newlib and sim14:15
olofkProbably a few more places as well14:15
olofkAh ok. It's called spr_defs.h in linux14:16
wallentommh, is any of those BSD?14:24
wallento;)14:24
wallentoI think it's trivial, but it has a GPL header14:24
wallentowith damjan and jeremy14:24
olofkstekern: Am I just making this up, or did you add an option to disable the npc spr to improve timing?18:19
stekernolofk: I did some tests with pulling out npc spr when the debug unit is disabled18:20
stekernso I might have mentioned it18:20
olofkah ok. We still need it for debugging. Then it's not an option anyway18:20
stekernbut it was mostly for area, not speed18:20
stekernare you seeing some critical paths through it?18:21
olofkOne of the most critical paths is from spr_sr[5] to pc_fetch[31], so I took a wild guess18:21
stekernsr != npc18:22
olofkah right.18:23
olofksupervisor register18:23
stekernI have never seen that path18:24
olofkHave you synthesized during full moon?18:24
stekernyou can easily register spr_sr[5], but I bet it's more interesting what's in between18:25
stekernwhich is a lot...18:25
olofkNot very handy with Quartus sta tool yet. How can I get it to show me the whole path?18:25
stekerndepends on the target...18:25
stekernI actually tend to use the quartus gui when looking at critical paths18:26
stekernit'll give you the failing paths and then you can open them in ta18:27
olofkThen I need to constrain harder to make it fail :)18:27
stekernah, ok18:27
stekernwell, where do you see the path?18:28
olofkIn .sta.rpt18:28
olofkSlow 1200mV 85C Model Setup: 'wb_clk'18:28
stekernI'm trying to get vice to start18:31
stekern...without huge success18:35
stekernah, now it starts to work... but it's sloooow ;)18:57
olofkvice?19:35
stekernhttp://1drv.ms/ZDcTDF19:47
olofkhaha19:47
olofkstekern: What's preferred, B3_READ_BURSTING or B3_REGISTERED_FEEDBACK?19:50
stekernthis starts too at least: http://1drv.ms/ZDdntv19:51
stekernB3_REGISTERED_FEEDBACK19:52
stekernB3_READ_BURSTING only makes sense for espresso19:52
olofkcool19:52
olofkLooks like the wishbone clock just needed to be pushed a little. Constraining it to 50MHz gives me Fmax ~70MHz. Setting it to 75MHz gives me 88MHz20:08
olofkInteresting. 100MHz seems to work too20:13
olofk(without DMMU, IMMU and store buffer20:14
olofk)20:14
stekernyou should be able to go to 100 MHz with the store buffer too20:19
olofkThat's good20:20
olofkRunning at 100MHz gives me the benefit of avoiding a CDC to the SDRAM20:20
olofkah wait. I still need a phase shifted sdram clock, right?20:21
olofkOr why would I need that?20:21
stekernnot sure20:21
stekernI just finished broken age20:21
olofkCool. Did you like it?20:22
stekernyeah, when is act 2 coming?20:22
olofkI just got an update yesterday about the next episode20:22
olofkThey are hoping to get it out this year at least. Seemed like they were close to an alpha release20:22
olofkIs mor1kx_rf_cappuccino|ctrl_hazard_a to mor1kx_execute_ctrl_cappuccino|ctrl_alu_result_o[12] a sensible critical path btw?20:23
stekernyes20:23
olofkGood20:24
olofkOh well. Time for some Blackwell legacy and sleep now20:24
olofkBut first reenable the store buffer20:24
olofk98.94MHz. Close but no cigar20:29
olofkctrl_lsu_adr_o[19] to the icache tag ram20:31
olofkOne more try with smaller icache.20:32
stekernI bet that goes through intercon?20:33
stekernblackwell legacy purchased and installed20:34
olofk:)20:35
olofkSmaller cache did it! Now I can sleep20:36
--- Log closed Tue Oct 21 00:00:46 2014
