IRC logs for #openrisc Saturday, 2016-07-16

--- Log opened Sat Jul 16 00:00:41 2016
kc5tjaZipCPU: I think I fixed the ISE problem.  Looks like the distribution was missing a libQt_Network.so dependency.00:29
kc5tjaCrap!  It still doesn't launch from the GUI though.  :(00:31
kc5tjaThis is really quite frustrating.00:32
kc5tjaAH HA!!  I had to source a shell script before launching the ISE editor.  Would have been nice if Xilinx had told me this before!!00:36
ZipCPU|Laptopkc5tja: Okay, that one I could've told you.  I've got a script I use to start up ISE and that's most (all) of the script: run the shell script, then ISE.08:13
ZipCPUOkay, so this is crazy--I've debugged computers for more than 30 years, and I've never seen this bug pattern before:08:58
ZipCPU1) Sometimes the function prolog allocates space on the stack, sometimes it doesn't.08:58
ZipCPU2) A small irrelevant change to the program can keep this from happening.  (A heisenbug!)08:59
ZipCPU3) If you load a program filled with nothing but NOOPs, the problem is guaranteed (as long as you don't make the irrelevant change ...).08:59
ZipCPUAt this point, I think I have a cache bug--where the NOOP (the last program) is getting run rather than the new one.08:59
ZipCPUI've just never seen this pattern before.08:59
LaksenAnyone here know of any nice pretty generic instruction fetchers written in verilog?16:12
ZipCPU|LaptopLaksen: I just finished a lot of work on a fairly generic instruction fetcher, written in Verilog.16:37
ZipCPU|LaptopI'm not sure if it is "nice" and "pretty" enough for you, but it is (currently) fully functional.16:38
ZipCPU|LaptopBarring the last few modifications, 1) it can run with a 200MHz clock on an Artix-7, 2) it combines the cache with the instruction fetch, and so 3) it can support early branching with only a single stall cycle when jumping to somewhere in the cache.16:40
LaksenFunctional is a lot better than not functional :)16:40
LaksenCan I have a look?16:40
ZipCPU|LaptopYou can find it in the xula25soc project on OpenCores; I just checked my work in.  The particular fetch you are looking for can be found in trunk/rtl/pfcache.v16:40
ZipCPU|LaptopOops ... better make that trunk/rtl/cpu/pfcache.v.16:41
ZipCPU|LaptopThere's a less traditional pre-fetch cache in there as well, representing my first attempt at building such.  That one is called 'pipefetch.v'.16:41
ZipCPU|LaptopPipefetch works by trying to maintain a window in memory around the program counter.  Jumps outside the window reset the window starting from the new location.16:42
ZipCPU|LaptopNeedless to say, I abandoned pipefetch for the better performance of pfcache, still ... it's a unique approach.16:43
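
The windowed prefetch described above can be sketched as a small software model. This is only an illustration of the idea as stated in the chat, not the logic in pipefetch.v: the class name `WindowFetch`, the window size, and the choice to treat a one-word sequential advance as a hit are all assumptions made for the example.

```python
class WindowFetch:
    """Toy model of a prefetch window that follows the program counter."""

    def __init__(self, memory, size=4):
        self.memory = memory  # list of instruction words
        self.size = size      # hypothetical window length, in words
        self.base = 0         # lowest address currently inside the window
        self.resets = 0       # number of jumps that flushed the window

    def fetch(self, pc):
        """Return (instruction, hit) for the word at address pc."""
        end = self.base + self.size
        if pc == end:
            # Sequential flow stepped one word past the window: slide it
            # forward. Counting this as a hit assumes the prefetcher has
            # kept ahead of the CPU, which is an idealization.
            self.base += 1
            hit = True
        elif self.base <= pc < end:
            hit = True
        else:
            # Jump outside the window: restart the window at the new address.
            self.base = pc
            self.resets += 1
            hit = False
        return self.memory[pc], hit


memory = list(range(100, 132))
fetcher = WindowFetch(memory, size=4)
trace = [fetcher.fetch(pc) for pc in (0, 1, 2, 3, 4, 20, 21, 19)]
```

In this model only the jumps to 20 and back to 19 miss, which is the cost the conversation attributes to leaving the window.
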
LaksenI'll have a look. I was thinking of a similar approach, but I just want to get the stuff running for now :)16:44
ZipCPU|LaptopAre you using a wishbone bus?16:45
LaksenNo, AXI16:45
ZipCPU|LaptopWell that will be one difference.16:45
ZipCPU|LaptopAnother may have to do with instruction width.  This prefetch cache was designed for 32-bit instructions.16:45
LaksenA small adapter should be fine. I don't care too much about the latency16:46
LaksenIt's for a bog standard risc-v so 32bit is perfect16:46
ZipCPU|LaptopYou ... <GASP> ... don't care about <gulp> latency?  ;)  This whole approach was built to cut my latency down.  <Grin>16:46
LaksenAt some point it becomes a concern, but this is just a fun vacation project to investigate extreme pipelining :P16:47
ZipCPU|LaptopReally?  Sounds cool!  ... how extreme are we talking about?16:48
LaksenGot my ALU ready which can almost run at >500 MHz on an Artix-716:48
Laksen64 bit16:48
ZipCPU|LaptopGosh, it took me a bit to get my 32-bit ALU able to run at 200MHz on an Artix-7--and you are headed for 500MHz??16:49
LaksenIt synthesizes at 480 MHz where all the IO are tied directly to IOB's (giving an extra 0.8 ns delay)16:50
Laksen8 pipeline cycles though... so bad code will not be fast at all16:50
Laksen200 MHz, is that a single cycle pipeline?16:51
ZipCPU|LaptopLaksen: Sorry to run off so quickly and unannounced--the dogs blessed the floor, and the basement staircase started flooding, and ...17:17
ZipCPU|LaptopLife is now good again.17:17
ZipCPU|Laptop200MHz is not a single-cycle pipeline.  200MHz was going to be a 9-stage pipeline.  How you get up from that speed to 400+MHz I don't know.17:18
ZipCPU|LaptopThis is my first attempt at a "high speed" FPGA design, so ... I'm learning a lot in the process about what high speed requires.17:18
LaksenAh okay17:19
LaksenThe ALU alone in my design is 8 stages. So in the end it'll probably be 8+fetch+decode+opfetch+mem(n)17:20
ZipCPU|LaptopOkay, so I'm two stages for the ALU, unless the instruction requires a multiply--that will take longer.17:21
LaksenEach ALU stage does a 9-bit add and a single shift. Besides that I've spread out all the different logic operations over the different ALU stages17:21
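
One way to read "a 9-bit add per stage" is that the 64-bit addition is cut into eight 8-bit slices, with each stage adding one slice plus the carry from the previous stage (8 sum bits and 1 carry bit). That reading is an assumption, not something confirmed in the chat, and the names below are hypothetical. A minimal sketch of the arithmetic:

```python
SLICE_BITS = 8                     # assumed slice width per pipeline stage
STAGES = 8                         # eight stages cover a 64-bit operand
SLICE_MASK = (1 << SLICE_BITS) - 1


def staged_add(a, b):
    """Add two 64-bit values one 8-bit slice per stage.

    Returns (sum modulo 2**64, carry out of the top slice).
    """
    result = 0
    carry = 0
    for stage in range(STAGES):
        shift = stage * SLICE_BITS
        # Two 8-bit slices plus a carry bit never exceed 511, so the
        # partial sum fits in 9 bits.
        partial = ((a >> shift) & SLICE_MASK) + ((b >> shift) & SLICE_MASK) + carry
        result |= (partial & SLICE_MASK) << shift
        carry = partial >> SLICE_BITS
    return result, carry
```

In hardware each loop iteration would be a registered stage, so the critical path per clock is a single short adder rather than a 64-bit carry chain.
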
ZipCPU|LaptopHow are you handling pipeline conflict detection?17:21
ZipCPU|LaptopSorry, "pipeline hazard" detection--just remembered the proper term.17:21
LaksenI keep a tally by ORing one-hots of all output registers in flight. Any that conflict will stall the pipeline. So nothing fancy17:22
LaksenSimple forwarding for the end of the ALU stage17:22
ZipCPU|LaptopWhat if two instructions both use the same register as an output, but no inputs use that register?17:23
LaksenNo problem in that case17:23
LaksenOh wait. That's actually a problem I don't handle17:23
LaksenThanks for asking :P17:23
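
The one-hot tally and the case raised above (two instructions writing the same register with no reader in between, a write-after-write hazard) can be sketched as follows. This is a simplified model under assumptions: the function names are hypothetical, and stalling on a destination conflict is just one conservative way to cover the write-after-write case, not necessarily what either design ended up doing.

```python
def onehot(reg):
    """One-hot encoding of a register index."""
    return 1 << reg


def in_flight_mask(dest_regs):
    """OR together the one-hot destinations of every instruction in flight."""
    mask = 0
    for reg in dest_regs:
        mask |= onehot(reg)
    return mask


def must_stall(in_flight, srcs, dest, check_waw=True):
    """Decide whether a new instruction has to wait.

    in_flight: destination registers of instructions still in the pipeline
    srcs:      source registers of the new instruction
    dest:      destination register of the new instruction
    """
    busy = in_flight_mask(in_flight)
    # Read-after-write: a source is still being produced.
    raw = any(busy & onehot(s) for s in srcs)
    # Write-after-write: the destination is already pending. Whether this
    # matters depends on whether writes can retire out of order.
    waw = check_waw and bool(busy & onehot(dest))
    return raw or waw
```

With `check_waw=False` the function behaves like the tally as first described, and a second write to a pending register slips through unnoticed.
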
ZipCPU|LaptopSure!  That's one of the approaches I have been considering, and the problem I mentioned is one I'm ... struggling with.17:25
LaksenI've been dreaming many years of solving this problem programmatically17:26
ZipCPU|LaptopYou mean in software??  As in, in the compiler?17:26
LaksenDoing dynamic compilation of a binary into RTL, specifically for processing pipelines17:27
ZipCPU|LaptopBy "dynamic compilation", are you referring to instruction reordering inside the CPU?17:28
LaksenBasically write a program in a high-level language that describes all paths through a CPU, and then execute that program symbolically17:28
LaksenWhere you create a bunch of mappings between registers and IO, memory and register ports17:28
ZipCPU|LaptopI'm not sure I follow ...17:29
ZipCPU|LaptopIs there a paper describing your approach?17:29
LaksenLet me find an example17:29
LaksenNo17:29
LaksenIt's a novel methodology but I worked with this a lot for my master's thesis, just in the wrong direction :)17:30
ZipCPU|LaptopAre you working from within Academia?17:31
LaksenNot any longer17:31
LaksenThis is just sparetime work :)17:31
LaksenHere's an example: http://pastebin.com/4mg5jBRt17:31
LaksenIt might help the understanding that this is a basic RISC-V emulator17:32
LaksenThe language it's written in doesn't matter. In fact this is written for a Pascal compiler that compiles to RISC-V17:33
LaksenBut that doesn't matter17:33
LaksenAll that matters is that it's symbolically executed17:33
LaksenThe code in the bottom is the initialization. It starts up a clocked task that's assumed to run once per clock17:33
LaksenAnd finish at some point17:33
LaksenMemories(2D) and registers(1D) are created before that17:34
LaksenMemories and registers can be accessed by reads or writes17:34
LaksenAt a low level in the symbolic execution those are performed by system calls, so they are easy to figure out17:35
LaksenConditional branches are used to propagate information about when those are performed17:36
ZipCPU|LaptopOkay, so ... if this is a basic emulator, ... why would you need a Verilog prefetch?17:36
ZipCPU|Laptop(Just curious ...)17:36
LaksenSo for example register storages have an attached condition based on the path through the program that store took.17:36
LaksenOhh. This is an entirely different project :P17:36
LaksenSorry, just spilling my brain here :P17:37
ZipCPU|LaptopOh ... Ok.  You had me confused.17:37
ZipCPU|LaptopSomething about a "RISC-V emulator" and "> 400 MHz" just ... didn't quite add up.  ;)17:38
LaksenWell I get too enthusiastic about dynamic recompilation and automatic pipeline construction sometimes :|17:38
LaksenBut the pipeline is real though, very simple :) http://pastebin.com/6K8761tu17:39
LaksenDon't know yet what the register file accesses will be, but I think it can run far above 500 MHz if those don't slow it down17:40
ZipCPU|LaptopOn an FPGA, or in dedicated (ASIC) hardware?17:41
LaksenAiming for Artix 717:41
ZipCPU|LaptopWill you publish your results anywhere?17:41
kc5tjaMeanwhile, I'm having an impossible condition: a boolean expression where all inputs are well defined, yet Verilog insists the result is 'x'.  >:(17:42
LaksenSure17:42
ZipCPU|LaptopI'd love to read about it.17:42
ZipCPU|LaptopHello, kc5tja, welcome back.17:42
ZipCPU|Laptopkc5tja: Have you tried running your code through Verilator?17:42
LaksenOr XST. IVerilog and Yosys both accepted my old code, but the Xilinx synthesizer threw a synthesis-time error17:43
kc5tjaNo, largely because Verilator confuses me to no end.17:43
ZipCPU|LaptopTo Verilate, just do "verilator -cc toplevelverilog.v".17:44
ZipCPU|LaptopI'm not necessarily going to recommend going any further than that, but Verilator does include some tremendous code checking capabilities that have found bugs ISE and Vivado have let slip.17:45
LaksenZipCPU|Laptop, by the way, which WB interface is your pfcache using?17:45
LaksenB3/B4 pipeline/no pipeline?17:46
ZipCPU|LaptopB4, pipelined.17:46
ZipCPU|LaptopYou gotta do pipelined--that way you get one access per clock.  Otherwise, you've crippled your bus.17:46
ZipCPU|LaptopJust ... let the user beware ... you can't cross devices.17:46
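
The "one access per clock" argument can be put in rough numbers. The sketch below is an idealized cycle count, assuming a slave with a fixed ACK latency and no stalls; the function names and the exact accounting (one request clock plus `latency` wait clocks per word in the non-pipelined case) are assumptions made for illustration, and real Wishbone slaves may differ.

```python
def classic_cycles(words, latency):
    """Non-pipelined: each request waits for its ACK before the next starts."""
    return words * (1 + latency)


def pipelined_cycles(words, latency):
    """Pipelined: one request per clock, ACKs trail the requests by `latency`."""
    if words == 0:
        return 0
    return words + latency


# Example: an 8-word burst to a slave that ACKs one clock after the request.
burst = 8
summary = (classic_cycles(burst, 1), pipelined_cycles(burst, 1))
```

Under these assumptions the pipelined burst approaches one word per clock as the burst grows, while the non-pipelined cost scales with the latency on every word.
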
LaksenI agree, but I got to say I like the crispiness of AXI a lot more17:47
LaksenThere are too many loose ends in Wishbone :/17:48
ZipCPU|LaptopI haven't used AXI that much.  How is it better (worse)?17:48
LaksenIn AXI it's always pipelined17:48
LaksenThe transactions are so easy to understand, because it's all built on handshaking on 5 channels17:48
LaksenBursts are optional, but are handled precisely the same. Transactions are layered on top17:49
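
The handshaking mentioned above follows one rule on every AXI channel: a beat transfers in a clock cycle where VALID and READY are both high. The five channels are write address (AW), write data (W), write response (B), read address (AR) and read data (R). The helper below is a hypothetical sketch that applies that rule to a per-cycle trace of one channel; it is not part of any AXI library.

```python
AXI_CHANNELS = ("AW", "W", "B", "AR", "R")


def transfers(valid, ready, data):
    """Return the beats accepted on one channel.

    valid, ready: per-cycle 0/1 samples of the handshake signals
    data:         per-cycle payload driven by the source
    """
    return [d for v, r, d in zip(valid, ready, data) if v and r]


# The source holds "a" for two cycles because READY is low in the first one.
valid = [1, 1, 1, 0, 1]
ready = [0, 1, 1, 1, 1]
data = ["a", "a", "b", None, "c"]
accepted = transfers(valid, ready, data)
```

A burst is then just several such beats on the same channel, which is why bursts and single transfers look the same at this level.
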
kc5tjaVerilator won't even compile my code; I'm apparently much too modern for it at Verilog 1995.17:49
ZipCPU|Laptopkc5tja: Not likely.  You might wish to take a closer look at what it complains about.17:49
LaksenCan you pastebin the problematic code?17:49
ZipCPU|LaptopI'd love to take a look myself.17:50
kc5tjaIt tells me quite explicitly that Verilog 1995 keyword is not supported.  :)17:54
kc5tjaIn this case, wait().17:54
kc5tjahttps://gist.github.com/sam-falvo/71139ddfc4e9b80c47e3fcce18e1f50017:56
LaksenWhy not just do a @(posedge clk_o); @(negedge clk_o);17:58
LaksenNever heard about the wait keyword before17:58
ZipCPU|LaptopIs it synthesizable Verilog?17:59
LaksenNo17:59
LaksenOr maybe the problem is that x is non-zero17:59
kc5tjaNow Verilator tells me unexpected @.17:59
LaksenSo the condition will always be true after startup17:59
kc5tjax means 'undefined' or 'unknown.'18:00
Laksen@(posedge clk_o); should be a perfectly valid statement18:00
kc5tjaWhich is hogwash, since *all* of the term's inputs are well defined.18:00
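
For reference, Verilog's four-state logic only yields x from a gate-level AND/OR/NOT when at least one operand is itself x or z and the other operand does not force the result. The sketch below models that rule with single-character strings; the function names are hypothetical. It suggests that an x result usually points at an input that is not as well defined as it looks, for example an undriven net.

```python
def v_and(a, b):
    """Four-state AND: a 0 on either side forces 0; x or z otherwise gives x."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "x"


def v_or(a, b):
    """Four-state OR: a 1 on either side forces 1; x or z otherwise gives x."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "x"


def v_not(a):
    """Four-state NOT: x and z both invert to x."""
    return {"0": "1", "1": "0"}.get(a, "x")
```
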
kc5tjaNope.  Verilator doesn't like it.18:00
kc5tjaNo change in behavior in iverilog.18:01
ZipCPU|Laptop"always @(posedge clk_o) story_o <= story;" is what you want.18:01
LaksenNot really18:01
ZipCPU|LaptopNo?18:01
LaksenIt should work just fine as is18:01
LaksenI use that stuff all the time18:01
kc5tjaI was hoping to avoid this, but I think I need to throw this into Xilinx ISE to see what it thinks, and let me run a simulation there.18:02
LaksenAh18:03
LaksenYou have a bunch of errors on lines 55-6018:03
LaksenIverilog complains about those18:04
ZipCPU|LaptopSome parentheses would fix those easily.18:04
kc5tjaMy version of iverilog does not.18:04
LaksenNo18:04
Laksenstate_o doesn't exist in the file18:04
LaksenImplicit declaration18:05
kc5tjaWhat options do you provide to make iverilog detect these errors?  Mine literally is silent about them.18:05
LaksenI use a compiled version from the source repository18:05
kc5tjaI'm at 0.9.718:06
LaksenI'm at 11.0 (devel)18:06
LaksenI can't remember why I needed the upgrade, but it's way better18:07
LaksenSupports SystemVerilog 2012 even18:07
kc5tjaThank you!18:07
LaksenOh right. It was because it had support for the $fatal function18:07
kc5tjaI passed (on a whim) -Wall and it found the defect.18:07
LaksenVery nice for makefile testbenches :)18:07
kc5tjaOK, I got basic instruction fetching implemented.20:39
kc5tjaNext step, illegal instruction trap.20:39
kc5tjaTook longer than I expected; but, it at least is working and my basic design is known to not be fantasy.20:40
kc5tjaThat was easier than I'd ever expected.21:12
kc5tjaWell, that's quite frustrating.21:55
kc5tjaiverilog needs qualification for a module's ports (e.g., input foo; wire foo;), while Xilinx will treat this as an error.21:55
ZipCPUolofk: If you are interested in a TCP version of a simulated UART, my code is posted in OpenCores, xula25soc, trunk/bench/cpp.  You'll want the two files, uartsim.cpp and uartsim.h.22:45
ZipCPUThey'll take as inputs the UART transmit from the FPGA, and send the results to a TCP port (if anyone's connected to it).  Characters sent on that port to the simulator will be turned into UART wires on the receive, and so it works.22:46
ZipCPUThe only minor difficulty might be the form of the setup word--telling it the baud rate, number of bits per symbol, parity information, etc.22:46
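
The core of such a simulated UART is turning per-clock samples of the transmit wire back into bytes. The sketch below shows that step for a plain 8N1 frame (one start bit, eight data bits LSB first, one stop bit, no parity). It is not the code in uartsim.cpp: the function names, the fixed 8N1 format, and the `clocks_per_bit` parameter are assumptions, and the TCP forwarding is left out.

```python
def uart_encode(byte, clocks_per_bit):
    """Return per-clock line samples for one 8N1 frame (line idles high)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    samples = []
    for bit in bits:
        samples.extend([bit] * clocks_per_bit)
    return samples


def uart_decode(samples, clocks_per_bit):
    """Recover bytes from per-clock samples of an 8N1 serial line."""
    out = []
    i = 0
    n = len(samples)
    while i < n:
        if samples[i] == 1:
            i += 1  # idle level, keep looking for a start bit
            continue
        # Start bit begins at i; sample each data bit at its midpoint.
        mid = i + clocks_per_bit + clocks_per_bit // 2
        last = mid + 7 * clocks_per_bit
        if last >= n:
            break  # frame is incomplete, stop decoding
        byte = 0
        for bit in range(8):
            byte |= samples[mid + bit * clocks_per_bit] << bit
        out.append(byte)
        i += 10 * clocks_per_bit  # skip start, data and stop bits
    return out


line = [1] * 5 + uart_encode(ord("H"), 4) + [1] * 3 + uart_encode(ord("i"), 4)
decoded = bytes(uart_decode(line, 4))
```

In a real simulator the decoded bytes would be written to the connected socket, and bytes read from the socket would go through the encode direction to drive the receive wire.
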
--- Log closed Sun Jul 17 00:00:42 2016
