IRC logs for #openrisc Wednesday, 2018-10-24

--- Log opened Wed Oct 24 00:00:26 2018
-!- flyback is now known as scarface-01:00
-!- scarface- is now known as flyback01:00
alown_caraZipCPU: Thanks for following that up with olofk. Unfortunately the wb_intercon appears to lack full support for tags. (Use of LGPL3 also makes things more complex than (say) MIT/BSD/etc.)06:51
alown_caraZipCPU: Looks like I will get to make my own (buggy) crossbar generation script.06:52
ZipCPUalown_cara: What are you trying to do?06:52
ZipCPUI have my own cross bar generation program.  Might it help you at all?06:52
alown_caraPlausibly, though the thing you can probably have guessed from the license questions is that I am currently wearing a "company" hat.06:53
ZipCPUSure06:53
ZipCPUI have a program I use that I call AutoFPGA06:53
* alown_cara looks on github06:54
ZipCPUCurrently, it builds Wishbone B4/pipeline crossbars06:54
ZipCPUhttps://github.com/ZipCPU/autofpga06:54
ZipCPUThe code it produces may be licensed as you wish.  AutoFPGA asserts no copyright restrictions on that code, similar to GCC06:54
alown_caraInteresting.06:55
ZipCPUI've told myself often that it could be used for non WB/B4/pipeline crossbars, but I have yet to implement that capability into it06:55
alown_caraCan I ask which toolchains you have run this through so far?06:55
ZipCPUI've used the AutoFPGA output on Vivado, Quartus, Yosys, and Verilator toolchains06:56
alown_caraHmm. For Quartus I would struggle to shift the consensus from Qsys... But, I am currently looking for a good interconnect for a Lattice based design.06:57
ZipCPUI have used this with an iCE40 design, https://github.com/ZipCPU/icozip06:58
ZipCPUI've also used it with https://github.com/ZipCPU/tinyzip, just ... the pre-production hardware I have doesn't seem to be supported anymore, so I've never loaded this design and proven that it works06:59
ZipCPUI have used this approach with Qsys as well.06:59
alown_caraMy personal hat is very interested, I need to read a bit more and check it suits the company requirements though.07:00
ZipCPUThe biggest problem you might expect would be WB/B3(classic) vs WB/b4/pipeline07:00
ZipCPUI should also ask ... what "tags" are you hoping to support?07:01
alown_caraThe underlying device hardware somewhat dictates that at least parts must be built to interface classic.07:01
ZipCPUWhat class of underlying hardware are you using?  SDRAM?07:02
alown_caraMach XO3L(F) EFB07:02
ZipCPUNot familiar with that one07:03
* ZipCPU googles07:03
alown_caraA block of silicon embedded in the device for doing SPI/I2C/internal-NVRAM/timer/etc. with an internal wishbone classic slave interface07:04
ZipCPULooking at it now.  Looks like AutoFPGA would do nicely with it07:05
alown_caraTag-wise, I was contemplating whether a new type of addressing-tag was going to be the easiest way to adapt existing IP built with bursting sizes declared during burst setup phase (Avalon-MM) to WB.07:06
ZipCPUAhh ... do you have an Avalon-MM capability with burst mode support?07:07
ZipCPUI built an Avalon->WB bridge, just didn't add burst mode to it07:07
ZipCPUWasn't that hard to do.  Burst mode might be more difficult though.07:07
alown_caraI have a lot of IP that uses bursts (often a different max burst size per component)07:07
alown_caraWB(4) bursts appear to have quite a different nature.07:08
ZipCPUI don't use the burst mode in WB(4), and I don't think my code is any less effective as a result.  The same would not be true of WB307:08
alown_caraTrue.07:08
ZipCPUUsing WB4/p, you can just issue one bus transaction after another to issue the whole burst, without actually issuing a burst07:09
ZipCPUYou get all the performance, but with none of the complexity07:09
alown_cara(Also, the autofpga readme appears to note that "no support is provided for WB B3 [..] yet")07:09
ZipCPUThis is true07:09
ZipCPUThe Mach component appears to be B4 though, doesn't it?07:09
* ZipCPU checks again07:09
alown_caraHowever, lots of underlying things need some set-up time before they are able to service (say) a read burst, providing the burst size info in the first cycle helps a lot with this.07:10
ZipCPUSo, here's how I've gotten around that:07:10
ZipCPU1) I assume any transaction may be either singular, or linear in addressing07:10
ZipCPU2) Any transaction may begin a burst, of an unknown length07:11
alown_caraThe Mach reference guide refers to its WB implementation as "classic".07:11
ZipCPU3) Within a burst, addresses are constant or incrementing07:11
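
Assumptions (1)-(3) above can be read as a simple rule on the address sequence a slave sees. The following is a minimal Python sketch of that rule, not code from AutoFPGA or the ZipCPU repositories. The function name `is_linear_burst` and the `stride` parameter are hypothetical, and it assumes a burst is either entirely constant or entirely incrementing.

```python
def is_linear_burst(addresses, stride=1):
    """Return True if the addresses form a constant or incrementing run.

    Assumption (not from the chat): a burst is either all-constant or
    all-incrementing by `stride`; the two patterns are not mixed.
    """
    if len(addresses) < 2:
        # A single transaction (or none) is trivially acceptable.
        return True
    # Collect the distinct differences between consecutive addresses.
    deltas = {b - a for a, b in zip(addresses, addresses[1:])}
    return deltas == {0} or deltas == {stride}
```

A slave that only has to handle such sequences can start its set-up on the first request and keep going for as long as later requests continue the pattern, without knowing the burst length in advance.
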
ZipCPUalown_cara: The WB/B4 spec discusses how to bridge from classic to pipeline and back again07:11
alown_caraAlso true.07:11
alown_caraMaybe, I am missing something, but I don't think (1)-(3) helps when there is an extensive latency to performing these operations, but high bandwidth.07:12
ZipCPUI think they do, but let me try to explain07:12
ZipCPULet me ask one question first: none of these peripherals appear to be high bandwidth peripherals: I2C, SPI, counter, etc07:13
ZipCPUWhy are you interested in high performance bursting?  That doesn't seem to make any sense.07:13
ZipCPUOn top of that, the EFB I/O doesn't support bursting either07:15
alown_caraIt does, but I haven't provided enough context on the constraints, and the supporting systems to explain why.07:15
ZipCPUWell, okay, let me return to (1)-(3)07:15
alown_caraSure.07:15
ZipCPUIf there is an extensive latency to perform the operations, then the first operation sets up the transfer, and (at least with wb/p) the second one waits at the peripheral (not the master)07:16
ZipCPUAs a result, only the first item suffers from any latency, the rest go immediately to the peripheral when it is ready07:16
ZipCPUI think I've written about this extensively on zipcpu.com07:17
ZipCPUAhh, I have a good slide for you.  Interested in comparing two bus interaction charts?07:17
alown_caraI was just pulling up the spec for the timing diagrams again.07:17
ZipCPUCheck out slide 26 (internal marking) of https://github.com/ZipCPU/zipcpu/blob/master/doc/orconf.pdf07:19
ZipCPUNow compare that with slide 27 (the next one, also based upon internal marking)07:19
ZipCPUThat should show you the performance you can expect when using the pipelined mode07:20
ZipCPUThe problem with WB classic is that the bus master has to wait for the slave to respond before issuing a second request07:20
ZipCPUWB pipeline changes this so that the master only has to wait until the interconnect accepts the request before sending an additional one07:21
alown_caraThis would help in theory but only if wb/p allows this overlap to be extended to N outstanding requests.07:21
ZipCPUIt may be extended arbitrarily07:21
ZipCPUI personally limit the extensions within the code I formally verify, to help the formal verification, but the spec creates no limit on the length of the transaction when done in this fashion07:22
ZipCPUThe longest burst I've done (so far) has been 1024 transactions using this approach.  (That was my DMA engine that I used for that purpose)07:23
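
The classic-versus-pipelined difference described above can be put into a rough cycle count. This is a simplified Python model under stated assumptions, not a statement about any particular core: `latency` is the number of clocks from a request to its acknowledgement, the pipelined bus accepts one request per clock with no stalls, and the function names are hypothetical.

```python
def classic_cycles(n, latency):
    """Clocks for n transfers when each request waits for the previous ack."""
    return n * latency


def pipelined_cycles(n, latency):
    """Clocks for n transfers when requests are issued back to back.

    Only the first request pays the full latency; every later request
    completes one clock after the one before it (no stalls assumed).
    """
    if n == 0:
        return 0
    return latency + (n - 1)
```

Under this model a single transfer costs the same on either bus, while a long run of transfers on the pipelined bus approaches one transfer per clock regardless of the latency.
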
ZipCPUCan I interest you in an article setting up formal wishbone properties, and comparing WB to AXI and Avalon?  http://zipcpu.com/zipcpu/2017/11/07/wb-formal.html07:25
* alown_cara is surprised I don't recall this article, as I definitely recall reading others there linked from HN07:26
ZipCPUNot all of my articles have been cross posted to HN07:27
ZipCPUIndeed, I think only about 10 or so have07:27
* ZipCPU goes to count07:27
ZipCPUOk, only 14 have been cross posted to HN07:28
alown_caraSome more of the context: WB bus interaction that occurs over a high latency link (which is also relatively bandwidth starved) operates in a packetized manner.07:28
ZipCPUGo on07:29
alown_caraSo, this extended approach would require the requesting packetizer to accept all the addressing cycles to count the number in, before it could issue the packet to the other side.07:29
alown_cara(Which would need to do the same thing to the responses)07:29
ZipCPUNot sure I followed.  Can you explain?07:29
ZipCPUAhh, nvm07:30
ZipCPUGot it07:30
ZipCPUGo on07:30
alown_caraAs such, a tag indicating the number of cycles in the burst that is present during the first cycle, would allow this packet to be issued on the first cycle07:30
alown_caraallowing a certain amount of overlap07:30
ZipCPUIf the link is bandwidth starved, then the only benefit you would get would be from reading, right?07:30
alown_carayep.07:30
ZipCPUHow bandwidth starved?  Are you coming from a serial port perhaps?07:31
alown_caraNo, it is a popular packet-based protocol, just that most of the bandwidth is reserved for other purposes.07:31
ZipCPUNetwork packet?07:31
ZipCPUIt sounds like what you need/want is just an Avalon-MM -> WB/classic bridge, right?  Do you have other peripherals you need to access as well while you are at it?07:32
alown_caraOf course.07:33
ZipCPUHeheh07:33
alown_cara(to the later part)07:33
ZipCPUDo you have a strong need to reconfigure often?  In other words, would it make more sense to build the interconnect by hand?07:33
alown_caraAVMM is a huge spec though, so an actually fully compliant bridge would be quite a project of itself.07:33
ZipCPUI have an AVMM->WB(B4/p) bridge, but I understand what you mean--it's not "fully compliant".  However, it has been good enough for me.07:34
alown_caraI was hoping to avoid needing to do the hand-crafted bit during the initial R&D, but given the device is also rapidly running out of available resources due to being a bit small...07:34
ZipCPUYes, there is that.  iCE40 hx8k?07:34
alown_cara(The big two vendors can't even implement AVMM<->AXI3/4 fully, so...)07:34
ZipCPUwb_intercon isn't known for a low-resource connection IIRC07:35
alown_caraI'm not using wb_intercon.07:35
ZipCPUAutoFPGA will do low resource decoding, but it has some other difficulties you've just mentioned07:35
ZipCPU(I know, but you were considering it)07:35
alown_caraTrue.07:35
ZipCPUThat suggests something handcrafted might be ideal07:36
ZipCPUI do have a blog article discussing a hand crafted interconnect07:36
* alown_cara is wondering if he could get approval to extend autofpga as necessary and upstream generic stuff.07:36
ZipCPUIt's not really that hard, but it does get *REALLY* annoying when you start to need to make changes07:36
alown_caraGiven the staticness of the requirements on this project, I would be encountering that a lot.07:36
ZipCPUI do have some commercial work I'll be needing to do soon as well.  My goal with that work would be to support a full AXI4->WBp bridge, including all of AXI4's burst modes as well.07:38
alown_caraThe optimal resource-wise result would probably be some weird mixture of shared-bus and crossbar for different master<->slave combinations.07:38
ZipCPUSure07:39
alown_caraLast task I did, I started with AVMM, migrated bits to AXI4, then re-migrated bits to AXI3 for transaction locking support.07:39
ZipCPUDo you use AV a lot?07:39
alown_caraAs a (mostly) Intel/Altera shop, the answer would be yes (for better and worse).07:40
* ZipCPU just might have a set of formal properties for AVMM -- they just don't support burst mode (yet)07:40
* alown_cara simply has a great time writing simulation code to test the obvious bits, then makes it software's problem to find the bugs :p07:41
ZipCPUMy problem is that one mistake can lock up the hardware hard.  You can read about my "one mistake" here if you are interested.  http://zipcpu.com/blog/2018/02/09/first-cyclonev.html07:42
alown_caraHeheh, when you say the ARM was issuing these out-of-order, I presume you mean that whatever system was in charge of maintaining coherency loading in to cache was unaware that your target locations had strict requirements?07:48
* alown_cara is intrigued by the comments on use of formal verifications, as has been leaning strongly into SVA testing approaches at the moment07:48
ZipCPUPretty much07:48
ZipCPUThe FIFO required items to be read in order, and it ignored the address07:49
ZipCPUThe ARM tried to load addresses starting on 8-word boundaries, then came back and filled in the gaps07:50
alown_caraMy ARM is a little rusty, but I thought most arm-based SoCs had to expose special ports if they are meant to be coherent, as the exact details of the various levels of cache (if they exist) are left to the SoC implementor.07:51
alown_cara(thinking about what happens in Zynq and on Tegra chips)07:51
ZipCPUMy knowledge of internal ARM details is essentially non-existent---other than the scars from the fails I've suffered through.  :D07:52
alown_caraAnyway, I should go and do some other bugfixing for now, and will have a play with autofpga later today.07:55
alown_caraThanks for all the help.07:56
ZipCPUFeel free to write as you have the need07:56
ZipCPUMy pleasure!07:56
alown_caraSure. I will idle around here for a bit then. (I might make my personal hat join in too).07:57
alown_caraZipCPU: I have had a bit of time to look over autofpga, and whilst I think it would definitely work, it doesn't quite seem right (it solves a slightly different problem).13:03
alown_caraZipCPU: All I was really looking to see if it already existed, was a tool to take a description of master/slave ports and build a piece of WB interconnect to join it together, autofpga focuses on the higher level problem of building and maintaining the whole system.13:04
ZipCPUI'm not sure I'd draw the same conclusion13:05
ZipCPUWhile AutoFPGA has the capability to build much more than just the interconnect, it's primarily a copy/paste program.  If you don't give it more information, it won't build the other parts for you.13:05
ZipCPUHence, you get what you put into it.13:06
alown_caraFair enough. I haven't particularly tried to use it to achieve anything yet, just going by what it seemed to be.13:06
ZipCPUIf you just want an interconnect, just grab the main.v output and you will be there.13:06
ZipCPUThat's what I essentially did when working with Qsys13:06
alown_caraHmm. Would I be correct to say that of the sample component files, the rtcdate.txt is the simplest wishbone slave component that pulls in a module? (rather than providing data implicitly as the pwrcount.txt seems to?)13:12
ZipCPURTCDATE is pretty simple, yes13:15
ZipCPUThere are actually four different types of module incorporation: SINGLE (where the result of any read is already known on the clock of the read itself), DOUBLE (where it takes a clock to get to the result),13:16
alown_caraYeah, I was about to follow up with a question about these distinctions, having seen icd.txt13:16
ZipCPUOTHER (where the read/write may take some multiple number of clocks to complete), and MEMORY (similar to other, but impacts the linker script)13:16
alown_caraDoes OTHER imply that autofpga will not attempt to do anything intelligent to it, and simply executes the @X.INSERTs?13:17
ZipCPUIn all cases, the X.INSERT's will be applied13:17
ZipCPUThe "intelligent" stuff has to do with how the wires are then created to the interconnect13:18
ZipCPUs/created/created and connected/13:18
alown_caraala businfo.cpp's create_sio/create_dio?13:19
ZipCPUThose would be two of the pieces13:21
ZipCPUcreate_sio creates the connections for the SINGLE's, and create_dio for the DOUBLE's13:21
ZipCPUCheck out the "writeout_bus_logic" function in businfo.cpp, if you want to look into where this connection takes place.13:22
alown_caraThat explains how that bit ties together.13:24
ZipCPUMy plan to support additional bus types was to create a new bus class for each type, and have that new class include the function necessary to the task--similar to writeout_bus_logic for WB/B4/p13:25
alown_caraSorry, got pulled in to another discussion.13:39
alown_caraI am intrigued by what benefits from a DOUBLE, given it can't stall?13:40
alown_caraIs this just as a timing improved version of SINGLE (add an extra register stage)?13:41
ZipCPUThe DOUBLE and SINGLE peripherals allow me to simplify the result gathering process.13:43
ZipCPUNot only can they not stall, but they also have very specific acknowledgement cycles.13:43
ZipCPUThis allows the return logic to be simplified--I no longer need to check for an acknowledgement for example, since I know exactly when I will see it.13:44
alown_caraAh, so that is the distinction with OTHER, which forces you to wait for acks as relevant?13:44
ZipCPUYes, exactly!13:45
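
The four peripheral types and the ack distinction just discussed can be summarized as a small lookup. This is a Python sketch of the descriptions given in the chat, not AutoFPGA's own data model. The names `SlaveType`, `FIXED_READ_DELAY` and `must_wait_for_ack` are hypothetical, and treating MEMORY like OTHER for ack handling is an assumption based on "similar to other".

```python
from enum import Enum


class SlaveType(Enum):
    SINGLE = "SINGLE"
    DOUBLE = "DOUBLE"
    OTHER = "OTHER"
    MEMORY = "MEMORY"


# Extra clocks before read data is valid, as described in the chat.
# None means the delay is not fixed, so the interconnect cannot predict it.
FIXED_READ_DELAY = {
    SlaveType.SINGLE: 0,    # result known on the clock of the read itself
    SlaveType.DOUBLE: 1,    # takes one clock to get to the result
    SlaveType.OTHER: None,  # may take some multiple number of clocks
    SlaveType.MEMORY: None, # assumed to behave like OTHER for acks
}


def must_wait_for_ack(slave_type):
    """True when the return logic has to watch for an acknowledgement."""
    return FIXED_READ_DELAY[slave_type] is None
```

For SINGLE and DOUBLE the return path can rely on the known timing, which is the simplification ZipCPU describes; for the other two it has to wait for the slave's ack.
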
alown_caraHmm. Maybe I should try and build the system I have in mind with this and see how far I can get...13:45
ZipCPUI'd be glad to support you from here.13:46
alown_caraOut of interest: how is the buserr.txt component being used?13:46
alown_cara(It looks like AXI?)13:46
ZipCPUIt's just a peripheral used to return the address of the last bus error13:46
ZipCPUIt shouldn't look like AXI ...13:46
alown_caraI read "AWID" and thought write id.13:46
ZipCPUI use it within the ZipCPU so that I can tell, after a bus error, what the cause of the error was.13:47
ZipCPUAhhh ... I think that was short for "Address WIDth"13:47
alown_caraDoes the presence of biarbiter.txt mean that each bus is inherently single-master?13:49
alown_cara(Or did you just want extra control over that particular bus<->bus transfer?)13:49
ZipCPUCorrect.  Each bus has a single master, but the bi-arbiter can be used to create arbitrary interconnect topologies.13:50
ZipCPUThe biarbiter is a slave to two busses, and a master of another.13:50
alown_cara(In this case "zip" and "wbu" -> "dwb")13:50
* alown_cara wonders if it could emit a dot graph of the resulting generated bus topology13:51
ZipCPUYes, and then dwb goes through a delay to become wb13:51
ZipCPUI'd like to, but can't (yet).  Even better, I'd love to be able to edit that dot graph to create the desired bus topology.13:52
ZipCPUI'm just not there yet.13:52
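
As an illustration of the dot-graph idea, the topology described above ("zip" and "wbu" arbitrated onto "dwb", then a delay to "wb") can be written out by hand as Graphviz DOT text. This is a hypothetical Python sketch; AutoFPGA does not provide this output according to the chat, and the function name `bus_topology_dot` and the edge labels are made up for the example.

```python
def bus_topology_dot(edges, name="bus_topology"):
    """Render (source, destination, label) edges as a Graphviz digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst, label in edges:
        # Only emit a label attribute when one was given.
        attr = f' [label="{label}"]' if label else ""
        lines.append(f'    "{src}" -> "{dst}"{attr};')
    lines.append("}")
    return "\n".join(lines)


# Edges taken from the chat: two masters share dwb through the biarbiter,
# and dwb passes through a delay to become wb.
EDGES = [
    ("zip", "dwb", "biarbiter"),
    ("wbu", "dwb", "biarbiter"),
    ("dwb", "wb", "delay"),
]

if __name__ == "__main__":
    print(bus_topology_dot(EDGES))
```

The printed text can be fed to Graphviz's `dot` tool to draw the graph.
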
ZipCPUI need to step away for lunch.13:52
ZipCPUI'll be back later13:52
alown_caraok. Thanks. I will see you tomorrow then.13:54
-!- Netsplit *.net <-> *.split quits: shorne, flyback, alown_cara, M6HZ14:49
-!- Netsplit over, joins: M6HZ14:50
--- Log closed Thu Oct 25 00:00:28 2018
