IRC logs for #openrisc Sunday, 2012-04-22

[11:44] <jonibo|laptop> hi stekern
[11:44] <stekern> hello
[11:44] <jonibo|laptop> figured it might be as easy to discuss here as by mail
[11:45] <jonibo|laptop> the cache invalidate register, as far as I can tell, only works at startup today because it's a 1-way cache
[11:45] <stekern> nah, it works because it's not spec-compliant
[11:46] <jonibo|laptop> how so?
[11:47] <stekern> the spec says that _only_ the address that is written to CBIR should be invalidated, but or1200 wipes any tag matching
[11:47] <jonibo|laptop> ok, that's broken
[11:47] <jonibo|laptop> but it's a 1-way cache
[11:47] <jonibo|laptop> so it's the same thing
[11:48] <jonibo|laptop> it should match on the tag, find the way it corresponds to, and invalidate that cache-way only
[11:48] <stekern> is it? I don't think so
[11:48] <jonibo|laptop> but for a 1-way cache, _any_ tag will be the only way
[11:49] <jonibo|laptop> or maybe not...??? let me think
[11:50] <jonibo|laptop> actually, you're right... it's _really_ broken
[11:50] <stekern> I don't think 1-way or multi-way makes any difference in this case
[11:50] <jonibo|laptop> no, it should not invalidate the line if the tag doesn't match
[11:50] <jonibo|laptop> so, you're right
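The two readings of the block invalidate register argued over above can be made concrete with a toy model. This is a sketch under stated assumptions, not OR1200 RTL: a direct-mapped cache with made-up geometry (16-byte blocks, 4 sets) and hypothetical names (`DirectMappedCache`, `invalidate_spec`, `invalidate_index_only`). The first method clears a line only when its stored tag matches the written address, so a missing block is a no-op; the second clears whatever the addressed set holds, which is the or1200 behaviour described in the log. It also illustrates stekern's point that the number of ways doesn't matter: two addresses that share a set collide even in a 1-way cache.

```python
# Toy direct-mapped cache; geometry is a hypothetical example, not OR1200's.
BLOCK_SIZE = 16  # bytes per block (assumed)
NUM_SETS = 4     # number of sets/lines (assumed)


def split(addr):
    """Split a byte address into (tag, set index)."""
    block = addr // BLOCK_SIZE
    return block // NUM_SETS, block % NUM_SETS


class DirectMappedCache:
    def __init__(self):
        # One valid bit and one tag per set.
        self.valid = [False] * NUM_SETS
        self.tag = [0] * NUM_SETS

    def fill(self, addr):
        tag, idx = split(addr)
        self.valid[idx], self.tag[idx] = True, tag

    def hit(self, addr):
        tag, idx = split(addr)
        return self.valid[idx] and self.tag[idx] == tag

    def invalidate_spec(self, addr):
        # Spec reading: only the block that holds addr is invalidated;
        # if that block is not cached, nothing happens.
        tag, idx = split(addr)
        if self.valid[idx] and self.tag[idx] == tag:
            self.valid[idx] = False

    def invalidate_index_only(self, addr):
        # or1200-style reading: the set addr maps to is cleared,
        # whatever tag it currently holds.
        _, idx = split(addr)
        self.valid[idx] = False


if __name__ == "__main__":
    a = 0x00                   # tag 0, set 0
    b = BLOCK_SIZE * NUM_SETS  # 0x40: tag 1, also set 0
    c = DirectMappedCache()
    c.fill(b)
    c.invalidate_spec(a)        # a is not cached: no-op
    print(c.hit(b))             # True
    c.invalidate_index_only(a)  # clears set 0 anyway
    print(c.hit(b))             # False
```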
[11:50] <jonibo|laptop> anyway, it should still be invalidated at reset, automatically
[11:50] <jonibo|laptop> otherwise we need to loop over the entire physical memory space
[11:51] <stekern> but the problem is, to do it according to the spec, you'd have to loop through the whole memory to invalidate the whole cache
[11:51] <jonibo|laptop> yeah, that's what you want to avoid doing
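To put a number on "loop through the whole memory": with a tag-checking BIR, software cannot know which tags the cache woke up with, so a complete invalidate would need one BIR write per block of the address space. The figures are assumptions for illustration only (32-bit physical addresses, 16-byte blocks, an 8 KiB direct-mapped cache), not the parameters of a particular OR1200 build, and `bir_writes_spec` / `bir_writes_per_set` are hypothetical helpers.

```python
def bir_writes_spec(address_bits, block_size):
    """BIR writes needed if a write only hits on a tag match:
    one per block of the whole address space that might be cached."""
    return (1 << address_bits) // block_size


def bir_writes_per_set(cache_bytes, block_size, ways=1):
    """BIR writes needed if a write clears the addressed set regardless
    of its tag (the or1200-style behaviour): one per set."""
    return cache_bytes // (block_size * ways)


if __name__ == "__main__":
    print(bir_writes_spec(32, 16))       # 268435456 (2**28)
    print(bir_writes_per_set(8192, 16))  # 512
```

Even at an idealized one BIR write per cycle, that is about 2^28 cycles against 512, which is the startup cost being weighed here.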
[11:52] <jonibo|laptop> are you _for_ the invalidate at reset? i interpreted your mail as you being against it
[11:52] <jonibo|laptop> automatic, I mean
[11:52] <stekern> so an "invalidate entire cache" command would be needed for that
[11:52] <jonibo|laptop> nah, that's not needed... you don't want to bother software with this at all
[11:53] <jonibo|laptop> it should just be done automatically at reset
[11:53] <jonibo|laptop> who wants a cache with unknown state at startup anyway?
[11:53] <jonibo|laptop> you want the whole thing invalid
[11:53] <stekern> I'm for it in theory, but it'll bloat the hardware
[11:54] <jonibo|laptop> yeah, I guess so
[11:55] <jonibo|laptop> today we use the fact that the or1200 is broken w.r.t. BIR to get a quick invalidate at startup
[11:56] <jonibo|laptop> but it will be brutal if we need to do every tag individually
[11:56] <stekern> how do other architectures handle the dilemma?
[11:57] <jonibo|laptop> not sure... i'm pretty sure most caches come up invalidated at reset, though
[11:57] <stekern> I know lm32 invalidates on startup, and it doesn't have a fine-grained invalidate/flush. If you invalidate/flush, the whole cache goes
[11:58] <jonibo|laptop> that's no good
[11:58] <stekern> no
[11:58] <jonibo|laptop> for DMA you want to be able to flush just the line you've modified
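For the DMA case, "just the line you've modified" generalizes to one block operation per cache block the buffer touches. A minimal sketch, assuming a power-of-two block size (16 bytes here, hypothetical): `block_addresses` only computes the block-aligned addresses a driver would hand to the block invalidate (or flush) register; it does not model the SPR write itself.

```python
def block_addresses(start, length, block_size=16):
    """Return the block-aligned addresses covering [start, start + length).

    block_size is assumed to be a power of two, so masking rounds down.
    """
    if length <= 0:
        return []
    first = start & ~(block_size - 1)               # block holding the first byte
    last = (start + length - 1) & ~(block_size - 1)  # block holding the last byte
    return list(range(first, last + 1, block_size))


if __name__ == "__main__":
    # A 32-byte buffer that starts mid-block spans three blocks.
    print([hex(a) for a in block_addresses(0x1008, 0x20)])
    # ['0x1000', '0x1010', '0x1020']
```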
[11:59] <stekern> but I wonder, is it so bad that you might get some collateral damage when you invalidate? (i.e., the way or1200 works)
[11:59] <jonibo|laptop> not sure... i've got no numbers
[12:00] <jonibo|laptop> but maybe
[12:00] <jonibo|laptop> it's not a performance win in any case
[12:00] <jonibo|laptop> for a multi-way cache, it's worse
[12:03] <jonibo|laptop> anyway, i've got to run... i'm for hardware invalidate and spec-compliant BIR
[12:03] <jonibo|laptop> i'll leave it at that
[12:04] <jonibo|laptop> even if the hardware invalidate is implemented as an instruction loop in ROM that just iterates over all memory and invalidates the cache for it
[12:04] <jonibo|laptop> (the entire memory space, that would be... :) )
[12:57] <juliusb_> so this whole thing is solved by my proposal here: http://opencores.org/or1k/Architecture_Specification#Cache_Block_Invalidate_Behaviour_Clarification
[12:57] <juliusb_> basically we should just observe the set number of the address written to the BIR
[12:57] <juliusb_> and then invalidate that block
[12:58] <juliusb_> all done, simple, compatible with what we have now, and makes it simple for software to loop through and invalidate each set
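Under the set-number interpretation proposed here, a whole-cache invalidate is a short software loop with one BIR write per set, stepping the address by the block size. The sketch models that loop in Python; the cache geometry and the `write_bir` callback are hypothetical stand-ins for the real SPR write (an `l.mtspr` on OR1K), and the loop only clears everything because this interpretation has the BIR ignore the tag.

```python
# Hypothetical cache geometry, for illustration only.
BLOCK_SIZE = 16
NUM_SETS = 512


def invalidate_all(write_bir, num_sets=NUM_SETS, block_size=BLOCK_SIZE):
    """Invalidate every set by writing one address per set to the BIR.

    Relies on the BIR clearing the addressed set regardless of its tag;
    with a tag-checking BIR this loop would NOT clear stale lines.
    """
    for addr in range(0, num_sets * block_size, block_size):
        write_bir(addr)


if __name__ == "__main__":
    valid = [True] * NUM_SETS  # unknown state after reset: assume all valid

    def write_bir(addr):
        # Model of a set-number BIR: the set index comes from the address.
        valid[(addr // BLOCK_SIZE) % NUM_SETS] = False

    invalidate_all(write_bir)
    print(any(valid))  # False
```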
[12:58] <juliusb_> regarding multiway - I've been asking this question for over a year now ;)
[12:58] <juliusb_> i say just invalidate all ways, simplest thing
[12:58] <juliusb_> as there's no way-specific block invalidate reg
[12:59] <juliusb_> but.... the way you guys just discussed is more sensible - only invalidate if the address is in cache
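On multi-way caches the two positions differ in how much a single BIR write clears. The toy 2-way model below (geometry and names are hypothetical) contrasts `invalidate_all_ways`, which wipes every way of the addressed set as suggested here, with `invalidate_matching_way`, which clears only the way whose tag matches and does nothing on a miss.

```python
BLOCK_SIZE = 16  # hypothetical
NUM_SETS = 4     # hypothetical
NUM_WAYS = 2     # hypothetical 2-way cache


def split(addr):
    """Split a byte address into (tag, set index)."""
    block = addr // BLOCK_SIZE
    return block // NUM_SETS, block % NUM_SETS


class SetAssocCache:
    def __init__(self):
        # lines[set][way] holds the stored tag, or None when invalid.
        self.lines = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

    def fill(self, addr, way):
        tag, idx = split(addr)
        self.lines[idx][way] = tag

    def hit(self, addr):
        tag, idx = split(addr)
        return tag in self.lines[idx]

    def invalidate_all_ways(self, addr):
        # Clear every way of the set addr maps to.
        _, idx = split(addr)
        self.lines[idx] = [None] * NUM_WAYS

    def invalidate_matching_way(self, addr):
        # Clear only the way holding addr's tag; a miss leaves the set alone.
        tag, idx = split(addr)
        self.lines[idx] = [None if t == tag else t for t in self.lines[idx]]


if __name__ == "__main__":
    x, y = 0x00, BLOCK_SIZE * NUM_SETS  # both map to set 0, tags 0 and 1
    c = SetAssocCache()
    c.fill(x, way=0)
    c.fill(y, way=1)
    c.invalidate_matching_way(x)
    print(c.hit(x), c.hit(y))  # False True
    c.fill(x, way=0)
    c.invalidate_all_ways(x)
    print(c.hit(x), c.hit(y))  # False False
```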
[13:00] <stekern> yes, and jonibo has a point, you'll have the performance penalty
[13:00] <juliusb_> but my way is less of a burden on hardware implementation (i'm not a fan of putting in heaps of logic just for single use at reset)
[13:00] <juliusb_> performance penalty?
[13:02] <juliusb_> regarding all of these changes to OR1K - I'm more inclined to go with something which, despite maybe not being 100% the best approach, gives us maximum backward compatibility with minimum amount of work to adapt existing software and models to the new spec
[13:02] <stekern> yes, since you've got the collateral damage of addresses you did not mean to invalidate
[13:02] <juliusb_> so, in this case, defining the behaviour of the cache BIR means we don't have to change anything in OR1200 or software
[13:03] <juliusb_> (but we're clarifying the behaviour for future developers and users)
[13:03] <stekern> I wonder how much logic is actually needed to do the invalidate on reset, should only be a counter basically (and some control logic for the state in the fsm)
[13:05] <stekern> is it only during reset that you'd want to invalidate the whole cache?
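stekern's estimate, "only a counter basically" plus a bit of FSM control, can be sketched as a cycle-level model: after reset a counter walks the sets, clearing one tag entry per cycle, and the cache reports busy until the last set is done. `ResetInvalidateFSM` and its two states are hypothetical names for illustration, not RTL from an existing core.

```python
class ResetInvalidateFSM:
    """Cycle-level sketch of a clear-on-reset FSM for a cache tag array."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.valid = [True] * num_sets  # tag RAM contents unknown at reset
        self.counter = 0
        self.state = "CLEAR"            # two states: CLEAR -> READY

    def step(self):
        """Advance one clock cycle."""
        if self.state == "CLEAR":
            self.valid[self.counter] = False  # one tag write per cycle
            if self.counter == self.num_sets - 1:
                self.state = "READY"
            else:
                self.counter += 1

    @property
    def busy(self):
        return self.state == "CLEAR"


if __name__ == "__main__":
    fsm = ResetInvalidateFSM(512)
    cycles = 0
    while fsm.busy:
        fsm.step()
        cycles += 1
    print(cycles, any(fsm.valid))  # 512 False
```

The clear takes one cycle per set, and the only extra state is the counter (9 bits for 512 sets) plus the state flag.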
[13:13] <juliusb_> You'd also want to have it capable of being run by poking SPR bits too
[13:14] <juliusb_> but really, I'm against putting in this sort of stuff - I say, for the simplicity of the implementation, we should leave this to software
[13:14] <juliusb_> we're going to need it for all the memories, and that'll add up
[13:16] <juliusb_> overall transistor count, though, to do the invalidation - it's something I'd like to know which is smaller to do - the 8 instructions it takes in software to do it (8*32-bits = 256 FFs, essentially) or the hardware (probably a counter as wide as the number of addresses we need to clear and some muxing)
[13:17] <juliusb_> clearly the reset-by-FSM thing will be more power efficient and be quicker, but i'm still not sold on moving chunks of on-shot initialisation stuff into HW
[13:17] <juliusb_> one-shot
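The trade-off raised here can be roughed out with a little arithmetic. The figures are assumptions: the 8-instruction loop from the log (8 × 32 bits = 256 bits of instruction storage), a hypothetical 3-instruction loop body executed once per set at one instruction per cycle, and a 512-set cache. The hardware side counts only the counter flip-flops and one cycle per set; the muxing and control logic mentioned in the log is left out, so that figure is a lower bound. `software_cost` and `hardware_cost` are hypothetical helpers.

```python
import math

INSTR_BITS = 32  # OR1K instructions are 32 bits wide


def software_cost(num_instructions=8, loop_body_instructions=3, num_sets=512):
    """(storage bits, approximate cycles) for a software invalidate loop.

    Assumes one cycle per instruction and a hypothetical loop body length.
    """
    storage_bits = num_instructions * INSTR_BITS
    cycles = num_sets * loop_body_instructions
    return storage_bits, cycles


def hardware_cost(num_sets=512):
    """(counter flip-flops, cycles) for a clear-on-reset FSM that clears
    one set per cycle; excludes muxing and control logic."""
    counter_bits = max(1, math.ceil(math.log2(num_sets)))
    return counter_bits, num_sets


if __name__ == "__main__":
    print(software_cost())  # (256, 1536)
    print(hardware_cost())  # (9, 512)
```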
[13:21] <stekern> I agree on that, on many FPGA targets you could parameterize it away though
[13:22] <stekern> (if it is really only needed on reset)
[15:26] <jonibo|laptop> juliusb_: I don't care much for your BIR clarification... I can accept that the or1200 is broken and does it that way, but let's not generalize that error
[15:26] <jonibo|laptop> the reset case is special, let's ignore that for now
[15:26] <jonibo|laptop> but at runtime you want a sane invalidation behaviour
[15:26] <jonibo|laptop> if the line's not in cache, it's a no-op
[15:27] <jonibo|laptop> and for multi-way, it's inelegant to trample over cache lines that may be in use by other processes
[15:27] <jonibo|laptop> it's a conundrum, I know, and the or1200 implementation is fine as long as it's documented... but for the next generation we might be able to come up with something more elegant... just not sure what that should be at this point
[16:14] <juliusb_> well, i say my described behaviour should be fine - it's more of a hardware-centric view, I'll accept that (basically use that BIR as a line invalidate interface)
[16:15] <jonibo|laptop> it's fine for the reset case
[16:15] <jonibo|laptop> it's less nice for regular operation
[16:15] <juliusb_> so remove any idea of it being "intelligent" i guess
[16:16] <jonibo|laptop> like I said, the or1200 does it that way... that's an implementation detail
[16:16] <jonibo|laptop> that's fine
[16:16] <jonibo|laptop> i don't like the idea of generalizing it, though
[16:16] <juliusb_> yes, but it was done that way for a reason
[16:17] <jonibo|laptop> i understand that... it's less than optimal
[16:17] <juliusb_> and that reason is to avoid reset logic, and probably to get around a sloppily defined cache system
[16:17] <juliusb_> or rather, work with a sloppily defined cache system
[16:17] <jonibo|laptop> it's not that sloppily defined...
[16:18] <jonibo|laptop> in fact, it's pretty well-defined in the spec
[16:18] <jonibo|laptop> the only problem is the reset case
[16:18] <stekern> i was just about to say that
[16:18] <jonibo|laptop> as it stands now SW is required to do a _long_ loop to invalidate... that's fine per se
[16:18] <jonibo|laptop> it just makes for a long startup time, but it's a one-time cost
[16:19] <jonibo|laptop> and the or1200 solves that for the time being with a "less than optimal" solution... but it works
[16:19] <jonibo|laptop> but I don't care to see that encoded in the spec
[16:19] <jonibo|laptop> because I hope that someday somebody will come along and implement this properly... and then the spec shouldn't stand in their way
[16:19] <juliusb_> hmm, no, there are issues with what happens when the EA written into BIR isn't in the cache (it's not clear in the spec)
[16:20] <jonibo|laptop> what? it's a no-op
[16:20] <juliusb_> spec says that EA is "EA that targets byte inside cache block"
[16:20] <jonibo|laptop> isn't that obvious
[16:20] <juliusb_> no it isn't
[16:20] <juliusb_> because that just says targets byte inside cache block
[16:20] <jonibo|laptop> ok... it seems obvious to me
[16:20] <juliusb_> that says nothing about matching EA to the tag address and invalidating only in that case
[16:20] <juliusb_> that definition says to me it does an address mapping of the EA to the appropriate bytes in the cache
[16:21] <jonibo|laptop> yeah, but think about it... what's the point of an invalidate? either the line is in cache and you want a fetch next time it's accessed, or it's not in cache in which case you get that anyway
[16:21] <juliusb_> yes, true, but you still might have a case where you want to entirely clear the cache for a context switch or something
[16:21] <juliusb_> which is basically the reset case
[16:22] <jonibo|laptop> no, never...
[16:22] <jonibo|laptop> the cache is physically tagged
[16:22] <jonibo|laptop> you never clear the cache
[16:22] <jonibo|laptop> the MMU makes sure that processes can't access others' cached data
[16:22] <juliusb_> OK
[16:22] <juliusb_> yes of course, sounds good
[16:23] <juliusb_> so, as always, basically I'm arguing for the thing which is the simplest to implement in HW :)
[16:23] <jonibo|laptop> i know exactly where you're coming from though... I had this conversation with myself last year!
[16:23] <juliusb_> current system is
[16:23] <jonibo|laptop> yeah, and I'm arguing for a "correct" spec and "cutting corners in implementations is fine as long as you respect the spec"
[16:23] <jonibo|laptop> ...which is what we have with the or1200
[16:23] <jonibo|laptop> almost
[16:23] <juliusb_> so i'm arguing to adapt the spec to do what OR1200 does now. To do it the way it should be done would require 1) some reset logic or some clear-all-cache-block-tags feature, and 2) something to read and compare the block tag when BIR is written to, to determine if it should be done or not
[16:24] <jonibo|laptop> actually, not "almost"... it does respect the spec
[16:24] <jonibo|laptop> yeah, if it were optimal, you'd have that
[16:24] <juliusb_> and then there's the issue of multi-way
[16:24] <jonibo|laptop> but the or1200 cuts corners on 2) and invalidates every time... that's fine, it's just less than optimal
[16:24] <juliusb_> which I've, again, gone with the simplest, quickest, dirtiest way of handling it
[16:24] <juliusb_> :)
[16:25] <jonibo|laptop> and your case 1) is a sw problem
[16:25] <jonibo|laptop> we don't even have multi-way
[16:25] <jonibo|laptop> ...in implementation, I mean
[16:25] <juliusb_> I think stekern has it working somewhere
[16:25] <jonibo|laptop> ok
[16:26] <juliusb_> i'm quite close to being able to publish my new CPU - got word that a release has been drafted and i've just got to go through the process of showing it to people that can OK it
[16:26] <juliusb_> so i'd hope within a week or two
[16:26] <juliusb_> :)
[16:26] <jonibo|laptop> yay!
[16:26] <juliusb_> ... but that's an aside, but I think stekern was playing with multi-way cache in that
[16:26] <juliusb_> i really need to run
[16:27] <stekern> yes, that's correct, 2-way is (optionally) available there
[16:27] <juliusb_> :)
[16:27] <jonibo|laptop> ok... hope you see my point, though
[16:27] <jonibo|laptop> it's an issue for the spec
[16:27] <juliusb_> but, quick and dirty multi-way invalidate works well, too, and I imagine would be very simple to implement
[16:28] <jonibo|laptop> it's _not_ an issue for the spec :)
[16:28] <jonibo|laptop> it's for the implementation documentation
[16:28] <juliusb_> well, I want the implementations and the spec to be in harmony
[16:28] <jonibo|laptop> no...
[16:28] <juliusb_> so we have to change one or the other
[16:29] <jonibo|laptop> no, the implementation is just sub-optimal... but it's still correct
[16:29] <jonibo|laptop> i don't see an issue here
[16:29] <juliusb_> no but it's not clear from spec how you clear it at reset
[16:29] <stekern> What does the "Missing cache block in the local processor does not cause any action" mean?
[16:29] <juliusb_> or it's not clear what happens for multi-way
[16:30] <jonibo|laptop> on the or1200 we can cheat because the BIR isn't so clever... on another implementation we can't cheat
[16:30] <juliusb_> stekern: mmm, yes, perhaps that's the sentence saying it should not do anything if the line isn't there
[16:30] <jonibo|laptop> stekern: it means "no-op"
[16:30] <juliusb_> mmm ok
[16:30] <juliusb_> I'm wrong!
[16:30] <juliusb_> :)
[16:30] <juliusb_> in that case the OR1200 is wrong
[16:30] <jonibo|laptop> not "wrong", "just-enlightened" :)
[16:30] <juliusb_> because it doesn't do nothing, it does invalidate the block
[16:31] <juliusb_> so in this case, one or the other must change
[16:31] <juliusb_> and i'm really late!!
[16:31] <juliusb_> bbl
[16:31] <jonibo|laptop> ok, but I don't agree it has to change
[16:31] <juliusb_> (cache invalidate discussions are surprisingly exciting)
[16:31] <jonibo|laptop> it's a nice "cheat"
[16:31] <jonibo|laptop> :)
[16:31] <stekern> well, tbh, my cache implementation cheats too ;)
[16:31] <jonibo|laptop> it just causes a performance degradation
[16:31] <jonibo|laptop> cheats are fine as long as they are fundamentally correct...
[16:32] <jonibo|laptop> poor performance is another issue altogether
[16:32] <jonibo|laptop> stekern: how does your implementation cheat?
[16:32] <stekern> it does the invalidation the or1200 way
[16:33] <jonibo|laptop> right... which is a performance hit, but nothing else
[16:33] <stekern> yes, it's not gonna break software that expects correct behaviour
[16:33] <jonibo|laptop> how?
[16:33] <jonibo|laptop> I don't see why that would _break_ anything
[16:33] <stekern> that's why it's an "OK" cheat
[16:33] <jonibo|laptop> oh sorry, misread you
[16:33] <stekern> :)
[16:33] <jonibo|laptop> yeah, I agree
[16:34] <jonibo|laptop> exactly, it's an implementation detail... those are fine... these get documented in the release notes and then you're done with it
[16:35] <jonibo|laptop> but let's not update the arch spec to conform to the implementation just because somebody decided to cut that particular corner
[16:37] <jonibo|laptop> as for the cache invalidation at reset... I'd say we just defer that discussion until we have an implementation that actually has a BIR that considers the address tag in question
[16:37] <jonibo|laptop> ...it's moot until then
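The "OK cheat" argument, together with the earlier point that invalidating a block that isn't cached only costs a refetch, can be checked with a small simulation under loud assumptions: a hypothetical direct-mapped cache holding only clean lines, with memory updated by a DMA engine and each update followed by a block invalidate of that address. Both BIR behaviours should return identical, up-to-date data on every read, and the index-only (or1200-style) variant should only ever miss more often. The model only covers clean lines; `ReadCache`, `run` and all parameters are illustrative names, not taken from any real core.

```python
import random

BLOCK_SIZE = 16  # hypothetical
NUM_SETS = 8     # hypothetical


def split(addr):
    """Split a byte address into (tag, set index)."""
    block = addr // BLOCK_SIZE
    return block // NUM_SETS, block % NUM_SETS


class ReadCache:
    """Toy direct-mapped cache of clean lines over a dict keyed by block.

    index_only=True models the or1200-style BIR; False the tag-checking one.
    """

    def __init__(self, memory, index_only):
        self.memory = memory
        self.index_only = index_only
        self.lines = [None] * NUM_SETS  # (tag, cached value) or None
        self.misses = 0

    def read(self, addr):
        tag, idx = split(addr)
        line = self.lines[idx]
        if line is None or line[0] != tag:
            self.misses += 1
            line = (tag, self.memory.get(addr // BLOCK_SIZE, 0))
            self.lines[idx] = line
        return line[1]

    def invalidate(self, addr):
        tag, idx = split(addr)
        line = self.lines[idx]
        if line is not None and (self.index_only or line[0] == tag):
            self.lines[idx] = None


def run(trace_len=10_000, seed=1):
    """Return (data mismatches, tag-checking misses, index-only misses)."""
    rng = random.Random(seed)
    memory = {}
    spec = ReadCache(memory, index_only=False)
    cheat = ReadCache(memory, index_only=True)
    mismatches = 0
    for _ in range(trace_len):
        addr = rng.randrange(0, 64 * BLOCK_SIZE)
        if rng.random() < 0.1:
            # A DMA engine updates memory, then software invalidates the block.
            memory[addr // BLOCK_SIZE] = rng.randrange(1 << 16)
            spec.invalidate(addr)
            cheat.invalidate(addr)
        else:
            want = memory.get(addr // BLOCK_SIZE, 0)
            a, b = spec.read(addr), cheat.read(addr)
            mismatches += (a != want) + (b != want)
    return mismatches, spec.misses, cheat.misses


if __name__ == "__main__":
    mismatches, spec_misses, cheat_misses = run()
    print("data mismatches:", mismatches)  # 0
    print("misses (tag-checking, index-only):", spec_misses, cheat_misses)
```

In this model the index-only cache's line in any set is always either empty or identical to the tag-checking cache's line, which is why the difference shows up only in the miss counts.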
[23:54] -!- Netsplit *.net <-> *.split quits: jonibo

Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!