IRC logs for #openrisc Friday, 2014-05-09

--- Log opened Fri May 09 00:00:40 2014
wallentostekern: I think the first thing is to get re-entrancy to head.S etc. (as in libgloss) and to remove the fixed memory addresses. I think the SMP itself is only a few functions one needs to implement per architecture06:01
stekernwallento: yes, I think so too. I've got some ideas for head.S, but I want to explore them before making any statements about it.06:27
stekernwallento: but.. the *very* first step is to make your multicore-demo boot Linux with *one* core ;)06:53
blueCmdolofk: hah, we have a saying at work "nobody wants to be in the critical path" :)06:53
blueCmdstekern: pff, you're not thinking large enough!06:53
blueCmdif you get it to run on multiple, surely it's trivial to run on one? ;)06:53
stekerntrue, only problem is that it doesn't run on multiple, thus it might not be trivial to run on one ;)06:54
stekern(but it's probably something minor that is preventing it, so in practice, yes it should be trivial to make it run on one of the cores)06:55
stekernI already found a mor1kx bug in the process though, ibus cyc&stb might stay asserted even though rst is asserted06:56
stekernand that brings up another interesting question, what kind of mechanism should there be to enable/disable cores?06:58
mor1kx[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/e0e2f058e3ebba40a9a0231c5f54aa1d6b04bb7407:01
mor1kxmor1kx/master e0e2f05 Stefan Kristiansson: cappuccino/fetch: deassert ibus_req on rst07:01
pgavinstekern: SPR space?07:08
pgavinmaybe an SPR to enable a core, and another to disable?07:09
stekernpgavin: to the enable/disable question? yes07:09
pgavinor one spr that's just a bitmask07:09
pgavindo you already have an SPR defined?07:10
stekernno, I pressed enter prematurely... =)07:10
pgavin:)07:11
pgavinalso inter-core interrupts might be useful07:11
pgavinnot necessary, I suppose, can just use polling07:11
pgavinthe intercore interrupt can be reused to enable the cores tho07:12
stekernyes, that's an option, but then there's a lot of smaller implementation details, like should that spr be 'global', is it only one 'master' cpu that can read/write it. etc07:12
pgavinyeah07:12
pgavinif you use an interrupt it should be easy to make global07:13
pgavinthere doesn't need to be any higher level state I don't think07:13
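The "one spr that's just a bitmask" idea above could be modelled roughly like this. It is a minimal sketch only: no such SPR is defined anywhere in this discussion, and the class name, the method names, the one-bit-per-core layout and the reset value are all assumptions made for illustration.

```python
class CoreEnableMask:
    """Toy model of a hypothetical core-enable register: bit n set means core n runs."""

    def __init__(self, num_cores, initial=0b1):
        self.num_cores = num_cores
        self.value = initial  # assumption: only core 0 is enabled after reset

    def _bit(self, core):
        if not 0 <= core < self.num_cores:
            raise ValueError(f"no such core: {core}")
        return 1 << core

    def enable(self, core):
        self.value |= self._bit(core)

    def disable(self, core):
        self.value &= ~self._bit(core)

    def is_enabled(self, core):
        return bool(self.value & self._bit(core))
```

Whether such a register is global or only writable by one master core is exactly the open question raised in the chat; the sketch does not take a position on it.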
stekernI think I need to read up on inter-core interrupts07:15
stekern(and a fun fact related to that, google gives results about 'coitus interruptus' when searching for it) ;)07:16
pgavinlol07:16
stekernspellchecking gone terribly wrong - intercore, not intercourse07:21
LoneTechstekern: another irrelevant fact - that's what Onan did, per the bible text. so the word onani is misdefined.07:37
stekernLoneTech: ah, interesting - I didn't know that07:49
stekernwallento: I'm not sure I understand how the bus address matching here works: https://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/mor1kx-dualcore/rtl/verilog/wb_bus_b3.v#L36608:05
stekernfor the memory, S1_RANGE_WIDTH = 29 and S1_RANGE_MATCH = 0x1f80000008:07
stekernright?08:07
stekernthat'd make: assign s_select[1] = (m_bus[31:3] == 29'h1f800000);08:09
stekernhow does that work?08:09
stekernit doesn't, because S0 is the mem, not S1...08:22
wallentomemory at S0 is 0x0000000 to 0x7FFFFF08:25
wallentouart at S1 is 0xff80000 to 0xff8000708:26
stekernyeah, I think I got it now08:27
wallentoRANGE_WIDTH is number of bits from MSB that must match08:27
wallentoand RANGE_MATCH the corresponding value08:27
stekernI mixed the slaves up, that was what threw me off08:27
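The RANGE_WIDTH / RANGE_MATCH rule wallento describes (compare that many bits from the MSB against the match value) can be sketched as follows. This is an illustration under the assumption of 32-bit bus addresses; the function names are hypothetical and the example values in the note below are not taken from the repository.

```python
ADDR_WIDTH = 32  # assumption: 32-bit bus addresses


def slave_selected(addr, range_width, range_match):
    """Return True when the top range_width bits of addr equal range_match."""
    return (addr >> (ADDR_WIDTH - range_width)) == range_match


def slave_range(range_width, range_match):
    """Return the inclusive (first, last) address covered by a match rule."""
    low_bits = ADDR_WIDTH - range_width
    first = range_match << low_bits
    return first, first | ((1 << low_bits) - 1)
```

For example, an illustrative rule with a width of 9 and a match value of 0 covers 0x00000000 to 0x007FFFFF, so a wider RANGE_WIDTH always means a smaller window.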
wallentoif only verilog had variable width parameter arrays..08:28
wallento:)08:28
stekernyes, but I think olofk's wb_intercon_gen is the next best thing ;)08:28
wallentoi like self-contained versions more :)08:29
wallentobtw: bus hold and bus hold ack are for multilevel cache coherency08:29
wallentoso not to get confused08:29
wallentoL2 needs to stop bus arbitration to inject invalidations08:29
stekernwhat do you mean by self-contained?08:30
wallentono generation scripts08:32
wallentoplain verilog08:32
stekernah, ok. in principle, me too. but some tasks are just too mind-numbing to do manually08:33
stekernbus interconnect is one of those things08:33
stekernwallento: why did you put the uart at 0xff80000 btw?08:56
wallentovery good question08:56
stekernnot that you're not allowed to ;)08:57
wallentobecause I was allowed to maybe :-D08:57
stekernbut that was the root of my confusion with the slave mixup, I expected the uart to be at 0x9000000008:57
wallentoit may be some copy&paste error08:58
wallentoI did not work on UART at all until now08:58
wallentoi will push the updated one08:58
wallentohow large is the memory space of UART?08:58
wallento0x10?08:58
stekernsize=32 is what we have in the de0_nano port09:00
stekernI can't remember how many regs there actually are09:00
wallentoi gave it 2^14, that should work for the moment ;)09:01
stekern;)09:01
LoneTechthe 8250 had 7 registers. I would not be surprised to see 8 8-bit registers spread over 32 bytes just because the bus is 32 bit09:01
stekernthe only downside with doing it like that is that out-of-bounds accesses are not caught09:01
stekernoh, reminds me, the uart is 8-bit, not 32-bit09:02
wallentohttps://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/mor1kx-dualcore/rtl/verilog/orpsoc_top.v09:02
stekerndoes your bus-thingy handle that?09:04
stekern"handle that" as in: https://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/de0_nano/rtl/verilog/orpsoc_top.v#L78809:08
stekernwb_intercon_gen inserts that wb_data_resize automatically nowadays, that's why it can't be seen in the mor1kx-generic top module.09:08
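What a resizer between a 32-bit master and an 8-bit slave has to do can be sketched like this. It is only a model of the general idea, not a description of how wb_data_resize is actually written: it assumes big-endian byte lanes (select bit 3 corresponds to data bits 31:24) and single-byte accesses, and the function names are hypothetical.

```python
def byte_lane_shift(sel):
    """Map a one-hot 4-bit byte select to the bit offset of that lane."""
    shifts = {0b1000: 24, 0b0100: 16, 0b0010: 8, 0b0001: 0}
    if sel not in shifts:
        raise ValueError("only single-byte accesses are modelled")
    return shifts[sel]


def write_data_to_slave(wdata32, sel):
    """Pick the 8 bits the narrow slave should see on a write."""
    return (wdata32 >> byte_lane_shift(sel)) & 0xFF


def read_data_to_master(rdata8, sel):
    """Place the slave's 8-bit read data on the lane the master selected."""
    return (rdata8 & 0xFF) << byte_lane_shift(sel)
```

Without this step the slave's 8 data bits would sit on a fixed lane, so only one of the four byte addresses in each word would read or write correctly.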
olofkI just picked up my parallella09:43
olofkI'll put it on the SoCKIT in the garage for now09:43
pgavinstekern: I think you may have told me already, but has or1k-tests been checked on or1knd yet?10:08
stekernyes, at least parts of it have been used to test the mor1kx pronto-espresso implementation10:09
pgavinhm10:09
pgavinit seems sfbf.S has an infinite loop10:10
pgavinin nodelay mode that is10:10
stekernthere might be tests that are delay-slotty though10:10
pgavinok10:10
pgavinthere's an add in the delay slot10:10
pgavinso the loop counter isn't being incremented :)10:10
pgavinI'll add the macros10:11
stekernsfbf sounds like it should be delay-slot agnostic, but obviously isn't10:11
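The failure pgavin describes (the loop counter lives in the branch delay slot, so it is never incremented in nodelay mode) can be reproduced with a toy interpreter. This is a sketch with an invented three-operation instruction set, not the actual contents of sfbf.S; only an "addi" in the delay slot is modelled.

```python
def run(program, delay_slot, max_steps=100):
    """Run a toy program; return (counter, halted)."""
    counter, flag, pc = 0, False, 0
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "halt":
            return counter, True
        if op == "bf" and flag:
            if delay_slot:
                # With delay slots, the instruction after a taken branch still executes.
                slot_op, slot_arg = program[pc + 1]
                if slot_op == "addi":
                    counter += slot_arg
            pc = arg
            continue
        if op == "sf_lt":
            flag = counter < arg
        elif op == "addi":
            counter += arg
        pc += 1
    return counter, False  # step limit reached: treated as an infinite loop


# Loop while counter < 3, with the increment placed right after the branch.
LOOP = [("sf_lt", 3), ("bf", 0), ("addi", 1), ("halt", None)]
```

With `delay_slot=True` the loop terminates; with `delay_slot=False` the increment is skipped on every taken branch and the run hits the step limit.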
stekernwait, I'll dig out the list of tests that the pronto was running10:11
pgavinline 7110:11
pgavinis where the error is10:12
stekernhttps://github.com/juliusbaxter/mor1kx-dev-env/blob/master/boards/generic/mor1kx-prontoespresso/sim/bin/Makefile#L5810:12
stekernsfbf isn't in that list though10:12
pgavinok10:12
stekernonly sf10:12
stekernbut if you fix it, feel free to push the fix directly to or1k-tests10:12
pgavinok10:12
pgavinwill do10:12
stekernI'll add you to the repo10:12
pgavink10:12
stekernok, I see there are some tests in sfbf that are strictly testing delay slot behaviour10:16
stekernperhaps we want to just ifdef that whole part away10:17
pgavinok10:18
pgavinseems the .nodelay directive isn't set by default for or1knd-elf-asm10:20
pgavinI thought I had made it do that already :/10:20
pgavinbut if or1knd-elf-asm has .nodelay by default then it can't be turned off10:22
pgavinso it can't be used for both architectures10:22
pgavinwhich means all the .S files need to include or1k-asm.S, to set .nodelay manually10:23
pgavinthe problem is that ld checks all the objects to make sure the .nodelay setting is consistent10:24
stekernis that the only purpose of that?10:39
stekernor does the .nodelay thing do something more than that?10:40
pgavinthat's it10:40
pgavingcc emits it automatically10:40
pgavinbut then if you hand code you have to use it10:41
pgavinmaybe it was a bad idea10:41
pgavinI had planned on going further with it10:41
stekernmaybe, but I think we actually *should* use it in bfd too...10:41
stekernso, I think the intent is good ;)10:41
pgavinwell, I think the code that checks it is in bfd10:41
stekernI mean, in .plt and such10:42
pgavinah10:42
stekernI can't remember, I might have done that delay slot agnostic though10:42
pgavinwhat I never got around to implementing (which actually shouldn't be hard) is to let delay-compat objects be linked with either delay or nodelay objects, and the result would be delay or nodelay as appropriate10:43
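The linking rule pgavin outlines here (delay-compat objects may join either kind, and the output takes the mode of the others) could look like the sketch below. It is an assumption-laden illustration of the proposal, not the check that currently exists in bfd; the mode strings and the function name are hypothetical.

```python
DELAY, NODELAY, COMPAT = "delay", "nodelay", "delay-compat"


def merge_delay_modes(object_modes):
    """Return the delay mode of the linked output, or raise on a conflict."""
    result = COMPAT
    for mode in object_modes:
        if mode not in (DELAY, NODELAY, COMPAT):
            raise ValueError(f"unknown mode: {mode}")
        if mode == COMPAT:
            continue  # compatible objects never constrain the result
        if result == COMPAT:
            result = mode  # first strict object decides the output mode
        elif result != mode:
            raise ValueError("cannot mix delay and nodelay objects")
    return result
```

A link made only of delay-compat objects stays delay-compat under this rule, which is one possible choice rather than something the chat settles.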
stekernbut, my point was more - there might be more usages for it10:43
pgavinright10:43
stekernyeah, PIC is broken for the no-delay case: https://github.com/openrisc/or1k-src/blob/or1k/bfd/elf32-or1k.c#L3810:45
stekernso, we do *need* the .nodelay to implement support for that10:45
pgavinah10:46
pgavinyep10:46
pgavinline 50 also10:47
pgavinother ones already have nops10:47
stekernyup, but they are there for padding10:48
stekernI remember now, I considered making it delay-slot agnostic, but that would have meant one word larger plt entries10:48
pgavinright10:49
pgavinand slower in the delay-slot case10:49
stekernand it's not possible to get the got pointer acquiring to work for the compat case either, so you got to choose one of the archs for PIC10:50
wallentostekern: No, the data resizer is also missing in my top10:50
stekernwallento: I figured, I've already added it to my local copy10:51
stekernit's booting Linux now ;)10:51
wallentocool10:52
wallentoon a single core and the other one halting?10:52
stekernyes, on a single core10:52
stekernfigured it'd be healthy to have that case working before I start to poke around the Linux src10:53
stekern=)10:53
pgavinstekern: ok, so delay-compat should be deprecated?10:53
LoneTechso, are you aiming at coprthr-style support processor, smp booting, or cpu hotplug? :)10:53
olofkstekern, pgavin : Is there really any benefit in doing compile-time detection of delay slots for the asm test cases?10:53
stekernpgavin: should or could? I think it has usecases for baremetal10:54
pgavinolofk: delay slot testing?10:54
pgavinstekern: ok10:54
olofkpgavin: Well, that's a good use case :)10:54
stekernor, what I'm trying to say, I think there's usecases for it. Unfortunately it will not work together with PIC10:55
pgavinstekern: right10:55
olofkBut how many of the tests are really delay slot tests? Could we split those out and make two versions of them?10:55
pgavinprobably10:56
stekernLoneTech: ehm, you tell us? ;)10:59
LoneTechI really don't know which would be hardest. coprthr allows starting processors without having them run the OS11:00
LoneTechcpu hotplug allows requesting them at runtime, but is otherwise probably not all that different from smp startup11:00
LoneTechand I haven't mucked about with any of them11:01
stekernyeah, I'm in the usual "know-nothing-eager-to-learn" position11:01
pgavinshouldn't a misaligned PC in l.jr generate an instruction alignment exception and not a bus error?11:54
pgavinor1ksim uses an alignment exception11:55
pgavinstekern: I guess the mor1kx uses a bus error?11:56
pgavinstekern: nevermind, this comment is incorrect :)12:06
pgavinor1k-insnfetcherror.S says it generates an alignment error12:06
pgavinI suppose that 0xee00000000 generates a bus error12:06
LoneTechthere's something I don't get in the PLT routines12:42
LoneTechlooking at http://opencores.org/or1k/OpenRISC_PIC 4-word version .pltN, reloc_offset is a value we expect the resolver to have a use for12:43
LoneTechbut the PLT entry is already a lookup that would've gone in a register, isn't it?12:44
LoneTechif that register is fixed, that and the GOT pointer should be all the information the resolver needs12:45
LoneTech(to look up which function to look up, that is)12:45
LoneTechit seems the function of jumping through the PLT makes more sense for register starved setups12:49
stekernLoneTech: yes, if I understand your question right, I too asked the same question when I implemented the PIC support. And AFAICT, the benefit for us is that it allows for lazy relocations12:52
LoneTechbut you still could. the lazy relocation relies on being able to identify the routine to look up; it could do so as long as the GOT entry address is in a known register12:57
LoneTechyou could just move the translation from pltN to reloc_offsetN into plt0 afaict12:59
stekernLoneTech: ok, I clearly misunderstood your question then ;) Let me think about that a bit more then13:09
stekernpgavin: I couldn't figure out if you answered your own question, but in either case, mor1kx generates an ibus_align exception on unaligned jumps13:13
pgavinstekern: yes, I answered my own :)13:21
pgavinthe comment in the file was incorrect, I've fixed the comment :)13:21
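The distinction being sorted out above (misaligned jump target versus a fetch from an address nothing responds to) can be sketched like this. The vector offsets are the or1k ones (0x200 for bus error, 0x600 for alignment); checking alignment first, the function name and the `mapped_ranges` parameter are assumptions for illustration.

```python
BUS_ERROR_VECTOR = 0x200
ALIGNMENT_VECTOR = 0x600


def fetch_exception(target, mapped_ranges):
    """Return the exception vector a jump to target would raise, or None."""
    if target % 4 != 0:
        return ALIGNMENT_VECTOR  # instructions must be word-aligned
    if not any(first <= target <= last for first, last in mapped_ranges):
        return BUS_ERROR_VECTOR  # aligned, but no slave answers at that address
    return None
```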
pgavinI also added test lists and configs for or1ksim13:22
pgavinyou can see them here: https://github.com/pgavin/or1k-tests/commits/master13:22
stekernLoneTech: I still don't quite understand what you mean, what would your plt0 look like then?13:23
stekernpgavin: lgtm, feel free to push those to openrisc/or1k-tests13:25
pgavinkk13:26
LoneTechsorry, I haven't thought this through entirely. Looks like the value (nameN) I thought to leave in a register is stored in an instruction offset right now, doesn't get put in a register.. but I think just putting that in a register means you don't need .pltN for varying N13:30
LoneTechI have to go now, though. will consider more later13:32
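For readers following the PLT discussion, lazy relocation in general works along these lines. This is a deliberately simplified model, not the or1k PLT layout: the class, its fields and the use of a dictionary as the GOT are assumptions, and in a real GOT the unresolved entries point back into the PLT instead of being absent.

```python
class LazyLinker:
    """Toy model of lazy binding: resolve a symbol on its first call only."""

    def __init__(self, symbols, relocs):
        self.symbols = symbols    # symbol name -> callable
        self.relocs = relocs      # reloc_offset -> symbol name
        self.got = {}             # reloc_offset -> resolved callable
        self.resolve_count = 0

    def plt_call(self, reloc_offset, *args):
        target = self.got.get(reloc_offset)
        if target is None:
            target = self._resolve(reloc_offset)  # slow path, first call only
        return target(*args)

    def _resolve(self, reloc_offset):
        self.resolve_count += 1
        target = self.symbols[self.relocs[reloc_offset]]
        self.got[reloc_offset] = target  # later calls skip the resolver
        return target
```

The resolver only needs some value that identifies the entry; whether that value arrives as reloc_offset or can be derived from a register holding the GOT entry address is the point LoneTech and stekern are debating.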
_franck__stekern: if you have some spare time, could you give uboot a try ? I have ethoc working under Linux but not under u-boot. I did not investigate. I just cloned the upstream repo and change the clock frequency for my board.15:35
stekernlet's see if _franck__ reads the backlog on the web... you either need this http://git.openrisc.net/cgit.cgi/stefan/u-boot/commit/?id=c7845df64f7df75dc3d46e2f6385c0d901f9d416 or blueCmd's struct padding patch for gcc16:01
stekernhttps://github.com/bluecmd/or1k-gcc/commit/5043af9d3876eed42dfca706bc023131a519746b16:03
blueCmdI'm gonna start working on the atomic builtins now stekern16:22
blueCmdand rebase my patches onto openrisc/or1k-gcc16:31
stekern\o/16:37
stekern |16:37
stekern\16:38
stekernheh.. fail16:38
blueCmd/ \16:48
blueCmdI'm going to begin by rebasing the latest gcc as well16:48
blueCmd*sigh* nah, I won't do that _just_ now16:54
blueCmdI'll do that when I want to resolve merge conflicts16:54
stekernhaha16:55
blueCmdit was only 2 months ago, how many arches could possibly have been introduced that are between the letters n and p?16:56
blueCmdapparently >0 since there were conflicts16:57
olofkI read an article today about Linux ports that started with: "It's not every day Linux is ported to a new architecture", and all I could think was, yes, it's almost every fucking day we read about a linux port to a new arch17:13
olofkAnd there must be even more arches in GCC17:13
blueCmdolofk: yes17:15
stekernnot to mention all the out of tree ones...17:15
blueCmdstekern: like or1k you mean?17:18
blueCmd;)17:18
stekernyup, or eco32 for both linux and gcc17:21
stekernor lm32 for linux17:21
stekernblueCmd: why are you rebasing btw?17:25
stekern(I'm not speaking about your patches17:27
stekernthat you applied upon or1k-gcc, that's clear)17:28
blueCmdstekern: it's nice to not fall behind17:28
stekernbut upstream17:28
stekernyes, but why do you _rebase_17:28
stekernwhy not just pull?17:28
blueCmdbecause I hate the merge commits17:28
blueCmdI think they are messy and hard to keep track of17:28
stekernoh... but it17:29
* _franck_ has read the backlog17:29
stekern*damn you enter key*17:29
stekern's going to make it hard for you, you break the link between how upstream and our tree looks17:30
stekern+ you get weird looking commits like this: https://github.com/openrisc/or1k-gcc/commit/2135e320e5779bd5fa9fdae5fc8ce97072a3e72117:30
stekernwhere it looks like you are the committer of the upstream patches17:30
blueCmdhm17:31
stekernin other words, learn to live with the merge commits ;)17:31
blueCmdright, I actually thought it was the other way around17:31
blueCmdbut sure, I'll try to use merge instead17:32
blueCmdfor those things17:32
blueCmdstekern: so. for atomics - limitations.17:40
blueCmdoperations on 1, 2, 4 and 817:40
blueCmdare all those possible?17:41
blueCmd1-4 i have no problems seeing as the lock is on the word or bigger, but I don't know if you settled on word or cache line in the end17:41
stekernword17:46
blueCmdstekern is becoming so gangsta17:53
olofk:)17:53
blueCmdstekern: so, implementing for 8 bytes wouldn't be feasible I guess17:55
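Why 1- and 2-byte operations are unproblematic when the lock covers a whole word can be shown with the usual mask-and-retry technique. The sketch models the word-level primitive as a compare-and-swap on a Python list and assumes big-endian byte order; the class and function names are hypothetical and this is not the code in blueCmd's WIP patch.

```python
class WordMemory:
    """Memory of 32-bit words with a word-granular compare-and-swap."""

    def __init__(self, words):
        self.words = list(words)

    def load(self, index):
        return self.words[index]

    def compare_and_swap(self, index, expected, new):
        if self.words[index] != expected:
            return False  # somebody else changed the word: caller must retry
        self.words[index] = new & 0xFFFFFFFF
        return True


def atomic_add_byte(mem, byte_addr, value):
    """Atomically add value to one byte; return the byte's previous value."""
    index = byte_addr // 4
    shift = (3 - byte_addr % 4) * 8  # big-endian: offset 0 is bits 31:24
    mask = 0xFF << shift
    while True:
        old = mem.load(index)
        old_byte = (old & mask) >> shift
        new_byte = (old_byte + value) & 0xFF
        new = (old & ~mask) | (new_byte << shift)
        if mem.compare_and_swap(index, old, new):
            return old_byte
```

An 8-byte operation spans two words, so a single word-granular reservation cannot cover it, which is consistent with blueCmd's guess above.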
stekernhaha, took a second or two to get that 'gangsta' reference18:41
olofkstekern: a second? :) 19:53 < blueCmd> stekern is becoming so gangsta 20:41 < stekern> haha, took a second or two to get that 'gangsta' reference18:43
stekernolofk: contrary to your belief, I'm not reading every single line here in real time ;)18:45
olofkWHAT?!?!??!18:49
blueCmdstekern: making good progress, I think I can make everything work very easily19:05
blueCmdstekern: http://2e7b3d66b5d1dfcc.paste.se/ - that is my WIP19:06
blueCmdand it works in that it replaces __atomic_ builtins and so on with the expand, quite neat19:07
blueCmdcurrently the lower part throws: "wrong number of alternatives in the output template" in my face when I compile it. gonna go out and have a bite and think about it19:07
_franck_stekern: (ethoc) must be something else, I don't receive the ARP reply (requests are ok, my PC sends a reply but uboot doesn't see it)20:20
_franck_...and it works with barebox20:48
pgavinstekern: I think I got gcc to generate some better code now22:06
pgavine.g. there's a load-use delay when possible22:06
pgavinand a delay between l.sf* and l.bf22:06
pgavinthere's a new problem now though... it's separating l.sf* and l.bf at the expense of other things22:33
pgavinlike it will push the l.sf all the way back to the PC following a load that produces one of its inputs22:34
--- Log closed Sat May 10 00:00:42 2014
