IRC logs for #openrisc Thursday, 2014-06-19

--- Log opened Thu Jun 19 00:00:40 2014
03:45 -!- lvcargnini is now known as Guest30747
05:21 <Guest30747> poke53281, thanks sorry my connection dropped, reason why took me so long to answer
05:22 <Guest30747> the question is the instructions work in 32 b (understood), but i'm implementing a 64b machine, so how the register will be filled containing 32b for the instruction ?
05:39 <stekern> I don't understand the question
05:40 <stekern> how does the number of bits in the instruction have anything (or at least much) to do with the register size?
05:40 <stekern> the same way as you would fill 32-bit registers, but in some cases you'll need some extra operations
06:05 <Guest30747> stekern, explaining you have a 64b size registers in your architecture, bu your instruction is 32b long, so does I interpret 64b words fetching from the cache, and place into a 64
06:05 <Guest30747> 64b register, than how to intepret that to decode the oppcode of 64b ?
06:06 <Guest30747> as {32'b0,instr} or {instr,32'b0}
06:06 <Guest30747> to decode it.
06:06 <stekern> hmm, I think you have to read up a bit more on the internals of RISC machines
06:06 <stekern> if I'm not completely misunderstanding what you ask...
06:06 <Guest30747> stekern, please elaborate your conclusion, because like assume mips 64
06:07 <stekern> first, explain why you think that the instruction length is related to the register size
06:07 <Guest30747> in mips64 you also have 64b regs, but instructios are 32b
06:07 <Guest30747> the solution in this case is two instructions per word to fit one register
06:08 <Guest30747> I don't think that way
06:08 <stekern> the thing I don't understand in your question is, why do you speak about 'fitting instructions into registers'?
06:09 <Guest30747> the point is my architecture will have 64bits size word, for such I'll handle it using 64b regs, when fetching a instruction aligned in 8-bytes how to interpret the 64b to deocde it
06:09 <stekern> the instruction size is still 32-bit
06:09 <Guest30747> I know
06:10 <stekern> heh... so what is the question then? ;)
06:10 <Guest30747> but whe you fetch from iL1 you are fetching 64b
06:10 <stekern> why would you do that?
06:10 <Guest30747> Why wouldn't I ?
06:11 <Guest30747> you are currently fetching 32b ina  32b arch
06:11 <Guest30747> why not fetch 64b in a 64b, to put in layaman terms
06:11 <Guest30747> sorry layman
06:11 <stekern> I'm fetching 32-bit, because the instructions are 32-bit
06:11 <Guest30747> ...
06:11 <stekern> if they still are 32-bit, why fetch 64-bit?
06:12 <stekern> (you *could* fetch 64-bit from the icache, but the reasons to do that could be applied to the 32-bit versions as well, so let's not go there)
06:13 <Guest30747> I understood you are assuming a 32b and that the instruction is 32b, assuming you are compiling a code for 64b, so all the word representation, otherwise made, are 64b
06:14 <Guest30747> so your memory will be aligned in 8-bytes, on ce you fetch your cache lines, like in 4 or 8 words of 8-bytes each
06:14 <stekern> I'm assuming an or1k 64-bit implementation, with 32-bit instructions, as it's are described in the arch manual
06:14 <Guest30747> stekern, that is clear for me too
06:15 <Guest30747> I understood it, so please how would you implement your HDL
06:15 <stekern> ...
06:15 <Guest30747> for the decoding phase, assuming you compiled it for 64b, having the 32b instructions similar to MIPS architecture as example
06:15 <stekern> for the fetch/icache unit?
06:16 <stekern> exactly the same as in a 32-bit implementation
06:16 <Guest30747> datapath decode after fetching lines of 4*8-bytes
06:16 <Guest30747> again I agree with yuo instructions are 4-bytes
06:16 <Guest30747> sorry you
06:17 <Guest30747> that is the point my word now has 64b,  not 32b anymore
06:17 <Guest30747> how to correctly parse the 64
06:17 <Guest30747> b
06:17 <stekern> ...again, no, they are *not* 64-bit, they are 32-bit
06:18 <stekern> remember that instruction and data paths are seperated
06:19 <stekern> your data path will of course be 64-bit
06:19 <stekern> your instruction path not
06:19 <Guest30747> separated memory space, so basically you area pointing that I have to have two address space (dimension) for data and inst in the same architecture
06:20 <stekern> yes
06:20 <Guest30747> but that is messy
06:20 <stekern> in what way?
06:22 <Guest30747> for example in the compiler implementation first thing, second I have to  have different implementations for icache and dcache, which will create different latencies for L1
06:22 <Guest30747> in IC design
06:23 <Guest30747> a line of 16 bytes doesn't have the same latency as a 32 bytes line, after routing and placement
06:23 <stekern> what you just said makes no sense at all
06:23 <olofk> Guest30747: Different latencies in the caches will probably not be an issue. You could even make both datapaths wider to the memory. Make them 256 bits for example
06:24 <olofk> You don't need to have your datapath to the memroies the same width as your instructions or data
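A minimal Python sketch of olofk's point: the fetch path to memory can be wider than the 32-bit instruction, and the core just picks one 32-bit slot out of the fetched block using the PC. This is a hypothetical model, not mor1kx code. It assumes instructions are packed back to back at 4-byte alignment (as stekern confirms later in the discussion) and that the core is big-endian, as OpenRISC 1000 is, so the lowest-addressed instruction occupies the most significant bits of the block. `select_insn`, `pack_block` and the block widths are illustrative names and parameters only.

```python
def select_insn(block: int, pc: int, block_bits: int = 64) -> int:
    """Return the 32-bit instruction at address `pc` from a fetched block.

    Hypothetical model: `block` holds `block_bits` bits read from an aligned
    address in big-endian order, so lower addresses map to higher bits.
    """
    if block_bits <= 0 or block_bits % 32:
        raise ValueError("block width must be a positive multiple of 32 bits")
    if pc % 4:
        raise ValueError("instructions are 4-byte aligned")
    slots = block_bits // 32             # instructions per fetched block
    index = (pc // 4) % slots            # PC bits above [1:0] pick the slot
    shift = (slots - 1 - index) * 32     # big-endian: slot 0 is the top word
    return (block >> shift) & 0xFFFFFFFF


def pack_block(insns: list[int]) -> int:
    """Pack consecutive 32-bit instructions into one big-endian block."""
    block = 0
    for insn in insns:
        block = (block << 32) | (insn & 0xFFFFFFFF)
    return block
```

For a 64-bit block only PC bit 2 changes the result, which is what olofk later describes as using the low address bit to pick one half of the word; a 256-bit line would use PC bits [4:2] instead.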
06:24 <Guest30747> I know
06:25 <Guest30747> I want
06:25 <Guest30747> would like
06:25 <olofk> :)
06:25 <Guest30747> to make the floorplaning  smooth
06:25 <Guest30747> stekern, I thinking about a silicon not FPGA implementation
06:26 <Guest30747> and yes make difference in the DFM process once you are routing it and performing STA
06:27 <stekern> how would that be related?
06:27 <Guest30747> let me compile two memory banks I'll pass you the data
06:28 <stekern> ...they are completely different paths inside the cpu
06:29 <stekern> what you're saying still doesn't make sense, doesn't matter if it's FPGA or ASIC
06:31 <olofk> Guest30747: You could probably make your icache and dcache identical if you want that, even if instructions and data are different sizes
06:32 <olofk> If that's the big problem. But I'm not sure I have understood completely
06:33 <Guest30747> olofk, that was my original intention, assuming {inst,32'b0} or {32'b0,inst}, I taught the compiler would generate the object code this way, SPARC style or MIPS64 style {inst,inst}
06:34 <olofk> If you're making the RTL you would have to decide for yourself how you want it
06:35 <stekern> the instructions will of course be 4-byte aligned...
06:38 <Guest30747> http://pastebin.com/gcwpb8hH
06:38 <Guest30747> for 32b
06:38 <Guest30747> now assuming 64b
06:40 <Guest30747> http://pastebin.com/EvEHNCSr
06:40 <Guest30747> look the surfaces are different
06:40 <olofk> So make them both 64 bit then
06:40 <Guest30747> yes but my doubt was regarding how to decode in 64b, since I had no idea what the compiler will generate
06:41 <olofk> And use LSB of the address to select which half of the word you want to use
06:41 <stekern> well... it will absolutely certainly not insert empty 4-byte data at each instruction
06:42 <olofk> No, I thought that sounds crazy as well. Do really other arches do that?
06:42 <Guest30747> do you guys have a sample of a binary after compiling it for 64b ?
06:43 <stekern> we don't have toolchain support for 64-bit
06:43 <stekern> ...well, we have *some* support for it in parts of our toolchains, but it's not complete nor usable as of today
06:44 <Guest30747> olofk, no, SPARC just assumes everything into 64b, MIPS loads two instructions instead of one for each load
06:44 <stekern> what does "assumes everything into 64b" mean?
06:44 <Guest30747> stekern, oooooo ok, this gave something extra to think
06:45 <olofk> Guest30747: But loading two instructions are just dual issue, isn't it?
06:45 <stekern> and how do you know that MIPS does that, I would think that's a highly implementation specific detail
06:45 <Guest30747> 64b word size, loads of data o r instructions are 64b aligned
06:45 <stekern> I would be surprised if MIPS made such data publically available
06:45 <Guest30747> because I have the MIPS64 specs, yes it is very specific
06:46 <Guest30747> but you can find that into old implementations like R4000, initial 64b
06:48 <olofk> So back to the original question, we haven't really discussed that for or1k AFAIR, but packing the instructions tight is what makes most sense to me
06:48 <olofk> It might even be defined that way in the spec. Not sure
06:49 <stekern> well... if it's important to you to have the caches emit 64-bit (I still don't get the reasoning, but), do that then
06:51 <Guest30747> ok, well I was concerned mostly because of the compiler production, since it doesn't have support for it yet, I'll have to drop it for now, because it is still unclear, the problem is if I make something and later is decide in another way
06:51 <stekern> you will just discard half of the data in a single issue implementation...
06:51 <Guest30747> yep
06:51 <stekern> sounds silly
06:52 <Guest30747> silly drop or align in 64b single issue ?
06:52 <stekern> well, I can promise you that the data will be 'packed'
06:52 <Guest30747> stekern, Yep that is clear
06:52 <olofk> But you only need a 32->64-bit conversion outside of the icache so that the CPU sees a 32 bit port
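One way to read olofk's suggestion, as a hedged sketch: keep the icache 64 bits wide and put a small adapter between it and the core, so the pipeline still sees a 32-bit fetch port. The adapter below buffers the last 64-bit word, so sequential fetches from a packed binary touch the icache only once per two instructions. The class name, the `icache_read` callback and the buffering policy are assumptions for illustration, not an existing OpenRISC interface; big-endian ordering is assumed as for OpenRISC 1000.

```python
from typing import Callable, Optional


class FetchAdapter:
    """Hypothetical adapter: 64-bit icache port in, 32-bit CPU fetch port out."""

    def __init__(self, icache_read: Callable[[int], int]) -> None:
        # icache_read(addr) is assumed to return the 64-bit word stored at an
        # 8-byte-aligned address.
        self._read = icache_read
        self._tag: Optional[int] = None  # aligned address of the buffered word
        self._word = 0
        self.icache_reads = 0

    def invalidate(self) -> None:
        """Drop the buffered word (e.g. on icache invalidate or refill)."""
        self._tag = None

    def fetch(self, pc: int) -> int:
        if pc % 4:
            raise ValueError("instructions are 4-byte aligned")
        base = pc & ~0x7
        if base != self._tag:            # only read the icache on a new 8-byte word
            self._word = self._read(base)
            self._tag = base
            self.icache_reads += 1
        # Big-endian: PC bit 2 == 0 selects the upper 32 bits.
        if pc & 0x4:
            return self._word & 0xFFFFFFFF
        return (self._word >> 32) & 0xFFFFFFFF
```

Without such a buffer, a single-issue core would simply discard the other half of each 64-bit read, as stekern points out; the buffer is only worth its area if sequential fetches actually reuse it.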
06:52 <Guest30747> my only concern was the the instruction for decoding
06:53 <stekern> start hacking away and you will realise that you're concerns are misguided...
06:54 <Guest30747> stekern, I cannot afford the time to be misguided now, reason why I'm trying to be assured before committing to its design
06:54 <stekern> think about this way, you can run a 32-bit program on the 64-bit implementation
06:55 <stekern> the instructions will be exactly layed out in the same way
06:56 <Guest30747> yes, but the encoded ABI is in 32, the LD, makes the loading and properly maps the sign extension for the 64ba
06:56 <stekern> mmm, and that's related to icache/fetch/decode how?
06:57 <Guest30747> stekern, you are right the problem is not clear for me yet how the compiler would generate the objects neither the loader would link a 32b for a 64b on-the-fly
06:57 <Guest30747> to fit
06:58 <Guest30747> so I can keep my arch aligned in 64/32b for data and inst, for data it prefixes 0x00000000
06:59 <Guest30747> either way, I'll have to scrub some bits to see the outcome and make my decision
07:00 <olofk> gtg
07:00 <stekern> don't think so much about the compilers, think more of it in terms of a binary just
07:00 <Guest30747> thanks olofk
07:00 <Guest30747> thanks stekern
07:00 <Guest30747> I'll try
07:01 <Guest30747> 8-)
07:01 <olofk> Don't ask what the compiler can do for you, but what you can do for the compiler ;)
07:01 <Guest30747> haha ok olofk
07:02 <stekern> you don't have to worry to much about how the compiler will align things, you already have in the spec how the different data sizes need to be aligned
07:03 <stekern> i.e. l.lb/l.sb 1-byte, l.lh/sh 2-byte, l.lw/l.sw 4-byte and l.ld/l.sd 8-byte
07:03 <stekern> just implement them like that and you'll be fine
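The alignment rule stekern quotes can be turned into a check plus a byte-enable mask for a 64-bit data bus. A minimal sketch, assuming a big-endian bus where mask bit i covers data bits [8i+7:8i]; the `kind` names and both functions are hypothetical, and stekern's mnemonics are shorthand (the architecture manual splits loads into zero- and sign-extending forms such as l.lbz/l.lbs).

```python
# Natural alignment for each access size, following the rule quoted above.
ACCESS_BYTES = {"byte": 1, "half": 2, "word": 4, "double": 8}


def is_aligned(addr: int, kind: str) -> bool:
    """True if `addr` is naturally aligned for an access of this kind."""
    return addr % ACCESS_BYTES[kind] == 0


def byte_enables(addr: int, kind: str) -> int:
    """8-bit byte-enable mask for a 64-bit big-endian data bus.

    Bit i of the mask covers data bits [8*i+7 : 8*i]. Big-endian means the
    byte at offset 0 of the 8-byte word lands on the top lane (bit 7).
    """
    nbytes = ACCESS_BYTES[kind]
    if not is_aligned(addr, kind):
        # A real core would take an alignment exception here instead.
        raise ValueError(f"misaligned {kind} access at {addr:#x}")
    offset = addr & 0x7
    return ((1 << nbytes) - 1) << (8 - offset - nbytes)
```

Because every access is naturally aligned, it never straddles an 8-byte boundary, so each load or store maps to exactly one 64-bit bus transfer.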
07:03 <Guest30747> yes, but my concern was how the l.lb is aligned in its memory for a 64b wordsize  Or1k
07:04 <stekern> the same as for a 32-bit
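stekern's answer can be illustrated as follows: the byte sits at the same address as on a 32-bit implementation, and only the lane it appears on in the wider data bus now depends on address bits [2:0] rather than [1:0]. The sketch below, with a hypothetical `load_from_bus` helper, extracts an aligned load from a 64-bit big-endian bus word (read from the 8-byte-aligned address containing `addr`) and then zero- or sign-extends it to a 64-bit register value. Zero extension corresponds to the 0x00000000 prefix Guest30747 mentions for 32-bit data; sign extension corresponds to the sign-extending loads such as l.lws.

```python
MASK64 = (1 << 64) - 1


def load_from_bus(bus: int, addr: int, nbytes: int, signed: bool) -> int:
    """Extract an aligned load from a 64-bit big-endian bus word and extend
    it to a 64-bit register value (hypothetical helper)."""
    if nbytes not in (1, 2, 4, 8) or addr % nbytes:
        raise ValueError("misaligned or unsupported access")
    offset = addr & 0x7                  # byte offset inside the 8-byte word
    shift = 8 * (8 - offset - nbytes)    # big-endian lane position
    bits = 8 * nbytes
    value = (bus >> shift) & ((1 << bits) - 1)
    if signed and value & (1 << (bits - 1)):
        value -= 1 << bits               # reinterpret as two's complement
    return value & MASK64                # 64-bit register image
```

The lane selection is the only part that differs from a 32-bit data path; the extension step is what a 64-bit register file adds.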
07:04 <Guest30747> yes, but I was thinking in 64b regs aligned with 64b, worsize for the icache too
07:05 <Guest30747> my root of problem
07:07 <stekern> yes, but I think those are all just some misunderstandings from your part...
07:07 <Guest30747> thanks guys for your time and help
07:07 <Guest30747> stekern, probably
07:07 <stekern> I might be wrong of course ;)
07:08 <Guest30747> the main reason is I probably misunderstood something plus there is a lack of documentation, and some clarity regarding the 64b arch
07:09 <Guest30747> 8-)
07:09 <Guest30747> thanks stekern
07:12 <stekern> no problems, it was an interesting discussion
07:12 <stekern> ...that might lead to some better documentation at some point ;)
07:12 <Guest30747> I hope ;-)
07:12 <stekern> I'm interested in doing a 64-bit version of mor1kx as well at some point
07:12 <Guest30747> b~d
07:13 <Guest30747> I was looking into it to modify it
07:13 <Guest30747> but is a extensive code, so it seemed easier do all from scratch to be sure I did it right
07:14 <stekern> I think it wouldn't be too much work to do it actually
07:14 <stekern> modify it I mean
07:14 <stekern> a lot can be used as is
07:15 <Guest30747> well, it can
07:16 <Guest30747> could
08:11 <stekern> _franck__: I tested you or1k-profiling patch, with some modifications, it works
08:13 <_franck__> great. What did you change ? I think we need to add a keep_alive() call
08:13 <stekern> http://pastie.org/9304431
08:14 <stekern> I think the timekeeping is off too, but I didn't look closer at that
08:15 <stekern> I did 'profile 100 gmon.out', but it didn't feel like 100 sec
08:16 <_franck__> may be it's because of that: if (sample_count >= max_num_samples || .....
08:17 <_franck__> let's see what it does ;)
08:17 <stekern> it gave a better result than the stall/resume method
08:18 <stekern> instead of the empty loop function, the uart_put function was the one where it spent more time
08:18 <stekern> (since of course the uart keep on sending stuff even when the cpu is stalled)
09:33 <olofk> wkoszek: I didn't realize you were working for Xilinx. Could you pleeeeeease make sure that your coworkers fix the forced line breaks in the ngdbuild/map/par/trce/bitgen logs? :)
17:16 <poke53281> http://www.theregister.co.uk/2014/06/18/intel_fpga_custom_chip/
17:17 <poke53281> This is a nice development. If this is successful we will see it probably also in consumer versions in a few years.
17:27 <ysionneau> reminds me of those Intel atom soc packaged with an altera fpga
18:08 -!- Netsplit *.net <-> *.split quits: poke53281, veprbl, fotis2, slp```
18:10 -!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2
18:15 -!- Netsplit *.net <-> *.split quits: poke53281, veprbl, slp```, fotis2
18:16 -!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2
18:33 <poke53281> Yes, but they were not successful as far as I know. Don't know why.
18:33 <poke53281> I hope that this development goes on.
21:05 <wkoszek> olofk: I don't work for the tools team.
21:05 <wkoszek> olofk: Fill a ticket and wait :)
--- Log closed Fri Jun 20 00:00:42 2014
