IRC logs for #openrisc Thursday, 2014-06-19

--- Log opened Thu Jun 19 00:00:40 2014
-!- lvcargnini is now known as Guest30747		03:45
Guest30747	poke53281, thanks, and sorry, my connection dropped, which is why it took me so long to answer	05:21
Guest30747	the question is: the instructions are 32b (understood), but I'm implementing a 64b machine, so how will a register be filled when it holds a 32b instruction?	05:22
stekern	I don't understand the question	05:39
stekern	how does the number of bits in the instruction have anything (or at least much) to do with the register size?	05:40
stekern	the same way as you would fill 32-bit registers, but in some cases you'll need some extra operations	05:40
Guest30747	stekern, to explain: you have 64b registers in your architecture, but your instructions are 32b long, so do I interpret the 64b words fetched from the cache and place them into a	06:05
Guest30747	64b register? Then how do I interpret that to decode the opcode from the 64b?	06:05
Guest30747	as {32'b0,instr} or {instr,32'b0}	06:06
Guest30747	to decode it.	06:06
stekern	hmm, I think you have to read up a bit more on the internals of RISC machines	06:06
stekern	if I'm not completely misunderstanding what you ask...	06:06
Guest30747	stekern, please elaborate your conclusion, because like assume mips 64	06:06
stekern	first, explain why you think that the instruction length is related to the register size	06:07
Guest30747	in mips64 you also have 64b regs, but instructions are 32b	06:07
Guest30747	the solution in this case is two instructions per word to fit one register	06:07
Guest30747	I don't think that way	06:08
stekern	the thing I don't understand in your question is, why do you speak about 'fitting instructions into registers'?	06:08
Guest30747	the point is my architecture will have a 64-bit word size, so I'll handle it using 64b regs; when fetching an instruction aligned to 8 bytes, how do I interpret the 64b to decode it?	06:09
stekern	the instruction size is still 32-bit	06:09
Guest30747	I know	06:09
stekern	heh... so what is the question then? ;)	06:10
Guest30747	but when you fetch from iL1 you are fetching 64b	06:10
stekern	why would you do that?	06:10
Guest30747	Why wouldn't I ?	06:10
Guest30747	you are currently fetching 32b in a 32b arch	06:11
Guest30747	why not fetch 64b in a 64b, to put in layaman terms	06:11
Guest30747	sorry layman	06:11
stekern	I'm fetching 32-bit, because the instructions are 32-bit	06:11
Guest30747	...	06:11
stekern	if they still are 32-bit, why fetch 64-bit?	06:11
stekern	(you could fetch 64-bit from the icache, but the reasons to do that could be applied to the 32-bit versions as well, so let's not go there)	06:12
Guest30747	I understood you are assuming a 32b and that the instruction is 32b, assuming you are compiling a code for 64b, so all the word representation, otherwise made, are 64b	06:13
Guest30747	so your memory will be aligned to 8 bytes, once you fetch your cache lines, like in 4 or 8 words of 8 bytes each	06:14
stekern	I'm assuming an or1k 64-bit implementation, with 32-bit instructions, as it's described in the arch manual	06:14
Guest30747	stekern, that is clear for me too	06:14
Guest30747	I understood it, so please how would you implement your HDL	06:15
stekern	...	06:15
Guest30747	for the decoding phase, assuming you compiled it for 64b, having the 32b instructions similar to MIPS architecture as example	06:15
stekern	for the fetch/icache unit?	06:15
stekern	exactly the same as in a 32-bit implementation	06:16
Guest30747	datapath decode after fetching lines of 4*8-bytes	06:16
Guest30747	again I agree with yuo instructions are 4-bytes	06:16
Guest30747	sorry you	06:16
Guest30747	that is the point my word now has 64b, not 32b anymore	06:17
Guest30747	how to correctly parse the 64	06:17
Guest30747	b	06:17
stekern	...again, no, they are not 64-bit, they are 32-bit	06:17
stekern	remember that instruction and data paths are separated	06:18
stekern	your data path will of course be 64-bit	06:19
stekern	your instruction path not	06:19
Guest30747	separated memory space, so basically you are pointing out that I have to have two address spaces (dimensions) for data and inst in the same architecture	06:19
stekern	yes	06:20
Guest30747	but that is messy	06:20
stekern	in what way?	06:20
Guest30747	for example in the compiler implementation first thing, second I have to have different implementations for icache and dcache, which will create different latencies for L1	06:22
Guest30747	in IC design	06:22
Guest30747	a line of 16 bytes doesn't have the same latency as a 32 bytes line, after routing and placement	06:23
stekern	what you just said makes no sense at all	06:23
olofk	Guest30747: Different latencies in the caches will probably not be an issue. You could even make both datapaths wider to the memory. Make them 256 bits for example	06:23
olofk	You don't need to have your datapath to the memories the same width as your instructions or data	06:24
Guest30747	I know	06:24
Guest30747	I want	06:25
Guest30747	would like	06:25
olofk	:)	06:25
Guest30747	to make the floorplanning smooth	06:25
Guest30747	stekern, I'm thinking about a silicon, not FPGA, implementation	06:25
Guest30747	and yes, it makes a difference in the DFM process once you are routing it and performing STA	06:26
stekern	how would that be related?	06:27
Guest30747	let me compile two memory banks I'll pass you the data	06:27
stekern	...they are completely different paths inside the cpu	06:28
stekern	what you're saying still doesn't make sense, doesn't matter if it's FPGA or ASIC	06:29
olofk	Guest30747: You could probably make your icache and dcache identical if you want that, even if instructions and data are different sizes	06:31
olofk	If that's the big problem. But I'm not sure I have understood completely	06:32
Guest30747	olofk, that was my original intention, assuming {inst,32'b0} or {32'b0,inst}, I thought the compiler would generate the object code this way, SPARC style or MIPS64 style {inst,inst}	06:33
olofk	If you're making the RTL you would have to decide for yourself how you want it	06:34
stekern	the instructions will of course be 4-byte aligned...	06:35
Guest30747	http://pastebin.com/gcwpb8hH	06:38
Guest30747	for 32b	06:38
Guest30747	now assuming 64b	06:38
Guest30747	http://pastebin.com/EvEHNCSr	06:40
Guest30747	look the surfaces are different	06:40
olofk	So make them both 64 bit then	06:40
Guest30747	yes but my doubt was regarding how to decode in 64b, since I had no idea what the compiler will generate	06:40
olofk	And use LSB of the address to select which half of the word you want to use	06:41
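The half-word selection olofk describes can be sketched as follows. This is a minimal model, not RTL from any or1k implementation: the function name `select_instruction` is hypothetical, the big-endian ordering (lower address in the upper 32 bits) is an assumption, and with byte addresses the selecting bit is taken to be bit 2, i.e. the LSB of the 32-bit word index. The two instruction values are only example encodings.

```python
def select_instruction(fetch_word: int, pc: int) -> int:
    """Pick the 32-bit instruction at byte address pc out of a 64-bit fetch word.

    Assumes big-endian ordering: the instruction at the lower address
    occupies the upper 32 bits of the 64-bit word.
    """
    if pc % 4 != 0:
        raise ValueError("instructions are 4-byte aligned")
    # Bit 2 of the byte address tells which half of the 8-byte word is wanted.
    wants_upper_half = ((pc >> 2) & 1) == 0
    shift = 32 if wants_upper_half else 0
    return (fetch_word >> shift) & 0xFFFFFFFF


# Two packed 32-bit instructions in one 64-bit fetch word (example values).
word = 0x15000000_9C600001
first = select_instruction(word, 0x100)   # upper half
second = select_instruction(word, 0x104)  # lower half
```

With this arrangement the decoder still only ever sees a 32-bit instruction, whatever the width of the path to the cache.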
stekern	well... it will absolutely certainly not insert empty 4-byte data at each instruction	06:41
olofk	No, I thought that sounds crazy as well. Do really other arches do that?	06:42
Guest30747	do you guys have a sample of a binary after compiling it for 64b ?	06:42
stekern	we don't have toolchain support for 64-bit	06:43
stekern	...well, we have some support for it in parts of our toolchains, but it's not complete nor usable as of today	06:43
Guest30747	olofk, no, SPARC just assumes everything into 64b, MIPS loads two instructions instead of one for each load	06:44
stekern	what does "assumes everything into 64b" mean?	06:44
Guest30747	stekern, oooooo ok, this gave something extra to think	06:44
olofk	Guest30747: But loading two instructions is just dual issue, isn't it?	06:45
stekern	and how do you know that MIPS does that, I would think that's a highly implementation specific detail	06:45
Guest30747	64b word size, loads of data or instructions are 64b aligned	06:45
stekern	I would be surprised if MIPS made such data publicly available	06:45
Guest30747	because I have the MIPS64 specs, yes it is very specific	06:45
Guest30747	but you can find that into old implementations like R4000, initial 64b	06:46
olofk	So back to the original question, we haven't really discussed that for or1k AFAIR, but packing the instructions tight is what makes most sense to me	06:48
olofk	It might even be defined that way in the spec. Not sure	06:48
stekern	well... if it's important to you to have the caches emit 64-bit (I still don't get the reasoning, but), do that then	06:49
Guest30747	ok, well I was concerned mostly because of the compiler output, since it doesn't have support for it yet, I'll have to drop it for now, because it is still unclear; the problem is if I make something and later it is decided another way	06:51
stekern	you will just discard half of the data in a single issue implementation...	06:51
Guest30747	yep	06:51
stekern	sounds silly	06:51
Guest30747	silly drop or align in 64b single issue ?	06:52
stekern	well, I can promise you that the data will be 'packed'	06:52
Guest30747	stekern, Yep that is clear	06:52
olofk	But you only need a 32->64-bit conversion outside of the icache so that the CPU sees a 32 bit port	06:52
Guest30747	my only concern was the instruction for decoding	06:52
stekern	start hacking away and you will realise that your concerns are misguided...	06:53
Guest30747	stekern, I cannot afford the time to be misguided now, reason why I'm trying to be assured before committing to its design	06:54
stekern	think about this way, you can run a 32-bit program on the 64-bit implementation	06:54
stekern	the instructions will be laid out in exactly the same way	06:55
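To illustrate stekern's point that the instruction stream is packed and laid out identically for 32-bit and 64-bit implementations, here is a small sketch that walks a binary image four bytes at a time. The helper names `unpack_instructions` and `major_opcode` are hypothetical; the sketch assumes big-endian storage and a 6-bit major opcode in bits 31:26 of each instruction, and the image bytes are example values only.

```python
import struct


def unpack_instructions(image: bytes) -> list[int]:
    """Split a packed, big-endian image into 32-bit instruction words."""
    if len(image) % 4 != 0:
        raise ValueError("image length must be a multiple of 4 bytes")
    count = len(image) // 4
    # ">" = big-endian, "I" = unsigned 32-bit; no padding between instructions.
    return list(struct.unpack(f">{count}I", image))


def major_opcode(instr: int) -> int:
    """Return the top 6 bits of a 32-bit instruction word."""
    return (instr >> 26) & 0x3F


# Eight bytes of memory hold two instructions, regardless of register width.
image = bytes.fromhex("150000009c600001")
instructions = unpack_instructions(image)
opcodes = [major_opcode(i) for i in instructions]
```

Nothing in this walk depends on the register size, which is why the same binary layout serves both implementations.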
Guest30747	yes, but the encoded ABI is in 32, the LD, makes the loading and properly maps the sign extension for the 64ba	06:56
stekern	mmm, and that's related to icache/fetch/decode how?	06:56
Guest30747	stekern, you are right, it is not clear to me yet how the compiler would generate the objects, nor how the loader would link a 32b binary for a 64b machine on-the-fly	06:57
Guest30747	to fit	06:57
Guest30747	so I can keep my arch aligned in 64/32b for data and inst, for data it prefixes 0x00000000	06:58
Guest30747	either way, I'll have to scrub some bits to see the outcome and make my decision	06:59
olofk	gtg	07:00
stekern	don't think so much about the compilers, think more of it in terms of a binary just	07:00
Guest30747	thanks olofk	07:00
Guest30747	thanks stekern	07:00
Guest30747	I'll try	07:00
Guest30747	8-)	07:01
olofk	Don't ask what the compiler can do for you, but what you can do for the compiler ;)	07:01
Guest30747	haha ok olofk	07:01
stekern	you don't have to worry too much about how the compiler will align things, you already have in the spec how the different data sizes need to be aligned	07:02
stekern	i.e. l.lb/l.sb 1-byte, l.lh/l.sh 2-byte, l.lw/l.sw 4-byte and l.ld/l.sd 8-byte	07:03
stekern	just implement them like that and you'll be fine	07:03
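The alignment rule stekern lists can be written down as a small lookup. This is only a sketch: the table uses the shorthand mnemonics exactly as typed in the log, and `ACCESS_SIZE` and `is_aligned` are hypothetical names, not part of any or1k tool.

```python
# Access size in bytes per load/store, using the shorthand names from the log.
ACCESS_SIZE = {
    "l.lb": 1, "l.sb": 1,
    "l.lh": 2, "l.sh": 2,
    "l.lw": 4, "l.sw": 4,
    "l.ld": 8, "l.sd": 8,
}


def is_aligned(mnemonic: str, address: int) -> bool:
    """True if address is naturally aligned for the given load/store."""
    size = ACCESS_SIZE[mnemonic]
    return address % size == 0


# A 4-byte access at 0x1004 is aligned; an 8-byte access there is not.
word_ok = is_aligned("l.lw", 0x1004)
double_ok = is_aligned("l.ld", 0x1004)
```

The rule depends only on the access size, so a byte load is aligned the same way on a 32-bit and a 64-bit implementation.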
Guest30747	yes, but my concern was how the l.lb is aligned in its memory for a 64b wordsize Or1k	07:03
stekern	the same as for a 32-bit	07:04
Guest30747	yes, but I was thinking in 64b regs aligned with 64b, wordsize for the icache too	07:04
Guest30747	my root of problem	07:05
stekern	yes, but I think those are all just some misunderstandings from your part...	07:07
Guest30747	thanks guys for your time and help	07:07
Guest30747	stekern, probably	07:07
stekern	I might be wrong of course ;)	07:07
Guest30747	the main reason is I probably misunderstood something plus there is a lack of documentation, and some clarity regarding the 64b arch	07:08
Guest30747	8-)	07:09
Guest30747	thanks stekern	07:09
stekern	no problems, it was an interesting discussion	07:12
stekern	...that might lead to some better documentation at some point ;)	07:12
Guest30747	I hope ;-)	07:12
stekern	I'm interested in doing a 64-bit version of mor1kx as well at some point	07:12
Guest30747	b~d	07:12
Guest30747	I was looking into it to modify it	07:13
Guest30747	but it is an extensive code base, so it seemed easier to do it all from scratch to be sure I did it right	07:13
stekern	I think it wouldn't be too much work to do it actually	07:14
stekern	modify it I mean	07:14
stekern	a lot can be used as is	07:14
Guest30747	well, it can	07:15
Guest30747	could	07:16
stekern	_franck__: I tested your or1k-profiling patch, with some modifications, it works	08:11
_franck__	great. What did you change ? I think we need to add a keep_alive() call	08:13
stekern	http://pastie.org/9304431	08:13
stekern	I think the timekeeping is off too, but I didn't look closer at that	08:14
stekern	I did 'profile 100 gmon.out', but it didn't feel like 100 sec	08:15
_franck__	maybe it's because of that: if (sample_count >= max_num_samples || .....	08:16
_franck__	let's see what it does ;)	08:17
stekern	it gave a better result than the stall/resume method	08:17
stekern	instead of the empty loop function, the uart_put function was the one where it spent more time	08:18
stekern	(since of course the uart keeps on sending stuff even when the cpu is stalled)	08:18
olofk	wkoszek: I didn't realize you were working for Xilinx. Could you pleeeeeease make sure that your coworkers fix the forced line breaks in the ngdbuild/map/par/trce/bitgen logs? :)	09:33
poke53281	http://www.theregister.co.uk/2014/06/18/intel_fpga_custom_chip/	17:16
poke53281	This is a nice development. If this is successful we will see it probably also in consumer versions in a few years.	17:17
ysionneau	reminds me of those Intel atom soc packaged with an altera fpga	17:27
-!- Netsplit .net <-> .split quits: poke53281, veprbl, fotis2, slp```		18:08
-!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2		18:10
-!- Netsplit .net <-> .split quits: poke53281, veprbl, slp```, fotis2		18:15
-!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2		18:16
poke53281	Yes, but they were not successful as far as I know. Don't know why.	18:33
poke53281	I hope that this development goes on.	18:33
wkoszek	olofk: I don't work for the tools team.	21:05
wkoszek	olofk: File a ticket and wait :)	21:05
--- Log closed Fri Jun 20 00:00:42 2014
