This kind of problem is known as positional grouping. It can be quite tricky, especially in XSLT 1.0; it's usually much easier in 2.0. Search for "positional grouping XSLT" for some background. Thinking in terms of start and end tags is understandable, and you can implement solutions that way in other languages, but it's not the XSLT way: XSLT works on a tree of nodes, not on a stream of tags.
The first point about such problems is to design your stylesheet around the output structure, not the input structure. Your remarks such as "which I can do while looping and when come across each field" suggest that your code is following the input structure. You say it yourself: "The problem comes because I see fieldgroup start and end in two different iterations of a loop". You need to invert the structure.
Draw the structure of your result tree. Write templates to generate each element in the result tree. Then fill in the templates by asking "what information do I need to fetch from the source document to fill in the content of this element?"
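To make that concrete, here is a minimal XSLT 1.0 sketch of the approach. Your actual source document isn't shown, so the element names are assumptions: I'm supposing a flat `form` whose children are `field` elements interspersed with empty `fieldgroup-start` and `fieldgroup-end` markers, and a result tree in which each marked run of fields is wrapped in a `group` element. It also assumes groups are never nested and every start marker has a matching end marker.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Result element: form. Its children are the group starts plus the
       fields that lie outside any group, i.e. fields preceded by as many
       end markers as start markers. -->
  <xsl:template match="form">
    <form>
      <xsl:apply-templates
          select="fieldgroup-start
                | field[count(preceding-sibling::fieldgroup-start)
                      = count(preceding-sibling::fieldgroup-end)]"/>
    </form>
  </xsl:template>

  <!-- Result element: group. Ask "which source nodes belong inside it?"
       Answer: following fields whose nearest preceding start marker is
       this one, and which come before this group's end marker. -->
  <xsl:template match="fieldgroup-start">
    <xsl:variable name="start" select="generate-id()"/>
    <group name="{@name}">
      <xsl:apply-templates
          select="following-sibling::field
                    [generate-id(preceding-sibling::fieldgroup-start[1]) = $start]
                    [count(preceding-sibling::fieldgroup-end)
                   = count(current()/preceding-sibling::fieldgroup-end)]"/>
    </group>
  </xsl:template>

  <!-- Result element: field (copied across; hypothetical name attribute). -->
  <xsl:template match="field">
    <field name="{@name}"/>
  </xsl:template>
</xsl:stylesheet>
```

Notice that no template ever "sees" an end marker: the start marker's template produces the whole group element and pulls in its own content, so the start and end never have to be reconciled across two iterations. In XSLT 2.0 you would normally reach for `xsl:for-each-group` instead of the sibling-counting predicates.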
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference