Michael’s Spring 2024 Design Notebook

Week of January 22, 2024

Project Work:

As this was the first week of the semester I was mostly focused on getting settled back into school but I did manage to finish the ID connection module I started at the end of last semester.

Week of January 29, 2024

Project Work:

  • Fixed Branch Manager Errors: Fixed errors in the Branch Manager module and test that were overlooked due to the module’s absence from the CMake files as noted by Xingzhi.
  • Changed Titles of Labs: Changed the titles of the onboarding labs from Week # to Lab # as the former naming scheme implied the labs should only be done one week at a time which is quite a slow pace, especially for some of the earlier labs.
  • Updated Summary Page with Spring 2024 Design Notebooks: Updated the summary page of the website to include the new folder for the Spring 2024 design notebook entries.
  • Assigned Rielle to work on Core Intro Guide: Assigned Rielle to work on a detailed introduction guideline for new members joining the Core Team which helps explain how the team works and how the core repo is structured.

The team officially spun into gear this week as new members start to work on the onboarding labs and existing members start to pick up where they left off in the fall. As such, I sent messages to existing core team members with unfinished work from the fall to check that they planned to remain with the team this semester and were good to keep working on their existing tasks. Most of the people I messaged have at this point confirmed they plan to continue their work from the fall but I am still waiting on a couple more responses which I hope to get soon.

One of the people who did respond, Xingzhi, noticed while continuing his work on the Branch Control connection module that the Branch Manager module was not included in the CMake files and that it caused several syntax errors when it was added. It quickly became clear to me that I had forgotten to add both the Branch Manager module and its test to the CMake files when I first made them so they were never properly tested. I quickly attended the CMake files and fixed the syntax errors in the Verilog code for the Branch Manager. Once I had done that it also became clear that there were errors in the Branch Manager tests where I was checking against the wrong output value for certain cases, so I fixed that issue as well. With those various fixes made the Branch Manager now worked properly and was no longer missing from the CMake files.

Since new people will be joining the team this semester, I wanted to make a few improvements to the Website repo that I had been meaning to do for a while. Namely, I wanted to finally change the onboarding lab titles from Week # to Lab # so that new members would not be encouraged to only do a single lab per week. Last semester we had a major issue where it took around half the semester for the first of the new members to finish the onboarding labs. Given that there are only 4 complete labs, as Lab 5 does not yet have an associated exercise to complete, and that a significant portion of the first lab is copy-pasting text into files, it should only take 3 to 4 weeks for new members to make their way through the labs as evidenced by how quickly people completed the labs in Spring 2023. While there are certainly more significant issues that led to the slow progress last semester which we are hoping to address this semester, at the very least we shouldn’t be actively encouraging new members to clock out for the week after they finish a single on-boarding lab by titling the labs as week numbers.

Finally, I also wanted to start work on a more detailed Core repo and team introduction document to complement our existing development and testing guidelines document. At this point, navigating the Core repo along with the associated links and projects is second nature for me, but it can be confusing for new members, especially those new to GitHub. As such, this introduction document will serve to help new members understand the layout of the Core repo and more quickly and easily find the resources they need. This is especially true for the projects associated with the repo as the Core Design and Implementation project is essential to our workflow but is hard to find and easily confused with the General Core Tasks project. Thankfully, Rielle reached out asking if there was anything she could work on and agreed to take on the tasks for creating this intro guide.

Week of February 5, 2024

Project Work:

The big thing this week was the new test format Vito asked us to use. Apparently, we were not grabbing device references correctly and this was causing issues. He also presented a better testing method than random numbers in order to get consistent input range coverage. Vito updated the ALU test as an example and I have been working through all the tests to change them to the new format. It should be noted that many of the tests can be cleaned up further through consolidation of similar functions but I want to get all the tests moved over to the new format first and then we can worry about code cleanliness.

Week of February 12, 2024

Project Work:

  • Continued Work on Updating Module Tests: Continued updating the module tests with the new format. This week I finished updating tests for the Branch Evaluator, Pipeline Reset Module, Branch Manager, CPU General Purpose Registers, and Branch Address Calculator.
  • Reviewed and Merged Rielle’s Starter Doc PR: Had Rielle make a few more changes to the new getting started doc and existing dev and test guidelines doc and then merged the changed.
  • Updated Core Links in Lab 7: Added a new link to Lab 7 that takes people to the new Core Team getting started doc.
  • Merged L1 Cache PR: Confirmed that Surya will not be rejoining the team semester so merged the existing progress made on the L1 Data Cache.

I continued working through the tests this week and am now about two-thirds of the way through updating them, at least by the number of tests there are. Many of the remaining tests that need to be updated are rather complex, however, so in terms of total work I am probably closer to 50-60% through updating everything. A lot of the work involved is less about changing over the format and more so about consolidating the testing code into two or three functions so that I’m not just making a new function for each test case that is nearly identical to the other test functions except for one or two things.

In terms of other things I did this week, Rielle and I went over her pull request for the new introduction document and updates to the existing dev and test guidelines document. She was able to make all the changes I asked for in my review of the PR so I merged her changes meaning we now have a proper introduction guide to the Core Team for new members. I also finally heard back from Surya about whether he would continue his work on the data cache this semester. Unfortunately, he will not be continuing with the team but he did PR all the progress he made last semester, which includes a finished outline and module for the L1 Data Cache. With Surya gone, I do need to find someone to take over work on the data cache. If need be I can take over that responsibility, but given that this is my last semester, I would like to start having other Core Team members take on these more complex tasks so that they can get more familiar with Core as a whole and one or more of them is prepared to take over as Core Czar next semester.

Week of February 19, 2024

Project Work:

  • Finished Updating Module Tests: Finished updating the remaining module tests to the new format. All existing module tests are now in the new format.
  • Put L2 Data Cache On Hold: Put the development of the L2 Data Cache on hold indefinitely as our design really only needs the L1 Data Cache.
  • Replaced L2 References in L1 Data Cache: Changed the variables and function names in the L1 Data Cache that were in reference to the now-on-hold L2 Data Cache to instead reference RAM.

I finally finished updating all of our existing module tests this week, and as I suspected, the remaining tests were some of the most complex and time-consuming to convert. The ID/EX Latch test proved particularly frustrating as the sheer number of full 32-bit word inputs to the module meant that it took far too long to test every combination of isolated input bits. Instead, I had to design the test cases such that we still tested each bit of each input in isolation at least once, but that only the relevant inputs for each test case had their bits fully cycled through.

This week I also had a discussion with Vito at our team meeting about the caches, specifically the L2 Data Cache. I had initially designed the data cache hierarchy to mimic that of modern CPU designs by Intel and AMD, with per core L1 and L2 and a shared L3. The L3 data cache had already been put on hold as it didn’t make much sense for our single-core design and was more of a “wouldn’t it be cool if we had this” idea than anything else. At the meeting, Vito asked why I was even planning an L2 Data Cache given our single core design. After some discussion on per-core L2 cache in current processor designs from AMD and Intel, Vito made the point that for a design as simple as ours, the L2 cache was unnecessary and just needlessly complicated implementation. Ultimately, Vito convinced me of his argument and so I put the L2 Data Cache on hold indefinitely. I also looked through Surya’s implementation of the L1 Data Cache and found a couple of outputs and functions that were meant to communicate with the now-on-hold L2 Data Cache. These were not completely useless however as they could be repurposed to communicate with the RAM rather than L2. As such, I renamed all the outputs and functions to reference RAM rather than L2.

Week of February 26, 2024

Project Work:

My main focus this week was on the Data Cache related modules. The first thing I had to do was implement the data_mode functionality into the L1 Cache. The data_mode input specifies whether we are writing to the least significant byte, least significant half, or the entire word at some address in memory. With Surya gone I had to spend a significant amount of time reverse-engineering how his cache implementation worked in order to figure out what parts I needed to modify to implement the required functionality. I was able to get a fairly good idea of how his design worked and believe I was able to modify all the relevant areas; however, I can’t truly know if I made all the required changes without completing the corresponding test for the L1 Data Cache. Surya had tested the L1 Data Cache using a test bench which I would ultimately need to remake into a Verilator C++ test. To make it easier to take the test bench logic and use it for a Verilator C++ test, I modified the existing test bench to remove references to the now-defunct L2 Data Cache and instead simulate calls to RAM. The changes made to the test bench should make it easier to determine the logic used in each of the tests and so allow me to implement the same logic into the Verilator test instead of having to do everything from scratch. For now, I decided to hold off on making the L1 Data Cache test, but with the I/O of the L1 Data Cache finalized for all intents and purposes, I decided to turn my attention to the Data Cache Manager. After asking the AMBA team about the format for memory access and reading over their Generic Bus interface, I was able to complete the outline of the Data Cache Manager fairly quickly. Since the module’s logic is purely combinational, implementing and testing it was rather simple.

I also learned this week that the person I had assigned to implement the general control module is no longer on the team since they missed the add/drop deadline to officially rejoin the team this semester. Normally, this wouldn’t present much of an issue as the project’s open-source nature but I have yet to get a response from them since the add/drop deadline. As such, I reached out to Rielle about taking over the responsibility of implementing the general control module. She ultimately agreed although she expressed a desire to work on it with a partner. Once the current batch of new students finish the onboarding labs and start to join teams I would like to assign one of them to help Rielle with the module considering it is rather complex. A similar situation occurred with one of the two people I had assigned to work on the Instruction Cache, so that is another position that needs to be filled by a new team member.

Week of March 4, 2024

Project Work:

I continued my work on the Data Cache-related modules this week. Since the previous outline of the Data Cache Connection Module did not account for the I/O lines used to communicate with AMBA for memory access, the outline for the Data Cache Connection Module, the MEM Connection Module, and the Top Level Connection Module all had to be updated to include the new I/O lines. These changes also meant I had to update the CPU block diagram with the new I/O lines. With the outlines updated, I started work on the test for the L1 Data Cache. I am basing my test off of Surya’s existing testbench but will be streamlining and refining the logic as I translate it into a Verilator test in C++.

In the course of my work this week I noticed that the codecov score for the Core Repo had dropped below 90% and tried to find the cause of the issue. It turns out that the loop for the pred_taken input was set to stop when pred_taken was 0 since I forgot to add the rest of the end condition. Since pred_taken started at 0, this meant that the test was never actually running which explained why the codecov score for that module was so low. Fixing the mistake, however, revealed another issue where I had set the npc variable to be a bool instead of an unsigned integer, meaning that any value greater than one was getting truncated to 1 and causing the test check to fail. I was able to diagnose and fix the test issues and the branch manager is now tested properly with much better codecov coverage.

It occured to me that the issues I was having with the branch manager tests might explain part of wht Xingzhi’s tests for the Branching Control Connection Module were failing despite him making all the changes I had suggested last week. I quickly let him know about the issue and found in the branch manager test and how to fix it for his test. After that, his test worked successfully and I was able to merge his PR meaning another module has now been implemented. With his work on the Branching Control Connection Module complete, I assigned him to help Kyle with the instruction cache related modules.

Week of March 11, 2024

Project Work:

This week I continued working on the test for the L1 Data Cache. This task has ended up being far more complex than I expected since I have to both reverse engineer Surya’s L1 Cache Verilog code and translate and augment his existing testbench logic into a fully-fledged C++ test. Because of how complex the cache is, I have found the best way to approach testing is to make a C++ cache object that simulates the expected behavior of the Verilog Cache Module and then compare the simulated cache with the module to ensure the module behaves as expected. At this point I have the first draft of the cache object implemented and need to finish the test functions so I can ensure both the test and the cache work as expected.

Week of March 25, 2024

Project Work:

Coming back from spring break I continued working on the test for the L1 Data Cache. The first iteration of the test is nearly complete but I fear I will end up having to rewrite significant portions of the L1 Data Cache module itself as it is currently set up as an FSM that transitions between states each clock cycle, whereas I need it to always be up to date every clock cycle since the data cache may be written to or read from every clock cycle. I also added some of the needed I/O lines to the instruction cache outlines that are needed to communicate with memory through AMBA.

Week of April 1, 2024

Project Work:

This week I finished the first draft of the L1 Data Cache test but when I went to run it I noticed that there were various data width conflicts and other errors in the Verilog Module. As such, I spent a good chunk of this week just going over the Verilog module and fixing all the errors. I have now gotten the test and module to a point where they will run on the GitHub simulation but the test is not producing the expected results so that will need more debugging. Unfortunately, I ran into a Verilator issue trying to run the test on my computer, preventing me from debugging. I talked to Vito about the issue during the weekly team meeting and he ultimately came to the conclusion that the issue is somewhere in the toolchain and not with my setup specifically. Unfortunately, this means that locating and fixing the bug will take some time.

Week of April 8, 2024

Project Work:

Work on the Data Cache has stalled while I wait for Vito to figure out what in the toolchain is causing the errors I am encountering. I tried various things to fix the issue on my end such as switching compilers from Clang to GCC and forcing the use of different C++ versions but nothing solved the problem. As such I spent most of this week working on my report and presentation for the end of the semester. In doing so I realized that I had forgotten to update the CPU registers to use the rstn_h reset line and so I made those changes. I also talked to both Cameron and Xingzhi about jointly taking over the Core Czar role next semester which they seemed inclined to do.

Week of April 15, 2024

Project Work:

We ran into another issue with Verilator this week that was caused by a recent update to Verilator that has it generate trace methods even when tracing is disabled, causing our workflow to crash. Vito created an issue on the Verilator repo but in the meantime, I have enabled tracing for the core repo to prevent the crash.

I also continued work on my presentation and report, and in working on them I realized that I had forgotten to add read after write hazard detection to the hazard detection logic so I fixed that.

Week of April 22, 2024

As the semester is wrapping up I spent most of my time this week finishing on my presentation which I gave Thursday as well as working on my report. Both the presentation and report should serve as detailed guides to the core design for Cameron and Xingzhi and help them run the Core team next semester.

The issue preventing me from building the repo and debugging the L1 Data Cache test is still no fixed and neither I nor Vito are any closer to finding a solution to it, as such the L1 Data Cache development has stalled.

Week of April 29, 2024

Project Work:

We finally figured out the Verilator issue with has_virtual_destructor, turns out they forgot to <type_traits>, so I had Vito walk me through manually modifying my Verilator installation to add the missing include. I then spent a good chunk of time trying to shard the core tests so they could run multithreaded only for Vito to tell me that I could just run ctest -j [number of threads] which was much simpler. Once that was figured out, I went through the various core tests and optimized them to reduce run time. I was able to get the test time on GitHub down from 4 hours to 30 minutes, and just a couple of minutes locally.

This will be my last design notebook entry as I am graduating, so I merged my changes to the L1 Data Cache and its test. I still plan to continue work on the core but at a much less frequent pace; I also plan to continue to be available in the Discord to help people and guide Camereon and Xingzhi through taking over the Core Czar role.