When I was studying at Tsinghua University (清华大学) in China, I analyzed hundreds of sample web pages provided by the professor, using a dictionary of 275,909 Chinese words (dic.dic) (https://raw.githubusercontent.com/hansf14/WebpageAnalysis/refs/heads/master/engine/input/dic.dic), and worked out how the pages relate to one another based on their keywords.
It was the final individual project (个人大作业) for the "Data Structure and Algorithm (数据结构与算法)" class in the first semester of my second year.
I used C++ with no third-party libraries and not even the C++ STL, as required by the professor. Every student, including me, had to implement all the data structures and algorithms from scratch. We even had to write our own HTML parser. The only library we were allowed to use was D3.js (a JavaScript data visualization library), and only for visualizing the output. These restrictions were the biggest hurdles of the project.
The data structures I implemented myself included an AVL tree, a linked list, a linked stack, a string (made of text chunks linked together like a linked list, so that it can grow dynamically and store text with less wasted memory), and a string hash table, among others.
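To make the chunked string idea concrete, here is a minimal sketch of how such a type could look without the STL. It is not the code from the original project: the class name `ChunkedString`, the chunk size of 16, and the djb2-style `hash()` (the kind of function a string hash table might use for its keys) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: a string stored as a linked list of fixed-size chunks.
// Names, chunk size, and hash choice are illustrative assumptions.
class ChunkedString {
public:
    static const std::size_t kChunkSize = 16;  // assumed capacity of one chunk

    ChunkedString() : head_(nullptr), tail_(nullptr), length_(0) {}
    explicit ChunkedString(const char* text)
        : head_(nullptr), tail_(nullptr), length_(0) { append(text); }
    ~ChunkedString() { clear(); }

    // Copying is disabled to keep the sketch short.
    ChunkedString(const ChunkedString&) = delete;
    ChunkedString& operator=(const ChunkedString&) = delete;

    // Append one character; a new chunk is allocated only when the tail is full,
    // so growing never copies the text already stored.
    void push_back(char c) {
        if (tail_ == nullptr || tail_->used == kChunkSize) {
            Chunk* chunk = new Chunk();
            if (tail_ == nullptr) head_ = chunk; else tail_->next = chunk;
            tail_ = chunk;
        }
        tail_->data[tail_->used++] = c;
        ++length_;
    }

    void append(const char* text) {
        for (std::size_t i = 0, n = std::strlen(text); i < n; ++i) push_back(text[i]);
    }

    std::size_t length() const { return length_; }

    // Every chunk except the tail is full, so indexing just skips whole chunks.
    char at(std::size_t index) const {
        assert(index < length_);
        const Chunk* chunk = head_;
        while (index >= kChunkSize) { chunk = chunk->next; index -= kChunkSize; }
        return chunk->data[index];
    }

    bool equals(const char* text) const {
        if (std::strlen(text) != length_) return false;
        std::size_t i = 0;
        for (const Chunk* c = head_; c != nullptr; c = c->next)
            for (std::size_t j = 0; j < c->used; ++j)
                if (c->data[j] != text[i++]) return false;
        return true;
    }

    // djb2-style hash over all chunks (assumed choice for a hash table key).
    unsigned long hash() const {
        unsigned long h = 5381;
        for (const Chunk* c = head_; c != nullptr; c = c->next)
            for (std::size_t j = 0; j < c->used; ++j)
                h = h * 33 + static_cast<unsigned char>(c->data[j]);
        return h;
    }

    void clear() {
        while (head_ != nullptr) {
            Chunk* next = head_->next;
            delete head_;
            head_ = next;
        }
        tail_ = nullptr;
        length_ = 0;
    }

private:
    struct Chunk {
        char data[kChunkSize];
        std::size_t used;
        Chunk* next;
        Chunk() : used(0), next(nullptr) {}
    };

    Chunk* head_;
    Chunk* tail_;
    std::size_t length_;
};
```

In this sketch, appending is cheap because existing text is never reallocated, at the cost of slower random access (`at` walks the chunk list). A bucket index for a hash table could then be derived with something like `s.hash() % bucketCount`, where `bucketCount` is again a hypothetical name.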
My GitHub repository contains only part of the code. I did this project about 10 years ago (as of 2025), and it seems I have lost some of the code and cannot find it. The screenshot and the core of the code are still available on my GitHub, though.
I reused the same code for a group project in my "Software Requirements Engineering and Modeling (软件需求工程与建模)" class, which is why the README.md is written the way it is.