Diving Deep Into Deep Learning
Flashback: I was in a Master's program in VLSI Design when I first tried to solve a problem using deep learning. We were asked to do a mini-project as part of our course. We had Cadence tools with models at 45nm, 90nm and 180nm, and we had run several simulations with them to understand how design challenges differed across technology nodes. The biggest challenge was heat dissipation, a well-known problem in the devices of that era. Meanwhile, the technology actually shipping at the time was around 14nm; FinFET technology had breathed new life into Moore's law. I was passionate about the subject, and I felt it was unfair that students did not have access to the latest models. What good is my study if I work on circuit designs that are about to become obsolete? I had been exposed to the open-source philosophy and I wanted to fix this problem. I understand these models are expensive because of the research involved in a highly sophisticated domain. Yet, unless students have access to the latest models, how are we going to get trained in the field and try to solve problems independently? EDA (Electronic Design Automation) has been a fairly restricted, paywalled field, and individual contributors have almost no space. There were a few open-source tools that shined at the time, like KiCad.
Just when I was looking for a topic for my project, I met a professor who was very good with students. He was helpful to everyone and encouraged entrepreneurship. We started talking about problems worth solving. I knew he came from a math and statistics background, while I had limited experience with computer science. He suggested that I look into neural networks. We had been studying genetic algorithms in class, and I thought it would be something similar. He gave me a book called Artificial Neural Networks by Yegnanarayana and asked me to skip a few chapters and jump straight into solving a few problems.
I researched the subject and found out that people had been using neural networks to build predictive models for years. There were models available for BSIM (Berkeley Short-channel IGFET Model) and CNTFETs (Carbon Nanotube Field-Effect Transistors). So I thought of training a network on 90nm and 180nm data and checking whether it could predict power at 45nm. I did manage to get good results, though my dataset was small and I cannot bet on them. But I was happy to get some exposure to deep learning. I used the Cadence tools to generate data on the various models and MATLAB to do the regressions.
Present day: AI is the buzzword. It is so popular that I get annoyed by it; every day there is big news about some remarkable achievement. I am a DevOps engineer by profession, and I like to tinker with other technologies once in a while. The developer mindset is key. A friend asked me if I could help him out with an ML problem. I have not started helping him yet, but I needed to relearn the basics and catch up on what has come up in the last 10 years. This blog is about that learning journey.
Installing Jupyter Notebook
sudo apt update && sudo apt upgrade -y
sudo apt install python3 python3-pip python3-dev build-essential -y
#Check python and pip version
python3 --version
pip3 --version
#Install the venv module:
sudo apt install python3-venv -y
#Create a virtual environment in your desired directory:
python3 -m venv jupyter_env
#Activate the virtual environment:
source jupyter_env/bin/activate
#Install Jupyter Notebook
pip3 install jupyter
#check jupyter version
jupyter --version
#Generate a Jupyter configuration file:
jupyter notebook --generate-config
#This will create a configuration file at ~/.jupyter/jupyter_notebook_config.py.
#Set a password for Jupyter Notebook to protect it from unauthorized access:
jupyter notebook password
#open configuration file
vim ~/.jupyter/jupyter_notebook_config.py
#Find the following lines and edit them as follows:
c.ServerApp.ip = '0.0.0.0'
c.ServerApp.port = 8888
#Run Jupyter notebook
jupyter notebook
#Open the URL in a browser (use the server's IP instead of localhost when accessing remotely) to see the Jupyter Notebook interface.
http://localhost:8888
#Enable Jupyter as a Systemd Service
#Create new service file:
sudo vim /etc/systemd/system/jupyter.service
#Add the following lines, replacing username with your actual username:
[Unit]
Description=Jupyter Notebook
[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/home/username/jupyter_env/bin/jupyter-notebook --config=/home/username/.jupyter/jupyter_notebook_config.py
User=username
Group=username
WorkingDirectory=/home/username
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
#Reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable jupyter
sudo systemctl start jupyter
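#Optionally verify the service came up cleanly and inspect its logs (standard systemd commands)
sudo systemctl status jupyter
sudo journalctl -u jupyter -f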
Once Jupyter is installed, we can install any Python-based ML framework using pip.
A lot has changed in the last 10 years. The tools are more mature, the community is more active, and there are many frameworks available. Since my friend needed my help with KANs using JAX and Flax, I jumped right into it. I had revisited statistics and probability in 2020 using the book "The Cartoon Guide to Statistics" by Larry Gonick and Woollcott Smith, and I plan to pick up any further math I need on the fly.
I had previously worked with NumPy and pandas and played with them on Kaggle. As a DevOps engineer I focused more on MLOps than ML, and I have worked on tools like Kubeflow. Here I need to do some research on how to use JAX and Flax. I do have a GTX 1060 3GB GPU, which should help me work with JAX. I have not yet decided which problem I will try to solve with a KAN; I might compare the results with a traditional neural network.
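To get started, I installed JAX and Flax into the same virtual environment. The CPU wheels are the safe default; the right GPU extra depends on your CUDA driver and JAX version, so treat that last line as a starting point rather than a recipe.
#Activate the same virtual environment used for Jupyter
source jupyter_env/bin/activate
#CPU-only install of JAX and Flax
pip install --upgrade jax flax
#Optional: NVIDIA GPU build (pick the extra matching your CUDA version, see the JAX install docs)
pip install --upgrade "jax[cuda12]"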
Learning the basics of JAX
I did some basic JAX coding to get a feel for the API; a small warm-up example is below.
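A minimal warm-up, just checking that grad, jit and vmap do what I expect (the function f here is made up purely for illustration):
import jax
import jax.numpy as jnp

# A toy scalar function: f(x) = x^2 + 3x
def f(x):
    return x ** 2 + 3.0 * x

# grad gives the derivative, jit compiles it with XLA
df = jax.jit(jax.grad(f))
print(df(2.0))           # 7.0, since f'(x) = 2x + 3

# vmap vectorises the gradient over a batch of inputs
xs = jnp.arange(5.0)
print(jax.vmap(df)(xs))  # [ 3.  5.  7.  9. 11.]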
Reading some research papers
Read the paper on KAN: Kolmogorov-Arnold Networks
KAN Architecture:
Kolmogorov-Arnold Networks (KAN) represent a novel approach to neural network design, inspired by the Kolmogorov-Arnold representation theorem. This architecture distinguishes itself from traditional neural networks, such as Multi-Layer Perceptrons (MLPs), through its unique structure and operational principles.
Key Features of KAN Architecture
Learnable Activation Functions: Unlike MLPs, where activation functions are fixed at each node, KANs utilize learnable activation functions located on the edges (weights) of the network. This means that each weight parameter is represented as a univariate function, typically parameterized as a spline function. This design allows for greater flexibility and adaptability in modeling complex data patterns and nonlinear relationships.
Hierarchical Structure: The KAN architecture consists of multiple layers, with each layer containing nodes that sum incoming signals without applying non-linearities directly. Instead, the non-linear transformations are applied through the learnable functions on the edges connecting the nodes. This structure lets KANs keep a fully connected topology while optimizing performance through adaptive activation functions.
Spline Functions: The use of spline functions as activation functions is a significant innovation in KANs. Splines are piecewise polynomial functions defined by control points that can be adjusted independently. This property allows KANs to apply complex nonlinear transformations while keeping the learned functions smooth and stable.
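To make the idea concrete, here is a deliberately simplified sketch of a single KAN-style layer in JAX. It is not the implementation from the paper: I replace the B-spline parameterization with a small Gaussian-bump basis just to show where the learnable pieces live, namely one univariate function per edge, with nodes doing nothing but summation.
import jax
import jax.numpy as jnp

def init_kan_layer(key, d_in, d_out, n_basis=8):
    # One set of coefficients per edge (i -> j), one per basis function
    coeffs = 0.1 * jax.random.normal(key, (d_in, d_out, n_basis))
    # Fixed grid of bump centres, standing in for spline knots
    grid = jnp.linspace(-2.0, 2.0, n_basis)
    return {"coeffs": coeffs, "grid": grid}

def kan_layer(params, x):
    # x has shape (d_in,); basis[i, k] = exp(-(x_i - grid_k)^2)
    basis = jnp.exp(-(x[:, None] - params["grid"][None, :]) ** 2)
    # Edge function phi_ij(x_i) = sum_k coeffs[i, j, k] * basis[i, k]
    edge_out = jnp.einsum("ijk,ik->ij", params["coeffs"], basis)
    # Each output node just sums its incoming edges, no extra nonlinearity
    return edge_out.sum(axis=0)

key = jax.random.PRNGKey(0)
params = init_kan_layer(key, d_in=3, d_out=2)
x = jnp.array([0.5, -1.0, 0.3])
y = kan_layer(params, x)              # output of shape (2,)
# Everything is differentiable, so gradients flow into the edge coefficients
grads = jax.grad(lambda p: kan_layer(p, x).sum())(params)
Stacking such layers and training the coefficients with a standard optimizer is then the same exercise as with any other differentiable model.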
Advantages of KANs
Improved Accuracy and Interpretability: Research indicates that KANs can outperform MLPs in terms of accuracy and parameter efficiency, particularly in tasks such as data fitting and solving partial differential equations (PDEs). For instance, a smaller KAN can achieve comparable or better accuracy than a larger MLP, making it a more efficient choice for many applications.
Flexibility in Learning: The architecture's reliance on learnable single-variable functions allows KANs to adapt dynamically to different data patterns. This adaptability enhances their performance across tasks and makes them particularly effective for high-dimensional data.
Differentiability and Training: All operations within a KAN are differentiable, enabling the use of standard training techniques like backpropagation. This simplifies training while allowing both the activation functions and the weights between nodes to be fine-tuned.
Strengths of KAN:
- Better approximation of complex multivariate functions with fewer parameters.
- Potential for faster convergence due to the structured decomposition.
Weaknesses of KAN:
- Higher implementation complexity compared to MLPs.
- May require careful initialization and tuning.
When should one use KAN?
Possible applications in:
- Financial modeling and forecasting.
- Environmental monitoring
- Medical imaging and data analysis.
- Integration with Quantum Computing
- Integration with Bioinformatics
- Integration with Physics
This is a WIP; I will keep updating this post. Perhaps I will write a separate blog post on the earlier work and KAN.