Engineering
Jan 29, 2024
Backend.AI Meets Tool LLMs : Revolutionizing AI Interaction with Tools - Part 2
Sergey Leksikov
Machine Learning Researcher
- Part 1. Introduction to LLMs and Tool Interaction
- Part 2. Backend.AI Gorilla LLM model serving
- Part 3. Making own API Retriever and Question Answering system with few lines of code locally without training and serving LLM
Part 2. Backend.AI Gorilla LLM model serving
Previously, we discussed Tool LLM capabilities and usage. In this article, we will demonstrate step by step how to run the Gorilla LLM model on Backend.AI Cloud using the Backend.AI Desktop app.
Figure 1. The Backend.AI Desktop app installed on macOS
- Press the Start button to open the session creation menu.
Figure 2. New session start interactive screen
- Select the NGC-PyTorch 23.07 image
- Attach a vFolder, a working directory containing the model files, for example a directory named api_helper/.
Figure 3. Attaching vFolder screen
- Select the resource amount: 128 GB RAM and 5 fGPU
Figure 4. Resource selection screen
- Select a Visual Studio Code Desktop environment
Figure 5. IDE environment selection screen
- In the /home/work/api_helper/ directory, create a server.py file
- Create a requirements.txt file
Figure 6. Content of requirements.txt file
- To install the requirements, run the command: pip install -r requirements.txt
Figure 7. Executing install requirements command
- In server.py, use the transformers library to define the tokenizer and model loader (a sketch follows Figure 8).
Figure 8. Code snippet of server.py
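Below is a minimal sketch of what this part of server.py can look like. The checkpoint name gorilla-llm/gorilla-falcon-7b-hf-v0 and the precision/device settings are assumptions; adjust them to the Gorilla checkpoint and resources you actually use.

```python
# server.py -- model loading (sketch; checkpoint name is an assumption)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gorilla-llm/gorilla-falcon-7b-hf-v0"  # assumed Gorilla checkpoint

# Downloads the tokenizer and weights from the Hugging Face Hub on first run
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # half precision to fit into GPU memory
    device_map="auto",          # requires the accelerate package
)
model.eval()
```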
- Define the server IP address and port number (a sketch follows Figure 9)
Figure 9. Definition of server IP address and port number
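A minimal sketch of binding the FastAPI app to an address and port; 0.0.0.0 and port 8000 are assumptions (the curl example later in this article also uses port 8000).

```python
# server.py -- app definition and server binding (sketch; host/port are assumptions)
import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```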
- To run the model, type: python server.py
Figure 10. Starting a server.py
- Accessing the created server
VSCode automatically creates a port tunneling session from your device to the Backend.AI Cloud server. You can check the server status by accessing the localhost address; the request will be tunneled to Backend.AI Cloud. In addition, you may define other custom endpoints according to your needs, for example a simple status endpoint as sketched below.
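A possible status endpoint at the server root could look like this (a sketch reusing the app and MODEL_NAME names from the snippets above; the response fields are assumptions):

```python
# Root endpoint so that opening the forwarded localhost address shows the server is alive
@app.get("/")
def root():
    return {"status": "ok", "model": MODEL_NAME}
```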
Figure 11. The server run log
Figure 12. VSCode Port Forwarding configuration
Figure 13. Accessing the root of a server
Up to this point, we have created a computation session on Backend.AI Cloud and attached an api_helper/ vFolder containing requirements.txt and server.py. We then started our FastAPI server, which downloads the Gorilla LLM from the Hugging Face repository, loads it into the computation session's memory, and exposes it through an /inference API endpoint, as sketched below.
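A hedged sketch of such an /inference endpoint, matching the request shape used by the curl command below; the field names and generation settings are assumptions:

```python
# server.py -- /inference endpoint (sketch; request/response fields are assumptions)
import torch
from pydantic import BaseModel

class InferenceRequest(BaseModel):
    text: str

@app.post("/inference")
def inference(req: InferenceRequest):
    # Tokenize the prompt and generate a completion with the Gorilla model
    inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)
    result = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"result": result}
```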
- API inference testing. To test the API inference of the Gorilla LLM, you can send a curl request from your local computer's command line:
curl -X POST -H "Content-Type: application/json" -d '{"text":"Object detection on a photo. <<<api_domain>>>:"}' http://127.0.0.1:8000/inference
Figure 14. An example of curl request
Figure 15. The GPU workload on a server after receiving the request
Figure 16. The server logs of receiving the request and printing the result
- Defining the UI web app. You may use any web technology to build a UI app that displays the result in a nicer way. For example, you can place HTML and JavaScript files in a static/ directory next to server.py, then define an endpoint for the web app (a sketch follows Figure 17).
Figure 17. Example of adding an html web app to a FastAPI server
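One possible way to wire such a static UI into the FastAPI server (the static/ directory, /app route, and index.html file name are assumptions):

```python
# server.py -- serving the web UI (sketch; paths and route are assumptions)
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles

# Serve JS/CSS assets from the static/ directory next to server.py
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/app")
def web_app():
    # Return the single-page UI; index.html is a hypothetical file name
    return FileResponse("static/index.html")
```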
- Gorilla LLM web app prototype: an API-tuned Large Language Model for API question answering and code generation.
Figure 18. Gorilla LLM web app prototype. Example 1
Figure 19. Gorilla LLM web app prototype. Example 2
Conclusion
Despite some difficulties in serving the Gorilla LLM, an LLM tuned on your own APIs has great potential and promise. Such a model can provide more up-to-date results, with more accurate parameters and function calls, than commercial large models, and it can be useful for tasks such as question answering over APIs, code autocompletion, and API code execution.
Limitations and difficulties:
While serving the Gorilla LLM model, there were the following issues to consider:
- The model may generate responses in an unexpected format
- The model may generate different results for the same question
- Parsing and rendering the LLM response
- Eliminating duplicate sentences and lines