Engineering
Jan 29, 2024
Backend.AI Meets Tool LLMs : Revolutionizing AI Interaction with Tools - Part 2
Sergey Leksikov
Machine Learning Researcher
- Part 1. Introduction to LLMs and Tool Interaction
- Part 2. Backend.AI Gorilla LLM model serving
- Part 3. Making own API Retriever and Question Answering system with few lines of code locally without training and serving LLM
Part 2. Backend.AI Gorilla LLM model serving
Previously, we discussed Tool LLM capabilities and usage. In this article, we will demonstrate step by step how to run the Gorilla LLM model on Backend.AI Cloud using the Backend.AI Desktop app.
Figure 1. The Backend.AI Desktop app installed on macOS
- Press the Start button to open the session creation menu.
Figure 2. New session start interactive screen
- Select the NGC-PyTorch 23.07 image
- Attach a vFolder, a working directory containing the model files, for example a directory named api_helper/.
Figure 3. Attaching vFolder screen
- Select the resource amount: 128 GB RAM and 5 fGPU
Figure 4. Resource selection screen
- Select a Visual Studio Code Desktop environment
Figure 5. IDE environment selection screen
- In the /home/work/api_helper/ directory, create a server.py file
- Create a requirements.txt file
Figure 6. Content of requirements.txt file
- To install the requirements, run the command: pip install -r requirements.txt
Figure 7. Executing install requirements command
- In server.py, use the transformers library to define the tokenizer and model loader (a sketch follows Figure 8).
Figure 8. Code snippet of server.py
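Below is a minimal sketch of what this part of server.py can look like. The checkpoint name gorilla-llm/gorilla-falcon-7b-hf-v0 and the precision/device settings are assumptions; adjust them to the Gorilla checkpoint and resources you actually use.

```python
# server.py -- model loading (sketch; checkpoint name is an assumption)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gorilla-llm/gorilla-falcon-7b-hf-v0"  # assumed Gorilla checkpoint

# Downloads the tokenizer and weights from the Hugging Face Hub on first run
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,  # half precision to fit into GPU memory
    device_map="auto",          # requires the accelerate package
)
model.eval()
```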
- Define the server IP address and port number (a sketch follows Figure 9)
Figure 9. Definition of server IP address and port number
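A minimal sketch of binding the FastAPI app to an address and port; 0.0.0.0 and port 8000 are assumptions (the curl example later in this article also uses port 8000).

```python
# server.py -- app definition and server binding (sketch; host/port are assumptions)
import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```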
- To run the model, type: python server.py
Figure 10. Starting a server.py
- Accessing the created server
VSCode automatically creates a port tunneling session from your device to the Backend.AI Cloud server. You can check the server status by accessing the localhost address; the request will be tunneled to Backend.AI Cloud. In addition, you may define other custom endpoints according to your needs, for example a simple status endpoint as sketched below.
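A possible status endpoint at the server root could look like this (a sketch reusing the app and MODEL_NAME names from the snippets above; the response fields are assumptions):

```python
# Root endpoint so that opening the forwarded localhost address shows the server is alive
@app.get("/")
def root():
    return {"status": "ok", "model": MODEL_NAME}
```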
Figure 11. The server run log
Figure 12. VSCode Port Forwarding configuration
Figure 13. Accessing the root of a server
Up to this point, we have created a computation session on Backend.AI Cloud and attached an api_helper/ vFolder containing requirements.txt and server.py. We then started our FastAPI server, which downloads the Gorilla LLM from the Hugging Face repository, loads it into the computation session's memory, and exposes it through an /inference API endpoint, as sketched below.
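A hedged sketch of such an /inference endpoint, matching the request shape used by the curl command below; the field names and generation settings are assumptions:

```python
# server.py -- /inference endpoint (sketch; request/response fields are assumptions)
import torch
from pydantic import BaseModel

class InferenceRequest(BaseModel):
    text: str

@app.post("/inference")
def inference(req: InferenceRequest):
    # Tokenize the prompt and generate a completion with the Gorilla model
    inputs = tokenizer(req.text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)
    result = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"result": result}
```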
- API inference testing. To test the API inference of the Gorilla LLM, you can send a curl request from your local computer's command line:
curl -X POST -H "Content-Type: application/json" -d '{"text":"Object detection on a photo. <<<api_domain>>>:"}' http://127.0.0.1:8000/inference
Figure 14. An example of curl request
Figure 15. The GPU workload on a server after receiving the request
Figure 16. The server logs of receiving the request and printing the result
- Defining the UI web app. You may use any web technology to build a UI app that displays the result in a nicer way. For example, you can place HTML and JavaScript files in a static/ directory next to server.py, then define an endpoint for the web app (a sketch follows Figure 17).
Figure 17. Example of adding an html web app to a FastAPI server
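One possible way to wire such a static UI into the FastAPI server (the static/ directory, /app route, and index.html file name are assumptions):

```python
# server.py -- serving the web UI (sketch; paths and route are assumptions)
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles

# Serve JS/CSS assets from the static/ directory next to server.py
app.mount("/static", StaticFiles(directory="static"), name="static")

@app.get("/app")
def web_app():
    # Return the single-page UI; index.html is a hypothetical file name
    return FileResponse("static/index.html")
```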
- Gorilla LLM web app prototype: an API-tuned Large Language Model for API question answering and code generation.
Figure 18. Gorilla LLM web app prototype. Example 1
Figure 19. Gorilla LLM web app prototype. Example 2
Conclusion
Despite some difficulties in serving the Gorilla LLM, an LLM tuned on your own APIs has great potential and promise. Such a model can provide more up-to-date results, with more accurate parameters and function calls, than commercial large models, and it can be useful for tasks such as question answering over APIs, code autocompletion, and API code execution.
Limitations and difficulties:
While serving the Gorilla LLM model, there were the following issues to consider:
- The model may generate responses in an unexpected format
- The model may generate different results for the same question
- Parsing and rendering the LLM response
- Eliminating duplicate sentences and lines