Open-Source AI Art Generation for JavaScript Web Development — Part 1

Nicholas Kammer
13 min read · Oct 17, 2023


This article will demonstrate how to incorporate free text-to-image generation of AI artwork into JavaScript web applications. It is broken into two parts due to its length. In Part 1, we learn how to access free AI models in the Hugging Face repository through its API, build a Node/Express server as the backend of an application that generates AI art from prompt inputs using the free AI models, and send the results to the frontend. In Part 2, we will build a simple frontend with React to allow users to input text-to-image prompts and then display the resulting AI images in the browser. This is a fun, free way to add cutting-edge technology to a JavaScript project.

AI models are transforming the face of technology and the internet. The imaginative potential is nearly unlimited. From OpenAI to StabilityAI, numerous companies have created programs that provide text generation, chatbots, artwork, text-to-speech audio, animation, and more. While most of these companies started out offering their new technology for free, many now charge a fee to bring ideas to life.

One of the most entertaining ways that many people have found to whittle away hours is to paint their own artwork with their words through programs like Stable Diffusion, which charges a fee for prolonged use. At the time of this article’s publishing, the Bing Image Creator from Microsoft, built with the DALL-E 3 AI, offers perhaps the best quality and the most generous free usage for the casual user, but it can be overly restrictive about some content that it will produce, even if the prompts are very benign.

The creativity of AI images can add uniqueness and immersion to the user’s web experience. These programs, such as Bing Image Creator, are fun to play with, but web developers need a little more. Unfortunately, the APIs for many of these companies’ cutting-edge text-to-image AIs come with paid pricing plans. For example, Stable Diffusion’s lowest-priced API plan is $29/month, provides only 999 images per month, and charges extra to train models. Costs go up from there.

That might work well for a large production application, where you depend on a large company to ensure the continuity of its API, but if you are brainstorming or still developing your ideas, you might prefer a free alternative, especially if you have a limited budget. Fortunately, we have access to free, open-source AI models. This article intends to show you how to call an API to generate AI images using a Node server built with Express.

Hugging Face is a repository of a large number of open-source machine learning models from a variety of categories, such as text-to-text generation, chat, image-to-text, text-to-voice, text-to-image, and others. Anyone can sign up for an account, and countless individuals and teams have uploaded their AI models to the site, making them free to access for users like yourself. In this article, we are interested in the text-to-image models and will go through how to access them and utilize them in a server through Hugging Face’s API.

In order to access the API, we will need a unique “Access Token” from the Hugging Face website. To get this token, we will need to follow several steps:

1.) The first step is to sign up for a free Hugging Face account, if you don’t already have one. Go to https://huggingface.co/ and click on the “Sign Up” button in the upper right corner.

2.) Sign up with an email address and password on the next page. Then complete your profile and click “Create Account”. Be sure to verify your email address with the email they send you.

3.) Once you verify your account, click on the circular shaded icon in the upper right corner.

4.) This will give you a dropdown menu. Click on your user name under “Profile” in the dropdown menu.

5.) This will take you to the profile page. Click on the “Edit Profile” button on the left hand side of the page.

6.) On the next page, click on the “Access Tokens” button on the left hand sidebar.

7.) On the next page, click the “New Token” button in the center of the main body of the page.

8.) A pop-up appears, asking for a name to identify the purpose of the token and asking you to choose whether it will be a “read” or “write” token under “Role”. These are the two types of tokens that Hugging Face provides, and each token must have one role or the other. In our case, we want a “read” token so that we can read and use the models through the Hugging Face API, so make sure “read” is selected under “Role”. Click “Generate a token”.

9.) This generates a new token with read permission, which we’ll use to access the AI models through the API. Copy the token; it will serve as the API key in our project.

Keep this token safe, and don’t share it with anyone.

10.) Now, we need to choose an AI model to use in our project.

Click on the “Models” button in the navigation bar at the top of the screen.

11.) This gives a selection of hundreds of thousands of AI models with many capabilities: text generation, chat bots, image-to-text, text-to-image, text-to-video, summarization, and on and on. We want “Text-to-Image” models for our project, so click on the “Text-to-Image” button in the left hand sidebar in the “Multimodal” section near the top. (For other future projects, you can look through the other categories shown.)

12.) You can search through the models by name if you wish. Each model has a small card on the page with information about its name, category, date it was last updated, and popularity (in the form of the number of downloads and likes). Generally, the more recent and more popular, the better.

Since we will call the model through the API using JavaScript, we need a model that is available through Hugging Face’s Hosted Inference API. (Not all of the models are available.)

We will select the “stable-diffusion-xl-base-1.0” model, since it is recent at the time of this article’s publishing, popular, and available through the API, so we click on its card. (After building this project, I would suggest coming back and playing with different models to see which ones work well for you.)

13.) After clicking on the card, we go to the model’s page. It shows us examples of images that the AI produced, details of the model, the usage license, the ability to test the AI, and whether the model can be called with the Inference API. Check the model’s page for the Hosted Inference API widget to confirm that the model is available through the API. (If you prefer another model, feel free to choose it instead, as long as it is in the “Text-to-Image” category and can be called with the Inference API.)

14.) Copy the name of the model and save it for use in our project. We will need the model’s name in our server.

Now that we have the API key token and the model ID from Hugging Face, we need the Inference API documentation to learn how to run the AI models.

The Hosted Inference API allows a user to “Test and evaluate, for free, over 150,000 publicly accessible machine learning models, or your own private models, via simple HTTP requests, with fast inference hosted on Hugging Face shared infrastructure.” It is free but rate limited, and conveniently, it allows us to change models simply by changing the model name in the API code we will use.

The API is accessible using either Python or JavaScript. Since we are building a JavaScript application, we need the Huggingface.js Inference API.

This documentation shows how to call the API in your code to return AI responses for each of the major categories of models, but it does not always clearly show how to process that response to produce a final usable product on the client side browser, especially for the Text-to-Image models.

(Hugging Face links to a live interactive notebook that gives working examples of each of the different categories of models that can be run as a single JavaScript file on the client side.)

Let’s look at the API code setup in the documentation, and then we’ll figure out how to produce a displayable end product after that.

Because API keys are required for long-term, continued use of the API, we need to run these files on a backend server to keep the tokens protected. Many of the categories, such as text generation, return strings, which are easy values to generate and send to the frontend, but the Text-to-Image models require a little more work to generate a response from the AI on the backend and then process it into something displayable on the frontend.

To build our application, we need a backend and a frontend. First, we will build our backend to call for AI responses from the API, and then display our images on the frontend, which we’ll build in Part 2.

Backend:

On the backend, we will use Node and Express.

(While we are using Node in this article, there are instructions for Deno as well in the documentation).

Step 0:

First, please create a new folder, which I will call Hugging-Face-Image-Gen, then open your Terminal and change the directory to the new folder. Then, open a code editor, such as Visual Studio Code. Inside the main folder, create a subfolder and name it server. In the Terminal, change into the new “server” folder and run npm init -y to create a package.json file.

Since we are building a Node server, we will take advantage of Express and also require CORS to send the AI answers to the frontend. We also need to install the required Hugging Face Inference package.

We run npm install @huggingface/inference express cors in the Terminal.

Step 1:

Create a new file in the “server” folder. You can name it whatever you wish, but I will name my file CallTextToImageAPI.js. We will add the Hugging Face API code from the page at this link to this file.

At the top of the file, we require the HfInference Class from the @huggingface/inference package, and then we create a new instance of the HfInference Class, name it “hf”, and pass in the Hugging Face API Key. (Be sure to put your API key inside the quotation marks, where it says “your access token”.)

Next, we declare a variable, which we’ll call MODEL_NAME, to designate the model ID, which we decided was stabilityai/stable-diffusion-xl-base-1.0.
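Based on that description, the top of CallTextToImageAPI.js might look like the following sketch (paste your own token where it says “your access token”):

```javascript
// CallTextToImageAPI.js
const { HfInference } = require('@huggingface/inference');

// Create an instance of the HfInference class with your Hugging Face "read" token
const hf = new HfInference('your access token');

// The model ID we copied from the Hugging Face model page
const MODEL_NAME = 'stabilityai/stable-diffusion-xl-base-1.0';
```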

With the initial setup done, we are given the sample code in the Inference API documentation that uses the textToImage() method on the hf class instance and returns an image Blob. We have to await the response from the API. We will tweak this following code for our purposes:
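The documentation’s sample looks roughly like this (the prompt string here is only an example):

```javascript
// Sample call, adapted from the Inference API documentation (run inside an async function)
await hf.textToImage({
  inputs: 'award winning high resolution photo of a giant tortoise', // example prompt
  model: 'stabilityai/stable-diffusion-xl-base-1.0',
  parameters: {
    negative_prompt: 'blurry',
  },
});
```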

The hf.textToImage() method calls the text-to-image capability of the API with the parameters:

· inputs, which is the prompt telling the AI what image we want

· model, which designates the AI model we’ll use

· parameters, which passes extra instructions, such as what we don’t want the image to be. In the sample code, the negative prompt “blurry” tells the AI to give us a non-blurry, clear, focused image. If you want to give more negative prompts, include them, separated by commas, inside the quotation marks of negative_prompt, such as “blurry, indecent, gaudy”. We can also pass other parameters here as key-value pairs inside the parameters object, such as the “height” and “width” of the AI image, with integer values giving the sizes. The API documentation has more information on these extra parameters. It looks like the following:
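For illustration (the prompt and the size values below are just examples), a call with extra options inside the parameters object might look like this:

```javascript
// Example of passing additional options inside the parameters object
const imageBlob = await hf.textToImage({
  inputs: 'a watercolor painting of a lighthouse at dawn', // example prompt
  model: MODEL_NAME,
  parameters: {
    negative_prompt: 'blurry, indecent, gaudy', // several negative prompts, comma-separated
    height: 1024, // image height in pixels (illustrative value)
    width: 1024,  // image width in pixels (illustrative value)
  },
});
```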

Since we await the API response, we will wrap this code in an async function, which we will call textToImageGen(), taking request and response as its parameters.

Looking at the sample API code, we need to add a more dynamic way to pass the input prompt that changes with the users’ desires, so we will replace the string value of “inputs” with req.query.prompt. This will allow us to pass a request from the browser to the server with a string value for a key in the query that we will call “prompt”. Then we will replace the string value of “model” with our variable MODEL_NAME. We have to await the blob returned by the API. Then, we assign the image blob returned by the hf.textToImage() method to a variable imageBlob. The ensuing code looks like this:
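A sketch of the function at this stage, with the dynamic prompt and our model variable substituted in:

```javascript
// Async Express handler: generate an AI image from the prompt in the query string
async function textToImageGen(req, res) {
  const imageBlob = await hf.textToImage({
    inputs: req.query.prompt, // prompt passed from the browser as ?prompt=...
    model: MODEL_NAME,
    parameters: {
      negative_prompt: 'blurry',
    },
  });
  // imageBlob now holds the generated image as a Blob; we process it next
}
```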

The AI image blob is “a file-like object of immutable, raw data; they can be read as text or binary data, or converted into a ReadableStream so its methods can be used for processing the data.” Sending this to the frontend of our application will take several steps. I will show the code and then walk through it step-by-step.

The code is as follows:
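Here is a sketch of those steps (the variable names match the walkthrough below):

```javascript
// Convert the Blob to an ArrayBuffer, encode it as a base64 string, and send it
imageBlob.arrayBuffer().then((arrayBuffer) => {
  const buffer = Buffer.from(arrayBuffer);        // raw binary data in a Node.js Buffer
  const transmission = buffer.toString('base64'); // encode the binary data as base64
  res.set('Content-Type', 'image/jpg');           // set the header before sending
  res.send(transmission);                         // send the base64 string to the frontend
});
```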

First, we use the arrayBuffer() method on the blob object. “The arrayBuffer() method in the Blob interface returns a Promise that resolves with the contents of the blob as binary data contained in an ArrayBuffer.” Since arrayBuffer() returns a Promise, we chain a .then() onto it and create a new Buffer (a Node.js object holding the raw binary data) from the response. Then, we encode the binary data as a base64 string with the .toString(“base64”) method. This produces a base64 string, which is easy to send from the backend to the frontend and then process for display to the client. We assign the base64 string to a variable, which we’ll call transmission.

(While it is not required to encode the binary data to a base64 string to send the data, it avoids some potential quirks and issues on the frontend resulting from sending the buffer array with the binary data directly.)

Next, we need to send the data with Express, so we first set the appropriate header by setting “Content-Type” to “image/jpg”.

Finally, we send the base64 string with the res.send() method.

The textToImageGen() function now combines the API call and the blob-processing steps above.

(Unfortunately, if we try to test this with Postman, it will not visualize the image for us, so we will need to set up the frontend before we can see the AI results.)

Finally, we’ll export the textToImageGen() function.

The full file looks like this:
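Putting the setup, the API call, the blob processing, and the export together, the whole file might look like this sketch:

```javascript
// CallTextToImageAPI.js
const { HfInference } = require('@huggingface/inference');

// Create an instance of the HfInference class with your Hugging Face "read" token
const hf = new HfInference('your access token');

// The model ID we copied from the Hugging Face model page
const MODEL_NAME = 'stabilityai/stable-diffusion-xl-base-1.0';

// Generate an AI image from the prompt in the query string and send it to the frontend
async function textToImageGen(req, res) {
  const imageBlob = await hf.textToImage({
    inputs: req.query.prompt,
    model: MODEL_NAME,
    parameters: {
      negative_prompt: 'blurry',
    },
  });

  // Encode the image blob as a base64 string and send it with Express
  imageBlob.arrayBuffer().then((arrayBuffer) => {
    const buffer = Buffer.from(arrayBuffer);
    const transmission = buffer.toString('base64');
    res.set('Content-Type', 'image/jpg');
    res.send(transmission);
  });
}

module.exports = textToImageGen;
```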

Step 2:

Now, we need to set up our Express server to handle the requests and responses and utilize this nice, new API function.

In the same “server” folder, create a new file and name it server.js. We will require Express, CORS, and our new textToImageGen() function at the top of the server.js file and create a new Express app.
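The top of the file might look like this sketch:

```javascript
// server.js
const express = require('express');
const cors = require('cors');
const textToImageGen = require('./CallTextToImageAPI');

const app = express();
```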

Then, we will use CORS to allow the connection between our frontend and backend server. Our frontend will run locally on localhost port 3000.
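A minimal way to do that, assuming the frontend will run on the default React development port:

```javascript
// Allow cross-origin requests from the React frontend on localhost:3000
app.use(cors({ origin: 'http://localhost:3000' }));
```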

Next, we use the app.get(path, callback) method from Express so that we can receive the query from the browser on the server and send the AI response from the server back to the browser. We will use the root route as the path in app.get(), and we will use the textToImageGen() function (with the request and response parameters passed in) as the app.get() callback.
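The route might look like this:

```javascript
// Pass the browser's query to the AI and send the generated image back
app.get('/', (req, res) => textToImageGen(req, res));
```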

Finally, add the app.listen() method set to port 3001 (or any other port that isn’t 3000 or otherwise in use on your computer) after the close of app.get(), and use a console.log in the listen method’s callback to indicate when the connection is live. This binds the server to port 3001 and listens for connections. The following will be the last lines of code in the file.
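For example:

```javascript
// Bind to port 3001 and log when the server is live
app.listen(3001, () => console.log('Server is listening on port 3001'));
```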

The completed server.js code looks like this:
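Here is a sketch of the completed file, combining the pieces above:

```javascript
// server.js
const express = require('express');
const cors = require('cors');
const textToImageGen = require('./CallTextToImageAPI');

const app = express();

// Allow cross-origin requests from the React frontend on localhost:3000
app.use(cors({ origin: 'http://localhost:3000' }));

// Pass the browser's query to the AI and send the generated image back
app.get('/', (req, res) => textToImageGen(req, res));

// Bind to port 3001 and log when the server is live
app.listen(3001, () => console.log('Server is listening on port 3001'));
```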

In order to start the server, make sure you are in the “server” folder in the Terminal, and run the code with npm start. (Since npm init -y does not add a custom start script, npm start defaults to running node server.js, so you can also run node server.js directly.)

We will need our frontend running to process the data before this is useful though. We will complete this project in Part 2, where we will build the frontend in React, and see the final product.

Continue to Part 2 —>

Written by Nicholas Kammer

Programmer, Data Scientist, Financial Analyst, Adjunct Professor. Interested in data, AI, web development, finance, and bringing ideas to life. www.naek2.com
