Text to Vision

Use Case: Enhancing word discovery in a mobile game

In today’s digital age, there’s a growing need for applications that can generate images based on user prompts or input. This could be for various purposes such as generating personalized content, creating artwork, or even assisting in visual storytelling. However, implementing such functionality can be complex, requiring integration with machine learning models for image generation and efficient handling of user interactions.

In this blog, we take the case of a word game played on a mobile phone. The game is all about searching for words on a grid. Suppose we want to present the user with a hint about what kind of word is hidden in the grid.

To support this case, we’ll pick one word from the game and pass it to a machine learning model. The model will generate an image that best fits the word and return it as a response. We will present this image to the user, who can use it as a visual hint to find the word.

Exciting so far, isn’t it? Let’s delve into it!

Why Backend Integration?

Invoking a machine learning model from the backend is generally recommended over direct integration on the frontend for several reasons:

Security: Integrating the model on the backend safeguards sensitive API keys, mitigating risks such as unauthorized access or misuse.

Control: Backend integration provides superior control over API usage, enabling monitoring and regulation of calls for efficient utilization and compliance with usage limits.

Performance: Offloading the processing to backend enhances frontend performance by reducing computational overhead on client devices, resulting in a smoother user experience.

Scalability: Backend integration facilitates seamless resource scaling based on demand, ensuring optimal performance during peak usage periods. This scalability is essential for accommodating growing user bases and maintaining consistent service quality.

The backend Python code!

One of the key components in this integration is Gradio, a Python library that simplifies the deployment of machine learning models as web applications. Gradio allows us to create interfaces for our machine learning models with minimal code, enabling easy interaction with users. By connecting our Flask backend to Gradio, we can seamlessly integrate machine learning capabilities into our application.


from flask import Flask, request, jsonify, Response
import io

from gradio_client import Client
from PIL import Image

def generate_image(prompt, steps=4):
    # Initialize the Gradio client for the SDXL-Lightning model hosted by ByteDance
    client = Client("ByteDance/SDXL-Lightning")
    # Ask the model to generate an image for the prompt; the default step
    # count of 4 is illustrative and depends on the hosted endpoint
    result = client.predict(prompt, steps, api_name="/generate_image_1")

    return result

The generate_image() function invokes a machine learning model to generate an image based on the provided prompt.

This model is trained on a vast dataset of images and textual descriptions, allowing it to understand and interpret prompts and generate relevant images. The Gradio client returns the generated image as a local file path, which the Flask handler below opens and streams back to the client.
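For a quick sanity check, the function can be called directly. The following is a minimal sketch; the prompt “apple” and the printed values are purely illustrative:

# Illustrative direct call, reusing generate_image() and Image from above.
# The Gradio client returns a local file path to the generated image.
path = generate_image("apple")

img = Image.open(path)
print(img.size, img.format)  # e.g. (1024, 1024) and PNG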

if __name__ == "__main__":
    app = Flask(__name__)

    @app.route("/generate", methods=["POST"])
    def handle_request():
        # Extract the JSON body and read the user-supplied prompt
        data = request.get_json(silent=True)
        prompt = data.get("prompt") if data else None
        if not prompt:
            return jsonify({"error": "Missing prompt parameter"}), 400
        try:
            # generate_image() returns the local path of the generated image
            result_path = generate_image(prompt)

            # Open the image and serialize it into an in-memory PNG stream
            img = Image.open(result_path)
            img_byte_array = io.BytesIO()
            img.save(img_byte_array, format="PNG")
            img_byte_array.seek(0)

            return Response(img_byte_array.getvalue(), mimetype="image/png")
        except Exception as e:
            return jsonify({"error": str(e)}), 500

    app.run(host="localhost", port=6000)

The code block initializes a Flask application and defines a route at “/generate” to handle POST requests. When a POST request is received, the handle_request() function is invoked.

Within handle_request(), the JSON data from the request body is extracted to retrieve the prompt provided by the user. If the prompt is missing, an error response with status code 400 is returned to notify the client.

Subsequently, the generate_image() function is invoked with the prompt parameter to generate an image based on the provided prompt. After the image is generated, it is opened and converted into a byte array.

Finally, the byte array containing the image is sent as a response with a mimetype of ‘image/png’ to the client. In case of any errors occurring during the process, an error response with status code 500 is returned, along with the corresponding error message.
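To verify the endpoint end to end, a small test script can post a prompt and save the returned PNG. This is a minimal sketch, assuming the Flask app is running locally on port 6000; the output filename is arbitrary:

import requests

# Send a prompt to the /generate endpoint and save the PNG response.
resp = requests.post(
    "http://localhost:6000/generate",
    json={"prompt": "apple"},
    timeout=120,  # image generation can take a while
)

if resp.status_code == 200:
    with open("hint.png", "wb") as f:  # arbitrary output filename
        f.write(resp.content)
    print("Saved hint.png")
else:
    print("Error:", resp.json().get("error"))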

In essence, this backend setup effectively utilizes Flask for API development and seamlessly integrates with Gradio to infuse machine learning capabilities. Consequently, our application enables dynamic image generation based on textual input, thereby delivering personalized and captivating user experiences while enhancing its overall functionality and appeal.

 

The frontend Dart code!

 
The following Dart code represents the frontend implementation of a hint generator application. In the context of this post, this application serves as a practical demonstration of how we can read the word and pass it to our backend image generation system to receive visual hints based on textual input.

import 'dart:convert';

import 'package:flutter/material.dart';
import 'package:http/http.dart' as http;

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Word Hint Generator',
      theme: ThemeData(primarySwatch: Colors.blue),
      home: MyScreen(),
    );
  }
}

Main Function and MyApp Class:

  • The main() function serves as the entry point for the Dart application. Here, we initialize the application by running an instance of the MyApp widget.
  • MyApp is a stateless widget that represents the root of our application. It sets the title of the application and defines the theme using MaterialApp.
class MyScreen extends StatefulWidget {
  @override
  MyScreenState createState() => MyScreenState();
}

class MyScreenState extends State<MyScreen> {
  final TextEditingController textController = TextEditingController();
  dynamic imageUrl; // holds the raw PNG bytes returned by the backend

  Future<void> generateImage() async {
    String apiUrl = "https://25c2-164-120-110-140.ngrok-free.app/generate";
    String text = textController.text;
    // The backend reads the prompt from the "prompt" key of the JSON body
    Map<String, String> data = {'prompt': text};
    try {
      var response = await http.post(
        Uri.parse(apiUrl),
        headers: {'Content-Type': 'application/json'},
        body: json.encode(data),
      );
      if (response.statusCode == 200) {
        setState(() {
          imageUrl = response.bodyBytes;
        });
      } else {
        print('Error: ${response.reasonPhrase}');
      }
    } catch (e) {
      print('Error: $e');
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Word Hint Generator')),
      body: Padding(
        padding: const EdgeInsets.all(16.0),
        child: Column(
          crossAxisAlignment: CrossAxisAlignment.stretch,
          children: [
            TextField(
              controller: textController,
              decoration: InputDecoration(labelText: 'Enter Word'),
            ),
            SizedBox(height: 20),
            ElevatedButton(
              onPressed: generateImage,
              child: Text('Generate Hint'),
            ),
            SizedBox(height: 20),
            imageUrl != null
                ? Image.memory(
                    imageUrl,
                    fit: BoxFit.contain,
                  )
                : Container(),
          ],
        ),
      ),
    );
  }
}

MyScreen Class:

  • MyScreen is a stateful widget responsible for rendering the main screen of our application. It extends StatefulWidget to handle state changes dynamically.
  • Inside MyScreen, we define the MyScreenState class, which manages the state of our application screen.
  1. User Input and Image Generation:

    • The generateImage() function is an asynchronous method that sends a POST request to our backend API endpoint (apiUrl) with the user-entered text as the “prompt” field of the JSON body.
    • Upon receiving a successful response from the backend, the raw image bytes from the response body are stored in the imageUrl variable. If the request fails, an error message is printed to the console.
  2. UI Layout:

    • The UI layout is defined within the build() method of MyScreen. It consists of an app bar with the title “Word Hint Generator” and a body containing a text field for entering words, a button to generate hints, and a space to display the generated image hint.
    • The TextEditingController is used to control the text field, and the SizedBox widget is used for spacing between UI elements.

Summary

In this technical blog post, we delved into the development of an Image Generator App using Flutter, coupled with ByteDance’s SDXL-Lightning machine learning model for image generation. Our journey began with a practical use case: providing word hints in a mobile game through image prompts. Leveraging Flutter’s UI capabilities, we crafted a user-friendly frontend interface enabling users to input words and generate corresponding image hints effortlessly.

On the backend, we harnessed Flask to handle POST requests carrying the prompt and to return the generated image. By shifting processing to the backend, we bolstered security, control, performance, and scalability. This architecture ensures optimal resource utilization while safeguarding sensitive API keys and enhancing frontend performance.

Throughout our implementation, we highlighted the seamless synergy between frontend and backend components, showcasing the power of cohesive integration in delivering a robust and user-friendly image generation experience. By adopting this approach, developers can create versatile applications with enhanced functionality and performance, setting new benchmarks in user engagement and satisfaction.
