Manus nobis Man


Manus is a China-based AI agent system designed to tackle more complex projects than typical chatbots. Each instance runs in its own Ubuntu virtual machine. When you give it a prompt, it first creates a checklist of tasks needed to accomplish the goal, then executes code, web searches, or other actions to complete each item on the checklist. It can modify the checklist as it goes and often double-checks its work and progress. Example projects on the home page include planning a vacation to Japan, analyzing stocks, and comparing insurance policies. The demos are impressive.
As of this writing, Manus is still in beta. You can sign up on the web site, and I was recently granted access. As a beta user, I received 3,900 credits and burned most of them on this project. The cheapest plan starts at $39/month and includes 3,900 monthly credits; the pro plan costs $199/month for 19,900 credits.
For my first test, I gave it a fairly hard task: create a machine learning model from a used car pricing dataset from Kaggle. I thought this was a good test since I had spent three weeks working through the same project myself. The dataset had issues like inconsistent formatting, missing values, and extreme outliers. When I finished my work, I had an accurate model and had built a web page that lets a user plug in values and predict a used car price. (I actually built three models and used the best one.)
To see how Manus handled it, here was the exact prompt I gave it:
This is a Kaggle dataset of used car prices. Please create a machine learning model to predict price based on the other features in the dataset. Note that the data needs to be cleaned and some features have missing data. Use whatever model you think best, and whatever feature engineering. At the end, use this data as input to the model and predict the price of this car:
new_data = pd.DataFrame({
    'brand': ['Toyota'],
    'model': ['Tacoma TRD Sport'],
    'model_year': [2019],
    'milage': ['86000 mi.'],
    'fuel_type': ['Gasoline'],
    'engine': ['275HP'],
    'transmission': ['Automatic'],
    'ext_col': ['Blue'],
    'int_col': ['Gray'],
    'accident': ['None reported'],
    'clean_title': ['Yes'],
    'price': [40000]
})
Manus went to work. I watched it analyze the data, write Python code, and execute it on its virtual machine. It came up with a nice checklist and did an impressive job cleaning the data. This was the checklist it created:
I left the web site and let it continue running. About 20 minutes later, it notified me it was done and presented a summary of its work. In data exploration and cleaning, it found and fixed formatting issues and handled missing values and outliers. It created new features for car age (I did the same) and brand popularity (I didn't), and one-hot encoded the categorical features. One cool thing: at each step, you can click a button to see the code it executed, which provided better insight into its thinking and implementation details. For the machine learning work, it used Python. There was also a button that let me take control of the virtual machine and edit the code in web-based VS Code, but I didn't test this feature.
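The cleaning and feature-engineering steps described above can be sketched in pandas. The column names come from the prompt's new_data; the exact logic Manus used is my assumption, not its actual code:

```python
import pandas as pd

# Hypothetical sketch of the reported cleaning steps; column names come
# from the prompt, the specific cleaning choices are assumed.
df = pd.DataFrame({
    "brand": ["Toyota", "Ford"],
    "model_year": [2019, 2015],
    "milage": ["86,000 mi.", "120,000 mi."],
    "fuel_type": ["Gasoline", None],
})

# Fix formatting issues: strip units and separators, cast to numbers.
df["milage"] = df["milage"].str.replace(r"[^\d]", "", regex=True).astype(int)

# Handle missing values with a simple most-frequent fill.
df["fuel_type"] = df["fuel_type"].fillna(df["fuel_type"].mode()[0])

# New features: car age (relative to an assumed reference year) and
# brand popularity (how often each brand appears in the data).
df["car_age"] = 2025 - df["model_year"]
df["brand_popularity"] = df["brand"].map(df["brand"].value_counts())

# One-hot encode the categorical features.
df = pd.get_dummies(df, columns=["brand", "fuel_type"])
```

Outlier handling (e.g., clipping extreme prices) would slot in alongside these steps; I've omitted it since the summary didn't say which method Manus used.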
Manus decided to use a Decision Tree Regressor as the model. When it ran the model for the first time, there were errors in both the data and the code. When it found an error, it took corrective action, eventually working around the problems that arose. The final model had an R-squared (the fraction of price variance explained by the model) of only 0.6720, well short of my own model's 0.942. So I ended up with a superior model. It also created a few visualizations that were well done.
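For reference, the model type and metric above map onto a few lines of scikit-learn. This is a generic sketch using synthetic data, not Manus's actual code:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned car data (the real features were
# mileage, car age, one-hot brands, etc.).
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model type Manus chose.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# R-squared on held-out data: 1.0 is perfect, 0.0 is no better than
# predicting the mean.
r2 = r2_score(y_test, model.predict(X_test))
```

A single decision tree tends to overfit tabular data like this, which may partly explain the 0.6720; ensembles such as random forests or gradient boosting usually score higher on this kind of problem.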
It failed at the last step of my prompt: predicting the price of the Toyota Tacoma with the given data. I looked into the step where it failed, and it appeared it had not applied all the preprocessing to the new data, so the model failed to run. However, I didn't study the error in depth. I was somewhat underwhelmed, but it asked me if I wanted it to create a web site that used the model to make predictions, and I told it to proceed.
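That failure mode, a model trained on preprocessed features being fed a raw new row, is a classic one. One standard way to avoid it is to wrap the preprocessing and the model in a single scikit-learn Pipeline, so a new row gets exactly the transformations the training data did. A hypothetical sketch with toy data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy training frame reusing column names from the prompt's new_data.
train = pd.DataFrame({
    "brand": ["Toyota", "Ford", "Toyota", "Honda"],
    "model_year": [2019, 2015, 2020, 2018],
    "milage": [86000, 120000, 30000, 60000],
    "price": [40000, 15000, 45000, 25000],
})

# All preprocessing lives inside the pipeline, so inference rows get the
# identical transformations the training data did.
pre = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["brand"])],
    remainder="passthrough",
)
pipe = Pipeline([("pre", pre), ("tree", DecisionTreeRegressor(random_state=0))])
pipe.fit(train.drop(columns="price"), train["price"])

# Predicting on a raw new row now works with no manual preprocessing.
new_data = pd.DataFrame({"brand": ["Toyota"], "model_year": [2019], "milage": [86000]})
pred = pipe.predict(new_data)
```

Had the preprocessing been packaged this way, the final prediction step in my prompt likely wouldn't have broken.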
It took another 15 minutes to complete the web site. It decided to use Next.js for a single-page app. When complete, it offered a link to a public-facing web site running the app. The UI was poor, and the input fields were unlabelled or unhelpful. However, I was able to guess at what values it needed, and it did make a price prediction for the Tacoma. There were some broken graphs on the page, but it did show a seasonal price chart (the low was in November) that was interesting and something I hadn't even thought about when I was working on the project.
The verdict: Man beat Manus on this project. However, I was very impressed with its ability to create a coherent plan, execute the plan in code, make corrections and bug fixes (most of the time), and create an interactive web page (though flawed). With multiple rounds of prompts, it might have done better. This is probably the worst it will ever do, since I expect both the underlying models and the agent framework to improve rapidly. A key takeaway, as with any LLM, is that you need to be a subject matter expert to judge the output it creates; you won't be able to judge the quality of the work products Manus produces without deep subject matter knowledge. The agentic age is coming, and I expect tools like Manus to become more useful and widespread. I expect a next-level framework to allow multiple agents to run simultaneously, communicate, and exchange data. A team of agents may create much higher quality results.
Written by

Keith Winston
Data science, machine learning, applied AI researcher, and mountaineer. Retired from the City of Garden Grove, CA.