Please advise. How do I create a managed table from Trino? We are using IBM COS S3 buckets. The tables get created as external even though I don't provide the external_location. What configuration is missing on the Hive connector?
Thanks !!
-Vinoj
Can you supply the CREATE SCHEMA and CREATE TABLE DDL? Also, can you verify that the table is empty at creation time (i.e. not trying to map it to any existing data files)?
CREATE SCHEMA catalog_name.schema_name
WITH (
location = 's3a:'
)
create table catalog_name.schema_name.test_orc_del (
id int,
name varchar
)
with
(
format = 'ORC'
);
It's an empty table when I create it.
If you just do an insert statement to get a single row, or a few rows, into it, I'm assuming the ORC files do land in the right location on S3 under the path you called out in the schema creation statement. What does the output of SHOW CREATE TABLE test_orc_del show? I imagine it will have either a LOCATION property (indicating a managed table) or an EXTERNAL_LOCATION property (indicating an external table). Can you share that?
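For example (a minimal sketch reusing your DDL from above; the inserted values are arbitrary):
-- insert one test row, then inspect the generated DDL
INSERT INTO catalog_name.schema_name.test_orc_del VALUES (1, 'test');
SHOW CREATE TABLE catalog_name.schema_name.test_orc_del;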
I did double-check the Hive connector's properties and did not see anything that would force a table to default to being an external table.
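For reference, a bare-bones Hive catalog properties file for S3-compatible storage looks something like this (the metastore URI, endpoint, and credentials are placeholders for your IBM COS environment); note that nothing in here dictates managed vs. external:
connector.name=hive
hive.metastore.uri=thrift://your-metastore-host:9083
hive.s3.endpoint=https://your-cos-endpoint
hive.s3.aws-access-key=YOUR_ACCESS_KEY
hive.s3.aws-secret-key=YOUR_SECRET_KEY
hive.s3.path-style-access=true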
What I tried first is loading a CSV file residing in an S3 bucket into a CSV-format table.
Second, I tried to create an ORC table by selecting the data from the CSV table.
When I check SHOW CREATE TABLE for the ORC table, it shows the external_location param even though I had not provided it in the query. As you said, I will try to insert a record, see how it behaves, and keep you posted.
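Roughly what I ran for the second step (the table names here are placeholders, not my exact DDL):
-- CTAS from the CSV-backed table into a new ORC table
CREATE TABLE catalog_name.schema_name.test_orc_ctas
WITH (format = 'ORC')
AS SELECT * FROM catalog_name.schema_name.test_csv;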
Here are some findings from my side that I'm IMAGINING will be similar for you, too. I did this testing on Starburst Galaxy, but it should be very similar on Starburst Enterprise (or even naked Trino).
I already had a schema to play with.
SHOW CREATE SCHEMA students.lester;
which showed where it was rooted on S3:
CREATE SCHEMA students.lester
WITH (
location = 's3://mybucket/students/lester'
)
I'm assuming that your CSV table was created as an external table, something like this:
USE students.lester;
CREATE TABLE motorcycles_csv_external (
make varchar(55),
model varchar(55),
year smallint,
mileage int
) WITH (
external_location = 's3://anotherbucket/motorcycles/',
format = 'TEXTFILE',
textfile_field_separator = ','
);
Ran the count SQL …
select count(*) from motorcycles_csv_external;
… returned 24.
Created an empty ORC-backed MANAGED table.
CREATE TABLE motorcycles_orc_managed (
make varchar(55),
model varchar(55),
year smallint,
mileage int
) WITH (
format = 'orc'
);
Added some data into it …
insert into motorcycles_orc_managed
select * from motorcycles_csv_external;
select count(*) from motorcycles_orc_managed;
… which now has 24 rows.
Ran this thinking it would show me the LOCATION.
SHOW CREATE TABLE motorcycles_orc_managed;
It doesn't:
CREATE TABLE students.lester.motorcycles_orc_managed (
make varchar(55),
model varchar(55),
year smallint,
mileage integer
)
WITH (
format = 'ORC',
type = 'HIVE'
)
I'm now assuming that's the default behavior of the Hive connector when the CREATE TABLE doesn't contain a LOCATION (or EXTERNAL_LOCATION) property. To verify the data is there, I checked the folder underneath s3://mybucket/students/lester and saw the ORC files under the motorcycles_orc_managed subfolder.
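If browsing the bucket isn't convenient, here's a sketch of a SQL-side check using the Hive connector's hidden $path column (table name from the example above):
-- lists the S3 object paths backing the table's data files
SELECT DISTINCT "$path" FROM motorcycles_orc_managed;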
I'm POSITIVE that motorcycles_orc_managed is a MANAGED table. Very interested in what a similar experiment shows in your env.
For giggles, I went ahead and built an EXTERNAL table pointing to the managed table's data.
CREATE TABLE motorcycles_orc_external (
make varchar(55),
model varchar(55),
year smallint,
mileage int
) WITH (
external_location = 's3://mybucket/students/lester/motorcycles_orc_managed',
format = 'ORC'
);
SELECT count(*) FROM motorcycles_orc_external;
SHOW CREATE TABLE motorcycles_orc_external;
Got the expected 24 for the number of rows, plus the following output showing this one is an EXTERNAL table.
CREATE TABLE students.lester.motorcycles_orc_external (
make varchar(55),
model varchar(55),
year smallint,
mileage integer
)
WITH (
external_location = 's3://mybucket/students/lester/motorcycles_orc_managed',
format = 'ORC',
type = 'HIVE'
)
This is all working as I had expected.
Please verify one more time that in your environment the test_orc_del table is reporting EXTERNAL_LOCATION even though it wasn't part of the DDL statement.
If that's the case, I'll push a comment into the Trino Slack hoping for more eyeballs on this. And of course, if you are a Starburst support customer, it seems worth raising this as a case for some additional help.
Thanks for the steps, Lester. I will verify and get back on this.