I'm new to working with Hive, but I am trying to print a table with a total number of car body types for different cities.

 select body_type, city, count(body_type) AS total_for_body
 from usedcartestfinal
 group by body_type, city
 order by total_for_body DESC
 LIMIT 20;

When I run the above, I get duplicate cities in the output, and I only want each city to be printed once. I figured I'd use SELECT DISTINCT city, but I can't, because I get an error saying that GROUP BY cannot be used in the same query.

Not quite sure how else to go about this query, any advice or suggestions would be appreciated.

Here is my output: https://imgur.com/BfQVsjF

I essentially only want Houston to print once since the highest sold there is SUV/CROSSOVER

                Only printed once? Since you group by two columns, each city can be returned several times. If you want each city only once, you have to decide which of its different body_type values to return.
– jarlh
                Dec 8, 2020 at 8:28
                @jarlh what do you mean by that? When I run my query this is what I get, imgur.com/BfQVsjF. Essentially I only want Houston to print out once, since the most sold there is the SUV. So I'm not quite sure how to go about that.
– sebthedark
                Dec 8, 2020 at 8:32
                Can you provide a sample schema of the table usedcartestfinal? Which columns does the table have, and with which data types?
– procra
                Dec 8, 2020 at 8:54

You should remove the body_type from the group by clause, and instead have a distinct count on it:

select   city, count(distinct body_type) AS total_for_body 
from     usedcartestfinal 
group by city
order by total_for_body DESC LIMIT 20;
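
To see what this query returns, here is a minimal sketch that runs it from Python against an in-memory SQLite database rather than Hive. The sample rows are invented for illustration; only the table and column names come from the question. The result has one row per city, but the number is how many different body types appear in that city, not how many cars of the top body type were sold.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
rows = [
    ("SUV", "Houston"), ("SUV", "Houston"), ("SUV", "Houston"),
    ("Sedan", "Houston"), ("Sedan", "Houston"),
    ("Coupe", "Houston"),
    ("Sedan", "Dallas"), ("Sedan", "Dallas"),
    ("SUV", "Dallas"),
]

# SQLite stands in for Hive here (an assumption made so the sketch is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usedcartestfinal (body_type TEXT, city TEXT)")
conn.executemany("INSERT INTO usedcartestfinal VALUES (?, ?)", rows)

# Same query as in the answer: one row per city, counting distinct body types.
distinct_counts = conn.execute(
    """
    SELECT city, COUNT(DISTINCT body_type) AS total_for_body
    FROM usedcartestfinal
    GROUP BY city
    ORDER BY total_for_body DESC
    LIMIT 20
    """
).fetchall()

for city, total in distinct_counts:
    print(city, total)
```

With this sample data Houston has three different body types and Dallas has two, so each city appears once, without any body_type column.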

Use a subquery with the analytic function row_number to get the record with the highest count for each city:

select body_type, city, total_for_body
from
(
select  body_type, city, total_for_body,
        -- rank body types within each city, highest count first
        row_number() over(partition by city order by total_for_body desc) rn
from
(
 select body_type, city, count(body_type) total_for_body
 from usedcartestfinal
 group by body_type, city
)c
)s where rn = 1
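
The row_number approach can be tried end to end with a small sketch. It uses Python's sqlite3 module with an in-memory database instead of Hive (an assumption made so the example is runnable, and it needs SQLite 3.25 or later for window functions). The sample rows and the aliases `counts` and `ranked` are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
rows = [
    ("SUV", "Houston"), ("SUV", "Houston"), ("SUV", "Houston"),
    ("Sedan", "Houston"), ("Sedan", "Houston"),
    ("Coupe", "Houston"),
    ("Sedan", "Dallas"), ("Sedan", "Dallas"),
    ("SUV", "Dallas"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usedcartestfinal (body_type TEXT, city TEXT)")
conn.executemany("INSERT INTO usedcartestfinal VALUES (?, ?)", rows)

top_per_city = conn.execute(
    """
    SELECT body_type, city, total_for_body
    FROM (
        -- Step 2: number the body types inside each city, largest count first.
        SELECT body_type, city, total_for_body,
               ROW_NUMBER() OVER (PARTITION BY city
                                  ORDER BY total_for_body DESC) AS rn
        FROM (
            -- Step 1: count cars per (body_type, city) pair.
            SELECT body_type, city, COUNT(body_type) AS total_for_body
            FROM usedcartestfinal
            GROUP BY body_type, city
        ) AS counts
    ) AS ranked
    WHERE rn = 1                      -- Step 3: keep only the top row per city.
    ORDER BY total_for_body DESC
    """
).fetchall()

for body_type, city, total in top_per_city:
    print(city, body_type, total)
```

Each city now appears exactly once, together with its most common body type and that body type's count.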
                I suspect the underlying data model could have been designed in a more manageable way, but this is pretty overwhelming if you have just started with SQL. I would have stored the car body types and the cities in separate tables and created a third table relating them. Then the query would be much easier, with a simple join.
– procra
                Dec 8, 2020 at 9:45

If you include BODY_TYPE in the GROUP BY, the rows are grouped by both BODY_TYPE and CITY, so you get one row for each combination of city and body type.

To return each city only once, count the rows per combination, rank the combinations within each city with ROW_NUMBER, and keep only the top-ranked row as follows:

SELECT * FROM 
(SELECT BODY_TYPE,
       CITY,
       COUNT(BODY_TYPE) AS TOTAL_FOR_BODY,
       ROW_NUMBER() OVER (PARTITION BY CITY 
                          ORDER BY COUNT(BODY_TYPE) DESC) AS RN
  FROM USEDCARTESTFINAL
 GROUP BY BODY_TYPE,
          CITY) AS T WHERE RN = 1
 ORDER BY TOTAL_FOR_BODY DESC LIMIT 20;
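
One thing to keep in mind with ROW_NUMBER is ties: if two body types share the highest count in a city, only one of them gets row number 1, and which one is arbitrary unless the ORDER BY has a tie-breaker. RANK gives tied rows the same rank, so both are kept. The sketch below shows the difference using Python's sqlite3 module in place of Hive (an assumption for runnability; it needs SQLite 3.25 or later). The sample rows and the helper `top_rows` are hypothetical.

```python
import sqlite3

# Hypothetical sample: SUV and Sedan are tied at 2 cars each in Austin.
rows = [
    ("SUV", "Austin"), ("SUV", "Austin"),
    ("Sedan", "Austin"), ("Sedan", "Austin"),
    ("Coupe", "Austin"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usedcartestfinal (body_type TEXT, city TEXT)")
conn.executemany("INSERT INTO usedcartestfinal VALUES (?, ?)", rows)


def top_rows(ranking_sql):
    """Return the rows ranked first per city by the given window expression."""
    query = f"""
        SELECT body_type, city, total_for_body
        FROM (
            SELECT body_type, city, total_for_body,
                   {ranking_sql} AS rn
            FROM (
                SELECT body_type, city, COUNT(*) AS total_for_body
                FROM usedcartestfinal
                GROUP BY body_type, city
            ) AS counts
        ) AS ranked
        WHERE rn = 1
        ORDER BY body_type
    """
    return conn.execute(query).fetchall()


# ROW_NUMBER with body_type as a tie-breaker: exactly one row per city.
with_row_number = top_rows(
    "ROW_NUMBER() OVER (PARTITION BY city "
    "ORDER BY total_for_body DESC, body_type)"
)

# RANK without a tie-breaker: every tied body type is returned.
with_rank = top_rows(
    "RANK() OVER (PARTITION BY city ORDER BY total_for_body DESC)"
)

print(with_row_number)
print(with_rank)
```

Which behavior is right depends on whether a city should ever be listed twice when two body types are equally common.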
                I would need to include the body_type to show the number of body_types there are in that city. For example: Houston | Sedan | 500. But I would need to exclude the same city from showing up again, e.g. Houston | Coupe | 300. I'm not sure if that makes sense.
– sebthedark
                Dec 8, 2020 at 8:07
                So you want to see only the body type with the most cars sold per city? Is that correct?
– procra
                Dec 8, 2020 at 8:56
                OK, I have now used an analytic function to order by the count, so you will find only one record per city in your result.
– Popeye
                Dec 8, 2020 at 10:05
        
